Dynamic Decision Learning: Test-Time Evolution for Abnormality Grounding in Rare Diseases

¹ Technical University of Munich ² Munich Center for Machine Learning ³ University of Trento

⁴ Imperial College London ⁵ Helmholtz Munich ⁶ King's College London

Overview

Large vision-language models (LVLMs) struggle with clinical abnormality grounding on rare diseases and long-tailed distributions. Severe data scarcity and distribution shifts make fine-tuning impractical, while single-pass inference is highly sensitive to prompt phrasing and visual perturbations. We propose Dynamic Decision Learning (DDL), a framework that optimizes instruction prompts and consolidates predictions across augmented views using structured matching. Across two brain imaging benchmarks with 281 rare pathologies, DDL improves localization by up to 105% and consistently outperforms other methods.
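To make the idea of test-time prompt optimization concrete, here is a minimal sketch of one possible label-free selection rule: choose the candidate prompt whose predicted boxes agree most across augmented views. The source does not state the objective DDL optimizes, so the agreement score (mean pairwise IoU), the function names, and the `predict` callable standing in for a frozen LVLM are all assumptions made for illustration. Boxes are assumed to be already mapped back to the original image frame.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in the original image frame


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def agreement(boxes: Sequence[Box]) -> float:
    """Mean pairwise IoU; 1.0 means every view produced the same box."""
    pairs = list(combinations(boxes, 2))
    return sum(iou(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def select_prompt(
    prompts: Sequence[str],
    views: Sequence[object],
    predict: Callable[[str, object], Box],  # hypothetical wrapper around a frozen LVLM
) -> str:
    """Return the candidate prompt with the highest cross-view agreement."""
    return max(prompts, key=lambda p: agreement([predict(p, v) for v in views]))
```

In this sketch no ground-truth labels are needed: a prompt is preferred only because the model's answers under it are stable across views, which is one plausible proxy for reliability and not necessarily the criterion used in the paper.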

Figure (top): Static inference with frozen LVLMs suffers from fragility and hallucinations on rare diseases, exhibiting high sensitivity to prompt and visual variations.

Figure (bottom): DDL enables test-time dynamic evolution by synergizing the Language Space and Visual Space to navigate the decision landscape toward success, achieving up to a 105% improvement in localization reliability.

Method

Overview of the DDL framework. At test time, DDL evolves its decisions jointly in the Language Space and the Visual Space: it optimizes instruction prompts and consolidates predictions across multiple augmented views using structured matching. This improves localization reliability without any additional training.
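The consolidation step can be illustrated with a small sketch. The source does not specify the matching algorithm, so this example substitutes a simple greedy IoU clustering: boxes predicted on each augmented view are mapped back to the original frame, grouped with overlapping boxes from other views, and kept only if enough boxes support them. The helper names, the horizontal-flip augmentation, and the `iou_thr` and `min_votes` parameters are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def unflip(box: Box, width: float) -> Box:
    """Map a box predicted on a horizontally flipped image back to the original frame."""
    x1, y1, x2, y2 = box
    return (width - x2, y1, width - x1, y2)


def mean_box(boxes: Sequence[Box]) -> Box:
    """Coordinate-wise average of a non-empty group of boxes."""
    n = len(boxes)
    return tuple(sum(b[i] for b in boxes) / n for i in range(4))


def consolidate(
    view_boxes: Sequence[Sequence[Box]],
    iou_thr: float = 0.5,   # assumed overlap needed to join an existing group
    min_votes: int = 2,     # assumed number of supporting boxes needed to keep a group
) -> List[Box]:
    """Greedily group boxes from all views and keep the well-supported groups."""
    clusters: List[List[Box]] = []
    for boxes in view_boxes:
        for box in boxes:
            best, best_iou = None, iou_thr
            for cluster in clusters:
                score = iou(box, mean_box(cluster))
                if score >= best_iou:
                    best, best_iou = cluster, score
            if best is None:
                clusters.append([box])  # start a new group
            else:
                best.append(box)        # join the best-overlapping group
    return [mean_box(c) for c in clusters if len(c) >= min_votes]
```

For example, with an image of width 100, a box `(48, 10, 88, 50)` predicted on the flipped view maps back to `(12, 10, 52, 50)`, which overlaps a box `(10, 10, 50, 50)` from the original view, so the two are merged. An isolated box seen in only one view is discarded as a likely hallucination. A one-to-one assignment such as Hungarian matching would be a natural alternative to the greedy grouping used here.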

Experimental Results

Visualization: 3B to 72B Model Scale

Figure panels: grounding performance of Qwen2.5-VL-3B, Qwen2.5-VL-7B, Qwen2.5-VL-32B, and Qwen2.5-VL-72B.

Scaling Performance on NOVA (Rare Diseases)

Model Size	Method	mAP@25	mAP@50	mAP@75
3B	Vanilla	0.221	0.088	0.040
3B	DDL (Ours)	0.298 (+35%)	0.150 (+70%)	0.066 (+66%)
7B	Vanilla	0.286	0.135	0.036
7B	DDL (Ours)	0.369 (+29%)	0.206 (+52%)	0.075 (+105%)
32B	Vanilla	0.406	0.208	0.058
32B	DDL (Ours)	0.454 (+12%)	0.266 (+28%)	0.096 (+66%)
72B	Vanilla	0.411	0.245	0.065
72B	DDL (Ours)	0.500 (+22%)	0.301 (+23%)	0.107 (+65%)

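* Percentages are relative gains over the Vanilla (zero-shot) baseline at the same model size. DDL improves every model size, and the 32B model with DDL surpasses the 72B Vanilla baseline on all three metrics.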
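The relative gains can be rechecked from the table with a few lines of arithmetic. The sketch below copies the mAP values from the table and computes each gain as (DDL - Vanilla) / Vanilla; the function names are illustrative. Because the table shows scores rounded to three decimals, recomputed gains can differ from the reported ones by a few points (for instance, the 7B mAP@75 gain comes out near 108% from the rounded scores, against the reported 105%), presumably because the reported figures were computed from unrounded scores.

```python
# mAP values copied from the NOVA table: (mAP@25, mAP@50, mAP@75)
RESULTS = {
    "3B": {"vanilla": (0.221, 0.088, 0.040), "ddl": (0.298, 0.150, 0.066)},
    "7B": {"vanilla": (0.286, 0.135, 0.036), "ddl": (0.369, 0.206, 0.075)},
    "32B": {"vanilla": (0.406, 0.208, 0.058), "ddl": (0.454, 0.266, 0.096)},
    "72B": {"vanilla": (0.411, 0.245, 0.065), "ddl": (0.500, 0.301, 0.107)},
}


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return 100.0 * (improved - baseline) / baseline


def gains(size: str) -> list:
    """Per-metric relative gains of DDL over Vanilla for one model size."""
    row = RESULTS[size]
    return [relative_gain(v, d) for v, d in zip(row["vanilla"], row["ddl"])]


def ddl_beats_larger_vanilla(small: str, large: str) -> bool:
    """True if DDL on the smaller model beats Vanilla on the larger one on every metric."""
    return all(d > v for d, v in zip(RESULTS[small]["ddl"], RESULTS[large]["vanilla"]))


if __name__ == "__main__":
    for size in RESULTS:
        print(size, [f"{g:+.0f}%" for g in gains(size)])
    print("32B DDL > 72B Vanilla:", ddl_beats_larger_vanilla("32B", "72B"))
```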

Citation

@inproceedings{li2026dynamic,
  title={Dynamic Decision Learning: Test-Time Evolution for Abnormality Grounding in Rare Diseases}, 
  author={Jun Li and Mingxuan Liu and Jiazhen Pan and Che Liu and Wenjia Bai and Cosmin I. Bercea and Julia A. Schnabel},
  booktitle={arXiv},
  year={2026},
}