Top: Static inference with frozen LVLMs suffers from fragility and hallucinations on rare diseases, exhibiting high sensitivity to prompt and visual variations.
Bottom: DDL enables test-time dynamic evolution by synergizing the Language Space and Visual Space to navigate the decision landscape toward success, achieving up to a 105% relative improvement in localization reliability (mAP@75, 7B model).
| Model Size | Method | mAP@25 | mAP@50 | mAP@75 |
|---|---|---|---|---|
| 3B | Vanilla | 0.221 | 0.088 | 0.040 |
| | DDL (Ours) | 0.298 (+35%) | 0.150 (+70%) | 0.066 (+66%) |
| 7B | Vanilla | 0.286 | 0.135 | 0.036 |
| | DDL (Ours) | 0.369 (+29%) | 0.206 (+52%) | 0.075 (+105%) |
| 32B | Vanilla | 0.406 | 0.208 | 0.058 |
| | DDL (Ours) | 0.454 (+12%) | 0.266 (+28%) | 0.096 (+66%) |
| 72B | Vanilla | 0.411 | 0.245 | 0.065 |
| | DDL (Ours) | 0.500 (+22%) | 0.301 (+23%) | 0.107 (+65%) |
* DDL scales favorably with model size: the 32B model with DDL surpasses the 72B Vanilla (zero-shot) baseline on all three mAP thresholds.
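
The relative gains and the 32B-versus-72B comparison can be rechecked from the table with a few lines of Python. This is only a minimal sketch: the `RESULTS` dictionary and the helper names (`relative_gain`, `gains`, `ddl_beats_vanilla`) are illustrative and not part of any released code. The scores are copied from the rounded table entries, so recomputed percentages may differ by a few points from the reported ones (for example at 7B, mAP@75), which were presumably derived from unrounded scores.

```python
# mAP scores copied from the table above, ordered as (mAP@25, mAP@50, mAP@75).
# These are the rounded, published values, not raw evaluation outputs.
RESULTS = {
    "3B":  {"Vanilla": (0.221, 0.088, 0.040), "DDL": (0.298, 0.150, 0.066)},
    "7B":  {"Vanilla": (0.286, 0.135, 0.036), "DDL": (0.369, 0.206, 0.075)},
    "32B": {"Vanilla": (0.406, 0.208, 0.058), "DDL": (0.454, 0.266, 0.096)},
    "72B": {"Vanilla": (0.411, 0.245, 0.065), "DDL": (0.500, 0.301, 0.107)},
}


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


def gains(size):
    """Per-threshold relative gain of DDL over Vanilla for one model size."""
    row = RESULTS[size]
    return [relative_gain(b, d) for b, d in zip(row["Vanilla"], row["DDL"])]


def ddl_beats_vanilla(small, large):
    """True if DDL at size `small` exceeds Vanilla at size `large` on every threshold."""
    pairs = zip(RESULTS[small]["DDL"], RESULTS[large]["Vanilla"])
    return all(d > v for d, v in pairs)


if __name__ == "__main__":
    for size in RESULTS:
        print(size, [f"{g:+.0f}%" for g in gains(size)])
    print("32B DDL > 72B Vanilla:", ddl_beats_vanilla("32B", "72B"))
```

The last check mirrors the footnote: it compares the 32B DDL row against the 72B Vanilla row at each of the three thresholds.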
@inproceedings{li2026dynamic,
title={Dynamic Decision Learning: Test-Time Evolution for Abnormality Grounding in Rare Diseases},
author={Jun Li and Mingxuan Liu and Jiazhen Pan and Che Liu and Wenjia Bai and Cosmin I. Bercea and Julia A. Schnabel},
booktitle={arXiv},
year={2026},
}