Language Models Meet Anomaly Detection for Better Interpretability and Generalizability

Jun Li1,2, Su Hwan Kim4, Philip Müller1, Lina Felsner1,
Daniel Rueckert1,2,4,5, Benedikt Wiestler1,4, Julia A. Schnabel*1,2,3,6, Cosmin I. Bercea*1,3

*Indicates Equal Contribution
1 Technical University of Munich, Germany 2 Munich Center for Machine Learning, Germany 3 Helmholtz AI & Helmholtz Center Munich, Germany
4 Klinikum Rechts der Isar, Munich, Germany 5 Imperial College London, London, UK 6 King’s College London, London, UK

Our framework is designed to process questions in conjunction with the results of anomaly detection methods, aiming to provide clinicians with clear, interpretable responses that make anomaly map analyses more intuitive and clinically actionable.

Abstract

This research explores the integration of language models and unsupervised anomaly detection in medical imaging, addressing two key questions: (1) Can language models enhance the interpretability of anomaly detection maps? and (2) Can anomaly maps improve the generalizability of language models in open-set anomaly detection tasks? To investigate these questions, we introduce a new dataset for multi-image visual question-answering on brain magnetic resonance images encompassing multiple conditions. We propose KQ-Former (Knowledge Querying Transformer), which is designed to optimally align visual and textual information in limited-sample contexts. Our model achieves a 60.81% accuracy on closed questions, covering disease classification and severity across 15 different classes. For open questions, KQ-Former demonstrates a 70% improvement over the baseline with a BLEU-4 score of 0.41, and achieves the highest entailment ratios (up to 71.9%) and lowest contradiction ratios (down to 10.0%) among various natural language inference models. Furthermore, integrating anomaly maps results in an 18% accuracy increase in detecting open-set anomalies, thereby enhancing the language model's generalizability to previously unseen medical conditions.

Methods

In this paper, we developed, to the best of our knowledge, the first multi-image question-answering benchmark based on unsupervised anomaly detection. Our framework implements different feature fusion strategies for combining the anomaly map, the original image, and the pseudo-healthy (PH) reconstruction. In addition, inspired by the Querying Transformer (Q-Former), we propose a Knowledge Q-Former (KQ-Former) module that helps the framework extract visual features related to textual knowledge. Extensive experiments verify the effectiveness of the framework and the proposed KQ-Former module. We also explore the influence of anomaly maps on the framework when it faces unknown anomalies.
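To make the fusion step concrete, the PyTorch snippet below sketches one plausible strategy: encoding the original image, the anomaly map, and the PH reconstruction with a shared vision encoder and concatenating their patch tokens. The class name and the fusion-by-concatenation choice are illustrative assumptions; the paper compares several fusion variants.

import torch

class TokenConcatFusion(torch.nn.Module):
    # Hypothetical fusion module: concatenate the patch tokens of all inputs.
    def __init__(self, vision_encoder: torch.nn.Module):
        super().__init__()
        self.encoder = vision_encoder  # e.g. a ViT that returns (B, N, D) tokens

    def forward(self, image, anomaly_map, ph_recon):
        # Encode each input independently with the shared encoder.
        tokens = [self.encoder(x) for x in (image, anomaly_map, ph_recon)]
        # Fuse by concatenating the token sequences: (B, 3*N, D).
        return torch.cat(tokens, dim=1)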


An overview of our novel framework for VQA-UAD: (a) the multi-image VQA baseline; (b) multi-image feature fusion strategies; (c) the KQ-Former module.
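The simplified PyTorch sketch below illustrates the querying mechanism at the heart of such a module: a fixed set of learnable query tokens cross-attends to the fused visual tokens and returns a compact representation for the language model. The dimensions, the single layer, and the omission of the textual-knowledge conditioning that distinguishes the KQ-Former from a plain Q-Former are all simplifying assumptions here.

import torch
import torch.nn as nn

class QueryingBlock(nn.Module):
    # Minimal single-layer Q-Former-style block (illustrative, not the full KQ-Former).
    def __init__(self, dim: int = 768, num_queries: int = 32, num_heads: int = 12):
        super().__init__()
        self.queries = nn.Parameter(0.02 * torch.randn(1, num_queries, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, visual_tokens: torch.Tensor) -> torch.Tensor:
        # visual_tokens: (B, N, D) fused tokens from image, anomaly map, and PH reconstruction.
        q = self.queries.expand(visual_tokens.size(0), -1, -1)
        attended, _ = self.cross_attn(q, visual_tokens, visual_tokens)
        q = self.norm1(q + attended)        # residual + norm after cross-attention
        return self.norm2(q + self.ffn(q))  # (B, num_queries, D) query outputs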

Dataset

We collected 440 T1-weighted 2D mid-axial brain MRI images from the fastMRI dataset, comprising healthy (N=253) and unhealthy (N=187) cases. The dataset features 8 distinct types of anomalies for the primary experiment and an additional 17 cases across 6 different pathologies to assess the model's generalization to unseen pathological conditions.
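For orientation, a single sample in such a multi-image VQA dataset could look like the following Python record; the field names, file paths, and answer string are hypothetical and do not reflect the released schema.

sample = {
    "image": "fastmri/brain_0123_mid_axial.png",   # original T1-weighted slice
    "anomaly_map": "uad/brain_0123_anomaly.png",   # map produced by the UAD model
    "ph_reconstruction": "uad/brain_0123_ph.png",  # pseudo-healthy reconstruction
    "question": "What type of anomaly is visible in this image?",
    "answer": "enlarged ventricles",               # illustrative closed answer
    "question_type": "closed",                     # closed vs. open question
}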


Overview of the anomaly dataset. Left: category distribution of anomalies. Right: definition of closed and open questions. For closed questions, blue text shows the answer type; the number in parentheses denotes the count of answer types.

Category distribution of unseen anomalies. These unseen anomalies are dural thickening, white matter lesion, sinus opacification, encephalomalacia, intraventricular substance, and absent septum pellucidum.

Results

In the main paper, we evaluated our methods from two aspects: (1) Language Models Enhance the Explainability of Anomaly Maps; (2) Anomaly Maps Improve the Generalizability of Language Models.

We also utilize Natural Language Inference (NLI) models to evaluate the entailment between the ground-truth and predicted sentences.
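As a sketch of this evaluation step, an off-the-shelf NLI model from Hugging Face transformers can score a ground-truth/prediction pair. Here, roberta-large-mnli stands in for the various NLI models compared in the paper, and the example sentences are invented.

from transformers import pipeline

# Load a standard NLI model; it labels sentence pairs as
# ENTAILMENT, NEUTRAL, or CONTRADICTION.
nli = pipeline("text-classification", model="roberta-large-mnli")

ground_truth = "There is a lesion in the left frontal lobe."
prediction = "An abnormality is visible in the frontal region of the brain."

# The ground truth serves as the premise ("text") and the
# prediction as the hypothesis ("text_pair").
result = nli({"text": ground_truth, "text_pair": prediction})
print(result)  # e.g. label 'ENTAILMENT' with a confidence score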


GUI Interface

BibTeX

@misc{li2024multiimage,
  title={Multi-Image Visual Question Answering for Unsupervised Anomaly Detection},
  author={Jun Li and Cosmin I. Bercea and Philip Müller and Lina Felsner and Su Hwan Kim and Daniel Rueckert and Benedikt Wiestler and Julia A. Schnabel},
  year={2024},
  eprint={2404.07622},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}