Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance

Jun Li*1,2, Tongkun Su*2, Baoliang Zhao2, Faqin Lv3, Qiong Wang2,
Nassir Navab1, Ying Hu2, Zhongliang Jiang1
*Indicates Equal Contribution
1 Technical University of Munich 2 Shenzhen Institute of Advanced Technology, the Chinese Academy of Sciences 3 The Third Medical Centre of Chinese PLA General Hospital

Our framework automatically generates ultrasound reports from images provided by doctors, supporting accurate diagnosis and treatment.

Abstract

Automatic report generation has emerged as a significant research area in computer-aided diagnosis, aiming to alleviate the burden on clinicians by generating reports automatically from medical images. In this work, we propose a novel framework for automatic ultrasound report generation that combines unsupervised and supervised learning methods. The unsupervised component extracts potential knowledge from ultrasound text reports; this knowledge serves as prior information that guides the model in aligning visual and textual features, thereby addressing the challenge of feature discrepancy between the two modalities. Additionally, we design a global semantic comparison mechanism that helps the model generate more comprehensive and accurate medical reports. To enable ultrasound report generation, we constructed three large-scale ultrasound image-text datasets from different organs for training and validation. Extensive comparisons with state-of-the-art approaches show that our framework achieves superior performance across all three datasets.

Chinese Ultrasound Report Dataset

We have collected three large, separate ultrasound image-text datasets, covering the breast, thyroid, and liver. Specifically, the breast dataset includes 3521 patients, the thyroid dataset includes 2474 patients, and the liver dataset includes 1395 patients.


Overview of our datasets.


Word cloud maps on different ultrasound datasets.

Age and gender distribution of our collected ultrasound datasets from three organs.

Methods

This work is a significant extension of our previous conference paper and offers several key contributions.

  • First, we optimized each step in the Knowledge Distiller within our framework to better suit the task of ultrasound report generation, yielding highly competitive results.
  • Second, we validated our method on three large-scale ultrasound report datasets of different organs, showcasing its generalizability.
  • Third, we conducted a comprehensive comparison with current state-of-the-art methods on each dataset, showing the superior performance of our framework.
  • Finally, we thoroughly discuss our experimental results, highlighting the strengths and limitations of our proposed method.
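
The idea of extracting prior knowledge from report text without labels can be illustrated with a small clustering example. The sketch below is only an illustration, not the Knowledge Distiller described in the paper: it assumes reports are already tokenised into whitespace-separated words (Chinese reports would first need a word segmenter), uses plain bag-of-words counts as features, and groups them with a basic k-means. The function names `bag_of_words` and `kmeans` and the toy reports are hypothetical.

```python
from collections import Counter


def bag_of_words(reports):
    """Map whitespace-tokenised reports to count vectors over a shared vocabulary."""
    vocab = sorted({tok for report in reports for tok in report.split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for report in reports:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(report.split()).items():
            vec[index[tok]] = float(count)
        vectors.append(vec)
    return vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Basic k-means with deterministic farthest-point initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the point farthest from its nearest existing centroid.
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(farthest)

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy reports (hypothetical, not taken from the datasets).
reports = [
    "thyroid nodule hypoechoic",
    "thyroid nodule hypoechoic margin",
    "liver echo uniform",
    "liver echo uniform normal",
]
labels = kmeans(bag_of_words(reports), k=2)
```

In this toy setting the cluster index of each report acts as a pseudo-label. One possible use, stated here as an assumption, is to give such pseudo-labels to the image-to-text model as a coarse topic prior during feature alignment.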

Results

(1) Heatmap of clustering results with different dimensionality reduction and cluster-number settings. (2) Performance comparison on the three ultrasound datasets. (3) Ablation studies on the three ultrasound datasets.