Anatomical Structure-Guided Medical Vision-Language Pre-training

Qingqiu Li¹, Xiaohan Yan², Jilan Xu¹, Runtian Yuan¹, Yuejie Zhang¹,
Rui Feng¹, Quanli Shen³, Xiaobo Zhang³, and Shujun Wang⁴,⁵

¹Fudan University ²Tongji University ³Children's Hospital of Fudan University
⁴The Hong Kong Polytechnic University ⁵Research Institute for Smart Ageing

Two limitations of existing methods, (a) lack of interpretability and clinical relevance and (b) insufficient representation learning of image-report pairs, and our corresponding improvements.

Demo for Anatomical Structure-Sentence Alignment

Three scenarios of anatomical region-sentence alignment.

We provide an example showing the alignment results under two different methods: merge bbox and split sentence.

Raw report: mild left basal atelectasis. otherwise unremarkable. ap upright and lateral views the chest were provided. mild left basal atelectasis. lungs are otherwise clear. no signs of pneumonia or edema. no large effusion or pneumothorax. cardiomediastinal silhouette is normal. bony structures are intact. no free air below the right hemidiaphragm.
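The sentence-splitting step named above can be sketched on this raw report. This is a minimal illustration, not the authors' pipeline: the function name `split_report`, the period-based splitting rule, and the removal of exact repeated sentences are all assumptions made for the example.

```python
import re
from typing import List

# The raw report from the example, copied verbatim (lower-cased, period-terminated).
RAW_REPORT = (
    "mild left basal atelectasis. otherwise unremarkable. "
    "ap upright and lateral views the chest were provided. "
    "mild left basal atelectasis. lungs are otherwise clear. "
    "no signs of pneumonia or edema. no large effusion or pneumothorax. "
    "cardiomediastinal silhouette is normal. bony structures are intact. "
    "no free air below the right hemidiaphragm."
)


def split_report(report: str) -> List[str]:
    """Split a report into sentences and drop exact repeats, keeping order.

    Hypothetical helper: splitting on whitespace that follows a period is an
    assumption that suits this report, not a general sentence tokenizer.
    """
    parts = re.split(r"(?<=\.)\s+", report.strip())
    seen = set()
    sentences = []
    for part in parts:
        sentence = part.strip()
        if sentence and sentence not in seen:
            seen.add(sentence)
            sentences.append(sentence)
    return sentences


sentences = split_report(RAW_REPORT)
for index, sentence in enumerate(sentences):
    print(index, sentence)
```

Each resulting sentence could then be matched to an anatomical region. Dropping the repeated atelectasis sentence is a design choice of this sketch, since the report states that finding twice.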

The framework of Anatomical Structure-Guided Medical Vision-Language Pre-training.

Our anatomical region-sentence alignment pipeline.

An example case from our re-labeled dataset.

The differences between our re-labeled dataset and Chest ImaGenome, using the left lung as an example.

Alignment results for the example above, shown in two panels: Merge Bbox and Split Sent.
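One plausible reading of the merge bbox option is that, when a sentence refers to several anatomical regions, their bounding boxes are combined into a single enclosing box. That reading is an assumption and is not confirmed by this page. The sketch below shows only that enclosing-box operation; the helper `merge_boxes`, the `(x_min, y_min, x_max, y_max)` box format, and the coordinates are hypothetical.

```python
from typing import Iterable, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def merge_boxes(boxes: Iterable[Box]) -> Box:
    """Return the smallest axis-aligned box that encloses every input box."""
    box_list = list(boxes)
    if not box_list:
        raise ValueError("merge_boxes needs at least one box")
    x_min = min(box[0] for box in box_list)
    y_min = min(box[1] for box in box_list)
    x_max = max(box[2] for box in box_list)
    y_max = max(box[3] for box in box_list)
    return (x_min, y_min, x_max, y_max)


# Hypothetical coordinates for two regions mentioned by the same sentence.
region_a = (120.0, 150.0, 200.0, 260.0)
region_b = (170.0, 240.0, 215.0, 290.0)

merged = merge_boxes([region_a, region_b])
print(merged)
```

Under this reading, the alternative split sentence option would keep the region boxes separate and divide the sentence instead.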