Anatomical Structure-Guided Medical Vision-Language Pre-training

Qingqiu Li1, Xiaohan Yan2, Jilan Xu1, Runtian Yuan1, Yuejie Zhang1,
Rui Feng1, Quanli Shen3, Xiaobo Zhang3, and Shujun Wang4,5

1Fudan University    2Tongji University    3Children's Hospital of Fudan University   
4The Hong Kong Polytechnic University    5Research Institute for Smart Ageing   

Two limitations of existing methods, (a) lack of interpretability and clinical relevance and (b) insufficient representation learning of image-report pairs, and our corresponding improvements.


The framework of Anatomical Structure-Guided Medical Vision-Language Pre-training.


Our anatomical region-sentence alignment pipeline.


An example case from our re-labeled dataset.


Differences between our re-labeled dataset and Chest ImaGenome, using the left lung as an example.

Demo for Anatomical Structure-Sentence Alignment


Three scenarios of anatomical region-sentence alignment.


We provide an example showing the alignment results under two different methods: merging bounding boxes (Merge Bbox) and splitting sentences (Split Sent).
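The two strategies can be illustrated with a minimal Python sketch for a sentence that has already been matched to several anatomical regions. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)` pixel corners; `Box`, `merge_bbox`, and `split_sentence` are hypothetical names for illustration, not the authors' code. Merge Bbox is modeled here as the smallest box enclosing all matched regions, and Split Sent is approximated by emitting one (sentence, region, box) pair per region. The actual method may instead rewrite the sentence text for each region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    # Hypothetical axis-aligned box: (x1, y1) top-left, (x2, y2) bottom-right.
    x1: float
    y1: float
    x2: float
    y2: float


def merge_bbox(boxes):
    """Merge Bbox (assumed form): one sentence -> one box enclosing every matched region."""
    return Box(
        min(b.x1 for b in boxes),
        min(b.y1 for b in boxes),
        max(b.x2 for b in boxes),
        max(b.y2 for b in boxes),
    )


def split_sentence(sentence, region_boxes):
    """Split Sent (approximation): one (sentence, region, box) pair per matched region.

    The sentence text is reused unchanged; a real pipeline may rewrite it per region.
    """
    return [(sentence, region, box) for region, box in region_boxes.items()]


if __name__ == "__main__":
    left = Box(10, 20, 50, 90)    # made-up coordinates for the left lung
    right = Box(60, 20, 100, 90)  # made-up coordinates for the right lung
    sentence = "lungs are otherwise clear"
    print(merge_bbox([left, right]))
    print(split_sentence(sentence, {"left lung": left, "right lung": right}))
```

The trade-off this makes visible: Merge Bbox keeps one sentence-box pair but the enclosing box can cover unrelated anatomy between the regions, while Split Sent keeps each box tight at the cost of duplicating (or rewriting) the sentence.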



Raw report: mild left basal atelectasis. otherwise unremarkable. ap upright and lateral views the chest were provided. mild left basal atelectasis. lungs are otherwise clear. no signs of pneumonia or edema. no large effusion or pneumothorax. cardiomediastinal silhouette is normal. bony structures are intact. no free air below the right hemidiaphragm.
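As a rough illustration of the first step of region-sentence alignment on this raw report, the Python sketch below splits the report into sentences, drops the verbatim repeat of "mild left basal atelectasis", and tags each sentence with anatomical regions by simple keyword matching. The keyword table `REGION_KEYWORDS` and the functions `split_report`, `tag_regions`, and `group_by_region_count` are hypothetical assumptions made for this example; they are not the vocabulary or matching procedure used in the paper.

```python
import re

RAW_REPORT = (
    "mild left basal atelectasis. otherwise unremarkable. ap upright and lateral "
    "views the chest were provided. mild left basal atelectasis. lungs are otherwise "
    "clear. no signs of pneumonia or edema. no large effusion or pneumothorax. "
    "cardiomediastinal silhouette is normal. bony structures are intact. no free air "
    "below the right hemidiaphragm."
)

# Hypothetical keyword -> region table (assumption, not the authors' vocabulary).
REGION_KEYWORDS = {
    "left lung": ("left basal", "left lung", "lungs"),
    "right lung": ("right basal", "right lung", "lungs"),
    "cardiac silhouette": ("cardiomediastinal", "cardiac"),
    "mediastinum": ("mediastinal", "mediastinum"),
    "right hemidiaphragm": ("right hemidiaphragm",),
}


def split_report(report):
    """Split on periods and drop verbatim repeated sentences, keeping first-seen order."""
    sentences = [s.strip() for s in re.split(r"\.\s*", report) if s.strip()]
    return list(dict.fromkeys(sentences))


def tag_regions(sentence):
    """Return the sorted list of regions whose keywords occur in the sentence."""
    return sorted(
        region
        for region, keywords in REGION_KEYWORDS.items()
        if any(k in sentence for k in keywords)
    )


def group_by_region_count(sentences):
    """Bucket sentences by how many regions they were matched to."""
    groups = {"none": [], "single": [], "multiple": []}
    for s in sentences:
        regions = tag_regions(s)
        if not regions:
            key = "none"
        elif len(regions) == 1:
            key = "single"
        else:
            key = "multiple"
        groups[key].append((s, regions))
    return groups


if __name__ == "__main__":
    for key, items in group_by_region_count(split_report(RAW_REPORT)).items():
        print(key, items)
```

Sentences in the `multiple` group (here "lungs are otherwise clear" and "cardiomediastinal silhouette is normal") are the cases where a strategy such as Merge Bbox or Split Sent has to decide how one sentence maps to several boxes.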

Merge Bbox

Split Sent
