
MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning


Suhao Yu*, Haojin Wang*, Juncheng Wu*, Cihang Xie, Yuyin Zhou


📢 Breaking News

  • [📄💥 May 22, 2025] Our arXiv paper is released.
  • [💾 May 22, 2025] Full dataset released.

Star 🌟 us if you find it helpful!


⚡Introduction


MedFrameQA introduces multi-image, clinically grounded questions that require comprehensive reasoning across all images. Unlike prior benchmarks such as SLAKE and MedXpertQA, it emphasizes diagnostic complexity, expert-level knowledge, and explicit reasoning chains.

  • We develop a scalable pipeline that automatically constructs multi-image, clinically grounded VQA questions from medical education videos.
  • We benchmark ten state-of-the-art MLLMs on MedFrameQA and find that their accuracies mostly fall below 50%, with substantial performance variation across different body systems, organs, and modalities.

We open-source our data and code in this repository.

🚀 Dataset construction pipeline


The MedFrameQA generation pipeline consists of four stages:

  1. Medical Video Collection: Collecting 3,420 medical videos via clinical search queries;
  2. Frame-Caption Pairing: Extracting keyframes and aligning with transcribed captions;
  3. Multi-Frame Merging: Merging clinically related frame-caption pairs into multi-frame clips;
  4. Question-Answer Generation: Generating multi-image VQA from the multi-frame clips.
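The frame-caption pairing in stage 2 can be sketched as a timestamp alignment: each keyframe is matched to the transcribed caption whose time span contains it. The function name, data shapes, and containment rule below are illustrative assumptions, not the repository's actual implementation.

```python
# Illustrative sketch of stage 2: align extracted keyframes with
# transcript captions by timestamp. Data shapes and the containment
# rule are assumptions, not the pipeline's actual logic.

def pair_frames_with_captions(frames, captions):
    """Pair each keyframe with the caption whose time span contains it.

    frames:   list of (frame_id, timestamp_seconds)
    captions: list of (start_seconds, end_seconds, text)
    """
    pairs = []
    for frame_id, t in frames:
        for start, end, text in captions:
            if start <= t < end:
                pairs.append((frame_id, text))
                break
    return pairs

frames = [("f1", 3.0), ("f2", 12.5)]
captions = [(0.0, 10.0, "The axial CT shows..."),
            (10.0, 20.0, "Note the lesion...")]
print(pair_frames_with_captions(frames, captions))
```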

📚 Statistical overview of MedFrameQA


In figure (a), we show the distribution across body systems; (b) presents the distribution across organs; (c) shows the distribution across imaging modalities; (d) provides a word cloud of keywords in MedFrameQA; and (e) reports the distribution of frame counts per question.

🤗 Dataset Download

| Dataset    | 🤗 Huggingface Hub      |
| ---------- | ----------------------- |
| MedFrameQA | SuhaoYu1020/MedFrameQA  |

🏆 Results

Accuracy by Human Body System on MedFrameQA


Accuracy by Modality and Frame Count on MedFrameQA



💬 Quick Start

⏬ Install

On a Linux system:

1. Clone this repository and navigate to the folder

   ```shell
   git clone https://github.com/haojinw0027/MedFrameQA.git
   cd MedFrameQA
   ```

2. Install packages

   ```shell
   conda create -n medframeqa python=3.10 -y
   conda activate medframeqa
   pip install -r requirements.txt
   cd src
   ```

🎬 Generate VQA pairs from Video

Download video and audio

```shell
python process.py --process_stage download_process --csv_file ../data/30_disease_video_id.csv

# Specify the number of videos to download (-1 for all)
python process.py --process_stage download_process --csv_file ../data/30_disease_video_id.csv --num_ids <number>
```

Extract frames from video and generate transcripts from audio

```shell
python process.py --process_stage video_process --csv_file ../data/30_disease_video_id.csv
```

Frame-caption pairing

```shell
python process.py --process_stage pair_process --csv_file ../data/30_disease_video_id.csv

# Specify the time interval for selecting video frames
python process.py --process_stage pair_process --csv_file ../data/30_disease_video_id.csv --bias_time 20
```
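As we read it, `--bias_time` widens the time window around a caption when selecting candidate frames. The sketch below is a hypothetical illustration of such a windowed selection; the function name and window rule are our assumptions, not the script's actual logic.

```python
# Hypothetical illustration of widening a caption's time window by a
# bias margin when selecting frames; not the script's actual logic.

def frames_in_window(frame_times, start, end, bias_time=0.0):
    """Return frame timestamps inside [start - bias_time, end + bias_time]."""
    return [t for t in frame_times if start - bias_time <= t <= end + bias_time]

# With a 20 s margin, frames just outside the caption span still qualify.
print(frames_in_window([5.0, 25.0, 45.0], start=10.0, end=30.0, bias_time=20))
# -> [5.0, 25.0, 45.0]
```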

Multi-frame merging and question-answer generation

```shell
python process.py --process_stage vqa_process --csv_file ../data/30_disease_video_id.csv

# Specify the maximum number of frames per question
python process.py --process_stage vqa_process --csv_file ../data/30_disease_video_id.csv --max_frame_num 5
```
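`--max_frame_num` caps how many frames one question may draw on. A minimal sketch of such a cap, with a hypothetical function name and data shapes (the real merging step also uses clinical relatedness, which is omitted here):

```python
# Illustrative sketch of capping a merged clip at --max_frame_num frames
# by splitting it into fixed-size chunks. Names and shapes are hypothetical.

def cap_clip(frame_caption_pairs, max_frame_num=5):
    """Split a merged clip into chunks of at most max_frame_num pairs."""
    return [
        frame_caption_pairs[i : i + max_frame_num]
        for i in range(0, len(frame_caption_pairs), max_frame_num)
    ]

clip = [(f"frame{i}", f"caption {i}") for i in range(7)]
chunks = cap_clip(clip, max_frame_num=5)
print([len(c) for c in chunks])  # -> [5, 2]
```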

🧐 Evaluate on MLLMs

```shell
python eval_process.py --input_file "your vqa pairs file path" --output_dir ../eval --model_name "your model"

# Specify the number of questions to evaluate (-1 for all)
python eval_process.py --input_file "your vqa pairs file path" --output_dir ../eval --model_name "your model" --num_q <number>
```
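The accuracies reported above are standard multiple-choice accuracy: the fraction of questions where the model's chosen option matches the gold answer. A minimal sketch of that scoring (the record field names are hypothetical, not the actual schema written by `eval_process.py`):

```python
# Illustrative sketch of scoring model predictions against gold answers
# for multiple-choice VQA. Field names are hypothetical.

def accuracy(records):
    """records: list of dicts with 'prediction' and 'answer' option letters."""
    if not records:
        return 0.0
    correct = sum(r["prediction"] == r["answer"] for r in records)
    return correct / len(records)

records = [
    {"prediction": "A", "answer": "A"},
    {"prediction": "C", "answer": "B"},
]
print(accuracy(records))  # -> 0.5
```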

You can download our dataset for evaluation from SuhaoYu1020/MedFrameQA.


📜 Citation

If you find MedFrameQA useful for your research and applications, please cite using this BibTeX:

@misc{yu2025medframeqamultiimagemedicalvqa,
      title={MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning}, 
      author={Suhao Yu and Haojin Wang and Juncheng Wu and Cihang Xie and Yuyin Zhou},
      year={2025},
      eprint={2505.16964},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.16964}, 
}

🙏 Acknowledgement

  • We thank the Microsoft Accelerate Foundation Models Research Program for supporting our computing needs.
