| Name | About | Repo | Cite |
|---|---|---|---|
| 🐇 RABBITS | Evaluates performance differences in medical benchmarks after swapping brand and generic drug names. | RABBITS | Citation |
| 🔀 Cross-Care | Assesses biases and real-world knowledge in LLMs, focusing on disease prevalence across demographics. | Cross-Care | Citation |
| 🌐 SDOH | Uses LLMs to classify Social Determinants of Health (SDOH) in electronic health records. | SDOH | Citation |
| 🏥 OncQA | Evaluates the use of LLMs in responding to patient messages to reduce documentation burden. | OncQA | Citation |
| 💻 MedBrowseComp | Evaluates deep research and computer-use tasks centered on medical information seeking. | MedBrowseComp | Citation |

| Paper | Code | Journal/Conference |
|---|---|---|
| Evaluating the Robustness of Medical LLMs with Brand-Generic Swaps | Code | EMNLP 2024 |
| Reliability of Large Language Model Knowledge Across Brand and Generic Cancer Drug Names | Code | JCO Clinical Cancer Informatics 2025 |

| Paper | Code | Journal/Conference |
|---|---|---|
| Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias | Code | NeurIPS 2024 |

| Paper | Code | Journal/Conference |
|---|---|---|
| The effect of using a large language model to respond to patient messages | Code | Lancet Digital Health 2024 |
| Large language models to identify social determinants of health in electronic health records | Code | npj Digital Medicine 2024 |
| The TRIPOD-LLM reporting guideline for studies using large language models | App | Nature Medicine 2025 |
| The use of large language models to enhance cancer clinical trial educational materials | Code | JNCI Cancer Spectrum 2025 |