Accurate diagnosis of ocular surface diseases is critical in optometry and ophthalmology,
and it hinges on integrating clinical data sources (e.g., meibography imaging and clinical metadata).
Traditional human assessments lack precision in quantifying clinical observations, while current
machine-based methods often treat diagnoses as multi-class classification problems,
limiting the diagnoses to a predefined closed set of curated answers without reasoning
about the clinical relevance of each variable to the diagnosis. To tackle these challenges,
we introduce an innovative multi-modal diagnostic pipeline (MDPipe) that employs large
language models (LLMs) for ocular surface disease diagnosis.
We first employ a visual translator to interpret meibography images by converting them
into quantifiable morphology data, facilitating their integration with clinical metadata
and enabling the communication of nuanced medical insight to LLMs. To further advance this
communication, we introduce an LLM-based summarizer to contextualize the insight from the
combined morphology and clinical metadata, and generate clinical report summaries.
Finally, we refine the LLMs' reasoning ability with domain-specific insight
from real-life clinician diagnoses. Our evaluation across diverse ocular surface disease
diagnosis benchmarks demonstrates that MDPipe outperforms existing baselines, including GPT-4,
and provides clinically sound rationales for diagnoses.
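
As a rough illustration of the first two stages described above, the following minimal sketch shows how quantified morphology values and clinical metadata might be merged into a single text prompt for an LLM-based summarizer. It is not the MDPipe implementation: the `Morphology` fields, the `build_report_prompt` helper, the prompt wording, and all numeric values are hypothetical placeholders chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Morphology:
    # Hypothetical quantities a visual translator might extract
    # from a meibography image (assumed fields, not from the paper).
    gland_count: int
    mean_gland_length_mm: float
    atrophy_fraction: float  # fraction of gland area lost, in [0, 1]


def build_report_prompt(morph: Morphology, metadata: dict) -> str:
    """Merge morphology values and clinical metadata into one text prompt."""
    lines = [
        "Meibography morphology:",
        f"- gland count: {morph.gland_count}",
        f"- mean gland length: {morph.mean_gland_length_mm:.1f} mm",
        f"- atrophy: {morph.atrophy_fraction:.0%}",
        "Clinical metadata:",
    ]
    # Sort metadata keys so the prompt is deterministic.
    lines += [f"- {key}: {value}" for key, value in sorted(metadata.items())]
    lines.append("Summarize these findings as a short clinical report.")
    return "\n".join(lines)


# Illustrative placeholder values, not real patient data.
prompt = build_report_prompt(
    Morphology(gland_count=18, mean_gland_length_mm=4.2, atrophy_fraction=0.25),
    {"age": 54, "tear_breakup_time_s": 6},
)
print(prompt)
```

The resulting string would then be passed to whichever LLM serves as the summarizer. Expressing image findings as named, unit-bearing values is what lets them sit alongside the clinical metadata in one textual context.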