About
Pengyuan Li is a Research Staff Member at the MIT-IBM Watson AI Lab, where he leads the development of the Granite Vision model, a vision-language model designed specifically for visual document understanding. Previously, he served as the Data Acquisition Lead and collected more than 10 PB of data for building large language, code, and multimodal models at IBM.
Pengyuan’s research focuses on machine learning, multimodal data mining, document analysis, and biomedical informatics. He has served as a visiting scholar at Caltech, UCLA, UBC, JHU, and Tongji University, collaborating with researchers around the world to explore innovative, cross-disciplinary ideas. He is also an Adjunct Faculty member at the Data Science Institute, University of Delaware.
News
- Feb 2026: Glad to present CurateBench: Evaluating LLMs on Evidence-Grounded, Ontology-Aligned Biocuration Tasks at the ISB 2026 Virtual Biocuration Conference
- Jan 2026: Published the Chart2CSV model and integrated it with Docling to enable document intelligence
- Dec 2025: We will organize the DataMFM: Emerging Directions in Data for Multimodal Foundation Models workshop at CVPR 2026
- Aug 2025: Pengyuan gave a talk on the Granite Vision models on the OpenCV Live channel
- Jul 2025: Our Granite Vision model surpassed 100K downloads on Hugging Face
- Apr 2025: Pengyuan co-organized the ‘AI and Biodata’ workshop at the 18th Annual International Biocuration Conference
- Apr 2025: Pengyuan presented ‘GeneScribe: Leveraging large language models for gene summary generation’ at the 18th Annual International Biocuration Conference
- Mar 2025: Published large vision models on behalf of IBM Research on Hugging Face
- Feb 2023: Pengyuan joined UD’s Data Science Institute as an adjunct faculty member
- Jul 2021: Pengyuan joined IBM Research as a Research Staff Member