Hi! 👋
I'm Leon, a Computer Science Ph.D. student at Stanford University. I'm grateful to be advised by Serena Yeung and Ludwig Schmidt at Stanford AI Lab. I currently work as a research scientist intern at Meta Superintelligence Labs.
Before my Ph.D., I did my undergrad and worked at Nanyang Technological University, advised by Ziwei Liu. I've also worked with Alan Yuille and Zongwei Zhou at Johns Hopkins University.
My research focuses on developing scalable frameworks for training and evaluating AI agents. I work on multimodal reasoning agents, data-efficient training methods, data collection pipelines, and practical architectures that enable agents to perform complex reasoning tasks, spanning foundational model development to robust pipelines for agent training and evaluation.
I'm passionate about bridging the gap between research and real-world applications of AI systems. Whether you're working on synthetic data, exploring collaborative research opportunities, or developing practical reasoning applications, I'd love to connect and discuss potential synergies.
Summary: Train and test models on native-resolution images. Use mixed-resolution datasets to balance performance with computational resources, and maintain consistent resolution between training and inference to avoid misalignment issues.
Summary: The first open-source model trained on public reasoning data to match DeepSeek-R1-Distill's performance through 1000+ systematic data curation/synthesis experiments.
Summary: The first benchmark evaluating AI agents on evolving real-world websites, with 1,050 human-written multi-hop tasks requiring long-range reasoning.
Summary: An extensive benchmark systematically analyzing generative data across visual recognition tasks with a novel CLER Score metric.
Otter: A Multi-Modal Model with In-Context Instruction Tuning
Bo Li*, Yuanhan Zhang*, Liangyu Chen*, Jinghao Wang*, Fanyi Pu*, Joshua Adrian Cahyono, Jingkang Yang, Ziwei Liu
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Summary: The first large-scale multi-modal instruction tuning dataset, with 2.8 million instruction-response pairs derived from images and videos. The Otter model trained on this dataset achieves state-of-the-art performance on vision-language tasks.
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen*, Bo Li*, Sheng Shen, Jingkang Yang, Chunyuan Li, Kurt Keutzer, Trevor Darrell, Ziwei Liu
Neural Information Processing Systems (NeurIPS), 2023
ICLR Workshop on Mathematical and Empirical Understanding of Foundation Models, 2023
Summary: The first multimodal agent study to use natural language as the communication medium for a large language model to coordinate multiple vision-language models in complex reasoning.
Panoptic Video Scene Graph Generation
Jingkang Yang, Wenxuan Peng, Xiangtai Li, Zujin Guo, Liangyu Chen, Bo Li, Zheng Ma, Wayne Zhang, Kaiyang Zhou, Chen Change Loy, Ziwei Liu
Computer Vision and Pattern Recognition (CVPR), 2023
Summary: Extended scene graph generation from static images to dynamic videos with unified panoptic understanding of objects and stuff.
Making Your First Choice: To Address Cold Start Problem in Vision Active Learning
Liangyu Chen, Yutong Bai, Siyu Huang, Yongyi Lu, Bihan Wen, Alan Yuille, Zongwei Zhou
Medical Imaging with Deep Learning (MIDL), 2023
Radiological Society of North America (RSNA), Abstracts, 2024
NeurIPS Workshop on Human in the Loop Learning, 2022
Summary: Identifies the cold start problem in vision active learning and shows how to select the first batch of samples to annotate before any labels are available, validated on medical imaging tasks.
Service
I love teaching AI and was honored to coach Singapore's teams for the first-ever International Olympiad in Artificial Intelligence in 2024. Our two teams excelled on the global stage, securing two of the four gold medals awarded in the Scientific Round.