About Me

Hi there! I am a research scientist at Salesforce AI Research, led by Silvio Savarese and Caiming Xiong.

Previously, I completed my PhD at UC San Diego, where I worked with Julian McAuley. My research interests are in vision and language, with a current focus on building and understanding scalable vision-language models such as multimodal LLMs and video diffusion models.

Selected Research

Visual Generation, Diffusion

BLIP3o-NEXT: Next Frontier of Native Image Generation
• Jiuhai Chen et al, a unified model for image generation and editing.

An Empirical Study of Attention Mechanisms in Video Diffusion Models
• An Yan et al, ICCV 2025 long-video workshop

Multimodal LLMs

BLIP-3: A Family of Open Large Multimodal Models
• Le Xue et al, an open recipe of data and training for multimodal LLMs.

Trust but Verify: Programmatic VLM Evaluation in the Wild
• Viraj Prabhu et al, ICCV 2025

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
• An Yan et al, COLM 2024

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
• An Yan et al, an early attempt at multimodal phone agents

Vision and Language Models

Learning Concise and Descriptive Attributes for Visual Recognition
• An Yan et al, ICCV 2023

Visualize Before You Write: Imagination-Guided Open-Ended Text Generation
• Wanrong Zhu et al, EACL 2023

RadBERT: Adapting Language Models to Radiology
• An Yan et al, Radiology: Artificial Intelligence 2022

Weakly Supervised Contrastive Learning for Chest X-Ray Report Generation
• An Yan et al, EMNLP 2021

Personalization & Recommendation

Bridging Language and Items for Retrieval and Recommendation
• Yupeng Hou et al, Amazon dataset

Personalized Showcases: Generating Multi-Modal Explanations for Recommendations
• An Yan et al, Google review dataset

Personalized Complementary Product Recommendation
• An Yan et al, WWW 2022

Work Experience

Research Intern at Microsoft, Redmond, WA.
Hosts: Zhengyuan Yang, Jianwei Yang, Jianfeng Wang, Linjie Li, Kevin Lin, Zicheng Liu, Lijuan Wang.
GPT-4V as agents; data recipes and training for multimodal LLMs.
Sep 2023 - Mar 2024.

Research Intern at Adobe, San Jose, CA.
Hosts: Raghav Addanki, David Arbour, Zhao Song, Tong Yu.
Gradient-based constrained sampling from LMs.
Jun 2023 - Sep 2023.

Research Intern at Meta, Menlo Park, CA.
Hosts: Cem Akkaya, Licheng Yu, Charlie Zhu, Yang Bai.
Multimodal pre-training for ad understanding and generation.
Jun 2022 - Sep 2022.

Applied Scientist Intern at Amazon, Seattle, WA.
Hosts: Chaosheng Dong, Yan Gao, Jinmiao Fu, Tong Zhao.
Personalized complementary recommendation; one of the top 10 most-viewed publications of 2022 on Amazon Science.
Jun 2021 - Sep 2021.

Applied Scientist Intern at Amazon, Santa Barbara, CA.
Hosts: Craig Bennett, Nic Jedema.
QA quality evaluation with BERT.
Jun 2020 - Sep 2020.

Education

University of California San Diego
Ph.D. & M.S. in Computer Science
Sep 2018 - Mar 2024.

University of Science and Technology of China
B.E. in Electronic Engineering & Information Science
Sep 2014 - Jun 2018.