I am a third-year Ph.D. student in the Multi-Media Lab at The Chinese University of Hong Kong, supervised by Prof. Dahua Lin. My current research interests span vision-language models and object-centric video understanding.
I was a Research Scientist Intern at Meta Superintelligence Labs (Segment Anything Team) under the mentorship of Nicolas Carion.
I expect to graduate in Summer 2027. I am actively seeking internship and full-time Research Scientist opportunities and would love to connect regarding potential openings.
[Jan. 2026] Three papers have been accepted at ICLR 2026.
[Nov. 2025] We released SAM 3, a unified model for detection, segmentation, and tracking of objects in images and video using text, exemplar, and visual prompts.
[Aug. 2025] Invited talk at LSVOS workshop at ICCV 2025.
Outperforms SAM 2 by a large margin using a training-free memory tree.
SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation
Shuangrui Ding*, Zihan Liu*, Xiaoyi Dong, Pan Zhang, Rui Qian, Junhao Huang, Conghui He, Dahua Lin, Jiaqi Wang
ACL main, 2025
arXiv / code / invited talk / demo page
A large language model that understands and generates melodies and lyrics in symbolic song representations.
Keynote Speaker, LSVOS Workshop at ICCV 2025, Honolulu, Hawaii.
Misc
1. My favorite sport is soccer. I was the captain of the UM-SJTU JI soccer team during the 2018 season. I am also a huge fan of Manchester City in the Premier League.
2. I am proud to have graduated from the competition class at Hangzhou No. 2 High School, where I made friends with many talented students and distinguished teachers.
3. Rui is my best friend and has been my role model, motivating me for over ten years. Best wishes and good luck!