Robert Luo
I am a third-year master's student at Tsinghua University, majoring in Artificial Intelligence and supervised by Prof. Yang.
My current research interests include Mathematical Expression Recognition and Multi-Modal Understanding & Generation.
Previously, I obtained my bachelor's degree from Southeast University, where I was supervised by Prof. Xue.
Email / CV / GitHub
News
[2026.04] HappyHorse 1.0 is released, including Text/Image/Video/Reference to Video Generation
[2025.10] We are very excited to release Emu3.5!
[2025.09] The paper SOC++ is accepted by TPAMI
[2025.06] The papers CoHD and IBQ are accepted by ICCV 2025
[2024.12] The repository SEED-Voken: A Series of Powerful Visual Tokenizers is released
[2024.09] The technical report Open-MAGVIT2 is released
[2024.05] The Bronze Prize of the XIB OCR competition
[2024.02] The paper UVCOM is accepted by CVPR 2024
[2023.09] The paper SOC is accepted by NeurIPS 2023
[2023.09] The first prize of the 5th Large-scale Video Object Segmentation Challenge, Track 3: Referring Video Object Segmentation
[2023.07] The paper FATE is accepted by NCAA (Neural Computing and Applications)
[2023.03] The second prize of the ICDAR 2023 Multi-line Handwritten Mathematical Expression Recognition Competition
[2022.11] The second prize of the iFLYTEK A.I. HMER competition
[2022.09] Obtained the Principal Scholarship
[2022.06] A patent is accepted. Thanks for all the help.
[2021.11] Obtained the National Scholarship
Publications (* equal contribution)
VEN-VL: A Visual Ensemble MoE Framework for Effective and Efficient Multi-Modal Understanding
Yinghao Wu*, Zhuoyan Luo*, Yiyao Yu, Zhaojian Yu, Yujiu Yang and Xiaoping Zhang
arXiv / paper / code
Emu3.5: Native Multimodal Models are World Learners
Emu Team (core contributor)
Technical Report / code
Semantic-Assisted Object Clustering for Multi-Modal Referring Video Segmentation
Yong Liu, Zhuoyan Luo, Yicheng Xiao, Yitong Wang, Shuyan Li, Xiu Li, Yujiu Yang and Yansong Tang
TPAMI 2025 / code
Scalable Image Tokenization with Index Backpropagation Quantization
Fengyuan Shi*, Zhuoyan Luo*, Yixiao Ge, Yujiu Yang, Ying Shan, Limin Wang
ICCV 2025 / paper / code
Open-MAGVIT2: An Open-source Project Toward Democratizing Auto-Regressive Visual Generation
Zhuoyan Luo*, Fengyuan Shi*, Yixiao Ge, Yujiu Yang, Limin Wang, Ying Shan
arXiv / paper / code
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation
Zhuoyan Luo*, Yinghao Wu*, Tianheng Cheng, Yong Liu, Yicheng Xiao, Hongfa Wang, Xiao-Ping Zhang, Yujiu Yang
ICCV 2025 / paper / code
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
Yicheng Xiao*, Zhuoyan Luo*, Yong Liu, Yue Ma, Hengwei Bian, Yatai Ji, Yujiu Yang, Xiu Li
CVPR 2024 / paper / code
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation
Zhuoyan Luo*, Yicheng Xiao*, Yong Liu*, Shuyan Li, Yitong Wang, Yansong Tang, Xiu Li, Yujiu Yang
NeurIPS 2023 / paper / code
FATE: A Three-Stage Method for Arithmetical Exercise Correction
Qipeng Zhu*, Zhuoyan Luo*, Shipeng Zhu*, Qi Jing, Zihang Xu, Hui Xue
Neural Computing and Applications (NCAA) / paper
Awards (* equal contribution)
The First Prize of the ICCV 2023 5th Large-scale Video Object Segmentation Challenge, Track 3: Referring Video Object Segmentation
Zhuoyan Luo*, Yicheng Xiao*, Yong Liu*‡, Yitong Wang, Yansong Tang, Xiu Li, Yujiu Yang
* equal contribution, ‡ project lead
The Second Prize of the ICDAR 2023 Multi-line Handwritten Mathematical Expression Recognition Competition
Zhuoyan Luo*, Yinghao Wu*, Zihang Xu, Qi Jing, Hui Xue
The Second Prize of the iFLYTEK A.I. HMER competition
Zhuoyan Luo*, Yinghao Wu*
Working Experience
Research Intern, Alibaba Group
Feb. 2, 2026 - Present
Topic: Video Generation
Research Intern, BAAI Vision
Jan. 1, 2025 - Feb. 2, 2026
Supervised by: Xinlong Wang
Research Intern, Tencent ARC Laboratory
Jan. 1, 2024 - Jan. 1, 2025
Supervised by: Yixiao Ge, Ying Shan