Robert Luo

I am a third-year master's student at Tsinghua University, majoring in Artificial Intelligence and supervised by Prof. Yang. My current research interests include Mathematical Expression Recognition and Multi-Modal Understanding & Generation.

Previously, I obtained my bachelor's degree from Southeast University, where I was supervised by Prof. Xue.

Email / CV / Github

News

[2026.04]HappyHorse 1.0 is released, including Text/Image/Video/Reference to Video Generation

[2025.10]We are very excited to release Emu3.5!

[2025.09]The Paper SOC++ is accepted by TPAMI

[2025.06]The Papers CoHD and IBQ are accepted by ICCV 2025

[2024.12]The Repository SEED-Voken: A Series of Powerful Visual Tokenizers is released

[2024.09]The Technical Report Open-MAGVIT2 is released

[2024.05]The Bronze Prize of XIB OCR competition

[2024.02]The paper UVCOM is accepted by CVPR 2024

[2023.09]The paper SOC is accepted by NeurIPS 2023

[2023.09]The first prize of The 5th Large-scale Video Object Segmentation Challenge Track3: Referring Video Object Segmentation

[2023.07]The paper FATE is accepted by NCAA

[2023.03]The second prize of ICDAR 2023 Multi-line Handwritten Mathematical Expression Recognition Competition

[2022.11]The second prize of iFLYTEK A.I. HMER competition

[2022.09]Obtained the Principal Scholarship

[2022.06]A patent is accepted. Thanks for all the help

[2021.11]Obtained the National Scholarship

Publications (* equal contribution)

	VEN-VL: A Visual Ensemble MoE Framework for Effective and Efficient Multi-Modal Understanding Yinghao Wu, Zhuoyan Luo, Yiyao Yu, Zhaojian Yu, Yujiu Yang and Xiaoping Zhang arXiv / paper / code
	Emu3.5: Native Multimodal Models are World Learners Emu Team (The Core Contributor) Technical Report / code
	Semantic-Assisted Object Clustering for Multi-Modal Referring Video Segmentation Yong Liu, Zhuoyan Luo, Yicheng Xiao, Yitong Wang, Shuyan Li, Xiu Li, Yujiu Yang and Yansong Tang TPAMI 2025 / code
	Scalable Image Tokenization with Index Backpropagation Quantization Fengyuan Shi, Zhuoyan Luo, Yixiao Ge, Yujiu Yang, Ying Shan, Limin Wang ICCV 2025 / paper / code
	Open-MAGVIT2: An Open-source Project Toward Democratizing Auto-Regressive Visual Generation Zhuoyan Luo, Fengyuan Shi, Yixiao Ge, Yujiu Yang, Limin Wang, Ying Shan arXiv / paper / code
	CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation Zhuoyan Luo, Yinghao Wu, Tianheng Cheng, Yong Liu, Yicheng Xiao, Hongfa Wang, Xiao-Ping Zhang, Yujiu Yang ICCV 2025 / paper / code
	Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection Yicheng Xiao, Zhuoyan Luo, Yong Liu, Yue Ma, Hengwei Bian, Yatai Ji, Yujiu Yang, Xiu Li CVPR 2024 / paper / code
	SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation Zhuoyan Luo, Yicheng Xiao, Yong Liu, Shuyan Li, Yitong Wang, Yansong Tang, Xiu Li, Yujiu Yang NeurIPS 2023* / paper / code
	FATE: A Three-Stage Method for Arithmetical Exercise Correction Qipeng Zhu, Zhuoyan Luo, Shipeng Zhu, Qi Jing, Zihang Xu, Hui Xue Accepted by Journal of NCAA (Neural Computing and Applications)* / paper

Awards (* equal contribution)

	The First Prize of ICCV 2023 The 5th Large-scale Video Object Segmentation Challenge Track3: Referring Video Object Segmentation Zhuoyan Luo, Yicheng Xiao, Yong Liu‡, Yitong Wang, Yansong Tang, Xiu Li, Yujiu Yang. * equal contribution, ‡ Project lead
	The Second Prize of ICDAR 2023 Multi-line Handwritten Mathematical Expression Recognition Competition Zhuoyan Luo, Yinghao Wu, Zihang Xu, Qi Jing, Hui Xue
	The Second Prize of iFLYTEK A.I. HMER competition Zhuoyan Luo, Yinghao Wu

Working Experience

	Research Intern, Alibaba Group Feb. 2, 2026 - Present Topic: Video Generation
	Research Intern, BAAI Vision Jan. 1, 2025 - Feb. 2, 2026 Supervised by: Xinlong Wang
	Research Intern, Tencent ARC Laboratory Jan. 1, 2024 - Jan. 1, 2025 Supervised by: Yixiao Ge, Ying Shan

Updated in May 2022

Thanks to Jon Barron for this amazing template.