I am currently a research intern at ByteDance Seed, focusing on reinforcement learning for multimodal large language models. I work closely with Dr. Ling Yang from the AI Lab at Princeton University. Previously, I was honoured to work with Prof. Hao Zhu and Prof. Licheng Jiao at IPIU, Xidian University.
I welcome research collaborations and discussions of any kind! I am actively looking for research internship opportunities in either academia or industry. Feel free to contact me via email or WeChat.
Research
My current research focuses on Multimodal Large Language Models (MLLMs), specifically reinforcement learning for the alignment of unified models, with the goal of strengthening their reasoning and generative abilities in complex scenarios.
Previously, my research centered on text-to-image generation, with a particular emphasis on investigating the capabilities of diffusion models under complex and compositional prompts; representative works include IterComp (ICLR'25), RealCompo (NeurIPS'24), HermesFlow, and Diffusion-Sharpening.
News
[Sep. 2025] Three papers on reinforcement learning and MLLMs are accepted to NeurIPS 2025: HermesFlow, MMaDA, and PeRL.
[Aug. 2025] RPF-Net is accepted by Pattern Recognition.
HEAR: High-frequency Enhanced Autoregressive Modeling for Identity-Preserving Image Generation
Shiyi Zhang*, Xinchen Zhang*, Youliang Zhang, Yongxin Xiao, Xiu Li, Jian Song, Yujiu Yang
Under Review
SparseAR: Not All Visual Tokens Are Crucial in Autoregressive Image Model Training
Ling Yang*, Zhaochen Yu*, Xinchen Zhang*, Peng Cao, Yujiu Yang, Bin Cui, Shuicheng Yan
Under Review
Compositional Generalization through Brain-inspired Geometric Constraints on Representation Structure
Chenming Shang, Shiji Zhou, Hengyuan Zhang, Xinchen Zhang, Lei Ke, Yuwang Wang, Yujiu Yang
Under Review