Xinchen Zhang

I am a second-year master's student at IIGroup, Tsinghua University, supervised by Prof. Yujiu Yang. I received my Bachelor's degree from the School of Artificial Intelligence, Xidian University.

I am currently a research intern at ByteDance Seed, focusing on reinforcement learning in multimodal large language models. I work closely with Dr. Ling Yang from the AI Lab at Princeton University.

Email / WeChat / GitHub / Google Scholar

I will graduate in 2027. Feel free to contact me via email or WeChat if you are recruiting!

Research

My current research focuses on Multimodal Large Language Models (MLLMs). Specifically, I study reinforcement learning for aligning unified models, with the goal of pushing their reasoning and generation capabilities further in complex scenarios. Representative works include OmniVerifier (ICLR'26 Oral), HermesFlow (NeurIPS'25), MMaDA (NeurIPS'25), and Seed1.5-VL.

Previously, my research centered on text-to-image generation, with a particular emphasis on the capabilities of diffusion models under complex, compositional prompts, including IterComp (ICLR'25) and RealCompo (NeurIPS'24).

News

[May. 2026] OmniVerifier-M1 is accepted by ICML 2026.
[May. 2026] I propose OmniVerifier-M1, advancing multimodal verifiers with symbolic meta-verification.
[Jan. 2026] OmniVerifier is accepted by ICLR 2026 (Oral Paper, Top 1%).
[Dec. 2025] We release Seed1.8, a generalized agentic model for real-world scenarios.
[Oct. 2025] I propose OmniVerifier, a universal verifier for generalist multimodal foundation models.
[Sep. 2025] Three papers on reinforcement learning and MLLMs are accepted by NeurIPS 2025: HermesFlow, MMaDA, and PeRL.
[Aug. 2025] RPF-Net is accepted by Pattern Recognition.
[May. 2025] We release Seed1.5-VL, a series of state-of-the-art vision-language models.
[Feb. 2025] I started as a research intern at ByteDance Seed, focusing on MLLM post-training.
[Jan. 2025] IterComp is accepted by ICLR 2025.
[Nov. 2024] I gave a talk at TechBeat about compositional text-to-image generation.
[Oct. 2024] I propose IterComp, leveraging iterative RLHF to achieve fast and realistic text-to-image generation.
[Sep. 2024] RealCompo is accepted by NeurIPS 2024.
[Feb. 2024] I propose RealCompo, balancing compositionality and realism in controllable text-to-image generation.
[Sep. 2023] Admitted to Tsinghua University for postgraduate studies, exempt from the entrance examination.
[May. 2023] Check out our recent work, RPF-Net.

Technical Reports

	Seed1.8 Model Card: Towards Generalized Real-World Agency ByteDance Seed: ...Xinchen Zhang (Contributor), ... Technical Report Project Page / Preprint
	Seed1.5-VL Technical Report ByteDance Seed: ...Xinchen Zhang (Contributor), ... Technical Report Project Page / Preprint

Publications

(* denotes equal contribution.)

	OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration Xinchen Zhang, Bowei Liu, Jiale Liu, Chufan Shi, Yizhen Zhang, Junhong Liu, Youliang Zhang, Zhiheng Li, Yujiu Yang, Ling Yang ICML, 2026 Preprint / Code
	Generative Universal Verifier as Multimodal Meta-Reasoner Xinchen Zhang, Xiaoying Zhang, Youbin Wu, Yanbin Cao, Renrui Zhang, Ruihang Chu, Ling Yang, Yujiu Yang, Guang Shi ICLR, 2026 (Oral Paper, Top 1%) Preprint / Code / Checkpoint / Benchmark
	HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation Ling Yang, Xinchen Zhang, Ye Tian, Chenming Shang, Minghao Xu, Wentao Zhang, Bin Cui NeurIPS, 2025 Preprint / Code / Checkpoints
	IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation Xinchen Zhang, Ling Yang, Guohao Li, Yaqi Cai, Jiake Xie, Yong Tang, Yujiu Yang, Mengdi Wang, Bin Cui ICLR, 2025 Preprint / Code / Checkpoints (over 33K downloads)
	RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models Xinchen Zhang, Ling Yang, Yaqi Cai, Zhaochen Yu, Kaini Wang, Jiake Xie, Ye Tian, Minkai Xu, Yong Tang, Yujiu Yang, Bin Cui NeurIPS, 2024 Project page / Preprint / Code
	MMaDA: Multimodal Large Diffusion Language Models Ling Yang, Ye Tian, Bowen Li, Xinchen Zhang, Ke Shen, Yunhai Tong, Mengdi Wang NeurIPS, 2025 Preprint / Code / Checkpoints
	HEAR: High-frequency Enhanced Autoregressive Modeling for Identity-Preserving Image Generation Shiyi Zhang, Xinchen Zhang, Youliang Zhang, Yongxin Xiao, Xiu Li, Jian Song, Yujiu Yang Under Review
	SparseAR: Not All Visual Tokens Are Crucial in Autoregressive Image Model Training Ling Yang, Zhaochen Yu, Xinchen Zhang, Peng Cao, Yujiu Yang, Bin Cui, Shuicheng Yan Under Review*
	PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning Yizhen Zhang, Yang Ding, Shuoshuo Zhang, Xinchen Zhang, Haoling Li, Zhong-zhi Li, Peijie Wang, Jie Wu, Lei Ji, Yelong Shen, Yujiu Yang, Yeyun Gong NeurIPS, 2025 Preprint / Code
	Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening Ye Tian, Ling Yang, Xinchen Zhang, Yunhai Tong, Mengdi Wang, Bin Cui arXiv, 2025 Preprint / Code
	Recurrent Progressive Fusion-based Learning for Multi-source Remote Sensing Image Classification Xinchen Zhang, Hao Zhu, Xiaotong Li, Biao Hou, Wenhao Zhao, Xiaoyu Yi, Wenping Ma, Licheng Jiao Pattern Recognition Paper / Code

Education

Tsinghua University
M.Eng. in Big Data Technology and Engineering (2024 - Present)
Advisor: Prof. Yujiu Yang

Xidian University
B.Eng. in Artificial Intelligence (2020 - 2024)
Advisors: Prof. Hao Zhu, Prof. Licheng Jiao

Experience

ByteDance Seed
Multimodal Interaction and World Model Team
Research Intern (Feb. 2025 - Present)
Topic: VLM Foundation Model Post-training
Advisors: Xiaoying Zhang, Youbin Wu, Guang Shi

Services

Conference Reviewer:
- International Conference on Computer Vision (ICCV) 2025
- International Conference on Machine Learning (ICML) 2025, 2026
- IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025, 2026
- International Conference on Learning Representations (ICLR) 2025, 2026
- Conference on Neural Information Processing Systems (NeurIPS) 2025
Journal Reviewer:
- International Journal of Computer Vision (IJCV)

Talks

IterComp, RealCompo: Towards Compositional Text-to-Image Generation, TechBeat, 2024

Honors & Awards

Special Prize Scholarship, 2022
First Prize Scholarship, 2021
First Prize, The Chinese Mathematics Competitions (CMC), 2021
First Prize, China Undergraduate Mathematical Contest in Modeling (CUMCM), 2021
First Prize (Meritorious Winner), International Mathematical Contest in Modeling (MCM/ICM), 2022