Yixuan Zhu

I am a fourth-year Ph.D. student at Tsinghua University, advised by Prof. Jie Zhou, Prof. Jiwen Lu, and Prof. Yansong Tang.

Before starting my Ph.D., I received my B.S. degree in Electronic Engineering with honours from Tsinghua University in 2022.

My research lies at the intersection of multimodal world models and unified models for multimodal understanding and generation. I am passionate about exploring new solutions in this area and contributing to its advancement.

Email / GitHub

News

2026-01: Two papers on world models and image enhancement accepted by ICLR 2026.

2025-11: We present our new work, Astra, an interactive video world model!

2025-02: One paper on multimodal video editing accepted by CVPR 2025. Paper and code coming soon!

2025-01: One paper on AIGC image enhancement accepted by ICLR 2025. Paper and code coming soon!

2024-04: We are happy that our work, FlowIE, has been selected for an oral presentation at CVPR 2024!

2024-02: Two papers on AIGC image enhancement and 3D human recovery accepted by CVPR 2024. Papers and code coming soon!

2024-02: One paper on face swapping accepted by IEEE Transactions on Multimedia.

Selected Recent Publications [Full List]

(*Equal Contribution, #Corresponding Author)

	Astra: General Interactive World Model with Autoregressive Denoising
	Yixuan Zhu, Jiaqi Feng, Wenzhao Zheng, Yuan Gao, Xin Tao, Pengfei Wan, Jie Zhou, Jiwen Lu
	The Fourteenth International Conference on Learning Representations (ICLR), 2026
	[Paper] [Code] [Project Page]
	We introduce Astra, an interactive world model that delivers realistic long-horizon video rollouts under a wide range of scenarios and action inputs.
	VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution
	Yixuan Zhu, Shilin Ma, Haolin Wang, Ao Li, Yanzhe Jing, Yansong Tang, Lei Chen, Jiwen Lu, Jie Zhou
	The Fourteenth International Conference on Learning Representations (ICLR), 2026
	[Coming soon]
	We introduce VARestorer, a one-step VAR distillation framework for real-world image super-resolution that mitigates error accumulation.
	FADE: Frequency-Aware Diffusion Model Factorization for Video Editing
	Yixuan Zhu, Haolin Wang, Shilin Ma, Wenliang Zhao, Yansong Tang, Lei Chen#, Jie Zhou
	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
	[Paper] [Code]
	We introduce FADE, a training-free yet highly effective video editing approach that fully leverages the inherent priors of pre-trained video diffusion models through frequency-aware factorization.
	InstaRevive: One-Step Image Enhancement via Dynamic Score Matching
	Yixuan Zhu, Haolin Wang, Ao Li, Wenliang Zhao, Yansong Tang, Jingxuan Niu, Lei Chen#, Jie Zhou, Jiwen Lu
	The Thirteenth International Conference on Learning Representations (ICLR), 2025
	[Paper] [Code]
	We propose InstaRevive, a simple yet powerful image enhancement framework that uses score-based diffusion distillation to harness strong generative capabilities while minimizing the number of sampling steps.
	FlowIE: Efficient Image Enhancement via Rectified Flow
	Yixuan Zhu, Wenliang Zhao, Ao Li, Yansong Tang#, Jie Zhou, Jiwen Lu
	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR Oral), 2024
	[Paper] [Code]
	We propose a unified framework for various efficient image enhancement tasks with generative diffusion priors.
	DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
	Yixuan Zhu, Ao Li, Yansong Tang#, Wenliang Zhao, Jie Zhou, Jiwen Lu
	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
	[Paper] [Code] [Project Page]
	We propose a new method that exploits diffusion priors for human mesh recovery (HMR) in occluded and crowded scenarios.
