I am currently a postdoctoral researcher at Tsinghua University. I received my Ph.D. from the Institute of Automation, Chinese Academy of Sciences (CASIA), in 2025.
I serve as a Partner at GigaAI, collaborating closely with Zheng Zhu. I have also collaborated on research with Megvii, PhiGent, and the Tongyi Wan Team.
My research interests revolve around Embodied AI and World Models, with the aim of developing AI systems that understand physics and motion. Please feel free to reach out if you have any questions or would like to discuss further.
A comprehensive survey on general world models, including world models for video generation, autonomous driving and autonomous agents.
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation Guosheng Zhao*, Xiaofeng Wang*, Zheng Zhu*, Xinze Chen, Guan Huang, Xiaoyi Bao, Xingang Wang
AAAI, 2025
[arXiv][Page][Code]
DriveDreamer-2 is the first world model to generate customized driving videos in a user-friendly manner.
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens Xiaofeng Wang*, Zheng Zhu*, Guan Huang*, Boyuan Wang, Xinze Chen, Jiwen Lu arXiv, 2024
[arXiv][Page]
WorldDreamer is a pioneering world model that fosters a comprehensive understanding of general world physics and motions, significantly enhancing video generation capabilities.
This report provides an exhaustive evaluation of the latest state-of-the-art VLM, GPT-4V(ision), and its application to autonomous driving scenarios.
DriveDreamer: Towards Real-World-Driven World Models for Autonomous Driving Xiaofeng Wang*, Zheng Zhu*, Guan Huang, Xinze Chen, Jiagang Zhu, Jiwen Lu European Conference on Computer Vision (ECCV), 2024
[arXiv][page][Code]
DriveDreamer is the first world model established from real-world driving scenarios. It empowers controllable driving video generation and enables the prediction of reasonable driving policies.
OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception Xiaofeng Wang*, Zheng Zhu*, Wenbo Xu*, Yunpeng Zhang, Yi Wei, Xu Chi, Yun Ye, Dalong Du, Jiwen Lu, Xingang Wang IEEE International Conference on Computer Vision (ICCV), 2023
[arXiv][Code]
Toward comprehensive benchmarking of surrounding perception algorithms, we propose OpenOccupancy, the first surrounding semantic occupancy perception benchmark.
We propose StereoScene for 3D Semantic Scene Completion (SSC), which takes full advantage of lightweight camera inputs without resorting to any external 3D sensors.
Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark Xiaofeng Wang, Zheng Zhu, Yunpeng Zhang, Guan Huang, Yun Ye, Wenbo Xu, Ziwei Chen, Xingang Wang IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[arXiv][Code]
We propose the Autonomous-driving StreAming Perception (ASAP) benchmark, the first benchmark to evaluate the online performance of vision-centric perception in autonomous driving.
We propose a novel end-to-end Transformer-based method for multi-view stereo, named MVSTER. It leverages the proposed epipolar Transformer to efficiently learn 3D associations along epipolar lines.