Yuqi Wang (Robert)
I am currently a researcher at ByteDance. I received my Ph.D. from the NLPR, Institute of Automation, Chinese Academy of Sciences (CASIA), supervised by Prof. Zhaoxiang Zhang. Prior to that, I obtained my Bachelor's degree in Automation (Robotics) from the College of Control Science and Engineering at Zhejiang University (ZJU) in 2020.
I also interned at Meituan, under the supervision of Fei Xia, and at BAAI, where I was mentored by Dr. Xinlong Wang.
My overarching research passion lies in embodied understanding and planning in open-world environments, with a particular focus on exploring world models and their applications in the physical world.
My research interests span computer vision, unsupervised learning, 3D perception, world models, and video generation, all aiming toward comprehensive open-world 3D scene perception and understanding.
Email / Google Scholar / Github / Curriculum Vitae
We are hiring interns for world-model and multimodal-agent related work. Please contact me by email.
News
2025-09: One paper is accepted to NeurIPS 2025.
2025-06: Two papers are accepted to ICCV 2025.
2025-01: Two papers are accepted to ICLR 2025.
2024-09: One paper on driving world models is accepted to the NeurIPS 2024 Datasets and Benchmarks Track.
2024-07: One paper on indoor monocular occupancy is accepted to ECCV 2024.
2024-02: Two papers, on driving world models and occupancy prediction, are accepted to CVPR 2024.
2023-12: One paper on multi-agent representation learning is accepted to National Science Review.
2023-07: One paper on unsupervised instance segmentation is accepted to TPAMI.
2023-02: One paper on 3D object perception is accepted to CVPR 2023.
2022-09: One paper on unsupervised object discovery is accepted to NeurIPS 2022.
Research
* indicates equal contribution
DriveDPO: Policy Learning via Safety DPO For End-to-End Autonomous Driving
Shuyao Shang*, Yuntao Chen*, Yuqi Wang, Yingyan Li, Zhaoxiang Zhang
NeurIPS, 2025
[paper]
DPO for autonomous driving safety
Unified Vision-Language-Action Model
Yuqi Wang, Xinghang Li, Wenxuan Wang, Junbo Zhang, Yingyan Li, Yuntao Chen, Xinlong Wang, Zhaoxiang Zhang
arXiv, 2025
[paper] [Page] [Code]
Unified vision-language-action model for embodied intelligence
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
Yuntao Chen, Yuqi Wang, Zhaoxiang Zhang
ICCV, 2025
[paper] [Page]
Unifying world model and planning in autonomous driving
End-to-End Driving with Online Trajectory Evaluation via BEV World Model
Yingyan Li*, Yuqi Wang*, Yang Liu, Jiawei He, Lue Fan, Zhaoxiang Zhang
ICCV, 2025
[paper] [Code]
An end-to-end autonomous driving framework that leverages a BEV-based world model to predict future agent states, enabling online trajectory evaluation and selection.
FreeVS: Generative View Synthesis on Free Driving Trajectory
Qitai Wang, Lue Fan, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
ICLR, 2025
[paper] [Page] [Code]
Generative view synthesis on free driving trajectory
Enhancing End-to-End Autonomous Driving with Latent World Model
Yingyan Li, Lue Fan, Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
ICLR, 2025
[paper] [Code]
Latent world model as a self-supervised learning proxy for end-to-end autonomous driving
DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model
Yuqi Wang*, Ke Cheng*, Jiawei He*, Qitai Wang*, Hengchen Dai, Yuntao Chen, Fei Xia, Zhaoxiang Zhang
NeurIPS, 2024, Datasets and Benchmarks Track
[paper] [Page] [Code]
DrivingDojo dataset features video clips with a complete set of driving maneuvers, diverse multi-agent interplay, and rich open-world driving knowledge, laying a stepping stone for future world model development.
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu*, Xiaofeng Wang*, Wangbo Zhao*, Chen Min*, Nianchen Deng*, Min Dou*, Yuqi Wang*, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang
arXiv, 2024
[paper] [Code]
A comprehensive survey on general world models, including world models for video generation, autonomous driving and autonomous agents.
Monocular Occupancy Prediction for Scalable Indoor Scenes
Hongxiao Yu, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
ECCV, 2024
[paper] [Page] [Code]
ISO, a method for monocular occupancy prediction in indoor scenes.
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
Yuqi Wang*, Jiawei He*, Lue Fan*, Hongxin Li*, Yuntao Chen, Zhaoxiang Zhang
CVPR, 2024
[paper] [Page] [Code]
Drive-WM, a pioneering multi-view world model for end-to-end autonomous driving.
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation
Yuqi Wang, Yuntao Chen, Xingyu Liao, Lue Fan, Zhaoxiang Zhang
CVPR, 2024
[paper] [Code]
PanoOcc, a method for camera-based 3D panoptic scene understanding.
Emergence of Machine Language: Towards Symbolic Intelligence with Neural Networks
Yuqi Wang, Xu-Yao Zhang, Cheng-Lin Liu, Tieniu Tan, Zhaoxiang Zhang
National Science Review (NSR), 2024
[paper]
Emergence of machine language.
Object Affinity Learning: Towards Annotation-Free Instance Segmentation
Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
[paper] [Code]
2D object discovery through depth and flow cues.
FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection
Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
CVPR, 2023
[paper] [Code] [Bilibili]
FrustumFormer, enhancing vision-based 3D object detection through 2D prior.
4D Unsupervised Object Discovery
Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
NeurIPS, 2022 (Spotlight)
[paper] [Code]
4D unsupervised object discovery using camera and LiDAR raw information.
Honors and Awards
2025 CAS President Award / 中科院院长奖
2025 Outstanding Graduate of Beijing / 北京市优秀毕业生
2024 National Scholarship / 国家奖学金
2023 Zhu Li Yue Hua Scholarship / 朱李月华奖学金
2020 Outstanding Graduate of Zhejiang Province / 浙江省优秀毕业生
2018 National Scholarship / 国家奖学金
2018 SUPCON Scholarship / 中控奖学金
© Yuqi Wang | Last updated: September 25, 2025