Yuqi Wang (Robert)
I am currently a researcher at ByteDance. I received my Ph.D. from the NLPR, Institute of Automation, Chinese Academy of Sciences (CASIA), supervised by Prof. Zhaoxiang Zhang. Prior to that, I obtained my Bachelor's degree in Automation (Robotics) from the College of Control Science and Engineering at Zhejiang University (ZJU) in 2020.
I also interned at Meituan, under the supervision of Fei Xia, and at BAAI, where I was mentored by Dr. Xinlong Wang.
My overarching research passion lies in embodied understanding and planning in open-world environments, with a particular focus on exploring world models and their applications in the physical world.
My research interests span computer vision, unsupervised learning, 3D perception, world models, and video generation, all aiming toward comprehensive open-world 3D scene perception and understanding.
Email / Google Scholar / Github / Curriculum Vitae
We are hiring interns for world-model and multimodal-agent related work. Please contact me by email.
News
2025-09: One paper is accepted to NeurIPS 2025.
2025-06: Two papers are accepted to ICCV 2025.
2025-01: Two papers are accepted to ICLR 2025.
2024-09: One paper on driving world models is accepted to the NeurIPS 2024 Datasets and Benchmarks Track.
2024-07: One paper on indoor monocular occupancy is accepted to ECCV 2024.
2024-02: Two papers, on driving world models and occupancy prediction, are accepted to CVPR 2024.
2023-12: One paper on multi-agent representation learning is accepted to National Science Review.
2023-07: One paper on unsupervised instance segmentation is accepted to TPAMI.
2023-02: One paper on 3D object perception is accepted to CVPR 2023.
2022-09: One paper on unsupervised object discovery is accepted to NeurIPS 2022.
Research
* indicates equal contribution
DriveDPO: Policy Learning via Safety DPO For End-to-End Autonomous Driving
Shuyao Shang*, Yuntao Chen*, Yuqi Wang, Yingyan Li, Zhaoxiang Zhang
NeurIPS, 2025
[paper]
DPO for autonomous driving safety
Unified Vision-Language-Action Model
Yuqi Wang, Xinghang Li, Wenxuan Wang, Junbo Zhang, Yingyan Li, Yuntao Chen, Xinlong Wang, Zhaoxiang Zhang
arXiv, 2025
[paper] [Page] [Code]
Unified vision-language-action model for embodied intelligence
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
Yuntao Chen, Yuqi Wang, Zhaoxiang Zhang
ICCV, 2025
[paper] [Page]
Unifying world model and planning in autonomous driving
End-to-End Driving with Online Trajectory Evaluation via BEV World Model
Yingyan Li*, Yuqi Wang*, Yang Liu, Jiawei He, Lue Fan, Zhaoxiang Zhang
ICCV, 2025
[paper] [Code]
An end-to-end autonomous driving framework that leverages a BEV-based world model to predict future agent states, enabling online trajectory evaluation and selection.
FreeVS: Generative View Synthesis on Free Driving Trajectory
Qitai Wang, Lue Fan, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
ICLR, 2025
[paper] [Page] [Code]
Generative view synthesis on free driving trajectory
Enhancing End-to-End Autonomous Driving with Latent World Model
Yingyan Li, Lue Fan, Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
ICLR, 2025
[paper] [Code]
Latent world model as a self-supervised learning proxy for end-to-end autonomous driving
DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model
Yuqi Wang*, Ke Cheng*, Jiawei He*, Qitai Wang*, Hengchen Dai, Yuntao Chen, Fei Xia, Zhaoxiang Zhang
NeurIPS, 2024, Datasets and Benchmarks Track
[paper] [Page] [Code]
DrivingDojo dataset features video clips with a complete set of driving maneuvers, diverse multi-agent interplay, and rich open-world driving knowledge, laying a stepping stone for future world model development.
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu*, Xiaofeng Wang*, Wangbo Zhao*, Chen Min*, Nianchen Deng*, Min Dou*, Yuqi Wang*, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang
arXiv, 2024
[paper] [Code]
A comprehensive survey on general world models, including world models for video generation, autonomous driving and autonomous agents.
Monocular Occupancy Prediction for Scalable Indoor Scenes
Hongxiao Yu, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
ECCV, 2024
[paper] [Page] [Code]
ISO, a method for monocular occupancy prediction in indoor scenes.
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
Yuqi Wang*, Jiawei He*, Lue Fan*, Hongxin Li*, Yuntao Chen, Zhaoxiang Zhang
CVPR, 2024
[paper] [Page] [Code]
Drive-WM, a pioneering multi-view world model for end-to-end autonomous driving.
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation
Yuqi Wang, Yuntao Chen, Xingyu Liao, Lue Fan, Zhaoxiang Zhang
CVPR, 2024
[paper] [Code]
PanoOcc, a method for camera-based 3D panoptic scene understanding.
Emergence of Machine Language: Towards Symbolic Intelligence with Neural Networks
Yuqi Wang, Xu-Yao Zhang, Cheng-Lin Liu, Tieniu Tan, Zhaoxiang Zhang
National Science Review (NSR), 2024
[paper]
Emergence of machine language.
Object Affinity Learning: Towards Annotation-Free Instance Segmentation
Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
[paper] [Code]
2D object discovery through depth and flow cues.
FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection
Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
CVPR, 2023
[paper] [Code] [Bilibili]
FrustumFormer, enhancing vision-based 3D object detection through 2D prior.
4D Unsupervised Object Discovery
Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang
NeurIPS, 2022 (Spotlight)
[paper] [Code]
4D unsupervised object discovery using camera and LiDAR raw information.
Honors and Awards
2025 CAS President Award / 中科院院长奖
2025 Outstanding Graduate of Beijing / 北京市优秀毕业生
2024 National Scholarship / 国家奖学金
2023 Zhu Li Yue Hua Scholarship / 朱李月华奖学金
2020 Outstanding Graduate of Zhejiang Province / 浙江省优秀毕业生
2018 National Scholarship / 国家奖学金
2018 SUPCON Scholarship / 中控奖学金
© Yuqi Wang | Last updated: September 25, 2025