NeoVerse

Enhancing 4D World Model with in-the-wild Monocular Videos

CVPR 2026

Yuxue Yang¹,²  Lue Fan¹†  Ziqi Shi¹  Junran Peng¹  Feng Wang²  Zhaoxiang Zhang¹

¹NLPR & MAIS, CASIA ²CreateAI

Corresponding Authors. † Project Lead.


News

2026-02-21

🎉 NeoVerse has been accepted to CVPR 2026!

2026-02-16

🔥 We have released the inference script, Gradio-based web demo, and model checkpoints on Hugging Face and ModelScope. Check out the code on GitHub!

2026-01-05

🔍 We are recruiting Ph.D. students and interns interested in world model research. If you are passionate about this area, please contact Lue Fan (lue.fan@ia.ac.cn).

TL;DR

NeoVerse is a versatile 4D world model that is capable of 4D reconstruction, novel-trajectory video generation, and rich downstream applications.

Counterfactual Simulation

We are excited to see that the Waymo World Model shares a similar vision with NeoVerse. NeoVerse can generate multi-view videos from a dashcam video, as well as diverse counterfactual scenarios with long-tail objects.

Reference Dashcam Video

Image to World

Diverse Camera Control

Bullet Time

Video Editing

Application in Robotics

Application in Driving

Single-view to Multi-view

Starting from a single front-view video, NeoVerse can generate multi-view consistent videos.
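This page does not describe how NeoVerse chooses the target viewpoints. As a hypothetical illustration of the camera side of this task, the sketch below derives surround-view poses from a front camera's camera-to-world matrix by rotating it in place about its vertical axis. The helper names, the yaw offsets, and the OpenCV-style axis convention (x right, y down, z forward) are all assumptions. In a full pipeline, such poses would presumably be where the reconstructed scene is rendered before generation.

```python
import numpy as np


def yaw_rotation(deg: float) -> np.ndarray:
    """3x3 rotation about the camera's y axis (vertical in an OpenCV-style frame)."""
    r = np.deg2rad(deg)
    c, s = np.cos(r), np.sin(r)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def surround_poses(front_c2w: np.ndarray, yaws_deg=(-60.0, 0.0, 60.0)) -> list:
    """Hypothetical helper: virtual camera-to-world poses sharing the front camera's center.

    Each virtual camera is the front camera rotated in place by a yaw offset.
    Right-multiplying by a pure rotation applies it in the camera's local frame,
    so the translation column (the camera center) is unchanged.
    """
    poses = []
    for yaw in yaws_deg:
        local = np.eye(4)
        local[:3, :3] = yaw_rotation(yaw)
        poses.append(front_c2w @ local)
    return poses


# Illustrative usage: front camera offset 1.5 units along world y, three virtual views.
front = np.eye(4)
front[:3, 3] = [0.0, 1.5, 0.0]
views = surround_poses(front)
```

Keeping the camera center fixed and varying only the orientation mimics a rig of cameras mounted at one point; a real multi-camera rig would also add small translation offsets per camera.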

Camera Shake Control

NeoVerse can re-plan camera trajectories to either stabilize shaky dashcam videos or inject realistic shake into smooth ones, enabling control in both directions that goes beyond simple pixel-level transforms.

Original Shaky Video

Stabilized Result

Original Smooth Video

Shake-Injected Result
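The page does not specify how NeoVerse re-plans trajectories. As a hypothetical illustration of the two directions of control, the sketch below treats the camera path as a (T, 3) array of camera centers: stabilization applies a centered moving average, and shake injection adds lightly smoothed Gaussian jitter. Both functions, the window sizes, and the amplitude are assumptions, and camera rotations are ignored for brevity. The re-planned path would then drive rendering and generation rather than a 2D warp of the original pixels.

```python
import numpy as np


def stabilize(positions: np.ndarray, window: int = 9) -> np.ndarray:
    """Smooth a (T, 3) camera-center trajectory with a centered moving average.

    Edges are padded by repeating the first/last position, so the output keeps length T.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    padded = np.pad(positions, ((half, half), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    return np.stack(
        [np.convolve(padded[:, d], kernel, mode="valid") for d in range(positions.shape[1])],
        axis=1,
    )


def inject_shake(positions: np.ndarray, amplitude: float = 0.02, seed: int = 0) -> np.ndarray:
    """Add random jitter to a (T, 3) trajectory.

    The Gaussian noise is lightly smoothed so it varies over a few frames
    instead of flickering independently per frame.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=amplitude, size=positions.shape)
    return positions + stabilize(noise, window=3)


# Illustrative usage: a straight dolly path, shaken and then re-stabilized.
T = 60
t = np.linspace(0.0, 1.0, T)
smooth = np.stack([t, np.zeros(T), np.zeros(T)], axis=1)
shaky = inject_shake(smooth, amplitude=0.05)
restored = stabilize(shaky)
```

A larger `window` removes more shake but also rounds off intentional turns, so a practical planner would likely trade these off per scene.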

Zoom Control

NeoVerse supports dynamic focal length adjustment, enabling continuous zoom from wide-angle to close-up views.

Original Video

Zoom Effect: Default → Wide-angle → Close-up
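Zooming corresponds to changing the focal length in the camera intrinsics over time: a shorter focal length widens the field of view, and a longer one narrows it. The sketch below builds a per-frame focal-length schedule for the Default → Wide-angle → Close-up sequence using a standard pinhole model. The focal lengths, the resolution, and the linear interpolation are illustrative assumptions, not NeoVerse's settings, and how NeoVerse consumes intrinsics is not described on this page.

```python
import numpy as np


def intrinsics(f: float, width: int, height: int) -> np.ndarray:
    """Pinhole intrinsic matrix: square pixels, principal point at the image center."""
    return np.array([[f, 0.0, width / 2.0],
                     [0.0, f, height / 2.0],
                     [0.0, 0.0, 1.0]])


def zoom_schedule(f_default: float, f_wide: float, f_close: float,
                  frames_per_leg: int) -> np.ndarray:
    """Per-frame focal lengths for Default -> Wide-angle -> Close-up (linear per leg)."""
    leg1 = np.linspace(f_default, f_wide, frames_per_leg, endpoint=False)
    leg2 = np.linspace(f_wide, f_close, frames_per_leg)
    return np.concatenate([leg1, leg2])


def horizontal_fov_deg(f: float, width: int) -> float:
    """Horizontal field of view in degrees; a shorter focal length gives a wider view."""
    return float(np.degrees(2.0 * np.arctan(width / (2.0 * f))))


# Illustrative numbers only (pixels): 832x480 frames, 24 frames per zoom leg.
focals = zoom_schedule(f_default=500.0, f_wide=300.0, f_close=1000.0, frames_per_leg=24)
Ks = [intrinsics(f, 832, 480) for f in focals]
```

For example, with an 832-pixel-wide frame, a focal length of 416 pixels gives exactly a 90° horizontal field of view.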

Abstract

In this paper, we propose NeoVerse, a versatile 4D world model capable of 4D reconstruction, novel-trajectory video generation, and rich downstream applications. We first identify a common scalability limitation in current 4D world modeling methods, caused either by their reliance on expensive, specialized multi-view 4D data or by cumbersome training pre-processing. In contrast, NeoVerse is built on a core philosophy: the full pipeline should scale to diverse in-the-wild monocular videos. Specifically, NeoVerse features pose-free feed-forward 4D reconstruction, online simulation of monocular degradation patterns, and other well-aligned techniques. These designs give NeoVerse versatility and the ability to generalize across domains. Meanwhile, NeoVerse achieves state-of-the-art performance on standard reconstruction and generation benchmarks.

Method

Framework of NeoVerse. For reconstruction, we propose a pose-free feed-forward 4D Gaussian Splatting (4DGS) reconstruction model with bidirectional motion modeling. Degraded renderings of the 4DGS from novel viewpoints are fed to the generation model as conditions. During training, these degraded rendering conditions are simulated from monocular videos, and the original videos themselves serve as targets.
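The training recipe above needs no multi-view data: a monocular clip supplies both the condition (a degraded copy of itself) and the target (the clip itself). The page does not detail the actual degradation patterns, so the sketch below substitutes two generic, hypothetical ones, block-average blur and random pixel holes, purely to show how such (condition, target) pairs can be formed online. The function names, parameters, and degradations are assumptions, not NeoVerse's simulation.

```python
import numpy as np


def simulate_degraded_condition(video: np.ndarray, hole_ratio: float = 0.2,
                                blur_factor: int = 2, seed: int = 0):
    """Hypothetical degradation of a (T, H, W, 3) video with values in [0, 1].

    Returns (condition, mask), where mask is a (T, H, W) boolean validity map.
    """
    T, H, W, C = video.shape
    b = blur_factor
    if H % b or W % b:
        raise ValueError("H and W must be divisible by blur_factor")
    # Resolution loss: average b x b blocks, then upsample by repetition.
    low = video.reshape(T, H // b, b, W // b, b, C).mean(axis=(2, 4))
    blurred = low.repeat(b, axis=1).repeat(b, axis=2)
    # Missing regions: drop random pixels, loosely mimicking holes in novel-view renderings.
    rng = np.random.default_rng(seed)
    mask = rng.random((T, H, W)) >= hole_ratio
    condition = blurred * mask[..., None]
    return condition, mask


def make_training_pair(video: np.ndarray, seed: int = 0) -> dict:
    """Monocular self-supervision: degraded copy as condition, original clip as target."""
    condition, mask = simulate_degraded_condition(video, seed=seed)
    return {"condition": condition, "mask": mask, "target": video}


# Illustrative usage on a random 4-frame, 8x8 clip.
clip = np.random.default_rng(1).random((4, 8, 8, 3))
pair = make_training_pair(clip)
```

Because the degradation is computed on the fly from the clip itself, any monocular video can become a training example without pose estimation or other offline pre-processing.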

Runtime Comparison

NeoVerse is compatible with powerful distillation LoRAs, enabling fast inference in under 30 seconds. Runtime is measured on a single A800 GPU.
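A LoRA is a low-rank weight update. In the common LoRA formulation, a frozen weight W receives ΔW = (α/r)·B·A, and this update can be merged into W once before inference. The sketch below shows that standard merge in NumPy on toy shapes. It is illustrative only: how NeoVerse loads its distillation LoRAs is not described here, and the shapes and scaling values are assumptions.

```python
import numpy as np


def merge_lora(W: np.ndarray, A: np.ndarray, B: np.ndarray, alpha: float) -> np.ndarray:
    """Standard LoRA merge: W + (alpha / r) * B @ A.

    Shapes: W is (out, in), A is (r, in), B is (out, r), with rank r = A.shape[0].
    """
    r = A.shape[0]
    if A.shape[1] != W.shape[1] or B.shape != (W.shape[0], r):
        raise ValueError("shape mismatch between W, A, and B")
    return W + (alpha / r) * (B @ A)


# Toy example: a 16x32 layer with a rank-4 adapter.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 32))
A = rng.normal(size=(4, 32))
B = rng.normal(size=(16, 4))
merged = merge_lora(W, A, B, alpha=8.0)
x = rng.normal(size=32)
```

Because the merged matrix has the same shape as W, each denoising step costs the same as in the base model. The speedup from distillation LoRAs typically comes from needing far fewer sampling steps.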

BibTeX

@article{yang2026neoverse,
  author    = {Yang, Yuxue and Fan, Lue and Shi, Ziqi and Peng, Junran and Wang, Feng and Zhang, Zhaoxiang},
  title     = {NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos},
  journal   = {arXiv preprint arXiv:2601.00393},
  year      = {2026},
}

Acknowledgements

We sincerely thank the authors of VGGT, WorldMirror, Depth Anything 3, Wan-Video, TrajectoryCrafter, ReCamMaster, and DiffSynth-Studio for their inspiring work and contributions to the 3D and video generation communities.