DreamX-World is a general-purpose world model for interactive world simulation. It generates diverse, high-fidelity worlds that users can explore, control, and transform with event prompts.
The model is trained with a scalable data engine that draws on Unreal Engine data, gameplay footage, and real-world videos. This data is combined with camera estimation and strict data filtering so that the model learns realistic dynamics and interactions. Training follows a progressive pipeline: the model first learns fine-grained action control, then open-ended event response, and is then refined with reinforcement learning (RL) to improve action following, interaction consistency, and visual fidelity. Finally, forcing and distillation make inference efficient, which makes interactive generation practical at scale.
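The stage ordering described above can be pictured as a simple curriculum in which each stage starts from the previous stage's checkpoint. The following is a minimal sketch under that assumption only. `Stage`, `PIPELINE`, `run_pipeline`, `dummy_train`, and the stage names are hypothetical illustrations, not the project's training code or configuration, and `dummy_train` does no actual training.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Stage:
    name: str
    objective: str

# Hypothetical stage list mirroring the order described in the README;
# names and objectives are illustrative, not the project's actual config.
PIPELINE: List[Stage] = [
    Stage("action_control", "learn fine-grained camera/action conditioning"),
    Stage("event_response", "learn open-ended responses to event prompts"),
    Stage("rl_finetune", "reward action following, interaction consistency, visual fidelity"),
    Stage("forcing_distill", "train a fast student for efficient inference"),
]

def run_pipeline(stages: List[Stage], train_stage: Callable[[Stage, Dict], Dict]) -> Dict:
    """Run stages in order; each stage starts from the previous stage's checkpoint."""
    state: Dict = {"checkpoint": "init", "history": []}
    for stage in stages:
        state = train_stage(stage, state)
    return state

def dummy_train(stage: Stage, state: Dict) -> Dict:
    # Stand-in for a real training loop: records which stage ran and
    # names the checkpoint it would produce.
    return {
        "checkpoint": f"{state['checkpoint']}->{stage.name}",
        "history": state["history"] + [stage.name],
    }

final = run_pipeline(PIPELINE, dummy_train)
print(final["history"])
```

The point of the chained checkpoint is that later stages (event response, RL) refine the same weights rather than training from scratch.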
- 2026.06.15: We released the DreamX-World 1.0 technical report.
- 2026.06.15: We open-sourced DreamX-World-5B, which supports 1-minute video generation.
- 2026.05.11: We open-sourced DreamX-World-5B-Cam and the inference code.
- ✔️ DreamX-World-5B-Cam Model.
- ✔️ Long-horizon DreamX-World-5B Model.
- ✔️ Release Technical Report.
- DreamX-World-14B-Cam Model.
- Audio-Video Joint Generation Model.
- Install dependencies: `pip install -r requirements.txt`
- Download the Wan2.2-5B-TI2V checkpoints from https://huggingface.co/Wan-AI
Please check out inference_README.md for detailed instructions.
| Model | Download Link | Details | Instructions |
|---|---|---|---|
| DreamX-World-5B-Cam | Hugging Face, ModelScope | Bidirectional, supports 5 s video generation | inference_README.md |
| DreamX-World-5B | Hugging Face, ModelScope | Autoregressive, supports long-horizon video generation | inference_README.md |
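The Details column contrasts a bidirectional model with an autoregressive one. In video diffusion transformers, one common way to express this difference is the attention mask over frames. The sketch below assumes a chunk-wise (block) causal mask for the autoregressive case. That is a typical design, but the README does not confirm that DreamX-World uses it, and `attention_mask` and its parameters are hypothetical.

```python
from typing import List

def attention_mask(num_frames: int, chunk_size: int, causal: bool) -> List[List[bool]]:
    """Return a mask where mask[i][j] is True if frame i may attend to frame j."""
    mask = []
    for i in range(num_frames):
        if causal:
            # Chunk-wise causal: see own chunk and all earlier chunks, never later ones.
            row = [j // chunk_size <= i // chunk_size for j in range(num_frames)]
        else:
            # Bidirectional: every frame sees the whole clip, past and future.
            row = [True] * num_frames
        mask.append(row)
    return mask

for row in attention_mask(num_frames=4, chunk_size=2, causal=True):
    print("".join("x" if ok else "." for ok in row))
```

With the causal mask, frames never see the future, so a model can append new chunks indefinitely. The bidirectional mask lets every frame use the full clip but fixes its length in advance, which matches a short, fixed-duration (5 s) model.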
Note: The demo videos are intentionally compressed to ensure smooth playback, which may result in a slight loss of visual quality.
DreamX-World supports long-horizon autoregressive generation with precise camera control. Progressive training on long rollouts mitigates identity, background, style, and color drift, enabling coherent world exploration over hundreds of frames.
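Autoregressive long-horizon generation with camera control can be sketched as a loop that extends the video chunk by chunk. Each step is conditioned on a bounded window of recent frames plus the camera poses for the next chunk. This is a minimal sketch under those assumptions only. `rollout`, `fake_generate`, the `(x, y, z, yaw)` pose format, and the chunk and context sizes are hypothetical, and `fake_generate` stands in for the actual model.

```python
from typing import Callable, List, Sequence, Tuple

Pose = Tuple[float, float, float, float]  # hypothetical (x, y, z, yaw) camera pose
Frame = str  # placeholder for a generated frame or latent

def rollout(
    first_frame: Frame,
    poses: Sequence[Pose],
    generate_chunk: Callable[[List[Frame], Sequence[Pose]], List[Frame]],
    chunk_size: int = 4,
    context_len: int = 8,
) -> List[Frame]:
    """Generate one frame per camera pose, chunk by chunk.

    poses[0] belongs to first_frame. Each call sees only the last `context_len`
    frames plus the poses of the next chunk, so per-step cost stays bounded.
    """
    video = [first_frame]
    for start in range(1, len(poses), chunk_size):
        context = video[-context_len:]
        chunk_poses = poses[start:start + chunk_size]
        video.extend(generate_chunk(context, chunk_poses))
    return video

def fake_generate(context: List[Frame], chunk_poses: Sequence[Pose]) -> List[Frame]:
    # Stand-in for the model: labels each new frame with its camera x position.
    return [f"x={p[0]:.1f}" for p in chunk_poses]

# Camera dollies forward 0.1 units per frame for 33 frames.
poses = [(0.1 * t, 0.0, 0.0, 0.0) for t in range(33)]
video = rollout("x=0.0", poses, fake_generate)
print(len(video), video[-1])
```

A bounded context window keeps each step cheap, but it is also where drift creeps in over long rollouts. This is the motivation the paragraph gives for training on long rollouts.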
DreamX-World uses geometry-guided memory retrieval to recover non-local visual evidence from earlier observations. This improves scene persistence when the camera revisits a previously explored region, preserving its layout, object identities, and local appearance.
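To make the idea of geometry-guided retrieval concrete, here is a toy version that picks earlier frames whose camera pose best overlaps the current view. It skips recent frames, which local context already covers. The overlap score used here is a crude heuristic combining camera proximity and heading agreement, chosen for illustration. It is not the retrieval criterion DreamX-World uses, and `View`, `overlap_score`, `retrieve`, `dist_scale`, and `min_gap` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class View:
    frame_id: int
    x: float
    y: float
    yaw: float  # camera heading in radians

def overlap_score(a: View, b: View, dist_scale: float = 5.0) -> float:
    """Crude view-overlap proxy in [0, 1]: high when cameras are close and face the same way."""
    facing = max(0.0, math.cos(a.yaw - b.yaw))  # 0 once headings differ by 90 degrees or more
    proximity = math.exp(-math.hypot(a.x - b.x, a.y - b.y) / dist_scale)
    return facing * proximity

def retrieve(memory: List[View], query: View, k: int = 2, min_gap: int = 16) -> List[int]:
    """Ids of the k past frames that best overlap the query view.

    Frames from the last `min_gap` steps are skipped because the local context
    window already covers them; the goal is to recover non-local evidence.
    """
    candidates = [m for m in memory if query.frame_id - m.frame_id >= min_gap]
    candidates.sort(key=lambda m: overlap_score(m, query), reverse=True)
    return [m.frame_id for m in candidates[:k]]

# Walk 20 steps east (yaw 0), turn around, walk back west (yaw pi).
memory = [View(t, float(t), 0.0, 0.0) for t in range(20)]
memory += [View(t, float(39 - t), 0.0, math.pi) for t in range(20, 40)]
# At frame 40 the camera is back at the start, facing east again.
print(retrieve(memory, View(40, 0.0, 0.0, 0.0)))
```

On revisiting the start, the toy retrieves the very first frames (ids 0 and 1). Those frames hold the layout and appearance the model should reproduce, even though they lie far outside the recent context.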
DreamX-World enables high-fidelity, controllable exploration across diverse realistic environments, including indoor, urban, natural, and architectural scenes.
Beyond realistic scenes, DreamX-World also generates fantasy, game-like, sci-fi, and stylized worlds.
DreamX-World supports both first-person interaction and coherent third-person generation. It keeps camera-follow behavior stable while preserving controllable agent motion and scene consistency.
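For readers unfamiliar with the term, "camera-follow" in third-person views usually means a chase camera that trails the agent and moves smoothly rather than jittering. The purely illustrative sketch below computes such a trajectory with exponential smoothing. It shows what a stable follow path looks like and makes no claim about how DreamX-World produces third-person views. `follow_camera`, `offset`, and `alpha` are hypothetical.

```python
import math
from typing import List, Tuple

def follow_camera(
    agent_path: List[Tuple[float, float, float]],
    offset: float = 3.0,
    alpha: float = 0.3,
) -> List[Tuple[float, float]]:
    """Smoothed chase-camera positions for an agent path of (x, y, heading) per frame.

    The target sits `offset` units behind the agent along its heading; exponential
    smoothing with factor `alpha` (0 < alpha <= 1) damps jitter when the agent turns.
    """
    cams: List[Tuple[float, float]] = []
    for x, y, heading in agent_path:
        tx = x - offset * math.cos(heading)
        ty = y - offset * math.sin(heading)
        if not cams:
            cams.append((tx, ty))  # first frame: snap to the target
        else:
            px, py = cams[-1]
            cams.append((px + alpha * (tx - px), py + alpha * (ty - py)))
    return cams

# Agent walks straight east; the camera trails behind it.
path = [(float(t), 0.0, 0.0) for t in range(10)]
for (ax, _, _), (cx, cy) in zip(path, follow_camera(path)):
    print(f"agent x={ax:.1f}  camera x={cx:.2f}")
```

Smaller `alpha` values give a steadier but laggier camera, and larger values track the agent more tightly.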
DreamX-World supports prompt-driven world events that dynamically change the environment. Events can be triggered alone or composed together, and their effects evolve consistently over time.
- Single Event: A single event prompt triggers a specific world-changing interaction.
- Compositional Events: Multiple events compose together to create complex, multi-step world transformations.
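The difference between single and compositional events can be sketched as a schedule of event prompts over frame indices. A single event is a one-entry schedule, while compositional events overlap in time. This is a minimal sketch assuming events are given as time-stamped prompts. `Event`, `active_prompts`, and the frame-range semantics are hypothetical, and the actual event input format is documented in inference_README.md.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Event:
    prompt: str
    start_frame: int
    end_frame: Optional[int] = None  # None: prompt stays active until the end

def active_prompts(events: List[Event], frame: int) -> List[str]:
    """Event prompts conditioning the given frame, in the order they were scheduled."""
    return [
        e.prompt
        for e in events
        if e.start_frame <= frame and (e.end_frame is None or frame < e.end_frame)
    ]

# Three events that overlap in time to form a multi-step transformation.
schedule = [
    Event("it starts to rain", start_frame=24),
    Event("a sports car drives through the square", start_frame=48, end_frame=96),
    Event("lightning strikes the tower", start_frame=72),
]
for f in (10, 50, 80, 100):
    print(f, active_prompts(schedule, f))
```

Keeping earlier prompts active after later ones start is one simple way to model composition, so that the rain continues while the lightning event unfolds.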
Join our WeChat group for discussion:
This project is licensed under Apache 2.0. See LICENSE for details.
We thank the Wan Team for open-sourcing their code and models.

