Simulation Distillation

Pretraining World Models in Simulation for Rapid Real-World Adaptation

Jacob Levy*¹, Tyler Westenbroek*², Kevin Huang², Fernando Palafox¹, Patrick Yin², Shayegan Omidshafiei³, Dong-Ki Kim³, Abhishek Gupta†², David Fridovich-Keil†¹
¹UT Austin, ²UW, ³FieldAI
*Equal Contribution, †Equal Advising


RSS 2026 (Accepted)

3D Planning Demo (Interactive)

SimDist plans in a latent world model pretrained in simulation. Below we reconstruct and visualize the latent plans — play with them yourself!

The Simulation Distillation Pipeline

SimDist distills structural priors from large-scale mixed-quality simulation data into a latent world model, then rapidly improves real-world performance by planning with the model while finetuning its dynamics predictions.

Reliable improvement with simple, supervised system identification!
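To make "planning with the model" concrete, here is a minimal sketch of one way a planner could score candidate action sequences in latent space: sum the predicted rewards along an imagined rollout and add a terminal value. The random-shooting planner, the 1-D latent, and the functions `dynamics`, `reward`, and `value` are illustrative assumptions, not the planner or networks used in the paper.

```python
import random

def dynamics(z, u):
    """Hypothetical latent dynamics model (1-D for readability)."""
    return 0.9 * z + 0.5 * u

def reward(z):
    """Hypothetical reward head: penalize distance from a goal latent at 1.0."""
    return -(z - 1.0) ** 2

def value(z):
    """Hypothetical terminal value head."""
    return -5.0 * (z - 1.0) ** 2

def score(z0, actions):
    """Sum predicted rewards along the imagined rollout, then add the terminal value."""
    z, total = z0, 0.0
    for u in actions:
        z = dynamics(z, u)
        total += reward(z)
    return total + value(z)

def plan(z0, horizon, num_samples, rng):
    """Random shooting (an assumed planner): sample action sequences, keep the best."""
    best_actions, best_score = None, float("-inf")
    for _ in range(num_samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s = score(z0, actions)
        if s > best_score:
            best_actions, best_score = actions, s
    return best_actions, best_score

rng = random.Random(0)
actions, best = plan(z0=0.0, horizon=5, num_samples=500, rng=rng)
```

Because every term in `score` depends on latents produced by `dynamics`, errors in the dynamics model propagate directly into the planner's ranking of candidates. In a receding-horizon setup, typically only the first planned action would be executed before replanning.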

Step 1 — Expert Policy Training

Train a state-based expert policy.

Step 2 — Data Generation

Perturb expert actions to generate a large-scale, diverse dataset.

Step 3 — World Model Pretraining and Deployment

Distill simulation data into a world model from raw perception and deploy it with online planning.

Step 4 — Adaptation

Finetune dynamics predictions with real-world data to improve planning performance.
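As a rough illustration of Step 2, the sketch below rolls out an expert while adding noise to its actions, sweeping the noise level so the dataset mixes near-expert and lower-quality behavior. The Gaussian noise model, the noise levels, and the toy `expert_policy` and `step` functions are assumptions made for this example. The page does not specify how actions are perturbed.

```python
import random

def expert_policy(state):
    """Hypothetical stand-in for the state-based expert: drive the state toward zero."""
    return [-0.5 * s for s in state]

def step(state, action):
    """Hypothetical toy simulator with simple integrator dynamics."""
    return [s + a for s, a in zip(state, action)]

def collect_rollout(noise_std, horizon, rng):
    """Roll out the expert with Gaussian noise added to every action."""
    state = [rng.uniform(-1.0, 1.0) for _ in range(2)]
    transitions = []
    for _ in range(horizon):
        action = [a + rng.gauss(0.0, noise_std) for a in expert_policy(state)]
        next_state = step(state, action)
        transitions.append((state, action, next_state))
        state = next_state
    return transitions

rng = random.Random(0)
# Assumed noise sweep: 0.0 reproduces the expert, larger values give worse behavior.
noise_levels = [0.0, 0.1, 0.5]
dataset = [
    transition
    for std in noise_levels
    for _ in range(10)  # rollouts per noise level
    for transition in collect_rollout(std, 20, rng)
]
```

Each stored tuple is `(state, action, next_state)`. A world model trained from raw perception would store observations instead of states, which this toy example omits.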

Robust Rapid Real-World Improvement

Rapid Performance Improvement

Simulation Distillation (SimDist) rapidly overcomes the sim-to-real dynamics gap through adaptation in the real world, resulting in substantial gains in task execution on both precise manipulation and quadrupedal locomotion tasks.

Manipulation - Peg Insertion

Manipulation - Table Leg Threading

Quadruped - Slippery Slope

Quadruped - Memory Foam

Higher Task Throughput

SimDist increases task throughput by improving both reliability and execution speed after real-world adaptation, enabling more successful task completions in less time.

Added Robustness

SimDist improves robustness to external disturbances, allowing the adapted planner to recover from unexpected physical perturbations while continuing toward the task goal.

Why End-to-end RL Finetuning is Hard

Catastrophic Forgetting

Existing end-to-end reinforcement learning methods often collapse when finetuning policies in new domains, indicating catastrophic forgetting of pretraining priors. These algorithms entangle learning representations, dynamics, and returns, forcing relearning of the entire task structure in the new domain.

Preserving Simulator Priors During Finetuning

Task Structure

Our key insight: world models automatically decompose task structure in a form that we can exploit to target adaptation where it is needed. We argue that the encoder, rewards, and value function capture the global structure of the problem in a form that is largely invariant between simulation and the real world. We therefore freeze these components during real-world finetuning and finetune only the dynamics model. This sidesteps end-to-end learning from sparse real-world data and avoids long-horizon credit assignment, a central challenge for existing RL approaches.

World model architecture showing encoder, latent dynamics, and value components
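The freezing scheme described above can be sketched as a parameter update that touches only the dynamics group. The parameter names, values, and the plain SGD step are hypothetical and chosen for illustration. They do not reflect the actual model or optimizer.

```python
# Hypothetical parameter groups; names and values are illustrative only.
params = {
    "encoder": [0.3, -0.1],
    "reward": [0.7],
    "value": [1.2, 0.4],
    "dynamics": [0.5, 0.5],
}

# Only the dynamics model is adapted during real-world finetuning.
TRAINABLE = {"dynamics"}

def apply_gradients(params, grads, lr):
    """Return new params, applying an SGD step only to trainable groups."""
    updated = {}
    for name, values in params.items():
        if name in TRAINABLE:
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
        else:
            updated[name] = list(values)  # frozen: copied unchanged
    return updated

# Dummy gradients of 1.0 for every parameter, to show which groups move.
grads = {name: [1.0] * len(values) for name, values in params.items()}
new_params = apply_gradients(params, grads, lr=0.1)
```

Even though gradients are supplied for every group, the encoder, reward, and value parameters come back unchanged, so the structure learned in simulation cannot be overwritten by the finetuning step.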

Transferring State Representations

To transfer reliably from simulation to the real world, the encoder must learn a valid state representation for the real-world environment. Below, we display images reconstructed from the latent states predicted by the world model. They show that the encoder, trained entirely in simulation, captures a robust and accurate representation of the real world.

Note: we do not train the world model with a reconstruction loss. These images are produced by an auxiliary probe that was trained to predict real images from encoded latent states.

Transferring Value Functions

To be useful for planning, the value function we transfer from simulation does not need to model real-world returns exactly. The planner only needs it to discriminate accurately between high-quality and low-quality states in the real world, which lets the planner reason counterfactually at test time to improve performance. Below, we see that the value function accurately discriminates between successful and failed trajectories in the real world.

Success

Failure
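A small sketch of why ranking is enough: if a planner picks the candidate with the highest value, any value function that preserves the ordering of states leads to the same choice, even when its absolute numbers are wrong. The functions `true_value` and `sim_value` and the candidate latents are hypothetical.

```python
def select_best(candidates, value_fn):
    """Return the index of the candidate state with the highest value."""
    return max(range(len(candidates)), key=lambda i: value_fn(candidates[i]))

def true_value(z):
    """Hypothetical real-world return: states closer to the goal at 1.0 are better."""
    return -abs(z - 1.0)

def sim_value(z):
    """Hypothetical transferred value: miscalibrated in scale and offset,
    but it orders states the same way as true_value."""
    return 3.0 * true_value(z) - 5.0

candidates = [0.2, 0.9, 1.6, -0.4]  # hypothetical predicted terminal latents
choice_true = select_best(candidates, true_value)
choice_sim = select_best(candidates, sim_value)
```

Here `sim_value` is off by a factor of three and an offset of five, yet both value functions select the same candidate.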

Adapting Dynamics Prediction

Adapting the dynamics model is essential for effective planning, as both the reward and value estimates are computed over predicted trajectories. By freezing the encoder, we reduce adaptation to a supervised learning problem in a low-dimensional latent space, which can be solved reliably in low-data regimes.
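The following toy example sketches dynamics finetuning as supervised regression: a frozen encoder maps observations to latents, and a linear latent dynamics model is fit to real transitions by gradient descent on the one-step prediction error. The linear model, the 1-D latent, the coefficients, and the synthetic "real" environment are all assumptions made for illustration.

```python
import random

def encode(obs):
    """Hypothetical frozen encoder: a fixed map from observation to a 1-D latent."""
    return 2.0 * obs

def predict(theta, z, u):
    """Latent dynamics model: z_next = theta[0] * z + theta[1] * u."""
    return theta[0] * z + theta[1] * u

def mse(theta, data):
    """Mean squared one-step prediction error in latent space."""
    return sum((predict(theta, z, u) - zn) ** 2 for z, u, zn in data) / len(data)

def finetune(theta, data, lr=0.1, epochs=300):
    """Full-batch gradient descent on the latent prediction error."""
    theta = list(theta)
    n = len(data)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for z, u, zn in data:
            err = predict(theta, z, u) - zn
            g0 += 2.0 * err * z / n
            g1 += 2.0 * err * u / n
        theta[0] -= lr * g0
        theta[1] -= lr * g1
    return theta

# Hypothetical "real" environment: obs_next = 0.9 * obs + 0.25 * u,
# which corresponds to z_next = 0.9 * z + 0.5 * u in latent space.
rng = random.Random(0)
real_data = []
for _ in range(50):
    obs, u = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    obs_next = 0.9 * obs + 0.25 * u
    real_data.append((encode(obs), u, encode(obs_next)))

theta_sim = [1.0, 1.0]  # assumed parameters from simulation pretraining
theta_real = finetune(theta_sim, real_data)
loss_before = mse(theta_sim, real_data)
loss_after = mse(theta_real, real_data)
```

Because the encoder is fixed, the regression targets `encode(obs_next)` do not move during training, which is what keeps this an ordinary supervised problem rather than a joint representation-learning one.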

Finetuning drastically lowers dynamics prediction loss for a held-out quadruped slippery-slope trajectory.

During this trajectory, the front-left foot slips.

Foot prediction comparison visualization during slip event

Real-World Results (Interactive)

Success rate for two manipulation tasks, computed over 20 trials, and average forward progress for two quadruped locomotion tasks, averaged across all 15 trials (3 speeds, 5 trials each), as a function of real-world finetuning data. For manipulation, we consider two difficulties: initial conditions drawn from a Narrow or Wide grid.


BibTeX

@article{2026simdist,
  title={Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation},
  author={Levy, Jacob and Westenbroek, Tyler and Huang, Kevin and Palafox, Fernando and Yin, Patrick and Omidshafiei, Shayegan and Kim, Dong-Ki and Gupta, Abhishek and Fridovich-Keil, David},
  journal={arXiv preprint arXiv:2603.15759},
  year={2026},
  url={https://arxiv.org/abs/2603.15759}
}
