New work with @nvidia: evaluating robot policies entirely inside a world model. The policy acts, the model imagines the consequences, and the imagined evals predict real-world results. 🧵
Real vs. world-model rollout, side by side 📷
The model also knows when not to trust itself: its uncertainty estimates track how far generations deviate from reality.
The uncertainty estimate accurately predicts when the model's generations degrade.
Our newest model, π0.7, has some interesting emergent capabilities: it can fold shirts on a new robot for which we had no shirt-folding data, figure out how to use an appliance with language-based coaching, and perform a wide range of dexterous tasks, all in one model!
π, But Make It Fly ✈️
We fine-tuned π0, a VLA model pretrained entirely on manipulators, to fly a drone that picks up objects, navigates through gates, and composes both skills from language commands.
We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.
With RL, the robot can learn very precise tasks, like fastening a zip tie, and can actually do it more consistently and more quickly than even human teleoperation.
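For readers who want a feel for the "frozen model plus tiny actor and critic" idea, here is a toy sketch. It is not our actual method or code. Everything in it is an assumption made for illustration: `frozen_backbone` is a hypothetical stand-in for the pretrained model's "RL token" output, the task is a one-step toy problem, and the actor and critic are plain linear heads trained with a basic advantage actor-critic update.

```python
import random


def frozen_backbone(obs):
    # Hypothetical stand-in for the frozen pretrained model: it maps an
    # observation to a small feature vector (the "RL token"). Never trained here.
    return [1.0, obs]


def target_action(obs):
    # Toy task (assumption): the best action is 0.8 * obs.
    return 0.8 * obs


def dot(weights, features):
    return sum(w * x for w, x in zip(weights, features))


def train(steps=20000, seed=0, sigma=0.3, lr_actor=0.005, lr_critic=0.05):
    """Train only the tiny linear actor and critic; the backbone stays fixed."""
    rng = random.Random(seed)
    actor_w = [0.0, 0.0]   # mean of a Gaussian policy over a scalar action
    critic_w = [0.0, 0.0]  # predicted reward for the current token
    for _ in range(steps):
        obs = rng.uniform(-1.0, 1.0)
        token = frozen_backbone(obs)
        mean = dot(actor_w, token)
        action = rng.gauss(mean, sigma)
        reward = -(action - target_action(obs)) ** 2
        # One-step episodes, so the advantage is reward minus the critic's guess.
        advantage = reward - dot(critic_w, token)
        # Critic: move the prediction toward the observed reward.
        critic_w = [w + lr_critic * advantage * x for w, x in zip(critic_w, token)]
        # Actor: policy-gradient step, using grad log pi = (a - mean) / sigma^2 * token.
        score = (action - mean) / (sigma ** 2)
        actor_w = [w + lr_actor * advantage * score * x for w, x in zip(actor_w, token)]
    return actor_w, critic_w


def mean_abs_error(actor_w):
    # Average gap between the policy mean and the toy target on a fixed grid.
    grid = [i / 10.0 for i in range(-10, 11)]
    return sum(abs(dot(actor_w, frozen_backbone(o)) - target_action(o)) for o in grid) / len(grid)


if __name__ == "__main__":
    actor_w, _ = train()
    print("error before training:", mean_abs_error([0.0, 0.0]))
    print("error after training: ", mean_abs_error(actor_w))
```

Because only the two small weight vectors are updated, each step is cheap. That is the intuition behind learning quickly without touching the large model, though the real system's heads, rewards, and training loop are not shown here.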