Code and website for "MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation".
MolmoBot policies demonstrate strong sim-to-real transfer to a wide variety of novel scenes, objects, and camera viewpoints. Try it out for yourself on your DROID platform with MolmoBot-DROID!
MolmoBot-DROID uses only the wrist camera and one exo camera. Don't worry about camera placement: MolmoBot policies are robust to arbitrary camera viewpoints!
See here to try out MolmoBot interactively! Modify the scene and task to test policy behavior.
- Set up MolmoBot-DROID by following the installation instructions.
- See these instructions for details on setting up and running the policy on your DROID! Any existing DROID or polymetis setup will work.
Briefly, after starting the polymetis robot and gripper servers:
# In one terminal
cd MolmoBot/MolmoBot
source .venv/bin/activate
PYTHONPATH=. python launch_scripts/serve_molmo.py --hf-repo allenai/MolmoBot-DROID --action-type joint_pos
# In another terminal
cd MolmoBot/robot_eval
conda activate molmobot
python scripts/droid/run_policy.py robot.robot_host=<nuc_ip> robot.cameras.wrist_camera.id=<wrist_id> robot.cameras.exo_camera_1.id=<exo_id> task="put the red mug in the black bowl"
To use MolmoBot-Data for training experiments, you will need to download it from Hugging Face using bulk_download.py.
Before using any dataset implementations in this repo, you will need to run a postprocessing script. This filters out corrupted trajectories and can optionally check that certain objects are visible from a given camera.
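As a rough illustration of what the corruption filter does (this is not the repo's actual implementation; the JSON file format and directory layout here are assumptions), one might scan a directory and keep only the trajectories that load cleanly:

```python
# Conceptual sketch of trajectory validation: keep only files that parse.
# Assumes one trajectory per JSON file; the real script may use another format.
import json
from pathlib import Path

def find_valid_trajectories(root: str, pattern: str = "*.json") -> list[Path]:
    valid = []
    for path in sorted(Path(root).glob(pattern)):
        try:
            with open(path) as f:
                json.load(f)  # a corrupted file raises here
        except (json.JSONDecodeError, OSError):
            continue  # skip corrupted trajectories
        valid.append(path)
    return valid
```

A visibility check would add a per-camera test on top of this filter, as the `--check-visibility <camera> <object>` flags below suggest.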
Example usage:
python validate_trajectories.py RBY1OpenDataGenConfig/part0/train --check-visibility head_camera door_handle
python validate_trajectories.py RBY1PickAndPlaceDataGenConfig/part0/train --check-visibility head_camera pickup_obj --check-visibility head_camera place_receptacle
python validate_trajectories.py FrankaPickAndPlaceOmniCamConfig/part0/train --check-visibility droid_shoulder_light_randomization pickup_obj --check-visibility droid_shoulder_light_randomization place_receptacle

Before training (and after data postprocessing), you should also calculate aggregate statistics with calculate_stats.py. Example usage:
python calculate_stats.py FrankaPickAndPlaceOmniCamConfig/part0/train --keys actions obs/agent/qpos
python calculate_stats.py RBY1OpenDataGenConfig/part0/train --keys actions obs/agent/qpos
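For intuition, these aggregate statistics are presumably per-key normalization statistics pooled over timesteps across trajectories. A minimal sketch, assuming each trajectory stores a (T, D) array under each requested key (the real calculate_stats.py may compute more):

```python
# Hedged sketch: per-key mean/std pooled across trajectories, as typically
# used to normalize actions and proprioception before policy training.
import numpy as np

def aggregate_stats(trajectories, keys):
    """trajectories: list of dicts mapping key -> (T, D) array."""
    stats = {}
    for key in keys:
        pooled = np.concatenate([t[key] for t in trajectories], axis=0)
        stats[key] = {"mean": pooled.mean(axis=0), "std": pooled.std(axis=0)}
    return stats
```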
python calculate_stats.py RBY1PickAndPlaceDataGenConfig/part0/train --keys actions obs/agent/qpos

If you find MolmoBot useful, please cite:

@misc{deshpande2026molmobot,
title={MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation},
author={Abhay Deshpande and Maya Guru and Rose Hendrix and Snehal Jauhri and Ainaz Eftekhar and Rohun Tripathi and Max Argus and Jordi Salvador and Haoquan Fang and Matthew Wallingford and Wilbert Pumacay and Yejin Kim and Quinn Pfeifer and Ying-Chun Lee and Piper Wolters and Omar Rayyan and Mingtong Zhang and Jiafei Duan and Karen Farley and Winson Han and Eli Vanderbilt and Dieter Fox and Ali Farhadi and Georgia Chalvatzaki and Dhruv Shah and Ranjay Krishna},
year={2026},
eprint={2603.16861},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2603.16861},
}
