This repository is the official implementation of the training code for DexWild. It includes the training pipeline both for single-data-source models and for cotraining across multiple data buffers (e.g., human and robot).
It is a modified version of the DiT-Policy repository. For a detailed description of the source codebase, please refer to The Ingredients for Robotic Diffusion Transformers.
Our repository is easy to install using Miniconda or Anaconda:

```bash
conda env create -f env.yml
conda activate dexwildtrain
pip install git+https://github.com/AGI-Labs/robobuf.git
pip install git+https://github.com/facebookresearch/r3m.git
pip install -e ./
pre-commit install # required for pushing back to the source git

# download the visual features
./download_features.sh
```

Next, set up the accelerate config.
```bash
accelerate config
# choose:
# This machine
# No distributed training
# No to CPU only
# No to torch dynamo
# No to DeepSpeed
# all GPUs
# Yes to NUMA efficiency
# bf16 mixed precision
```

To train policies, data must be in Robobuf format.
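The exact buffer schema is defined in the AGI-Labs/robobuf repository; as a rough, purely illustrative sketch (this is *not* the actual robobuf API), trajectory data can be thought of as a list of per-timestep transitions, each pairing an observation with an action:

```python
import pickle

# Hypothetical data layout (not the real robobuf format): one trajectory
# as a list of transitions, each with an observation dict and an action.
trajectory = [
    {
        "obs": {
            "cam0": b"<jpeg bytes>",   # placeholder for an encoded camera image
            "state": [0.0] * 9,        # e.g., a 9-dim proprio vector (matches task.obs_dim=9)
        },
        "action": [0.0] * 7,           # placeholder action vector
    }
    for _ in range(48)                 # one action chunk's worth of timesteps
]

# Serialize the trajectory to disk (illustrative only).
with open("buffer.pkl", "wb") as f:
    pickle.dump(trajectory, f)
```

Consult the robobuf repository for the real conversion utilities and on-disk format.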
An example shell script for training on a single data source is below.

```bash
accelerate launch finetune_accelerate.py --multirun exp_name=example_single_data_source batch_size=30 agent=transformer \
    agent.use_obs=add_token \
    agent.use_lang=false \
    task=umi_hand_fourcam_hybrid \
    task.train_buffer.hist_augment=false \
    task.train_buffer.img_hist_frame_indices="[0]" \
    task.train_buffer.state_hist_frame_indices="[0, 9, 18]" \
    agent/features=vit_base \
    task.obs_dim=9 \
    lr=0.0001 \
    ac_chunk=48 \
    trainer=bc_cos_sched \
    buffer_path=path_to_buffer \
    max_iterations=500000 \
    agent.features.restore_path=/home/$USER/dit-policy/visual_features/vit_base/SOUP_1M_DH.pth
```

An example shell script for cotraining is below.
```bash
accelerate launch finetune_accelerate.py --multirun exp_name=example_multi_data_source batch_size=30 agent=transformer_cotrain \
    agent.use_obs=add_token \
    agent.use_lang=false \
    task=cotrain_umi_hand_fourcam_hybrid \
    agent/features=vit_base \
    task.obs_dim=27 \
    lr=0.00015 \
    ac_chunk=48 \
    trainer=bc_cos_sched \
    buffer_paths_list="[human buffer path, robot buffer path]" \
    task.cotrain_weights="[0.6667,0.3333]" \
    max_iterations=500000 \
    agent.features.restore_path=/home/$USER/dit-policy/visual_features/vit_base/SOUP_1M_DH.pth
```

In each case, the main parameters that must be adjusted are listed below.
```bash
agent=transformer                       # training model: transformer or diffusion_unet
agent.use_obs=add_token                 # how observations are embedded (e.g., add_token for visual tokens)
agent.use_lang=false                    # whether to include language input (usually false for DexWild)
task=umi_hand_fourcam_hybrid            # task config defining modalities, frame history, and dataset structure
task.obs_dim=9                          # dimensionality of the observation vector (e.g., proprio + hand states)
lr=0.0001                               # learning rate for optimization
ac_chunk=48                             # number of timesteps the policy predicts per inference (action chunk size)
buffer_path=path_to_buffer              # path to a single Robobuf dataset

# If cotraining
agent=transformer_cotrain               # training model: transformer_cotrain or diffusion_unet_cotrain
buffer_paths_list="[human, robot]"      # list of Robobuf dataset paths (e.g., [human, robot])
task.cotrain_weights="[0.6667,0.3333]"  # weighting of each dataset for loss blending (should sum to 1.0)
```

If you want to change the inputs and outputs of the policies, the relevant config files can be found under:
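The repository's actual cotraining sampler may differ; as a minimal sketch of the idea behind `task.cotrain_weights` (the buffer names and contents below are hypothetical), each training batch can draw from the buffers in proportion to their weights, so human data is seen roughly twice as often as robot data with weights `[0.6667, 0.3333]`:

```python
import random

# Hypothetical buffers: in practice these would be Robobuf datasets.
buffers = {"human": ["h0", "h1", "h2"], "robot": ["r0", "r1"]}
weights = [0.6667, 0.3333]
assert abs(sum(weights) - 1.0) < 1e-3  # cotrain weights should sum to 1.0

def sample_batch(rng: random.Random, batch_size: int = 30) -> list[str]:
    """Pick a buffer per sample according to the weights, then a transition from it."""
    names = list(buffers)
    picked = rng.choices(names, weights=weights, k=batch_size)
    return [rng.choice(buffers[name]) for name in picked]

rng = random.Random(0)
batch = sample_batch(rng)  # a batch_size=30 mixture of human and robot samples
```

This is only an illustration of weighted dataset blending, not the repo's loss-weighting code.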
```bash
cd experiments/task
# yaml config files for cotraining
cotrain_dexwild_fourcam.yaml
cotrain_dexwild_twocam.yaml
# yaml config files for single data source
dexwild_fourcam.yaml
dexwild_twocam.yaml
```

You can easily download our pre-trained representations using the provided script: `./download_features.sh`. You may also download the features individually on our release website.
If you find this codebase or the diffusion transformer useful, please cite our works:
```bibtex
@article{tao2025dexwild,
  title   = {DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies},
  author  = {Tao, Tony and Srirama, Mohan Kumar and Liu, Jason Jingzhou and Shaw, Kenneth and Pathak, Deepak},
  journal = {Robotics: Science and Systems (RSS)},
  year    = {2025}
}

@article{dasari2024ditpi,
  title   = {The Ingredients for Robotic Diffusion Transformers},
  author  = {Dasari, Sudeep and Mees, Oier and Zhao, Sebastian and Srirama, Mohan Kumar and Levine, Sergey},
  journal = {arXiv preprint arXiv:2410.10088},
  year    = {2024}
}
```