3D Equivariant Visuomotor Policy Learning via Spherical Projection

Boce Hu¹, Dian Wang², David Klee¹, Heng Tian¹, Xupeng Zhu¹, Haojie Huang¹,
Robert Platt†¹, Robin Walters†¹

¹Northeastern University, ²Stanford University

NeurIPS 2025 (Spotlight)

Installation

Install the following apt packages for MuJoCo:
```
sudo apt install -y libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf
```

Install gfortran (a dependency of escnn):
```
sudo apt install -y gfortran
```
Install Mambaforge

Clone this repo:
```
git clone https://github.com/BoceHu/ISP.git
cd ISP
```

Create and activate the environment:
```
mamba env create -f conda_environment.yaml
conda activate isp
```
If you are using a Blackwell-architecture GPU (e.g., RTX 5090), use the following steps instead:
```
mamba env create -f env_blackwell.yaml
conda activate isp
pip install --no-build-isolation "git+https://github.com/facebookresearch/pytorch3d.git"
```
This will create the appropriate environment and build PyTorch3D from source, which is required for Blackwell GPUs.
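If you are unsure which environment file applies, the sketch below picks one from the GPU's CUDA compute capability. It is a hypothetical helper, not part of the ISP repo, and it assumes that any compute capability of 10.0 or higher should be treated as Blackwell-class: data-center Blackwell parts report 10.x and RTX 50-series cards such as the RTX 5090 report 12.0. The sketch queries `nvidia-smi --query-gpu=compute_cap`, which recent NVIDIA drivers support. It does not use PyTorch, because PyTorch is not installed until the environment exists.

```python
# Hypothetical helper, not part of the ISP repo: pick the conda environment file
# based on the GPU compute capability reported by `nvidia-smi`.
import shutil
import subprocess
from typing import Optional, Tuple


def parse_compute_cap(text: str) -> Optional[Tuple[int, int]]:
    """Parse the first line of `nvidia-smi --query-gpu=compute_cap` output, e.g. '12.0'."""
    lines = text.strip().splitlines()
    if not lines:
        return None
    try:
        major, minor = lines[0].strip().split(".")
        return int(major), int(minor)
    except ValueError:
        return None


def query_compute_cap() -> Optional[Tuple[int, int]]:
    """Ask nvidia-smi for the first GPU's compute capability; None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None
    return parse_compute_cap(result.stdout)


def choose_env_file(cap: Optional[Tuple[int, int]]) -> str:
    """Assumption: compute capability 10.0 or newer is treated as Blackwell-class."""
    if cap is not None and cap[0] >= 10:
        return "env_blackwell.yaml"
    return "conda_environment.yaml"


if __name__ == "__main__":
    print(choose_env_file(query_compute_cap()))
```

When no GPU or driver is detected, the helper falls back to `conda_environment.yaml`.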

Install mimicgen:
```
cd ..
git clone https://github.com/NVlabs/mimicgen_environments.git
cd mimicgen_environments
git checkout 45db4b35a5a79e82ca8a70ce1321f855498ca82c
pip install -e .
cd ../ISP
```

Make sure the installed MuJoCo version is 2.3.2 (required by mimicgen):
```
pip list | grep mujoco
```
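You can run the same check from Python with the standard-library `importlib.metadata`. The sketch below is a hypothetical script, not part of the ISP repo. It assumes the MuJoCo Python bindings are installed under the distribution name `mujoco`, which is the name `pip list` shows.

```python
# Hypothetical check script, not part of the ISP repo: confirm the installed
# MuJoCo distribution matches the version mimicgen expects.
from importlib import metadata
from typing import Optional

REQUIRED_MUJOCO = "2.3.2"


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_mujoco(installed: Optional[str], required: str = REQUIRED_MUJOCO) -> str:
    """Return "ok" or a short description of the mismatch."""
    if installed is None:
        return "mujoco is not installed"
    if installed != required:
        return f"mujoco {installed} found, but mimicgen requires {required}"
    return "ok"


if __name__ == "__main__":
    print(check_mujoco(installed_version("mujoco")))
```

If the versions differ, one option is to pin the version inside the activated environment with `pip install mujoco==2.3.2`.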

Dataset

Download Dataset

Please visit the link below to download the datasets.

https://huggingface.co/datasets/amandlek/mimicgen_datasets/tree/main/core

Make sure each dataset is stored at /path/to/ISP/data/robomimic/datasets/[dataset]/[dataset].hdf5.
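The path convention above can be checked before any processing starts. The sketch below is a hypothetical helper, not part of the ISP repo. It builds `<repo_root>/data/robomimic/datasets/<dataset>/<dataset><suffix>.hdf5` and lists the datasets whose file is missing. The `suffix` parameter is an illustrative addition that also covers the `_fisheye` and `_fisheye_abs` files produced by the steps below.

```python
# Hypothetical helper, not part of the ISP repo: build the dataset paths used in
# this README and report which ones are missing on disk.
from pathlib import Path
from typing import Iterable, List, Union


def dataset_path(repo_root: Union[str, Path], dataset: str, suffix: str = "") -> Path:
    """<repo_root>/data/robomimic/datasets/<dataset>/<dataset><suffix>.hdf5"""
    return (Path(repo_root) / "data" / "robomimic" / "datasets"
            / dataset / f"{dataset}{suffix}.hdf5")


def missing_datasets(repo_root: Union[str, Path], datasets: Iterable[str],
                     suffix: str = "") -> List[str]:
    """Return the names whose expected .hdf5 file does not exist."""
    return [d for d in datasets if not dataset_path(repo_root, d, suffix).is_file()]


if __name__ == "__main__":
    # Run from the ISP repo root; "" checks the raw downloads,
    # "_fisheye_abs" would check the fully processed files.
    print(missing_datasets(".", ["stack_d1", "square_d2"]))
```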

Generating a larger FOV observation

```
# Template
python isp/scripts/dataset_states_to_obs.py --input data/robomimic/datasets/[dataset]/[dataset].hdf5 --output data/robomimic/datasets/[dataset]/[dataset]_fisheye.hdf5 --num_workers=[n_worker]

# Replace [dataset] and [n_worker] with your choices.
# E.g., use 16 workers to generate the larger-FOV observations for stack_d1
python isp/scripts/dataset_states_to_obs.py --input data/robomimic/datasets/stack_d1/stack_d1.hdf5 --output data/robomimic/datasets/stack_d1/stack_d1_fisheye.hdf5 --num_workers=16
```
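To process several datasets, you can generate the template command programmatically. The sketch below is a hypothetical helper, not part of the ISP repo. It fills `[dataset]` and `[n_worker]` exactly as in the template and returns an argument list.

```python
# Hypothetical helper, not part of the ISP repo: build the larger-FOV generation
# command from the template above for one or more datasets.
import shlex
from typing import List


def fisheye_command(dataset: str, num_workers: int = 16) -> List[str]:
    """Fill in [dataset] and [n_worker] in the dataset_states_to_obs.py template."""
    base = f"data/robomimic/datasets/{dataset}/{dataset}"
    return [
        "python", "isp/scripts/dataset_states_to_obs.py",
        "--input", f"{base}.hdf5",
        "--output", f"{base}_fisheye.hdf5",
        f"--num_workers={num_workers}",
    ]


if __name__ == "__main__":
    for name in ["stack_d1", "square_d2"]:
        # Printed as shell-ready strings; pass the list to subprocess.run(..., check=True)
        # from the ISP repo root to execute it instead.
        print(shlex.join(fisheye_command(name)))
```

Returning a list rather than a single string avoids quoting problems if you later pass the command to `subprocess.run`.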

Convert Action Space in Dataset

The downloaded datasets use a relative (delta) action space. To train with an absolute action space, convert each dataset as follows:

```
# Template
python isp/scripts/robomimic_dataset_conversion.py -i data/robomimic/datasets/[dataset]/[dataset]_fisheye.hdf5 -o data/robomimic/datasets/[dataset]/[dataset]_fisheye_abs.hdf5 -n [n_worker]

# Replace [dataset] and [n_worker] with your choices.
# E.g., convert stack_d1_fisheye with 16 workers
python isp/scripts/robomimic_dataset_conversion.py -i data/robomimic/datasets/stack_d1/stack_d1_fisheye.hdf5 -o data/robomimic/datasets/stack_d1/stack_d1_fisheye_abs.hdf5 -n 16
```
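The difference between the two action spaces can be illustrated with a simplified sketch. This is not the repository's `robomimic_dataset_conversion.py`. A relative action is a small end-effector displacement applied to the current pose, and an absolute action is the resulting target pose. The sketch makes the following assumptions:

- The position delta is already in meters. A real controller typically rescales normalized actions first, and this sketch skips that step.
- The rotation delta is an axis-angle vector expressed in the base frame and is left-multiplied onto the current orientation.

```python
# Simplified illustration, not the ISP conversion script: turn one relative
# end-effector action into an absolute target pose.
# Assumptions: deltas are already metric (no controller input scaling), and the
# rotation delta is an axis-angle vector applied in the base frame.
import numpy as np


def axis_angle_to_matrix(rotvec) -> np.ndarray:
    """Rodrigues' formula: rotation vector (axis * angle) -> 3x3 rotation matrix."""
    rotvec = np.asarray(rotvec, dtype=float)
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rotvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def relative_to_absolute(cur_pos, cur_rot, delta_pos, delta_rotvec):
    """Absolute target = current pose composed with the relative action."""
    abs_pos = np.asarray(cur_pos, dtype=float) + np.asarray(delta_pos, dtype=float)
    abs_rot = axis_angle_to_matrix(delta_rotvec) @ np.asarray(cur_rot, dtype=float)
    return abs_pos, abs_rot


if __name__ == "__main__":
    pos, rot = relative_to_absolute([0.1, 0.2, 0.3], np.eye(3),
                                    [0.0, 0.0, 0.05], [0.0, 0.0, np.pi / 2])
    print(pos)
    print(np.round(rot, 3))
```

Because an absolute action does not depend on the pose at the previous step, errors do not accumulate across a predicted action sequence in the same way. That property is the usual motivation for training on absolute actions.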

Put the processed dataset under ISP/data/robomimic/datasets/[task_name]/, e.g., ISP/data/robomimic/datasets/stack_d1/stack_d1_fisheye_abs.hdf5.

Training

To train our method on the Stack D1 task:

SO(2) version
```
python train.py --config-name=train_isp_so2 task_name=stack_d1 n_demo=100
```
SO(3) version
```
python train.py --config-name=train_isp_so3 task_name=stack_d1 n_demo=100
```
SO(2) pretrained encoder
```
python train.py --config-name=train_isp_so2_pretrain task_name=stack_d1 n_demo=100
```

We recommend starting with the SO(2) version, as it is faster and more GPU-friendly.

To train on other tasks, replace stack_d1 with one of the following: stack_three_d1, square_d2, threading_d0, coffee_d2, three_piece_assembly_d0, hammer_cleanup_d1, mug_cleanup_d1, kitchen_d1, nut_assembly_d0, pick_place_d0, coffee_preparation_d1. Please ensure that the corresponding dataset has been downloaded and processed (as described in the Dataset section) in advance.
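To sweep over several tasks, you can generate one training command per task. The sketch below is a hypothetical helper, not part of the ISP repo. It reuses the Hydra overrides from the training examples above (`--config-name`, `task_name`, `n_demo`). It also skips any task whose processed `[task]_fisheye_abs.hdf5` file is not yet in the location given in the Dataset section.

```python
# Hypothetical sweep helper, not part of the ISP repo: print one train.py command
# per task, skipping tasks whose processed dataset is not on disk yet.
import shlex
from pathlib import Path
from typing import List

TASKS = [
    "stack_d1", "stack_three_d1", "square_d2", "threading_d0", "coffee_d2",
    "three_piece_assembly_d0", "hammer_cleanup_d1", "mug_cleanup_d1",
    "kitchen_d1", "nut_assembly_d0", "pick_place_d0", "coffee_preparation_d1",
]


def train_command(task: str, config: str = "train_isp_so2", n_demo: int = 100) -> List[str]:
    """Same Hydra overrides as the README's training examples."""
    return ["python", "train.py", f"--config-name={config}",
            f"task_name={task}", f"n_demo={n_demo}"]


def processed_dataset(task: str) -> Path:
    # Location given in the dataset section, relative to the ISP repo root.
    return Path("data/robomimic/datasets") / task / f"{task}_fisheye_abs.hdf5"


if __name__ == "__main__":
    for task in TASKS:
        if processed_dataset(task).is_file():
            print(shlex.join(train_command(task)))
        else:
            print(f"# skipping {task}: {processed_dataset(task)} not found")
```

The default config is the SO(2) variant, following the recommendation above to start with it. Pass `config="train_isp_so3"` to switch.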

To run the environments on the CPU (to save GPU memory), use OSMesa instead of EGL by setting MUJOCO_GL=osmesa PYOPENGL_PLATFORM=osmesa, e.g.,
```
MUJOCO_GL=osmesa PYOPENGL_PLATFORM=osmesa python train.py --config-name=train_isp_so3 task_name=stack_d1
```

Note that policy rollouts take longer with CPU rendering.
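If you launch training from a Python script instead of a shell, the same variables can be set in the child process's environment. This ensures they are already set when MuJoCo and PyOpenGL initialize in the new interpreter. The sketch below is a hypothetical launcher, not part of the ISP repo.

```python
# Hypothetical launcher, not part of the ISP repo: run training with OSMesa
# (CPU) rendering by setting the environment variables in the child process.
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def osmesa_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the base environment and select OSMesa for MuJoCo and PyOpenGL."""
    env = dict(os.environ if base is None else base)
    env["MUJOCO_GL"] = "osmesa"
    env["PYOPENGL_PLATFORM"] = "osmesa"
    return env


def run_with_osmesa(argv: List[str]) -> int:
    """Launch argv (e.g. a train.py command) and return its exit code."""
    return subprocess.run(argv, env=osmesa_env(), check=False).returncode


if __name__ == "__main__":
    cmd = ["python", "train.py", "--config-name=train_isp_so3", "task_name=stack_d1"]
    print("would run:", cmd, "with MUJOCO_GL =", osmesa_env()["MUJOCO_GL"])
    # run_with_osmesa(cmd)  # uncomment from the ISP repo root to start training
```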

For inference, the SO(3) model itself requires approximately 16 GB of GPU memory with the default batch size of 64, while the SO(2) variant requires around 10 GB under the same settings.

If you want to assign both the environments and the policy to a specific GPU, you can do the following:
```
EGL_DEVICE_ID=1 MUJOCO_EGL_DEVICE_ID=1 HYDRA_FULL_ERROR=1 python train.py --config-name=train_isp_so2_pretrain task_name=stack_d1 n_demo=100 training.device=1
```

EGL_DEVICE_ID and MUJOCO_EGL_DEVICE_ID control which GPU is used for MuJoCo rendering, while training.device specifies the GPU used for policy training.
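The sketch below assembles the same environment variables and `training.device` override for any GPU index, so rendering and training stay on one device. It is a hypothetical launcher, not part of the ISP repo. It mirrors the command above and assumes `training.device` accepts a bare GPU index, as in that example.

```python
# Hypothetical launcher, not part of the ISP repo: pin both MuJoCo's EGL
# rendering and policy training to the same GPU index.
import os
from typing import Dict, List, Mapping, Optional, Tuple


def pinned_launch(device: int, overrides: List[str],
                  base: Optional[Mapping[str, str]] = None) -> Tuple[Dict[str, str], List[str]]:
    """Return (env, argv) mirroring the README command for a given GPU index."""
    env = dict(os.environ if base is None else base)
    env["EGL_DEVICE_ID"] = str(device)         # GPU used for EGL rendering
    env["MUJOCO_EGL_DEVICE_ID"] = str(device)  # GPU MuJoCo selects for EGL
    env["HYDRA_FULL_ERROR"] = "1"              # full Hydra stack traces on errors
    argv = ["python", "train.py", *overrides, f"training.device={device}"]
    return env, argv


if __name__ == "__main__":
    env, argv = pinned_launch(1, ["--config-name=train_isp_so2_pretrain",
                                  "task_name=stack_d1", "n_demo=100"])
    print(argv)
    # To launch: subprocess.run(argv, env=env, check=True) from the ISP repo root
```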

📜 Citation

```
@article{hu20253d,
  title   = {3D Equivariant Visuomotor Policy Learning via Spherical Projection},
  author  = {Hu, Boce and
             Wang, Dian and
             Klee, David and
             Tian, Heng and
             Zhu, Xupeng and
             Huang, Haojie and
             Platt, Robert and
             Walters, Robin},
  journal = {arXiv preprint arXiv:2505.16969},
  year    = {2025}
}
```

License

This project is released under the Academic Research and Educational Use License. The software is intended for academic research and educational purposes only. Commercial use is strictly prohibited without prior written permission from the authors. Please see the LICENSE file for the full license text and detailed terms.

Acknowledgement

Our repo is built upon the original Equivariant Diffusion Policy.
Our Diffusion Policy baseline is adapted from the codebase of Diffusion Policy.
Our ACT baseline is adapted from its original repo.
