
Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

A scalable automated generation method for real-world robotic manipulation.

Project Page | Data | Paper


Overview

Manipulate-Anything is a scalable automated generation method for real-world robotic manipulation. Unlike prior work, this method operates in real-world environments without privileged state information or hand-designed skills, enabling manipulation of any static object.


Authors

Jiafei Duan*, Wentao Yuan*, Wilbert Pumacay, Yi Ru Wang, Kiana Ehsani, Dieter Fox, Ranjay Krishna

Environment Setup

To set up the Manipulate-Anything environment, you will need four repositories, including this one.

1. Create Conda Environment

conda create -n manip_any python=3.11
conda install cuda -c nvidia/label/cuda-11.7.0
conda activate manip_any
2. Set up and install Manipulate-Anything-QWenVL

Go into the QWen-VL-MA repository and follow its setup steps.

3. Install PyRep

PyRep requires version 4.1 of CoppeliaSim; download it first.

Once you have downloaded CoppeliaSim, you can pull PyRep from git:

cd <install_dir>
git clone https://github.com/stepjam/PyRep.git
cd PyRep

Add the following to your ~/.bashrc file (replace the 'EDIT ME' placeholder in the first line):

export COPPELIASIM_ROOT=<EDIT ME>/PATH/TO/COPPELIASIM/INSTALL/DIR
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$COPPELIASIM_ROOT
export QT_QPA_PLATFORM_PLUGIN_PATH=$COPPELIASIM_ROOT

Remember to source your bashrc (source ~/.bashrc) or zshrc (source ~/.zshrc) after this.

Warning: CoppeliaSim might cause conflicts with ROS workspaces.
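As a quick sanity check (not part of the original instructions), a short Python snippet can confirm that the CoppeliaSim variables from ~/.bashrc are visible in the current shell before you build PyRep:

```python
# Sanity check: verify COPPELIASIM_ROOT is set and points at a real
# directory before building PyRep against it.
import os

def coppeliasim_status(env=os.environ):
    """Report whether the CoppeliaSim variables from ~/.bashrc are visible."""
    root = env.get("COPPELIASIM_ROOT", "")
    if root and os.path.isdir(root):
        return f"CoppeliaSim found at {root}"
    return "COPPELIASIM_ROOT is not set or invalid; edit ~/.bashrc and re-source it"

print(coppeliasim_status())
```

If this reports the variable as unset, re-run `source ~/.bashrc` in the same shell before continuing.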

4. Install YARR

Manipulate-Anything uses a fork of YARR:
cd <install_dir>
git clone -b peract https://github.com/MohitShridhar/YARR.git # note: 'peract' branch

cd YARR
pip install -r requirements.txt
python setup.py develop
5. Install this repo

git clone https://github.com/Robot-MA/manipulate-anything.git
cd manipulate-anything
pip install -r pointnet2_ops/requirements.txt
pip install pointnet2_ops/
cd RLBench
pip install -r requirements.txt
python setup.py develop
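Once all the repositories are installed, a minimal smoke test (a sketch, not from the original README) can verify that the key packages resolve from the active conda environment without importing the heavy simulator bindings:

```python
# Post-install smoke test: check that each package can be located by
# the current Python interpreter.
import importlib.util

def check_installed(packages=("pyrep", "rlbench", "yarr")):
    """Map each package name to whether Python can locate it."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

for pkg, ok in check_installed().items():
    print(f"{pkg}: {'ok' if ok else 'MISSING'}")
```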

Data Generation

1. Download the M2T2 checkpoint.

2. Set up the GPT-4V API key:

export OPENAI_API_KEY="your_api_key_here"

3. Run the meshcat server:

meshcat-server

4. Run zero-shot data generation. Example task (play_jenga):

python dataset_generator.py \
    eval.checkpoint=<PATH_TO_M2T2_CHECKPOINT> \
    eval.mask_thresh=0.0 \
    eval.retract=0.20 \
    rlbench.task_name=<TASK_NAME>

5. Open http://127.0.0.1:7000/static to see the visualization. Press Enter in the terminal to advance to the next generated pose.

What it should look like if everything has been set up correctly:

Evaluation

To reproduce the tasks from the paper, replace the corresponding .py and .ttm task files in your RLBench task environment with those in eval_tasks.
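The swap can be scripted. The sketch below is an assumption, not part of the original instructions: it presumes the standard RLBench layout (rlbench/tasks for .py task definitions, rlbench/task_ttms for .ttm scenes), so adjust the paths to match your checkout.

```python
# Hypothetical helper: copy evaluation task files from eval_tasks into
# an RLBench checkout (paths assume the standard RLBench layout).
import shutil
from pathlib import Path

def install_eval_tasks(eval_tasks: Path, rlbench_root: Path) -> int:
    """Copy .py task definitions and .ttm scenes; return files copied."""
    dests = {
        ".py": rlbench_root / "rlbench" / "tasks",
        ".ttm": rlbench_root / "rlbench" / "task_ttms",
    }
    copied = 0
    for suffix, dest in dests.items():
        dest.mkdir(parents=True, exist_ok=True)
        for src in sorted(eval_tasks.glob(f"*{suffix}")):
            shutil.copy2(src, dest / src.name)
            copied += 1
    return copied
```

For example, `install_eval_tasks(Path("eval_tasks"), Path("RLBench"))` would copy every task file in one call; back up the originals first if you want to restore the stock tasks later.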

TODO List

Future improvements

  • Add multi-process support for searching for the best MA plans.
  • Set up an interactive mode on Gradio.
  • Release the policy training code.

Citation

If you find Manipulate-Anything useful for your research and applications, please consider citing our paper:

@article{duan2024manipulate,
  title={Manipulate-anything: Automating real-world robots using vision-language models},
  author={Duan, Jiafei and Yuan, Wentao and Pumacay, Wilbert and Wang, Yi Ru and Ehsani, Kiana and Fox, Dieter and Krishna, Ranjay},
  journal={arXiv preprint arXiv:2406.18915},
  year={2024}
}

About

Manipulate-Anything: Automating Real-World Robots using Vision-Language Models [CoRL 2024]
