
Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

A scalable automated generation method for real-world robotic manipulation.

Project Page | Data | Paper


Overview

Manipulate-Anything is a scalable automated generation method for real-world robotic manipulation. Unlike prior work, this method operates in real-world environments without privileged state information or hand-designed skills, enabling manipulation of any static object.


Authors

Jiafei Duan*, Wentao Yuan*, Wilbert Pumacay, Yi Ru Wang, Kiana Ehsani, Dieter Fox, Ranjay Krishna

Environment Setup

To set up the Manipulate-Anything environment, you will need four repositories, including this one.

1. Create Conda Environment

conda create -n manip_any python=3.11
conda install cuda -c nvidia/label/cuda-11.7.0
conda activate manip_any
2. Set up and install Manipulate-Anything-QWenVL

Go into the QWen-VL-MA repository and follow its setup steps.

3. Install PyRep

PyRep requires version 4.1 of CoppeliaSim; download it first.

Once you have downloaded CoppeliaSim, you can pull PyRep from git:

cd <install_dir>
git clone https://github.com/stepjam/PyRep.git
cd PyRep

Add the following to your ~/.bashrc file (replace the 'EDIT ME' placeholder in the first line):

export COPPELIASIM_ROOT=<EDIT ME>/PATH/TO/COPPELIASIM/INSTALL/DIR
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$COPPELIASIM_ROOT
export QT_QPA_PLATFORM_PLUGIN_PATH=$COPPELIASIM_ROOT

Remember to source your bashrc (source ~/.bashrc) or zshrc (source ~/.zshrc) after this.

Warning: CoppeliaSim might cause conflicts with ROS workspaces.
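As a quick sanity check (not part of the original instructions), a short Python snippet can confirm that the CoppeliaSim variables from ~/.bashrc are visible in the current shell before you build PyRep:

```python
# Sanity check: verify COPPELIASIM_ROOT is set and points at a real
# directory before building PyRep against it.
import os

def coppeliasim_status(env=os.environ):
    """Report whether the CoppeliaSim variables from ~/.bashrc are visible."""
    root = env.get("COPPELIASIM_ROOT", "")
    if root and os.path.isdir(root):
        return f"CoppeliaSim found at {root}"
    return "COPPELIASIM_ROOT is not set or invalid; edit ~/.bashrc and re-source it"

print(coppeliasim_status())
```

If this reports the variable as unset, re-run `source ~/.bashrc` in the same shell before continuing.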

4. Install YARR

Manipulate-Anything uses a fork of YARR:
cd <install_dir>
git clone -b peract https://github.com/MohitShridhar/YARR.git # note: 'peract' branch

cd YARR
pip install -r requirements.txt
python setup.py develop
5. Install this repo

git clone https://github.com/Robot-MA/manipulate-anything.git
cd manipulate-anything
pip install -r pointnet2_ops/requirements.txt
pip install pointnet2_ops/
cd RLBench
pip install -r requirements.txt
python setup.py develop
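Once all the repositories are installed, a minimal smoke test (a sketch, not from the original README) can verify that the key packages resolve from the active conda environment without importing the heavy simulator bindings:

```python
# Post-install smoke test: check that each package can be located by
# the current Python interpreter.
import importlib.util

def check_installed(packages=("pyrep", "rlbench", "yarr")):
    """Map each package name to whether Python can locate it."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

for pkg, ok in check_installed().items():
    print(f"{pkg}: {'ok' if ok else 'MISSING'}")
```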

Data Generation

1. Download the M2T2 checkpoint.

2. Set up the GPT-4V API key:

export OPENAI_API_KEY="your_api_key_here"

3. Run the meshcat server:

meshcat-server

4. Run zero-shot data generation. Example task (play_jenga):

python dataset_generator.py \
    eval.checkpoint=<PATH_TO_M2T2_CHECKPOINT> \
    eval.mask_thresh=0.0 \
    eval.retract=0.20 \
    rlbench.task_name=<TASK_NAME>

5. Open http://127.0.0.1:7000/static to see the visualization. Press Enter in the terminal to advance to the next generated pose.

What it should look like if everything has been set up correctly:

Evaluation

To reproduce the tasks from the paper, replace the corresponding .py and .ttm task files in your RLBench task environment with those in eval_tasks.
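The swap can be scripted. The sketch below is an assumption, not part of the original instructions: it presumes the standard RLBench layout (rlbench/tasks for .py task definitions, rlbench/task_ttms for .ttm scenes), so adjust the paths to match your checkout.

```python
# Hypothetical helper: copy evaluation task files from eval_tasks into
# an RLBench checkout (paths assume the standard RLBench layout).
import shutil
from pathlib import Path

def install_eval_tasks(eval_tasks: Path, rlbench_root: Path) -> int:
    """Copy .py task definitions and .ttm scenes; return files copied."""
    dests = {
        ".py": rlbench_root / "rlbench" / "tasks",
        ".ttm": rlbench_root / "rlbench" / "task_ttms",
    }
    copied = 0
    for suffix, dest in dests.items():
        dest.mkdir(parents=True, exist_ok=True)
        for src in sorted(eval_tasks.glob(f"*{suffix}")):
            shutil.copy2(src, dest / src.name)
            copied += 1
    return copied
```

For example, `install_eval_tasks(Path("eval_tasks"), Path("RLBench"))` would copy every task file in one call; back up the originals first if you want to restore the stock tasks later.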

TODO List

Future improvements

  • Add multi-process support for searching for the best MA plans.
  • Set up an interactive mode on Gradio.
  • Release the policy training code.

Citation

If you find Manipulate-Anything useful for your research and applications, please consider citing our paper:

@article{duan2024manipulate,
  title={Manipulate-anything: Automating real-world robots using vision-language models},
  author={Duan, Jiafei and Yuan, Wentao and Pumacay, Wilbert and Wang, Yi Ru and Ehsani, Kiana and Fox, Dieter and Krishna, Ranjay},
  journal={arXiv preprint arXiv:2406.18915},
  year={2024}
}

About

Manipulate-Anything: Automating Real-World Robots using Vision-Language Models [CoRL 2024]
