Learning Efficient and Robust Language-conditioned Manipulation using Textual-Visual Relevancy and Equivariant Language Mapping
[Website]
Authors:
Mingxi Jia*1, Haojie Huang*2, Zhewen Zhang+2, Chenghao Wang+2, Linfeng Zhao2, Dian Wang2, Jason Xinyu Liu1, Robin Walters2, Robert Platt^2, Stefanie Tellex^1
1 Brown University 2 Northeastern University
*+ Equal contribution ^ Equal advising
- create conda env
conda create -n GEM_code python==3.8.10
conda activate GEM_code
- clone
git clone --recursive git@github.com:SaulBatman/GEM_code_code.git
- install dependencies
cd GEM_code
# export env variable
export PYTHONPATH=/YOURPATH_TO_REPO:$PYTHONPATH
export CLIPORT_ROOT=$(pwd)
# tested on ubuntu20.04, cuda11.8, python3.8.10
# conda install pytorch 2.1
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# install other dependencies
pip install -r requirements.txt
# install GroundingDINO for baselines
cd GroundingDINO
pip install -e .
cd ..
- set PYTHONPATH
export CLIPORT_ROOT=$(pwd)
export PYTHONPATH=/YOURPATH/GEM_code:$PYTHONPATH
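Before running any of the scripts, it can help to confirm that both environment variables from the installation steps are visible to Python. The snippet below is a minimal sketch and not part of the repository: `check_env` is a hypothetical helper, and it assumes that `CLIPORT_ROOT` points to the repository root and that the same directory was added to `PYTHONPATH`.

```python
import os


def check_env(environ):
    """Return a list of human-readable problems; an empty list means the setup looks fine."""
    problems = []
    root = environ.get("CLIPORT_ROOT")
    if not root:
        problems.append("CLIPORT_ROOT is not set")
    elif not os.path.isdir(root):
        problems.append(f"CLIPORT_ROOT is not a directory: {root}")

    # Split PYTHONPATH on the platform separator and drop empty entries.
    entries = [p for p in environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    if not entries:
        problems.append("PYTHONPATH is empty")
    elif root and os.path.abspath(root) not in {os.path.abspath(p) for p in entries}:
        # Assumption: the repo root is the directory that was added to PYTHONPATH.
        problems.append("CLIPORT_ROOT is not listed in PYTHONPATH")
    return problems


if __name__ == "__main__":
    for problem in check_env(os.environ):
        print("warning:", problem)
```

Running it in the shell where the variables were exported prints one warning per detected problem and nothing if the setup looks consistent.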
- collect simulation dataset
Here, we use the block-in-bowl-unseen-color task as an example.
python cliport/demos.py data_root=/media/mingxi/home2/data/data/GEM_code_viewpoint n=50 task=separating-piles-seen-colors mode=train disp=True
- run our method
python train.py train.task=put-block-in-bowl-unseen-colors train.agent=GEM dataset.topdown=False dataset.num_cameras=3 gem.model_name=unetl-score-vit-postLinearMul gem.pick_kernel_name=unetl gem.place_kernel_name=eunet train.save_freq_step=100 train.data_root=/media/mingxi/home2/data/data/gem wandb.enable=False train.n_demos=100 train.n_steps=20001 train.exp_folder=test dataset.cache=True dataset.dataset_dist=standard
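The training command is a long list of dotted `key=value` overrides, which is easy to mistype. As a rough sketch (not part of the repository), the overrides can be kept in a nested dictionary and flattened into the command line. `to_overrides` is a hypothetical helper, and the `data_root` value below is a placeholder to replace with your own path; the other values are copied from the command above.

```python
def to_overrides(config, prefix=""):
    """Flatten a nested dict into `key.subkey=value` command-line overrides."""
    overrides = []
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(to_overrides(value, name))
        else:
            overrides.append(f"{name}={value}")
    return overrides


# Values copied from the training command; data_root is a placeholder path.
config = {
    "train": {
        "task": "put-block-in-bowl-unseen-colors",
        "agent": "GEM",
        "n_demos": 100,
        "n_steps": 20001,
        "save_freq_step": 100,
        "exp_folder": "test",
        "data_root": "/YOURPATH/data",
    },
    "dataset": {
        "topdown": False,
        "num_cameras": 3,
        "cache": True,
        "dataset_dist": "standard",
    },
    "gem": {
        "model_name": "unetl-score-vit-postLinearMul",
        "pick_kernel_name": "unetl",
        "place_kernel_name": "eunet",
    },
    "wandb": {"enable": False},
}

command = ["python", "train.py"] + to_overrides(config)
print(" ".join(command))
```

The printed line can be pasted into the shell, or the `command` list can be passed to `subprocess.run` from the repository root.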
- run baseline (CLIPort) training and eval
python cliport/train_new.py train.task=stack-block-pyramid-seq-seen-colors train.agent=cliport train.n_demos=100 train.n_steps=200000 train.save_freq_step=10000 train.exp_folder=exps dataset.cache=False
python cliport/eval.py eval_task=stack-block-pyramid-seq-seen-colors agent=cliport mode=val n_demos=100 train_demos=100 checkpoint_type=val_missing
- collect data
Please refer to our open-loop collector for real-world robot data collection.
- train
python train.py train.task=pick-part-in-box-real dataset.type=realtable dataset.dataset_dist=standard train.agent=GEM GEM_code.model_name=unetl-score-vit-postLinearMul GEM_code.pick_kernel_name=unetl GEM_code.place_kernel_name=eunet train.save_freq_step=1000 train.data_root=/home/mingxi/mingxi_ws/GEM_code/raw_data wandb.enable=False train.n_demos=1 train.n_steps=20000 train.exp_folder=exps dataset.cache=False
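The real-robot training command differs from the simulation one in only a few settings. The sketch below is an illustration and not part of the repository: it parses two override lists and reports the keys whose values differ. `parse_overrides` and `diff_overrides` are hypothetical helpers, and the two lists are only a subset of the overrides used in this README.

```python
def parse_overrides(args):
    """Turn ['a.b=1', 'c=x'] into {'a.b': '1', 'c': 'x'}; values stay strings."""
    parsed = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"not a key=value override: {arg!r}")
        parsed[key] = value
    return parsed


def diff_overrides(left, right):
    """Return {key: (left_value, right_value)} for keys that differ or are missing on one side."""
    a, b = parse_overrides(left), parse_overrides(right)
    keys = sorted(a.keys() | b.keys())
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# A subset of the overrides from the simulation and real-robot training commands.
sim = (
    "train.agent=GEM train.n_demos=100 train.n_steps=20001 "
    "dataset.cache=True dataset.dataset_dist=standard"
).split()
real = (
    "train.agent=GEM train.n_demos=1 train.n_steps=20000 "
    "dataset.cache=False dataset.dataset_dist=standard dataset.type=realtable"
).split()

for key, (sim_value, real_value) in diff_overrides(sim, real).items():
    print(f"{key}: sim={sim_value} real={real_value}")
```

A value of `None` on one side means that the override is absent from that command.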
- deploy on real robot
Please refer to our open-loop controller for real-world robot control.
The code structure is based on CLIPort.