
mingxi-jia/GEM_code


Learning Efficient and Robust Language-conditioned Manipulation using Textual-Visual Relevancy and Equivariant Language Mapping

[Website]
Authors:
Mingxi Jia*1, Haojie Huang*2, Zhewen Zhang+2, Chenghao Wang+2, Linfeng Zhao2, Dian Wang2, Jason Xinyu Liu1, Robin Walters2, Robert Platt^2, Stefanie Tellex^1

1 Brown University 2 Northeastern University

* and + denote equal contribution; ^ denotes equal advising

Installation

  1. create conda env
    conda create -n GEM_code python==3.8.10
    conda activate GEM_code
    
  2. clone
    git clone --recursive git@github.com:mingxi-jia/GEM_code.git
    
  3. install dependencies
    cd GEM_code
    # export environment variables (from the repo root)
    export PYTHONPATH=/YOURPATH_TO_REPO:$PYTHONPATH
    export CLIPORT_ROOT=$(pwd)
    # tested on ubuntu20.04, cuda11.8, python3.8.10
    # install PyTorch (tested with 2.1) built for CUDA 11.8
    conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
    # install other dependencies
    pip install -r requirements.txt
    # install GroundingDINO for baselines
    cd GroundingDINO
    pip install -e .
    cd ..
    
  4. Set CLIPORT_ROOT and PYTHONPATH (from the repo root)
    export CLIPORT_ROOT=$(pwd)
    export PYTHONPATH=/YOURPATH/GEM_code:$PYTHONPATH
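Steps 3 and 4 only work if `CLIPORT_ROOT` and `PYTHONPATH` both point at the repository root. The snippet below is a minimal sketch for checking that before launching any experiment. `check_environment` is a hypothetical helper, not part of the repository, and it assumes (as the commands above suggest) that the repo root itself must appear on `PYTHONPATH`.

```python
import os
from pathlib import Path
from typing import Dict, List, Optional


def check_environment(env: Optional[Dict[str, str]] = None) -> List[str]:
    """Return a list of problems with the variables exported in steps 3-4.

    Hypothetical helper (not part of GEM_code). Assumes CLIPORT_ROOT should be
    the repo root and that the same directory must be on PYTHONPATH.
    """
    env = dict(os.environ) if env is None else env
    problems = []

    root = env.get("CLIPORT_ROOT")
    if not root:
        problems.append("CLIPORT_ROOT is not set")
    elif not Path(root).is_dir():
        problems.append(f"CLIPORT_ROOT={root} is not a directory")

    # Split PYTHONPATH on os.pathsep (':' on Linux) and drop empty entries.
    entries = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    resolved = {Path(p).resolve() for p in entries}
    if root and Path(root).resolve() not in resolved:
        problems.append("repo root (CLIPORT_ROOT) is not on PYTHONPATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run it from the same shell where you exported the variables; no output means both variables point to the same existing directory.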
    

Experiments

  1. collect simulation dataset. Here, we use the put-block-in-bowl-unseen-colors task as an example.

        python cliport/demos.py data_root=/media/mingxi/home2/data/data/GEM_code_viewpoint n=50 task=put-block-in-bowl-unseen-colors mode=train disp=True
  2. train our method (GEM)

    python train.py train.task=put-block-in-bowl-unseen-colors train.agent=GEM dataset.topdown=False dataset.num_cameras=3 gem.model_name=unetl-score-vit-postLinearMul gem.pick_kernel_name=unetl gem.place_kernel_name=eunet train.save_freq_step=100 train.data_root=/media/mingxi/home2/data/data/gem wandb.enable=False train.n_demos=100 train.n_steps=20001 train.exp_folder=test dataset.cache=True dataset.dataset_dist=standard
    
  3. train and evaluate the baseline (CLIPort)

    python cliport/train_new.py train.task=stack-block-pyramid-seq-seen-colors train.agent=cliport train.n_demos=100 train.n_steps=200000 train.save_freq_step=10000 train.exp_folder=exps dataset.cache=False
    
    python cliport/eval.py eval_task=stack-block-pyramid-seq-seen-colors agent=cliport mode=val n_demos=100 train_demos=100 checkpoint_type=val_missing 
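The experiment commands only fit together if they agree on the task name, the data directory and the number of demonstrations. In upstream CLIPort, for example, `eval.py` locates the checkpoint folder through `train_demos`, so it has to equal the `train.n_demos` used for training. The sketch below builds the command lines from one set of values so they stay consistent. It is an illustration under assumptions, not part of the repository: `to_overrides`, `sim_commands` and `cliport_commands` are hypothetical helpers, the flag values are copied from the commands above, and it assumes that `demos.py` writes to the same `data_root` that `train.py` later reads.

```python
import shlex
from typing import Dict, List, Tuple


def to_overrides(cfg: Dict[str, object]) -> List[str]:
    """Render a flat dict as `key=value` arguments, the style used above."""
    return [f"{key}={value}" for key, value in cfg.items()]


# GEM-specific flags copied from the "train our method" command above.
GEM_FLAGS = {
    "train.agent": "GEM",
    "dataset.topdown": False,
    "dataset.num_cameras": 3,
    "gem.model_name": "unetl-score-vit-postLinearMul",
    "gem.pick_kernel_name": "unetl",
    "gem.place_kernel_name": "eunet",
    "dataset.cache": True,
    "dataset.dataset_dist": "standard",
    "wandb.enable": False,
}


def sim_commands(task: str, data_root: str, n_demos: int,
                 n_steps: int = 20001) -> Tuple[List[str], List[str]]:
    """Collection + GEM training commands sharing task, data_root and demo count.

    ASSUMPTION: demos.py writes episodes under `data_root`, and train.py reads
    them back from `train.data_root`.
    """
    collect = ["python", "cliport/demos.py"] + to_overrides(
        {"data_root": data_root, "n": n_demos, "task": task, "mode": "train"})
    train = ["python", "train.py"] + to_overrides({
        "train.task": task,
        "train.data_root": data_root,
        "train.n_demos": n_demos,
        "train.n_steps": n_steps,
        "train.save_freq_step": 100,
        "train.exp_folder": "test",
        **GEM_FLAGS,
    })
    return collect, train


def cliport_commands(task: str, n_demos: int, n_steps: int = 200000,
                     n_eval: int = 100) -> Tuple[List[str], List[str]]:
    """CLIPort baseline train/eval pair with eval's `train_demos` tied to `train.n_demos`."""
    train = ["python", "cliport/train_new.py"] + to_overrides({
        "train.task": task,
        "train.agent": "cliport",
        "train.n_demos": n_demos,
        "train.n_steps": n_steps,
        "train.save_freq_step": 10000,
        "train.exp_folder": "exps",
        "dataset.cache": False,
    })
    evaluate = ["python", "cliport/eval.py"] + to_overrides({
        "eval_task": task,
        "agent": "cliport",
        "mode": "val",
        "n_demos": n_eval,       # number of evaluation episodes
        "train_demos": n_demos,  # must match train.n_demos above
        "checkpoint_type": "val_missing",
    })
    return train, evaluate


if __name__ == "__main__":
    commands = sim_commands("put-block-in-bowl-unseen-colors", "/path/to/data", 100)
    commands += cliport_commands("stack-block-pyramid-seq-seen-colors", 100)
    for cmd in commands:
        print(shlex.join(cmd))  # shlex.join requires Python 3.8+
```

Keeping the values in one place is the point of this design: changing `n_demos` once updates collection, training and evaluation together. The printed lines can be pasted into a shell at the repository root, or each list can be passed to `subprocess.run(cmd, check=True)`.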
    

Real-world experiments

  1. Collect data. Please refer to our open-loop collector for real-world robot data collection.
  2. train
    python train.py train.task=pick-part-in-box-real dataset.type=realtable dataset.dataset_dist=standard train.agent=GEM gem.model_name=unetl-score-vit-postLinearMul gem.pick_kernel_name=unetl gem.place_kernel_name=eunet train.save_freq_step=1000 train.data_root=/home/mingxi/mingxi_ws/GEM_code/raw_data wandb.enable=False train.n_demos=1 train.n_steps=20000 train.exp_folder=exps dataset.cache=False
  3. deploy on a real robot. Please refer to our open-loop controller for real-world robot control.

Acknowledgement

The code structure is based on CLIPort.

About

Official code for the paper "Learning Efficient and Robust Language-conditioned Manipulation using Textual-Visual Relevancy and Equivariant Language Mapping" (RAL 2025)
