This repository contains the code for the paper *Robust Imitation of a Few Demonstrations with a Backwards Model*.
- Python 3.7+ (we used 3.9)
- MuJoCo 2.1.0 (see https://github.com/openai/mujoco-py for installation instructions)
- See `requirements.txt` for the full list of dependencies

To install:

```bash
pip install -r requirements.txt
pip install -e .
```

The requirements include additional libraries such as mujoco-maze and d4rl for the Maze and Adroit environments, respectively.
If these packages fail to install, clone the repositories directly and run `pip install <path to repo>`.
mujoco-maze is forked from https://github.com/kngwyu/mujoco-maze and edited to include the custom environments and their goal-oriented versions. See its README.md for more details.
We use Weights and Biases to track experiment results and Hydra for configuration management.
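Hydra composes a config from YAML files plus `key=value` command-line overrides (the `+experiments=...` arguments used by the run commands later in this README use this mechanism). Purely as a toy illustration, not Hydra's actual implementation (its real override grammar is much richer, with config groups, `+` additions, and typed values), an override can be thought of as a dotted-path assignment into a nested config:

```python
def apply_override(config, override):
    """Toy sketch of a Hydra-style 'a.b=value' override: walk the dotted
    path, creating nested dicts as needed, and set the leaf value.
    (Hydra itself handles typing, config groups, interpolation, etc.)"""
    key, value = override.split("=", 1)
    node = config
    *parents, leaf = key.split(".")
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return config

cfg = {"seed": "0", "model": {"lr": "3e-4"}}
apply_override(cfg, "model.lr=1e-3")
apply_override(cfg, "seed=1")
# cfg is now {"seed": "1", "model": {"lr": "1e-3"}}
```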
For the Fetch environments, we use the RL Baselines3 Zoo repository. We use the same hyperparameter settings as in that repo and train HER+TQC to convergence on `gym==0.21.0`.
A few small modifications to the rl-baselines3-zoo repo are needed. Copy `scripts/fetch/gen_demos.py` and `scripts/fetch/gen_demos.sh` to the top-level folder containing the rl-baselines3-zoo code. Place the custom environments for Push-v2 and PickAndPlace-v2 inside the rl-baselines3-zoo repo and modify the hyperparameter config file (see the rl-baselines3-zoo README for details). Expert policies can then be trained with the `train.py` script. To generate demonstrations, run `save_demos.sh` inside the rl-baselines3-zoo folder.
For the Maze environments, we use goal-oriented SAC [1], where the goal is given as part of the observation. We use the GoalEnv interface provided in gym.
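In gym's GoalEnv interface, each observation is a dict with `observation`, `achieved_goal`, and `desired_goal` keys. As a rough sketch only (the function names, dimensions, and distance threshold below are illustrative assumptions, not this repo's code), a goal-oriented policy input and a sparse goal-reaching reward can be written as:

```python
import numpy as np

def flatten_goal_obs(obs_dict):
    """Concatenate the raw observation with the desired goal so a
    standard SAC policy network can consume the goal as part of its input."""
    return np.concatenate([obs_dict["observation"], obs_dict["desired_goal"]])

def sparse_goal_reward(achieved_goal, desired_goal, threshold=0.5):
    """Sparse reward: 0 when the achieved goal is within `threshold` of the
    desired goal, else -1 (a common convention in gym's goal environments;
    the threshold here is an arbitrary illustrative value)."""
    dist = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal))
    return 0.0 if dist <= threshold else -1.0

obs = {
    "observation": np.array([0.1, 0.2, 0.0]),
    "achieved_goal": np.array([0.1, 0.2]),
    "desired_goal": np.array([1.0, 1.0]),
}
policy_input = flatten_goal_obs(obs)  # shape (5,)
```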
To train the expert policy, run the provided script:
```bash
bash scripts/maze/sac_expert.sh
```

Change the `env_id` variable to the desired environment ID. The prefix `Goal` indicates that the environment is goal-oriented.
The specific environment IDs are:
- GoalPointRegionUMaze-v2
- GoalPointRoom5x11-v1
- GoalPointCorridor7x7-v2
- GoalAntRegionUMaze-v2
- GoalAntRoom5x11-v1
- GoalAntCorridor7x7-v2
To generate demonstrations, see `scripts/maze/gen_demos.py` and `scripts/maze/gen_demos.sh`.
For the Adroit environment, we use the pre-trained policy checkpoint provided by DAPG [2]. To generate demonstrations, see `scripts/adroit/gen_demos.py` and `scripts/adroit/gen_demos.sh`.
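The `gen_demos.py` scripts roll out the expert and save the resulting trajectories; the actual on-disk format is defined in those scripts. Purely as an illustrative sketch (the function names and pickle format here are assumptions, not the repo's format), demonstrations could be stored as lists of (observation, action) pairs:

```python
import os
import pickle
import tempfile

def save_demos(trajectories, path):
    """Save a list of trajectories, each a list of (observation, action)
    pairs. Illustrative only: the repo's gen_demos.py scripts define
    the actual demonstration format."""
    with open(path, "wb") as f:
        pickle.dump(trajectories, f)

def load_demos(path):
    with open(path, "rb") as f:
        return pickle.load(f)

# One tiny two-step trajectory, round-tripped through disk.
demo = [([0.0, 0.1], [1.0]), ([0.1, 0.2], [0.5])]
path = os.path.join(tempfile.mkdtemp(), "demos.pkl")
save_demos([demo], path)
assert load_demos(path) == [demo]
```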
To train and evaluate BMIL, run the following command:
```bash
python experiments/bmil.py +experiments=bmil/fetch_pick
```

A run script is provided in `scripts/fetch/run.sh`. Change the `TASK` and `METHOD` variables accordingly.
To train and evaluate BMIL, run the following command:
```bash
python experiments/bmil.py +experiments=bmil/maze_point5x11
```

A run script is provided in `scripts/maze/run.sh`. Change the `TASK` and `METHOD` variables accordingly.
To train and evaluate BMIL, run the following command:
```bash
python experiments/bmil.py +experiments=bmil/adroit_relocate
```

A run script is provided in `scripts/adroit/run.sh`. Change the `METHOD` variable accordingly.
[1] Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al, 2018.
[2] Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations, Rajeswaran et al, 2018.
If you find this work useful, please cite the paper as follows:
```bibtex
@inproceedings{park2022bmil,
  title={Robust Imitation of a Few Demonstrations with a Backwards Model},
  author={Park, Jung Yeon and Wong, Lawson L.S.},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2022},
  url={https://arxiv.org/abs/2210.09337}
}
```