
# Training

## Latent Action Autoencoder

To train the latent action autoencoder (assuming one node with 8 GPUs):

```shell
cd lam
train.sh
```

> [!NOTE]
> The checkpoint of our latent action autoencoder can be found at Hugging Face.

## World Model Pretraining

  1. Download the pretrained Stable Video Diffusion checkpoint `svd.safetensors` from Hugging Face.
  2. Set `default_ckpt` in `worldmodel/train.py` to the path of `svd.safetensors`.
  3. Set `ckpt_path` in `worldmodel/configs/training/adaworld.yaml` to the path of the last checkpoint of the latent action autoencoder.
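The edits in steps 2 and 3 can be sketched as follows. The paths are placeholders, and the surrounding structure of each file is assumed, so check them against your checkout:

```yaml
# worldmodel/configs/training/adaworld.yaml (excerpt, assumed layout)
ckpt_path: /path/to/lam/checkpoints/last.ckpt  # last latent action autoencoder checkpoint
```

```python
# worldmodel/train.py (excerpt, assumed layout)
default_ckpt = "/path/to/svd.safetensors"  # pretrained Stable Video Diffusion weights
```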

To pretrain the autoregressive world model (assuming one node with 8 GPUs):

```shell
cd worldmodel
run_train.sh
```

After training:

  1. Convert the DeepSpeed checkpoint shards (one per GPU) to a single `pytorch_model.bin` using `zero_to_fp32.py`.
  2. Convert `pytorch_model.bin` to the safetensors format using `worldmodel/bin_to_st.py`, then run inference.
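The two conversion steps above can be sketched as shell commands. The checkpoint directory and output paths are placeholders; `zero_to_fp32.py` is the standard script DeepSpeed writes into the checkpoint directory, while the exact arguments of `bin_to_st.py` should be checked against the script itself:

```shell
# 1. Merge the per-GPU DeepSpeed ZeRO shards into a single FP32 state dict.
#    zero_to_fp32.py is emitted by DeepSpeed alongside the checkpoint.
python zero_to_fp32.py /path/to/checkpoint_dir pytorch_model.bin

# 2. Convert the merged .bin file to safetensors (arguments are assumed;
#    see worldmodel/bin_to_st.py for the actual interface).
python worldmodel/bin_to_st.py
```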

> [!NOTE]
> The pretrained AdaWorld can be found at Hugging Face.

> [!TIP]
> If you have a different GPU setup, modify `num_nodes` and `devices` in `lam/config/lam.yaml` accordingly.
>
> Remember to set `max_epochs`, or stop training manually once it has run long enough.
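For example, training on 2 nodes with 4 GPUs each would change the settings named in the tip above roughly as follows (key names come from the tip; the surrounding structure of the file is assumed):

```yaml
# lam/config/lam.yaml (excerpt, assumed layout)
num_nodes: 2
devices: 4
max_epochs: 100  # or omit and stop training manually
```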


<= Previous: [Installation]

=> Next: [Action Transfer]