
# Training

## Latent Action Autoencoder

To train the latent action autoencoder (assuming one node with 8 GPUs):

```shell
cd lam
train.sh
```

> [!NOTE]
> The checkpoint of our latent action autoencoder can be found at Hugging Face.

## World Model Pretraining

  1. Download the pretrained Stable Video Diffusion checkpoint `svd.safetensors` from Hugging Face.
  2. Set `default_ckpt` in `worldmodel/train.py` to the path of `svd.safetensors`.
  3. Set `ckpt_path` in `worldmodel/configs/training/adaworld.yaml` to the path of the last checkpoint of the latent action autoencoder.
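The edits in steps 2 and 3 can be sketched as follows. The paths are placeholders, and the surrounding structure of each file is assumed, so check them against your checkout:

```yaml
# worldmodel/configs/training/adaworld.yaml (excerpt, assumed layout)
ckpt_path: /path/to/lam/checkpoints/last.ckpt  # last latent action autoencoder checkpoint
```

```python
# worldmodel/train.py (excerpt, assumed layout)
default_ckpt = "/path/to/svd.safetensors"  # pretrained Stable Video Diffusion weights
```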

To pretrain the autoregressive world model (assuming one node with 8 GPUs):

```shell
cd worldmodel
run_train.sh
```

After training:

  1. Convert the DeepSpeed checkpoint shards (one per GPU) to a single `pytorch_model.bin` using `zero_to_fp32.py`.
  2. Convert `pytorch_model.bin` to the safetensors format using `worldmodel/bin_to_st.py`, then run inference.
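The two conversion steps above can be sketched as shell commands. The checkpoint directory and output paths are placeholders; `zero_to_fp32.py` is the standard script DeepSpeed writes into the checkpoint directory, while the exact arguments of `bin_to_st.py` should be checked against the script itself:

```shell
# 1. Merge the per-GPU DeepSpeed ZeRO shards into a single FP32 state dict.
#    zero_to_fp32.py is emitted by DeepSpeed alongside the checkpoint.
python zero_to_fp32.py /path/to/checkpoint_dir pytorch_model.bin

# 2. Convert the merged .bin file to safetensors (arguments are assumed;
#    see worldmodel/bin_to_st.py for the actual interface).
python worldmodel/bin_to_st.py
```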

> [!NOTE]
> The pretrained AdaWorld can be found at Hugging Face.

> [!TIP]
> If you have a different GPU setup, modify `num_nodes` and `devices` in `lam/config/lam.yaml` accordingly.
>
> Remember to set `max_epochs`, or stop training manually once it has run long enough.
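For example, training on 2 nodes with 4 GPUs each would change the settings named in the tip above roughly as follows (key names come from the tip; the surrounding structure of the file is assumed):

```yaml
# lam/config/lam.yaml (excerpt, assumed layout)
num_nodes: 2
devices: 4
max_epochs: 100  # or omit and stop training manually
```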


<= Previous: [Installation]

=> Next: [Action Transfer]