This repository provides the official implementation of our NeurIPS 2024 paper:
ActFusion: A Unified Diffusion Model for Action Segmentation and Anticipation
Dayoung Gong, Suha Kwak, and Minsu Cho
NeurIPS, Vancouver, 2024
Recommended Environment
- Python 3.8.20
- CUDA 11.7
- PyTorch 1.13.0+cu117
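To compare a local setup against the versions listed above, a small script along the following lines may help. It is only a sketch and not part of the repository: the helper names `same_release` and `collect_versions` are hypothetical, and matching on major.minor only is an assumption about how strict the check needs to be.

```python
import platform
from importlib import util

# Versions listed under "Recommended Environment" in this README.
RECOMMENDED = {"python": "3.8.20", "cuda": "11.7", "torch": "1.13.0+cu117"}


def same_release(found, wanted, parts=2):
    """Compare only the leading `parts` components (e.g. major.minor).

    Any local build suffix such as "+cu117" is ignored. This tolerance is
    an assumption of this sketch, not a requirement stated by the authors.
    """
    if found is None:
        return False
    found_parts = found.split("+")[0].split(".")[:parts]
    wanted_parts = wanted.split("+")[0].split(".")[:parts]
    return found_parts == wanted_parts


def collect_versions():
    """Return the versions found locally; torch/cuda stay None without PyTorch."""
    found = {"python": platform.python_version(), "cuda": None, "torch": None}
    if util.find_spec("torch") is not None:
        import torch

        found["torch"] = torch.__version__
        found["cuda"] = torch.version.cuda  # None for CPU-only builds
    return found


if __name__ == "__main__":
    found = collect_versions()
    for name, wanted in RECOMMENDED.items():
        status = "ok" if same_release(found[name], wanted) else "differs"
        print(f"{name}: found {found[name]}, recommended {wanted} ({status})")
```

A "differs" line does not necessarily mean the code will fail; it only flags a departure from the recommended environment.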
Install dependencies
pip install -r requirements.txt
Download the preprocessed dataset from this link (borrowed from MS-TCN).
Create a directory structure as below, and place the datasets inside the datasets/ folder:
project-root/
├── ckpt/ # pretrained model checkpoints
│ ├── breakfast/
│ └── 50salads/
├── configs/ # auto-generated JSON config files
│ ├── Breakfast.json
│ └── 50salads.json
├── datasets/ # downloaded datasets
│ ├── breakfast/
│ └── 50salads/
├── result/ # experiment outputs will be saved here
├── src/ # source code
│ ├── model/
│ │ ├── actfusion.py
│ │ ├── backbone.py
│ │ ├── attn.py
│ │ └── __init__.py
│ ├── dataset.py
│ ├── default_configs.py
│ ├── trainer.py
│ ├── utils.py
│ ├── vis.py
│ └── __init__.py
├── main.py
├── LICENSE
└── README.md
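The non-source folders in the layout above could be created with a short helper like the one below. This is a minimal sketch, not a script shipped with the repository: `DATA_DIRS` and `make_layout` are hypothetical names, and `configs/` is included only for convenience since its JSON files are generated in a later step.

```python
from pathlib import Path

# Folders from the layout above that do not contain source code.
# configs/ is created empty here; its JSON files are generated separately.
DATA_DIRS = [
    "ckpt/breakfast",
    "ckpt/50salads",
    "configs",
    "datasets/breakfast",
    "datasets/50salads",
    "result",
]


def make_layout(root="."):
    """Create the folders under `root`, leaving existing ones untouched."""
    created = []
    for rel in DATA_DIRS:
        path = Path(root) / rel
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # Run from the project root so the folders sit next to main.py.
    for path in make_layout("."):
        print(path)
```

The downloaded datasets and checkpoints still need to be copied into `datasets/` and `ckpt/` by hand.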
Generate config files by running:
python default_configs.py
Then start training with:
python main.py --config configs/Breakfast.json --result_dir $result_dir --split $split_num
- Download pretrained checkpoints from this link
- Place the downloaded folders inside the ckpt/ directory
- Run evaluation:
python main.py --config configs/Breakfast.json --result_dir $result_dir --split $split_num --test --ckpt
This repository builds upon the DiffAct codebase. We thank the original authors for sharing their work.
If you find our code or paper helpful, please consider citing both ActFusion and DiffAct:
@article{gong2024actfusion,
title={ActFusion: A Unified Diffusion Model for Action Segmentation and Anticipation},
author={Gong, Dayoung and Kwak, Suha and Cho, Minsu},
journal={Advances in Neural Information Processing Systems},
volume={37},
pages={89913--89942},
year={2024}
}
@inproceedings{liu2023diffusion,
title={Diffusion Action Segmentation},
author={Liu, Daochang and Li, Qiyue and Dinh, Anh-Dung and Jiang, Tingting and Shah, Mubarak and Xu, Chang},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year={2023}
}
This project is licensed under the MIT License.
