Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention [ICML 2025, NeurIPS 2025 Spotlight]

Accelerate Video Generation with High Pixel-level Fidelity

🔥News🔥

[2025/09] We released Flash k-Means, a batched k-means clustering algorithm implemented in Triton that offers a >10× speedup!
[2025/09] Sparse VideoGen2 is open-sourced! HunyuanVideo, Wan 2.1, and Cosmos can be accelerated by 2×.
[2025/09] Sparse VideoGen2 was accepted to NeurIPS 2025 as a spotlight!
[2025/05] Sparse VideoGen was accepted to ICML 2025!
[2025/04] Wan 2.1 is supported! Both T2V and I2V are accelerated.
[2025/03] Sparse VideoGen is open-sourced! HunyuanVideo and CogVideoX v1.5 can be accelerated by 2×.

📚 About

Sparse VideoGen 1 & 2 are training-free frameworks that exploit the inherent sparsity of 3D full attention to accelerate video generation.

Sparse VideoGen 1's core contributions:

Identifying the spatial and temporal sparsity patterns in video diffusion models.
Proposing an Online Profiling Strategy to dynamically identify these patterns.
Implementing an end-to-end generation framework through efficient algorithm-system co-design, with hardware-efficient layout transformation and customized kernels.
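To make the first two points concrete, here is a minimal pure-Python sketch of what profiling a head against a spatial and a temporal pattern could look like. Everything in it is an assumption for illustration: the mask shapes (same frame vs. same position across frames), the selection criterion (squared error against full attention on a few sampled queries), and all function names are hypothetical and are not the repository's API. The actual framework relies on customized kernels and layout transformation rather than Python loops.

```python
import math
import random


def attend(query, keys, values, allowed):
    """Softmax attention output for one query, restricted to key indices in `allowed`."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, keys[j])) for j in allowed]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    return [sum(e / total * values[j][d] for e, j in zip(exps, allowed)) for d in range(dim)]


def spatial_keys(i, num_frames, tokens_per_frame):
    """Assumed spatial pattern: all keys in the same frame as query i."""
    start = (i // tokens_per_frame) * tokens_per_frame
    return list(range(start, start + tokens_per_frame))


def temporal_keys(i, num_frames, tokens_per_frame):
    """Assumed temporal pattern: the same spatial position in every frame."""
    pos = i % tokens_per_frame
    return [f * tokens_per_frame + pos for f in range(num_frames)]


def profile_head(queries, keys, values, num_frames, tokens_per_frame, num_samples, rng):
    """Pick the pattern whose output is closest to full attention on sampled queries."""
    everything = list(range(len(queries)))
    errors = {"spatial": 0.0, "temporal": 0.0}
    for i in rng.sample(everything, num_samples):
        full = attend(queries[i], keys, values, everything)
        for name, pattern in (("spatial", spatial_keys), ("temporal", temporal_keys)):
            approx = attend(queries[i], keys, values, pattern(i, num_frames, tokens_per_frame))
            errors[name] += sum((a - b) ** 2 for a, b in zip(full, approx))
    return min(errors, key=errors.get), errors


def toy_head(kind, num_frames, tokens_per_frame, rng):
    """Toy data: queries/keys are scaled one-hot codes of the frame or of the position."""
    n = num_frames * tokens_per_frame
    size = num_frames if kind == "spatial" else tokens_per_frame

    def code(i):
        hot = i // tokens_per_frame if kind == "spatial" else i % tokens_per_frame
        return [8.0 if d == hot else 0.0 for d in range(size)]

    qk = [code(i) for i in range(n)]
    values = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(n)]
    return qk, qk, values


rng = random.Random(0)
for kind in ("spatial", "temporal"):
    q, k, v = toy_head(kind, num_frames=4, tokens_per_frame=6, rng=rng)
    choice, _ = profile_head(q, k, v, 4, 6, num_samples=8, rng=rng)
    print(f"{kind} toy head profiled as: {choice}")
```

Only a handful of queries are compared against full attention, which is what keeps this kind of profiling cheap relative to the attention it replaces.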

Sparse VideoGen 2's core contributions:

Tackling inaccurate token identification and computation waste in video diffusion.
Introducing semantic-aware sparse attention with efficient token permutation.
Providing an end-to-end system design with a dynamic attention kernel and a flash k-means kernel.
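The idea behind clustering followed by token permutation can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the project's implementation: a plain Lloyd's k-means in pure Python stands in for the Triton flash k-means kernel, and the function names `kmeans` and `cluster_permutation` are hypothetical. The point is that sorting tokens by cluster label makes each cluster a contiguous block, and an inverse permutation restores the original order afterwards.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=10):
    """Plain Lloyd's algorithm; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_permutation(labels):
    """Return (perm, inverse): perm groups tokens by cluster, inverse undoes it."""
    perm = sorted(range(len(labels)), key=lambda i: labels[i])  # stable sort
    inverse = [0] * len(perm)
    for new_pos, old_pos in enumerate(perm):
        inverse[old_pos] = new_pos
    return perm, inverse


# Six 1-D "tokens" that interleave two semantic groups.
tokens = [[0.0], [10.0], [0.1], [10.1], [0.2], [9.9]]
labels = kmeans(tokens, k=2)
perm, inverse = cluster_permutation(labels)

permuted = [tokens[i] for i in perm]                      # clusters are now contiguous
restored = [permuted[inverse[i]] for i in range(len(tokens))]
print("labels:  ", labels)
print("permuted:", permuted)
print("restored == tokens:", restored == tokens)
```

Once similar tokens sit next to each other, a block-sparse kernel can skip whole blocks instead of scattered individual tokens, which is the kind of computation waste the list above refers to.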

🎥 Demo of SVG1

🎥 Demo of SVG2

[Demo videos: Comp_A.mp4, Comp_F.mp4, Comp_L.mp4]

🛠️ Installation

Begin by cloning the repository:

GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/svg-project/Sparse-VideoGen.git # Skip the LFS demo files; otherwise the clone is too large
cd Sparse-VideoGen

We recommend CUDA 12.4 or 12.8 with PyTorch 2.5.1 or 2.6.0.
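If you want to confirm that an existing environment matches this recommendation before building the kernels, a small check like the following may help. It is a convenience sketch, not part of the repository: `check_environment` is a hypothetical helper, and it only compares against the versions listed above. `torch.__version__` and `torch.version.cuda` are standard PyTorch attributes; the latter is `None` on CPU-only builds.

```python
RECOMMENDED_CUDA = ("12.4", "12.8")
RECOMMENDED_TORCH = ("2.5.1", "2.6.0")


def check_environment(torch_version, cuda_version):
    """Return a list of warnings; an empty list means both versions are recommended."""
    warnings = []
    torch_base = torch_version.split("+")[0]  # e.g. "2.6.0+cu124" -> "2.6.0"
    if torch_base not in RECOMMENDED_TORCH:
        warnings.append(f"PyTorch {torch_base} is not one of {RECOMMENDED_TORCH}")
    if cuda_version is None:
        warnings.append("PyTorch was built without CUDA support")
    elif cuda_version not in RECOMMENDED_CUDA:
        warnings.append(f"CUDA {cuda_version} is not one of {RECOMMENDED_CUDA}")
    return warnings


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed yet")
    else:
        problems = check_environment(torch.__version__, torch.version.cuda)
        for line in problems or ["Environment matches the recommendation"]:
            print(line)
```

Other version combinations may still work; a warning here only means the setup differs from the recommended one.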

# 1. Create and activate conda environment
conda create -n SVG python==3.12.9 # or 3.11.9 if you hit errors when installing the kernels
conda activate SVG

# 2. Install uv, then install other packages
pip install uv
uv pip install -e .

# 3. Install flash-attn
pip install flash-attn --no-build-isolation

# 4. Install customized kernels. (You might need to upgrade your cmake and CUDA version.)
pip install -U setuptools # Requires at least version 77.0.0
git submodule update --init --recursive
cd svg/kernels
pip install -U cmake
bash setup.sh

# 5. Install FlashInfer (standard) and cuVS
cd 3rdparty/flashinfer
pip install --no-build-isolation --verbose --editable .
pip install cuvs-cu12 --extra-index-url=https://pypi.nvidia.com

# Optional: If the FlashInfer monkey patch fails in your environment,
# install the manually patched FlashInfer (block sparse with varied block sizes).
cd 3rdparty/flashinfer # relative to svg/kernels; skip if you are already in this directory
cp ../../../../assets/patches/modifications.patch ./
git apply modifications.patch
pip install --no-build-isolation --verbose --editable . # Block Sparse Attention with varied block sizes

You don’t need to install flash-kmeans separately. A copy of flash-kmeans is included in Sparse VideoGen and is used by default.

🚀 Inference Examples

Wan 2.1

We support Text-to-Video and Image-to-Video inference with the Wan 2.1 model. Run the following scripts:

# Text-to-Video
# bash scripts/wan/wan_t2v_720p_svg.sh # SVG
bash scripts/wan/wan_t2v_720p_sap.sh # SVG2

# Image-to-Video
# bash scripts/wan/wan_i2v_720p_svg.sh # SVG
bash scripts/wan/wan_i2v_720p_sap.sh # SVG2

HunyuanVideo

The running scripts are:

# bash scripts/hyvideo/hyvideo_t2v_720p_svg.sh # SVG
bash scripts/hyvideo/hyvideo_t2v_720p_sap.sh # SVG2

📑 Open-source Plan

Support FP8 attention
Support Wan 2.1
Support Cosmos

Efficiency Benchmark

Customized Kernels Performance

We evaluate the performance of our customized kernels against the baseline implementations. The following tables show the memory bandwidth (GB/s) comparison for different batch sizes and hidden dimensions:

RMSNorm Performance

Batch Size	Hidden Dim	Diffusers (GB/s)	SVG Customized (GB/s)	Speedup
2,097,152	32	151.36	809.69	5.35×
1,048,576	64	196.54	810.61	4.12×
524,288	128	232.66	810.21	3.48×
262,144	256	252.67	810.41	3.21×

LayerNorm Performance

Batch Size	Hidden Dim	Diffusers (GB/s)	SVG Customized (GB/s)	Speedup
2,097,152	32	45.82	808.28	17.64×
1,048,576	64	91.18	805.22	8.83×
524,288	128	197.89	804.29	4.06×
262,144	256	350.87	804.43	2.29×

Our customized kernels achieve significantly higher memory bandwidth across all configurations, with speedups ranging from 2.29× to 17.64×. The performance improvement is particularly notable for smaller hidden dimensions and larger batch sizes.
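The Speedup column in the two tables above is consistent with a plain bandwidth ratio (SVG Customized divided by Diffusers), rounded to two decimals. The short script below recomputes it from the reported numbers. It also shows that every configuration has the same total element count (batch size × hidden dim = 2^26), so each row covers the same number of elements split differently between the two dimensions. The variable and function names are illustrative only.

```python
# (batch size, hidden dim, Diffusers GB/s, SVG Customized GB/s, reported speedup)
RMSNORM = [
    (2_097_152, 32, 151.36, 809.69, 5.35),
    (1_048_576, 64, 196.54, 810.61, 4.12),
    (524_288, 128, 232.66, 810.21, 3.48),
    (262_144, 256, 252.67, 810.41, 3.21),
]
LAYERNORM = [
    (2_097_152, 32, 45.82, 808.28, 17.64),
    (1_048_576, 64, 91.18, 805.22, 8.83),
    (524_288, 128, 197.89, 804.29, 4.06),
    (262_144, 256, 350.87, 804.43, 2.29),
]


def recompute(rows):
    """Return (total elements, recomputed speedup, reported speedup) per row."""
    return [
        (batch * hidden, round(ours / base, 2), reported)
        for batch, hidden, base, ours, reported in rows
    ]


for name, rows in (("RMSNorm", RMSNORM), ("LayerNorm", LAYERNORM)):
    for elements, computed, reported in recompute(rows):
        print(f"{name}: {elements} elements, speedup {computed}x (reported {reported}x)")
```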

RoPE (Rotary Position Embedding) Performance

Batch Size	Num Heads	Seq Length	Head Dim	Diffusers (GB/s)	SVG Customized (GB/s)	Speedup
1	32	1024	64	17.25	158.81	9.21×
1	32	4096	64	27.74	405.75	14.63×
1	32	16384	64	30.86	605.89	19.63×
4	32	1024	64	27.60	475.94	17.24×
4	32	4096	64	30.93	614.11	19.85×
4	32	16384	64	32.41	648.36	20.00×

The RoPE implementation in SVG shows substantial performance improvements over the Diffusers baseline, with speedups ranging from 9.21× to 20.00×. The gain grows with sequence length and batch size: it is smallest at batch size 1 with sequence length 1024 and largest at batch size 4 with sequence length 16384.
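One way to read the scaling trend is to order the configurations by tensor size. The sketch below does that with the reported numbers, taking the element count as batch size × num heads × seq length × head dim, which is an assumption about the tensor layout rather than something the table states. Ordered by element count, with batch size as the tie-breaker, the reported speedup never decreases.

```python
# (batch, heads, seq length, head dim, Diffusers GB/s, SVG Customized GB/s, reported speedup)
ROPE = [
    (1, 32, 1024, 64, 17.25, 158.81, 9.21),
    (1, 32, 4096, 64, 27.74, 405.75, 14.63),
    (1, 32, 16384, 64, 30.86, 605.89, 19.63),
    (4, 32, 1024, 64, 27.60, 475.94, 17.24),
    (4, 32, 4096, 64, 30.93, 614.11, 19.85),
    (4, 32, 16384, 64, 32.41, 648.36, 20.00),
]


def num_elements(row):
    """Assumed tensor size: batch * heads * seq length * head dim."""
    batch, heads, seq_len, head_dim = row[:4]
    return batch * heads * seq_len * head_dim


# Sort by tensor size, breaking ties by batch size.
ordered = sorted(ROPE, key=lambda row: (num_elements(row), row[0]))
speedups = [round(row[5] / row[4], 2) for row in ordered]

for row, speedup in zip(ordered, speedups):
    print(f"batch={row[0]} seq={row[2]:>5} elements={num_elements(row):>11} speedup={speedup}x")
```

Configurations with the same element count (for example batch 1 at length 4096 and batch 4 at length 1024) still differ in speedup, so batch size appears to matter beyond raw tensor size in these measurements.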

🔗 BibTeX

If you find Sparse VideoGen useful or interesting for your research and applications, please cite our work using the following BibTeX:

@article{xi2025sparse,
  title={Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity},
  author={Xi, Haocheng and Yang, Shuo and Zhao, Yilong and Xu, Chenfeng and Li, Muyang and Li, Xiuyu and Lin, Yujun and Cai, Han and Zhang, Jintao and Li, Dacheng and others},
  journal={arXiv preprint arXiv:2502.01776},
  year={2025}
}

@article{yang2025sparse,
  title={Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation},
  author={Yang, Shuo and Xi, Haocheng and Zhao, Yilong and Li, Muyang and Zhang, Jintao and Cai, Han and Lin, Yujun and Li, Xiuyu and Xu, Chenfeng and Peng, Kelly and others},
  journal={arXiv preprint arXiv:2505.18875},
  year={2025}
}

