QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation (ICCV 2025)
Junyi Wu, Zhiteng Li, Zheng Hui, Yulun Zhang, Linghe Kong and Xiaokang Yang
[arXiv] [supplementary material]
- 2025-03-09: This repo is released.
- 2025-06-26: Congratulations! Our QuantCache has been accepted to ICCV 2025. 😊
Abstract: Recently, Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation, surpassing U-Net-based models in terms of performance. However, the enhanced capabilities of DiTs come with significant drawbacks, including increased computational and memory costs, which hinder their deployment on resource-constrained devices. Current acceleration techniques, such as quantization and caching mechanisms, offer limited speedup and are often applied in isolation, failing to fully address the complexities of DiT architectures. In this paper, we propose QuantCache, a novel training-free inference acceleration framework that jointly optimizes hierarchical latent caching, adaptive importance-guided quantization, and structural redundancy-aware pruning. QuantCache achieves an end-to-end latency speedup of 6.72× on Open-Sora with minimal loss in generation quality. Extensive experiments across multiple video generation benchmarks demonstrate the effectiveness of our method, setting a new standard for efficient DiT inference. The code and models will be available at https://github.com/JunyiWuCode/QuantCache.
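To give a rough feel for the two core ideas in the abstract (reusing a block's output across nearby denoising steps, and quantizing cached activations with a bit-width chosen by an importance score), here is a minimal conceptual sketch in PyTorch. It is not the released QuantCache implementation; the class `SimpleBlockCache`, the helper `fake_quantize`, the reuse threshold, and the importance-to-bit-width mapping are all illustrative assumptions.

```python
# Conceptual sketch only, NOT the official QuantCache API.
# Idea 1: cache a transformer block's output across adjacent diffusion steps
#         and reuse it when the block input has barely changed.
# Idea 2: store the cached activation with a bit-width chosen from a simple
#         importance score (here: how fast the input is changing).

import torch


def fake_quantize(x: torch.Tensor, num_bits: int) -> torch.Tensor:
    """Uniform symmetric fake quantization of a tensor to `num_bits`."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax, qmax) * scale


class SimpleBlockCache:
    """Caches one block's output across denoising steps (illustrative)."""

    def __init__(self, reuse_threshold: float = 0.05):
        self.reuse_threshold = reuse_threshold
        self.prev_input = None
        self.cached_output = None

    def _relative_change(self, x: torch.Tensor) -> float:
        # Relative L2 distance between the current and previously seen input.
        if self.prev_input is None:
            return float("inf")
        diff = (x - self.prev_input).norm()
        return (diff / self.prev_input.norm().clamp(min=1e-8)).item()

    def _pick_bits(self, change: float) -> int:
        # Illustrative importance -> bit-width mapping: fast-changing
        # (more important) activations get more bits.
        if change > 0.5:
            return 8
        if change > 0.1:
            return 6
        return 4

    def __call__(self, block, x: torch.Tensor) -> torch.Tensor:
        change = self._relative_change(x)
        if self.cached_output is not None and change < self.reuse_threshold:
            # Input barely moved since the cached step: skip the block.
            return self.cached_output
        out = block(x)
        self.prev_input = x.detach()
        self.cached_output = fake_quantize(out.detach(), self._pick_bits(change))
        return out


if __name__ == "__main__":
    torch.manual_seed(0)
    block = torch.nn.Linear(64, 64)  # stand-in for a DiT transformer block
    cache = SimpleBlockCache()
    x = torch.randn(4, 64)
    for step in range(5):
        x = x + 0.01 * torch.randn_like(x)  # slowly drifting latent
        _ = cache(block, x)
```

In the actual framework these decisions are made hierarchically across latents and layers and combined with redundancy-aware pruning; see the paper for the full method.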
- Complete this repository
We achieve an end-to-end latency speedup of 6.72× over Open-Sora 1.2 with negligible quality degradation.
Detailed results can be found in the paper.
Quantitative Comparisons (click to expand)
If you find the code helpful in your research or work, please cite the following paper.
@article{wu2025quantcache,
title={QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation},
author={Wu, Junyi and Li, Zhiteng and Hui, Zheng and Zhang, Yulun and Kong, Linghe and Yang, Xiaokang},
journal={arXiv preprint arXiv:2503.06545},
year={2025}
}
This work is released under the Apache 2.0 license.