LiteVAR: Compressing Visual Autoregressive Modelling with Efficient Attention and Quantization

Xie, Rui; Zhao, Tianchen; Yuan, Zhihang; Wan, Rui; Gao, Wenxi; Zhu, Zhenhua; Ning, Xuefei; Wang, Yu

arXiv:2411.17178 (cs)

[Submitted on 26 Nov 2024]

Abstract: Visual Autoregressive (VAR) modelling has emerged as a promising approach in image generation, offering competitive potential and performance comparable to diffusion-based models. However, current AR-based visual generation models require substantial computational resources, limiting their applicability on resource-constrained devices. To address this issue, we analyze the VAR model and identify significant redundancy in three dimensions: (1) the attention map, (2) the attention outputs when using classifier-free guidance, and (3) the data precision. Correspondingly, we propose an efficient attention mechanism and a low-bit quantization method to enhance the efficiency of VAR models while maintaining performance. With negligible performance loss (an FID increase of less than 0.056), we achieve an 85.2% reduction in attention computation, a 50% reduction in overall memory, and a 1.5x latency reduction. To ensure deployment feasibility, we develop efficient training-free compression techniques and analyze the deployment feasibility and efficiency gain of each technique.
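To make the "data precision" dimension concrete, the following is a minimal sketch of training-free low-bit quantization in general, not the method from the paper. It assumes a symmetric, per-tensor, round-to-nearest INT8 scheme; the abstract does not state the bit width or the scheme LiteVAR uses, and the names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integer codes in [-127, 127].

    Assumed scheme for illustration only; not taken from the LiteVAR paper.
    """
    max_abs = max(abs(v) for v in values)
    # One shared scale maps the largest magnitude to 127; guard all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


# Toy "weights"; no calibration data or retraining is involved.
weights = [0.82, -1.27, 0.05, 0.0, 0.4]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Round-to-nearest keeps the per-element error within half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

An 8-bit code occupies half the bytes of a 16-bit float, which is why lowering data precision reduces memory. Whether that accounts for the 50% memory figure quoted above is not something the abstract states.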

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.17178 [cs.CV]
	(or arXiv:2411.17178v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.17178

Submission history

From: Rui Xie
[v1] Tue, 26 Nov 2024 07:32:36 UTC (15,973 KB)
