
SliderQuant: Accurate Post-Training Quantization for LLMs


By Shigeng Wang, Chao Li, Yangyuxuan Kang, Jiawei Fan, Zhonghong Ou and Anbang Yao.

This repository is the official PyTorch implementation of "SliderQuant: Accurate Post-Training Quantization for LLMs", accepted to ICLR 2026.

SliderQuant overview

SliderQuant (Sliding-layer Quantization) is a new learnable post-training quantization framework for LLMs, which consists of two key components:

  • Inter-layer sliding quantization couples three types of sliding-window designs to address the varying quantization sensitivity of the shallow, intermediate, and deep layers of any pre-trained LLM.
  • Intra-layer sliding quantization quantizes the layers inside the current sliding window in an incremental manner.
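As a rough illustration only (not the paper's actual algorithm — the window size, stride, and ordering below are invented for the sketch), the two components can be pictured as a window sliding over the layer stack, with the layers inside each window quantized one at a time:

```python
def sliding_quant_schedule(num_layers, window_size=4, stride=2):
    """Illustrative schedule for sliding-window quantization.

    Inter-layer sliding: a window of consecutive layers slides over the
    layer stack. Intra-layer sliding: layers inside the current window
    are quantized incrementally, each step conditioning on the layers
    already quantized before it. Returns (window, layer) pairs in order.
    """
    schedule = []
    start = 0
    while start < num_layers:
        window = tuple(range(start, min(start + window_size, num_layers)))
        for layer in window:  # incremental, one layer per step
            schedule.append((window, layer))
        start += stride
    return schedule

if __name__ == "__main__":
    for window, layer in sliding_quant_schedule(8, window_size=4, stride=2):
        print(window, "->", layer)
```

In this toy schedule, overlapping windows revisit layers so that later windows can refine them in the context of freshly quantized neighbors; how SliderQuant actually sizes and orders its three window types is described in the paper.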


Main Results

Language Generation

Table 1

Zero-Shot Commonsense Reasoning

Table 2

Methods With Extra Inference-Time Cost

Table 3

MoE Model Results

Table 4

Math Reasoning and Code Generation

Table 5

Model Zoo

The following checkpoints are planned for public release on Hugging Face:

Model        Quantization  Hugging Face
Llama2-13B   W4A4          SliderQuant-Llama2-13B-W4A4
Llama2-13B   W2A16         SliderQuant-Llama2-13B-W2A16
Qwen2.5-14B  W4A4          SliderQuant-Qwen2.5-14B-W4A4
Qwen2.5-14B  W2A16         SliderQuant-Qwen2.5-14B-W2A16

Once released, all checkpoints will be available under IntelLabsChina/SliderQuant.

Install

git clone https://github.com/genggng/sliderquant

mamba create -n sliderquant python=3.10 -y
mamba activate sliderquant

cd sliderquant
pip install -e .

How To Train

  1. Create a folder and place the experimental configuration file inside, following this structure:
sliderquant/
├── log-llama2
│   └── llama2-w4a4
│       └── config.yaml
  2. Edit task_list.conf to specify the result_dir.
result_dir=configs/llama2-7b-w2a16

result_dir=${exp_id}
GPU_NUM=1
port=29507
THRESHOLD=0.05
WAIT_MODE=true
WAIT_INTERVAL=60
  3. Start training:
./auto_train_ddp.sh

How To Test

  1. Edit task_list.conf to specify the result_dir.
result_dir=configs/llama2-7b-w2a16

GPU_NUM=1
port=29507
THRESHOLD=0.05
WAIT_MODE=true
WAIT_INTERVAL=60
  2. Run evaluation:
./auto_test_one.sh
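The task_list.conf snippets above are plain key=value lines consumed by the shell scripts. For reference, a minimal sketch of reading such a file (the helper name and behavior here are assumptions for illustration, not part of the repository):

```python
def read_task_conf(path):
    """Parse simple key=value lines from a task_list.conf-style file.

    Hypothetical helper: skips blank lines, comments, and lines without
    '=', and keeps the last assignment when a key repeats. The repo's
    own scripts read this file in shell, not Python.
    """
    conf = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            conf[key.strip()] = value.strip()
    return conf
```

Note that a value like ${exp_id} would arrive as the literal string "${exp_id}" here; shell-style variable expansion only happens when the file is sourced by the training scripts.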

Citation

If SliderQuant is useful in your research, please cite:

@inproceedings{wang2026sliderquant,
  title={SliderQuant: Accurate Post-Training Quantization for LLMs},
  author={Wang, Shigeng and Li, Chao and Kang, Yangyuxuan and Fan, Jiawei and Ou, Zhonghong and Yao, Anbang},
  booktitle={International Conference on Learning Representations},
  year={2026}
}

Acknowledgement

SliderQuant builds on code from the following projects:

We are grateful to the authors and maintainers of both projects for making their amazing code public.
