ReSA

Rectified Sparse Attention (ReSA)

📝 TL;DR

Rectified Sparse Attention (ReSA) improves sparse decoding by periodically refreshing the KV cache, achieving near-lossless quality and up to 2.42x speedup at 256K context length.

📝 Abstract

Efficient long-sequence generation is a critical challenge for Large Language Models. While recent sparse decoding methods improve efficiency, they suffer from KV cache misalignment, where approximation errors accumulate and degrade generation quality. In this work, we propose Rectified Sparse Attention (ReSA), a simple yet effective method that combines block-sparse attention with periodic dense rectification. By refreshing the KV cache at fixed intervals using a dense forward pass, ReSA bounds error accumulation and preserves alignment with the pretraining distribution. Experiments across math reasoning, language modeling, and retrieval tasks demonstrate that ReSA achieves near-lossless generation quality with significantly improved efficiency. Notably, ReSA delivers up to 2.42x end-to-end speedup under decoding at 256K sequence length, making it a practical solution for scalable long-context inference.

Method Overview

Performance Comparison on Math Reasoning Tasks

Setting	Gaokao2023En	Minerva	OlympiadBench	AIME24	AMC23	Avg
R1-Qwen-Distill 1.5B
Dense	71.6	28.7	40.8	27.4	65.6	46.82
Sparse	67.9	29.0	38.7	21.3	60.6	43.50
ReSA	71.8	28.1	39.5	23.0	65.4	45.56
Avg Length	4915.8	6390.8	8991.6	12126.4	7866.4	8058.2
R1-Qwen-Distill 7B
Dense	73.8	40.4	52.3	48.1	89.0	60.72
Sparse	72.9	38.1	48.4	46.1	83.1	57.72
ReSA	73.5	39.7	52.3	51.1	86.0	60.52
Avg Length	2889.9	4018.7	7520.0	10474.5	5732.2	6127.1

Table: Performance comparison on math reasoning tasks. ReSA maintains near-lossless performance, whereas sparse decoding alone degrades as decoding progresses.

🚀 Getting Started

Usage

Pretrained Model Preparation

Download the pretrained model to /path/to/pretrained/. For e.g.:

huggingface-cli download deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --local-dir /path/to/pretrained/ --local-dir-use-symlinks False

Math Evaluation

Evaluate on math data with:

bash scripts/local_eval_math.sh

Replace /path/to/pretrained/ in the script with your pretrained model path. Replace /path/to/output/ with the path to save evaluation results. Pass in the configuration of ReSA using save_feature.

Collect Math Evaluation Results

Install the needed packages:

bash scripts/setup_math_eval.sh

Collect math evaluation results from the output file:

bash scripts/math_eval_result.sh /path/to/output/ file_name

Replace file_name with your math result file, for e.g., DeepSeek-R1-Distill-Qwen-1.5B_resa_0.1_32_local.jsonl

📚 References

(To be completed)

Acknowledgements

This project incorporates code from Qwen2.5-Math. We appreciate the contributions of the original authors.

Name		Name	Last commit message	Last commit date
parent directory ..
figures		figures
llm		llm
math_data		math_data
scripts		scripts
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Rectified Sparse Attention (ReSA)

📝 TL;DR

📝 Abstract

Method Overview

Performance Comparison on Math Reasoning Tasks

🚀 Getting Started

Usage

Pretrained Model Preparation

Math Evaluation

Collect Math Evaluation Results

📚 References

Acknowledgements

FilesExpand file tree

ReSA

Directory actions

More options

Directory actions

More options

Latest commit

History

ReSA

Folders and files

parent directory

README.md

Rectified Sparse Attention (ReSA)

📝 TL;DR

📝 Abstract

Method Overview

Performance Comparison on Math Reasoning Tasks

🚀 Getting Started

Usage

Pretrained Model Preparation

Math Evaluation

Collect Math Evaluation Results

📚 References

Acknowledgements