
[NeurIPS 2025 Spotlight] Fast-Slow Thinking GRPO for Large Vision-Language Model Reasoning



Overview

This repository contains the official implementation of FAST-GRPO (Fast-Slow Thinking Group Relative Policy Optimization), which applies fast-slow thinking to both visual and textual reasoning with strong benchmark performance.


Installation

Setup Environment

# Clone the repository
git clone https://github.com/Mr-Loevan/FAST-GRPO.git
cd FAST-GRPO

# Create conda environment
conda create -n fast_grpo python=3.11
conda activate fast_grpo

# Install dependencies (refer to the EasyR1 installation guide)
pip install -r requirements.txt
pip install -e .

Quick Start

# Run training with default configuration
bash examples/train_fast_llm.sh

Core Components

FAST-GRPO introduces three key innovations that work together to enable fast-slow reasoning:

1. Thinking Reward Function

The Thinking Reward Function (examples/reward_function/thinking_reward.py) implements an adaptive, difficulty-aware reward mechanism (sketched after the list below):

  • Adaptive Difficulty: difficulty = (1 - pass_rate) * normalized_complexity
  • Differentiated Rewards:
    • Easy problems (below the 80th difficulty percentile) with a correct answer: rewards concise solutions
    • Hard problems (above the 80th difficulty percentile) with an incorrect answer: rewards exploration effort
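
A minimal Python sketch of how such a reward could be computed. The function name compute_thinking_reward, the length normalization, and the exact scores are illustrative assumptions, not the repository's actual implementation:

import numpy as np

def compute_thinking_reward(correct, response_len, pass_rate, complexity,
                            group_difficulties, max_len=8192):
    """Hypothetical sketch of a difficulty-aware thinking reward."""
    # Difficulty as defined above: harder when the pass rate is low
    # and the (pre-normalized, [0, 1]) complexity is high.
    difficulty = (1.0 - pass_rate) * complexity
    # 80th-percentile cut-off over the current group's difficulties.
    threshold = np.percentile(group_difficulties, 80)

    if difficulty < threshold and correct:
        # Easy + correct: pay for conciseness (shorter responses score higher).
        return 1.0 - response_len / max_len
    if difficulty >= threshold and not correct:
        # Hard + incorrect: pay for exploration (longer reasoning scores higher).
        return response_len / max_len
    return 0.0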
2. Dynamic KL Penalty

Implements group-based adaptive KL divergence control for stable training:

# Configuration in config.yaml
algorithm:
  kl_penalty: low_var_kl
  kl_coef: 1.0e-2
  kl_type: "group_accuracy_based"
  kl_min_coef: 0.001  # β_min
  kl_max_coef: 0.01   # β_max
  • Group-based Adaptation: adjusts the KL coefficient between β_min and β_max based on each group's accuracy (see the sketch below)
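
One plausible group-accuracy-based rule is sketched below; the function name adaptive_kl_coef and the linear interpolation are assumptions for illustration, not necessarily the exact update used in the code:

def adaptive_kl_coef(group_accuracy, kl_min_coef=0.001, kl_max_coef=0.01):
    """Hypothetical sketch: interpolate the KL coefficient from group accuracy."""
    # High accuracy -> the group already answers well, so pull harder toward
    # the reference policy; low accuracy -> relax the penalty to allow exploration.
    acc = min(max(group_accuracy, 0.0), 1.0)
    return kl_min_coef + (kl_max_coef - kl_min_coef) * acc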
3. Slow2Fast Sampling

Progressive curriculum learning that gradually increases training difficulty:

# Configuration in config.yaml
algorithm:
  online_filtering: true
  filter_key: accuracy
  dynamic_filter_schedule:
    - epoch_ratio: 0.5   # phase 1: first half of training
      filter_low: 0.3
      filter_high: 0.99
    - epoch_ratio: 1.0   # phase 2: second half of training
      filter_low: 0.01
      filter_high: 0.7
  • Phase 1 (0-50% of training): learn from medium-to-high difficulty samples for slow thinking
  • Phase 2 (50-100% of training): include easy samples for fast thinking (see the filtering sketch below)
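
A minimal sketch of how the online filter could apply this schedule, assuming per-group accuracies are available; the function name filter_batch and the schedule handling below are illustrative assumptions:

def filter_batch(samples, accuracies, progress, schedule):
    """Hypothetical sketch of Slow2Fast online filtering."""
    # `progress` is the fraction of training completed (0.0 to 1.0);
    # `schedule` mirrors dynamic_filter_schedule from config.yaml.
    for stage in schedule:
        if progress <= stage["epoch_ratio"]:
            break
    low, high = stage["filter_low"], stage["filter_high"]
    # Keep only samples whose group accuracy falls inside the active window.
    return [s for s, acc in zip(samples, accuracies) if low <= acc <= high]

schedule = [
    {"epoch_ratio": 0.5, "filter_low": 0.3,  "filter_high": 0.99},  # slow-thinking phase
    {"epoch_ratio": 1.0, "filter_low": 0.01, "filter_high": 0.7},   # fast-thinking phase
]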

Training

Run Training Example

# Use provided script (recommended)
bash examples/train_fast_llm.sh

Model Zoo

| Model     | Base Model                    | Download    |
|-----------|-------------------------------|-------------|
| FAST-1.5B | DeepSeek-R1-Distill-Qwen-1.5B | ModelScope  |
| FAST-3B   | Qwen-2.5-VL-3B                | ModelScope  |
| FAST-7B   | Qwen-2.5-VL-7B                | Coming Soon |
| FAST-4B   | Qwen-3-VL-4B                  | Coming Soon |
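
For reference, released checkpoints on ModelScope can typically be fetched with the ModelScope SDK; the model ID below is a placeholder, since the table does not list exact repository IDs:

from modelscope import snapshot_download

# Placeholder model ID: substitute the actual ID from the ModelScope links above.
model_dir = snapshot_download("<namespace>/FAST-3B")
print(model_dir)  # local path to the downloaded checkpoint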

Evaluation Results

Performance on Reasoning Benchmarks

| Method    | GSM8K (Acc) | GSM8K (Length) | MATH 500 (Acc) | MATH 500 (Length) | AIME 2024 (Acc) | AIME 2024 (Length) |
|-----------|-------------|----------------|----------------|-------------------|-----------------|--------------------|
| FAST-1.5B | 86.8        | 851            | 85.8           | 2645              | 34.17           | 8003               |

Note: Length denotes the number of generated tokens.

Citation

If you find this work useful, please cite our paper:

@inproceedings{xiao2025fastslow,
  title={Fast-Slow Thinking {GRPO} for Large Vision-Language Model Reasoning},
  author={Wenyi Xiao and Leilei Gan},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=MI1uT5rReV}
}

License

This project is licensed under the Apache 2.0 License.

Acknowledgments

  • The results reported in our paper were originally implemented with OpenRLHF.
  • This repository provides a reimplementation using the EasyR1 framework.
  • Thanks to the VeRL and EasyR1 teams for the base training framework.
