
DualDistill


Official implementation of DualDistill: A trajectory-composition distillation method for integrating tool use into long-chain-of-thought reasoning.

Weihua Du, Pranjal Aggarwal, Sean Welleck, & Yiming Yang
"Agentic-R1: Distilled Dual-Strategy Reasoning." (2025)

Key Features

  • Efficient Training: Integrates tool use into long-chain-of-thought (CoT) reasoning using only 4 × A6000 GPUs
  • Unified Reasoning: Fuses heterogeneous reasoning traces from multiple teacher models into a single student model
Overview of the DualDistill methodology (figure)
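The core idea, trajectory composition, interleaves segments from a tool-using teacher and a pure-reasoning teacher into a single training trajectory for the student. The sketch below is purely illustrative; the segment structure and the `<think>`/`<code>` tags are assumptions, not the repository's exact trace format:

```python
# Illustrative sketch of trajectory composition: alternate segments from a
# reasoning teacher and a tool-use teacher into one trace.
# The tag names are hypothetical, not the repository's actual format.
def compose_trajectory(reasoning_segments, tool_segments):
    """Interleave pure-reasoning and tool-use segments into one trajectory string."""
    trajectory = []
    for reasoning, tool in zip(reasoning_segments, tool_segments):
        trajectory.append(f"<think>{reasoning}</think>")
        trajectory.append(f"<code>{tool}</code>")
    return "\n".join(trajectory)

composed = compose_trajectory(
    ["Factor the polynomial first."],
    ["import sympy; print(sympy.factor('x**2 - 1'))"],
)
```

The composed trace lets the student learn when to switch between textual reasoning and code execution within one rollout.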

Datasets

| Dataset | Description | Link |
| --- | --- | --- |
| Training Set | Complete training dataset with teacher trajectories | 🤗 HuggingFace |
| Test Set | Evaluation benchmarks | `dataset/test/` |

Results

Performance comparison of Agentic-R1 models (figure)
  • Agentic-R1 demonstrates significant performance gains on DeepMath-L and Combinatorics300, where both complex reasoning and tool use are crucial for success.
  • Agentic-R1-SD (Self-Distilled) further enhances performance through our self-distillation approach, consistently outperforming baseline models across nearly all evaluation tasks.

Quick Start

Installation

  1. Clone the repository:

    git clone https://github.com/StigLidu/DualDistill.git
    cd DualDistill
  2. Create environment (optional but recommended):

    conda create -n dualdistill python=3.11
    conda activate dualdistill
  3. Install dependencies:

    pip install -r requirements.txt
    pip install flash-attn --no-build-isolation

Training Pipeline

Step 1: Model & Data Preparation

Download the base model:

python script/data_script/model_download.py \
  --repo_id deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
  --local_dir models

Prepare training data:

python script/data_script/teacher_data_download.py

Step 2: Teacher Distillation

Train the student model using teacher trajectories:

bash script/sft_script/SFT.sh
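A standard detail in supervised fine-tuning scripts like this is masking the loss so that only response tokens contribute. The sketch below shows that labeling step; the `-100` ignore index follows the Hugging Face convention, but the script's actual internals are an assumption:

```python
# Hugging Face convention: label positions set to -100 are excluded from the
# cross-entropy loss, so the model is trained only on the response tokens.
IGNORE_INDEX = -100

def build_labels(input_ids, prompt_len):
    """Copy input_ids, masking the prompt so loss is computed on the response only."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels

# Example: 3 prompt tokens followed by 2 response tokens
labels = build_labels([101, 2003, 2054, 7592, 102], prompt_len=3)
# → [-100, -100, -100, 7592, 102]
```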

Step 3: Self-Distillation

Start inference server:

bash script/eval_script/start_inference_server.sh [model_path] [display_name] [port]

Sample self-distillation trajectories:

python sft/self_distillation_sampler.py \
  --server_url http://localhost:$port/v1 \
  --model_name [display_name] \
  --model_path [model_path] \
  --save_path [path_to_save_trajectories]
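Self-distillation keeps only the student's own successful trajectories for the next training round. A minimal sketch of that filtering step, assuming hypothetical record fields (`problem_id`, `answer`, `trajectory`) rather than the sampler's actual output schema:

```python
def filter_correct(samples, reference_answers):
    """Keep sampled trajectories whose final answer matches the reference answer."""
    kept = []
    for sample in samples:
        ref = reference_answers.get(sample["problem_id"])
        if ref is not None and sample["answer"] == ref:
            kept.append(sample)
    return kept

samples = [
    {"problem_id": 1, "answer": "42", "trajectory": "..."},
    {"problem_id": 2, "answer": "7", "trajectory": "..."},
]
kept = filter_correct(samples, {1: "42", 2: "8"})
# → only the trajectory for problem 1 survives
```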

Prepare self-distillation data:

# Extract teacher solutions
python script/data_script/extract_training_solution.py

# Construct training dataset
python script/data_script/processing_self_distillation_traj.py
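The dataset-construction step amounts to wrapping each problem/trajectory pair into a training record. The sketch below uses the common chat-message format (user/assistant roles); the field names are assumptions, not the script's exact output:

```python
def to_chat_example(problem, trajectory):
    """Wrap a problem/trajectory pair in a chat-format SFT record."""
    return {
        "messages": [
            {"role": "user", "content": problem},
            {"role": "assistant", "content": trajectory},
        ]
    }

example = to_chat_example(
    "Compute 2^10.",
    "<think>2^10 = 1024</think> The answer is 1024.",
)
```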

Fine-tune on self-distillation data:

bash script/sft_script/expert_iteration.sh [model_path] [data_path] [save_path]

Model Evaluation

Start Inference Server

bash script/eval_script/start_inference_server.sh [model_path] [display_name] [port]

Run Evaluation

bash script/eval_script/eval_remote_server.sh \
  [url] [display_name] [data_path] [code_mode] [max_token]

Example:

bash script/eval_script/eval_remote_server.sh \
  "http://localhost:8080/v1" "agentic-r1" "dataset/test/math.json" "true" "4096"
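Under the hood, evaluation reduces to extracting each model's final answer and comparing it with the reference. A minimal sketch, assuming the standard `\boxed{}` convention for math benchmarks (the repository's exact extraction and matching logic may differ):

```python
import re

def extract_boxed(text):
    """Return the content of the last \\boxed{...} in a response, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

def accuracy(responses, references):
    """Fraction of responses whose extracted answer matches the reference."""
    correct = sum(extract_boxed(r) == ref for r, ref in zip(responses, references))
    return correct / len(references)

acc = accuracy([r"So the answer is \boxed{10}.", r"\boxed{3}"], ["10", "4"])
# → 0.5
```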

Trained Models

| Model | Description | HuggingFace Link |
| --- | --- | --- |
| Agentic-R1-7B | Base model with teacher distillation | 🤗 Download |
| Agentic-R1-7B-SD | Enhanced model with self-distillation | 🤗 Download |

⚠️ Important Notes

  • Code Execution Safety: The evaluation scripts execute model-generated code locally. Only run code produced by models you trust, and consider sandboxing execution.
  • Inference Config: If you use a recent version of vLLM and hit an error about the maximum context length, you may need to adjust model_max_length in the model's tokenizer_config.json.
  • Self-Distillation Warning: The self-distillation step requires sampling many trajectories and can be time-consuming.
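For the vLLM context-length issue above, the relevant field in the model directory's tokenizer_config.json looks like the fragment below; the value shown is illustrative, so set it to the context length your deployment actually supports:

```json
{
  "model_max_length": 32768
}
```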

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

We thank the following open-source projects for their foundational contributions:

Contact

For questions or support, please contact:

Citation

If you find our work useful, please consider citing:

@article{du2025agentic,
  title={Agentic-R1: Distilled Dual-Strategy Reasoning},
  author={Du, Weihua and Aggarwal, Pranjal and Welleck, Sean and Yang, Yiming},
  journal={arXiv preprint arXiv:2507.05707},
  year={2025}
}

⭐ Star us on GitHub if this project helped you!

About

[EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"
