Official implementation of DualDistill: A trajectory-composition distillation method for integrating tool use into long-chain-of-thought reasoning.
Weihua Du, Pranjal Aggarwal, Sean Welleck, & Yiming Yang
"Agentic-R1: Distilled Dual-Strategy Reasoning." (2025)
- Efficient Training: Integrates tool use into long-chain-of-thought (CoT) reasoning using only 4 × A6000 GPUs
- Unified Reasoning: Fuses heterogeneous reasoning traces from multiple teacher models into a single student model
| Dataset | Description | Link |
|---|---|---|
| Training Set | Complete training dataset with teacher trajectories | 🤗 HuggingFace |
| Test Set | Evaluation benchmarks | dataset/test/ |
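For a quick look at the evaluation data, the local test files can be loaded with the `datasets` library. This is a minimal sketch, assuming the files under `dataset/test/` are JSON/JSONL (as `dataset/test/math.json`, used in the evaluation example below, suggests); the exact schema is defined by the repo's own scripts.

```python
# Minimal sketch: inspect a local test file (assumes JSON/JSONL format).
from datasets import load_dataset

test_set = load_dataset("json", data_files="dataset/test/math.json", split="train")
print(test_set)      # number of rows and column names
print(test_set[0])   # first example
```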
- Agentic-R1 demonstrates significant performance gains on DeepMath-L and Combinatorics300, where both complex reasoning and tool use are crucial for success.
- Agentic-R1-SD (Self-Distilled) further enhances performance through our self-distillation approach, consistently outperforming baseline models across nearly all evaluation tasks.
- Clone the repository:

  ```bash
  git clone https://github.com/StigLidu/DualDistill.git
  cd DualDistill
  ```

- Create environment (optional but recommended):

  ```bash
  conda create -n dualdistill python=3.11
  conda activate dualdistill
  ```

- Install dependencies (a quick environment check follows this list):

  ```bash
  pip install -r requirements.txt
  pip install flash-attn --no-build-isolation
  ```
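A quick way to confirm the environment is usable. This is only a sketch: it checks that PyTorch sees a GPU and that `flash-attn` imports, nothing repo-specific.

```python
# Sanity check after installation: verify PyTorch, CUDA, and flash-attn.
import torch
import flash_attn

print("torch", torch.__version__, "| cuda available:", torch.cuda.is_available())
print("flash-attn", flash_attn.__version__)
```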
Download the base model:

```bash
python script/data_script/model_download.py \
    --repo_id deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
    --local_dir models
```
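If you prefer to skip the helper script, the same checkpoint can be fetched directly with `huggingface_hub`. A sketch only: the target directory under `models/` is an assumption, and `model_download.py` remains the supported path.

```python
# Sketch: download the base model with huggingface_hub instead of the helper script.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    local_dir="models/DeepSeek-R1-Distill-Qwen-7B",  # assumed layout under models/
)
```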
Prepare training data:

```bash
python script/data_script/teacher_data_download.py
```
Train the student model using teacher trajectories:

```bash
bash script/sft_script/SFT.sh
```
Start the inference server:

```bash
bash script/eval_script/start_inference_server.sh [model_path] [display_name] [port]
```
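Once the server is up, it can be queried over an OpenAI-compatible API, which is what the self-distillation sampler's `--server_url http://localhost:$port/v1` argument implies. A minimal sketch, assuming port 8080 and a display name of `agentic-r1`:

```python
# Sketch: send one request to the local inference server (OpenAI-compatible API assumed).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="agentic-r1",  # must match the [display_name] used when starting the server
    messages=[{"role": "user", "content": "Compute 3^5 - 2^7."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```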
Sample self-distillation trajectories:

```bash
python sft/self_distillation_sampler.py \
    --server_url http://localhost:$port/v1 \
    --model_name [display_name] \
    --model_path [model_path] \
    --save_path [path_to_save_trajectories]
```
Prepare self-distillation data:

```bash
# Extract teacher solutions
python script/data_script/extract_training_solution.py

# Construct training dataset
python script/data_script/processing_self_distillation_traj.py
```
Fine-tune on self-distillation data:

```bash
bash script/sft_script/expert_iteration.sh [model_path] [data_path] [save_path]
```

To evaluate a model, start an inference server:

```bash
bash script/eval_script/start_inference_server.sh [model_path] [display_name] [port]
```

Then run the evaluation script against it:

```bash
bash script/eval_script/eval_remote_server.sh \
    [url] [display_name] [data_path] [code_mode] [max_token]
```

Example:

```bash
bash script/eval_script/eval_remote_server.sh \
    "http://localhost:8080/v1" "agentic-r1" "dataset/test/math.json" "true" "4096"
```

| Model | Description | HuggingFace Link |
|---|---|---|
| Agentic-R1-7B | Base model with teacher distillation | 🤗 Download |
| Agentic-R1-7B-SD | Enhanced model with self-distillation | 🤗 Download |
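To experiment with a released checkpoint outside the provided scripts, it can be loaded with `transformers`. A sketch only: `YOUR_ORG/Agentic-R1-7B` is a placeholder, not the real repository id; use the download links in the table above.

```python
# Sketch: load a released checkpoint with transformers (repo id below is a placeholder).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YOUR_ORG/Agentic-R1-7B"  # placeholder -- use the HuggingFace link from the table
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
```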
- Code Execution Safety: The evaluation scripts execute model-generated code locally. Run only models you trust.
- Inference Config: If you are using a recent version of vLLM and encounter an error about the maximum context length, you may need to adjust `model_max_length` in `tokenizer_config.json` (see the sketch after these notes).
- Self-Distillation Warning: The self-distillation step requires sampling many trajectories and can be time-consuming.
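The context-length note above can be handled with a small script along these lines. A sketch only: the checkpoint path and the value 32768 are assumptions; back up `tokenizer_config.json` before editing, and pick a limit your hardware and vLLM version support.

```python
# Sketch: raise model_max_length in tokenizer_config.json (path and value are assumptions).
import json
from pathlib import Path

cfg_path = Path("models/DeepSeek-R1-Distill-Qwen-7B/tokenizer_config.json")
cfg = json.loads(cfg_path.read_text())
cfg["model_max_length"] = 32768  # example value
cfg_path.write_text(json.dumps(cfg, indent=2))
```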
This project is licensed under the MIT License - see the LICENSE file for details.
We thank the following open-source projects for their foundational contributions:
- OpenHands - Agent framework
- DeepMath-103K - Mathematical reasoning dataset
- vLLM - High-performance inference engine
For questions or support, please contact:
- Weihua Du: [email protected]
If you find our work useful, please consider citing:
```bibtex
@article{du2025agentic,
  title={Agentic-R1: Distilled Dual-Strategy Reasoning},
  author={Du, Weihua and Aggarwal, Pranjal and Welleck, Sean and Yang, Yiming},
  journal={arXiv preprint arXiv:2507.05707},
  year={2025}
}
```

⭐ Star us on GitHub if this project helped you!

