News • Introduction • Main Results
Getting Started • Usage • Data Generation • Models • Development • Contact • Citation
- [2025-08-21] ReSo has been accepted to EMNLP 2025! See you in Suzhou in November.
ReSo is a comprehensive framework for multi-step mathematical and scientific reasoning. It combines a self-organizing multi-agent architecture with reward-driven optimization to plan, solve, and refine solutions iteratively.
Key capabilities:
- Agent graph for task decomposition and collaboration
- Reward modeling for iterative self-optimization
- Modular LLM backends and configurable pipelines
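The loop below is a purely conceptual sketch of this plan-solve-refine cycle (all function names are hypothetical and do not correspond to ReSo's actual API):

```python
# Conceptual plan -> solve -> refine loop driven by a reward signal.
from typing import Callable

def plan_solve_refine(task: str,
                      plan: Callable[[str], list[str]],
                      solve: Callable[[str, dict], str],
                      reward: Callable[[str, str], float],
                      max_rounds: int = 3) -> dict:
    subtasks = plan(task)                      # decompose into sub-questions
    answers: dict[str, str] = {}
    for _ in range(max_rounds):
        worst, worst_score = None, 1.0
        for sub in subtasks:
            if sub not in answers:
                answers[sub] = solve(sub, answers)
            score = reward(sub, answers[sub])  # reward model feedback
            if score < worst_score:
                worst, worst_score = sub, score
        if worst_score >= 0.9:                 # all steps look good: stop
            break
        answers.pop(worst)                     # re-solve the weakest step
    return answers
```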
ReSo achieves 30% higher accuracy than competing frameworks, demonstrating superior performance on challenging reasoning tasks.
Requirements:
- Python 3.10+
- CUDA-compatible GPU (recommended)
- Git
- Clone the repository

```bash
git clone <repository-url>
cd ReSo/
```

- Create and activate the environment

```bash
conda create -n ReSo python=3.10 -y
conda activate ReSo
pip install -r requirements.txt
```

- Configure API keys (optional, if using external LLMs)
Create and edit your environment file:
```bash
cp .env.template .env
```

Fill `.env` with your credentials:

```
# OpenAI
OAI_API_KEY=your_openai_api_key
OAI_BASE_URL=https://api.openai.com/v1

# Qwen
QWEN_API_KEY=your_qwen_api_key
QWEN_BASE_URL=your_qwen_base_url

# Claude
CLAUDE_API_KEY=your_claude_api_key
CLAUDE_BASE_URL=your_claude_base_url

# Gemini
GEMINI_API_KEY=your_gemini_api_key
GEMINI_BASE_URL=your_gemini_base_url

# DeepSeek
DEEPSEEK_API_KEY=your_deepseek_api_key
DEEPSEEK_BASE_URL=your_deepseek_base_url
```
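To confirm the keys are picked up before launching a run, a minimal check like the following works (it assumes `python-dotenv` and the `openai` package are available; nothing here is ReSo-specific):

```python
# Minimal sanity check that .env is loaded and an OpenAI-compatible client
# can be constructed from it. Adjust the variable names to match your .env.
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads .env from the current working directory

client = OpenAI(
    api_key=os.environ["OAI_API_KEY"],
    base_url=os.environ.get("OAI_BASE_URL", "https://api.openai.com/v1"),
)
print("OpenAI key loaded:", bool(client.api_key))
```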
Project structure:

```
ReSo/
├── ReSo/                 # Core framework modules
│   ├── agent_graph/      # Agent graph implementation
│   ├── llm_agent/        # LLM agent components
│   ├── model/            # Custom model implementations
│   └── task_graph/       # Task graph management
├── datasets/             # Data synthesis and storage
│   ├── data_gen.py       # Complex problem generator
│   ├── get_answer.py     # Answer extraction utilities
│   ├── sub_question/     # Base sub-question datasets
│   ├── MATH-MAS/         # MATH-MAS datasets
│   └── Scibench-MAS/     # Science benchmark datasets
├── experiments/          # Training and evaluation scripts
├── reward_model/         # Reward model training & usage
├── config.ini            # Model & agent configuration
├── config_hyper.ini      # Training hyperparameters
└── requirements.txt      # Python dependencies
```
Train on your dataset:

```bash
python experiments/train_ReSo.py --dataset_path <path_to_training_data>
```

Notes:
- Configure training hyperparameters in `config_hyper.ini` (see the snippet below for listing them).
- Adjust model/agent settings in `config.ini`.
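If you want to see exactly which hyperparameters are currently set without guessing key names, the standard-library `configparser` can dump them; nothing ReSo-specific is assumed here:

```python
# List every section and option defined in config_hyper.ini.
import configparser

config = configparser.ConfigParser()
config.read("config_hyper.ini")

for section in config.sections():
    print(f"[{section}]")
    for key, value in config[section].items():
        print(f"  {key} = {value}")
```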
MATH-MAS benchmarks:

```bash
# Easy
python experiments/test_ReSo.py --dataset_path datasets/MATH-MAS/MATH-MAS-Easy.json --plan_mode gt

# Medium
python experiments/test_ReSo.py --dataset_path datasets/MATH-MAS/MATH-MAS-Medium.json --plan_mode gt

# Hard
python experiments/test_ReSo.py --dataset_path datasets/MATH-MAS/MATH-MAS-Hard.json --plan_mode gt
```

GSM8K:

```bash
python experiments/test_gsm8k.py --dataset_path <gsm8k_dataset_path>
```

Common flags:
- `--dataset_path`: Path to the dataset file
- `--plan_mode`: Planning mode (`gt` for ground truth)
- `--random_select`: Randomized selection (optional)
- `--error_tolerance`: Error threshold (optional)
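To run the three MATH-MAS difficulty levels back to back, a small driver such as the one below can wrap the CLI shown above (only the flags already documented are used):

```python
# Run Easy, Medium, and Hard evaluations sequentially.
import subprocess

for level in ("Easy", "Medium", "Hard"):
    dataset = f"datasets/MATH-MAS/MATH-MAS-{level}.json"
    subprocess.run(
        ["python", "experiments/test_ReSo.py",
         "--dataset_path", dataset,
         "--plan_mode", "gt"],
        check=True,  # stop if any run fails
    )
```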
Create complex multi-step problems using the generator.

Base sub-question datasets are located in `datasets/sub_question/`:
- `math_test.json` (math)
- `scibench.json` (science)

Each entry contains a prompt, answer, variables, and metadata.
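To see the exact fields in practice, it is easiest to inspect the file directly; the snippet below only assumes the file is a JSON list (or mapping) of entry objects:

```python
# Print the field names of the first sub-question entry in math_test.json.
import json

with open("datasets/sub_question/math_test.json", encoding="utf-8") as f:
    data = json.load(f)

first = data[0] if isinstance(data, list) else next(iter(data.values()))
print("Fields:", list(first.keys()))
```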
Generate a dataset:

```bash
python datasets/data_gen.py -n <num_questions> -c <complexity_level> [-o <output_file>]
```

Examples:

```bash
# 100 questions, 3 sub-questions each
python datasets/data_gen.py -n 100 -c 3

# 50 questions, 5 sub-questions each, custom output
python datasets/data_gen.py -n 50 -c 5 -o datasets/mixed/complex_dataset.json
```

Generation pipeline (illustrated by the sketch below):
- DAG construction for dependency structure
- Linking sub-questions via variables/answers
- Integration into a final composite task
- Validation for consistency and solvability
See `datasets/README.md` for details.
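As a rough illustration of the composition step (not the actual `data_gen.py` implementation, and with purely illustrative field names), sub-questions can be chained through a DAG so that each child consumes its parents' answers:

```python
# Toy composition: order sub-questions topologically, then merge their prompts
# into one composite problem whose final answer is the last node's answer.
from graphlib import TopologicalSorter

sub_questions = {
    "q1": {"prompt": "Compute x = 3 + 4.", "answer": 7, "needs": []},
    "q2": {"prompt": "Compute y = 2 * x.", "answer": 14, "needs": ["q1"]},
    "q3": {"prompt": "Compute z = y - x.", "answer": 7, "needs": ["q1", "q2"]},
}

dag = {name: set(q["needs"]) for name, q in sub_questions.items()}
order = list(TopologicalSorter(dag).static_order())  # parents before children

composite_prompt = " ".join(sub_questions[name]["prompt"] for name in order)
final_answer = sub_questions[order[-1]]["answer"]
print(composite_prompt)
print("Final answer:", final_answer)
```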
We provide fine-tuned models on Hugging Face:
- Plan model for multi-step planning
- CRM (Critic-Reward Model) for evaluation and optimization
Browse: https://huggingface.co/henggg/ReSo/tree/main
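The checkpoints can be fetched programmatically with `huggingface_hub` (assumed installed; the repo id comes from the URL above):

```python
# Download all files from the ReSo model repository to the local HF cache.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="henggg/ReSo")
print("Models downloaded to:", local_dir)
```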
Key features:
- Agent graph for structured collaboration
- Automatic task decomposition
- Coordinated solving across agents
- Reward modeling for quality assessment
- Iterative refinement and error detection
- Supports custom reward models
- Modular design for new models/strategies
- Multiple LLM providers (OpenAI, Claude, Gemini, Qwen, DeepSeek, etc.)
- Configurable pipelines and behaviors
ReSo shows strong performance on MATH, GSM8K, and science benchmarks. Refer to the paper for full metrics.
To add a new LLM backend (a hypothetical wrapper is sketched after this list):
- Implement the interface in `ReSo/llm_agent/`
- Add options in `config.ini`
- Register it in `ReSo/llm_agent/model_info.py`
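A minimal, hypothetical shape for such a wrapper is shown below; the real interface in `ReSo/llm_agent/` may expect different method names, so treat this as a starting point rather than a drop-in (class and method names are invented, and an OpenAI-compatible endpoint is assumed):

```python
# Hypothetical backend wrapper around an OpenAI-compatible chat endpoint.
import os
from openai import OpenAI

class MyBackend:
    def __init__(self, model_name: str, base_url: str | None = None):
        self.model_name = model_name
        self.client = OpenAI(api_key=os.environ["OAI_API_KEY"], base_url=base_url)

    def generate(self, prompt: str, temperature: float = 0.2) -> str:
        """Return the model's text completion for a single-turn prompt."""
        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
        )
        return response.choices[0].message.content
```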
To add a custom reward model (an illustrative head is sketched after this list):
- Define the architecture in `ReSo/model/`
- Implement training in `reward_model/train.py`
- Add evaluation in `reward_model/test.py`
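Purely as an illustration of where a custom scoring head could plug in (PyTorch is assumed, and the actual architecture in `ReSo/model/` is likely different):

```python
# Toy reward head: maps a pooled text embedding to a scalar reward in (0, 1).
import torch
import torch.nn as nn

class RewardHead(nn.Module):
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, pooled_embedding: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.mlp(pooled_embedding)).squeeze(-1)

# Score a batch of 4 pooled embeddings.
head = RewardHead()
print(head(torch.randn(4, 768)).shape)  # torch.Size([4])
```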
To extend data generation:
- Add formats in `datasets/sub_question/`
- Update generation logic in `datasets/data_gen.py`
- Update answer parsing in `datasets/get_answer.py`
Issues and PRs are welcome. Please follow standard code style, add tests when changing behavior, and update docs when relevant.
- Open an issue on GitHub
- Email: [email protected]
If you find ReSo helpful, please cite our paper:
```bibtex
@article{zhou2025reso,
  title={ReSo: A reward-driven self-organizing {LLM}-based multi-agent system for reasoning tasks},
  author={Zhou, Heng and Geng, Hejia and Xue, Xiangyuan and Kang, Li and Qin, Yiran and Wang, Zhiyong and Yin, Zhenfei and Bai, Lei},
  journal={arXiv preprint arXiv:2503.02390},
  year={2025}
}
```

