verl-recipe is a set of examples based on verl for end-to-end RL training recipes.
All recipe should follow the following structure:
README.md: recipe descriptioncode: recipe codescript: reproducible training script
Specifically, README.md should contain following sections:
- Installation: which verl version is required for this recipe?
# release version
pip install verl==0.6.0
# dev version
pip install verl@git+https://github.com/volcengine/verl.git@313dfdb2199124a37189e32e6d4a6c654379f2d4
- Training: how to train
- Evaluation: performance metrics
- Citation: paper, notion, blog, etc.
We rely on pre-commit to keep our code consistent. To set it up:
pip install pre-commit
pre-commit install
# for staged changes
pre-commit run
# for all files in the repo
pre-commit run --all-files
# run a specific hook with pre-commit
# pre-commit run --all-files --show-diff-on-failure --color=always <hood-id>
pre-commit run --all-files --show-diff-on-failure --color=always ruff
pre-commit run --all-files --show-diff-on-failure --color=always autogen-trainer-cfg- retool: Reinforcement Learning for Strategic Tool Use in LLMs
- langgraph_agent: A tiny example to demonstrate multi-turn rollout with LangGraph ReactAgent to solve math expression.
- spo: Single-stream Policy Optimization.