Authors: Andre He, Daniel Fried, Sean Welleck
This repository contains the official implementation for the paper "Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening".
This codebase is based on veRL and is designed for RL training of Lean-based theorem-proving models. The Lean verifier is integrated into the online RL loop to provide rewards for model-generated proofs.
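In outline, the reward signal works like this: each model-generated proof is sent to the Lean verifier, and the binary pass/fail result becomes the reward. A minimal sketch of this idea, where `verify_proof` is a hypothetical stand-in for the actual verifier call in `verl/lean/verifier.py`:

```python
# Sketch of a verifier-backed reward, assuming a binary pass/fail check.
# `verify_proof` is a hypothetical stand-in for the real Lean verifier call.
from typing import Callable, List

def compute_rewards(
    proofs: List[str],
    verify_proof: Callable[[str], bool],
) -> List[float]:
    """Assign reward 1.0 to proofs the verifier accepts, 0.0 otherwise."""
    return [1.0 if verify_proof(p) else 0.0 for p in proofs]

# Usage with a stub verifier that accepts proofs ending in a closing tactic:
stub = lambda proof: proof.strip().endswith("rfl")
print(compute_rewards(["example : 1 = 1 := by rfl", "sorry"], stub))
# → [1.0, 0.0]
```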
Follow the installation and setup instructions provided in the veRL repository to initialize your environment.
This project uses a verifier adapted from DeepSeek-Prover-V1.5. To get started:
- Clone and install the DeepSeek-Prover-V1.5 repository.
- Either install it globally or update the path to the verifier in `verl/lean/verifier.py`.
- Install the Lean REPL, which is required to interact with the Lean environment during proof checking.
- To launch RL training with GRPO, run the script `examples/lean/grpo.sh`.
- The core training loop and the implementation of the unlikeliness reward are in `verl/trainer/ppo/ray_lean_trainer.py`.
- To evaluate trained checkpoints via sampling, run `examples/inference/run_sampling.sh`.
- To analyze results, compute pass rates, and generate performance plots, see `misc/paper_figures.ipynb`.
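For orientation, the core idea of the unlikeliness reward is to counteract GRPO's tendency to sharpen onto already-likely solutions by boosting the reward of correct proofs that the policy assigns low probability. The sketch below is illustrative only, not the implementation in `ray_lean_trainer.py`; the rank-based rule and the `boost` coefficient are assumptions:

```python
# Illustrative sketch of an unlikeliness-style reward: among correct samples
# in a group, boost those the policy considered least likely. The rank-based
# rule and the `boost` value are assumptions, not the paper's exact formula.
from typing import List

def unlikeliness_rewards(
    correct: List[bool],
    logprobs: List[float],
    boost: float = 0.5,
) -> List[float]:
    """Base reward 1.0 for correct samples; add `boost` to correct samples
    whose sequence log-probability falls below the median of the correct
    group, so rare-but-valid proofs are reinforced more strongly."""
    correct_lps = sorted(lp for ok, lp in zip(correct, logprobs) if ok)
    if not correct_lps:
        return [0.0] * len(correct)
    median = correct_lps[len(correct_lps) // 2]
    return [
        (1.0 + (boost if lp < median else 0.0)) if ok else 0.0
        for ok, lp in zip(correct, logprobs)
    ]

# The unlikely-but-correct second sample gets the boosted reward:
print(unlikeliness_rewards([True, True, False], [-5.0, -50.0, -10.0]))
# → [1.0, 1.5, 0.0]
```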
To train on your own collection of Lean theorems, you must first convert your dataset into a .parquet file. Examples of dataset formats and splits can be found in the data directory:
- `minif2f_train` and `minif2f_test`: standard splits of MiniF2F.
- `mff-lwb-goedel-28k.parquet`: a concatenation of `minif2f_train` with ~30k problems from Lean-Workbook that were successfully solved by Goedel-Prover, excluding problems held out for validation.
- `mff-lwb-10k-seen`: a similarly constructed dataset using theorems solved by InternLM2.5-StepProver.
These datasets do not include proofs found by prior work; they are only used to identify provable theorems.