AndreHe02/rewarding-unlikely-release
Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening

Authors: Andre He, Daniel Fried, Sean Welleck

This repository contains the official implementation for the paper "Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening".

This codebase is based on veRL and is designed for RL training of Lean-based theorem-proving models. The Lean verifier is integrated into the online RL loop to provide rewards for model-generated proofs.


Getting Started

1. Environment Setup

Follow the installation and setup instructions provided in the veRL repository to initialize your environment.

2. Lean and Verifier Dependencies

This project uses a verifier adapted from DeepSeek-Prover-V1.5. To get started:

  • Clone and install the DeepSeek-Prover-V1.5 repository.
  • Either install it globally or update the path to the verifier in verl/lean/verifier.py.
  • Install the Lean REPL, which is required to interact with the Lean environment during proof checking.

Running Experiments


Datasets

To train on your own collection of Lean theorems, you must first convert your dataset into a .parquet file. Examples of dataset formats and splits can be found in the data directory:

  • minif2f_train and minif2f_test: Standard splits of MiniF2F.
  • mff-lwb-goedel-28k.parquet: A concatenation of minif2f_train with ~30k Lean-Workbook problems that were successfully solved by Goedel-Prover, excluding problems held out for validation.
  • mff-lwb-10k-seen: A similarly constructed dataset using theorems solved by InternLM2.5-step-prover.

These datasets do not include proofs found by prior work; they are only used to identify provable theorems.
