
IGPO: Inpainting-Guided Policy Optimization for Diffusion Large Language Models


¹Meta Superintelligence Labs  ²UCLA  ³Tsinghua University, College of AI  ⁴MIT
*Work done at Meta. †Core contribution.

arXiv | Project Page

Overview

IGPO is a policy optimization framework for diffusion large language models (dLLMs). It leverages their unique "inpainting" ability to guide exploration, improving RL training efficiency and model performance.
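As a rough intuition for what "inpainting" means here: during online sampling, some positions of a fully masked response are seeded with ground-truth tokens, and the dLLM unmasks the remaining positions conditioned on those hints. The sketch below illustrates only that seeding step; the function name inpaint_hint, the hint_ratio parameter, and the mask-token id are illustrative assumptions, not this repo's actual API.

import torch

def inpaint_hint(response, gt, mask_id, hint_ratio=0.3):
    """Replace a random fraction of masked positions with ground-truth
    tokens; the dLLM then decodes the remaining masked positions
    conditioned on these hints, and completed rollouts are scored as
    usual by the RL objective."""
    tokens = response.clone()
    masked = (tokens == mask_id).nonzero(as_tuple=True)[0]
    n_hints = int(hint_ratio * masked.numel())
    if n_hints > 0:
        pick = masked[torch.randperm(masked.numel())[:n_hints]]
        tokens[pick] = gt[pick]
    return tokens

# Example: a fully masked 6-token response with 50% ground-truth hints.
mask_id = 0                                  # hypothetical mask-token id
gt = torch.tensor([11, 12, 13, 14, 15, 16])  # ground-truth answer tokens
hinted = inpaint_hint(torch.full_like(gt, mask_id), gt, mask_id, 0.5)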

[Main figure]


Environment Setup

conda env create -f env.yml
conda activate igpo

Usage

Download the MetaMathQA dataset from Hugging Face.
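One way to fetch it (assuming the meta-math/MetaMathQA dataset repo on Hugging Face) is with the Hugging Face CLI; adjust --local-dir so the files land under igpo/MetaMathQA/:

huggingface-cli download meta-math/MetaMathQA --repo-type dataset --local-dir igpo/MetaMathQA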

After downloading, the directory structure should be:

igpo/MetaMathQA/
├── MetaMathQA-395K.json
└── README.md

To run IGPO:

sbatch run_igpo.slurm

(Set your own wandb API key in the Slurm scripts before submitting.)
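If you prefer not to hardcode the key, wandb also reads it from the environment, so a line like this near the top of each Slurm script works:

export WANDB_API_KEY=<your_api_key>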

To run GRPO:

sbatch run_grpo.slurm

Acknowledgement

This code is built on the D1 codebase.

Citation

If you find IGPO useful in your research, please consider citing:

@article{zhao2025inpainting,
  title={Inpainting-Guided Policy Optimization for Diffusion Large Language Models},
  author={Zhao, Siyan and Liu, Mengchen and Huang, Jing and Liu, Miao and Wang, Chenyu and Liu, Bo and Tian, Yuandong and Pang, Guan and Bell, Sean and Grover, Aditya and others},
  journal={arXiv preprint arXiv:2509.10396},
  year={2025}
}

License

IGPO is MIT licensed, as found in the LICENSE file.
