Blogs
2026
IsoCompute Playbook: Optimally Scaling Sampling Compute for RL Training of LLMs
Z. Cheng✱, Y. Xie✱, Y. Qu✱, A. Setlur✱, S. Hao, V. Pimpalkhute, T. Liang, F. Yao, Z. Liu, E. Xing, V. Smith, R. Salakhutdinov, Z. Hu, T. Killian, A. Kumar
[CMU MLD Blog] (Jan 2026)
2025
How to Explore to Scale RL Training of LLMs on Hard Problems?
Y. Qu✱, A. Setlur✱, V. Smith, R. Salakhutdinov, A. Kumar
[Blog] (Dec 2025)
Sharpening or Discovery, RL or Meta RL?: How RL Improves LLM Reasoning
A. Setlur, A. Kumar
[Blog] (June 2025)
Optimizing LLM test-time compute involves solving a meta RL problem
A. Setlur, Y. Qu, M. Yang, L. Zhang, V. Smith, A. Kumar
[CMU MLD Blog] (Jan 2025)