Publications

2026

Reuse your FLOPs: Scaling RL on Hard Problems by Conditioning on Very Off-Policy Prefixes
A. Setlur, Z. Wang, A. Cohen, P. Rashidinejad^✱, S.M. Xie^✱
Pre-print
[pdf]
e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
A. Setlur^✱, M. Yang^✱, C. Snell, J. Greer, I. Wu, V. Smith, M. Simchowitz, A. Kumar
EXAIT at ICML 2025 (Best Paper Award)
LCFM at ICML 2025 (Oral)
PUT at ICML 2025 (Oral)
ICLR (2026)
[pdf] [blog post] [site]
InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning
M. Yang, H. Bai, I. Wu, G. Yang, A. Setlur, A. Kumar
ICLR (2026)
[pdf] [site]
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
Y. Qu, A. Singh, Y. Lee, A. Setlur, R. Salakhutdinov, C. Finn, A. Kumar
RAM-2: Reasoning, Attention & Memory Workshop at COLM 2025 (Oral)
ICLR (2026)
[pdf] [site]
https://arxiv.org/pdf/2507.17895
A. Setlur^✱, P. Thaker^✱, J. Ullman^✱
AISTATS (2026)
[pdf]
Learning to Reason on Hard Problems with Privileged On-Policy Exploration
Y. Qu^✱, A. Setlur^✱, V. Smith, R. Salakhutdinov, A. Kumar
MATH-AI Workshop at NeurIPS 2025 (Oral)
[blog]

Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interactions
J. Shen^✱, H. Bai^✱, L. Zhang, Y. Zhou, A. Setlur, P. Doshi, C. Snell, J. Greer, I. Wu, V. Smith, M. Simchowitz, A. Kumar, N. Jiang, T. Zhang, A. Talwalkar, A. Kumar
Multi-Modal Reasoning for Agentic Intelligence Workshop at ICCV 2025 (Best Paper Award)
Language Agents and World Models at NeurIPS 2025 (Best Paper Award)
Scaling Environments for Agent Workshop at NeurIPS 2025 (Oral)
NeurIPS (2025)
[pdf] [site]
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
A. Setlur, N. Rajaraman, S. Levine, A. Kumar
ICML 2025 (Spotlight), VerifAI ICLR 2025 (Oral)
[pdf]
Scaling Test-Time Compute Without Verification or RL is Suboptimal
Y. Qu^✱, M. Yang^✱, A. Setlur, L. Tunstall, E. Beeching, R. Salakhutdinov, A. Kumar
FM‑Wild ICLR 2025 (Oral)
ICML (2025) (Spotlight)
[pdf] [blog post]
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
A. Setlur^✱, C. Nagpal^✱, A. Fisch, X. Geng, J. Eisenstein, R. Agarwal, A. Agarwal, J. Berant, A. Kumar
ICLR (2025) (Spotlight)
[pdf]

RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
A. Setlur, S. Garg, X. Geng, N. Garg, V. Smith, A. Kumar
NeurIPS (2024)
[pdf]
On the Benefits of Public Representations for Private Transfer Learning under Distribution Shift
P. Thaker, A. Setlur, Z.S. Wu, V. Smith
NeurIPS (2024)
[pdf]
Prompting for Robustness: Extracting Robust Classifiers from Foundation Models
A. Setlur^✱, S. Garg^✱, V. Smith, S. Levine
ICML (2024)
[pdf]
Project with Source, Probe with Target: Extracting Useful Features for Adaptation to Distribution Shifts
A.S. Chen^✱, Y. Lee^✱, A. Setlur, S. Levine, C. Finn
ICLR (2024) (Spotlight)
[pdf]
The Optimal Constant Solution: Predictable Extrapolation in Deep Neural Networks
K. Kang, A. Setlur, C. Tomlin, S. Levine
ICLR (2024)
[pdf]

Complementary Benefits of Contrastive Learning and Self-Training Under Distribution Shift
S. Garg^✱, A. Setlur^✱, Z. Lipton, S. Balakrishnan, V. Smith, A. Raghunathan
NeurIPS 2023
[pdf]
Private and Personalized Histogram Estimation in a Federated Setting
A. Setlur, V. Feldman, K. Talwar
NeurIPS 2023
[pdf]
Leveraging public representations for private transfer learning
P. Thaker, A. Setlur, Z.S. Wu, V. Smith
NeurIPS 2023
[pdf]
Multitask Learning Can Improve Worst-Group Outcomes
A. Kulkarni^✱, L. Dery^✱, A. Setlur, A. Raghunathan, A. Talwalkar, G. Neubig
TMLR 2023
[pdf]
Confidence-Based Model Selection: When to Take Shortcuts for Subpopulation Shifts
A. Chen, Y. Lee, A. Setlur, S. Levine, C. Finn
arXiv Preprint
[pdf]
Contextual Reliability: When Different Features Matter in Different Contexts
G. Ghosal^✱, A. Setlur^✱, D. Brown, A. Dragan, A. Raghunathan
ICML 2023
[pdf]
Bitrate-constrained DRO: Beyond worst case robustness to unknown group shifts
A. Setlur, D. Dennis, B. Eysenbach, A. Raghunathan, C. Finn, V. Smith, S. Levine
ICLR 2023
[pdf]

Adversarial Unlearning: Reducing Confidence Along Adversarial Directions
A. Setlur, B. Eysenbach, V. Smith, S. Levine
NeurIPS 2022
[pdf]

Two Sides of Meta-Learning Evaluation: In vs. Out of Distribution
A. Setlur^✱, O. Li^✱, V. Smith
NeurIPS 2021
[pdf]
Is support set diversity necessary for meta-learning?
A. Setlur^✱, O. Li^✱, V. Smith
arXiv Preprint
[pdf]
Explaining the efficacy of counterfactually augmented data
D. Kaushik, A. Setlur, E. Hovy, Z.C. Lipton
ICLR 2021
[pdf]
Nonlinear ISA with auxiliary variables for learning speech representations
A. Setlur, B. Póczos, A.W. Black
InterSpeech 2021 (Best Paper Nomination)
[pdf]
Towards Using Heterogeneous Relation Graphs for End-to-End TTS
A. Setlur^✱, A. Madaan^✱, T. Parekh^✱, Y. Yang, A.W. Black
ASRU 2021
[pdf]

Politeness transfer: A tag and generate approach
A. Madaan^✱, A. Setlur^✱, T. Parekh^✱, B. Póczos, G. Neubig, Y. Yang, R. Salakhutdinov, A.W. Black, S. Prabhumoye
ACL 2020
[pdf]
Robust handwriting recognition with limited and noisy data
H. Pham, A. Setlur, S. Dingliwal, T. Lin, B. Póczos, K. Huang, Z. Li, J. Lim, C. McCormack, T. Vu
ICFHR 2020
[pdf]
An efficient fault tolerant workflow scheduling approach using replication heuristics and checkpointing in the cloud
A. Setlur, S. Nirmala, H. Singh, S. Khoriya
Journal of Parallel and Distributed Computing
[pdf]
ReStGAN: A step towards visually guided shopper experience via text-to-image synthesis
S. Surya, A. Setlur, A. Biswas, S. Negi
WACV 2020
[pdf]
Better Approximate Inference for Partial Likelihood Models with a Latent Structure
A. Setlur, B. Póczos
TPP @ NeurIPS 2019 (Oral)
[pdf]