Publications
RL theory monograph: A monograph on RL theory, based on notes from courses taught with Nan Jiang at UIUC and with Sham Kakade at UW. The notes are actively updated, and any feedback, including reports of typos, is welcome.
Ph.D. Thesis
Recent preprints
Journal Publications
Practical Evaluation and Optimization of Contextual Bandit Algorithms with Alberto Bietti and John Langford To appear in Journal of Machine Learning Research.
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift with Sham Kakade, Jason Lee and Gaurav Mahajan In Journal of Machine Learning Research, Vol. 22, 2020.
Active Learning for Cost-Sensitive Classification with Akshay Krishnamurthy, T.-K. Huang, Hal Daumé and John Langford In Journal of Machine Learning Research, Vol. 20, 2019.
Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization with Anima Anandkumar, Prateek Jain, Praneeth Netrapalli and Rashish Tandon. To appear in SIAM Journal on Optimization.
Exact Recovery of Sparsely Used Overcomplete Dictionaries with Anima Anandkumar and Praneeth Netrapalli In IEEE Transactions on Information Theory, Vol. 63, Issue 1, 2017.
A Reliable Effective Terascale Linear Learning System with Olivier Chapelle, Miroslav Dudik and John Langford In Journal of Machine Learning Research, Vol. 15, 2014.
The Generalization Ability of Online Algorithms for Dependent Data with John Duchi In IEEE Transactions on Information Theory, Vol. 59, Issue 1, 2013.
Stochastic convex optimization with bandit feedback with Dean Foster, Daniel Hsu, Sham Kakade and Alexander Rakhlin In SIAM Journal on Optimization, Vol. 23, Issue 1, 2013.
Ergodic Mirror Descent with John Duchi, Mikael Johansson and Mike Jordan In SIAM Journal on Optimization, Vol. 22, Issue 4, 2012.
Fast global convergence of gradient methods for high-dimensional statistical recovery with Sahand Negahban and Martin Wainwright In The Annals of Statistics, Vol. 40, Number 5, 2012.
Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions (Annals formatted version) with Sahand Negahban and Martin Wainwright In The Annals of Statistics, Vol. 40, Number 2, July 2012.
Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization with Peter Bartlett, Pradeep Ravikumar and Martin Wainwright In IEEE Transactions on Information Theory, Vol. 58, Issue 5, May 2012.
Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling with John Duchi and Martin Wainwright In IEEE Transactions on Automatic Control, Vol. 57, Issue 3, 2012.
Message-passing for graph structured linear programs: Proximal projections, convergence and rounding schemes with Pradeep Ravikumar and Martin Wainwright In Journal Of Machine Learning Research, Vol. 11, 2010.
Conference Publications (see Google Scholar page for most updated version)
Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling with Tong Zhang. In COLT 2022
Minimax Regret Optimization for Robust Machine Learning under Distribution Shift with Tong Zhang. In COLT 2022
Adversarially Trained Actor Critic for Offline Reinforcement Learning (Outstanding paper award) with Ching-An Cheng, Tengyang Xie and Nan Jiang. In ICML 2022
Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach with Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang and Wen Sun. In ICML 2022
Provable RL with Exogenous Distractors via Multistep Inverse Dynamics with Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy and John Langford. In ICLR 2022
Bellman-consistent Pessimism for Offline Reinforcement Learning with Tengyang Xie, Ching-An Cheng, Nan Jiang and Paul Mineiro. In NeurIPS 2021
Provably Correct Optimization and Exploration with Non-linear Policies with Fei Feng, Lin Yang and Wotao Yin. In ICML 2021
Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation with Andrea Zanette and Ching-An Cheng. In COLT 2021
Towards a Dimension-Free Understanding of Adaptive Linear Control with Juan Perdomo, Max Simchowitz and Peter Bartlett. In COLT 2021
PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning with Mikael Henaff, Sham Kakade and Wen Sun. In NeurIPS 2020
Provably Good Batch Reinforcement Learning Without Great Exploration with Yao Liu, Adith Swaminathan and Emma Brunskill. In NeurIPS 2020
Policy Improvement from Multiple Experts with Ching-An Cheng and Andrey Kolobov. In NeurIPS 2020
Safe Reinforcement Learning via Curriculum Induction with Matteo Turchetta, Andrey Kolobov, Shital Shah and Andreas Krause. In NeurIPS 2020
FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs with Sham Kakade, Akshay Krishnamurthy and Wen Sun. In NeurIPS 2020, oral presentation
Taking a hint: How to leverage loss predictors in contextual bandits? with Chen-Yu Wei and Haipeng Luo. In COLT 2020
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift with Sham Kakade, Jason Lee and Gaurav Mahajan In COLT 2020
On the Optimality of Sparse Model-Based Planning for Markov Decision Processes with Sham Kakade and Lin Yang. In COLT 2020
Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds with Jordan Ash, Chicheng Zhang, Akshay Krishnamurthy and John Langford. In ICLR 2020
Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations with Aditya Modi, Debadeepta Dey, Adith Swaminathan, Besmira Nushi, Sean Andrist and Eric Horvitz. In AAAI 2020
Off-Policy Policy Gradient with State Distribution Correction with Yao Liu, Adith Swaminathan and Emma Brunskill. In UAI 2019
Fair Regression: Quantitative Definitions and Reduction-based Algorithms with Steven Wu and Miro Dudik. In ICML 2019
Provably efficient RL with Rich Observations via Latent State Decoding with Simon Du, Akshay Krishnamurthy, Nan Jiang, Miro Dudik and John Langford. In ICML 2019
Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback with Chicheng Zhang, Hal Daumé, John Langford and Sahand Negahban. In ICML 2019
Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches with Wen Sun, Nan Jiang, Akshay Krishnamurthy and John Langford. In COLT 2019
On Polynomial Time PAC Reinforcement Learning with Rich Observations with Christoph Dann, Nan Jiang, Akshay Krishnamurthy, John Langford and Rob Schapire. In NeurIPS 2018
A Reductions Approach to Fair Classification with Alina Beygelzimer, Miro Dudik, John Langford and Hanna Wallach. In ICML 2018
Practical Contextual Bandits with Regression Oracles with Dylan Foster, Haipeng Luo, Miro Dudik and Rob Schapire. In ICML 2018
Hierarchical Imitation and Reinforcement Learning with Hoang Le, Nan Jiang, Miro Dudik, Yisong Yue and Hal Daumé. In ICML 2018
Efficient Contextual Bandits in Non-stationary Worlds with Haipeng Luo, Chen-Yu Wei and John Langford. In COLT 2018
Off-policy evaluation for slate recommendation with Adith Swaminathan, Akshay Krishnamurthy, Miro Dudik, John Langford, Damien Jose and Imed Zitouni In NIPS 2017, oral presentation
Corralling a Band of Bandit Algorithms with Haipeng Luo, Behnam Neyshabur and Rob Schapire In COLT 2017
Active Learning for Cost-Sensitive Classification with Akshay Krishnamurthy, T-K Huang, Hal Daumé III and John Langford In ICML 2017
Contextual Decision Processes with Low Bellman Rank are PAC-Learnable with Nan Jiang, Akshay Krishnamurthy, John Langford and Rob Schapire In ICML 2017
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits with Yu-Xiang Wang and Miro Dudik In ICML 2017
Contextual-MDPs for PAC-Reinforcement Learning with Rich Observations with Akshay Krishnamurthy and John Langford In NIPS 2016
Efficient Second Order Online Learning by Sketching with Haipeng Luo, Nicolo Cesa-Bianchi and John Langford In NIPS 2016
Efficient Contextual Semi-Bandit Learning with Akshay Krishnamurthy and Miro Dudik In NIPS 2016
Fast Convergence of Regularized Learning in Games (Best paper award) with Vasilis Syrgkanis, Haipeng Luo and Rob Schapire In NIPS 2015
Efficient and Parsimonious Agnostic Active Learning with T-K Huang, Daniel Hsu, John Langford and Rob Schapire In NIPS 2015
Learning to Search Better Than Your Teacher with Kai-Wei Chang, Akshay Krishnamurthy, Hal Daumé and John Langford In ICML 2015
A Lower Bound for the Optimization of Finite Sums with Léon Bottou In ICML 2015
Scalable Nonlinear Learning with Adaptive Polynomial Expansions with Alina Beygelzimer, Daniel Hsu, John Langford and Matus Telgarsky In NIPS 2014
Learning sparsely used overcomplete dictionaries with Anima Anandkumar, Prateek Jain, Praneeth Netrapalli and Rashish Tandon In COLT 2014
Robust Multi-Objective Learning with Mentor Feedback with Ashwinkumar BV, Miro Dudik, Rob Schapire and Alex Slivkins In COLT 2014
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits with Daniel Hsu, Satyen Kale, John Langford, Lihong Li and Rob Schapire In ICML 2014
Least Squares Revisited: Scalable Approaches for Multi-class Prediction with Sham Kakade, Nikos Karampatziakis, Le Song and Greg Valiant In ICML 2014
Selective sampling algorithms for cost-sensitive multiclass prediction (long version with proofs) In ICML 2013
Stochastic optimization and sparse statistical recovery: An optimal algorithm for high dimensions (Long version) with Sahand Negahban and Martin Wainwright In NIPS 2012
Contextual Bandit Learning with Predictable Rewards with Miroslav Dudik, Satyen Kale, John Langford and Robert Schapire In AISTATS 2012
Stochastic convex optimization with bandit feedback with Dean Foster, Daniel Hsu, Sham Kakade and Alexander Rakhlin In NIPS 2011
Distributed Delayed Stochastic Optimization (Long version) with John Duchi In NIPS 2011
Ergodic Subgradient Descent with John Duchi, Mikael Johansson and Mike Jordan In Allerton 2011
Learning with Missing Features with Afshin Rostamizadeh and Peter Bartlett In UAI 2011
Oracle inequalities for computationally budgeted model selection (Long version) with John Duchi, Peter Bartlett and Clement Levrard In COLT 2011
Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions with Sahand Negahban and Martin Wainwright In ICML 2011
Distributed Dual Averaging in Networks with John Duchi and Martin Wainwright In NIPS 2010.
Convergence rates of gradient methods for high-dimensional statistical recovery with Sahand Negahban and Martin Wainwright In NIPS 2010, oral presentation
Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback (longer version with additional proofs) with Ofer Dekel and Lin Xiao In COLT 2010.
Optimal Allocation Strategies for the Dark Pool Problem with Peter Bartlett and Max Dama In AISTATS 2010.
Information-theoretic lower bounds on the oracle complexity of convex optimization with Peter Bartlett, Pradeep Ravikumar and Martin Wainwright In NIPS 2009.
A Stochastic View of Optimal Regret through Minimax Duality with Jake Abernethy, Alexander Rakhlin and Peter Bartlett arXiv preprint, short version appeared in COLT 2009.
Message-passing for graph structured linear programs: Proximal projections, convergence and rounding schemes with Pradeep Ravikumar and Martin Wainwright In ICML 2008.
An Analysis of Inference with the Universum with Fabian Sinz, Olivier Chapelle and Bernhard Schölkopf In NIPS 2007
Learning Random Walks to Rank Nodes in Graphs with Soumen Chakrabarti In ICML 2007
Learning Parameters in Entity-relationship Graphs from Ranking Preferences with Soumen Chakrabarti In ECML/PKDD 2006
Learning to Rank Networked Entities with Soumen Chakrabarti and Sunny Aggarwal In SIGKDD 2006
Teaching
CSE 599: Reinforcement Learning and Bandits, taught at University of Washington in Spring 2019 with Sham Kakade.
Bandits and Reinforcement Learning, taught at Columbia University in Fall 2017 with Alex Slivkins.
Professional Activities
Program Chair for NeurIPS 2022.
Fundraising Chair for AISTATS 2016.
Co-organized NIPS 2015 workshop on Optimization for Machine Learning.
Co-organized NIPS 2014 workshop on Optimization for Machine Learning.
Co-organized NIPS 2013 workshop on Optimization for Machine Learning.
Co-organized NIPS 2012 workshop on Optimization for Machine Learning.
Co-organized NIPS 2011 workshop on Computational Trade-offs in Statistical Learning.
Co-organized NIPS 2010 workshop on Learning on Cores, Clusters and Clouds.
Senior Area Chair: NeurIPS 2019, NeurIPS 2020.
Area chair or equivalent: ICML 2013-2020, NeurIPS 2013-2018, COLT 2013-2020, AISTATS 2013.
Journal Reviewing: JMLR, Annals of Statistics, IEEE Transactions on Automatic Control, IEEE Transactions on Information Theory, SIAM Journal on Optimization, Machine Learning.