Publications
RL theory monograph: A monograph on RL theory, based on notes from courses taught with Nan Jiang at UIUC and with Sham Kakade at UW. The notes are actively updated, and any feedback, including reports of typos, is welcome.
Ph.D. Thesis
Recent preprints
Journal Publications
Practical Evaluation and Optimization of Contextual Bandit Algorithms with Alberto Bietti and John Langford To appear in Journal of Machine Learning Research.
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift with Sham Kakade, Jason Lee and Gaurav Mahajan In Journal of Machine Learning Research, Vol. 22, 2020.
Active Learning for Cost-Sensitive Classification with Akshay Krishnamurthy, T.-K. Huang, Hal Daumé and John Langford In Journal of Machine Learning Research, Vol. 20, 2019.
Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization with Anima Anandkumar, Prateek Jain, Praneeth Netrapalli and Rashish Tandon. To appear in SIAM Journal on Optimization.
Exact Recovery of Sparsely Used Overcomplete Dictionaries with Anima Anandkumar and Praneeth Netrapalli In IEEE Transactions on Information Theory, Vol. 63, Issue 1, 2017.
A Reliable Effective Terascale Linear Learning System with Olivier Chapelle, Miroslav Dudik and John Langford In Journal of Machine Learning Research, Vol. 15, 2014.
The Generalization Ability of Online Algorithms for Dependent Data with John Duchi In IEEE Transactions on Information Theory, Vol. 59, Issue 1, 2013.
Stochastic convex optimization with bandit feedback with Dean Foster, Daniel Hsu, Sham Kakade and Alexander Rakhlin In SIAM Journal on Optimization, Vol. 23, Issue 1, 2013.
Ergodic Mirror Descent with John Duchi, Mikael Johansson and Mike Jordan In SIAM Journal on Optimization, Vol. 22, Issue 4, 2012.
Fast global convergence of gradient methods for high-dimensional statistical recovery with Sahand Negahban and Martin Wainwright In The Annals of Statistics, Vol. 40, Number 5, 2012.
Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions (Annals formatted version) with Sahand Negahban and Martin Wainwright In The Annals of Statistics, Vol. 40, Number 2, July 2012.
Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization with Peter Bartlett, Pradeep Ravikumar and Martin Wainwright In IEEE Transactions on Information Theory, Vol. 58, Issue 5, May 2012.
Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling with John Duchi and Martin Wainwright In IEEE Transactions on Automatic Control, Vol. 57, Issue 3, 2012.
Message-passing for graph structured linear programs: Proximal projections, convergence and rounding schemes with Pradeep Ravikumar and Martin Wainwright In Journal Of Machine Learning Research, Vol. 11, 2010.
Conference Publications (see Google Scholar page for most updated version)
Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling with Tong Zhang. In COLT 2022
Minimax Regret Optimization for Robust Machine Learning under Distribution Shift with Tong Zhang. In COLT 2022
Adversarially Trained Actor Critic for Offline Reinforcement Learning (Outstanding paper award) with Ching-An Cheng, Tengyang Xie and Nan Jiang. In ICML 2022
Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach with Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang and Wen Sun. In ICML 2022
Provable RL with Exogenous Distractors via Multistep Inverse Dynamics with Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy and John Langford. In ICLR 2022
Bellman-consistent Pessimism for Offline Reinforcement Learning with Tengyang Xie, Ching-An Cheng, Nan Jiang and Paul Mineiro. In NeurIPS 2021
Provably Correct Optimization and Exploration with Non-linear Policies with Fei Feng, Lin Yang and Wotao Yin. In ICML 2021
Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation with Andrea Zanette and Ching-An Cheng. In COLT 2021
Towards a Dimension-Free Understanding of Adaptive Linear Control with Juan Perdomo, Max Simchowitz and Peter Bartlett. In COLT 2021
PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning with Mikael Henaff, Sham Kakade and Wen Sun. In NeurIPS 2020
Provably Good Batch Reinforcement Learning Without Great Exploration with Yao Liu, Adith Swaminathan and Emma Brunskill. In NeurIPS 2020
Policy Improvement from Multiple Experts with Ching-An Cheng and Andrey Kolobov. In NeurIPS 2020
Safe Reinforcement Learning via Curriculum Induction with Matteo Turchetta, Andrey Kolobov, Shital Shah and Andreas Krause. In NeurIPS 2020
FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs with Sham Kakade, Akshay Krishnamurthy and Wen Sun. In NeurIPS 2020, oral presentation
Taking a hint: How to leverage loss predictors in contextual bandits? with Chen-Yu Wei and Haipeng Luo. In COLT 2020
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift with Sham Kakade, Jason Lee and Gaurav Mahajan In COLT 2020
On the Optimality of Sparse Model-Based Planning for Markov Decision Processes with Sham Kakade and Lin Yang. In COLT 2020
Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds with Jordan Ash, Chicheng Zhang, Akshay Krishnamurthy and John Langford. In ICLR 2020
Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations with Aditya Modi, Debadeepta Dey, Adith Swaminathan, Besmira Nushi, Sean Andrist and Eric Horvitz. In AAAI 2020
Off-Policy Policy Gradient with State Distribution Correction with Yao Liu, Adith Swaminathan and Emma Brunskill. In UAI 2019
Fair Regression: Quantitative Definitions and Reduction-based Algorithms with Steven Wu and Miro Dudik. In ICML 2019
Provably efficient RL with Rich Observations via Latent State Decoding with Simon Du, Akshay Krishnamurthy, Nan Jiang, Miro Dudik and John Langford. In ICML 2019
Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback with Chicheng Zhang, Hal Daumé, John Langford and Sahand Negahban. In ICML 2019
Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches with Wen Sun, Nan Jiang, Akshay Krishnamurthy and John Langford. In COLT 2019
On Polynomial Time PAC Reinforcement Learning with Rich Observations with Christoph Dann, Nan Jiang, Akshay Krishnamurthy, John Langford and Rob Schapire. In NeurIPS 2018
A Reductions Approach to Fair Classification with Alina Beygelzimer, Miro Dudik, John Langford and Hanna Wallach. In ICML 2018
Practical Contextual Bandits with Regression Oracles with Dylan Foster, Haipeng Luo, Miro Dudik and Rob Schapire. In ICML 2018
Hierarchical Imitation and Reinforcement Learning with Hoang Le, Nan Jiang, Miro Dudik, Yisong Yue and Hal Daumé. In ICML 2018
Efficient Contextual Bandits in Non-stationary Worlds with Haipeng Luo, Chen-Yu Wei and John Langford. In COLT 2018
Off-policy evaluation for slate recommendation with Adith Swaminathan, Akshay Krishnamurthy, Miro Dudik, John Langford, Damien Jose and Imed Zitouni In NIPS 2017, oral presentation
Corralling a Band of Bandit Algorithms with Haipeng Luo, Behnam Neyshabur and Rob Schapire In COLT 2017
Active Learning for Cost-Sensitive Classification with Akshay Krishnamurthy, T-K Huang, Hal Daumé III and John Langford In ICML 2017
Contextual Decision Processes with Low Bellman Rank are PAC-Learnable with Nan Jiang, Akshay Krishnamurthy, John Langford and Rob Schapire In ICML 2017
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits with Yu-Xiang Wang and Miro Dudik In ICML 2017
Contextual-MDPs for PAC-Reinforcement Learning with Rich Observations with Akshay Krishnamurthy and John Langford In NIPS 2016
Efficient Second Order Online Learning by Sketching with Haipeng Luo, Nicolo Cesa-Bianchi and John Langford In NIPS 2016
Efficient Contextual Semi-Bandit Learning with Akshay Krishnamurthy and Miro Dudik In NIPS 2016
Fast Convergence of Regularized Learning in Games (Best paper award) with Vasilis Syrgkanis, Haipeng Luo and Rob Schapire In NIPS 2015
Efficient and Parsimonious Agnostic Active Learning with T-K Huang, Daniel Hsu, John Langford and Rob Schapire In NIPS 2015
Learning to Search Better Than Your Teacher with Kai-Wei Chang, Akshay Krishnamurthy, Hal Daumé and John Langford In ICML 2015
A Lower Bound for the Optimization of Finite Sums with Léon Bottou In ICML 2015
Scalable Nonlinear Learning with Adaptive Polynomial Expansions with Alina Beygelzimer, Daniel Hsu, John Langford and Matus Telgarsky In NIPS 2014
Learning sparsely used overcomplete dictionaries with Anima Anandkumar, Prateek Jain, Praneeth Netrapalli and Rashish Tandon In COLT 2014
Robust Multi-Objective Learning with Mentor Feedback with Ashwinkumar BV, Miro Dudik, Rob Schapire and Alex Slivkins In COLT 2014
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits with Daniel Hsu, Satyen Kale, John Langford, Lihong Li and Rob Schapire In ICML 2014
Least Squares Revisited: Scalable Approaches for Multi-class Prediction with Sham Kakade, Nikos Karampatziakis, Le Song and Greg Valiant In ICML 2014
Selective sampling algorithms for cost-sensitive multiclass prediction (long version with proofs) In ICML 2013
Stochastic optimization and sparse statistical recovery: An optimal algorithm for high dimensions (Long version) with Sahand Negahban and Martin Wainwright In NIPS 2012
Contextual Bandit Learning with Predictable Rewards with Miroslav Dudik, Satyen Kale, John Langford and Robert Schapire In AISTATS 2012
Stochastic convex optimization with bandit feedback with Dean Foster, Daniel Hsu, Sham Kakade and Alexander Rakhlin In NIPS 2011
Distributed Delayed Stochastic Optimization (Long version) with John Duchi In NIPS 2011
Ergodic Subgradient Descent with John Duchi, Mikael Johansson and Mike Jordan In Allerton 2011
Learning with Missing Features with Afshin Rostamizadeh and Peter Bartlett In UAI 2011
Oracle inequalities for computationally budgeted model selection (Long version) with John Duchi, Peter Bartlett and Clement Levrard In COLT 2011
Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions with Sahand Negahban and Martin Wainwright In ICML 2011
Distributed Dual Averaging in Networks with John Duchi and Martin Wainwright In NIPS 2010.
Convergence rates of gradient methods for high-dimensional statistical recovery with Sahand Negahban and Martin Wainwright In NIPS 2010, oral presentation
Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback (longer version with additional proofs) with Ofer Dekel and Lin Xiao In COLT 2010.
Optimal Allocation Strategies for the Dark Pool Problem with Peter Bartlett and Max Dama In AISTATS 2010.
Information-theoretic lower bounds on the oracle complexity of convex optimization with Peter Bartlett, Pradeep Ravikumar and Martin Wainwright In NIPS 2009.
A Stochastic View of Optimal Regret through Minimax Duality with Jake Abernethy, Alexander Rakhlin and Peter Bartlett arXiv preprint, short version appeared in COLT 2009.
Message-passing for graph structured linear programs: Proximal projections, convergence and rounding schemes with Pradeep Ravikumar and Martin Wainwright In ICML 2008.
An Analysis of Inference with the Universum with Fabian Sinz, Olivier Chapelle and Bernhard Schölkopf In NIPS 2007
Learning Random Walks to Rank Nodes in Graphs with Soumen Chakrabarti In ICML 2007
Learning Parameters in Entity-relationship Graphs from Ranking Preferences with Soumen Chakrabarti In ECML/PKDD 2006
Learning to Rank Networked Entities with Soumen Chakrabarti and Sunny Aggarwal In SIGKDD 2006
Teaching
CSE 599: Reinforcement Learning and Bandits, taught at University of Washington in Spring 2019 with Sham Kakade.
Bandits and Reinforcement Learning, taught at Columbia University in Fall 2017 with Alex Slivkins.
Professional Activities
Program Chair for NeurIPS 2022.
Fundraising Chair for AISTATS 2016.
Co-organized NIPS 2015 workshop on Optimization for Machine Learning.
Co-organized NIPS 2014 workshop on Optimization for Machine Learning.
Co-organized NIPS 2013 workshop on Optimization for Machine Learning.
Co-organized NIPS 2012 workshop on Optimization for Machine Learning.
Co-organized NIPS 2011 workshop on Computational Trade-offs in Statistical Learning.
Co-organized NIPS 2010 workshop on Learning on Cores, Clusters and Clouds.
Senior Area Chair: NeurIPS 2019, NeurIPS 2020.
Area chair or equivalent: ICML 2013-2020, NeurIPS 2013-2018, COLT 2013-2020, AISTATS 2013.
Journal Reviewing: JMLR, Annals of Statistics, IEEE Transactions on Automatic Control, IEEE Transactions on Information Theory, SIAM Journal on Optimization, Machine Learning.