Nan Jiang

2,586 posts

@nanjiang_cs

machine learning researcher, with a focus on reinforcement learning. assoc prof @ uiuc cs. Course on RL theory (w/ videos): nanjiang.cs.illinois.edu/cs542

nanjiang.cs.illinois.edu

Joined November 2017

Pinned
Nan Jiang
@nanjiang_cs
Aug 14, 2020
Learning Q* with poly-sized exploratory data + an arbitrary Q-class that contains Q*... has seemed impossible for yrs, or so I believed when I talked at @RLtheory 2mo ago. And what's the saying? Impossible is NOTHING. arxiv.org/abs/2008.04990 Exciting new work w/ @tengyangx! 1/
Nan Jiang
@nanjiang_cs
Sep 4, 2020
after consulting my colleagues, I decided to make my CS598 lectures publicly available. The video links can be found on the course website or in this list (bit.ly/2F2L0Qi). Just started the proofs of VI and PI; check them out if you are interested in a statistical theory of RL!
Nan Jiang
@nanjiang_cs
Aug 26, 2020
Alekh, @ShamKakade6 and I have a (quite drafty) monograph on RL theory: rltheorybook.github.io. I am also teaching a PhD seminar course on this topic (w/ recordings): nanjiang.cs.illinois.edu/cs598; just did the 1st lecture 2h ago! Still figuring out if I can share the videos publicly...
Nan Jiang
@nanjiang_cs
May 30, 2024
Translation: your junior faculty privileges will end soon…
Nan Jiang
@nanjiang_cs
Mar 23, 2022
I received the NSF CAREER award. Each submission took a month+ of effort, and I'm glad I got it on the 2nd try. That said, the detailed reviews & the process were not as delightful as the decision. Some experience & thoughts below: 1/
Nan Jiang
@nanjiang_cs
Jun 4, 2019
The entire RL theory is built on objects like V^π, Q*, π*, T (Bellman update operator), etc., until you realize that this foundation is quite shaky. arxiv.org/abs/1905.13341 Spoiler: no big deal (yet), but thinking thru this is super useful for resolving some confusions. (1/x)
Nan Jiang
@nanjiang_cs
Sep 21, 2025
I was surprised by how many didn't know that (1) per-token MLE is whole-sequence MLE, and (2) PG at the token level is the same as PG at the sequence level (optimizing one big combinatorial action). The story is different if you introduce a fitted critic/Q-values or intermediate resets.
Nando de Freitas
@NandoDF
Sep 21, 2025
Most RL for LLMs involves only 1 step of RL. It’s a contextual bandit problem and there’s no covariate shift because the state (question, instruction) is given. This has many implications, eg DAgger becomes SFT, and it is trivial to design Expectation Maximisation (EM) maximum
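Both facts in the post above follow from the chain rule, log p(y) = Σ_t log p(y_t | y_<t). A minimal sketch, assuming a toy tabular autoregressive model (binary vocabulary, fixed length 3, one hypothetical logit vector per prefix) and a hypothetical reward revealed only at the end of the sequence: it checks that the product of per-token conditionals is a normalized distribution over whole sequences (so per-token MLE is sequence MLE), and that the exact token-level REINFORCE gradient matches a finite-difference gradient of the sequence-level objective. There is no fitted critic and no intermediate reset, i.e., the regime in which the post says the two views coincide.

```python
import itertools
import math
import random

VOCAB = 2   # toy vocabulary {0, 1} (assumption)
LENGTH = 3  # fixed sequence length, no EOS token (assumption)
ALL_SEQS = list(itertools.product(range(VOCAB), repeat=LENGTH))

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def make_params(seed=0):
    """Hypothetical tabular autoregressive model: one logit vector per prefix y_<t."""
    rng = random.Random(seed)
    return {
        prefix: [rng.uniform(-1.0, 1.0) for _ in range(VOCAB)]
        for t in range(LENGTH)
        for prefix in itertools.product(range(VOCAB), repeat=t)
    }

def token_log_probs(params, seq):
    """Per-token terms log p(y_t | y_<t)."""
    return [math.log(softmax(params[tuple(seq[:t])])[seq[t]]) for t in range(len(seq))]

def seq_log_prob(params, seq):
    """Whole-sequence log p(y): by the chain rule, the sum of the per-token terms."""
    return sum(token_log_probs(params, seq))

def reward(seq):
    # Hypothetical sequence-level reward, only available once the whole sequence exists.
    return float(sum(seq))

def objective(params):
    """Sequence-level objective J = sum_y p(y) R(y): one big combinatorial action."""
    return sum(math.exp(seq_log_prob(params, y)) * reward(y) for y in ALL_SEQS)

def token_level_pg(params):
    """Exact token-level PG: sum_y p(y) R(y) sum_t grad log p(y_t | y_<t)."""
    grad = {k: [0.0] * VOCAB for k in params}
    for y in ALL_SEQS:
        weight = math.exp(seq_log_prob(params, y)) * reward(y)
        for t in range(LENGTH):
            prefix = tuple(y[:t])
            probs = softmax(params[prefix])
            for a in range(VOCAB):
                # d log softmax(logits)[y_t] / d logits[a] = 1[a == y_t] - probs[a]
                grad[prefix][a] += weight * ((1.0 if a == y[t] else 0.0) - probs[a])
    return grad

def finite_diff_grad(params, eps=1e-6):
    """Central-difference gradient of the sequence-level objective, treated as a black box."""
    grad = {}
    for k in params:
        grad[k] = []
        for a in range(VOCAB):
            plus = {kk: list(v) for kk, v in params.items()}
            minus = {kk: list(v) for kk, v in params.items()}
            plus[k][a] += eps
            minus[k][a] -= eps
            grad[k].append((objective(plus) - objective(minus)) / (2 * eps))
    return grad
```

Enumerating all VOCAB**LENGTH sequences only works at toy sizes. With sampling, the two estimators are literally the same per-sample quantity, R(y) Σ_t ∇ log p(y_t | y_<t), because ∇ log p(y) decomposes the same way.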
Nan Jiang
@nanjiang_cs
Jun 14, 2025
Re error propagation: if you believe model-based is a solution but also want the benefits of model-free, perhaps it's time to investigate (never thoroughly studied) Bellman-error minimization... BRM is, in a way, closer to model-based than TD (a small revelation from my L4DC talk)
Seohong Park
@seohong_park
Jun 13, 2025
Q-learning is not yet scalable seohong.me/blog/q-learnin… I wrote a blog post about my thoughts on scalable RL algorithms. To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).
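A minimal sketch of the two objectives being contrasted, on a hypothetical 3-state chain with linear features, assuming the transition matrix P is known so both can be solved in closed form (the chain, rewards, features, and uniform state weights are illustrative assumptions). TD solves the projected fixed-point equation Φᵀ D (R + γ P Φ w − Φ w) = 0, while BRM minimizes the weighted Bellman residual ‖Φ w − R − γ P Φ w‖²_D. A standard observation is that TD only needs single sampled transitions, whereas an unbiased estimate of the squared Bellman residual needs the expected next-state value (double samples or a model). This is a generic illustration, not the argument from the talk.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def true_values(P, R, gamma):
    """V^pi = (I - gamma P)^{-1} R for the policy's transition matrix P."""
    n = len(R)
    return solve([[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)] for i in range(n)], R)

def bellman_residual(P, R, v, gamma, mu):
    """sum_s mu(s) (v(s) - R(s) - gamma (P v)(s))^2, computed with the known model P."""
    Pv = matvec(P, v)
    return sum(m * (vs - r - gamma * pv) ** 2 for m, vs, r, pv in zip(mu, v, R, Pv))

def td_and_brm(P, R, Phi, gamma, mu):
    """Return (Phi w_TD, Phi w_BRM) for features Phi (rows = states) and state weights mu."""
    n, k = len(Phi), len(Phi[0])
    PPhi = matmul(P, Phi)
    # M = Phi - gamma P Phi, so the Bellman residual of Phi w is M w - R.
    M = [[Phi[i][j] - gamma * PPhi[i][j] for j in range(k)] for i in range(n)]
    PhiT_D = [[Phi[i][j] * mu[i] for i in range(n)] for j in range(k)]  # Phi^T D
    MT_D = [[M[i][j] * mu[i] for i in range(n)] for j in range(k)]      # M^T D
    # TD fixed point: Phi^T D (R + gamma P Phi w - Phi w) = 0  <=>  (Phi^T D M) w = Phi^T D R
    w_td = solve(matmul(PhiT_D, M), matvec(PhiT_D, R))
    # BRM: argmin_w ||M w - R||_D^2  <=>  (M^T D M) w = M^T D R
    w_brm = solve(matmul(MT_D, M), matvec(MT_D, R))
    return matvec(Phi, w_td), matvec(Phi, w_brm)

# Hypothetical 3-state chain, rewards, and uniform weights (illustrative assumptions).
P = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
R = [1.0, 0.0, -1.0]
GAMMA = 0.9
MU = [1 / 3] * 3
V = true_values(P, R, GAMMA)

# Tabular features: both methods recover V^pi exactly.
v_td, v_brm = td_and_brm(P, R, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], GAMMA, MU)
# Two overlapping features: V^pi is not representable, and the two answers generally differ.
c_td, c_brm = td_and_brm(P, R, [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]], GAMMA, MU)
```

By construction, the BRM answer never has a larger D-weighted Bellman residual than the TD answer within the same feature span; TD instead targets the projected fixed point.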
Nan Jiang
@nanjiang_cs
Apr 24, 2024
friends must have been bored of me saying this, but clearly not nearly enough ppl know it: not all equations can be turned into an optimization loss
Kyunghyun Cho
@kchonyc
Apr 24, 2024
once @ylecun told me (heavily paraphrased): it's not F=ma but \min (F-ma)^2. i didn't realize its importance, but it is perhaps the most enlightening perspective i've ever heard.
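One standard RL instance of the point in the first post: the Bellman equation V = T^π V has V^π as its exact solution, but the "obvious" loss built from single sampled transitions, E[(V(s) − r − γ V(s'))²], is not the squared Bellman error. With stochastic transitions it equals the squared Bellman error plus γ² E_s[Var(V(s') | s)] (the double-sampling issue), so V^π is generally not its minimizer. A minimal sketch on a hypothetical 3-state chain; P, R, GAMMA and the uniform weights MU are illustrative assumptions.

```python
# Hypothetical 3-state chain under a fixed policy (illustrative assumption).
P = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]  # P[s][s']
R = [1.0, 0.0, -1.0]
GAMMA = 0.9
MU = [1 / 3] * 3  # state weighting (assumption: uniform)
N = len(R)

def policy_evaluation(iters=2000):
    """Approximate V^pi by iterating V <- R + gamma P V (a gamma-contraction)."""
    V = [0.0] * N
    for _ in range(iters):
        V = [R[s] + GAMMA * sum(P[s][n] * V[n] for n in range(N)) for s in range(N)]
    return V

def bellman_loss(V):
    """E_s[(V(s) - (T V)(s))^2]: the equation V = T V as a loss, using the exact expectation."""
    total = 0.0
    for s in range(N):
        tv = R[s] + GAMMA * sum(P[s][n] * V[n] for n in range(N))
        total += MU[s] * (V[s] - tv) ** 2
    return total

def sampled_td_loss(V):
    """E_{s, s'}[(V(s) - R(s) - gamma V(s'))^2]: what single-sample squared TD errors estimate."""
    return sum(MU[s] * P[s][n] * (V[s] - R[s] - GAMMA * V[n]) ** 2 for s in range(N) for n in range(N))

def next_value_variance(V):
    """E_s[Var_{s' ~ P(.|s)} V(s')]."""
    total = 0.0
    for s in range(N):
        mean = sum(P[s][n] * V[n] for n in range(N))
        total += MU[s] * sum(P[s][n] * (V[n] - mean) ** 2 for n in range(N))
    return total

def sampled_td_grad(V):
    """Gradient of sampled_td_loss w.r.t. every entry of V (V(s') is differentiated too)."""
    g = [0.0] * N
    for s in range(N):
        for n in range(N):
            w = 2 * MU[s] * P[s][n] * (V[s] - R[s] - GAMMA * V[n])
            g[s] += w
            g[n] -= GAMMA * w
    return g

V_pi = policy_evaluation()
# Bellman loss is ~0 at V_pi, but the sampled loss is not, and its gradient there is nonzero.
gap = sampled_td_loss(V_pi) - bellman_loss(V_pi)  # equals GAMMA**2 * next_value_variance(V_pi)
```

The decomposition holds for every V, not just V^π, which is why minimizing the sampled loss drifts toward value functions with low next-state variance instead of solving the equation.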
Nan Jiang
@nanjiang_cs
Jul 19, 2022
this paper got Outstanding Paper Award! Congrats to my coauthors (esp. Ching-An and Tengyang). More reasons to check out the details! List of all paper awards: icml.cc/virtual/2022/a…
Nan Jiang
@nanjiang_cs
Jul 18, 2022
Tmr @icmlconf 2:15pm R301, Ching-An will present our ATAC alg: w/ a clever transformation via the performance difference (PD) lemma, we turn the initial-state pessimistic term from our prior work into *relative* pessimism and smoothly bridge IL & offline RL, with robust improvement guarantees. icml.cc/Conferences/20…
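For reference, the standard discounted form of the performance difference (PD) lemma mentioned above, in generic notation (J, d_0 and d^{π'} are labels chosen here, not necessarily the paper's; ATAC's specific transformation is not reproduced):

```latex
% Performance difference (PD) lemma, discounted setting.
% J(\pi) = \mathbb{E}_{s_0 \sim d_0}[V^{\pi}(s_0)], and
% d^{\pi'}(s) = (1-\gamma) \sum_{t \ge 0} \gamma^t \Pr(s_t = s \mid s_0 \sim d_0, \pi').
J(\pi') - J(\pi)
  = \frac{1}{1-\gamma}\,
    \mathbb{E}_{s \sim d^{\pi'}}\,
    \mathbb{E}_{a \sim \pi'(\cdot \mid s)}
    \big[ Q^{\pi}(s,a) - V^{\pi}(s) \big]
```

It expresses the gap between two policies as the advantage of π' with respect to π, averaged over the states that π' visits.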
Nan Jiang
@nanjiang_cs
Aug 26, 2020
Alekh, @ShamKakade6 and I have a (quite drafty) monograph on RL theory: rltheorybook.github.io. I am also teaching a PhD seminar course on this topic (w/ recordings): nanjiang.cs.illinois.edu/cs598; just did the 1st lecture 2h ago! Still figuring out if I can share the videos publicly...
Marc G. Bellemare
@marcgbellemare
Aug 26, 2020
We have a monograph on deep reinforcement learning (google.com/search?q=an+in…) which covers some of the recent work. Otherwise, much of the non-deep RL work is theory, in which case I am not the expert but perhaps @nanjiang_cs has suggestions.
Nan Jiang
@nanjiang_cs
Jul 19, 2025
missing ICML, so I used this week to write my first technical blog post, on some recent thoughts about two different roles of simulators in RL and the confusions/misconceptions around them. Comments welcome! nanjiang.cs.illinois.edu/2025/07/16/sim…
Nan Jiang
@nanjiang_cs
Sep 30, 2025
My 3rd blog post is on PG, the topic I am least familiar with but get asked about a lot, so I thought I'd just put together the very limited stuff I know. Somehow the post gets cynical from time to time🙃 nanjiang.cs.illinois.edu/2025/09/29/pg.…
Nan Jiang
@nanjiang_cs
May 27, 2023
Paper I've wanted to share for a while: model-free RL w/o value fns, but w/ *density estimators*! Featuring a very unique *double-chain* error induction to overcome seemingly inevitable error exponentiation. Joint w/ students Audrey Huang and Jinglin Chen arxiv.org/abs/2302.02252 1/
Nan Jiang
@nanjiang_cs
Nov 29, 2023
As the semester draws to an end, I want to share this *identity* (h/t @tengyangx) that connects so many fundamental pieces of RL theory together: optimism, pessimism, policy opt, proved by PD lemma + Bellman-error telescoping, all in one equation! 1/3
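The post does not spell out the identity, so this sketch does not try to reconstruct it. It only checks numerically one of the two ingredients the post names, the Bellman-error telescoping lemma in its standard form: for any policy π and any f: S × A → ℝ, J(π) − E_{s0∼d0}[f(s0, π)] = (1/(1−γ)) E_{(s,a)∼d^π}[(T^π f − f)(s, a)], where f(s, π) = Σ_a π(a|s) f(s, a), (T^π f)(s, a) = R(s, a) + γ E_{s'}[f(s', π)], and d^π is the normalized discounted occupancy. The random MDP, policy, initial distribution, and f are illustrative assumptions.

```python
import random

GAMMA = 0.9

def make_problem(n_states=4, n_actions=2, seed=0):
    """Hypothetical random tabular MDP, policy, initial distribution, and arbitrary f."""
    rng = random.Random(seed)

    def simplex(k):
        w = [rng.random() + 1e-3 for _ in range(k)]
        z = sum(w)
        return [x / z for x in w]

    P = [[simplex(n_states) for _ in range(n_actions)] for _ in range(n_states)]  # P[s][a][s']
    R = [[rng.uniform(-1.0, 1.0) for _ in range(n_actions)] for _ in range(n_states)]
    pi = [simplex(n_actions) for _ in range(n_states)]  # pi[s][a]
    d0 = simplex(n_states)
    f = [[rng.uniform(-5.0, 5.0) for _ in range(n_actions)] for _ in range(n_states)]
    return P, R, pi, d0, f

def f_at_pi(f, pi, s):
    """f(s, pi) = sum_a pi(a|s) f(s, a)."""
    return sum(p * q for p, q in zip(pi[s], f[s]))

def bellman_pi(f, P, R, pi):
    """(T^pi f)(s, a) = R(s, a) + gamma * E_{s' ~ P(.|s,a)} f(s', pi)."""
    nS, nA = len(R), len(R[0])
    fpi = [f_at_pi(f, pi, s) for s in range(nS)]
    return [[R[s][a] + GAMMA * sum(P[s][a][u] * fpi[u] for u in range(nS)) for a in range(nA)]
            for s in range(nS)]

def policy_value(P, R, pi, d0, iters=2000):
    """J(pi) = E_{s0 ~ d0}[V^pi(s0)], with V^pi from iterating the Bellman equation."""
    nS, nA = len(R), len(R[0])
    V = [0.0] * nS
    for _ in range(iters):
        V = [sum(pi[s][a] * (R[s][a] + GAMMA * sum(P[s][a][u] * V[u] for u in range(nS)))
                 for a in range(nA)) for s in range(nS)]
    return sum(d0[s] * V[s] for s in range(nS))

def occupancy(P, pi, d0, horizon=2000):
    """d^pi(s, a) = (1 - gamma) sum_t gamma^t Pr(s_t = s, a_t = a), truncated at `horizon`."""
    nS, nA = len(pi), len(pi[0])
    d = [[0.0] * nA for _ in range(nS)]
    rho = list(d0)  # state distribution at time t
    for t in range(horizon):
        for s in range(nS):
            for a in range(nA):
                d[s][a] += (1 - GAMMA) * GAMMA ** t * rho[s] * pi[s][a]
        rho = [sum(rho[s] * pi[s][a] * P[s][a][u] for s in range(nS) for a in range(nA))
               for u in range(nS)]
    return d

def telescoping_gap(seed=0):
    """Return (LHS, RHS) of J(pi) - E_{d0}[f(s0, pi)] = E_{d^pi}[T^pi f - f] / (1 - gamma)."""
    P, R, pi, d0, f = make_problem(seed=seed)
    nS, nA = len(R), len(R[0])
    lhs = policy_value(P, R, pi, d0) - sum(d0[s] * f_at_pi(f, pi, s) for s in range(nS))
    Tf = bellman_pi(f, P, R, pi)
    d = occupancy(P, pi, d0)
    rhs = sum(d[s][a] * (Tf[s][a] - f[s][a]) for s in range(nS) for a in range(nA)) / (1 - GAMMA)
    return lhs, rhs
```

Taking f = Q^π makes the right-hand side vanish, since T^π Q^π = Q^π; for any other f, the gap between J(π) and f's initial-state prediction is exactly f's Bellman error averaged over π's own occupancy.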