Jack (Hao) Bai
haob2 AT illinois DOT edu
Hi there! I’m Jack. I’m a third-year Ph.D. student at UIUC CS, advised by Prof. Tong Zhang. I work closely with Prof. Aviral Kumar @ CMU MLD. I am an incoming research intern at NVIDIA, managed by Prof. Yejin Choi.
Recently, my research has focused on fundamental questions about vision-language model reasoning in multi-step environments, now commonly called “agents”, using reinforcement learning. I tackle problems with both empirical insights and theoretical considerations.
I was previously a visiting scholar advised by Sergey Levine @ BAIR, and a research intern at Microsoft Research. I received my dual undergrad degree from UIUC and Zhejiang University. During those wonderful years, I was lucky enough to have worked with great minds like Yi Ma @ BAIR and Chengxiang Zhai @ UIUC.
In my free time, I study music theory, with a focus on chord progressions.
A public up-to-date resume can be found here.
News
| Mar 08, 2026 | Our paper WebGym has been accepted to CVPR 2026! Check out the paper on arXiv and the project page. |
|---|---|
| Jan 09, 2026 | Today, we proudly announce the release of WebGym, the largest open-source RL training environment for visual web agents to date. The preprint is available on arXiv. We propose (1) the RL framework with the highest rollout speed, (2) a recipe that supports training agents on long-horizon tasks, and (3) scaling dimensions that effectively improve RL performance on the proposed task set. |
| Jun 11, 2025 | My first paper on web agents with RL, TTI, has been released! Check out the preprint! I am super proud of this work and believe it will lead to a paradigm shift in multi-step agent reasoning with RL+VLM. |
Research Blogs
| Mar 11, 2026 | rl What Does Flow-Matching Bring to Deep RL? |
|---|---|
| Feb 15, 2026 | rl Generalizable Value Functions and Emotions (?) |
| Jan 09, 2026 | rl How to Use Privileged Information in RL |
| Oct 01, 2025 | agent Position: Why Web is a Good Environment to Study RL? |
| Sep 01, 2025 | llm Pretraining, Post-training, and Test-Time Reasoning |
| Aug 07, 2025 | rl Challenges in Scaling Q-Learning |
| Jul 22, 2025 | agent Are Multi-step Agents Overthinking? |
| May 27, 2025 | rl Policy Optimization without a Critic: The GRPO Family |
| Mar 15, 2025 | rl Can Language Models Be Critic Functions? |
| Oct 22, 2024 | rl RL on Language under Single-step Settings |
| Aug 01, 2024 | llm LLM Optimization Basics: Memory |
| Jun 15, 2024 | llm LLM Optimization Basics: Time |
| May 22, 2024 | rl Importance Sampling: Why and How |
| Mar 13, 2024 | rl Policy Gradient and Actor-Critic |
| Jun 07, 2023 | llm Self-Attention Layer and The Transformers Architecture |
Big Minds
| Mar 23, 2026 | Jensen Huang: NVIDIA and the AI Revolution |
|---|---|
| Nov 25, 2025 | Ilya Sutskever: From the Age of Scaling to the Age of Research |
| Aug 15, 2023 | Ilya Sutskever: An Observation on Generalization |
| Feb 01, 2018 | Ilya Sutskever: Meta Learning and Self Play |
Theory Study
| Feb 12, 2026 | music The Pentatonic Scale |
|---|---|
| Dec 13, 2025 | music Non-Diatonic Notes |
| Sep 18, 2025 | phil Foundations of Reductionism |
| Aug 24, 2025 | music Jazz Chords and Their Variants |
| Jul 04, 2025 | info Kolmogorov Complexity |
| Jun 13, 2025 | music The Komuro Progression |
Selected Publications
- EMNLP’23