Qinghua Liu

I study model training at OpenAI.

I received my Ph.D. from Princeton, advised by Chi Jin. My Ph.D. research focused on fundamental theories in reinforcement learning (RL), including partially observable RL (POMDP, PSR, Survey), multi-agent RL (Stochastic Game), and RL with large state spaces (Function Approximation). After that, I spent a wonderful year as a postdoctoral researcher at Microsoft Research NYC, where I explored language model research.

During summer 2022, I interned at DeepMind, working with Csaba Szepesvári and Gellért Weisz. Previously, I received a B.E. degree in Electrical Engineering and a B.S. degree in Mathematics from Tsinghua University.

[Google Scholar]

Models

I've co-trained GPT-5–series models (e.g. GPT-5, GPT-5.1) for multi-personality / personalization. Highlighted in

Selected Papers

(α-β order) denotes alphabetical authorship ordering

Optimistic MLE – A Generic Model-based Algorithm for Partially Observable Sequential Decision Making
Qinghua Liu, Praneeth Netrapalli, Csaba Szepesvári, Chi Jin
Symposium on Theory of Computing (STOC), 2023

V-Learning – A Simple, Efficient, Decentralized Algorithm for Multiagent RL
(α-β order) Chi Jin, Qinghua Liu, Yuanhao Wang, Tiancheng Yu
Mathematics of Operations Research (MOR), 2023
Best Paper in ICLR 2022 “Gamification and Multiagent Solutions” Workshop

When Is Partially Observable Reinforcement Learning Not Scary?
Qinghua Liu, Alan Chung, Csaba Szepesvári, Chi Jin
Conference on Learning Theory (COLT), 2022

Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms
(α-β order) Chi Jin, Qinghua Liu, Sobhan Miryoosefi
Neural Information Processing Systems (NeurIPS), 2021 (Spotlight)