I'm a fifth-year Ph.D. student in the Department of Automation at Tsinghua University, advised by Prof. Xiangyang Ji.
My research focuses on Reinforcement Learning and Large Reasoning Models.
I work with the THU-IDM team, where we develop efficient algorithms for decision-making.
Prior to my doctoral studies, I received my Bachelor's degree in Physics from Zhiyuan Honor College at Shanghai Jiao Tong University.
[2026-02] I am currently exploring job opportunities. If my experience and research interests align with your needs, I would be delighted to connect.
[2021-09] Started my Ph.D. journey at Tsinghua University.
Research
I'm interested in Reinforcement Learning and Large Language Models. My research focuses on efficient and intelligent decision-making with minimal environment interactions.
We predict and select informative prompts online, before rollout, by inferring their learning dynamics, which accelerates RL finetuning of large reasoning models.
We propose an easy-to-implement method, referred to as Posterior and Diversity Synergized Task Sampling (PDTS), to accommodate fast and robust sequential decision-making.
We explicitly model task distributions over task identifiers with generative models and propose robustifying fast adaptation through adversarial training.
We introduce Model Predictive Task Sampling (MPTS), a framework that bridges the task space and adaptation risk landscape, providing a theoretical foundation for robust active task sampling.
This work introduces Model Predictive Prompt Selection, a Bayesian risk-predictive framework that estimates prompt difficulty online without requiring costly LLM interactions.