Previously, I interned at FAIR Paris (with Remi Munos), Amazon NYC (with Udaya Ghai
and Dean Foster), and Microsoft Research NYC (with Akshay Krishnamurthy
and Dylan Foster).
I finished my master's degree in MLD,
advised by Kris Kitani.
I completed my undergraduate studies at UC San Diego, majoring in CS and Math,
advised by Sicun Gao.
I am interested in the theory, science, and application of interactive decision-making.
My current research focuses on when and how we can achieve efficient and robust learning, by examining the three foundations of interactive decision-making: data, environment, and interaction protocol. I am also interested in applying principled decision-making algorithms to large-scale real-world applications, such as generative models and robotics.
We prove that offline contrastive-based methods (e.g., DPO)
require a stronger coverage condition than online RL-based methods (e.g., RLHF). We propose
Hybrid Preference Optimization to combine the benefits of both offline and online methods.
We introduce a new theoretical framework, RichCLD (Rich-Observation RL with Continuous Latent Dynamics), in which the agent performs control based on high-dimensional observations, but the environment is governed by low-dimensional latent states and Lipschitz continuous dynamics.
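As a rough sketch of this setting (the notation here is mine, for illustration only): the agent receives a high-dimensional observation $x_t$ emitted from a low-dimensional latent state $s_t \in \mathcal{S} \subseteq \mathbb{R}^d$, say $x_t \sim q(\cdot \mid s_t)$, while the latent state evolves as $s_{t+1} \sim P(\cdot \mid s_t, a_t)$ with $P$ Lipschitz in $s$ under a suitable metric, e.g., $W\big(P(\cdot \mid s, a), P(\cdot \mid s', a)\big) \le L\,\|s - s'\|$.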
We prove the benefit of representation learning across diverse source environments, which enables efficient learning in the
target environment with the learned representation, under the low-rank MDP setting.
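For context (this is the standard definition, not a detail specific to this paper): a low-rank MDP assumes the transition kernel factorizes as $P(s' \mid s, a) = \langle \phi(s, a), \mu(s') \rangle$ for feature maps $\phi$ and $\mu$; here $\phi$ plays the role of the representation shared across environments.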
Combining online and offline data yields RL algorithms that are both statistically and computationally efficient.
Experiments on Montezuma's Revenge reveal that hybrid RL works much better
than pure online RL and pure offline RL.
An efficient rich-observation RL algorithm that learns to decode rich observations into latent states
(via adversarial training), while balancing exploration and exploitation.
A simple, provably efficient model-based algorithm that achieves competitive performance in both dense-reward
continuous control tasks and sparse-reward control tasks that require efficient exploration.
We study Sim-to-Real transfer, policy transfer, and policy adaptation under a model-based framework,
resulting in an algorithm that enjoys strong theoretical guarantees
and excellent empirical performance.
Teaching Assistant
UCSD CSE291: Topics in Search and Optimization (Winter 2020)
UCSD CSE154: Deep Learning (Fall 2019)
UCSD CSE150: Introduction to AI: Search and Reasoning (Winter 2019, Spring 2020)
UCSD CSE30: Computer Organization and Systems Programming (Winter 2018, Spring 2019)