I'm a final-year PhD student in the computer science department at Stanford University,
advised by Dorsa Sadigh. Previously, I was a research intern at Physical Intelligence
and a student researcher in the Google DeepMind Robotics group. My research was supported by an NDSEG Fellowship.
I completed my undergrad at UC Berkeley where I worked with Professors Pieter Abbeel
and Lerrel Pinto.
Importance Weighted Retrieval (IWR) is a retrieval method that estimates importance weights, i.e., the ratio between the target and prior data distributions, using Gaussian KDEs, and applies these weights on top of existing retrieval methods.
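The density-ratio idea behind IWR can be sketched in a few lines; this is a minimal illustration using `scipy.stats.gaussian_kde`, not the paper's implementation, and all function and variable names here are my own placeholders.

```python
# Sketch: KDE-based importance weights w(x) = p_target(x) / p_prior(x).
import numpy as np
from scipy.stats import gaussian_kde

def importance_weights(prior_feats, target_feats):
    """Weight each prior-dataset point by a KDE estimate of the
    target-to-prior density ratio. Inputs are (n_points, dim) arrays."""
    p_target = gaussian_kde(target_feats.T)  # gaussian_kde expects (dim, n)
    p_prior = gaussian_kde(prior_feats.T)
    # Evaluate both densities on the prior dataset and take their ratio.
    w = p_target(prior_feats.T) / (p_prior(prior_feats.T) + 1e-8)
    return w / w.sum()  # normalize so the weights form a distribution

rng = np.random.default_rng(0)
prior = rng.normal(0.0, 1.0, size=(500, 2))   # large, broad prior dataset
target = rng.normal(0.5, 0.5, size=(50, 2))   # small target dataset
w = importance_weights(prior, target)
# Prior points that fall near the target distribution receive higher weight,
# so sampling (or retrieving) according to w skews toward target-like data.
```

In practice the weights would be computed in some learned feature space rather than raw coordinates, and used to reweight or resample the data returned by an existing retrieval method.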
We introduce a new method for robot data curation that uses mutual information to select the most informative demonstrations. We show that this approach outperforms existing methods in both simulation and real-world settings.
This paper introduces a novel approach that leverages priors generated by pre-trained LLMs alongside the
precision of preference learning. Our method, termed Language-Guided Preference Learning (LGPL),
uses LLMs to generate initial behavior samples, which are then refined through preference-based feedback to
learn behaviors that closely align with human expectations.
GVL poses value estimation as a temporal ordering problem over shuffled video frames;
this seemingly more challenging task encourages VLMs to more fully exploit their underlying semantic
and temporal grounding capabilities to differentiate frames based on their perceived task progress, consequently producing significantly better value predictions.
We introduce DITTO, an algorithm for few-shot adaptation or alignment of large language models using only two or three human demonstrations. We show that DITTO outperforms prior methods by a significant margin on automated evaluations and a real-world user study.
We introduce ReMix, a recipe for automatically curating data mixtures for training large scale robotic imitation learning policies. We demonstrate improvements over human-curated and uniform dataset mixtures.
Evaluating robot motions involves more than just the start and end states; it's about how the task is performed. We propose motion instruction fine-tuning (MotIF) and MotIF-1K dataset to improve VLMs' ability to understand nuanced robotic motions.
Through extensive experimentation across multiple model scales, we characterize the over-optimization problem for direct alignment algorithms in large language models.
Contrastive Preference Learning (CPL) reduces Reinforcement Learning from Human Feedback to contrastive learning under the regret model of human preferences, which has recently been shown to be more accurate than the widely accepted reward model. Unlike many RLHF methods, CPL is fully off-policy and works on arbitrary MDPs.
Approaches to preference-based RL typically work in two phases: first a reward function is learned, then it is maximized using a vanilla RL algorithm. We introduce the Inverse Preference Learning framework, where we directly learn a Q-function that models the user's preferences without explicitly learning a reward function.
We introduce DWSL, an algorithm for offline goal-conditioned reinforcement learning that uses only supervised objectives while still learning a constrained optimal policy. DWSL performs particularly well on high-dimensional image domains and appears robust to hyperparameters.
We introduce a novel framework for Q-learning that models the maximal soft-values without needing to sample from a policy and improves performance in online and offline RL settings.
Better robot structures hold the promise of better performance. We propose a new algorithm, TAME, that evolves morphologies without any task specification. This is accomplished using an information-theoretic objective that efficiently ranks morphologies based on their ability to explore and control their environment.
We propose transferring RL policies across agents using a hierarchical framework. Then, to remedy poor zero-shot transfer performance, we introduce two additional imitation objectives.
We examine and compare three methods for explicitly disentangling learned latent representations in VAE models.
Awards
National Defense Science and Engineering Graduate (NDSEG) Fellowship, 2021; roughly 5% selection rate.
Honorable mention for the 2021 CRA Outstanding Undergraduate Researcher Award
Highest Degree Honors in Engineering, UC Berkeley, Spring 2021; top 3% of the graduating class.
UC Berkeley Regents and Chancellors Scholarship
Rambus Innovator of the Future 2017
Industry
Research Intern, Physical Intelligence
Summer 2025
Working on data collection, labeling, and training more reactive policies.
Student Researcher, Google DeepMind Robotics
Summer 2024
Working on the machine learning and robotics team.
Intern, Citadel Global Quantitative Strategies
Summer 2019
Developed C++ trading API and monitoring systems. Worked on optimizing memory usage of large model training.
Intern, Intel Artificial Intelligence Group
Summer 2018
Worked on demo systems for Intel's OpenVINO model optimization toolkit on the AWS DeepLens. Explored systems for gradient-based explanations of deep networks.