Previously, I interned at FAIR Paris (with Remi Munos), Amazon NYC (with Udaya Ghai
and Dean Foster), and Microsoft Research NYC (with Akshay Krishnamurthy
and Dylan Foster).
I finished my master's degree in MLD,
advised by Kris Kitani.
I completed my undergraduate studies at UC San Diego, majoring in CS and Math,
advised by Sicun Gao.
I am interested in the theory, science, and application of interactive decision-making.
My current research focuses on when and how we can achieve efficient and robust learning, by examining the three foundations of interactive decision-making: data, environment, and interaction protocol. I am also interested in applying principled decision-making algorithms to large-scale real-world applications, such as generative models and robotics.
We prove that offline contrastive-based methods (e.g., DPO)
require a stronger coverage condition than online RL-based methods (e.g., RLHF). We propose
Hybrid Preference Optimization to combine the benefits of both offline and online methods.
We introduce a new theoretical framework, RichCLD (Rich-Observation RL with Continuous Latent Dynamics), in which the agent performs control based on high-dimensional observations, but the environment is governed by low-dimensional latent states and Lipschitz continuous dynamics.
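As a rough sketch of this setting (the notation here is mine, for illustration only): the agent receives a high-dimensional observation $x_t$ emitted from a low-dimensional latent state $s_t \in \mathcal{S} \subseteq \mathbb{R}^d$, say $x_t \sim q(\cdot \mid s_t)$, while the latent state evolves as $s_{t+1} \sim P(\cdot \mid s_t, a_t)$ with $P$ Lipschitz in $s$ under a suitable metric, e.g., $W\big(P(\cdot \mid s, a), P(\cdot \mid s', a)\big) \le L\,\|s - s'\|$.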
We prove the benefit of representation learning across diverse source environments, which enables efficient learning in the
target environment with the learned representation, under the low-rank MDP setting.
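For context (this is the standard definition, not a detail specific to this paper): a low-rank MDP assumes the transition kernel factorizes as $P(s' \mid s, a) = \langle \phi(s, a), \mu(s') \rangle$ for feature maps $\phi$ and $\mu$; here $\phi$ plays the role of the representation shared across environments.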
Combining online and offline data yields RL algorithms that are both statistically and computationally efficient.
Experiments on Montezuma's Revenge reveal that hybrid RL works much better
than pure online RL and pure offline RL.
An efficient rich-observation RL algorithm that learns to decode rich observations into latent states
(via adversarial training), while balancing exploration and exploitation.
A simple, provably efficient model-based algorithm that achieves competitive performance in both dense-reward
continuous control tasks and sparse-reward control tasks that require efficient exploration.
We study Sim-to-Real transfer, policy transfer, and policy adaptation under a model-based framework,
resulting in an algorithm that enjoys strong theoretical guarantees
and excellent empirical performance.
Teaching Assistant
UCSD CSE291: Topics in Search and Optimization (Winter 2020)
UCSD CSE154: Deep Learning (Fall 2019)
UCSD CSE150: Introduction to AI: Search and Reasoning (Winter 2019, Spring 2020)
UCSD CSE30: Computer Organization and Systems Programming (Winter 2018, Spring 2019)