Frederik Ebert

I earned my PhD in Computer Science from UC Berkeley in May 2022, where I was advised by Prof. Sergey Levine and Prof. Chelsea Finn (Stanford CS department). My research at the Berkeley Artificial Intelligence Laboratory (BAIR) focused on data-driven robotic manipulation, emphasizing generalizable learning across tasks, environments, and embodiments using large-scale data, deep generative models, imitation learning, reinforcement learning, tactile sensing, and hierarchical control.

After graduating, I founded Emancro, where I led the development of learning-based robotic systems in real-world hospital settings, gaining first-hand insight into the limitations of current robotic foundation models under distribution shift, long-horizon tasks, and reliability constraints.

I am broadly interested in developing scalable robotic foundation models and world-model-based control systems that leverage diverse, multimodal interaction data to enable fast, reliable, and robust manipulation in complex real-world environments.

Publications

Image
Image

Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials
Aviral Kumar*, Anikait Singh*, Frederik Ebert*, Mitsuhiko Nakamoto, Yanlai Yang, Chelsea Finn, Sergey Levine,
Robotics Science and Systems, 2023
arXiv / project website

This paper introduces Pre-Training for Robots (PTR), an offline reinforcement learning framework that leverages large, diverse robot datasets and as few as 10 task-specific demonstrations to rapidly adapt to new tasks in unseen environments as well as allowing on-line RL fine-tuning in the real world.

Image
Image

Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Dataset
Frederik Ebert*, Yanlai Yang*, Karl Schmeckpeper, Bernadette Bucher, Georgios Georgakis, Chelsea Finn, Sergey Levine,
Robotics Science and Systems, 2022
arXiv / project website / blog post

We collected a large multi-task, multi-domain demonstration dataset, which we call the Bridge Dataset consisting of more than 7300 demonstrations, of 73 different tasks in 10 different toy-kitchen environments. Training on this data produces signifcantly more generalizeable policies and allows transferring tasks from the bridge data to a target domain.

Image

Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors
Karl Pertsch*, Oleh Rybkin*, Frederik Ebert, Chelsea Finn, Dinesh Jayaraman, Sergey Levine
Conference on Neural Information Processing Systems (NeurIPS), 2021. arXiv / project website

We propose a hierarchical prediction model that predicts sequences by recursive infilling. We use this model to devise a hierarchical planning approach that allows to scale visual MPC to long-horizon tasks with hundreds of time steps.

Image
Image

OmniTact: A Multi-Directional High Resolution Touch Sensor
Akhil Padmanabha, Frederik Ebert, Stephen Tian, Roberto Calandra, Sergey Levine, Chelsea Finn
IEEE International Conference on Robotics and Automation (ICRA), 2020
arXiv / project website / blog post

OmniTact is a novel high-resolution multi-directional tactile sensor based on the Gelsight sensor. We show that the omnidirectional sensing capabilities allow inserting an eletrical connector purely based on the sense of touch.

Image
Image

RoboNet: Large-Scale Multi-Robot Learning
Sudeep Dasari, Frederik Ebert, Stephen Tian, Suraj Nair, Bernadette Bucher, Karl Schmeckpeper, Siddharth Singh, Sergey Levine, Chelsea Finn
Conference on Robot Learning (CoRL), 2019
arXiv / project website

RoboNet is a large-scale database for sharing robotic experience across different robots for learning generalizable skills. RoboNet contains datafrom 7 different robot platforms and allows transferring skill and dynamics models between different robots.

Image
Image

Time-Agnostic Prediction: Predicting Predictable Video Frames
Dinesh Jayaraman, Frederik Ebert, Alexei A. Efros Sergey Levine
International Conference on Learning Representations (ICLR), 2019
arXiv / video results and data

Time agnostic prediction (TAP) is a method for predicting intermediate images in between a start frame and a goal frame for the purpose of planning. Instead of predicting at fixed time-intervals the optimizer chooses the optimal time-step.

Image
Image

Improvisation through Physical Understanding: Using Novel Objects as Tools with Visual Foresight
Annie Xie, Frederik Ebert, Sergey Levine, Chelsea Finn
Robotics: Science and Systems (RSS) , 2019
arXiv / video results and data / blog post

We combine diverse demonstration data with self-supervised interaction data, aiming to leverage the interaction data to build generalizable models and the demonstration data to guide the model-based RL planner to solve complex tasks.

Image
Image

Manipulation by Feel: Touch-Based Control with Deep Predictive Models
Stephen Tian*, Frederik Ebert*, Dinesh Jayaraman, Dinesh Jayaraman,Mayur Mudigonda, Chelsea Finn,Roberto Calandra ,Sergey Levine (* equal contribution)
IEEE International Conference on Robotics and Automation (ICRA) , 2019
arXiv / video results and data / blog post

We propose deep tactile MPC, a framework for learning to perform tactile servoing from raw tactile sensor inputs, without manual supervision. We show that this method enables a robot equipped with a GelSight-style tactile sensor to manipulate a ball and 20-sided die

Image
Image

Robustness via Retrying: Closed-Loop Robotic Manipulation with Self-Supervised Learning
Frederik Ebert, Sudeep Dasari, Alex Lee, Sergey Levine, Chelsea Finn Conference on Robot Learning (CoRL), 2018
arXiv / code / video results and data

To enable a robot to continuously retry a task, we devise a self-supervised algorithm for learning image registration, which can keep track of objects of interest for the duration of the trial. We demonstrate that this idea can be combined with a video-prediction based controller to enable complex behaviors to be learned from scratch using only raw visual inputs, including grasping, repositioning objects, and non-prehensile manipulation.

Image
Image

Self-Supervised Visual Planning with Temporal Skip Connections
Frederik Ebert, Chelsea Finn, Alex Lee, Sergey Levine
Conference on Robot Learning (CoRL), 2017 (Long Talk)
arXiv / code / video results and data

We present three simple improvements to self-supervised visual foresight algorithm that lead to substantially better visual planning capabilities. Our method can perform tasks that require longer-term planning and involve multiple objects.

© 2020 Frederik Ebert