Frederik Ebert

I earned my PhD in Computer Science from UC Berkeley in May 2022, where I was advised by Prof. Sergey Levine and Prof. Chelsea Finn (Stanford CS department). My research at the Berkeley Artificial Intelligence Laboratory (BAIR) focused on data-driven robotic manipulation, emphasizing generalizable learning across tasks, environments, and embodiments using large-scale data, deep generative models, imitation learning, reinforcement learning, tactile sensing, and hierarchical control.

After graduating, I founded Emancro, where I led the development of learning-based robotic systems in real-world hospital settings, gaining first-hand insight into the limitations of current robotic foundation models under distribution shift, long-horizon tasks, and reliability constraints.

I am broadly interested in developing scalable robotic foundation models and world-model-based control systems that leverage diverse, multimodal interaction data to enable fast, reliable, and robust manipulation in complex real-world environments.

Publications

	Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials Aviral Kumar, Anikait Singh, Frederik Ebert, Mitsuhiko Nakamoto, Yanlai Yang, Chelsea Finn, Sergey Levine, Robotics Science and Systems*, 2023 arXiv / project website This paper introduces Pre-Training for Robots (PTR), an offline reinforcement learning framework that leverages large, diverse robot datasets and as few as 10 task-specific demonstrations to rapidly adapt to new tasks in unseen environments as well as allowing on-line RL fine-tuning in the real world.
	Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Dataset Frederik Ebert, Yanlai Yang, Karl Schmeckpeper, Bernadette Bucher, Georgios Georgakis, Chelsea Finn, Sergey Levine, Robotics Science and Systems, 2022 arXiv / project website / blog post We collected a large multi-task, multi-domain demonstration dataset, which we call the Bridge Dataset consisting of more than 7300 demonstrations, of 73 different tasks in 10 different toy-kitchen environments. Training on this data produces signifcantly more generalizeable policies and allows transferring tasks from the bridge data to a target domain.
	Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors Karl Pertsch, Oleh Rybkin, Frederik Ebert, Chelsea Finn, Dinesh Jayaraman, Sergey Levine Conference on Neural Information Processing Systems (NeurIPS), 2021. arXiv / project website We propose a hierarchical prediction model that predicts sequences by recursive infilling. We use this model to devise a hierarchical planning approach that allows to scale visual MPC to long-horizon tasks with hundreds of time steps.
	OmniTact: A Multi-Directional High Resolution Touch Sensor Akhil Padmanabha, Frederik Ebert, Stephen Tian, Roberto Calandra, Sergey Levine, Chelsea Finn IEEE International Conference on Robotics and Automation (ICRA), 2020 arXiv / project website / blog post OmniTact is a novel high-resolution multi-directional tactile sensor based on the Gelsight sensor. We show that the omnidirectional sensing capabilities allow inserting an eletrical connector purely based on the sense of touch.
	RoboNet: Large-Scale Multi-Robot Learning Sudeep Dasari, Frederik Ebert, Stephen Tian, Suraj Nair, Bernadette Bucher, Karl Schmeckpeper, Siddharth Singh, Sergey Levine, Chelsea Finn Conference on Robot Learning (CoRL), 2019 arXiv / project website RoboNet is a large-scale database for sharing robotic experience across different robots for learning generalizable skills. RoboNet contains datafrom 7 different robot platforms and allows transferring skill and dynamics models between different robots.
	Time-Agnostic Prediction: Predicting Predictable Video Frames Dinesh Jayaraman, Frederik Ebert, Alexei A. Efros Sergey Levine International Conference on Learning Representations (ICLR), 2019 arXiv / video results and data Time agnostic prediction (TAP) is a method for predicting intermediate images in between a start frame and a goal frame for the purpose of planning. Instead of predicting at fixed time-intervals the optimizer chooses the optimal time-step.
	Improvisation through Physical Understanding: Using Novel Objects as Tools with Visual Foresight Annie Xie, Frederik Ebert, Sergey Levine, Chelsea Finn Robotics: Science and Systems (RSS) , 2019 arXiv / video results and data / blog post We combine diverse demonstration data with self-supervised interaction data, aiming to leverage the interaction data to build generalizable models and the demonstration data to guide the model-based RL planner to solve complex tasks.
	Manipulation by Feel: Touch-Based Control with Deep Predictive Models Stephen Tian, Frederik Ebert, Dinesh Jayaraman, Dinesh Jayaraman,Mayur Mudigonda, Chelsea Finn,Roberto Calandra ,Sergey Levine (* equal contribution) IEEE International Conference on Robotics and Automation (ICRA) , 2019 arXiv / video results and data / blog post We propose deep tactile MPC, a framework for learning to perform tactile servoing from raw tactile sensor inputs, without manual supervision. We show that this method enables a robot equipped with a GelSight-style tactile sensor to manipulate a ball and 20-sided die
	Robustness via Retrying: Closed-Loop Robotic Manipulation with Self-Supervised Learning Frederik Ebert, Sudeep Dasari, Alex Lee, Sergey Levine, Chelsea Finn Conference on Robot Learning (CoRL), 2018 arXiv / code / video results and data To enable a robot to continuously retry a task, we devise a self-supervised algorithm for learning image registration, which can keep track of objects of interest for the duration of the trial. We demonstrate that this idea can be combined with a video-prediction based controller to enable complex behaviors to be learned from scratch using only raw visual inputs, including grasping, repositioning objects, and non-prehensile manipulation.
	Self-Supervised Visual Planning with Temporal Skip Connections Frederik Ebert, Chelsea Finn, Alex Lee, Sergey Levine Conference on Robot Learning (CoRL), 2017 (Long Talk) arXiv / code / video results and data We present three simple improvements to self-supervised visual foresight algorithm that lead to substantially better visual planning capabilities. Our method can perform tasks that require longer-term planning and involve multiple objects.