Harry H. Zhang

I am a graduate student in the SPARK Lab of MIT LIDS. I am extremely fortunate to be advised by Prof. Luca Carlone.

Prior to MIT, I was an MS-Research student in the CMU Robotics Institute studying Artificial Intelligence and Robotics, advised by Prof. David Held. I also worked at Amazon as an Applied Scientist II.

Prior to CMU, I earned my B.S. (2017-2021) with Honors from UC Berkeley with a major in EECS and a minor in Mechanical Engineering. During my time at Berkeley, I did research under Prof. Ken Goldberg and Dr. Jeffrey Ichnowski in AUTOLab. I maintain and curate a popular deep reinforcement learning tutorial on my Github.

Outside of school, I do quantitative finance.

Email  /  Google Scholar  /  Github

News and Updates

In reverse chronological order:

  • May 2025: CUPS accepted to ICML, see you in Vancouver!
  • Jan. 2025: Conformalized HPE paper accepted to ICLR, see you in Singapore!
  • Jan. 2024: Multi-model fitting paper accepted to ICRA, see you in Yokohama!
  • Aug. 2023: FlowBot++ accepted to CoRL, see you in ATL!
  • Sep. 2022: TAX-Pose accepted to CoRL, see you in New Zealand!
  • Apr. 2022: FlowBot3D accepted to RSS, see you in NYC!
  • May 2021: Won the Warren Y. Dere Award from UC Berkeley EECS.
  • Jun. 2020: Dex-Net AR got featured on VentureBeat.
  • Jun. 2020: Dex-Net AR got featured on Sohu.

Research Interests

My current research focuses on trustworthy AI and autonomous systems. Specifically, I design algorithms for machines to learn representations for more robust real-world generalization and better certifiability. My research revolves around the theme of learning-based perception systems and robotic systems.

Peer-Reviewed Publications
CUPS: Improving Human Pose-Shape Estimators with Conformalized Deep Uncertainty
Harry Zhang, Luca Carlone
Accepted to International Conference on Machine Learning (ICML), 2025.
Arxiv | Code | Video

We introduce CUPS, a novel method for learning sequence-to-sequence 3D human shapes and poses from RGB videos with uncertainty quantification. To improve on prior work, we develop a method to score multiple hypotheses proposed during training, effectively integrating uncertainty into the learning process. This process results in a deep uncertainty function that is trained end-to-end with the 3D pose estimator. Post-training, the learned deep uncertainty model is used as the conformity score. Since the data in human pose-shape learning is not fully exchangeable, we also provide two practical bounds for the coverage gap in conformal prediction, developing theoretical backing for the uncertainty bound of our model.

CHAMP: Conformalized 3D Human Multi-Hypothesis Pose Estimators
Harry Zhang, Luca Carlone
Accepted to International Conference on Learning Representations (ICLR), 2025.
Arxiv | Code | Video

We introduce CHAMP, a novel method for learning sequence-to-sequence, multi-hypothesis 3D human poses from 2D keypoints by leveraging a conditional distribution with a diffusion model. To predict a single output 3D pose sequence, we generate and aggregate multiple 3D pose hypotheses. For better aggregation results, we develop a method to score these hypotheses during training, effectively integrating conformal prediction into the learning process. This process results in a differentiable conformal predictor that is trained end-to-end with the 3D pose estimator. Post-training, the learned scoring model is used as the conformity score, and the 3D pose estimator is combined with a conformal predictor to select the most accurate hypotheses for downstream aggregation.

CRISP: Object Pose and Shape Estimation with Test-Time Adaptation
Jingnan Shi, Rajat Talak, Harry Zhang, David Jin, Luca Carlone
Accepted to Conference on Computer Vision and Pattern Recognition (CVPR), 2025. Spotlight.
Arxiv | Code | Video

We consider the problem of estimating object pose and shape from an RGB-D image. Our first contribution is to introduce CRISP, a category-agnostic object pose and shape estimation pipeline. The pipeline implements an encoder-decoder model for shape estimation. It uses FiLM-conditioning for implicit shape reconstruction and a DPT-based network for estimating pose-normalized points for pose estimation. As a second contribution, we propose an optimization-based pose and shape corrector that can correct estimation errors caused by a domain gap.

Multi-Model 3D Registration: Finding Multiple Moving Objects in Cluttered Point Clouds
David Jin, Sushrut Karmalkar, Harry Zhang, Luca Carlone
Accepted to IEEE International Conference on Robotics and Automation (ICRA), 2024.
Arxiv | Code | Video

We investigate a variation of the 3D registration problem, named multi-model 3D registration. In the multi-model registration problem, we are given two point clouds picturing a set of objects at different poses (and possibly including points belonging to the background) and we want to simultaneously reconstruct how all objects moved between the two point clouds.

FlowBot++: Learning Generalized Articulated Objects Manipulation via Articulation Projection
Harry Zhang, Benjamin Eisner, David Held
Accepted to Conference on Robot Learning (CoRL), 2023.
Arxiv | Code | Video | Open Review

We explore yet another novel method to perceive and manipulate 3D articulated objects that generalizes to enable the robot to articulate unseen classes of objects.

TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation
Brian Okorn*, Chu Er Pan*, Harry Zhang*, Benjamin Eisner*, David Held
Accepted to Conference on Robot Learning (CoRL), 2022 (* indicates equal contribution)
Arxiv | Code | Video | Open Review

We conjecture that the task-specific pose relationship between relevant parts of interacting objects is a generalizable notion of a manipulation task that can transfer to new objects. We call this task-specific pose relationship "cross-pose". We propose a vision-based system that learns to estimate the cross-pose between two objects for a given manipulation task.

FlowBot3D: Learning 3D Articulation Flow to Manipulate Articulated Objects
Benjamin Eisner*, Harry Zhang*, David Held
Accepted to Robotics: Science and Systems (RSS), 2022 (* indicates equal contribution) - Long talk, Best Paper Award Finalist (Selection Rate 1.5%).
Arxiv | Code | Video | Berkeley CPAR Talk | MIT Technology Review China | Synced Review Sohu | CMU Research Highlights

We explore a novel method to perceive and manipulate 3D articulated objects that generalizes to enable the robot to articulate unseen classes of objects.

AVPLUG: Approach Vector Planning for Unicontact Grasping amid Clutter
Yahav Avigal*, Vishal Satish*, Harry Zhang, Huang Huang, Michael Danielczuk, Jeffrey Ichnowski, Ken Goldberg
Accepted to Conference on Automation Science and Engineering (CASE), 2021.
Arxiv | Code | Video

We present AVPLUG (Approach Vector PLanning for Unicontact Grasping), an algorithm that efficiently finds the approach vector using an octree occupancy model and Minkowski sum computation to maximize information gain.

Robots of the Lost Arc: Self-Supervised Learning to Dynamically Manipulate Fixed-Endpoint Cables
Harry Zhang, Jeffrey Ichnowski, Daniel Seita, Jonathan Wang, Huang Huang, Ken Goldberg
Accepted to International Conference on Robotics and Automation (ICRA), 2021
Arxiv | Code | Bay Area Robotics Symposium Coverage | ICRA 2022 Deformable Object Manipulation Workshop

We propose a self-supervised learning framework that enables a UR5 robot to perform three dynamic cable manipulation tasks. The framework finds a 3D apex point for the robot arm, which, together with a task-specific trajectory function, defines an arcing motion that dynamically manipulates the cable to perform tasks with varying obstacle and target locations.

Dex-Net AR: Distributed Deep Grasp Planning Using a Commodity Cellphone and Augmented Reality App
Harry Zhang, Jeffrey Ichnowski, Yahav Avigal, Joseph Gonzalez, Ion Stoica, Ken Goldberg
Accepted to International Conference on Robotics and Automation (ICRA), 2020
Arxiv | Code | Video | VentureBeat Coverage | Sohu Coverage (in Mandarin)

We present a distributed pipeline, Dex-Net AR, that allows point clouds to be uploaded to a server in our lab, cleaned, and evaluated by the Dex-Net grasp planner to generate a grasp axis that is returned and displayed as an overlay on the object.

Orienting Novel Objects using Self-Supervised Rotation Estimation
Shivin Devgon, Jeffrey Ichnowski, Ashwin Balakrishna, Harry Zhang, Ken Goldberg
Accepted to Conference on Automation Science and Engineering (CASE), 2020.
Arxiv | Code | Video

We present an algorithm to orient novel objects given a depth image of the object in its current and desired orientation.

Preprints
Self-Supervised Learning of Dynamic Planar Manipulation of Free-End Cables
Jonathan Wang*, Huang Huang*, Vincent Lim, Harry Zhang, Jeffrey Ichnowski, Daniel Seita, Yunliang Chen, Ken Goldberg
Preprint, in submission to International Conference on Robotics and Automation (ICRA), 2022.
Arxiv | Code | Video

We present an algorithm to train a robot to control free-end cables in a self-supervised fashion.

Safe Deep Model-Based Reinforcement Learning with Lyapunov Functions
Bobby Yan*, Harry Zhang*, Huang Huang*
Preprint, 2022.
Arxiv | Code | Video

We introduce and explore a novel method for adding safety constraints to model-based RL during training and policy learning.

Teaching

10-725: Graduate Convex Optimization
16-385: Computer Vision


CS 189: Introduction to Machine Learning

EE 127: Introduction to Convex Optimization

CS 188: Introduction to Artificial Intelligence

CS 170: Algorithms

ME C231A: Model Predictive Control


Website template from Jon Barron