Ritvik Singh

145 posts

@ritvik_singh9

PhD student @berkeley_ai. prev. @NvidiaAI, @UofT

Berkeley, CA

Joined February 2022

Pinned
Ritvik Singh
@ritvik_singh9
Jun 17
Introducing ABC: open data, training, and infrastructure for robotics. We release the largest teleop dataset to date, and extensively investigate design decisions, pretraining, and post-training techniques. @arthurallshire @Cinnabar233 @adamrasb @redstone_hong @davidrmcall
266K
Ritvik Singh
@ritvik_singh9
Feb 6, 2025
Excited to announce our latest work: DextrAH-RGB, where we successfully train visuomotor policies in sim to perform dexterous grasping of arbitrary objects end-to-end from RGB input!
83K
Ritvik Singh
@ritvik_singh9
Apr 8, 2025
Excited to share that this Fall I'll be starting my PhD @Berkeley_EECS with @pabbeel and @JitendraMalikCV!
17K
Ritvik Singh
@ritvik_singh9
Apr 28, 2025
Over the past few months we've been working with Boston Dynamics on end-to-end dexterous manipulation for the EAtlas:
15K
Ritvik Singh
@ritvik_singh9
Feb 25, 2025
Beyond just being a systems contribution, it’s interesting how adding this curriculum makes multimodal learning more generalizable. Really satisfying to see the network explicitly switch from attending to vision to attending to force as the policy enters a contact-rich regime.
Jason Liu
@JasonJZLiu
Feb 25, 2025
Low-cost teleop systems have democratized robot data collection, but they lack any force feedback, making it challenging to teleoperate contact-rich tasks. Many robot arms provide force information — a critical yet underutilized modality in robot learning. We introduce: 1. 🦾A
1.3K
Ritvik Singh
@ritvik_singh9
Oct 27, 2022
Excited to share our latest work, DeXtreme! One takeaway is that standard pose estimators don't work out of the box on these challenging robotics scenarios with varying lighting conditions, significant occlusions, blur, etc. This is where synthetic data plays a key role.
Ankur Handa
@ankurhandos
Oct 26, 2022
DeXtreme is our new work on scaling sim-to-real for contact-rich manipulation with vision-based state estimation on a robot hand, using the infrastructure we have been developing with Isaac Gym over the past year. arxiv.org/abs/2210.13702 dextreme.org
Ritvik Singh
@ritvik_singh9
Feb 6, 2025
Replying to @ritvik_singh9
We first train a state-based "teacher" policy in simulation using reinforcement learning and automatic domain randomization (x.com/ankurhandos/st…).
Ankur Handa
@ankurhandos
Oct 26, 2022
Replying to @ankurhandos
and automatic domain randomisation, where the randomisation ranges are modulated based on the performance of the policy at the boundaries, leading to significantly better performance because of the curriculum it provides and the diversity of the data generated with it.
1.5K
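A minimal sketch of the boundary-driven range update described in the post above, for readers unfamiliar with automatic domain randomization. The function name `adr_update`, the thresholds, the step size, and the friction example are all hypothetical and are not taken from DeXtreme; the sketch only illustrates widening a range when the policy does well at a boundary and narrowing it when the policy does poorly.

```python
def adr_update(low, high, boundary, performance,
               step=0.05, expand_above=0.8, shrink_below=0.4):
    """Return an updated (low, high) randomization range.

    boundary:    "low" or "high", the edge of the range that was evaluated.
    performance: success rate in [0, 1] measured with the parameter pinned
                 to that boundary (all thresholds here are illustrative).
    """
    if performance >= expand_above:
        delta = step       # policy copes at the edge: widen the range
    elif performance <= shrink_below:
        delta = -step      # policy struggles at the edge: narrow the range
    else:
        return low, high   # in between: leave the range unchanged

    if boundary == "high":
        high = max(low, high + delta)   # never let the range invert
    else:
        low = min(high, low - delta)
    return low, high


# Hypothetical example: a friction multiplier initially sampled from [0.9, 1.1].
friction_range = (0.9, 1.1)
friction_range = adr_update(*friction_range, boundary="high", performance=0.9)
shrunk_range = adr_update(*friction_range, boundary="low", performance=0.2)
```

Because each range only grows once the policy succeeds at its current edge, the sampled environments get harder gradually, which is the curriculum effect the post refers to.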
Ritvik Singh
@ritvik_singh9
May 7, 2025
Very impressive, first work I've seen to scalably learn humanoid motions from videos
Arthur Allshire
@arthurallshire
May 7, 2025
our new system trains humanoid robots using data from cell phone videos, enabling skills such as climbing stairs and sitting on chairs in a single policy (w/ @redstone_hong @junyi42 @davidrmcall)
611
Ritvik Singh
@ritvik_singh9
Feb 6, 2025
Replying to @ritvik_singh9
This was joint work with my colleagues @arthurallshire, @ankurhandos, @robot_trainer, and Karl Van Wyk. For more information please check out our website:
dextrah-rgb.github.io
DextrAH-RGB
Visuomotor Policies to Grasp Anything with Dexterous Hands
822
Ritvik Singh
@ritvik_singh9
Feb 6, 2025
Replying to @ritvik_singh9
Prior works almost always perform dexterous grasping from depth. However, this becomes overly restrictive in the real world as depth maps can be incredibly sensitive to ambient light conditions, and limits the ability to use RGB-based pre-trained models.
Nathan Ratliff
@robot_trainer
Jul 11, 2024
Exciting new work! Fast, robust, reactive, direct-from-sensor grasp-anything policies. RL really works, and it’s going to transform the entire robotics economy. DextrAH-G: Dexterous Arm-Hand Grasping arxiv.org/abs/2407.02274
2.2K
Ritvik Singh
@ritvik_singh9
May 7, 2025
Replying to @ChongZzZhang
fwiw I've only experimented with L2 vs KL, I found KL to always be better. KL(T||S) w fixed var (and diagonal Gaussians) effectively reduces to (mu_s-mu_t)^T \Sigma_t^{-1} (mu_s-mu_t) which has the nice intuition of weighing error in action dims by how uncertain we are
632
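A small numerical check of the reduction mentioned in the post above, written as a sketch rather than anyone's actual training loss. It assumes "fixed var" means the student reuses the teacher's per-dimension variance; under that assumption the closed-form KL between diagonal Gaussians equals one half of the variance-weighted squared error (the post drops the constant factor). The function names and the example numbers are made up for illustration.

```python
import math


def kl_diag_gaussians(mu_t, var_t, mu_s, var_s):
    """Closed-form KL(T || S) for diagonal Gaussians, summed over action dims."""
    total = 0.0
    for mt, vt, ms, vs in zip(mu_t, var_t, mu_s, var_s):
        total += 0.5 * (math.log(vs / vt) + (vt + (mt - ms) ** 2) / vs - 1.0)
    return total


def variance_weighted_sq_error(mu_t, var_t, mu_s):
    """(mu_s - mu_t)^T Sigma_t^{-1} (mu_s - mu_t) for a diagonal Sigma_t."""
    return sum((ms - mt) ** 2 / vt for mt, vt, ms in zip(mu_t, var_t, mu_s))


# Illustrative teacher and student action means; the student shares var_t.
mu_t = [0.2, -0.5, 1.0]
var_t = [0.01, 0.25, 1.0]   # the teacher is most certain about dim 0
mu_s = [0.3, -0.1, 0.4]

kl = kl_diag_gaussians(mu_t, var_t, mu_s, var_t)
quad = variance_weighted_sq_error(mu_t, var_t, mu_s)
```

Here dim 0 has the smallest mean error (0.1) but the largest contribution to `quad`, since the teacher's variance there is only 0.01. A plain L2 loss would instead weight all three dimensions equally.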
Ritvik Singh
@ritvik_singh9
Feb 6, 2025
Replying to @ritvik_singh9
Another benefit of using RGB is that we can leverage pre-trained encoders such as ResNet-18 as our backbone. We find that the best performance comes from pre-trained encoders fine-tuned in simulation.
809
Ritvik Singh
@ritvik_singh9
Jun 13, 2025
Replying to @anag004 @ericjang11 and 3 others
Intel RealSense cameras are not stereo RGB -- they are stereo IR. Depth is derived from projecting a stereo pattern and running a classical matching algo. This is an important distinction bc things like glare, reflective surfaces, etc. can affect the projected IR pattern
250
Ritvik Singh
@ritvik_singh9
Feb 6, 2025
Replying to @ritvik_singh9
Our transformer encoder employs a form of cross-attention whereby tokens from the left image can only attend to tokens in the right image and the learnable embedding token, and vice versa.
638
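A minimal sketch of the attention pattern described in the post above, expressed as a boolean mask. The token ordering (embedding token first, then left-image tokens, then right-image tokens), the function name `build_cross_view_mask`, and the choice to let the embedding token attend to every token are assumptions for illustration; the post does not specify them.

```python
def build_cross_view_mask(n_left, n_right):
    """Return allowed[i][j] == True when query token i may attend to key token j.

    Assumed token order: [embedding, left tokens..., right tokens...].
    """
    n = 1 + n_left + n_right
    left = range(1, 1 + n_left)
    right = range(1 + n_left, n)
    allowed = [[False] * n for _ in range(n)]

    # Assumption: the learnable embedding token can see every token.
    for j in range(n):
        allowed[0][j] = True

    # Left tokens see only the embedding token and the right image.
    for i in left:
        allowed[i][0] = True
        for j in right:
            allowed[i][j] = True

    # Right tokens see only the embedding token and the left image.
    for i in right:
        allowed[i][0] = True
        for j in left:
            allowed[i][j] = True

    return allowed


mask = build_cross_view_mask(n_left=2, n_right=2)
```

Note that many attention APIs expect the inverse convention (True meaning "blocked"), so a mask like this may need to be negated before use.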