Ritvik Singh
145 posts
Ritvik Singh
@ritvik_singh9
PhD student @berkeley_ai. prev. @NvidiaAI, @UofT
Berkeley, CA
ritvik-singh.com
Joined February 2022
362
Following
1,401
Followers
  • Pinned
    Ritvik Singh
    @ritvik_singh9
    Jun 17
    Introducing ABC: open data, training, and infrastructure for robotics. We release the largest teleop dataset to date, and extensively investigate design decisions, pretraining, and post-training techniques. @arthurallshire @Cinnabar233 @adamrasb @redstone_hong @davidrmcall
    266K
  • Ritvik Singh
    @ritvik_singh9
    Feb 6, 2025
    Excited to announce our latest work: DextrAH-RGB where we successfully train visuomotor policies in sim to perform dexterous grasping of arbitrary objects end-to-end from RGB input!
    83K
  • Ritvik Singh
    @ritvik_singh9
    Apr 8, 2025
    Excited to share that this Fall I'll be starting my PhD @Berkeley_EECS with @pabbeel and @JitendraMalikCV!
    17K
  • Ritvik Singh
    @ritvik_singh9
    Apr 28, 2025
    Over the past few months we've been working with Boston Dynamics on end-to-end dexterous manipulation for the EAtlas:
    15K
  • Ritvik Singh
    @ritvik_singh9
    Feb 25, 2025
    Beyond just being a systems contribution, it’s interesting how adding this curriculum makes multimodal learning more generalizable. Really satisfying to see the network explicitly switch from attending to vision to attending to force as the policy enters a contact rich regime.
    Jason Liu
    @JasonJZLiu
    Feb 25, 2025
    Low-cost teleop systems have democratized robot data collection, but they lack any force feedback, making it challenging to teleoperate contact-rich tasks. Many robot arms provide force information — a critical yet underutilized modality in robot learning. We introduce: 1. 🦾A
    1.3K
  • Ritvik Singh
    @ritvik_singh9
    Oct 27, 2022
    Excited to share our latest work, DeXtreme! One takeaway is that standard pose estimators don't work out of the box on these challenging robotics scenarios with varying lighting conditions, significant occlusions, blur, etc. This is where synthetic data plays a key role.
    Ankur Handa
    @ankurhandos
    Oct 26, 2022
    DeXtreme is our new work on scaling sim-to-real for contact-rich manipulation with vision-based state estimation on a robot hand, using the infrastructure we have been developing with Isaac Gym over the past year. arxiv.org/abs/2210.13702 dextreme.org
  • Ritvik Singh
    @ritvik_singh9
    Feb 6, 2025
    Replying to @ritvik_singh9
    We first train a state-based "teacher" policy in simulation using reinforcement learning and automatic domain randomization (x.com/ankurhandos/st…).
    Ankur Handa
    @ankurhandos
    Oct 26, 2022
    Replying to @ankurhandos
    and automatic domain randomisation where the randomisation ranges are modulated based on the performance of the policy at the boundaries leading to significantly better performance because of the curriculum it provides and the diversity of the data generated with it.
    1.5K
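The boundary-driven update described in the quoted post can be sketched roughly as follows. This is a minimal illustration for a single randomized parameter, not the DeXtreme implementation: the `ADRRange` class, the `update_upper_bound` helper, the step size, and both thresholds are hypothetical names and values chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class ADRRange:
    """Randomization range for one simulation parameter (e.g. friction)."""
    low: float
    high: float
    step: float = 0.05  # hypothetical amount to grow or shrink per update


def update_upper_bound(r, boundary_success, widen_above=0.8, narrow_below=0.4):
    """Move the upper bound based on policy performance measured AT that bound.

    boundary_success is the success rate of episodes in which the parameter
    was pinned to r.high. Both thresholds are assumed values.
    """
    if boundary_success >= widen_above:
        # The policy copes at the edge of the range, so make the task harder.
        r.high += r.step
    elif boundary_success <= narrow_below:
        # The policy struggles at the edge, so pull the bound back in,
        # but never below the lower bound.
        r.high = max(r.low, r.high - r.step)
    return r


friction = ADRRange(low=0.9, high=1.1)
update_upper_bound(friction, boundary_success=0.9)  # widens the range
```

The same rule would be mirrored for the lower bound. Because ranges only widen once the policy handles their current edge, the randomization acts as a curriculum.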
  • Ritvik Singh
    @ritvik_singh9
    May 7, 2025
    Very impressive, first work I've seen to scalably learn humanoid motions from videos
    Arthur Allshire
    @arthurallshire
    May 7, 2025
    our new system trains humanoid robots using data from cell phone videos, enabling skills such as climbing stairs and sitting on chairs in a single policy (w/ @redstone_hong @junyi42 @davidrmcall)
    611
  • Ritvik Singh
    @ritvik_singh9
    Feb 6, 2025
    Replying to @ritvik_singh9
    This was joint work with my colleagues @arthurallshire, @ankurhandos, @robot_trainer, and Karl Van Wyk. For more information please check out our website:
    dextrah-rgb.github.io
    DextrAH-RGB
    Visuomotor Policies to Grasp Anything with Dexterous Hands
    822
  • Ritvik Singh
    @ritvik_singh9
    Feb 6, 2025
    Replying to @ritvik_singh9
    Prior works almost always perform dexterous grasping from depth. However, this becomes overly restrictive in the real world as depth maps can be incredibly sensitive to ambient light conditions, and limits the ability to use RGB-based pre-trained models.
    Nathan Ratliff
    @robot_trainer
    Jul 11, 2024
    Exciting new work! Fast, robust, reactive, direct-from-sensor grasp-anything policies. RL really works, and it’s going to transform the entire robotics economy. DextrAH-G: Dexterous Arm-Hand Grasping arxiv.org/abs/2407.02274
    2.2K
  • Ritvik Singh
    @ritvik_singh9
    May 7, 2025
    Replying to @ChongZzZhang
    fwiw I've only experimented with L2 vs KL, I found KL to always be better. KL(T||S) w fixed var (and diagonal Gaussians) effectively reduces to (mu_s-mu_t)^T \Sigma_t^{-1} (mu_s-mu_t) which has the nice intuition of weighing error in action dims by how uncertain we are
    632
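The reduction mentioned in the post above can be checked numerically. The sketch below assumes "fixed var" means the student's variance is held equal to the teacher's, and that both policies output diagonal Gaussians. Under that assumption KL(T||S) equals one half of the variance-weighted squared error, so the two agree up to a constant factor. The function names are hypothetical.

```python
import math


def kl_diag_gaussians(mu_t, var_t, mu_s, var_s):
    """KL(T || S) between diagonal Gaussians, summed over action dimensions."""
    total = 0.0
    for mt, vt, ms, vs in zip(mu_t, var_t, mu_s, var_s):
        total += 0.5 * (math.log(vs / vt) + (vt + (mt - ms) ** 2) / vs - 1.0)
    return total


def weighted_sq_error(mu_t, var_t, mu_s):
    """(mu_s - mu_t)^T Sigma_t^{-1} (mu_s - mu_t) for a diagonal Sigma_t."""
    return sum((ms - mt) ** 2 / vt for mt, vt, ms in zip(mu_t, var_t, mu_s))


# Teacher is confident in dim 0 (small variance) and uncertain in dim 1.
mu_t, var_t = [0.0, 1.0], [0.1, 2.0]
mu_s = [0.5, 0.0]

kl = kl_diag_gaussians(mu_t, var_t, mu_s, var_t)  # student variance = teacher variance
mahalanobis = weighted_sq_error(mu_t, var_t, mu_s)
print(kl, 0.5 * mahalanobis)
```

In this example the smaller error in dimension 0 contributes more to the loss than the larger error in dimension 1, because the teacher's variance there is much lower.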
  • Ritvik Singh
    @ritvik_singh9
    Feb 6, 2025
    Replying to @ritvik_singh9
    Another benefit of using RGB is that we can leverage pre-trained encoders such as ResNet-18 in order to serve as our backbone. We notice that the optimal performance is given by pre-trained encoders finetuned in simulation.
    809
  • Ritvik Singh
    @ritvik_singh9
    Jun 13, 2025
    Replying to @anag004 @ericjang11 and 3 others
    Intel RealSense cameras are not stereo RGB -- they are stereo IR. Depth is derived by projecting a stereo pattern and running a classical matching algo. This is an important distinction bc things like glare, reflective surfaces, etc. can affect the projected IR pattern
    250
  • Ritvik Singh
    @ritvik_singh9
    Feb 6, 2025
    Replying to @ritvik_singh9
    Our transformer encoder employs a form of cross-attention whereby tokens from the left image can only attend to tokens in the right image and the learnable embedding token, and vice versa.
    638
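One way to picture the attention pattern described in the post above is as a boolean mask over a token sequence. The sketch below is an illustration only: the token ordering (embedding token first, then left-image tokens, then right-image tokens), the function name, and the choice to let the embedding token attend to every token are all assumptions that the post does not specify.

```python
def stereo_cross_attention_mask(n_left, n_right):
    """Return allowed[i][j]: True if query token i may attend to key token j.

    Assumed layout: index 0 is the learnable embedding token, followed by
    n_left left-image tokens, then n_right right-image tokens.
    """
    n = 1 + n_left + n_right
    embed = 0
    left = range(1, 1 + n_left)
    right = range(1 + n_left, n)

    allowed = [[False] * n for _ in range(n)]

    # Assumption: the embedding token can read from all tokens.
    for j in range(n):
        allowed[embed][j] = True

    # Left tokens see only right tokens and the embedding token.
    for i in left:
        allowed[i][embed] = True
        for j in right:
            allowed[i][j] = True

    # Right tokens see only left tokens and the embedding token.
    for i in right:
        allowed[i][embed] = True
        for j in left:
            allowed[i][j] = True

    return allowed


mask = stereo_cross_attention_mask(n_left=2, n_right=2)
for row in mask:
    print(["x" if ok else "." for ok in row])
```

In a typical implementation the disallowed entries would be set to negative infinity in the attention logits before the softmax.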
