Shubham Tulsiani (@shubhtuls) / X

Shubham Tulsiani

164 posts

@shubhtuls

Assistant Professor in the Robotics Institute, Carnegie Mellon University. I want to build perception systems that can understand the physical world.

Joined January 2011

Pinned
Shubham Tulsiani
@shubhtuls
Feb 27
[1/N] Current visual geometry prediction models rely primarily on labeled 3D data. Our CVPR26 paper, Flow3r, additionally leverages unlabeled videos (via flow supervision) for scalable visual geometry learning, enabling accurate multi-view 3D reconstruction in the wild.
Shubham Tulsiani
@shubhtuls
Feb 23, 2024
[1/6] What representation comes to mind when you think of a ‘camera’? Perhaps an extrinsic + intrinsic matrix? In our ICLR (oral) paper, we instead infer a distributed representation where each pixel is associated with a ray, and show SoTA results for few-view pose estimation.
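The post contrasts a global camera (intrinsic + extrinsic matrices) with a per-pixel ray representation. As an illustrative sketch only, the snippet below shows the standard pinhole conversion from the former to the latter. It assumes world-to-camera extrinsics x_cam = R x_world + t, a zero-skew intrinsic matrix K, rays cast through pixel centers, and a Plücker (direction, moment) parameterization; the helper names are hypothetical, and this is not claimed to be the paper's exact formulation.

```python
import math


def transpose_times(R, v):
    """Return R^T v for a 3x3 matrix R (nested lists) and a 3-vector v."""
    return [sum(R[i][j] * v[i] for i in range(3)) for j in range(3)]


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def camera_to_pixel_rays(K, R, t, width, height):
    """Hypothetical helper: map each pixel center to a Plücker ray (d, m).

    Assumes a zero-skew pinhole camera K and world-to-camera extrinsics
    x_cam = R @ x_world + t. Rays are returned in row-major pixel order.
    """
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    center = [-x for x in transpose_times(R, t)]  # camera center c = -R^T t
    rays = []
    for v in range(height):
        for u in range(width):
            # Back-project the pixel center to a camera-frame direction.
            d_cam = [(u + 0.5 - cx) / fx, (v + 0.5 - cy) / fy, 1.0]
            d = transpose_times(R, d_cam)  # rotate into the world frame
            norm = math.sqrt(sum(x * x for x in d))
            d = [x / norm for x in d]
            # Moment m = c x d; it is the same for any point chosen on the ray.
            rays.append((d, cross(center, d)))
    return rays
```

Going the other way, from a set of per-pixel rays back to K, R and t, requires a separate fitting step that this sketch does not cover.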
Shubham Tulsiani
@shubhtuls
Oct 2, 2025
[1/N] We present a plug-and-play mechanism to controllably steer inference of any diffusion/flow model towards a sharper or flatter sampling distribution, resulting in improvements across domains, e.g., text-to-image (10% FID reduction) and protein generation (improved designability).
Shubham Tulsiani
@shubhtuls
Aug 23, 2022
[1/4] Camera poses are essential for (neural) 3D reconstruction. But what about sparse-view settings where obtaining these via COLMAP isn’t feasible? Our ECCV paper tackles this using an energy-based formulation for predicting relative rotation (jasonyzhang.com/relpose)
Shubham Tulsiani
@shubhtuls
May 12, 2025
[1/6] Our #CVPR2025 paper “DiffusionSfM” extends our RayDiffusion framework to infer both geometry and cameras by diffusing pixel-wise ray origins and endpoints.
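The post describes representing each pixel by a ray origin and endpoint. As a hedged illustration of what such a pair can look like, the sketch below computes it from a known camera and depth map, taking the camera center as the origin and the unprojected 3D point as the endpoint (one natural choice, assumed here). It further assumes a zero-skew pinhole camera, world-to-camera extrinsics x_cam = R x_world + t, and a camera-frame z-depth per pixel; the helper names are hypothetical and this is not the paper's code.

```python
def transpose_times(R, v):
    """Return R^T v for a 3x3 matrix R (nested lists) and a 3-vector v."""
    return [sum(R[i][j] * v[i] for i in range(3)) for j in range(3)]


def ray_origins_and_endpoints(K, R, t, depth):
    """Hypothetical helper: per-pixel (origin, endpoint) pairs in world coordinates.

    Assumes a zero-skew pinhole camera K, world-to-camera extrinsics
    x_cam = R @ x_world + t, and depth[v][u] holding the camera-frame z-depth
    observed at pixel center (u + 0.5, v + 0.5).
    """
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    origin = tuple(-x for x in transpose_times(R, t))  # camera center -R^T t
    pairs = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            # Unproject the pixel to a 3D point at depth z in the camera frame.
            p_cam = [z * (u + 0.5 - cx) / fx, z * (v + 0.5 - cy) / fy, z]
            # Camera -> world: x_world = R^T (x_cam - t).
            endpoint = tuple(transpose_times(R, [p_cam[i] - t[i] for i in range(3)]))
            pairs.append((origin, endpoint))
    return pairs
```

In this parameterization, endpoint minus origin points along the pixel's viewing ray, so the camera (shared origin, ray directions) and the scene geometry (endpoints) are carried by the same per-pixel quantities.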
Shubham Tulsiani
@shubhtuls
Dec 9, 2024
[1/5] Recovering 3D from sparse-view input in the wild requires solving a chicken-and-egg problem between pose estimation and 3D reconstruction. Our NeurIPS paper, SparseAGS, presents a method to jointly solve these for high-fidelity 3D estimation.
Shubham Tulsiani
@shubhtuls
Apr 11, 2023
[1/3] Excited to share our #CVPR23 paper with @zhizdev on 3D object reconstruction from as few as 2 views! Please see our website for results over 50 categories: sparsefusion.github.io
Shubham Tulsiani
@shubhtuls
Mar 31, 2022
[1/5] 3D generation tasks are inherently multimodal: generating a full shape from a partial observation, 3D from text, or even single-view prediction. Our CVPR paper shows that a common prior over the space of shapes enables multimodal prediction across these different tasks.
Shubham Tulsiani
@shubhtuls
Dec 7, 2024
[1/5] Diffusion models can now generate images in a flash. Can we similarly have ultra-fast 3D generation? We present Turbo3D — a generative 3D model for high-quality text-to-3D generation in 0.35s!
Shubham Tulsiani
@shubhtuls
Jul 20, 2021
(1/6) Excited to share our paper “PixelTransformer”, which was presented at ICML today. It proposes a simple and unified framework for generating dense spatial signals (e.g., images, shapes, polynomials) given just a few samples.
Shubham Tulsiani
@shubhtuls
Apr 26, 2024
[1/7] We humans use our hands to interact with myriad objects around us. Our upcoming CVPR paper G-HOP (judyye.github.io/ghop-www/) learns a 3D generative model for such interactions, and can synthesize both the hand and the object in 3D given a category label.
Shubham Tulsiani
@shubhtuls
Apr 8, 2022
[1/6] An elusive goal in single-view 3D prediction has been to scale beyond a handful of object categories. Our upcoming CVPR paper presents an extremely simple approach towards this, and allows learning a unified reconstruction model over 150 object categories.
Shubham Tulsiani
@shubhtuls
Apr 19, 2025
Excited to share this dataset of registered aerial and ground images with dense geometry and correspondence supervision. Please see Khiem’s thread for some cool applications this enables!
Khiem Vuong
@kvuongdev
Apr 18, 2025
[1/6] Recent models like DUSt3R generalize well across viewpoints, but performance drops on aerial-ground pairs. At #CVPR2025, we propose AerialMegaDepth (aerial-megadepth.github.io), a hybrid dataset combining mesh renderings with real ground images (MegaDepth) to bridge this gap.
Shubham Tulsiani
@shubhtuls
Dec 12, 2023
[1/3] Ever wanted to obtain 3D from just a couple of images? We present UpFusion, a system for 3D object reconstruction given a sparse set of unposed input images. Work led by @bharathrajn98, in collaboration with @hyjameslee, @SergeyTulyakov.