Linyi Jin

I am a Research Scientist at Google DeepMind in New York.

I received my Ph.D. from the University of Michigan, advised by David Fouhey, where my research focused on 3D from casual visual data. Before that, I obtained my Master’s degree in Robotics from the University of Michigan. I hold dual B.S.E. degrees in Computer Science from the University of Michigan and Mechanical Engineering from Shanghai Jiao Tong University.

I interned at Adobe Research in the summers of 2021 and 2025, and at Google DeepMind in the summer of 2024.

Email  /  CV  /  Google Scholar  /  LinkedIn

profile photo

News

- [2025/12] I defended my PhD!
- [2025/06] MegaSaM received a Best Paper Honorable Mention award at CVPR 2025!
- [2025/05] I will join Adobe Research as a Research Scientist Intern, working with Zhengqi Li.
- [2025/04] Stereo4D is selected as Oral and MegaSaM is selected as a Best Paper Award Candidate at CVPR 2025!
- [2025/03] We've released the Stereo4D dataset and the MegaSaM code.
- [2025/02] Stereo4D and MegaSaM are accepted to CVPR 2025!


Publications

Image
Eye2Eye: A simple approach for monocular-to-stereo video synthesis
Michal Geyer, Omer Tov, Linyi Jin, Richard Tucker, Inbar Mosseri, Tali Dekel, Noah Snavely
arXiv 2025
project page / arXiv / bibtex

We use video models to convert monocular videos into stereo videos that can be viewed with 3D glasses or a VR headset. Our approach handles challenging scenes with specular and semi-transparent objects.

Image
Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos
Linyi Jin, Richard Tucker, Zhengqi Li, David Fouhey, Noah Snavely*, Aleksander Hołyński*
CVPR 2025 (Oral -- 3.3% of the accepted papers)
project page / arXiv / bibtex / code and dataset

We use stereo videos from the internet to create a dataset of over 100,000 real-world 4D scenes with metric scale and long-term 3D motion trajectories.

Image
MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos
Zhengqi Li, Richard Tucker, Forrester Cole, Qianqian Wang, Linyi Jin, Vickie Ye, Angjoo Kanazawa, Aleksander Hołyński, Noah Snavely
CVPR 2025 (Oral, Best Paper Honorable Mention)
project page / arXiv / bibtex / code

MegaSaM estimates cameras and dense structure, quickly and accurately, from any static or dynamic video.

Image
FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation
Chris Rockwell, Nilesh Kulkarni, Linyi Jin, JJ Park, Justin Johnson, David Fouhey
CVPR 2024 (Highlight -- 11.9% acceptance rate)
project page / arXiv / code / bibtex

Our flexible method produces accurate and robust pose estimates by combining the complementary strengths of correspondence-plus-solver and learning-based methods.

Image
3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surface
Linyi Jin, Nilesh Kulkarni, David Fouhey
CVPR 2024
project page / arXiv / code / bibtex

Our system performs scene-level 3D reconstruction from posed images, working with as few as one view, and recovers the complete geometry of unseen scenes, including hidden surfaces.

Image
Perspective Fields for Single Image Camera Calibration
Linyi Jin, Jianming Zhang, Yannick Hold-Geoffroy, Oliver Wang, Kevin Matzen, Matthew Sticha, David Fouhey
CVPR 2023 (Highlight -- 2.5% acceptance rate)
project page / demo / arXiv / code / bibtex

A novel image-space representation of camera perspective that enables precise calibration for in-the-wild and cropped images.

Image
Learning to Predict Scene-Level Implicit 3D from Posed RGBD Data
Nilesh Kulkarni, Linyi Jin, Justin Johnson, David Fouhey
CVPR 2023
project page / arXiv / code / bibtex

We learn a 3D implicit function from a single input image. Unlike other methods, D2-DRDF does not depend on mesh supervision during training and can operate directly on raw RGB-D scene captures.

Image
PlaneFormers: From Sparse View Planes to 3D Reconstruction
Samir Agarwala, Linyi Jin, Chris Rockwell, David Fouhey
ECCV 2022
project page / arXiv / code / bibtex

We introduce a simpler approach that uses a transformer applied to 3D-aware plane tokens to perform 3D reasoning. This is substantially more effective than SparsePlanes.

Image
Understanding 3D Object Articulation in Internet Videos
Shengyi Qian, Linyi Jin, Chris Rockwell, Siyi Chen, David Fouhey
CVPR 2022
project page / arXiv / code / bibtex

We detect and characterize the 3D planar articulation of objects in ordinary internet videos.

Image
SparsePlanes: Planar Surface Reconstruction from Sparse Views
Linyi Jin, Shengyi Qian, Andrew Owens, David Fouhey
ICCV 2021 (Oral -- 3% acceptance rate)
project page / arXiv / code / bibtex

We learn to reconstruct scenes from sparse views with unknown relative poses, exploiting planar regions and their geometric properties to recover the scene layout.

Image
Associative3D: Volumetric Reconstruction from Sparse Views
Shengyi Qian*, Linyi Jin*, David Fouhey
ECCV 2020
project page / arXiv / code / bibtex

We build a voxel-based reconstruction of a scene from two views, even without access to the relative camera poses.

Invited presentation at the ECCV 2020 Workshop on Holistic Scene Structures for 3D Vision.

Image
Inferring Occluded Geometry Improves Performance when Retrieving an Object from Dense Clutter
Andrew Price*, Linyi Jin*, Dmitry Berenson
ISRR 2019
project page / arXiv / bibtex

We augment a manipulation planner for cluttered environments with a shape completion network and a volumetric memory system, allowing the robot to reason about what may be contained in occluded areas.


Teaching

Image EECS 442 Computer Vision (Winter '19)
Instructional Aide (IA) with David Fouhey.


This website uses a template from Jon Barron.