Linyi Jin

I am a Research Scientist at Google DeepMind in New York.

I received my Ph.D. from the University of Michigan, advised by David Fouhey, where my research focused on 3D from casual visual data. Before that, I obtained my Master’s degree in Robotics from the University of Michigan. I hold dual B.S.E. degrees in Computer Science from the University of Michigan and Mechanical Engineering from Shanghai Jiao Tong University.

I interned at Adobe Research in the summers of 2021 and 2025 and at Google DeepMind in the summer of 2024.

Email / CV / Google Scholar / LinkedIn

News

- [2025/12] I defended my Ph.D.!
- [2025/06] MegaSaM received a Best Paper Honorable Mention at CVPR 2025!
- [2025/05] I will join Adobe Research as a Research Scientist Intern, working with Zhengqi Li.
- [2025/04] Stereo4D is selected as an Oral and MegaSaM is selected as a Best Paper Award Candidate at CVPR 2025!
- [2025/03] We've released the Stereo4D dataset and the MegaSaM code.
- [2025/02] Stereo4D and MegaSaM are accepted to CVPR 2025!

Publications

	Eye2Eye: A simple approach for monocular-to-stereo video synthesis
	Michal Geyer, Omer Tov, Linyi Jin, Richard Tucker, Inbar Mosseri, Tali Dekel, Noah Snavely
	arXiv 2025
	project page / arXiv / bibtex
	We use video models to convert monocular videos into stereo videos that can be viewed with 3D glasses or VR headsets. The approach handles challenging scenes with specular and semi-transparent objects.

	Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos
	Linyi Jin, Richard Tucker, Zhengqi Li, David Fouhey, Noah Snavely, Aleksander Hołyński
	CVPR 2025 (Oral -- 3.3% of the accepted papers)
	project page / arXiv / bibtex / code and dataset
	We use stereo videos from the internet to create a dataset of over 100,000 real-world 4D scenes with metric scale and long-term 3D motion trajectories.

	MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos
	Zhengqi Li, Richard Tucker, Forrester Cole, Qianqian Wang, Linyi Jin, Vickie Ye, Angjoo Kanazawa, Aleksander Hołyński, Noah Snavely
	CVPR 2025 (Oral, Best Paper Honorable Mention)
	project page / arXiv / bibtex / code
	MegaSaM estimates cameras and dense structure, quickly and accurately, from any static or dynamic video.

	FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation
	Chris Rockwell, Nilesh Kulkarni, Linyi Jin, JJ Park, Justin Johnson, David Fouhey
	CVPR 2024 (Highlight -- 11.9% accept rate)
	project page / arXiv / code / bibtex
	Our flexible method produces accurate and robust pose estimates by combining the complementary strengths of Correspondence + Solver and Learning-Based methods.

	3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surfaces
	Linyi Jin, Nilesh Kulkarni, David Fouhey
	CVPR 2024
	project page / arXiv / code / bibtex
	Our system for scene-level 3D reconstruction from posed images works with as few as one view and reconstructs the complete geometry of unseen scenes, including hidden surfaces.

	Perspective Fields for Single Image Camera Calibration
	Linyi Jin, Jianming Zhang, Yannick Hold-Geoffroy, Oliver Wang, Kevin Matzen, Matthew Sticha, David Fouhey
	CVPR 2023 (Highlight -- 2.5% accept rate)
	project page / demo / arXiv / code / bibtex
	A novel image-space representation for camera perspective that enables precise calibration of in-the-wild and cropped images.

	Learning to Predict Scene-Level Implicit 3D from Posed RGBD Data
	Nilesh Kulkarni, Linyi Jin, Justin Johnson, David Fouhey
	CVPR 2023
	project page / arXiv / code / bibtex
	We learn a 3D implicit function from a single input image. Unlike other methods, D2-DRDF does not depend on mesh supervision during training and can operate directly on raw RGB-D data obtained from scene captures.

	PlaneFormers: From Sparse View Planes to 3D Reconstruction
	Samir Agarwala, Linyi Jin, Chris Rockwell, David Fouhey
	ECCV 2022
	project page / arXiv / code / bibtex
	We introduce a simpler approach that applies a transformer to 3D-aware plane tokens to perform 3D reasoning. It is substantially more effective than SparsePlanes.

	Understanding 3D Object Articulation in Internet Videos
	Shengyi Qian, Linyi Jin, Chris Rockwell, Siyi Chen, David Fouhey
	CVPR 2022
	project page / arXiv / code / bibtex
	We investigate detecting and characterizing the 3D planar articulation of objects from ordinary videos.

	SparsePlanes: Planar Surface Reconstruction from Sparse Views
	Linyi Jin, Shengyi Qian, Andrew Owens, David Fouhey
	ICCV 2021 (Oral -- 3% acceptance rate)
	project page / arXiv / code / bibtex
	We learn to reconstruct scenes from sparse views with an unknown relationship. We take advantage of planar regions and their geometric properties to recover the scene layout.

	Associative3D: Volumetric Reconstruction from Sparse Views
	Shengyi Qian, Linyi Jin*, David Fouhey
	ECCV 2020
	project page / arXiv / code / bibtex
	We build a voxel-based reconstruction from two views of a scene, even without access to the relative camera positions. Invited presentation at the ECCV 2020 Workshop Holistic Scene Structures for 3D Vision.

	Inferring Occluded Geometry Improves Performance when Retrieving an Object from Dense Clutter
	Andrew Price, Linyi Jin*, Dmitry Berenson
	ISRR 2019
	project page / arXiv / bibtex
	We augment a manipulation planner for cluttered environments with a shape completion network and a volumetric memory system, allowing the robot to reason about what may be contained in occluded areas.

Teaching

EECS 442 Computer Vision (Winter '19)
IA with David Fouhey.

This website uses a template from Jon Barron.