Linyi Jin

I am a fourth-year Ph.D. student at the University of Michigan, advised by Prof. David Fouhey. I work on computer vision.

My research is related to 3D scene understanding, camera calibration, and robotics. I was previously a Master's student in Robotics. Before that, I received B.S.E. degrees in Computer Science from UM and in Mechanical Engineering from Shanghai Jiao Tong University through the dual-degree program at the UM-SJTU Joint Institute.

Email / CV / Google Scholar / Github

News

- [2025/06] MegaSaM gets Best Paper Honorable Mention award!
- [2025/05] I will join Adobe Research as a Research Scientist Intern, working with Zhengqi Li and Eli Shechtman.
- [2025/04] Stereo4D is selected as Oral and MegaSaM is selected as a Best Paper Award Candidate at CVPR 2025!
- [2025/03] We've released the Stereo4D dataset and the MegaSaM code.
- [2025/02] Stereo4D and MegaSaM are accepted to CVPR 2025!
- [2024/02] Two papers accepted to CVPR 2024!
- [2024/02] I will join Google as a Student Researcher, working with Noah Snavely and Aleksander Hołyński.

Work Experience

	Adobe Research Research Scientist Intern Summer, 2025 Host: Zhengqi Li and Eli Shechtman
	Google DeepMind Student Researcher 2024.05 - 2025.04 Host: Noah Snavely and Aleksander Hołyński Collaborators: Richard Tucker, Zhengqi Li
	Adobe Research Computer Vision Research Intern Summer, 2021 Host: Jianming Zhang Collaborators: Yannick Hold-Geoffroy, Oliver Wang, Kevin Matzen

Publications

	Eye2Eye: A simple approach for monocular-to-stereo video synthesis Michal Geyer, Omer Tov, Linyi Jin, Richard Tucker, Inbar Mosseri, Tali Dekel, Noah Snavely arXiv 2025 project page / arXiv / bibtex We use video models to convert monocular videos into stereo videos that can be viewed with 3D glasses or VR headsets. It handles challenging scenes with specular and semi-transparent objects.
	Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos Linyi Jin, Richard Tucker, Zhengqi Li, David Fouhey, Noah Snavely, Aleksander Hołyński CVPR 2025 (Oral -- 3.3% of the accepted papers) project page / arXiv / bibtex / code and dataset Use stereo videos from the internet to create a dataset of over 100,000 real-world 4D scenes with metric scale and long-term 3D motion trajectories.
	MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos Zhengqi Li, Richard Tucker, Forrester Cole, Qianqian Wang, Linyi Jin, Vickie Ye, Angjoo Kanazawa, Aleksander Hołyński, Noah Snavely CVPR 2025 (Oral, Best Paper Honorable Mention) project page / arXiv / bibtex / code MegaSaM estimates cameras and dense structure, quickly and accurately, from any static or dynamic video.
	FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation. Chris Rockwell, Nilesh Kulkarni, Linyi Jin, JJ Park, Justin Johnson, David Fouhey CVPR, 2024 (Highlight -- 11.9% accept rate) project page / arXiv / code / bibtex Our flexible method produces accurate and robust pose estimates using complementary strengths of Correspondence + Solver and Learning-Based methods.
	3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surface. Linyi Jin, Nilesh Kulkarni, David Fouhey CVPR, 2024 project page / arXiv / code / bibtex Our new system for scene-level 3D reconstruction from posed images, which works with as few as one view, reconstructs the complete geometry of unseen scenes, including hidden surfaces.
	Perspective Fields for Single Image Camera Calibration. Linyi Jin, Jianming Zhang, Yannick Hold-Geoffroy, Oliver Wang, Kevin Matzen, Matthew Sticha, David Fouhey CVPR, 2023 (Highlight -- 2.5% accept rate) project page / demo / arXiv / code / bibtex A novel image space representation for camera perspectives, facilitating precise calibration in in-the-wild environments and cropped images.
	Learning to Predict Scene-Level Implicit 3D from Posed RGBD Data. Nilesh Kulkarni, Linyi Jin, Justin Johnson, David Fouhey CVPR, 2023 project page / arXiv / code / bibtex Learning a 3D implicit function from a single input image. Unlike other methods, D2-DRDF does not depend on mesh supervision during training and can directly operate on raw RGB-D data obtained from scene captures.
	PlaneFormers: From Sparse View Planes to 3D Reconstruction. Samir Agarwala, Linyi Jin, Chris Rockwell, David Fouhey ECCV, 2022 project page / arXiv / code / bibtex We introduce a simpler approach that uses a transformer applied to 3D-aware plane tokens to perform 3D reasoning. This is substantially more effective than SparsePlanes.
	Understanding 3D Object Articulation in Internet Videos. Shengyi Qian, Linyi Jin, Chris Rockwell, Siyi Chen, David Fouhey CVPR, 2022 project page / arXiv / code / bibtex We propose to investigate detecting and characterizing the 3D planar articulation of objects from ordinary videos.
	SparsePlanes: Planar Surface Reconstruction from Sparse Views. Linyi Jin, Shengyi Qian, Andrew Owens, David Fouhey ICCV, 2021 (Oral -- 3% acceptance rate) project page / arXiv / code / bibtex We learn to reconstruct scenes from sparse views with an unknown relationship. We take advantage of planar regions and their geometric properties to recover the scene layout.
	Associative3D: Volumetric Reconstruction from Sparse Views. Shengyi Qian, Linyi Jin*, David Fouhey ECCV*, 2020 project page / arXiv / code / bibtex We can build a voxel-based reconstruction of images from two views, even without access to the relative camera positions. Invited presentation at ECCV 2020 Workshop Holistic Scene Structures for 3D Vision.
	Inferring Occluded Geometry Improves Performance when Retrieving an Object from Dense Clutter. Andrew Price, Linyi Jin*, Dmitry Berenson ISRR*, 2019 project page / arXiv / bibtex We augment a manipulation planner for cluttered environments with a shape completion network and a volumetric memory system, allowing the robot to reason about what may be contained in occluded areas.

Teaching

EECS 442 Computer Vision (Winter '19)
IA with David Fouhey.

This website uses a template from Jon Barron.