Shape of Motion: 4D Reconstruction from a Single Video

18 Jul 2024 | Qianqian Wang, Vickie Ye, Hang Gao, Jake Austin, Zhengqi Li, and Angjoo Kanazawa
This paper introduces a method for reconstructing 4D scenes from a single monocular video, enabling long-range 3D motion tracking and novel view synthesis. Dynamic scenes are represented as a set of 3D Gaussians that translate and rotate over time, with each Gaussian's motion expressed as a combination of a compact set of shared SE(3) motion bases. The approach leverages data-driven priors, including monocular depth maps and long-range 2D tracks, to consolidate these noisy supervisory signals into a globally consistent representation of the dynamic scene.

The key contributions are a new dynamic scene representation that enables real-time novel view synthesis and globally consistent 3D tracking for any point at any time, and a carefully designed framework that optimizes this representation on posed monocular video by combining physical motion priors with data-driven priors.

The method is evaluated on both synthetic and real-world dynamic video datasets, including the Kubric dataset, and shows significant improvements over prior methods in long-range 2D/3D tracking and novel view synthesis. It handles complex scenes with multiple moving objects, produces dense, full-length 3D tracks, and outperforms existing approaches in both 3D tracking accuracy and novel view synthesis quality, achieving state-of-the-art results under challenging scene motion.
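To make the motion-basis idea more concrete, the sketch below shows one way per-Gaussian motion could be computed as a convex combination of a small set of shared, time-varying SE(3) bases: each Gaussian keeps fixed weights over the bases, and its position at frame t is obtained by blending the bases' rotations and translations. This is a minimal illustration, not the authors' code; the tensor names, the softmax weighting, and the quaternion-blending scheme are assumptions.

```python
# Illustrative sketch of shared SE(3) motion bases for dynamic 3D Gaussians.
# Assumptions: softmax per-Gaussian weights, linear translation blending,
# and normalized weighted-sum quaternion blending.
import torch
import torch.nn.functional as F

num_bases, num_frames, num_gaussians = 20, 100, 50_000

# Shared motion bases: one rotation (quaternion, wxyz) and translation per basis per frame.
basis_quats = torch.randn(num_frames, num_bases, 4)   # (T, B, 4)
basis_trans = torch.zeros(num_frames, num_bases, 3)   # (T, B, 3)

# Per-Gaussian logits over the bases (fixed across time, optimized with the scene).
weight_logits = torch.randn(num_gaussians, num_bases)  # (N, B)

# Canonical (reference-frame) Gaussian means.
means_canonical = torch.randn(num_gaussians, 3)        # (N, 3)


def quat_to_rotmat(q: torch.Tensor) -> torch.Tensor:
    """Convert unit quaternions (..., 4) in (w, x, y, z) order to rotation matrices (..., 3, 3)."""
    w, x, y, z = q.unbind(-1)
    return torch.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y),
        2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
        2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y),
    ], dim=-1).reshape(*q.shape[:-1], 3, 3)


def means_at_time(t: int) -> torch.Tensor:
    """Move canonical Gaussian means to frame t by blending the shared SE(3) bases."""
    w = torch.softmax(weight_logits, dim=-1)            # (N, B), convex weights per Gaussian
    trans = w @ basis_trans[t]                          # (N, 3) blended translation
    quats = F.normalize(w @ basis_quats[t], dim=-1)     # (N, 4) blended, re-normalized rotation
    R = quat_to_rotmat(quats)                           # (N, 3, 3)
    return torch.einsum("nij,nj->ni", R, means_canonical) + trans
```

Because the bases are shared across all Gaussians, evaluating a point's full trajectory only requires re-reading the per-frame bases, which is what makes dense, full-length 3D tracks cheap to query once the representation is optimized.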