18 Jul 2024 | Qianqian Wang, Vickie Ye, Hang Gao, Jake Austin, Zhengqi Li, Angjoo Kanazawa
The paper "Shape of Motion: 4D Reconstruction from a Single Video" addresses monocular dynamic reconstruction: recovering the geometry and 3D motion of a complex dynamic scene from a single video. The authors propose a method that explicitly models full-sequence-long 3D motion, leveraging two key insights: 3D motion has low-dimensional structure, and data-driven priors, while individually noisy, are complementary. They represent the dynamic scene as a set of persistent 3D Gaussians, with each point's motion expressed as a linear combination of a compact set of shared SE(3) motion bases. This representation enables joint long-range 3D tracking and novel view synthesis. Evaluated on both synthetic and real-world datasets, the method outperforms existing approaches in long-range 3D/2D motion estimation and novel view synthesis. The key contributions are a new dynamic scene representation and a framework that optimizes it using physical and data-driven priors.
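The core idea of the representation, expressing each point's motion as a weighted combination of a small number of shared SE(3) motion bases, can be sketched as a simple linear-blend computation. The snippet below is a minimal illustration, not the paper's implementation: the function names (`make_se3`, `blend_points`) and the direct blending of transformed points are assumptions made for clarity.

```python
import numpy as np

def make_se3(axis_angle, translation):
    """Build a 4x4 SE(3) matrix from an axis-angle rotation and a translation,
    using Rodrigues' formula for the rotation part."""
    theta = np.linalg.norm(axis_angle)
    if theta < 1e-8:
        R = np.eye(3)
    else:
        k = axis_angle / theta
        K = np.array([[0.0, -k[2], k[1]],
                      [k[2], 0.0, -k[0]],
                      [-k[1], k[0], 0.0]])
        R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = translation
    return T

def blend_points(points, bases, weights):
    """Move each point by a weighted combination of shared SE(3) basis
    transforms (linear blend skinning).
    points:  (N, 3) canonical 3D positions
    bases:   (B, 4, 4) SE(3) motion bases for one target time step
    weights: (N, B) per-point blending weights, each row summing to 1
    """
    homo = np.hstack([points, np.ones((len(points), 1))])      # (N, 4)
    # For each point n: sum_b w[n, b] * (bases[b] @ homo[n])
    moved = np.einsum('nb,bij,nj->ni', weights, bases, homo)   # (N, 4)
    return moved[:, :3]
```

Because the basis transforms are shared across all points and only the per-point weights vary, the motion of the entire scene at every time step is captured by a small number of trajectories of SE(3) matrices, which is what makes full-sequence-long motion tractable to optimize.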