GFlow: Recovering 4D World from Monocular Video


28 May 2024 | Shizun Wang, Xingyi Yang, Qiuqiong Shen, Zhenxiang Jiang, Xinchao Wang*
This paper addresses the challenging task of reconstructing a 4D world (3D + time) from a single monocular video without any camera parameters. The proposed framework, GFlow, leverages 2D priors, namely depth and optical flow, to lift a video into an explicit 4D representation built on 3D Gaussian Splatting (3DGS). GFlow clusters the scene into still and moving parts, then alternately optimizes camera poses and the dynamics of the 3D Gaussian points based on these priors, ensuring fidelity to each frame and smooth transitions across frames. A pixel-wise densification strategy integrates new visual content as it enters the scene. The resulting representation enables a range of applications, including tracking, segmentation, novel view synthesis, and editing, demonstrating its versatility in video analysis and manipulation. Experimental results on the DAVIS and Tanks and Temples datasets show that GFlow outperforms existing methods in reconstruction quality, object segmentation, and camera pose estimation.
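The alternating scheme described above can be illustrated with a minimal toy sketch. This is not the paper's implementation: it replaces 3D Gaussians with plain 2D points, camera-pose optimization with a rigid translation fitted from the still points, and the photometric/flow losses with a direct per-point target. The threshold, learning rate, and helper names are all illustrative assumptions.

```python
import numpy as np

def cluster_still_moving(flow_mag, threshold=0.5):
    """Split points into still/moving by optical-flow magnitude
    (a simplified stand-in for GFlow's scene clustering)."""
    moving = flow_mag > threshold
    return ~moving, moving

def optimize_frame(points, flow_mag, target, lr=0.1, steps=50):
    """Toy alternating per-frame optimization:
    1) fit a camera-pose proxy (rigid translation) from the still points,
    2) update only the moving points toward their per-point targets."""
    still, moving = cluster_still_moving(flow_mag)
    # Step 1: camera-pose proxy -- mean translation aligning still points.
    t = (target[still] - points[still]).mean(axis=0)
    points = points + t
    # Step 2: gradient-style updates applied to moving points only,
    # so the static scene stays fixed while dynamics are refined.
    for _ in range(steps):
        grad = points[moving] - target[moving]
        points[moving] -= lr * grad
    return points, t
```

In the real method the two stages operate on full 3DGS parameters (positions, covariances, opacities, colors) with depth- and flow-based losses, but the control flow is the same: camera first from the still cluster, then dynamics for the moving cluster.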