MoDGs: Dynamic Gaussian Splatting from Casually-Captured Monocular Videos with Depth Priors


17 May 2025 | Qingming Liu1,5*, Yuan Liu2*, Jiepeng Wang3, Xianqiang Lyu1, Peng Wang3, Wenping Wang4, Junhui Hou1†
MoDGS is a novel method for rendering novel views of dynamic scenes from casually captured monocular videos. Such videos, shot with static or slowly moving cameras, are difficult for existing methods because they provide only weak multi-view consistency. MoDGS leverages recent single-view depth estimation techniques to guide the learning of the dynamic scene: a 3D-aware initialization produces a reasonable deformation field, and a new robust depth loss supervises the dynamic scene geometry. To handle the scale inconsistency of estimated depth values across frames, MoDGS further introduces an ordinal depth loss that encourages consistent depth ordering rather than absolute depth values.

Comprehensive experiments show that MoDGS outperforms state-of-the-art methods in rendering high-quality novel-view images from casually captured monocular videos. It is evaluated on three widely used datasets (Nvidia, DyNeRF, and DAVIS) as well as a self-collected in-the-wild dataset, achieving superior PSNR, SSIM, and LPIPS scores. The key innovations, the 3D-aware initialization and the ordinal depth loss, enable accurate reconstruction of dynamic scenes from monocular videos with minimal camera movement.
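The summary above does not give the exact form of the ordinal depth loss, but the idea of supervising depth ordering rather than absolute depth can be illustrated with a short sketch. The following PyTorch snippet is an assumption-laden example, not the paper's implementation: the function name, the random pair sampling, and the hinge-style penalty are all illustrative choices. The key property it demonstrates is that only the relative ordering of the single-view depth estimate is used, so a per-frame scale or shift in the estimated depth does not corrupt the supervision signal.

```python
import torch

def ordinal_depth_loss(rendered_depth, estimated_depth, num_pairs=4096, margin=1e-4):
    """Illustrative pairwise ordinal depth loss (hypothetical sketch).

    Penalizes sampled pixel pairs whose depth ordering in the rendered depth
    map disagrees with the ordering in the (scale-inconsistent) single-view
    depth estimate. Only relative order of the estimate is used, so a
    per-frame scale/shift on the estimated depth leaves the target unchanged.
    """
    # Flatten both depth maps and sample random pixel pairs.
    d_render = rendered_depth.reshape(-1)
    d_est = estimated_depth.reshape(-1)
    n = d_render.numel()
    idx_a = torch.randint(0, n, (num_pairs,), device=d_render.device)
    idx_b = torch.randint(0, n, (num_pairs,), device=d_render.device)

    # Target ordering from the monocular estimate: +1 if pixel a is farther
    # than pixel b, -1 otherwise; near-equal pairs are skipped.
    diff_est = d_est[idx_a] - d_est[idx_b]
    valid = diff_est.abs() > margin
    target = torch.sign(diff_est[valid])

    # Hinge-style penalty whenever the rendered depth violates that ordering.
    diff_render = d_render[idx_a[valid]] - d_render[idx_b[valid]]
    return torch.relu(-target * diff_render).mean()
```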