MoD-SLAM: Monocular Dense Mapping for Unbounded 3D Scene Reconstruction

8 Mar 2024 | Heng Zhou, Zhetao Guo, Shuhong Liu, Lechen Zhang, Qihao Wang, Yuxiang Ren, and Mingrui Li
MoD-SLAM is a monocular NeRF-based dense mapping method that enables real-time 3D reconstruction of unbounded scenes. The method introduces a Gaussian-based unbounded scene representation to address the challenge of mapping scenes without boundaries, and a depth estimation module that extracts accurate prior depth values to supervise both the mapping and tracking processes. A robust depth loss term is introduced into the tracking process to improve pose estimation in large-scale scenes. Experiments on two standard datasets show that MoD-SLAM achieves competitive performance, improving the accuracy of 3D reconstruction and localization by up to 30% and 15% respectively compared with existing state-of-the-art monocular SLAM systems.

MoD-SLAM incorporates a depth-supervised camera tracking method to improve camera pose estimation in monocular SLAM, and by combining loop closure detection with global optimization it demonstrates state-of-the-art performance in both mapping and tracking metrics. A spherical contraction function handles unbounded large scenes, while Gaussian encoding captures more precise geometric structure and visual appearance information. A monocular depth estimation module and a depth distillation module extract accurate depth values and constrain the scale of the currently observed scene. A NeRF-based volume rendering method uses both color and depth values for network training.

The system is evaluated on synthetic and real-world datasets, including Replica and ScanNet, showing enhanced geometric structure and texture features in reconstructed scenes. The results demonstrate that MoD-SLAM achieves superior performance in both localization and reconstruction with low time and GPU memory consumption compared to state-of-the-art SLAM systems.
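To illustrate how a spherical contraction can bound an unbounded scene, the following is a minimal sketch assuming a mip-NeRF 360-style contraction; the summary does not give MoD-SLAM's exact formula, so this particular form (identity inside the unit sphere, far points squeezed into a radius-2 shell) is an assumption.

```python
import numpy as np

def spherical_contraction(x, eps=1e-8):
    """Contract unbounded 3D points into a bounded ball.

    Assumed mip-NeRF 360-style contraction (not necessarily MoD-SLAM's exact
    function): points inside the unit sphere are unchanged, points outside are
    mapped onto the shell between radius 1 and 2, so the whole scene fits in a
    ball of radius 2 that a voxel/feature grid can cover.
    """
    x = np.asarray(x, dtype=np.float64)
    norm = np.linalg.norm(x, axis=-1, keepdims=True)                 # ||x|| per point
    safe_norm = np.maximum(norm, eps)
    contracted = (2.0 - 1.0 / safe_norm) * (x / safe_norm)           # squeeze far points
    return np.where(norm <= 1.0, x, contracted)

# Example: a nearby point is kept as-is, a far-away point lands near radius 2.
print(spherical_contraction([[0.3, 0.0, 0.0], [100.0, 0.0, 0.0]]))
```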
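The volume rendering step that trains the network on both color and depth can be sketched as below. This is a generic NeRF quadrature and a hypothetical combined loss with an assumed weight lambda_d; the summary does not specify MoD-SLAM's exact weighting or robust kernel.

```python
import torch

def render_color_and_depth(sigmas, rgbs, z_vals):
    """Render per-ray color and expected depth from sampled densities.

    Standard NeRF volume rendering, assuming N samples per ray:
    sigmas (R, N), rgbs (R, N, 3), z_vals (R, N) are the network outputs and
    sample depths along each ray. MoD-SLAM's exact formulation may differ.
    """
    deltas = torch.diff(z_vals, dim=-1)
    deltas = torch.cat([deltas, torch.full_like(deltas[..., :1], 1e10)], dim=-1)
    alphas = 1.0 - torch.exp(-sigmas * deltas)                      # per-sample opacity
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alphas[..., :1]), 1.0 - alphas + 1e-10], dim=-1), dim=-1)[..., :-1]
    weights = alphas * trans                                        # contribution of each sample
    color = (weights[..., None] * rgbs).sum(dim=-2)                 # (R, 3) rendered color
    depth = (weights * z_vals).sum(dim=-1)                          # (R,) expected depth
    return color, depth

def mapping_loss(color, depth, gt_color, prior_depth, lambda_d=0.1):
    """Hypothetical photometric + depth loss; lambda_d and the L1 form are
    illustrative assumptions, with prior_depth coming from the depth
    estimation module described above."""
    color_loss = torch.abs(color - gt_color).mean()
    depth_loss = torch.abs(depth - prior_depth).mean()
    return color_loss + lambda_d * depth_loss
```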