29 May 2024 | Siting Zhu, Renjie Qin, Guangming Wang, Jiuming Liu, Hesheng Wang
**SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM**
**Authors:** Siting Zhu, Renjie Qin, Guangming Wang, Jiuming Liu, Hesheng Wang
**Abstract:**
SemGauss-SLAM is a novel dense semantic SLAM system that leverages a 3D Gaussian representation to achieve accurate 3D semantic mapping, robust camera tracking, and high-quality rendering. The system embeds a semantic feature into each 3D Gaussian, enabling precise semantic scene representation, and introduces a feature-level loss to guide 3D Gaussian optimization. To reduce cumulative drift in tracking and improve semantic reconstruction accuracy, a semantic-informed bundle adjustment exploits multi-frame semantic associations to jointly optimize the 3D Gaussians and camera poses. Extensive evaluations on the Replica and ScanNet datasets demonstrate superior performance in mapping, tracking, semantic segmentation, and novel view synthesis compared to existing radiance field-based SLAM methods.
**Introduction:**
Dense semantic SLAM is crucial for robotic systems and autonomous driving, as it integrates semantic understanding into dense map reconstruction. Traditional methods have limitations, such as an inability to predict unobserved areas. NeRF-based approaches address these issues but suffer from inefficient rendering and low-quality novel view semantic representation. 3D Gaussian Splatting (3DGS) has shown promise in scene representation and efficient rendering, but existing 3DGS SLAM systems focus solely on visual mapping and lack the semantic information needed for downstream tasks. This paper introduces SemGauss-SLAM, which integrates semantic feature embedding into 3D Gaussians for more comprehensive scene understanding and high-precision dense semantic SLAM.
**Method:**
SemGauss-SLAM represents the scene as a set of isotropic 3D Gaussians, each carrying a position, scale, opacity, and color, plus a semantic feature embedding that enables a compact and efficient semantic representation. RGB and semantic mapping are performed simultaneously; the map is initialized by back-projecting the pixels of the first frame into 3D. Tracking initializes each new camera pose with a constant-velocity motion model and refines it iteratively. Optimization is driven by semantic, feature-level, RGB, and depth losses, and semantic-informed bundle adjustment jointly optimizes camera poses and 3D Gaussians across frames. Minimal sketches of the representation, pose prediction, and combined loss follow.
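As a rough illustration of the per-Gaussian state and the constant-velocity tracking initialization described above, here is a minimal Python sketch. The class layout, field names, feature dimension, and the helper `predict_pose` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

class SemanticGaussians:
    """Hypothetical container for the per-Gaussian parameters: isotropic
    Gaussians carry one scale each, plus a semantic feature embedding."""
    def __init__(self, num_gaussians: int, feat_dim: int = 16):
        self.means = np.zeros((num_gaussians, 3))       # 3D centers
        self.scales = np.zeros((num_gaussians, 1))      # one scale each (isotropic)
        self.opacities = np.zeros((num_gaussians, 1))   # alpha used during splatting
        self.colors = np.zeros((num_gaussians, 3))      # RGB
        self.features = np.zeros((num_gaussians, feat_dim))  # semantic embedding

def predict_pose(T_prev: np.ndarray, T_prev2: np.ndarray) -> np.ndarray:
    """Constant-velocity model: re-apply the most recent relative motion
    to the previous pose to initialize the current frame. T_* are 4x4
    camera-to-world matrices; the result is then refined iteratively."""
    relative_motion = T_prev @ np.linalg.inv(T_prev2)
    return relative_motion @ T_prev
```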
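The combined objective can be pictured as a weighted sum of the four rendering losses. The sketch below assumes L1 terms for RGB, depth, and features, cross-entropy for semantics, and hypothetical weights; the paper's exact loss forms and weights may differ.

```python
import torch
import torch.nn.functional as F

def mapping_loss(render: dict, gt: dict, w: dict) -> torch.Tensor:
    """Sketch of the combined mapping objective over rendered outputs.
    `gt["sem_feat"]` is assumed to come from a pretrained 2D feature
    extractor applied to the input frame (feature-level supervision)."""
    l_rgb = (render["rgb"] - gt["rgb"]).abs().mean()                  # photometric term
    l_depth = (render["depth"] - gt["depth"]).abs().mean()            # geometric term
    l_sem = F.cross_entropy(render["sem_logits"], gt["sem_labels"])   # semantic term
    l_feat = (render["sem_feat"] - gt["sem_feat"]).abs().mean()       # feature-level term
    return (w["rgb"] * l_rgb + w["depth"] * l_depth
            + w["sem"] * l_sem + w["feat"] * l_feat)
```

In the semantic-informed bundle adjustment stage, a loss of this form would be accumulated over multiple co-visible keyframes while both the camera poses and the Gaussian parameters receive gradients, which is what ties the multi-frame semantic associations into the joint optimization.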
**Experiments:**
The system is evaluated on the Replica and ScanNet datasets, where it outperforms existing radiance field-based SLAM methods in mapping accuracy, tracking, semantic segmentation, and novel view synthesis. Ablation studies validate the effectiveness of the feature-level loss and the semantic-informed bundle adjustment.
**Conclusion:**
SemGauss-SLAM achieves dense visual mapping, robust camera tracking, and high-quality 3D semantic mapping using a 3D Gaussian representation. Its integration of semantic feature embedding and semantic-informed bundle adjustment improves the accuracy of both semantic scene representation and tracking.