26 Mar 2024 | Mingrui Li *1, Shuhong Liu *2, Heng Zhou3, Guohao Zhu4, Na Cheng1, Tianchen Deng5, and Hongyu Wang †1
SGS-SLAM is a novel semantic visual SLAM system that leverages Gaussian Splatting to integrate appearance, geometry, and semantic features through multi-channel optimization. This approach addresses the limitations of neural implicit SLAM systems, such as over-smoothing and poor object-level geometry and segmentation. The system introduces a semantic feature loss to improve object-level optimization and employs a semantic-guided keyframe selection strategy to prevent erroneous reconstructions. Extensive experiments demonstrate that SGS-SLAM outperforms state-of-the-art methods in camera pose estimation, map reconstruction, semantic segmentation, and object-level geometric accuracy while maintaining real-time rendering. Key contributions include the first semantic dense visual SLAM system based on 3D Gaussians, which delivers high-quality reconstructions and precise scene interpretation. The method also enables efficient scene manipulation and provides a solid foundation for downstream tasks such as robotics and mixed reality applications.
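To make the "multi-channel optimization" idea concrete, below is a minimal sketch of how a joint appearance, geometry, and semantic objective might be assembled per keyframe. This is an illustrative assumption, not the authors' implementation: the function name `multi_channel_loss`, the loss weights, and the choice of L1 and cross-entropy terms are hypothetical; the paper's actual semantic feature loss may differ.

```python
# Hypothetical sketch: rendered color, depth, and semantic logits splatted from
# the 3D Gaussians are supervised jointly against an RGB-D frame and its
# semantic label map. Names and weights are assumptions for illustration only.
import torch
import torch.nn.functional as F

def multi_channel_loss(rendered_rgb, rendered_depth, rendered_sem_logits,
                       gt_rgb, gt_depth, gt_sem_labels,
                       w_rgb=1.0, w_depth=0.5, w_sem=0.1):
    """Combine photometric, geometric, and semantic terms for one keyframe.

    rendered_rgb        : (3, H, W) rendered color image
    rendered_depth      : (H, W)    rendered depth map
    rendered_sem_logits : (C, H, W) rendered per-class logits
    gt_rgb, gt_depth    : observed RGB-D frame
    gt_sem_labels       : (H, W) integer semantic label map
    """
    # Appearance: L1 photometric error on the rendered color image.
    loss_rgb = F.l1_loss(rendered_rgb, gt_rgb)

    # Geometry: L1 depth error, masked to pixels with valid sensor readings.
    valid = gt_depth > 0
    loss_depth = F.l1_loss(rendered_depth[valid], gt_depth[valid])

    # Semantics: cross-entropy between rendered class logits and label map.
    loss_sem = F.cross_entropy(rendered_sem_logits.unsqueeze(0),
                               gt_sem_labels.unsqueeze(0))

    # Weighted sum drives both mapping and object-level optimization.
    return w_rgb * loss_rgb + w_depth * loss_depth + w_sem * loss_sem
```

In such a formulation, the semantic term is what propagates label consistency back into the Gaussian parameters, which is the role the abstract attributes to the semantic feature loss.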