July 27-August 1, 2024, Denver, CO, USA | Zhexi Peng, Tianjia Shao, Yong Liu, Jingke Zhou, Yin Yang, Jingdong Wang, Kun Zhou
**RTG-SLAM: Real-time 3D Reconstruction at Scale Using Gaussian Splatting**
**Abstract:**
We present RTG-SLAM, a real-time 3D reconstruction system for large-scale environments using Gaussian splatting. The system features a compact Gaussian representation and an efficient on-the-fly Gaussian optimization scheme. Each Gaussian is forced to be either opaque or nearly transparent: opaque Gaussians fit the surface and dominant colors, and transparent ones fit residual colors. By rendering depth differently from color, a single opaque Gaussian can fit a local surface region well without multiple overlapping Gaussians, which reduces memory and computation costs. For on-the-fly Gaussian optimization, we explicitly add Gaussians for three types of pixels per frame: newly observed pixels, pixels with large color errors, and pixels with large depth errors. We also categorize all Gaussians as stable or unstable, optimizing only the unstable Gaussians and rendering only the pixels they occupy. This reduces both the number of Gaussians to optimize and the number of pixels to render, enabling real-time optimization. Our system achieves reconstruction quality comparable to state-of-the-art NeRF-based RGBD SLAM at twice the speed and half the memory cost, and shows superior performance in novel view synthesis and camera tracking accuracy.
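The three pixel categories that trigger Gaussian addition can be illustrated with a small sketch. This is not the authors' implementation: the `PixelObservation` fields, the two thresholds, and the order in which the tests are applied are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PixelObservation:
    """Per-pixel comparison between the current map render and a new RGBD frame."""
    covered: bool       # True if the existing map already renders something here
    color_error: float  # difference between rendered and observed color
    depth_error: float  # difference between rendered and observed depth (meters)


# Hypothetical thresholds, chosen only for illustration.
COLOR_ERROR_THRESHOLD = 0.1
DEPTH_ERROR_THRESHOLD = 0.05


def classify_pixel(p: PixelObservation) -> str:
    """Return the category that would trigger adding a Gaussian, or 'keep'."""
    if not p.covered:
        return "newly_observed"
    # Assumption: depth errors are checked before color errors.
    if p.depth_error > DEPTH_ERROR_THRESHOLD:
        return "large_depth_error"
    if p.color_error > COLOR_ERROR_THRESHOLD:
        return "large_color_error"
    return "keep"


def pixels_needing_gaussians(pixels: List[PixelObservation]) -> List[int]:
    """Indices of pixels for which a new Gaussian would be added."""
    return [i for i, p in enumerate(pixels) if classify_pixel(p) != "keep"]
```

Pixels classified as `keep` are already explained by the map, so only the flagged pixels would contribute new Gaussians in a given frame.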
**Keywords:**
SLAM, 3D reconstruction, Gaussian splatting, RGBD, scan
**Related Work:**
- Classical RGBD dense SLAM: Various methods use point clouds, surfels, and signed distance functions for 3D reconstruction.
- NeRF-based RGBD dense SLAM: Methods such as iMAP, NICE-SLAM, and ESLAM use neural radiance fields for dense SLAM, achieving high-quality results but at a high memory cost.
- Gaussian-based RGBD dense SLAM: Concurrent works integrate 3D Gaussians into dense RGBD SLAM, but face challenges in real-time performance and large-scale scene reconstruction.
**Method:**
- **Compact Gaussian Representation:** Gaussians are forced to be either opaque or nearly transparent, with opaque Gaussians fitting the surface and dominant colors, and transparent ones fitting residual colors.
- **Image Rendering:** Depth is rendered differently from color: for depth, opaque Gaussians are treated as ellipsoid discs, which gives a more accurate depth representation.
- **Online Reconstruction Process:** Includes input preprocessing, Gaussian addition, Gaussian optimization, state management, camera tracking, keyframe selection, and global optimization.
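To make the disc-based depth idea concrete, the sketch below computes the depth at which a camera ray meets the plane of a single disc using a standard ray-plane intersection. It is a simplified illustration under stated assumptions, not the paper's renderer: the camera sits at the origin of its own frame, the disc is described by a hypothetical `center` and `normal`, the disc's finite extent is ignored, and the helper names and pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) are introduced here only for the example.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def pixel_ray(u: float, v: float, fx: float, fy: float,
              cx: float, cy: float) -> Vec3:
    """Ray direction through pixel (u, v) for a pinhole camera, with z = 1."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def disc_depth_along_ray(ray_dir: Vec3, center: Vec3, normal: Vec3,
                         eps: float = 1e-8) -> Optional[float]:
    """Depth (camera-frame z) where a ray from the origin hits the disc's plane.

    Points on the ray are t * ray_dir; the plane satisfies (p - center) . normal = 0,
    so t = (center . normal) / (ray_dir . normal). The normal need not be unit length.
    """
    denom = dot(ray_dir, normal)
    if abs(denom) < eps:
        return None  # ray is (nearly) parallel to the disc plane
    t = dot(center, normal) / denom
    if t <= 0.0:
        return None  # intersection lies behind the camera
    return t * ray_dir[2]


# A fronto-parallel disc 2 m away yields depth 2 for any ray that hits its plane.
depth = disc_depth_along_ray((0.1, 0.0, 1.0), (0.0, 0.0, 2.0), (0.0, 0.0, 1.0))
```

For a tilted disc the depth varies across pixels, which is why one flat primitive can describe a local surface patch without stacking several overlapping Gaussians.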
**Evaluation:**
- **Time and Memory Performance:** RTG-SLAM runs at 16.28 fps with 7.3 GB of memory, outperforming Co-SLAM (8.77 fps, 17 GB) and Point-SLAM (0.22 fps, 9.4 GB).
- **Tracking Accuracy:** Achieves accuracy comparable to classical SLAM methods on real-world datasets.
- **Novel View Synthesis:** Produces higher-quality images with fewer artifacts and better appearance fidelity than NeRF-based methods.
- **Reconstruction Quality:** Achieves reconstruction quality comparable to state-of-the-art NeRF-based RGBD SLAM.
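As a quick sanity check, the "twice the speed and half the memory" claim can be compared against the frame rates and memory figures quoted in the evaluation. The snippet only performs arithmetic on those quoted numbers; the variable and function names are illustrative.

```python
# Figures as quoted in the evaluation summary: (fps, memory in GB).
rtg_slam = (16.28, 7.3)
co_slam = (8.77, 17.0)
point_slam = (0.22, 9.4)


def speedup(ours: tuple, baseline: tuple) -> float:
    """How many times faster 'ours' runs than 'baseline'."""
    return ours[0] / baseline[0]


def memory_ratio(ours: tuple, baseline: tuple) -> float:
    """Memory used by 'ours' as a fraction of 'baseline'."""
    return ours[1] / baseline[1]


co_speedup = round(speedup(rtg_slam, co_slam), 2)      # 1.86
co_memory = round(memory_ratio(rtg_slam, co_slam), 2)  # 0.43
```

Against Co-SLAM the quoted figures give roughly 1.86 times the speed at about 43% of the memory, which is consistent with the rounded "twice the speed and half the memory" summary.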