July 27-August 1, 2024 | Zhexi Peng, Tianjia Shao, Yong Liu, Jingke Zhou, Yin Yang, Jingdong Wang, Kun Zhou
RTG-SLAM is a real-time 3D reconstruction system that uses Gaussian splatting for large-scale environments. It features a compact Gaussian representation and an efficient on-the-fly optimization scheme. The system forces each Gaussian to be either opaque or nearly transparent: opaque Gaussians fit the surface and dominant colors, while nearly transparent ones fit the residual colors. This representation reduces both memory and computation costs.

For Gaussian optimization, the system adds Gaussians for three types of pixels in each frame: newly observed pixels, pixels with large color errors, and pixels with large depth errors (see the sketch below). It optimizes only the unstable Gaussians and renders only the pixels they occupy, which greatly reduces the number of Gaussians to optimize and pixels to render per frame and enables real-time performance.

RTG-SLAM matches the reconstruction quality of state-of-the-art NeRF-based RGB-D SLAM methods at roughly twice the speed and half the memory cost, and it delivers more realistic novel view synthesis and more accurate camera tracking. The system reconstructs large scenes in real time with a Microsoft Azure Kinect, running at around 16 fps without post-processing. It outperforms competing methods in speed and memory use and performs well on benchmark datasets such as Replica, TUM-RGBD, and ScanNet++. The system is efficient and suitable for real-time applications, with potential future extensions to outdoor scenes, dynamic objects, and changing lighting conditions.
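The per-frame selection of pixels that spawn new Gaussians can be illustrated with a short sketch. The Python snippet below is a minimal illustration under assumptions, not the paper's implementation: the function name, the thresholds, and the array layout are hypothetical, and only the three selection criteria follow the description above.

```python
import numpy as np

# Hypothetical thresholds; the paper's actual values are not stated here.
COLOR_ERR_THRESH = 0.10   # per-pixel color error (normalized RGB)
DEPTH_ERR_THRESH = 0.05   # per-pixel depth error in meters

def select_pixels_for_new_gaussians(rendered_rgb, rendered_depth, rendered_alpha,
                                    observed_rgb, observed_depth):
    """Return a boolean mask of pixels that should spawn new Gaussians.

    A pixel qualifies if it is (a) newly observed (the current map renders
    almost nothing there), (b) has a large color error, or (c) has a large
    depth error against the sensor measurement.
    """
    # (a) newly observed: accumulated opacity at this pixel is near zero
    newly_observed = rendered_alpha < 1e-3

    # (b) large color error between the rendered and the observed image
    color_err = np.linalg.norm(rendered_rgb - observed_rgb, axis=-1)
    bad_color = color_err > COLOR_ERR_THRESH

    # (c) large depth error where a valid depth measurement exists
    valid_depth = observed_depth > 0
    bad_depth = valid_depth & (
        np.abs(rendered_depth - observed_depth) > DEPTH_ERR_THRESH
    )

    return newly_observed | bad_color | bad_depth
```

In a full pipeline, the resulting mask would seed new Gaussians, and only the Gaussians still marked unstable would be optimized and rendered in subsequent frames, which is what keeps the per-frame cost bounded.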