29 Jul 2024 | SARA SABOUR* and LILY GOLI*, Google DeepMind, University of Toronto, Canada; GEORGE KOPANAS, Google AR, United Kingdom; MARK MATHEWS, Google DeepMind, United States; DMITRY LAGUN, Google DeepMind, United States; LEONIDAS GUIBAS, Google DeepMind, Stanford University, United States; ALEC JACOBSON, University of Toronto, Canada; DAVID FLEET, Google DeepMind, University of Toronto, Canada; ANDREA TAGLIASACCHI, Google DeepMind, University of Toronto, Simon Fraser University, Canada
SpotLessSplats is a method for 3D Gaussian Splatting (3DGS) that addresses the challenge of reconstructing 3D scenes from 2D images captured in real-world settings, where moving objects, lighting variations, and other photometric inconsistencies degrade reconstruction quality. The method uses pre-trained, general-purpose features from text-to-image models to robustly identify and mask out transient distractors, such as moving people or pets, without requiring explicit supervision. It introduces two approaches, spatial clustering and spatio-temporal clustering, both of which use learned semantic features to detect and remove distractors. SpotLessSplats also includes a sparsification strategy that reduces the number of Gaussians used in the reconstruction, lowering computational and memory costs. The method is evaluated on challenging benchmarks and demonstrates superior reconstruction quality compared to existing methods, both visually and quantitatively. Key contributions include an adaptive robust loss function, a novel sparsification method, and a comprehensive evaluation on standard benchmarks.
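The spatial-clustering idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes pixels have already been grouped into clusters from pre-trained image features (that step is not shown), and the function name `cluster_inlier_mask`, the quantile threshold, and the majority-vote rule are all hypothetical choices made for illustration. A cluster is masked out of the loss when most of its pixels have unusually high photometric residuals.

```python
import numpy as np


def cluster_inlier_mask(residuals, cluster_ids, pixel_quantile=0.8, cluster_vote=0.5):
    """Return a boolean (H, W) mask, True where a pixel is kept in the loss.

    residuals:   (H, W) non-negative per-pixel photometric error.
    cluster_ids: (H, W) integer cluster label per pixel, assumed to come from
                 clustering pre-trained image features (not computed here).
    pixel_quantile, cluster_vote: illustrative hyperparameters (assumptions).
    """
    # Per-pixel outlier guess: residual strictly above the chosen quantile.
    threshold = np.quantile(residuals, pixel_quantile)
    pixel_outlier = residuals > threshold

    # Per-cluster aggregation: fraction of the cluster's pixels flagged above.
    flat_ids = cluster_ids.ravel()
    n_clusters = int(flat_ids.max()) + 1
    counts = np.bincount(flat_ids, minlength=n_clusters)
    flagged = np.bincount(
        flat_ids, weights=pixel_outlier.ravel().astype(float), minlength=n_clusters
    )
    outlier_fraction = flagged / np.maximum(counts, 1)

    # Mask a whole cluster when most of its pixels look like outliers.
    cluster_is_distractor = outlier_fraction > cluster_vote
    return ~cluster_is_distractor[cluster_ids]
```

Voting per cluster rather than per pixel means an isolated high-residual pixel inside an otherwise well-fit region stays in the loss, while a region that is consistently poorly fit is dropped as a whole.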