Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections.

17 Jul 2024 | Congrong Xu, Justin Kerr, Angjoo Kanazawa
Splatfacto-W is a method for novel view synthesis from unconstrained, in-the-wild image collections, implemented in Nerfstudio. It integrates per-image appearance features and embeddings into the rasterization process, along with a spherical harmonics-based background model, to represent varying photometric appearance and better depict backgrounds. Its key contributions are latent appearance modeling, efficient transient-object handling, and precise background modeling. Splatfacto-W delivers high-quality, real-time novel view synthesis with improved scene consistency in in-the-wild scenarios: it improves Peak Signal-to-Noise Ratio (PSNR) by an average of 5.3 dB over 3DGS, trains about 150 times faster than NeRF-based methods, and matches the rendering speed of 3DGS.

The method addresses the two main challenges of unconstrained image collections: photometric variation and transient occluders. It introduces a latent appearance model that assigns an appearance feature to each Gaussian point; combined with a per-image embedding, this lets each Gaussian's color adapt to the appearance variations across the reference images. It also introduces a robust mask strategy that suppresses transient objects during optimization, keeping training focused on consistent scene features.
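To make the latent appearance idea concrete, here is a minimal PyTorch sketch of the mechanism described above: a learnable appearance feature per Gaussian is concatenated with a per-image embedding and passed through a small MLP to produce that Gaussian's color for the chosen image. The class and parameter names (AppearanceModel, feat_dim, embed_dim, the MLP sizes) are illustrative assumptions, not the Splatfacto-W source code.

```python
# Hedged sketch of a per-Gaussian latent appearance model; all names and sizes
# are illustrative assumptions, not the official Splatfacto-W implementation.
import torch
import torch.nn as nn

class AppearanceModel(nn.Module):
    """Predicts a per-Gaussian RGB color from a per-Gaussian appearance
    feature and a per-image appearance embedding."""

    def __init__(self, num_gaussians: int, num_images: int,
                 feat_dim: int = 32, embed_dim: int = 32):
        super().__init__()
        # One learnable appearance feature per Gaussian point.
        self.gaussian_feats = nn.Parameter(0.01 * torch.randn(num_gaussians, feat_dim))
        # One learnable embedding per training image (lighting, exposure, ...).
        self.image_embeds = nn.Embedding(num_images, embed_dim)
        # Small MLP mapping (feature, embedding) -> RGB.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + embed_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 3), nn.Sigmoid(),
        )

    def forward(self, image_idx: torch.Tensor) -> torch.Tensor:
        # Broadcast the chosen image's embedding to every Gaussian.
        embed = self.image_embeds(image_idx).expand(self.gaussian_feats.shape[0], -1)
        return self.mlp(torch.cat([self.gaussian_feats, embed], dim=-1))

# Usage: colors for all Gaussians under image 0's appearance, fed to the rasterizer.
model = AppearanceModel(num_gaussians=100_000, num_images=50)
colors = model(torch.tensor(0))  # (100_000, 3) RGB in [0, 1]
```

Because the image embedding is the only per-image input, swapping it at render time is what enables the dynamic appearance changes mentioned below.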
Additionally, a spherical harmonics-based background model accurately represents the sky and other background elements, improving multi-view consistency (a minimal sketch of this idea appears at the end of this summary).

Splatfacto-W is evaluated on three NeRF-W datasets, where it achieves better PSNR, SSIM, and LPIPS than the baselines. It renders in real time at over 40 frames per second (fps) and allows the appearance to be changed dynamically. The method is efficient, requiring less than 6 GB of GPU memory and achieving the fastest performance on a single RTX 2080 Ti. It also handles background representation effectively, addressing a common limitation of 3DGS implementations.

Despite these advances, challenges remain, including slow convergence under special lighting conditions and limited ability to represent high-frequency background details. Future work will explore more sophisticated neural architectures and additional network components to refine transient phenomena and further improve background modeling.
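As noted above, the background is modeled with spherical harmonics evaluated along viewing directions. The sketch below shows the general idea: learnable SH coefficients are evaluated at each pixel's unit viewing direction to produce a view-dependent background color, which is then composited behind the splatted foreground. The degree-2 basis, the function names, and the compositing formula are assumptions for illustration rather than the exact Splatfacto-W implementation.

```python
# Hedged sketch of a spherical-harmonics background model; degree-2 basis and
# names are illustrative assumptions, not the official Splatfacto-W code.
import torch
import torch.nn as nn

def sh_basis_deg2(dirs: torch.Tensor) -> torch.Tensor:
    """Real SH basis up to degree 2 for unit directions, shape (..., 9)."""
    x, y, z = dirs.unbind(-1)
    return torch.stack([
        torch.full_like(x, 0.28209479177387814),         # l = 0
        -0.4886025119029199 * y,                          # l = 1
         0.4886025119029199 * z,
        -0.4886025119029199 * x,
         1.0925484305920792 * x * y,                      # l = 2
        -1.0925484305920792 * y * z,
         0.31539156525252005 * (3.0 * z * z - 1.0),
        -1.0925484305920792 * x * z,
         0.5462742152960396 * (x * x - y * y),
    ], dim=-1)

class SHBackground(nn.Module):
    def __init__(self):
        super().__init__()
        # Learnable SH coefficients: 9 basis functions x 3 color channels.
        self.coeffs = nn.Parameter(torch.zeros(9, 3))

    def forward(self, ray_dirs: torch.Tensor) -> torch.Tensor:
        # ray_dirs: (H, W, 3) unit viewing directions -> (H, W, 3) background RGB.
        basis = sh_basis_deg2(ray_dirs)                   # (H, W, 9)
        return torch.sigmoid(basis @ self.coeffs)         # colors in [0, 1]

# Usage: evaluate the background and composite it behind the rendered splats,
# e.g. final_rgb = splat_rgb + (1 - splat_alpha) * background_rgb
bg = SHBackground()
dirs = torch.nn.functional.normalize(torch.randn(64, 64, 3), dim=-1)
background_rgb = bg(dirs)  # (64, 64, 3)
```

Since the background depends only on viewing direction, it stays consistent across views while remaining cheap to evaluate at render time.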