Splatfacto-W is a method for novel view synthesis from unconstrained in-the-wild image collections, implemented in Nerfstudio. It integrates per-Gaussian appearance features and per-image embeddings into the rasterization process, along with a spherical harmonics-based background model, to represent varying photometric appearances and better depict backgrounds. The key contributions are latent appearance modeling, efficient transient object handling, and precise background modeling. Splatfacto-W delivers high-quality, real-time novel view synthesis with improved scene consistency in in-the-wild scenarios. It improves the Peak Signal-to-Noise Ratio (PSNR) by an average of 5.3 dB compared to 3DGS, trains 150 times faster than NeRF-based methods, and achieves a rendering speed similar to that of 3DGS.
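To put the 5.3 dB figure in perspective, the following minimal sketch computes PSNR and converts a PSNR gain into the equivalent reduction in mean squared error. It assumes images scaled to [0, 1] and considers a single image at a fixed peak value; an average gain across scenes does not translate exactly into one MSE ratio. The function names are illustrative and do not come from the Splatfacto-W code.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB between two images of equal shape."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks when PSNR rises by gain_db (same peak value)."""
    return 10.0 ** (gain_db / 10.0)


# Illustrative data only: a flat target and a prediction that is off by 0.1 everywhere.
target = np.zeros((4, 4, 3))
pred = np.full((4, 4, 3), 0.1)
print(round(psnr(pred, target), 2), "dB")
print(round(mse_ratio_for_gain(5.3), 2), "x lower MSE for a 5.3 dB gain")
```

Under these assumptions, a 5.3 dB gain corresponds to a mean squared error that is roughly 3.4 times smaller.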
The method addresses the challenges of photometric variations and transient occluders in unconstrained image collections. It introduces a latent appearance model that assigns appearance features to each Gaussian point, enabling effective Gaussian color adaptation to variations in reference images. It also introduces a robust mask strategy to handle transient objects during the optimization process, improving the focus on consistent scene features. Additionally, it uses a spherical harmonics-based background model to accurately represent the sky and background elements, ensuring improved multiview consistency.
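The three components described above can be illustrated with a small NumPy sketch. This is not the Splatfacto-W implementation: the network size, the use of a sigmoid output, the degree-1 spherical harmonics, and the quantile-based mask rule are all assumptions chosen to keep the example short, and every function name here is hypothetical.

```python
import numpy as np

# Real spherical harmonics constants for degrees 0 and 1.
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199


def mlp(x, w1, b1, w2, b2):
    """Two-layer perceptron with ReLU; a stand-in for an appearance network."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2


def gaussian_colors(gauss_feats, image_embed, params):
    """Combine per-Gaussian features (N, F) with one per-image embedding (E,)
    and map each pair to an RGB color in [0, 1]."""
    n = gauss_feats.shape[0]
    tiled = np.broadcast_to(image_embed, (n, image_embed.shape[0]))
    x = np.concatenate([gauss_feats, tiled], axis=1)
    return 1.0 / (1.0 + np.exp(-mlp(x, *params)))  # sigmoid keeps colors bounded


def sh_background(dirs, coeffs):
    """Evaluate a degree-1 SH background for unit view directions.
    dirs: (P, 3), coeffs: (4, 3) with one column per color channel."""
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    basis = np.stack(
        [np.full_like(x, SH_C0), -SH_C1 * y, SH_C1 * z, -SH_C1 * x], axis=1
    )
    return basis @ coeffs


def robust_mask(per_pixel_error, keep_fraction=0.9):
    """Assumed mask rule: keep pixels whose error is at or below the given
    quantile, treating the highest-error pixels as likely transients."""
    threshold = np.quantile(per_pixel_error, keep_fraction)
    return per_pixel_error <= threshold


# Illustrative usage with random data (sizes are arbitrary).
rng = np.random.default_rng(0)
feat_dim, embed_dim, hidden_dim = 8, 4, 16
params = (
    rng.normal(size=(feat_dim + embed_dim, hidden_dim)),
    np.zeros(hidden_dim),
    rng.normal(size=(hidden_dim, 3)),
    np.zeros(3),
)
feats = rng.normal(size=(5, feat_dim))       # one feature vector per Gaussian
embed_a = rng.normal(size=embed_dim)         # embedding of reference image A
embed_b = rng.normal(size=embed_dim)         # embedding of reference image B
colors_a = gaussian_colors(feats, embed_a, params)
colors_b = gaussian_colors(feats, embed_b, params)
print(colors_a.shape, np.abs(colors_a - colors_b).max())
```

The same Gaussians receive different colors under different image embeddings, which is the mechanism that lets the representation adapt to the appearance of each reference image. The mask would multiply the per-pixel loss during optimization so that high-error regions contribute nothing to the gradient.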
Splatfacto-W is evaluated on three NeRF-W datasets, achieving better performance in terms of PSNR, SSIM, and LPIPS metrics. It supports real-time rendering at over 40 frames per second (fps) and enables dynamic appearance changes. The method is efficient, requiring less than 6 GB of GPU memory and achieving the fastest performance on a single RTX 2080Ti. It also effectively handles background representation, addressing a common limitation in 3DGS implementations. Despite these advancements, there remain challenges such as slow convergence in special lighting conditions and limitations in representing high-frequency background details. Future work will focus on addressing these issues by exploring more sophisticated neural architectures and additional network components to refine transient phenomena and enhance background modeling further.