latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction

30 Jul 2024 | Christopher Wewer¹, Kevin Raj¹, Eddy Ilg², Bernt Schiele¹, and Jan Eric Lenssen¹
latentSplat is a method for scalable and generalizable 3D reconstruction from two reference views. It autoencodes the views into a 3D latent representation consisting of variational feature Gaussians, enabling fast novel view synthesis. By modeling uncertainty explicitly, the variational 3D Gaussians allow efficient sampling and rendering, combining the strengths of regression-based and generative approaches. The model is trained purely on real video data, without any 3D supervision.

latentSplat outperforms previous methods in reconstruction quality and generalization while remaining fast and scalable to high-resolution data. It handles both object-centric and general scenes and performs well in view interpolation as well as extrapolation. Validated on the CO3D and RealEstate10k datasets, it shows superior performance on generative and perceptual metrics, producing high-quality novel views with reduced artifacts and high perceptual similarity to the ground truth. The explicit uncertainty modeling further enables accurate generalization to out-of-context views.

Thanks to a lightweight decoder and an efficient Gaussian representation, latentSplat is suitable for real-time rendering, with significantly faster inference than state-of-the-art generative models. Its 3D-consistent novel views also enable downstream mesh reconstruction. Overall, latentSplat achieves state-of-the-art quality in novel view synthesis while being efficient and scalable enough for real-world applications.
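The core idea of "variational feature Gaussians" is that each 3D Gaussian carries a distribution over a semantic feature vector rather than a fixed one, so features can be sampled and then splatted and decoded into an image. The sketch below illustrates only the sampling step via the standard reparameterization trick; the array shapes, function name, and the use of NumPy are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sample_variational_features(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, I).

    Each row is the sampled semantic feature vector of one 3D Gaussian.
    Hypothetical helper for illustration; shapes are (num_gaussians, feat_dim).
    """
    sigma = np.exp(0.5 * log_var)          # log-variance -> std deviation
    eps = rng.standard_normal(mu.shape)    # per-Gaussian standard normal noise
    return mu + sigma * eps

rng = np.random.default_rng(0)

# Toy scene: 4 Gaussians, each with an 8-dim feature distribution
# predicted by the encoder (means and log-variances).
num_gaussians, feat_dim = 4, 8
mu = rng.standard_normal((num_gaussians, feat_dim))
log_var = rng.standard_normal((num_gaussians, feat_dim))

features = sample_variational_features(mu, log_var, rng)
print(features.shape)  # (4, 8)
```

Because the noise is injected additively, gradients flow through `mu` and `log_var` during training, which is what makes the explicit uncertainty learnable; at inference, drawing different `eps` yields different plausible feature samples for the same scene.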