18 Jul 2024 | Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai
MVSplat is an efficient 3D Gaussian Splatting model that reconstructs 3D scenes from sparse multi-view images. Its key innovation is a cost volume representation that captures cross-view feature similarities, which strengthens geometry learning. Whereas previous methods rely on purely data-driven regression, MVSplat uses a feed-forward architecture trained with photometric supervision to learn the Gaussian parameters. The model predicts the 3D Gaussian centers, opacity, covariance, and color using a 2D U-Net with cross-view attention.
It achieves state-of-the-art performance on the RealEstate10K and ACID benchmarks with 10× fewer parameters and 2× faster inference than the most recent method, pixelSplat, and it outperforms pixelSplat in appearance quality, geometry quality, and cross-dataset generalization. The cost volume enables efficient and accurate 3D reconstruction, and the model is trained end-to-end with a simple rendering loss. The design handles a wide range of input views and is robust to variations in feature distributions. Overall, MVSplat is a lightweight, fast, and effective solution for 3D Gaussian Splatting from sparse multi-view images, suited to real-world applications.
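The cost-volume idea described above can be illustrated for a single pixel. The following is a minimal sketch, not the authors' implementation: it assumes the similarity is a scaled dot product between feature vectors, that a softmax over depth candidates turns the similarities into weights, and that a Gaussian center is obtained by back-projecting the pixel with a pinhole camera model. The function names (`depth_from_matching`, `unproject`) and all parameters are hypothetical, and the step that warps features from the other view to each depth candidate is taken as given.

```python
import math


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def depth_from_matching(ref_feature, warped_features, depth_candidates):
    """Estimate the depth of one reference pixel from cross-view similarity.

    ref_feature: feature vector at the pixel in the reference view.
    warped_features: one feature vector per depth candidate, sampled from
        the other view where the pixel would land at that depth (assumed
        to be computed elsewhere).
    depth_candidates: the candidate depths, same length as warped_features.
    """
    scale = math.sqrt(len(ref_feature))
    # One slice of the cost volume: a similarity score per depth candidate.
    costs = [dot(ref_feature, f) / scale for f in warped_features]
    weights = softmax(costs)
    # Similarity-weighted average of the candidates gives the depth.
    depth = sum(w * d for w, d in zip(weights, depth_candidates))
    return depth, weights


def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at `depth` to a camera-space 3D point."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


if __name__ == "__main__":
    # Toy data: the second depth candidate matches the reference feature best.
    depth, weights = depth_from_matching(
        [1.0, 0.0],
        [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]],
        [1.0, 2.0, 4.0],
    )
    center = unproject(320.0, 240.0, depth, 500.0, 500.0, 320.0, 240.0)
    print(depth, weights, center)
```

In this toy setup the candidate whose warped feature best matches the reference feature receives the largest weight, so the estimated depth is pulled toward it. A full model would repeat this for every pixel and every depth candidate in a batched tensor operation.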