Gamba: Marry Gaussian Splatting with Mamba for Single-View 3D Reconstruction

Gamba: Marry Gaussian Splatting with Mamba for Single-View 3D Reconstruction

24 May 2024 | Qiuhong Shen, Zike Wu, Xuanyu Yi, Pan Zhou, Hanwang Zhang, Shuicheng Yan, Xinchao Wang
Gamba is an end-to-end 3D reconstruction model designed to efficiently reconstruct 3D assets from a single image at millisecond speed. The model combines 3D Gaussian Splatting (3DGS) with Mamba, a scalable sequential network, to achieve fast and high-quality reconstruction. Key contributions include: 1. **Efficient Backbone Design**: Gamba uses a Mamba-based GambaFormer network to model 3DGS reconstruction as sequential prediction with linear scalability of token length, allowing for a substantial number of Gaussians. 2. **Robust Gaussian Constraints**: Radial mask constraints derived from multi-view masks eliminate the need for explicit point cloud supervision in training, enhancing robustness and efficiency. Gamba was trained on the Objaverse dataset and evaluated on the Google Scanned Object (GSO) dataset. Experimental results demonstrate its competitive performance in both qualitative and quantitative assessments, achieving a reconstruction speed of 0.05 seconds on a single NVIDIA A100 GPU, which is about 1,000 times faster than optimization-based methods. The model outperforms existing approaches in terms of reconstruction quality and speed, making it suitable for various applications such as AR/VR content generation and autonomous vehicle path planning.Gamba is an end-to-end 3D reconstruction model designed to efficiently reconstruct 3D assets from a single image at millisecond speed. The model combines 3D Gaussian Splatting (3DGS) with Mamba, a scalable sequential network, to achieve fast and high-quality reconstruction. Key contributions include: 1. **Efficient Backbone Design**: Gamba uses a Mamba-based GambaFormer network to model 3DGS reconstruction as sequential prediction with linear scalability of token length, allowing for a substantial number of Gaussians. 2. **Robust Gaussian Constraints**: Radial mask constraints derived from multi-view masks eliminate the need for explicit point cloud supervision in training, enhancing robustness and efficiency. Gamba was trained on the Objaverse dataset and evaluated on the Google Scanned Object (GSO) dataset. Experimental results demonstrate its competitive performance in both qualitative and quantitative assessments, achieving a reconstruction speed of 0.05 seconds on a single NVIDIA A100 GPU, which is about 1,000 times faster than optimization-based methods. The model outperforms existing approaches in terms of reconstruction quality and speed, making it suitable for various applications such as AR/VR content generation and autonomous vehicle path planning.
Reach us at info@study.space
[slides] Gamba%3A Marry Gaussian Splatting with Mamba for single view 3D reconstruction | StudySpace