MVGamba is a unified 3D content generation framework that couples multi-view diffusion models with a scalable multi-view reconstructor built on an RNN-like state space model (SSM). It addresses the multi-view inconsistency and blurred textures of existing Gaussian reconstruction models by propagating causal context that carries multi-view information, enabling cross-view self-refinement. MVGamba generates a long sequence of Gaussians with linear complexity, supporting high-quality 3D content modeling. The framework is general and lightweight: with only 49M parameters, significantly fewer than comparable models, it achieves state-of-the-art performance across 3D generation tasks. Extensive experiments show that MVGamba outperforms baselines on image-to-3D, text-to-3D, and sparse-view reconstruction. The paper also includes ablation studies and a discussion of MVGamba's limitations and directions for future improvement.
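To make the core idea concrete, below is a minimal, hypothetical PyTorch sketch: an RNN-like state-space layer scans the concatenated multi-view token sequence so that every token sees causal context from all earlier views, and a linear head decodes each refined token into the parameters of one 3D Gaussian. All names (`SimpleSSMBlock`, `GaussianHead`), dimensions, and the diagonal recurrence are illustrative assumptions, not MVGamba's actual (Mamba-style) implementation.

```python
import torch
import torch.nn as nn

class SimpleSSMBlock(nn.Module):
    """Toy RNN-like state-space layer: a diagonal linear recurrence
    h_t = a * h_{t-1} + (1 - a) * x_t, scanned left to right so each
    token accumulates causal multi-view context. Illustrative only --
    MVGamba builds on a more expressive Mamba-style SSM."""
    def __init__(self, dim: int):
        super().__init__()
        self.in_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        # Per-channel decay in (0, 1) keeps the recurrence stable.
        self.log_a = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) -- tokens from all views, concatenated.
        a = torch.sigmoid(self.log_a)           # (dim,)
        u = self.in_proj(x)
        h = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(u.shape[1]):             # one pass: linear in seq_len
            h = a * h + (1 - a) * u[:, t]
            outs.append(h)
        return self.out_proj(torch.stack(outs, dim=1)) + x  # residual

class GaussianHead(nn.Module):
    """Decodes each refined token into one 3D Gaussian's parameters."""
    def __init__(self, dim: int):
        super().__init__()
        # 3 (xyz) + 3 (scale) + 4 (rotation quaternion) + 1 (opacity) + 3 (RGB)
        self.head = nn.Linear(dim, 14)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.head(tokens)                # (batch, num_gaussians, 14)

# Usage sketch: 4 views x 256 patch tokens -> one Gaussian per token.
tokens = torch.randn(2, 4 * 256, 128)           # (batch, views*patches, dim)
model = nn.Sequential(SimpleSSMBlock(128), GaussianHead(128))
gaussians = model(tokens)
print(gaussians.shape)                          # torch.Size([2, 1024, 14])
```

The sketch highlights why the SSM design matters: the scan costs O(sequence length) rather than the O(length²) of attention, which is what lets the reconstructor emit a long Gaussian sequence cheaply.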