GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation


21 Mar 2024 | Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein
GRM is a large-scale 3D reconstruction and generation model that recovers a 3D scene from sparse-view images in roughly 0.1 seconds. Built on a pure transformer architecture, GRM translates input pixels into pixel-aligned 3D Gaussians, which are unprojected to form a dense 3D representation of the scene; a novel upsampler further enhances the reconstruction of fine details. Experiments show that GRM outperforms existing methods in both reconstruction quality and efficiency. Beyond reconstruction, the same model supports generative tasks: when integrated with multi-view diffusion models, it achieves state-of-the-art text-to-3D and image-to-3D generation. Trained on a large dataset of 3D objects, GRM produces high-quality 3D assets from text prompts or single images, and its architecture and training process are designed to be efficient and scalable, making it suitable for a wide range of 3D reconstruction and generation applications.
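To make the pixel-aligned design concrete, the sketch below shows one way a per-pixel prediction head could turn transformer features into 3D Gaussians and unproject them along camera rays. This is a minimal illustration under assumed names and parameterizations (`PixelAlignedGaussianHead`, a bounded depth range, a simple 12-channel per-pixel layout), not the authors' implementation or their upsampler.

```python
# Minimal sketch of pixel-aligned Gaussian prediction and unprojection.
# Assumptions: a backbone transformer has already produced per-pixel features
# `feats` of shape (V, C, H, W) for V input views, with known intrinsics K
# (V, 3, 3) and camera-to-world extrinsics c2w (V, 4, 4). Channel layout and
# depth bounds are illustrative, not the paper's exact parameterization.

import torch
import torch.nn as nn


class PixelAlignedGaussianHead(nn.Module):
    """Predicts one 3D Gaussian per input pixel from per-pixel features."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # 12 channels per pixel: depth(1) + rgb(3) + scale(3) + rotation quat(4) + opacity(1)
        self.to_gaussian = nn.Conv2d(feat_dim, 12, kernel_size=1)

    def forward(self, feats, K, c2w):
        V, _, H, W = feats.shape
        out = self.to_gaussian(feats)                          # (V, 12, H, W)
        depth   = out[:, 0:1].sigmoid() * 4.0 + 0.5            # bounded depth range (assumed)
        rgb     = out[:, 1:4].sigmoid()
        scale   = out[:, 4:7].exp().clamp(max=0.1)
        rot     = nn.functional.normalize(out[:, 7:11], dim=1)  # unit quaternion
        opacity = out[:, 11:12].sigmoid()

        # Build pixel-center rays and unproject each pixel to its Gaussian center.
        ys, xs = torch.meshgrid(
            torch.arange(H, dtype=feats.dtype, device=feats.device) + 0.5,
            torch.arange(W, dtype=feats.dtype, device=feats.device) + 0.5,
            indexing="ij",
        )
        pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1)        # (H, W, 3)
        dirs_cam = torch.einsum("vij,hwj->vhwi", torch.inverse(K), pix)
        dirs_world = torch.einsum("vij,vhwj->vhwi", c2w[:, :3, :3], dirs_cam)
        origins = c2w[:, :3, 3].view(V, 1, 1, 3)
        centers = origins + depth.permute(0, 2, 3, 1) * dirs_world      # (V, H, W, 3)

        # Concatenate the Gaussians from all views into one dense set.
        return {
            "centers":   centers.reshape(-1, 3),
            "rgb":       rgb.permute(0, 2, 3, 1).reshape(-1, 3),
            "scales":    scale.permute(0, 2, 3, 1).reshape(-1, 3),
            "rotations": rot.permute(0, 2, 3, 1).reshape(-1, 4),
            "opacities": opacity.permute(0, 2, 3, 1).reshape(-1, 1),
        }
```

In this reading, "pixel-aligned" means every input pixel contributes exactly one Gaussian whose center lies on that pixel's camera ray, so the number of Gaussians scales with input resolution and the resulting set can be splatted directly for rendering and supervision.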