7 Feb 2024 | Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu
LGM is a novel framework for high-resolution 3D content creation that generates 3D Gaussians from text prompts or single-view images. The method employs an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which are themselves produced from the text or single-view input by multi-view diffusion models. The key innovations are a multi-view Gaussian representation, which is efficient yet expressive, and the asymmetric U-Net, which enables high-throughput training. The model achieves high-resolution 3D generation at a resolution of 512 while maintaining a fast generation speed of about 5 seconds.
Extensive experiments show that the method produces high-fidelity 3D models efficiently, outperforming existing methods on both text-to-3D and image-to-3D tasks. A mesh extraction algorithm converts the generated 3D Gaussians into smooth, textured meshes. The model is trained on the Objaverse dataset, and the results demonstrate its effectiveness in generating diverse, high-quality 3D assets. Limitations remain, including the resolution of the input multi-view images and potential inconsistencies across the generated Gaussians; despite these, the method achieves high-resolution 3D generation with high efficiency and quality.
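The multi-view Gaussian representation can be illustrated with a minimal sketch: each pixel of each view's U-Net output predicts the parameters of one 3D Gaussian, and all views' Gaussians are fused into a single set. The 14-channel layout below (position, opacity, scale, rotation quaternion, color) and the specific activation choices are assumptions in the style of per-pixel Gaussian methods, not LGM's exact implementation.

```python
import numpy as np

def feature_map_to_gaussians(feats: np.ndarray) -> dict:
    """Convert raw per-pixel U-Net outputs into 3D Gaussian parameters.

    feats: array of shape (V, 14, H, W) for V views; every pixel
    becomes one Gaussian, so the result holds V*H*W Gaussians.
    Channel layout (assumed): xyz(3), opacity(1), scale(3),
    rotation quaternion(4), rgb(3).
    """
    V, C, H, W = feats.shape
    assert C == 14, "expected 14 channels per pixel"
    # Flatten all views and pixels into one list of Gaussians.
    x = feats.transpose(0, 2, 3, 1).reshape(-1, 14)

    pos = x[:, 0:3]                                   # world-space centers (unbounded)
    opacity = 1.0 / (1.0 + np.exp(-x[:, 3:4]))        # sigmoid -> (0, 1)
    scale = np.exp(x[:, 4:7]) * 0.1                   # exp keeps scales positive
    quat = x[:, 7:11]
    quat = quat / np.linalg.norm(quat, axis=1, keepdims=True)  # unit quaternions
    rgb = 1.0 / (1.0 + np.exp(-x[:, 11:14]))          # sigmoid -> (0, 1) colors
    return {"pos": pos, "opacity": opacity, "scale": scale,
            "rot": quat, "rgb": rgb}
```

In this scheme, increasing the image resolution of the U-Net directly increases the number of Gaussians, which is one reason a high-throughput backbone matters.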