22 Mar 2024 | Kevin Xie*, Jonathan Lorraine*, Tianshi Cao*, Jun Gao, James Lucas, Antonio Torralba, Sanja Fidler, and Xiaohui Zeng
LATTE3D is a large-scale amortized text-to-enhanced 3D synthesis method that generates high-quality 3D objects from text prompts quickly. It addresses two limitations of previous approaches: slow generation times and poor generalization to large prompt sets. LATTE3D amortizes both neural field and textured surface generation in a single forward pass, producing highly detailed textured meshes in 400ms on a single A6000 GPU. It leverages 3D data during training through 3D-aware diffusion priors, shape regularization, and model initialization to improve robustness to diverse and complex prompts. The method also supports fast test-time optimization to further enhance quality. Quantitative and qualitative results on unseen prompts show that LATTE3D outperforms existing methods in speed, quality, and generalization. The model can also be used for 3D stylization, allowing users to generate variations of existing 3D assets. The method is evaluated on a large dataset of 101k text prompts and 34k shapes, showing competitive performance against state-of-the-art baselines. LATTE3D's architecture is scalable and efficient, enabling generation of high-quality 3D content at interactive speeds.