LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis

22 Mar 2024 | Kevin Xie*, Jonathan Lorraine*, Tianshi Cao*, Jun Gao, James Lucas, Antonio Torralba, Sanja Fidler, and Xiaohui Zeng

LATTE3D is a large-scale amortized text-to-enhanced 3D synthesis method that addresses the limitations of existing approaches by achieving fast, high-quality generation on a significantly larger prompt set.

Key contributions include:

1. **Scalable Architecture**: LATTE3D introduces a novel architecture that amortizes both neural field and textured surface generation, enabling real-time generation of highly detailed textured meshes.
2. **3D Data Integration**: The method leverages 3D data during optimization through 3D-aware diffusion priors, shape regularization, and model initialization to improve robustness and quality.
3. **Amortized Learning**: LATTE3D amortizes the surface-based refinement stage, significantly boosting quality and generalization.

**Methods**:

- **Pretraining**: LATTE3D initializes the model with a reconstruction pretraining step to stabilize training.
- **Model Architecture**: The model consists of two networks, one for geometry and one for texture, with shared encoders in the pretraining and stage-1 training.
- **Amortized Learning**: The method uses a two-stage pipeline for training, incorporating 3D-aware 2D SDS losses and regularization losses to improve geometry and texture details.
- **Inference**: During inference, the model generates 3D textured meshes from text prompts in 400 ms, with optional test-time optimization for further quality enhancement.

**Experiments**:

- **Dataset Construction**: A new dataset *gpt-101k* is constructed with 101k text prompts and 34k shapes.
- **Evaluation**: LATTE3D is evaluated on seen and unseen prompts, showing competitive performance and generalization abilities compared to baselines.
- **Ablation Studies**: Various components of the method are evaluated to understand their impact on performance.

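
The idea behind the amortized learning and inference steps described under Methods can be illustrated with a deliberately tiny sketch. It is not LATTE3D's architecture or loss: the prompt is a single number `c`, the "3D object" is a single parameter `theta`, and the per-prompt objective is a hypothetical quadratic standing in for the SDS and regularization losses. All function names and constants are assumptions made for illustration. The sketch only shows the pattern: one shared model is trained across many prompts, inference on a new prompt is a single forward pass, and a few optional gradient steps on that prompt refine the result further.

```python
# Toy illustration of amortized optimization plus optional test-time refinement.
# Everything here is a hypothetical scalar stand-in, not LATTE3D's networks or losses.

def target_for(c):
    """Hypothetical per-prompt optimum; mildly nonlinear so a linear model cannot fit it exactly."""
    return 2.0 * c + 0.5 * c * c

def loss(theta, c):
    """Stand-in for the per-prompt objective (SDS plus regularizers in the paper)."""
    return (theta - target_for(c)) ** 2

def loss_grad(theta, c):
    """Gradient of the toy objective with respect to theta."""
    return 2.0 * (theta - target_for(c))

def train_amortized(prompts, steps=2000, lr=0.05):
    """Train one shared linear model theta = w * c + b across all prompts with SGD."""
    w, b = 0.0, 0.0
    for step in range(steps):
        c = prompts[step % len(prompts)]   # cycle through the prompt set
        g = loss_grad(w * c + b, c)        # gradient at the model's current output
        w -= lr * g * c                    # chain rule: d(theta)/dw = c
        b -= lr * g                        # chain rule: d(theta)/db = 1
    return w, b

def infer(w, b, c):
    """Amortized inference: a single forward pass, no per-prompt optimization."""
    return w * c + b

def refine_at_test_time(theta, c, steps=20, lr=0.1):
    """Optional refinement: a few gradient steps on this one prompt, starting from the amortized output."""
    for _ in range(steps):
        theta -= lr * loss_grad(theta, c)
    return theta

train_prompts = [-1.0, -0.5, 0.0, 0.5, 1.0]
unseen_prompt = 0.75                       # not in the training set

w, b = train_amortized(train_prompts)
theta_fast = infer(w, b, unseen_prompt)
theta_refined = refine_at_test_time(theta_fast, unseen_prompt)

untrained_loss = loss(0.0, unseen_prompt)  # output of the untrained model (w = b = 0)
amortized_loss = loss(theta_fast, unseen_prompt)
refined_loss = loss(theta_refined, unseen_prompt)

print("untrained:", untrained_loss)
print("amortized:", amortized_loss)
print("refined:  ", refined_loss)
```

In this toy setting the amortized output on the unseen prompt is already close to its optimum, and the refinement steps close most of the remaining gap, which mirrors the trade-off the summary describes between fast inference and optional test-time optimization.
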
**Conclusion**: LATTE3D provides a scalable approach to text-to-enhanced 3D generation, achieving high-quality results within 400 ms, with potential for further improvement through test-time optimization.