Magic3D: High-Resolution Text-to-3D Content Creation

25 Mar 2023 | Chen-Hsuan Lin*, Jun Gao*, Luming Tang*, Towaki Takikawa*, Xiaohui Zeng*, Xun Huang, Karsten Kreis, Sanja Fidler†, Ming-Yu Liu†, Tsung-Yi Lin
**Abstract:** DreamFusion has demonstrated the effectiveness of using a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving high-quality text-to-3D synthesis. However, it suffers from slow optimization and low-resolution supervision, leading to low-quality 3D models with long processing times. This paper addresses these limitations by introducing a two-stage optimization framework. The first stage uses a low-resolution diffusion prior and a sparse 3D hash grid structure to obtain a coarse model, which is then fine-tuned with a high-resolution latent diffusion model to create a textured 3D mesh model. The method, called Magic3D, can generate high-quality 3D mesh models in 40 minutes, 2 times faster than DreamFusion, while achieving higher resolution. User studies show that 61.7% of raters prefer Magic3D over DreamFusion. Magic3D also provides users with new ways to control 3D synthesis, opening up new avenues for creative applications.

**Contributions:**
- Magic3D is a framework for high-quality 3D content synthesis using text prompts, improving upon DreamFusion by using a coarse-to-fine strategy with multiple diffusion priors.
- Magic3D synthesizes 3D content with 8 times higher resolution supervision and is 2 times faster than DreamFusion.
- Magic3D significantly improves user preference (61.7%) over DreamFusion in qualitative comparisons.

**Methods:**
- **Coarse-to-fine Diffusion Priors:** Magic3D uses two different diffusion priors at coarse and fine resolutions to optimize the 3D representation.
- **Scene Models:** The coarse stage uses a neural field representation, while the fine stage uses a textured 3D mesh representation.
- **Optimization:** The coarse stage optimizes a neural field representation, and the fine stage optimizes a mesh representation using a differentiable rasterizer and high-resolution images.

**Experiments:**
- **Speed Evaluation:** Magic3D takes 40 minutes to generate high-quality 3D mesh models from text prompts.
- **Qualitative Comparisons:** Magic3D generates much higher quality 3D shapes in terms of geometry and texture compared to DreamFusion.
- **User Studies:** 61.7% of raters prefer Magic3D over DreamFusion.

**Additional Features:**
- **Personalized Text-to-3D:** Magic3D can generate 3D models of specific subjects by fine-tuning diffusion models with DreamBooth.
- **Prompt-Based** ...
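
The two-stage schedule described under Methods can be illustrated with a deliberately small toy, which is not the paper's implementation. It assumes a 2D array in place of a 3D scene and a simple quadratic pull toward a target in place of a diffusion prior. The names `toy_prior_grad`, `downsample`, `upsample`, `optimize`, and `coarse_to_fine` are hypothetical, and `factor=8` only mirrors the "8 times higher resolution supervision" figure quoted above. What the sketch shows is the control flow: optimize a cheap low-resolution representation first, then use it to initialize a high-resolution refinement.

```python
import numpy as np


def toy_prior_grad(params, target):
    """Stand-in for a diffusion prior (assumption): gradient of 0.5 * ||params - target||^2."""
    return params - target


def downsample(img, factor):
    """Average-pool a 2D array by an integer factor (sides must be divisible by it)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def upsample(img, factor):
    """Nearest-neighbour upsampling: repeat every entry factor x factor times."""
    return np.kron(img, np.ones((factor, factor)))


def optimize(params, target, steps, lr):
    """Plain gradient descent against the toy prior."""
    for _ in range(steps):
        params = params - lr * toy_prior_grad(params, target)
    return params


def coarse_to_fine(target_hi, factor=8, coarse_steps=50, fine_steps=50, lr=0.2, seed=0):
    """Two-stage schedule: fit a low-resolution model, then refine at full resolution."""
    rng = np.random.default_rng(seed)

    # Stage 1: coarse model, supervised only at low resolution.
    target_lo = downsample(target_hi, factor)
    coarse = rng.normal(size=target_lo.shape)
    coarse = optimize(coarse, target_lo, coarse_steps, lr)

    # Stage 2: initialize the fine model from the coarse result and
    # refine it with high-resolution supervision.
    fine = upsample(coarse, factor)
    fine = optimize(fine, target_hi, fine_steps, lr)
    return coarse, fine
```

Calling `coarse_to_fine(np.random.default_rng(1).uniform(size=(64, 64)))` returns an 8×8 coarse array and a 64×64 refined array. In the method summarized above, the coarse stage would instead optimize a neural field and the fine stage a textured mesh rendered with a differentiable rasterizer, each guided by its own diffusion prior.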