25 Mar 2023 | Chen-Hsuan Lin*, Jun Gao*, Luming Tang*, Towaki Takikawa*, Xiaohui Zeng*, Xun Huang, Karsten Kreis, Sanja Fidler†, Ming-Yu Liu†, Tsung-Yi Lin
Magic3D is a high-resolution text-to-3D content creation framework that significantly improves upon existing methods such as DreamFusion. It addresses two key limitations of that approach: the slow optimization of Neural Radiance Fields (NeRF), and low-resolution image supervision, both of which lead to low-quality 3D models. Magic3D employs a two-stage optimization strategy: it first uses a low-resolution diffusion prior together with a sparse 3D hash grid to accelerate optimization, then refines a textured 3D mesh with a high-resolution latent diffusion model. The method generates high-quality 3D models in 40 minutes, twice as fast as DreamFusion, while achieving higher resolution. In user studies, 61.7% of raters preferred Magic3D's results over DreamFusion's. The framework also provides new ways to control 3D synthesis, such as image-conditioned generation, opening up new creative applications.
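Both stages are driven by Score Distillation Sampling (SDS), the loss DreamFusion introduced: render the scene, add noise, and push the scene parameters in the direction the frozen diffusion model would denoise. The sketch below is a toy illustration of that update, not the authors' code: `render` and `denoiser` are dummy stand-ins, and the noise schedule is simplified.

```python
import numpy as np

rng = np.random.default_rng(0)

def render(theta):
    """Dummy 'differentiable renderer': identity map from parameters to image."""
    return theta

def denoiser(x_t, t):
    """Dummy noise predictor standing in for the frozen, text-conditioned
    diffusion model (a real model would take the prompt as input)."""
    return 0.9 * x_t

def sds_gradient(theta, t=0.5, w=1.0):
    """One SDS step: grad ~ w(t) * (eps_hat - eps) * d(render)/d(theta).
    The diffusion prior's denoising direction becomes the training signal."""
    x = render(theta)
    eps = rng.standard_normal(x.shape)
    alpha = 1.0 - t                                  # toy noise schedule
    x_t = np.sqrt(alpha) * x + np.sqrt(1 - alpha) * eps
    eps_hat = denoiser(x_t, t)
    # The identity renderer has Jacobian 1, so the chain rule is trivial here.
    return w * (eps_hat - eps)

theta = rng.standard_normal(4)          # stand-in for NeRF / mesh parameters
for _ in range(100):
    theta -= 0.1 * sds_gradient(theta)  # gradient step guided by the prior
```

In Magic3D, `theta` is first the hash-grid NeRF (stage one) and then the mesh vertices and texture field (stage two), with the diffusion prior swapped for a higher-resolution one between stages.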
Magic3D uses a coarse-to-fine optimization strategy with diffusion priors at multiple resolutions to generate both view-consistent geometry and high-resolution details. In the first stage, it optimizes a coarse neural field representation using a memory- and compute-efficient hash grid. In the second stage, it switches to a textured mesh representation, which allows the use of high-resolution diffusion priors up to 512×512. It leverages an efficient differentiable rasterizer and camera close-ups to recover high-frequency details in geometry and texture. The resulting 3D content is high-fidelity, can be imported directly into standard graphics software for visualization, and is produced at twice the speed of DreamFusion.
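The coarse stage's hash grid follows the multiresolution hash encoding of Instant NGP: each 3D point is mapped, at several grid resolutions, to a small learned feature table via a spatial hash, replacing a large MLP. A minimal numpy sketch of that lookup, with illustrative sizes (the table sizes, level count, and feature widths here are toy values, not the paper's configuration):

```python
import numpy as np

# Per-axis primes used in the Instant NGP spatial hash.
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_encode(xyz, n_levels=4, base_res=16, table_size=2**14,
                n_feats=2, seed=0):
    """Encode points xyz in [0, 1)^3 by concatenating per-level features.
    Each level snaps the point to an integer grid cell, hashes the cell
    into a small feature table, and returns the stored feature vector."""
    rng = np.random.default_rng(seed)
    tables = [rng.standard_normal((table_size, n_feats)).astype(np.float32)
              for _ in range(n_levels)]
    feats = []
    for level, table in enumerate(tables):
        res = base_res * 2 ** level                    # finer grid each level
        cell = np.floor(xyz * res).astype(np.uint64)   # integer cell coords
        h = np.bitwise_xor.reduce(cell * PRIMES, axis=-1) % table_size
        feats.append(table[h])                         # hashed table lookup
    return np.concatenate(feats, axis=-1)  # shape: (N, n_levels * n_feats)

pts = np.random.default_rng(1).random((8, 3))
enc = hash_encode(pts)  # (8, 8) with the toy defaults above
```

In the real model the tables are trainable parameters, lookups are trilinearly interpolated between neighboring cells, and a small MLP decodes the concatenated features into density and color; the sketch keeps only the hashing structure that makes the representation memory- and compute-efficient.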
The framework also extends image editing techniques to 3D object editing, providing users with creative controls over the 3D synthesis process. Magic3D enables personalized text-to-3D generation using DreamBooth for fine-tuning diffusion models with input images, and prompt-based editing for modifying 3D models with new text prompts. These capabilities enhance user control and flexibility in 3D content creation. The method demonstrates significant improvements in both quality and efficiency, making 3D content creation more accessible and creative.