22 Mar 2024 | Kyungmin Lee¹, Kihyuk Sohn², Jinwoo Shin¹
**DreamFlow: High-Quality Text-to-3D Generation by Approximating Probability Flow**
Recent advancements in text-to-3D generation have leveraged score distillation methods, which use pre-trained text-to-image (T2I) diffusion models to distill knowledge. However, this approach often results in high variance and prolonged optimization. This paper proposes a novel method, DreamFlow, that enhances text-to-3D optimization by utilizing the T2I diffusion prior in a predetermined timestep schedule. DreamFlow interprets text-to-3D optimization as a multi-view image-to-image translation problem and approximates the probability flow. By designing a practical three-stage coarse-to-fine optimization framework, DreamFlow enables fast generation of high-quality, high-resolution (1024×1024) 3D content. Experiments demonstrate that DreamFlow is 5 times faster than state-of-the-art methods while producing more photorealistic 3D content. The method is evaluated through user preference studies and quantitative comparisons, showing superior performance in terms of photorealism, 3D consistency, and prompt fidelity.
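To make the abstract's key idea concrete, the following is a minimal toy sketch of a score-distillation-style optimization loop that uses a predetermined, monotonically decreasing timestep schedule (coarse noise levels early, fine ones late), rather than sampling timesteps at random each iteration. Everything here is illustrative and hypothetical: the "score" is a stand-in for a pre-trained T2I diffusion model's denoising direction, the optimized variable is a scalar rather than a rendered 3D scene, and none of this reproduces DreamFlow's actual algorithm.

```python
import random

def toy_score(x_noisy, target):
    # Stand-in for a pre-trained diffusion model's denoising direction:
    # it points from the noisy input toward a fixed target. In the real
    # setting this would come from a frozen T2I diffusion model.
    return target - x_noisy

def optimize(theta, target, steps=100, t_max=1.0, t_min=0.02, lr=0.05):
    rng = random.Random(0)
    for i in range(steps):
        # Predetermined coarse-to-fine schedule: the noise level t
        # decreases linearly from t_max (coarse) to t_min (fine).
        t = t_max + (t_min - t_max) * i / (steps - 1)
        # Perturb the current "render" at noise level t, then step the
        # parameters along the model's denoising direction.
        x_noisy = theta + t * rng.gauss(0.0, 1.0)
        theta += lr * toy_score(x_noisy, target)
    return theta

# With a fixed seed, the toy parameter converges near the target.
final = optimize(0.0, target=1.0)
```

Because the perturbation magnitude shrinks along the schedule, late updates inject little noise, so the iterate settles close to the target; a randomly sampled timestep would instead keep re-injecting large perturbations, which is the high-variance behavior the abstract attributes to standard score distillation.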