Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

13 Jun 2024 | Zike Wu, Pan Zhou, Xuanyu Yi, Xiaoding Yuan, Hanwang Zhang
Consistent3D is a method for text-to-3D generation that addresses a key limitation of approaches based on Score Distillation Sampling (SDS): although effective, SDS suffers from geometry collapse and poor textures, which the authors attribute to the randomness inherent in stochastic differential equation (SDE) sampling. Consistent3D instead introduces a deterministic sampling prior based on ordinary differential equations (ODEs). The method estimates the desired 3D score function with a pre-trained 2D diffusion model, builds an ODE for deterministic trajectory sampling, and distills this prior into the 3D model through a Consistency Distillation Sampling (CDS) loss. The resulting guidance is predictable and consistent, yielding high-fidelity and diverse 3D objects as well as large-scale scenes.

Experimental results show that Consistent3D outperforms existing methods both qualitatively and quantitatively, demonstrating its effectiveness at generating realistic 3D content from text prompts. The method is implemented in PyTorch and trains on a single NVIDIA A100 GPU, using multi-resolution hash grids (Instant-NGP) for efficient training and rendering. The framework supports a variety of 3D representations, including 3D Gaussian Splatting, and produces high-fidelity 3D models with intricate details in a short amount of time, offering a more reliable and consistent framework for text-to-3D generation than previous approaches.
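The core idea above can be sketched in a few lines: perturb the rendered image with a *fixed* noise (the deterministic prior), take one solver step along the ODE trajectory, and penalize inconsistency between the denoised estimates at the two trajectory points. This is a minimal NumPy sketch, not the paper's implementation: the toy `denoise` function stands in for a pre-trained 2D diffusion model, and the Euler step assumes a Karras-style ODE parameterization, both of which are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise(x_t, t):
    # Stand-in for a pre-trained 2D diffusion denoiser D(x_t, t).
    # Here a toy shrinkage function; the real method would query e.g.
    # a text-conditioned diffusion model with guidance (assumption).
    return x_t / (1.0 + t ** 2)

def ode_step(x_t, t, t_next):
    # One Euler step of a probability-flow ODE in the Karras-style
    # parameterization dx/dt = (x - D(x, t)) / t (assumed form).
    d = (x_t - denoise(x_t, t)) / t
    return x_t + (t_next - t) * d

def cds_loss(x_render, eps, t, t_next):
    # CDS sketch: perturb the rendering with a FIXED noise eps
    # (deterministic, unlike SDS's fresh random noise each step),
    # advance one deterministic ODE step from t to t_next, and
    # compare the denoised estimates at the two trajectory points.
    x_t = x_render + t * eps            # deterministic perturbation
    x_next = ode_step(x_t, t, t_next)   # deterministic solver step
    target = denoise(x_next, t_next)    # treated as a fixed target
    pred = denoise(x_t, t)
    return float(np.mean((pred - target) ** 2))
```

Because `eps` is fixed rather than resampled, repeated evaluations give the same loss for the same rendering, which is the "predictable and consistent guidance" the summary refers to.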