3 May 2024 | Xuanyu Yi, Zike Wu, Qingshan Xu, Pan Zhou, Joo-Hwee Lim, Hanwang Zhang
The paper introduces DTC123, a one-image-to-3D generation pipeline that addresses the limitations of Score Distillation Sampling (SDS) by incorporating a diffusion time-step curriculum. SDS uses pre-trained 2D diffusion models as teachers to guide the reconstruction of a 3D model from a single image, but it often suffers from geometric artifacts and texture saturation because it treats all diffusion time steps uniformly. DTC123 instead follows a coarse-to-fine generation process in which larger time steps capture coarse-grained features and smaller time steps focus on fine-grained details. This approach is intended to make knowledge transfer from the teacher to the student model more effective, leading to higher-quality, multi-view-consistent 3D assets. Extensive experiments on various benchmarks demonstrate the superior performance of DTC123 in geometry quality, texture fidelity, and diversity. The paper also provides a theoretical justification for the diffusion time-step curriculum and reports robustness and ablation studies of the proposed method.
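
The coarse-to-fine idea can be illustrated with a small sketch of an annealed time-step sampler. This is not the paper's actual schedule, which the summary does not specify. It assumes a linear anneal from a large time step to a small one, with a fixed-width sampling window around the current center. The function names (`timestep_range`, `sample_timestep`) and all numeric defaults (`t_max=980`, `t_min=20`, `window=100`) are hypothetical and chosen only for illustration.

```python
import random


def timestep_range(step, total_steps, t_max=980, t_min=20, window=100):
    """Return (low, high) bounds for the diffusion time step at one iteration.

    Assumption: the window center moves linearly from t_max (coarse features)
    at the first iteration to t_min (fine details) at the last one.
    """
    # Fraction of optimization completed, clamped to [0, 1].
    progress = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    center = t_max - progress * (t_max - t_min)
    # Clip the window so it never leaves the valid [t_min, t_max] interval.
    low = max(t_min, int(center - window / 2))
    high = min(t_max, int(center + window / 2))
    return low, high


def sample_timestep(step, total_steps, rng=random):
    """Sample one integer time step from the current curriculum window."""
    low, high = timestep_range(step, total_steps)
    return rng.randint(low, high)  # randint is inclusive on both ends


def sample_timestep_uniform(t_max=980, t_min=20, rng=random):
    """Baseline for comparison: uniform sampling over the full range."""
    return rng.randint(t_min, t_max)
```

Under these assumptions, early iterations draw only large time steps and late iterations draw only small ones, whereas the uniform baseline can draw any time step at any point in the optimization. In a real SDS loop, the sampled value would be passed to the teacher diffusion model when computing the distillation gradient.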