Diffusion Time-step Curriculum for One Image to 3D Generation

The paper introduces DTC123, a pipeline that uses a diffusion time-step curriculum to generate high-fidelity 3D assets from a single image. It targets a limitation of existing Score Distillation Sampling (SDS) approaches: they treat all diffusion time-steps uniformly during optimization, which often produces geometric artifacts and texture saturation. DTC123 instead follows a curriculum that refines the 3D model from coarse to fine, with larger time-steps capturing coarse-grained geometry and smaller time-steps focusing on fine-grained details.

In the pipeline, the teacher and student models both collaborate with the time-step curriculum, which yields multi-view consistent, high-quality 3D assets. DTC123 also incorporates additional regularization techniques to improve generation efficiency and geometric robustness. The paper highlights three key contributions: an end-to-end one-image-to-3D pipeline, a plug-and-play training principle for SDS-based models, and a systematic validation of the diffusion time-step curriculum. Evaluated on benchmark datasets, the method outperforms other state-of-the-art approaches in reconstruction quality and 3D consistency.
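The coarse-to-fine idea can be illustrated with a small schedule function. The following is a minimal sketch, not the schedule used in the paper: it assumes a simple linear anneal, and the function names `curriculum_timestep` and `uniform_timestep` as well as the bounds `t_max=980` and `t_min=20` are hypothetical values chosen for illustration. It contrasts the uniform sampling that the summary attributes to standard SDS with a schedule that starts at large time-steps and moves toward small ones as optimization progresses.

```python
import random


def uniform_timestep(rng: random.Random, t_max: int = 980, t_min: int = 20) -> int:
    """Baseline: draw a time-step uniformly at random, ignoring training progress."""
    return rng.randint(t_min, t_max)


def curriculum_timestep(step: int, total_steps: int,
                        t_max: int = 980, t_min: int = 20) -> int:
    """Illustrative curriculum: anneal the time-step linearly from t_max to t_min.

    Early iterations get large time-steps (coarse geometry); late iterations
    get small time-steps (fine detail). The linear shape is an assumption.
    """
    if total_steps < 2:
        return t_min
    # Clamp progress to [0, 1] so out-of-range steps stay within the bounds.
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return round(t_max - progress * (t_max - t_min))


if __name__ == "__main__":
    total = 5
    schedule = [curriculum_timestep(i, total) for i in range(total)]
    print("curriculum:", schedule)
    rng = random.Random(0)
    print("uniform:   ", [uniform_timestep(rng) for _ in range(total)])
```

In an SDS-style loop, the value returned by `curriculum_timestep` would replace the randomly sampled time-step at each iteration. A practical schedule might also add some randomness around the annealed value, which this sketch leaves out.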

3 May 2024 | Xuanyu Yi, Zike Wu, Qingshan Xu, Pan Zhou, Joo-Hwee Lim, Hanwang Zhang