SketchDream: Sketch-based Text-to-3D Generation and Editing


14 May 2024 | FENG-LIN LIU, HONGBO FU, YU-KUN LAI, LIN GAO
SketchDream is a sketch-based, text-driven method for 3D generation and editing that creates high-quality 3D content from 2D sketches and text prompts. It tackles the ambiguity of 2D-to-3D translation and the integration of multi-modal conditions, giving users fine-grained control over both geometry and appearance. The method supports generation of new 3D models as well as editing of existing ones, preserving unedited regions and keeping interactions between edited and original components natural.

At its core is a sketch-conditioned multi-view image generation diffusion model, which uses depth guidance to establish spatial correspondence between the sketch and the 3D scene and a 3D attention module to enforce consistency across views. For local editing, a coarse-to-fine framework is proposed: the coarse stage generates initial results that label the edited regions, and the fine stage refines them into high-quality details.

SketchDream supports NeRF generation from hand-drawn sketches and free-viewpoint sketch-based local editing, including edits of real 3D models and the addition of new components. Extensive experiments and user studies, run on NVIDIA GPUs, show that it outperforms existing approaches in text faithfulness, sketch faithfulness, geometry quality, and texture quality, while offering more detailed control and a better interaction experience.
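The summary above attributes multi-view consistency to a 3D attention module inside the sketch-conditioned multi-view diffusion model. As a rough illustration only, the PyTorch sketch below shows one common way such cross-view attention is realized, by letting tokens from all views attend to each other; the class name, tensor shapes, and use of `nn.MultiheadAttention` are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossViewAttention3D(nn.Module):
    """Minimal sketch of a 3D (cross-view) attention block.

    Assumption: multi-view U-Net features of shape (B, V, N, C) are
    flattened along the view axis so every token can attend to tokens
    from all V views, a common way to encourage multi-view consistency
    in diffusion models.
    """

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, V, N, C) — B batches, V views, N spatial tokens, C channels.
        b, v, n, c = x.shape
        tokens = x.reshape(b, v * n, c)           # joint sequence over all views
        h = self.norm(tokens)
        out, _ = self.attn(h, h, h)               # every token attends to every view
        return (tokens + out).reshape(b, v, n, c)  # residual connection


if __name__ == "__main__":
    feats = torch.randn(2, 4, 256, 64)   # 2 samples, 4 views, 16x16 tokens, 64 channels
    block = CrossViewAttention3D(channels=64)
    print(block(feats).shape)            # torch.Size([2, 4, 256, 64])
```

Flattening the view axis into one token sequence is the simplest design; real systems typically interleave this with per-view self-attention and inject depth or camera information so the attention is geometry-aware, which is presumably where the paper's depth guidance enters.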
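The coarse-to-fine editing framework is described only at a high level. The toy sketch below illustrates the two-stage idea: the coarse result is used purely to label the edited region, and the fine stage then optimizes details inside that region while pinning everything else to the original. A dense RGB grid stands in for the radiance field, and a plain reconstruction target replaces the diffusion-based supervision the actual method would use; all names here are hypothetical.

```python
import torch

def coarse_to_fine_edit(original: torch.Tensor,
                        coarse_result: torch.Tensor,
                        target: torch.Tensor,
                        threshold: float = 0.1,
                        steps: int = 200,
                        lr: float = 0.05):
    """Two-stage local edit on a toy color grid (not a real NeRF)."""
    # Coarse stage: compare the coarse edit against the original to
    # label which part of the scene is actually being edited.
    edit_mask = ((coarse_result - original).abs()
                 .mean(-1, keepdim=True) > threshold).float()

    # Fine stage: optimize a copy of the field with a loss restricted to
    # the labeled region, plus a preservation term that keeps the
    # unedited region close to the original content.
    field = original.clone().requires_grad_(True)
    opt = torch.optim.Adam([field], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss_edit = (edit_mask * (field - target) ** 2).mean()
        loss_keep = ((1.0 - edit_mask) * (field - original) ** 2).mean()
        (loss_edit + 10.0 * loss_keep).backward()
        opt.step()
    return field.detach(), edit_mask


if __name__ == "__main__":
    grid = torch.rand(32, 32, 32, 3)     # toy "radiance field": RGB voxel grid
    coarse = grid.clone()
    coarse[8:16, 8:16, 8:16] += 0.5      # coarse edit changes one region
    target = coarse.clamp(0, 1)          # stand-in for fine-stage supervision
    refined, mask = coarse_to_fine_edit(grid, coarse, target)
    print(refined.shape, mask.mean().item())
```

The heavily weighted preservation term keeps the unedited region effectively frozen, mirroring the stated goal of preserving unedited content during local edits.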