Understanding ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes Generation

**ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes Generation**

**Authors:** Pengzhi Li, Chengshuai Tang, Qinxuan Huang, Zhiheng Li

**Institution:** Tsinghua Shenzhen International Graduate School, Tsinghua-Berkeley Shenzhen Institute

**Abstract:** This paper introduces ART3D, a novel framework for generating 3D artistic scenes using text descriptions. ART3D combines diffusion models and 3D Gaussian splatting techniques to bridge the gap between artistic and realistic images. The method leverages depth information and initial artistic images to generate a point cloud map, addressing domain differences. A depth consistency module enhances 3D scene consistency. Experimental results demonstrate superior performance in content and structural consistency metrics compared to existing methods, advancing the field of AI-driven art creation.

**Introduction:** The paper explores challenges in 3D artistic scene generation and proposes ART3D to address these issues. ART3D uses Stable Diffusion models and 3D Gaussian splatting to generate high-quality 3D artistic scenes from textual inputs. An image semantic transfer algorithm aligns semantic layouts between artistic and realistic images, while a depth consistency module ensures consistent depth information across multiple views.

**Method:** ART3D consists of four key components:

1. **Image Semantic Transfer:** Enhances depth information from artistic images using Stable Diffusion models.
2. **Point Cloud Map:** Updates a point cloud map by projecting depth information onto 3D space and reprojecting images.
3. **Depth Consistency Module:** Ensures consistent depth information across different views.
4. **3D Gaussian Splatting:** Renders high-quality 3D artistic scenes using optimized Gaussian splats.

**Experiments:**

- **Experiment Setup:** Details of the implementation, including the Stable Diffusion model, ZoeDepth depth estimator, and DCM training parameters.
- **Evaluation Metrics:** CLIP-I and CLIP-T scores for image similarity and textual description alignment.
- **Qualitative Results:** Demonstrates the effectiveness of ART3D in generating consistent and diverse 3D artistic scenes.
- **Quantitative Results:** Averaged results show superior performance in structural and content consistency metrics.

**Ablation Studies:**

- **Image Semantic Transfer:** Shows improved depth estimation and point cloud generation.
- **Point Cloud Map:** Enhances 3D Gaussian splatting initialization and reconstruction speed.
- **Depth Consistency Module:** Ensures coherent 3D scenes with accurate depth alignment.

**Conclusion:** ART3D advances AI-driven 3D art creation by effectively addressing domain gaps and global scene consistency, providing a novel solution for generating high-quality 3D artistic scenes from textual descriptions.
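
The Point Cloud Map component in the Method summary projects depth information into 3D space. As a rough illustration only, the sketch below back-projects a per-pixel depth map through a pinhole camera model. The function name `backproject_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the choice of a pinhole model are all assumptions made for this example; the summary does not state which camera model or implementation ART3D uses.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def backproject_depth(depth: List[List[float]], fx: float, fy: float,
                      cx: float, cy: float) -> List[Point]:
    """Lift each pixel (u, v) with depth z to a camera-space 3D point.

    Assumes a pinhole camera with hypothetical intrinsics fx, fy, cx, cy:
        x = (u - cx) * z / fx,   y = (v - cy) * z / fy
    """
    points: List[Point] = []
    for v, row in enumerate(depth):       # v indexes image rows
        for u, z in enumerate(row):       # u indexes image columns
            if z <= 0.0:                  # skip pixels without valid depth
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Toy 2x2 depth map; one pixel has no valid depth and is dropped.
cloud = backproject_depth([[1.0, 2.0], [0.0, 4.0]],
                          fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

In a pipeline of the kind described, points like these would typically be transformed into a shared world frame and could serve as an initialization for the Gaussian splats; how ART3D does this in detail is not covered by the summary.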
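
The Evaluation Metrics item describes CLIP-I and CLIP-T as scores for image similarity and text alignment. Scores of this kind are commonly computed as the cosine similarity between embedding vectors, so the sketch below assumes that form and assumes the embeddings have already been produced by an encoder. The function `cosine_similarity` and the toy vectors are illustrative and are not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical, precomputed embeddings (real ones have hundreds of dimensions).
rendered_view_emb = [3.0, 4.0]
reference_image_emb = [4.0, 3.0]
prompt_text_emb = [3.0, 4.0]

# Image-to-image similarity (CLIP-I style) and image-to-text similarity (CLIP-T style).
clip_i_like = cosine_similarity(rendered_view_emb, reference_image_emb)
clip_t_like = cosine_similarity(rendered_view_emb, prompt_text_emb)
```

Under this assumption, averaging such scores over many rendered views gives a single number per method, which matches the summary's mention of averaged quantitative results.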

ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes Generation

17 May 2024 | Pengzhi Li¹, Chengshuai Tang¹, Qinxuan Huang², Zhiheng Li¹†