7 Jun 2024 | Jie Deng, Wenhao Chai, Junsheng Huang, Zhonghan Zhao, Mingyan Gao, Qixuan Huang, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, Xi Li, Gaoang Wang
CityCraft is a framework for generating realistic 3D city scenes. It integrates a diffusion transformer (DiT) for 2D city layout generation, a large language model (LLM) for strategic urban planning, and Blender for precise asset placement and scene construction. User prompts and language guidelines steer the generation, which improves the diversity and quality of the resulting urban scenes. Compared with traditional methods, this planning is explainable, logical, and controllable, and it yields more adaptable, efficient, and visually appealing cityscapes. An infinite expansion feature allows the framework to generate large-scale city layouts.

CityCraft introduces two new datasets: CityCraft-OSM, which includes 2D semantic layouts, satellite images, and detailed annotations, and CityCraft-Buildings, which features thousands of high-quality 3D building assets.

The framework achieves state-of-the-art performance in generating realistic 3D cities. In both layout and scene generation it obtains lower Fréchet Inception Distance (FID) and Kernel Inception Distance (KID) scores than other methods, and its generated scenes show greater architectural diversity and realism. It is also evaluated on depth error and camera error, and it reports no errors in depth or camera placement. Its user preference score is high, indicating user satisfaction with the realism and technical accuracy of the generated scenes. Tests across various user settings indicate that it is robust across different scenes. Future work is expected to add dynamic elements such as moving traffic and to integrate real-time user feedback mechanisms for interactive scene customization.
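The summary cites FID without defining it. As a rough illustration, and not the authors' evaluation code, the Fréchet distance between two feature sets can be sketched as below: a Gaussian is fitted to each set and the distance is the squared difference of the means plus a covariance term. The function name `frechet_distance` and the random stand-in features are hypothetical; a real FID computation would use activations from an Inception network on real and generated images.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs (up to numerical noise, hence the clip).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

# Hypothetical stand-in features; real FID uses Inception activations.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
shifted = real + 1.0  # same covariance, mean moved by 1 in each of 8 dims

fid_same = frechet_distance(real, real)
fid_shifted = frechet_distance(real, shifted)
```

Under these assumptions, comparing a set with itself gives a distance of approximately 0, while the shifted copy gives approximately 8, because only the mean term contributes (a shift of 1 in each of 8 dimensions). Lower values therefore mean the generated distribution sits closer to the reference one, which is the sense in which the summary reports "lower FID".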