17 Mar 2024 | Jumin Lee, Sebin Lee, Changho Jo, Woobin Im, Juhyeong Seon, Sung-Eui Yoon
**SemCity: Semantic Scene Generation with Triplane Diffusion**
**Abstract:**
We introduce SemCity, a 3D diffusion model designed for generating semantic scenes in real-world outdoor environments. Unlike existing models that focus on synthetic indoor or outdoor scenes, SemCity addresses the challenge of generating scenes from real-world outdoor datasets, which often contain more empty spaces due to sensor limitations. To tackle this, we employ a triplane representation, which factorizes 3D data into three orthogonal 2D planes, effectively capturing the vastness of outdoor environments while reducing unnecessary information. Our model learns to generate novel triplanes, which are then used to reconstruct 3D scenes. We extend SemCity to various practical tasks, including scene inpainting, outpainting, and semantic scene completion refinement. Experimental results on the SemanticKITTI dataset demonstrate the effectiveness of our method in generating detailed and coherent 3D scenes, outperforming existing methods in terms of fidelity and diversity. Our code is available at <https://github.com/zoomin-lee/SemCity>.
**Contributions:**
- We introduce the use of a triplane representation for generating real-world outdoor scenes.
- We propose a triplane manipulation method to extend our model to practical tasks such as scene inpainting, outpainting, and semantic scene completion refinement.
- Our method significantly enhances the quality of generated scenes in real-world outdoor environments.
**Related Work:**
- Diffusion models have shown promise in generating realistic 3D scenes, but most work focuses on single objects or synthetic datasets.
- Prior outdoor scene generation models struggle with the complexity and sparsity of real-world data.
**Method:**
- **Triplane Representation:** An autoencoder compresses each 3D semantic scene into a triplane, i.e., three orthogonal 2D feature planes; a shared decoder maps sampled triplane features back to per-voxel semantics (see the first sketch after this list).
- **Triplane Diffusion Model:** We train a diffusion model on the learned triplanes; at inference time, it samples novel triplanes that the decoder turns into complete 3D scenes.
- **Applications:** By manipulating triplanes during the denoising process, we extend the model to scene inpainting, outpainting, and semantic scene completion refinement, demonstrating its versatility (see the inpainting sketch below).
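To make the triplane factorization concrete, below is a minimal PyTorch sketch of decoding a 3D query point from three axis-aligned feature planes. The class name, channel counts, resolution, MLP sizes, and the sum aggregation are illustrative assumptions for this sketch, not the authors' exact implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriplaneDecoder(nn.Module):
    """Illustrative triplane decoder: three orthogonal 2D feature planes
    are sampled at a 3D query point and fused by a small MLP that predicts
    per-point semantic logits. Shapes and layer sizes are assumptions."""

    def __init__(self, channels=32, resolution=128, num_classes=20):
        super().__init__()
        # Learnable feature planes for the xy, xz, and yz slices.
        self.planes = nn.Parameter(
            torch.randn(3, channels, resolution, resolution) * 0.01)
        self.mlp = nn.Sequential(
            nn.Linear(channels, 128), nn.ReLU(),
            nn.Linear(128, num_classes))

    def sample_plane(self, plane, coords2d):
        # coords2d: (N, 2) in [-1, 1]; grid_sample expects (B, H, W, 2).
        grid = coords2d.view(1, 1, -1, 2)
        feat = F.grid_sample(plane.unsqueeze(0), grid,
                             align_corners=True)       # (1, C, 1, N)
        return feat.squeeze(0).squeeze(1).t()          # (N, C)

    def forward(self, xyz):
        # xyz: (N, 3) query points normalized to [-1, 1].
        f_xy = self.sample_plane(self.planes[0], xyz[:, [0, 1]])
        f_xz = self.sample_plane(self.planes[1], xyz[:, [0, 2]])
        f_yz = self.sample_plane(self.planes[2], xyz[:, [1, 2]])
        # Aggregate the three plane features element-wise, then decode.
        return self.mlp(f_xy + f_xz + f_yz)            # (N, num_classes)

# Usage: query semantic logits for a batch of points.
decoder = TriplaneDecoder()
points = torch.rand(1024, 3) * 2 - 1                   # (N, 3) in [-1, 1]
logits = decoder(points)                               # (1024, 20)
```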
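For inpainting and outpainting, one common recipe that triplane manipulation can take is mask-guided blending during sampling: the known region of an existing triplane is re-injected at each denoising step so the model only resynthesizes the masked part. The sketch below assumes this RePaint-style blending; `denoise_step` and `q_sample` stand in for a standard DDPM reverse step and forward-noising operator and are assumed interfaces, not the repository's actual API.

```python
import torch

def inpaint_triplane(model, known_triplane, mask, timesteps,
                     denoise_step, q_sample):
    """Sketch of mask-guided triplane inpainting (RePaint-style blending).
    known_triplane: (3, C, H, W) triplane of the existing scene.
    mask:           (3, 1, H, W), 1 where the scene should be kept.
    denoise_step(model, x, t) -> x_{t-1}  (assumed DDPM reverse step)
    q_sample(x0, t) -> noisy x_t          (assumed forward noising)"""
    x = torch.randn_like(known_triplane)        # start from pure noise
    for t in reversed(range(timesteps)):
        x = denoise_step(model, x, t)           # resynthesize everything
        if t > 0:
            # Overwrite the known region with a correspondingly-noised
            # copy of the original triplane so it matches noise level t-1.
            known_noisy = q_sample(known_triplane, t - 1)
            x = mask * known_noisy + (1 - mask) * x
        else:
            x = mask * known_triplane + (1 - mask) * x
    return x                                    # blended clean triplane
```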
**Experiments:**
- **Dataset:** We validate our method on the SemanticKITTI and CarlaSC datasets.
- **Evaluation Metrics:** We evaluate fidelity and diversity with precision and recall, Inception Score (IS), Fréchet Inception Distance (FID), and Kernel Inception Distance (KID); an FID sketch follows this list.
- **Results:** Our method outperforms existing methods in generating detailed and coherent 3D scenes, with improvements in both fidelity and diversity.
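As a reference for the fidelity metrics, FID compares Gaussian fits to real and generated feature distributions. Below is a minimal NumPy/SciPy sketch of the standard formula; the feature extractor itself (e.g., a network pretrained on semantic scenes) is assumed and out of scope here.

```python
import numpy as np
from scipy import linalg

def fid(feats_real, feats_gen):
    """Fréchet distance between two feature sets of shape (N, D):
    ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^{1/2})."""
    mu_r, mu_g = feats_real.mean(0), feats_gen.mean(0)
    c_r = np.cov(feats_real, rowvar=False)
    c_g = np.cov(feats_gen, rowvar=False)
    covmean = linalg.sqrtm(c_r @ c_g)
    if np.iscomplexobj(covmean):        # drop numerical imaginary residue
        covmean = covmean.real
    diff = mu_r - mu_g
    return diff @ diff + np.trace(c_r + c_g - 2 * covmean)

# Toy usage with random features; real use would extract features from
# real vs. generated semantic scenes with a pretrained network.
rng = np.random.default_rng(0)
print(fid(rng.normal(size=(500, 64)),
          rng.normal(0.5, 1.0, size=(500, 64))))
```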
**Conclusion:**
SemCity is a novel 3D diffusion model for generating semantic scenes in real-world outdoor environments. By leveraging a triplane representation, our model effectively addresses the challenges of generating complex and realistic scenes from sparse and incomplete data. Our method demonstrates significant progress in scene generation and has potential applications in various downstream tasks.