**WonderWorld: Interactive 3D Scene Generation from a Single Image**
**Date:** 14 Jun 2024
**Authors:** Hong-Xing Yu, Haoyi Duan, Charles Herrmann, William T. Freeman, Jiajun Wu
**Institution:** Stanford University, MIT
**Abstract:**
WonderWorld is a novel framework for interactive 3D scene extrapolation that lets users explore and shape virtual environments starting from a single input image and user-specified text. Unlike existing approaches, which run offline and take from tens of minutes to hours to produce a scene, WonderWorld sharply reduces computation by combining Fast Gaussian Surfels with a guided diffusion-based depth estimation method. It generates connected, diverse 3D scenes in under 10 seconds each on a single A6000 GPU, enabling real-time user interaction and exploration. By improving generation speed and reducing geometric distortion, the approach addresses the main limitations of prior methods and suits applications in virtual reality, gaming, and creative design.
**Key Contributions:**
1. **Fast Gaussian Surfels (FGS):** A lightweight 3D scene representation with a principled geometry-based initialization that cuts per-scene optimization time to under 1 second (see the sketch after this list).
2. **Layer-wise Scene Generation:** A strategy that parses the scene's geometric layer structure and generates content to fill disocclusion holes.
3. **Guided Depth Diffusion:** A method to ensure geometric consistency between extrapolated and existing scenes by conditioning depth estimation on observed depth and new scene geometry.
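As referenced in contribution 1, here is a minimal sketch of what a geometry-based surfel initialization could look like, assuming a pinhole camera and one surfel per pixel whose scale covers that pixel's footprint at its depth. The function name `init_surfels` and the footprint rule `depth / fx` are illustrative assumptions, not the paper's exact procedure:

```python
# Hedged sketch: initialize surfel centers, normals, and scales directly from
# a depth map, so downstream optimization starts near a good solution.
import numpy as np

def init_surfels(depth, K):
    """depth: (H, W) metric depth map. K: (3, 3) pinhole intrinsics.

    Returns centers (H*W, 3), normals (H*W, 3), scales (H*W,).
    """
    H, W = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    # Back-project every pixel to a 3D point in camera coordinates.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    pts = np.stack([x, y, depth], axis=-1)            # (H, W, 3)

    # Estimate per-pixel normals from the cross product of local tangents.
    dx = np.gradient(pts, axis=1)
    dy = np.gradient(pts, axis=0)
    n = np.cross(dx, dy)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8

    # A surfel should roughly cover one pixel's footprint: depth / focal length.
    scales = depth / fx

    return pts.reshape(-1, 3), n.reshape(-1, 3), scales.reshape(-1)
```

Because every parameter here comes straight from the depth map, subsequent optimization only needs light refinement, which is one plausible way sub-second fitting becomes feasible.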
**Methods:**
- **Fast Gaussian Surfels:** Introduces a lightweight version of 3D Gaussian Splatting (3DGS) with principled geometry-based initialization.
- **Layer-wise Scene Generation:** Parses the scene into geometric layers and fills disocclusion holes by generating content for each layer (a minimal inpainting sketch follows this list).
- **Guided Depth Diffusion:** Uses a diffusion model to sample from a depth distribution conditioned on observed depth, keeping extrapolated geometry consistent with the existing scene (see the guided-sampling sketch below).
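For the layer-wise generation step, one way the hole filling could be wired up: each layer contributes a disocclusion mask, and an off-the-shelf inpainting model fills the masks back to front. Stable Diffusion inpainting from the `diffusers` library is a stand-in here; the paper's actual generator and layer-parsing logic are not reproduced:

```python
# Hedged sketch of layer-wise hole filling with an off-the-shelf inpainter.
import torch
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def fill_layers(image, layer_masks, prompt):
    """Inpaint disocclusion holes layer by layer, back to front.

    image: PIL.Image of the current scene rendering.
    layer_masks: list of PIL.Image masks (white = hole), ordered back to front.
    """
    for mask in layer_masks:
        image = pipe(prompt=prompt, image=image, mask_image=mask).images[0]
    return image
```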
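For guided depth diffusion, a sketch of one plausible conditioning rule, in the style of RePaint: at every denoising step, the observed depth is re-noised to the current timestep and pasted into the known region, so the unknown region is denoised consistently with existing geometry. The `model` and `scheduler` below follow `diffusers` conventions and are placeholders, not the paper's exact formulation:

```python
# Hedged sketch of depth sampling conditioned on observed depth.
import torch

@torch.no_grad()
def guided_depth_sample(model, scheduler, d_obs, mask, num_steps=50):
    """Sample a depth map that matches d_obs wherever mask == 1.

    model: noise predictor eps = model(x, t) (placeholder).
    scheduler: diffusers-style scheduler with set_timesteps / step / add_noise.
    """
    scheduler.set_timesteps(num_steps)
    x = torch.randn_like(d_obs)                       # start from pure noise
    for t in scheduler.timesteps:
        # One reverse-diffusion step on the current depth estimate.
        eps = model(x, t)
        x = scheduler.step(eps, t, x).prev_sample
        # Guidance: re-noise the observed depth to the current noise level and
        # paste it into the known region, so the unknown region is denoised
        # conditionally on the existing scene's depth.
        x_known = scheduler.add_noise(d_obs, torch.randn_like(d_obs), t)
        x = mask * x_known + (1 - mask) * x
    # End exactly on the observed depth in the known region.
    return mask * d_obs + (1 - mask) * x
```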
**Results:**
- **Qualitative Examples:** Demonstrates diverse and coherent 3D scenes generated from a single input image.
- **Generation Speed:** Reports scene generation times significantly faster than existing methods, enabling interactive exploration.
**Conclusion:**
WonderWorld represents a significant advance in interactive 3D scene generation, offering real-time interaction and high-quality, diverse scenes. The code will be open-sourced to support reproducibility.