**DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting**
**Authors:** Shijie Zhou, Zhiwen Fan, Dejia Xu, Haoran Chang, Pradyumna Chari, Tejas Bharadwaj, Suya You, Zhangyang Wang, and Achuta Kadambi
**Institutional Affiliations:** University of California, Los Angeles (UCLA), University of Texas at Austin (UT Austin), and DEVCOM Army Research Laboratory
**Project Website:** <http://dreamscene360.github.io/>
**Abstract:**
DreamScene360 is a framework for generating high-quality, immersive 3D scenes with complete 360° coverage from text inputs. The method leverages a 2D diffusion model to generate a panoramic image, which is iteratively improved through a self-refinement process that uses GPT-4V to strengthen visual quality and text-image alignment. The panorama serves as a preliminary "flat" scene representation and is then lifted into 3D Gaussians via splatting. To build a spatially coherent structure, monocular depth is aligned into a globally optimized point cloud, which initializes the centroids of the 3D Gaussians. Semantic and geometric constraints are imposed on both synthesized and input camera views to mitigate the ambiguities of a single-view input and enforce consistent 3D geometry. The result is a globally consistent 3D scene that offers a more immersive experience than existing techniques.
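For intuition, the sketch below shows one plausible way to lift an equirectangular panorama and its monocular depth map into a point cloud that could initialize the Gaussian centroids. The coordinate conventions and the assumption that depth is already globally aligned are illustrative choices, not details taken from the paper.

```python
import numpy as np

def panorama_to_point_cloud(rgb, depth):
    """Back-project an equirectangular panorama into a 3D point cloud.

    rgb   : (H, W, 3) panorama colors in [0, 1]
    depth : (H, W) per-pixel depth, assumed already globally aligned

    Returns (H*W, 3) points and (H*W, 3) colors, which could serve as the
    initial centroids and colors of the 3D Gaussians.
    """
    H, W = depth.shape
    # Spherical coordinates: longitude in [-pi, pi), latitude in [-pi/2, pi/2].
    lon = (np.arange(W) + 0.5) / W * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(H) + 0.5) / H * np.pi
    lon, lat = np.meshgrid(lon, lat)

    # Unit ray directions on the sphere, scaled by per-pixel depth.
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    dirs = np.stack([x, y, z], axis=-1)     # (H, W, 3)
    points = dirs * depth[..., None]        # (H, W, 3)

    return points.reshape(-1, 3), rgb.reshape(-1, 3)
```

The resulting points and colors would then be handed to a 3D Gaussian Splatting optimizer as its initial state.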
**Introduction:**
The increasing demand for virtual reality applications has highlighted the need for efficient and high-quality 3D scene generation. DreamScene360 addresses the challenges of creating holistic 360° scenes by utilizing a 2D diffusion model to generate panoramic images and a self-refinement process to enhance the quality and alignment of the generated scenes. The method initializes 3D Gaussians with monocular depth information and optimizes them using semantic and geometric constraints, ensuring consistent and detailed 3D representations. This framework enables the creation of immersive and realistic 3D environments from simple text prompts, reducing the reliance on manual effort.
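The combination of photometric, semantic, and geometric terms described above can be pictured with a minimal PyTorch sketch. The specific loss forms (L1 photometric reconstruction, cosine feature distance, Pearson-style depth correlation), the frozen feature encoder, and the weights `w_sem`/`w_geo` are assumptions for illustration rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def total_loss(render, target, feat_render, feat_target,
               depth_render, depth_mono, w_sem=0.05, w_geo=0.05):
    """Illustrative combined objective for optimizing the 3D Gaussians.

    feat_* are (B, D) features from a frozen 2D encoder applied to rendered
    and reference views; depth_mono is monocular depth predicted for the
    rendered view. Weights are placeholders, not values from the paper.
    """
    # Photometric reconstruction on the input (panoramic) view.
    l_photo = F.l1_loss(render, target)

    # Semantic constraint: cosine distance between encoder features.
    l_sem = 1.0 - F.cosine_similarity(feat_render, feat_target, dim=-1).mean()

    # Geometric constraint: encourage rendered depth to correlate with the
    # monocular depth prior (scale/shift invariant via normalized correlation).
    d_r = depth_render.flatten()
    d_m = depth_mono.flatten()
    d_r = (d_r - d_r.mean()) / (d_r.std() + 1e-6)
    d_m = (d_m - d_m.mean()) / (d_m.std() + 1e-6)
    l_geo = 1.0 - (d_r * d_m).mean()

    return l_photo + w_sem * l_sem + w_geo * l_geo
```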
**Related Works:**
The paper reviews prior work on 2D asset generation and text-to-3D scene generation, highlighting the limitations of earlier methods in handling unconstrained 360° scenes. DreamScene360 is compared against recent approaches such as LucidDreamer and demonstrates superior global consistency and visual quality.
**Experiments:**
The experiments evaluate DreamScene360 using several metrics, including CLIP embedding distance, no-reference image quality assessment, and perceptual quality. The results show that DreamScene360 generates diverse, high-fidelity 3D scenes with complete 360° coverage, outperforming baseline methods in novel-view rendering quality and the realism of the recovered scene geometry.
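As an illustration of the first metric, the snippet below computes a CLIP embedding distance between rendered novel views and the input prompt using the Hugging Face `transformers` CLIP model; the checkpoint choice and averaging scheme are assumptions, not the authors' evaluation script.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

def clip_distance(images, prompt, name="openai/clip-vit-base-patch32"):
    """Mean CLIP embedding distance between rendered views and the text prompt.

    `images` is a list of PIL images rendered from novel views; lower is better.
    Illustrative re-implementation of the metric, not the authors' code.
    """
    model = CLIPModel.from_pretrained(name).eval()
    processor = CLIPProcessor.from_pretrained(name)

    with torch.no_grad():
        inputs = processor(text=[prompt], images=images,
                           return_tensors="pt", padding=True)
        out = model(**inputs)
        img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
        txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)

    # Cosine distance averaged over the rendered views.
    return (1.0 - img @ txt.T).mean().item()
```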