GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting


2024 | Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong, Jinlin He, Zhiwei Lin, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang
GALA3D is a text-to-3D generation framework that creates high-quality, complex 3D scenes via layout-guided generative Gaussian splatting. It first prompts large language models (LLMs) to produce coarse scene layouts, then optimizes a layout-guided 3D Gaussian representation to model scenes with accurate geometry, texture, and object interactions. Adaptive geometry control shapes the distribution of the Gaussians within each layout region for high-quality geometry, and an instance-scene compositional optimization mechanism with diffusion priors enforces semantic and spatial consistency among the multiple objects while producing a realistic overall scene. Because layouts generated by LLMs are often misaligned with the generated scene, a layout refinement module iteratively optimizes the spatial position and scale of each layout to align it with the scene content.

The main contributions are: (1) a scene-level text-to-3D framework based on generative 3D Gaussian splatting; (2) bridging text descriptions and compositional scene generation through layout priors obtained from LLMs together with a layout refinement module; (3) a layout-guided Gaussian representation with adaptive geometry control for modeling complex 3D scenes; and (4) a user-friendly, end-to-end framework for high-quality complex 3D content generation and conversational, controllable editing. Experiments show that GALA3D achieves impressive results on compositional text-to-3D scene generation while preserving high fidelity for the object-level entities within the scene.
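To make the layout-guided representation concrete: each object comes with a text prompt and a bounding box from the LLM, and that object's Gaussians are kept inside its box. The sketch below is illustrative only; the field names (`prompt`, `center`, `size`) and the simple hinge penalty are assumptions standing in for the paper's actual layout encoding and adaptive geometry control, which also shapes Gaussian covariances.

```python
import numpy as np

# Hypothetical layout prior: each object is a text prompt plus an
# axis-aligned box (center, size). Field names are illustrative.
layout = [
    {"prompt": "a wooden table", "center": np.array([0.0, 0.0, 0.0]),
     "size": np.array([1.2, 0.8, 0.7])},
    {"prompt": "a vase of flowers", "center": np.array([0.0, 0.0, 0.9]),
     "size": np.array([0.3, 0.3, 0.5])},
]

def layout_constraint_loss(means, center, size):
    """Penalize Gaussian centers that drift outside their layout box.

    A per-axis hinge on the overshoot beyond the box half-extent;
    zero for Gaussians already inside the box.
    """
    half = size / 2.0
    overshoot = np.abs(means - center) - half
    return np.maximum(overshoot, 0.0).sum()

# Two Gaussians for the first object: one inside the box, one outside.
means = np.array([[0.1, 0.1, 0.0], [1.0, 0.0, 0.0]])
loss = layout_constraint_loss(means, layout[0]["center"], layout[0]["size"])
# Only the second Gaussian contributes: it exceeds the x half-extent
# (0.6) by 0.4, so the loss is 0.4.
```

During optimization, a term like this would be added to the diffusion-guided objective so that each object's Gaussians stay within their assigned region of the scene.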
GALA3D is evaluated against state-of-the-art approaches, including NeRF-based, voxel-based, and 3DGS-based methods, as well as compositional NeRF-based generation with layouts. The results show that it excels at generating complex 3D scenes with multiple interacting objects, achieving outstanding texture and geometry, and that it supports interactive, controllable scene editing within an efficient and user-friendly framework.
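The layout refinement step can be pictured as an iterative alignment between each LLM-proposed box and the Gaussians generated for that object. The update rule below is an illustrative stand-in, not the paper's actual optimization: it nudges the box center toward the centroid of the object's Gaussians and grows the box to cover their spread.

```python
import numpy as np

def refine_layout(center, size, means, lr=0.5, iters=10):
    """Illustrative layout refinement: align a box with its Gaussians.

    Moves the box center toward the centroid of the Gaussian means and
    expands the box size to cover their extent. The real module jointly
    optimizes position and scale against the generated scene.
    """
    for _ in range(iters):
        target_center = means.mean(axis=0)
        target_size = 2.0 * np.abs(means - target_center).max(axis=0)
        center = center + lr * (target_center - center)
        size = size + lr * (np.maximum(target_size, size) - size)
    return center, size

# A box initialized at the origin, refined toward two Gaussians whose
# centroid is at (2, 1, 1) and whose x-extent spans 2 units.
center, size = refine_layout(
    center=np.array([0.0, 0.0, 0.0]),
    size=np.array([1.0, 1.0, 1.0]),
    means=np.array([[1.0, 1.0, 1.0], [3.0, 1.0, 1.0]]),
)
```

Iterating an update of this kind is what lets the layout priors and the generated scene converge toward mutual agreement rather than forcing the scene to match a possibly flawed initial layout.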