Open-Universe Indoor Scene Generation using LLM Program Synthesis and Uncurated Object Databases


5 Feb 2024 | Rio Aguina-Kang*, UC San Diego, USA; Maxim Gumin*, Brown University, USA; Do Heon Han*, Brown University, USA; Stewart Morris*, Brown University, USA; Seung Jean Yoo*, Brown University, USA; Aditya Ganeshan, Brown University, USA; R. Kenny Jones, Brown University, USA; Qiuhong Anna Wei, Brown University, USA; Kailiang Fu, Dymaxion, LLC, USA; Daniel Ritchie, Brown University, USA
The paper presents a system for generating 3D indoor scenes in response to open-ended text prompts, without being restricted to a fixed set of room types or object categories. Unlike previous methods that require large datasets of existing 3D scenes, this system leverages pre-trained large language models (LLMs) to synthesize programs in a domain-specific layout language, which describe objects and their spatial relations. These programs are then executed using a gradient-based optimization scheme to produce object positions and orientations. The system also uses vision-language models (VLMs) to retrieve 3D meshes from uncurated, inconsistently aligned databases, ensuring high-quality object geometry. Experimental results show that the system outperforms both closed-universe scene generation methods and a recent LLM-based layout generation method in generating diverse and realistic indoor scenes. The contributions of the paper include a declarative domain-specific language for specifying indoor scene layouts, a robust prompting workflow leveraging LLMs, and a pipeline for retrieving and orienting 3D meshes from large, uncurated databases.
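To make the core step concrete, the sketch below illustrates what "executing a declarative layout program with gradient-based optimization" can look like in practice. It is a minimal sketch, assuming PyTorch: the object names, the "near" relation, and all loss forms are hypothetical stand-ins for the paper's domain-specific language, not its actual implementation.

```python
# Minimal sketch, assuming PyTorch. A tiny declarative "layout program"
# (objects plus spatial relations, of the kind an LLM might emit) is
# relaxed into differentiable penalties and minimized with Adam.
# All names and loss forms are illustrative assumptions.
import torch

torch.manual_seed(0)

# Each object: fixed half-extents (half-width, half-depth) and an
# optimizable 2D position. Orientation is omitted for brevity.
objects = {
    "bed":        {"half": torch.tensor([1.0, 1.1])},
    "nightstand": {"half": torch.tensor([0.3, 0.3])},
}
for obj in objects.values():
    obj["pos"] = (torch.randn(2) * 0.5).requires_grad_()

# Declarative spatial relations: ("near", a, b, target_gap_in_meters).
relations = [("near", "nightstand", "bed", 0.4)]

ROOM_HALF = 2.5  # half-size of an assumed square room

def layout_loss():
    total = torch.tensor(0.0)
    # Relation terms: penalize deviation from the target separation.
    for kind, a, b, gap in relations:
        if kind == "near":
            dist = torch.norm(objects[a]["pos"] - objects[b]["pos"])
            total = total + (dist - gap) ** 2
    # Non-overlap: penalize the overlap area of axis-aligned boxes.
    names = list(objects)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a, b = objects[names[i]], objects[names[j]]
            pen = (a["half"] + b["half"]) - torch.abs(a["pos"] - b["pos"])
            total = total + torch.clamp(pen, min=0.0).prod()
    # Containment: keep every box inside the room.
    for obj in objects.values():
        outside = torch.abs(obj["pos"]) + obj["half"] - ROOM_HALF
        total = total + torch.clamp(outside, min=0.0).sum()
    return total

params = [obj["pos"] for obj in objects.values()]
optimizer = torch.optim.Adam(params, lr=0.05)
for _ in range(500):
    optimizer.zero_grad()
    loss = layout_loss()
    loss.backward()
    optimizer.step()

for name, obj in objects.items():
    print(f"{name}: pos={obj['pos'].detach().tolist()}")
```

The design point this sketch captures is that each symbolic relation becomes a smooth penalty, so the whole program can be solved jointly by a generic optimizer; the paper's executor additionally optimizes orientations and uses its own relation vocabulary and loss definitions.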