3 Apr 2024 | Ata Çelen¹, Guo Han¹, Konrad Schindler¹, Luc Van Gool¹, Iro Armeni²*, Anton Obukhov¹*, and Xi Wang¹*
I-Design is a personalized interior designer that allows users to generate and visualize their design goals through natural language communication. It uses a team of large language model (LLM) agents to engage in dialogues and logical reasoning, transforming textual user input into feasible scene graph designs with relative object relationships. An effective placement algorithm then determines optimal locations for each object within the scene. The final design is constructed in 3D by retrieving and integrating assets from an existing object database. Additionally, a new evaluation protocol utilizing a vision-language model complements the design pipeline. Extensive experiments show that I-Design outperforms existing methods in delivering high-quality 3D design solutions that align with abstract concepts in user input.
The system addresses the challenges of 3D Indoor Scene Synthesis (3DISS) by using multiple LLM agents to interpret abstract input, identify objects to incorporate into the scene, and determine plausible spatial relationships. It employs scene graphs as a high-level abstraction of objects and their relationships, which can be creatively developed with LLMs, refined through rule-based feedback, and visualized. The system also provides an interpretable pipeline, enabling iterative design without redoing the entire process.
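To make the scene graph abstraction concrete, here is a minimal sketch of a graph of objects and relative relationships with one rule-based consistency check. The relation names (`left_of`, `on`, and so on), the `SceneGraph` class, and the specific rules are illustrative assumptions; the summary above does not specify I-Design's actual schema or feedback rules.

```python
from dataclasses import dataclass, field

# Hypothetical relation vocabulary: pairs that cannot both hold between
# the same subject and target. The real I-Design vocabulary may differ.
OPPOSITES = {
    "left_of": "right_of", "right_of": "left_of",
    "in_front_of": "behind", "behind": "in_front_of",
}


@dataclass
class SceneGraph:
    objects: set = field(default_factory=set)
    # Each edge reads "subject <relation> target", e.g. ("lamp", "on", "desk").
    edges: list = field(default_factory=list)

    def add_object(self, name: str) -> None:
        self.objects.add(name)

    def relate(self, subject: str, relation: str, target: str) -> None:
        self.edges.append((subject, relation, target))

    def check(self) -> list:
        """Return human-readable rule violations (empty list if none)."""
        problems = []
        seen = set(self.edges)
        for subject, relation, target in self.edges:
            # Rule 1: every edge must connect objects that are in the scene.
            if subject not in self.objects or target not in self.objects:
                problems.append(
                    f"unknown object in edge ({subject}, {relation}, {target})")
            # Rule 2: an object cannot hold opposite relations to one target.
            opposite = OPPOSITES.get(relation)
            if opposite and (subject, opposite, target) in seen:
                problems.append(
                    f"{subject} is both {relation} and {opposite} {target}")
        return problems


graph = SceneGraph()
for name in ("bed", "nightstand", "lamp"):
    graph.add_object(name)
graph.relate("nightstand", "left_of", "bed")
graph.relate("lamp", "on", "nightstand")
print(graph.check())  # an empty list means no rule was violated
```

In a pipeline of this kind, the violation messages could be fed back to the LLM agents as text so that they revise the graph, which is one plausible reading of "refined through rule-based feedback".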
I-Design starts with unstructured textual user input and transforms it into a viable design proposal represented as a scene graph through querying LLM agents. It then solves for absolute object placement in the scene graph using a backtracking algorithm, retrieves 3D assets according to functional and stylistic specifications, and composes the final result in 3D. A novel evaluation protocol based on a vision-language model is proposed to evaluate the design pipeline.
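The placement step can be illustrated with a small backtracking search. This is a sketch under simplifying assumptions, not the paper's algorithm: the room is an integer grid, objects are axis-aligned rectangles without rotation, and the only relation handled is a hypothetical "adjacent" constraint. The function names `overlaps`, `touches`, and `solve` are invented for the example.

```python
from itertools import product


def overlaps(a, b):
    """True if rectangles (x, y, width, depth) share interior area."""
    ax, ay, aw, ad = a
    bx, by, bw, bd = b
    return ax < bx + bw and bx < ax + aw and ay < by + bd and by < ay + ad


def touches(a, b):
    """True if the rectangles share part of an edge without overlapping."""
    ax, ay, aw, ad = a
    bx, by, bw, bd = b
    x_span = ax < bx + bw and bx < ax + aw  # projections overlap on x
    y_span = ay < by + bd and by < ay + ad  # projections overlap on y
    side_by_side = (ax + aw == bx or bx + bw == ax) and y_span
    stacked = (ay + ad == by or by + bd == ay) and x_span
    return side_by_side or stacked


def solve(room, sizes, adjacent):
    """Place every object in `sizes` inside `room`, or return None.

    room:     (width, depth) of the floor grid
    sizes:    {name: (width, depth)}
    adjacent: list of (name_a, name_b) pairs that must touch
    """
    names = list(sizes)
    placed = {}

    def fits(name, rect):
        if any(overlaps(rect, other) for other in placed.values()):
            return False
        for a, b in adjacent:
            partner = b if a == name else a if b == name else None
            # Only check constraints whose partner is already placed.
            if partner in placed and not touches(rect, placed[partner]):
                return False
        return True

    def backtrack(i):
        if i == len(names):
            return True
        name = names[i]
        w, d = sizes[name]
        for x, y in product(range(room[0] - w + 1), range(room[1] - d + 1)):
            rect = (x, y, w, d)
            if fits(name, rect):
                placed[name] = rect
                if backtrack(i + 1):
                    return True
                del placed[name]  # undo and try the next position
        return False

    if not backtrack(0):
        return None
    return {name: rect[:2] for name, rect in placed.items()}


layout = solve(
    room=(4, 3),
    sizes={"bed": (2, 3), "nightstand": (1, 1), "desk": (2, 1)},
    adjacent=[("nightstand", "bed")],
)
print(layout)
```

The search returns the first feasible layout rather than an optimal one. A fuller version would also need wall and orientation constraints and some scoring of candidate positions, which the summary does not describe in enough detail to reproduce here.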
The system's contributions are: a method that takes unstructured, grammar-free natural language input and produces 3D design solutions aligned with user preferences; a new approach to 3DISS based on the reasoning and conversation of multiple LLM agents; a procedural transformation from scene graph to layout; an interpretable pipeline; and a VLM-based evaluation for 3D scenes. The system outperforms existing methods in delivering high-quality 3D design solutions that align with abstract concepts in user input.