L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects

14 Feb 2024 | Yutaro Yamada, Khyathi Chandu, Yuchen Lin, Jack Hessel, Ilker Yildirim, Yejin Choi
The paper introduces L3GO, a language agent that generates 3D objects from text instructions using a chain-of-thought approach (the "chain-of-3D-thoughts" of the title). L3GO targets a limitation of current diffusion-based models, which struggle with the precise spatial reasoning needed for unconventional objects. The agent uses large language models to compose objects through trial and error within a 3D simulation environment developed specifically for this purpose. The authors also introduce the Unconventionally Feasible Objects (UFO) benchmark, which tests whether models can create objects with specific and unconventional attributes. In human and automated evaluations, L3GO outperforms standard GPT-4 and other language agents, including ReAct and Reflexion, at generating 3D meshes on the ShapeNet dataset. On the UFO benchmark, L3GO also surpasses other text-to-2D image and text-to-3D models, showing that it can handle complex and unconventional spatial configurations. The research highlights the potential of integrating language models with 3D modeling to extend generative AI tools, particularly in design, engineering, and creative fields.
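The trial-and-error composition described above can be illustrated with a small propose-check-retry loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `Box`, `check_placement`, `build`, and `toy_propose` are hypothetical, a scripted function stands in for the language model, and a plain axis-aligned box check stands in for the 3D simulation environment.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned box used as a stand-in for one object part (hypothetical)."""
    name: str
    center: Vec3
    size: Vec3

    def bounds(self) -> List[Tuple[float, float]]:
        # (low, high) extent along x, y, z.
        return [(c - s / 2, c + s / 2) for c, s in zip(self.center, self.size)]


def check_placement(new: Box, placed: List[Box], tol: float = 1e-6) -> Optional[str]:
    """Return None if the part is acceptable, otherwise a feedback message."""
    if not placed:
        return None  # the first part has nothing to collide with
    touching = False
    for other in placed:
        # Per-axis gap: negative means the intervals overlap on that axis.
        gaps = [max(lo1 - hi2, lo2 - hi1)
                for (lo1, hi1), (lo2, hi2) in zip(new.bounds(), other.bounds())]
        widest = max(gaps)
        if widest < -tol:          # overlapping on every axis: volumes intersect
            return f"{new.name} overlaps {other.name}"
        if widest <= tol:          # faces meet without intersecting
            touching = True
    return None if touching else f"{new.name} is not connected to any part"


Proposer = Callable[[str, List[Box], Optional[str]], Box]


def build(parts: List[str], propose: Proposer, max_tries: int = 5) -> List[Box]:
    """Place parts one at a time, feeding checker messages back to the proposer."""
    placed: List[Box] = []
    for name in parts:
        feedback: Optional[str] = None
        for _ in range(max_tries):
            candidate = propose(name, placed, feedback)
            feedback = check_placement(candidate, placed)
            if feedback is None:
                placed.append(candidate)
                break
        else:
            raise RuntimeError(f"could not place {name}: {feedback}")
    return placed


def toy_propose(name: str, placed: List[Box], feedback: Optional[str]) -> Box:
    """Scripted stand-in for a language model (assumption, for illustration only)."""
    if name == "seat":
        return Box("seat", (0.0, 0.0, 0.5), (1.0, 1.0, 0.1))
    # The first guess intrudes into the seat; after feedback the leg is lowered.
    z = 0.3 if feedback is None else 0.225
    return Box(name, (0.0, 0.0, z), (0.1, 0.1, 0.45))


if __name__ == "__main__":
    for part in build(["seat", "leg"], toy_propose):
        print(part)
```

In this toy run the first leg proposal is rejected because it intersects the seat, and the second is accepted because its top face meets the underside of the seat. A real agent would replace `toy_propose` with a model call and the box check with feedback from an actual 3D environment.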