SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code

2 Mar 2024 | Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David A. Ross, Cordelia Schmid, Alireza Fathi
SceneCraft is an LLM-powered agent that converts text descriptions into executable Blender scripts, generating complex 3D scenes with up to a hundred assets. The process relies on abstraction, strategic planning, and library learning. SceneCraft first models a scene graph as a blueprint that details the spatial relationships among assets. It then writes Python scripts based on this graph, translating those relationships into numerical constraints for asset layout. The agent uses GPT-V to analyze rendered images and iteratively refine the scene. SceneCraft also includes a library learning mechanism that compiles common script functions into a reusable library, enabling continuous self-improvement without expensive LLM parameter tuning.
Evaluations show that SceneCraft outperforms existing LLM-based agents in rendering complex scenes, as demonstrated by its adherence to constraints and favorable human assessments. The paper also showcases SceneCraft's broader application potential by reconstructing detailed 3D scenes from the Sintel movie and by guiding a video generative model with generated scenes as control signals.
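The idea of turning scene-graph relationships into numerical constraints can be illustrated with a small sketch. This is not the paper's implementation: the `Asset` class, the `near` and `no_overlap` relation names, the scoring formulas, and the `refine` function are all hypothetical, and the random hill-climbing loop below merely stands in for SceneCraft's image-based refinement with GPT-V. The sketch uses plain Python with 2D positions rather than the Blender API.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    x: float
    y: float
    radius: float  # hypothetical footprint radius on the ground plane


def distance(a, b):
    return math.hypot(a.x - b.x, a.y - b.y)


def near_score(a, b, max_dist=2.0):
    # 1.0 when the assets are within max_dist, decaying toward 0 beyond it.
    d = distance(a, b)
    return 1.0 if d <= max_dist else max_dist / d


def no_overlap_score(a, b):
    # 1.0 when the footprints do not intersect, lower the more they overlap.
    d = distance(a, b)
    min_dist = a.radius + b.radius
    return 1.0 if d >= min_dist else d / min_dist


# Assumed mapping from scene-graph edge labels to constraint scorers.
SCORERS = {"near": near_score, "no_overlap": no_overlap_score}


def layout_score(assets, relations):
    # Mean constraint satisfaction over all (kind, subject, object) edges.
    scores = [SCORERS[kind](assets[s], assets[o]) for kind, s, o in relations]
    return sum(scores) / len(scores)


def refine(assets, relations, steps=500, seed=0):
    # Random hill climbing: nudge one asset and keep the move only if the
    # overall score does not get worse. A stand-in for visual feedback.
    rng = random.Random(seed)
    names = sorted(assets)
    best = layout_score(assets, relations)
    for _ in range(steps):
        asset = assets[rng.choice(names)]
        old = (asset.x, asset.y)
        asset.x += rng.uniform(-0.5, 0.5)
        asset.y += rng.uniform(-0.5, 0.5)
        new = layout_score(assets, relations)
        if new >= best:
            best = new
        else:
            asset.x, asset.y = old  # revert a move that hurt the layout
    return best


def demo_scene():
    # A tiny scene graph: nodes are assets, edges are spatial relations.
    assets = {
        "lamp": Asset("lamp", 0.0, 0.0, 0.3),
        "table": Asset("table", 5.0, 0.0, 1.0),
        "chair": Asset("chair", 5.2, 0.0, 0.5),
    }
    relations = [
        ("near", "lamp", "table"),
        ("no_overlap", "table", "chair"),
        ("near", "chair", "table"),
    ]
    return assets, relations
```

Calling `layout_score` on the result of `demo_scene()` before and after `refine` shows how a layout that violates its constraints is moved toward one that satisfies them. In a Blender setting, the resulting coordinates would then be written to each object's location in the generated script.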