Understanding InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction

InterDreamer is a framework for zero-shot generation of 3D dynamic human-object interaction (HOI) from text. The method decouples interaction semantics from interaction dynamics, so it can generate human-object interaction sequences without training on paired text-interaction data. Pretrained large language models and text-to-motion models supply the high-level semantics, while a world model handles the low-level dynamics based on simple physics. Together, these components produce realistic, coherent interaction sequences that align with the text descriptions. The framework is evaluated on the BEHAVE and CHAIRS datasets, where it generates realistic interactions in a zero-shot setting and generalizes beyond existing HOI datasets. The key contributions are: a new task of synthesizing whole-body interactions with dynamic objects guided by textual commands; a framework that decomposes semantics and dynamics; and the use of external knowledge from large language models and text-to-motion models. The paper also covers related work, methodology, and experimental results.
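
The split between semantics and dynamics described above can be pictured with a toy sketch. This is not the authors' implementation: every name here (`plan_interaction`, `generate_human_motion`, `step_object`, `generate_interaction`) is hypothetical, the "planner" is a keyword rule standing in for a large language model, the "motion model" is a straight-line hand trajectory standing in for a text-to-motion model, and the "world model" is a crude follow-or-fall rule standing in for learned dynamics. It only illustrates how the high-level stages could feed a low-level, per-frame dynamics step.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Plan:
    """High-level semantics: which object, which body part, which frames touch."""
    object_name: str
    contact_part: str
    contact_frames: range


def plan_interaction(text: str) -> Plan:
    # Hypothetical stand-in for an LLM planner: a fixed keyword rule.
    obj = "box" if "box" in text.lower() else "object"
    return Plan(object_name=obj, contact_part="hand", contact_frames=range(0, 5))


def generate_human_motion(text: str, num_frames: int) -> List[Vec3]:
    # Hypothetical stand-in for a text-to-motion model: the hand rises 0.1 per frame.
    return [(0.0, 0.0, 0.1 * t) for t in range(num_frames)]


def step_object(obj_pos: Vec3, hand_pos: Vec3, in_contact: bool,
                fall_per_frame: float = 0.2) -> Vec3:
    # Toy dynamics step (assumption, not real physics): the object follows the
    # hand while in contact, otherwise it drops until it reaches the ground (z = 0).
    if in_contact:
        return hand_pos
    x, y, z = obj_pos
    return (x, y, max(0.0, z - fall_per_frame))


def generate_interaction(text: str, num_frames: int = 8):
    """Semantics first (plan + human motion), then dynamics frame by frame."""
    plan = plan_interaction(text)
    hand_track = generate_human_motion(text, num_frames)
    obj_pos: Vec3 = (0.0, 0.0, 0.0)  # object starts on the ground
    frames = []
    for t, hand_pos in enumerate(hand_track):
        obj_pos = step_object(obj_pos, hand_pos, t in plan.contact_frames)
        frames.append((hand_pos, obj_pos))
    return plan, frames
```

Calling `generate_interaction("a person lifts a box and lets go")` returns the plan plus a list of per-frame `(hand, object)` positions. The point of the design is that the text only influences the first two stages, while the per-frame object update needs no text-interaction training pairs.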

InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction

28 Mar 2024 | Sirui Xu, Ziyin Wang, Yu-Xiong Wang, and Liang-Yan Gui