28 Mar 2024 | Sirui Xu, Ziyin Wang, Yu-Xiong Wang, and Liang-Yan Gui
InterDreamer is a novel framework for generating 3D dynamic human-object interactions (HOIs) guided by textual descriptions, without direct training on text-interaction pair data. The key insight is that interaction semantics and dynamics can be decoupled. High-level semantics, aligned with textual descriptions, are informed by human motion and initial object pose, while low-level dynamics are governed by forces exerted by the human, constrained by physical laws. The framework integrates a large language model (LLM) and a text-to-motion model for high-level planning and low-level control, respectively. A world model predicts object states based on applied actions, ensuring realistic and coherent interactions. Experimental results on the BEHAVE and CHAIRS datasets demonstrate InterDreamer's capability to generate semantically aligned and realistic HOI sequences, showcasing its zero-shot learning potential.
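To make the decoupling of semantics and dynamics concrete, the following is a minimal toy sketch in Python, not the authors' implementation. Every name in it (`Plan`, `plan_interaction`, `generate_motion`, `world_model_step`, `generate_hoi`) is hypothetical, the "LLM" and "text-to-motion model" are hard-coded stand-ins, and the world is reduced to one dimension with a simple contact rule. It only illustrates the flow of information: text goes to a planner, the plan drives human motion, and a separate world model updates the object state from the applied action.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Plan:
    """High-level plan extracted from text (hypothetical structure)."""
    object_name: str
    contact_part: str
    motion_prompt: str


def plan_interaction(text: str) -> Plan:
    # Stand-in for the LLM planner: a keyword lookup, not a real model.
    lowered = text.lower().rstrip(".")
    object_name = lowered.split()[-1]
    contact_part = "foot" if "kick" in lowered else "hand"
    return Plan(object_name, contact_part, text)


def generate_motion(prompt: str, num_frames: int) -> List[float]:
    # Stand-in for the text-to-motion model: the contacting body part
    # moves forward by 0.5 units per frame along a single axis.
    return [0.5 * t for t in range(num_frames)]


def world_model_step(object_pos: float, part_prev: float, part_next: float,
                     contact_radius: float = 0.25) -> float:
    # Toy world model: if the body part touches the object, the object is
    # displaced by the part's motion; otherwise the object stays at rest.
    if abs(part_prev - object_pos) <= contact_radius:
        return object_pos + (part_next - part_prev)
    return object_pos


def generate_hoi(text: str, object_pos: float, num_frames: int = 5):
    # Semantics (plan and human motion) are computed first; dynamics
    # (object states) are then rolled out frame by frame.
    plan = plan_interaction(text)
    human = generate_motion(plan.motion_prompt, num_frames)
    obj = [object_pos]
    for t in range(num_frames - 1):
        obj.append(world_model_step(obj[-1], human[t], human[t + 1]))
    return plan, human, obj


if __name__ == "__main__":
    plan, human, obj = generate_hoi("A person kicks the ball.", object_pos=1.0)
    print(plan)
    print("human part positions:", human)
    print("object positions:    ", obj)
```

In this sketch the object stays still until the body part reaches it and then moves with it. The point is that the world model never sees the text, only the action, which mirrors the separation the abstract describes.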