Generating Human Interaction Motions in Scenes with Text Control

16 Apr 2024 | Hongwei Yi, Justus Thies, Michael J. Black, Xue Bin Peng, Davis Rempe
The paper introduces TeSMo, a method for generating realistic and diverse human-scene interactions in 3D scenes using text control. TeSMo combines a scene-agnostic text-to-motion diffusion model pre-trained on large-scale motion capture datasets with a scene-aware component fine-tuned using detailed scene information, including ground plane and object shapes. The method decomposes the task into two components: navigation and interaction. The navigation component generates a pelvis trajectory to reach a goal pose near the interaction object, while the interaction component generates a full-body motion conditioned on the goal pose and detailed 3D object representation. Extensive experiments demonstrate that TeSMo outperforms prior techniques in terms of goal-reaching accuracy, obstacle avoidance, and realism of generated motions. The method also shows superior performance in generating object-specific interactions with fewer object penetrations. The paper includes a detailed evaluation, ablation study, and user study to validate the effectiveness of TeSMo.
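The two-stage decomposition described above (navigation to a goal pose, then interaction at that pose) can be illustrated with a minimal sketch. This is not TeSMo's implementation: every name here (`GoalPose`, `navigate`, `interact`, `generate`) is hypothetical, and the diffusion models are replaced by trivial stand-ins (straight-line interpolation and a fixed pose record) so that only the data flow between the two stages is shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec2 = Tuple[float, float]  # a point on the ground plane


@dataclass
class GoalPose:
    """Hypothetical goal: where the pelvis should end up, and which way it faces."""
    position: Vec2
    heading: float  # radians


def navigate(start: Vec2, goal: GoalPose, num_frames: int) -> List[Vec2]:
    """Stage 1 stand-in: produce a pelvis trajectory that ends at the goal.

    The paper uses a learned diffusion model for this stage; the straight-line
    interpolation here is an assumption made purely for illustration and does
    no obstacle avoidance.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames")
    gx, gy = goal.position
    trajectory = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # runs from 0.0 to 1.0
        trajectory.append((start[0] + t * (gx - start[0]),
                           start[1] + t * (gy - start[1])))
    return trajectory


def interact(goal: GoalPose, object_points: List[Vec2], text: str,
             num_frames: int) -> List[Dict]:
    """Stage 2 stand-in: produce motion frames conditioned on goal, object, text.

    A real model would output full-body poses; each frame here only records
    the conditioning signals it received.
    """
    return [{"pelvis": goal.position,
             "heading": goal.heading,
             "text": text,
             "num_object_points": len(object_points)}
            for _ in range(num_frames)]


def generate(start: Vec2, goal: GoalPose, object_points: List[Vec2], text: str,
             nav_frames: int = 5, act_frames: int = 3):
    """Chain the two stages: walk to the goal pose, then interact there."""
    path = navigate(start, goal, nav_frames)
    motion = interact(goal, object_points, text, act_frames)
    return path, motion


path, motion = generate((0.0, 0.0), GoalPose((2.0, 4.0), 0.0),
                        [(2.0, 5.0)], "sit on the chair")
```

The point of the split is that the navigation stage only needs a coarse trajectory and the goal, while the object's geometry and the text prompt are consumed by the interaction stage, which starts from the pose where navigation ends.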