Human-Object Interaction from Human-Level Instructions

25 Jun 2024 | Zhen Wu, Jiaman Li, C. Karen Liu
The paper "Human-Object Interaction from Human-Level Instructions" by Zhen Wu, Jiaman Li, and C. Karen Liu from Stanford University addresses the challenge of synthesizing continuous human-object interactions in contextual environments, guided by human-level instructions. The authors aim to generate synchronized object motion, full-body human motion, and detailed finger motion to achieve realistic interactions. Their framework consists of a large language model (LLM) planning module and a low-level motion generator. The LLM planning module processes human-level instructions to deduce spatial object relationships and determine precise positions and orientations of objects in the scene. It also outlines a detailed task plan specifying a sequence of sub-tasks. The low-level motion generator, trained on datasets like FullBodyManipulation, HumanML3D, and GRAB, synthesizes the actual motion sequences, including object and human movements, and detailed finger motions. The paper highlights the contributions of their system, which is the first complete system capable of synthesizing such interactions from human-level instructions. The authors evaluate their approach using metrics such as waypoints matching and interaction quality, demonstrating the effectiveness of their high-level planner and low-level motion generator in generating plausible target layouts and realistic interactions for diverse objects.The paper "Human-Object Interaction from Human-Level Instructions" by Zhen Wu, Jiaman Li, and C. Karen Liu from Stanford University addresses the challenge of synthesizing continuous human-object interactions in contextual environments, guided by human-level instructions. The authors aim to generate synchronized object motion, full-body human motion, and detailed finger motion to achieve realistic interactions. Their framework consists of a large language model (LLM) planning module and a low-level motion generator. The LLM planning module processes human-level instructions to deduce spatial object relationships and determine precise positions and orientations of objects in the scene. It also outlines a detailed task plan specifying a sequence of sub-tasks. The low-level motion generator, trained on datasets like FullBodyManipulation, HumanML3D, and GRAB, synthesizes the actual motion sequences, including object and human movements, and detailed finger motions. The paper highlights the contributions of their system, which is the first complete system capable of synthesizing such interactions from human-level instructions. The authors evaluate their approach using metrics such as waypoints matching and interaction quality, demonstrating the effectiveness of their high-level planner and low-level motion generator in generating plausible target layouts and realistic interactions for diverse objects.