OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
28 Mar 2024 | Xinyu Zhan, Lixin Yang, Yifei Zhao, Kangrui Mao, Hanlin Xu, Zenan Lin, Kailin Li, Cewu Lu
OAKINK2 is a new dataset of bimanual object manipulation in complex daily activities. It organizes manipulation tasks at three levels of abstraction: Affordance, Primitive Task, and Complex Task. The dataset provides allocentric and egocentric videos of human manipulation processes, together with corresponding 3D pose annotations and task specifications, and adopts an object-centric perspective that treats a complex task as a sequence of object affordance fulfillments.

The dataset contains 627 sequences of real-world bimanual manipulation, 264 of which are Complex Tasks, totaling 4.01M frames from four different views. It covers four manipulation scenarios, 75 objects, and 9 invited subjects, and supports applications such as interaction reconstruction and motion synthesis. The dataset is available at https://oakink.net/v2.

The paper presents a task-oriented framework for Complex Task Completion (CTC): a complex task is decomposed, via text, into a sequence of Primitive Tasks, and task-aware motion is generated for each Primitive Task. The framework uses Large Language Models (LLMs) for task decomposition and a Motion Fulfillment Model for motion generation.

The paper also reviews related work on hand-object interaction datasets, manipulation task decomposition, motion synthesis, and foundation models for manipulation tasks. Dataset construction proceeds through task initialization, object affordance analysis, primitive task design, and complex task decomposition. Data collection and annotation rely on a multi-camera system for recording manipulation processes and an optical MoCap system for pose tracking; annotations cover object poses, human poses and surfaces, and task execution commentaries.

Selected applications of OAKINK2 presented in the paper include hand mesh reconstruction, task-aware motion fulfillment, and complex task completion.
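The three-level abstraction could be modeled, purely for illustration, with a few small data classes. All class and field names below are hypothetical and are not the dataset's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative sketch of the Affordance / Primitive Task / Complex Task
# hierarchy described above; names are assumptions, not OAKINK2's schema.

@dataclass
class Affordance:
    """An interaction an object supports, e.g. 'pour' for a kettle."""
    object_name: str
    description: str

@dataclass
class PrimitiveTask:
    """A minimal manipulation segment that fulfills one object affordance."""
    name: str
    affordance: Affordance

@dataclass
class ComplexTask:
    """A long-horizon activity: an ordered sequence of Primitive Tasks."""
    goal: str
    primitives: List[PrimitiveTask] = field(default_factory=list)

# Example: a complex task decomposed into two primitive steps.
kettle_pour = Affordance("kettle", "pour liquid")
cup_receive = Affordance("cup", "receive liquid")
task = ComplexTask(
    goal="pour tea into the cup",
    primitives=[
        PrimitiveTask("lift kettle", kettle_pour),
        PrimitiveTask("pour into cup", cup_receive),
    ],
)
print(len(task.primitives))  # number of primitive steps in the complex task
```

The object-centric perspective shows up here in that each primitive step points at the affordance of a specific object, rather than at a body-motion label.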
The dataset is expected to support large-scale language-manipulation pre-training and end-to-end text-to-manipulation generation.
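The two-stage CTC framework (LLM-based decomposition followed by per-primitive motion generation) can be sketched as a minimal pipeline. Both callables below are stand-ins with a hard-coded plan for demonstration, not the paper's actual interfaces:

```python
from typing import List

def decompose_with_llm(complex_task: str) -> List[str]:
    # Stand-in for prompting an LLM to decompose a complex task into
    # primitive-task descriptions; returns a fixed plan for the demo.
    plans = {
        "pour tea into the cup": [
            "grasp kettle",
            "tilt kettle over cup",
            "set kettle down",
        ],
    }
    return plans.get(complex_task, [])

def fulfill_motion(primitive_task: str) -> str:
    # Stand-in for the Motion Fulfillment Model, which would generate
    # hand/object trajectories conditioned on the primitive-task text.
    return f"motion({primitive_task})"

def complete_task(complex_task: str) -> List[str]:
    # Stage 1: text-based decomposition; Stage 2: task-aware motion
    # generation for each resulting Primitive Task, in order.
    return [fulfill_motion(p) for p in decompose_with_llm(complex_task)]

print(complete_task("pour tea into the cup"))
```

The design choice the framework reflects is that language handles long-horizon planning while the motion model only ever sees one short, well-scoped primitive at a time.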