28 Mar 2024 | Xinyu Zhan, Lixin Yang, Yifei Zhao, Kangrui Mao, Hanlin Xu, Zenan Lin, Kailin Li, Cewu Lu
OAKInK2 is a dataset of bimanual object manipulation in complex daily activities. It structures these tasks with a three-level abstraction: Affordance, Primitive Task, and Complex Task. The dataset provides multi-view image streams and precise pose annotations for the human body, hands, and objects, supporting applications such as interaction reconstruction and motion synthesis. OAKInK2 includes 627 sequences of real-world bimanual manipulation, 264 of which cover Complex Tasks. The dataset is constructed through task initialization, object affordance analysis, primitive task design, and complex task decomposition. It also includes a task-oriented framework for complex task completion (CTC), covering complex task and motion planning, which consists of a Large Language Model (LLM)-based task interpreter and a diffusion-based motion generator. The dataset's versatility and task-driven nature make it suitable for a wide range of applications, particularly CTC.
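The three-level abstraction can be pictured as a small nested data structure. The sketch below is only an illustration of how Affordance, Primitive Task, and Complex Task relate to one another. It is not the dataset's annotation schema or toolkit API, and every class name, field, and example value in it is a hypothetical choice made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Affordance:
    """Lowest level: a function that an object offers (hypothetical fields)."""
    object_name: str
    function: str


@dataclass
class PrimitiveTask:
    """Middle level: a minimal interaction that fulfils one or more affordances."""
    name: str
    affordances: List[Affordance]
    hands: str = "bimanual"  # illustrative label only


@dataclass
class ComplexTask:
    """Top level: a goal decomposed into an ordered list of primitive tasks."""
    goal: str
    steps: List[PrimitiveTask] = field(default_factory=list)

    def plan(self) -> List[str]:
        # Return the primitive task names in execution order.
        return [step.name for step in self.steps]


# Made-up example values, not taken from the dataset.
pour = Affordance("kettle", "pour liquid out")
contain = Affordance("cup", "hold liquid")
task = ComplexTask(
    goal="serve a cup of water",
    steps=[
        PrimitiveTask("open kettle lid", [Affordance("kettle", "open lid")]),
        PrimitiveTask("pour water into cup", [pour, contain]),
    ],
)
print(task.plan())
```

In this reading, a task interpreter would be responsible for producing the ordered `steps` from a goal description, and a motion generator would turn each primitive task into hand and object motion. Both of those components are outside the scope of the sketch.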