25 Mar 2024 | Yun Liu, Haolin Yang, Xu Si, Ling Liu, Zipeng Li, Yuxiang Zhang, Yebin Liu, Li Yi
TACO is a large-scale bimanual hand-object manipulation dataset covering a wide range of tool-action-object combinations in real-world scenarios. It contains 2.5K motion sequences and 5.2M video frames spanning both third-person and egocentric views, together with precise hand-object 3D meshes and action labels, all captured by a fully automatic acquisition pipeline that combines multi-view sensing with an optical motion capture system. The dataset covers 131 <tool category, action label, target object category> triplets across 20 object categories, 196 object instances, and 15 daily actions, providing a comprehensive knowledge base for multi-object cooperation. This diversity of compositions and object geometries supports test-time generalization to unseen object geometries and novel behavior triplets, and TACO is used to benchmark three generalizable research tasks: compositional action recognition, generalizable hand-object motion forecasting, and cooperative grasp synthesis. The rich set of motion sequences enables the study of generalizable hand-object interaction.
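The per-sequence annotations described above (multi-view footage, hand-object meshes, action labels) suggest a simple record structure. Below is a minimal Python sketch of how such a record might be organized; all field names are illustrative assumptions, not TACO's actual schema or file layout.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SequenceRecord:
    """Hypothetical per-sequence record mirroring the annotations TACO describes.

    Field names are illustrative, not the dataset's actual schema.
    """
    tool_category: str        # e.g. "spoon"
    action_label: str         # one of the 15 daily actions, e.g. "stir"
    target_category: str      # e.g. "bowl"
    egocentric_video: str     # path to the first-person view
    third_person_videos: List[str] = field(default_factory=list)  # multi-view footage
    hand_mesh_dir: str = ""   # per-frame 3D hand meshes for both hands
    object_mesh_dir: str = "" # per-frame posed tool/target object meshes
```

Grouping the triplet labels with the sensor streams in one record makes it straightforward to filter sequences by composition when building benchmark splits.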
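To make the novel-triplet generalization setup concrete, here is a minimal sketch of a compositional split that holds out a subset of triplets so the test set contains combinations never seen in training. The split logic, fraction, and example triplets are illustrative assumptions; TACO's official benchmark splits may differ.

```python
import random
from collections import namedtuple

# A behavior triplet as described above: <tool category, action label, target object category>.
Triplet = namedtuple("Triplet", ["tool", "action", "target"])

def split_triplets(triplets, unseen_fraction=0.2, seed=0):
    """Hold out a subset of triplets so test-time compositions are unseen in training.

    Sequences whose triplet falls in the held-out set form the novel-composition
    test split; everything else is available for training.
    """
    rng = random.Random(seed)
    unique = sorted(set(triplets))
    rng.shuffle(unique)
    n_unseen = max(1, int(len(unique) * unseen_fraction))
    return set(unique[n_unseen:]), set(unique[:n_unseen])  # (seen, unseen)

# Illustrative usage with hypothetical triplets (TACO itself has 131):
triplets = [
    Triplet("spoon", "stir", "bowl"),
    Triplet("knife", "cut", "cucumber"),
    Triplet("brush", "dust", "pan"),
    Triplet("spoon", "scoop", "bowl"),
    Triplet("knife", "cut", "bread"),
]
seen, unseen = split_triplets(triplets, unseen_fraction=0.4)
print(f"{len(seen)} seen triplets, {len(unseen)} held out for testing")
```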