Introducing HOT3D: An Egocentric Dataset for 3D Hand and Object Tracking

13 Jun 2024 | Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Fan Zhang, Jade Fountain, Edward Miller, Selen Basol, Richard Newcombe, Robert Wang, Jakob Julian Engel, Tomas Hodan
HOT3D is a publicly available dataset for egocentric hand and object tracking in 3D. It offers over 833 minutes of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze and scene point clouds, and comprehensive ground-truth annotations, including 3D poses of objects, hands, and cameras, as well as 3D models of hands and objects. The dataset contains scenarios resembling typical actions in kitchen, office, and living-room environments.

HOT3D was recorded using two head-mounted devices from Meta: Project Aria and Quest 3. Ground-truth poses were obtained with a professional motion-capture system using optical markers. Hand annotations are provided in the UmeTrack and MANO formats, and objects are represented by 3D mesh models obtained with an in-house scanner.

The dataset is intended for training and evaluating methods for model-based and model-free 3D tracking of hands and objects on monocular or multi-camera video streams, and it includes curated clips for benchmarking tracking and pose-estimation methods. HOT3D is used in two public challenges at ECCV 2024: the BOP Challenge 2024 and the Hand Tracking Challenge 2024. The dataset is available for download from the project website.
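To make the ground-truth format concrete, below is a minimal sketch (not the official HOT3D tooling; all class and field names are hypothetical) of how per-frame 3D pose annotations of the kind described above, rigid 4x4 transforms for the headset camera and for each object, can be combined to bring an object's 3D model vertices into the camera frame.

```python
# Minimal sketch, assuming hypothetical annotation fields: illustrates how
# per-frame ground truth (camera pose + rigid object poses, all as 4x4
# transforms in a shared world frame) can be combined to express an
# object's 3D model vertices in the camera frame.
from __future__ import annotations

from dataclasses import dataclass
import numpy as np


@dataclass
class FrameAnnotation:
    # Transform mapping camera-frame points to world-frame points.
    T_world_camera: np.ndarray
    # Per-object transforms mapping object-frame points to world-frame points.
    T_world_object: dict[str, np.ndarray]


def object_vertices_in_camera(frame: FrameAnnotation,
                              object_id: str,
                              model_vertices: np.ndarray) -> np.ndarray:
    """Map an object's model vertices (N x 3, object frame) into the camera frame."""
    T_camera_world = np.linalg.inv(frame.T_world_camera)
    T_camera_object = T_camera_world @ frame.T_world_object[object_id]
    # Homogenize, transform, and drop the homogeneous coordinate.
    verts_h = np.hstack([model_vertices, np.ones((model_vertices.shape[0], 1))])
    return (T_camera_object @ verts_h.T).T[:, :3]


# Example with dummy data: an object placed 0.5 m in front of the camera.
frame = FrameAnnotation(
    T_world_camera=np.eye(4),
    T_world_object={"mug": np.array([[1.0, 0.0, 0.0, 0.0],
                                     [0.0, 1.0, 0.0, 0.0],
                                     [0.0, 0.0, 1.0, 0.5],
                                     [0.0, 0.0, 0.0, 1.0]])},
)
sample_vertices = np.array([[0.0, 0.0, 0.0],
                            [0.05, 0.0, 0.0],
                            [0.0, 0.05, 0.0],
                            [0.0, 0.0, 0.05]])
print(object_vertices_in_camera(frame, "mug", sample_vertices))
```

The released dataset ships with its own loaders and file formats; this snippet only illustrates the geometry of combining camera and object poses.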