13 Jun 2024 | Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Fan Zhang, Jade Fountain, Edward Miller, Selen Basol, Richard Newcombe, Robert Wang, Jakob Julian Engel, Tomas Hodan
**HOT3D: An Egocentric Dataset for 3D Hand and Object Tracking**
**Introduction:**
HOT3D is a publicly available dataset for egocentric 3D hand and object tracking. It includes over 833 minutes of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze and scene point clouds, and comprehensive ground-truth annotations, including 3D poses of objects, hands, and cameras. The recordings were captured with Meta's Project Aria and Quest 3 devices, and the dataset ships with high-quality 3D models of the hands and objects that can be used to render realistic training images.
**Dataset Details:**
- **Recordings:** Over 1.5M multi-view frames (3.7M images) from Project Aria and Quest 3.
- **Subjects:** 19 participants, diverse in hand shape and nationality.
- **Objects:** 33 objects with high-resolution 3D models and PBR materials.
- **Scenarios:** Typical actions in kitchen, office, and living room environments.
- **Annotations:** Per-frame ground-truth 3D poses of hands and objects, provided for the training split (a hedged loading sketch follows this list).
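The sketch below illustrates what consuming per-frame ground-truth object poses could look like. The file layout and field names (`frame_id`, `objects`, `rotation`, `translation`, `object_id`) are assumptions made for illustration only; they do not reflect the official HOT3D annotation format or its data-loading tools, which are documented on the project website.

```python
# Hypothetical sketch: reading per-frame ground-truth object poses.
# The JSON layout and field names below are assumptions for illustration;
# they do NOT correspond to the official HOT3D annotation format or loader.
import json
import numpy as np

def load_object_poses(annotation_path: str) -> dict[int, dict[str, np.ndarray]]:
    """Return {frame_id: {object_id: 4x4 world-from-object transform}}."""
    with open(annotation_path, "r") as f:
        frames = json.load(f)  # assumed: a list of per-frame records

    poses = {}
    for frame in frames:
        frame_id = frame["frame_id"]                          # assumed field
        per_object = {}
        for obj in frame["objects"]:                          # assumed field
            R = np.asarray(obj["rotation"], dtype=float)      # assumed 3x3 row-major
            t = np.asarray(obj["translation"], dtype=float)   # assumed meters
            T = np.eye(4)
            T[:3, :3] = R
            T[:3, 3] = t
            per_object[obj["object_id"]] = T
        poses[frame_id] = per_object
    return poses
```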
**Additional Features:**
- **Curated Clips:** 4117 clips for benchmarking tracking and pose estimation methods.
- **Object-Onboarding Sequences:** Two types of sequences for model-free object tracking and 3D object reconstruction.
- **Public Challenges:** The dataset underpins two challenges co-organized with ECCV 2024: the BOP Challenge 2024 and the Hand Tracking Challenge 2024 (a simplified error-metric sketch follows this list).
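To make the benchmarking use case concrete, here is a simplified sketch of a surface-distance pose error in the spirit of the MSSD metric used in the BOP Challenge. For brevity it ignores object symmetries, which the official BOP evaluation additionally minimizes over; it is not the exact evaluation code used by the challenges.

```python
# Simplified sketch of a surface-distance pose error, in the spirit of the
# BOP Challenge's MSSD metric. Object symmetries are ignored for brevity;
# the official evaluation also takes the minimum over symmetry transforms.
import numpy as np

def max_surface_distance(T_est: np.ndarray, T_gt: np.ndarray,
                         model_points: np.ndarray) -> float:
    """Maximum distance between model points transformed by the estimated
    and ground-truth 4x4 poses (points given as an Nx3 array, in meters)."""
    pts_h = np.hstack([model_points, np.ones((model_points.shape[0], 1))])
    est = (T_est @ pts_h.T).T[:, :3]   # points under the estimated pose
    gt = (T_gt @ pts_h.T).T[:, :3]     # points under the ground-truth pose
    return float(np.max(np.linalg.norm(est - gt, axis=1)))
```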
**References:**
The dataset and its collection process are detailed in the paper, with references to related work and technical documentation available on the project website.