20 Sep 2024 | Lingni Ma, Yuting Ye, Fangzhou Hong, Vladimir Guzov, Yifeng Jiang, Rowan Postyeni, Luis Pesqueira, Alexander Gamino, Vijay Baiyya, Hyo Jin Kim, Kevin Bailey, David Soriano Fosas, C. Karen Liu, Ziwei Liu, Jakob Engel, Renzo De Nardi, and Richard Newcombe
**Introduction:**
The paper introduces Nymeria, a large-scale, diverse, and richly annotated human motion dataset collected in the wild using multiple multimodal egocentric devices. The dataset includes full-body ground-truth motion, multimodal egocentric data from multiple Project Aria devices (video, eye tracking, IMUs, etc.), and a third-person perspective. All devices are synchronized and localized in a single metric 3D world.
**Dataset Details:**
- **Scale and Diversity:** Nymeria captures 300 hours of daily activities from 264 participants across 50 locations, with a total travel distance of over 399 km.
- **Data Modality:** It provides 201.2 million egocentric images, 11.7 billion IMU samples, and 10.8 million gaze point clouds.
- **Language Descriptions:** The dataset includes 310.5K sentences totaling 8.64 million words, derived from coarse-to-fine motion narrations, atomic actions, and activity summaries.
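
To put the headline figures above in perspective, the sketch below derives a few per-unit averages from them. It uses only the numbers quoted in this summary; the function name `per_unit_rates` and the dictionary keys are illustrative. The distance is stated as "over 399 km", so the travel rate is a lower bound, and the image rate is aggregated over all cameras and devices rather than being a per-camera frame rate.

```python
# Headline figures as quoted in the summary above.
HOURS = 300
PARTICIPANTS = 264
DISTANCE_KM = 399        # stated as "over 399 km", so treat as a lower bound
IMAGES = 201.2e6
IMU_SAMPLES = 11.7e9
SENTENCES = 310.5e3
WORDS = 8.64e6


def per_unit_rates():
    """Derive simple averages from the quoted totals (illustrative helper)."""
    seconds = HOURS * 3600
    return {
        "hours_per_participant": HOURS / PARTICIPANTS,
        "km_per_hour": DISTANCE_KM / HOURS,
        # Aggregated over every camera and device, not a single frame rate.
        "images_per_second": IMAGES / seconds,
        "imu_samples_per_second": IMU_SAMPLES / seconds,
        "words_per_sentence": WORDS / SENTENCES,
    }


if __name__ == "__main__":
    for name, value in per_unit_rates().items():
        print(f"{name}: {value:.2f}")
```

Under these figures each participant contributes a little over one hour on average, and a narration sentence averages roughly 28 words.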
**Technical Challenges:**
- **Long-term Ground-truth Motion:** Overcoming limitations of vision-based and inertial-based motion capture methods.
- **Multi-device Alignment:** Accurate temporal and spatial alignment of multiple devices.
- **Data Processing and Annotations:** Synchronized and localized data processing, including motion retargeting and 6DoF localization.
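
As a rough illustration of what temporal alignment between device streams involves, the following sketch pairs each sample of one stream with the nearest-in-time sample of another. It is a generic nearest-timestamp match, not the procedure used in the paper, and it assumes the streams already share a common clock. The function name `nearest_indices`, the `max_gap` threshold, and the microsecond timestamps are all hypothetical.

```python
from bisect import bisect_left


def nearest_indices(reference_ts, query_ts, max_gap):
    """Match each query timestamp to the closest reference timestamp.

    reference_ts must be sorted ascending. Returns one index into
    reference_ts per query, or None when the closest reference sample
    is farther away than max_gap. Ties go to the earlier sample.
    """
    matches = []
    for t in query_ts:
        pos = bisect_left(reference_ts, t)
        # Only the neighbours on either side of the insertion point can be closest.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(reference_ts)]
        best = min(candidates, key=lambda i: abs(reference_ts[i] - t), default=None)
        if best is not None and abs(reference_ts[best] - t) > max_gap:
            best = None
        matches.append(best)
    return matches


if __name__ == "__main__":
    # Hypothetical timestamps in integer microseconds on a shared clock.
    camera_ts = [0, 100_000, 200_000, 300_000]
    imu_ts = [40_000, 160_000, 900_000]
    print(nearest_indices(camera_ts, imu_ts, max_gap=50_000))
```

Integer timestamps avoid floating-point surprises at the `max_gap` boundary. A real pipeline would also need to estimate clock offset and drift between devices, which this sketch does not attempt.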
**Research Opportunities:**
- **Motion Tasks:** Full-body tracking, motion synthesis, motion forecasting, path planning, action recognition, and human behavior analysis.
- **Multimodal Spatial Reasoning and Video Understanding:** Scene reconstruction, image retrieval, and relocalization.
- **Simulation:** Driving in-context character animation and 3D scene simulation.
**Benchmark Tasks and Baselines:**
- **Motion Tracking and Synthesis:** Evaluating methods for 1-point/3-point body tracking and motion synthesis using Nymeria data.
- **Motion and Language:** Training models for motion-to-text generation using high-quality hierarchical narrations.
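
To make the tracking task concrete, the sketch below builds a sparse input from a full-body pose and scores a prediction against ground truth. It assumes that 1-point tracking means the head alone and 3-point tracking means the head plus both wrists, which is the common reading but is not spelled out in this summary. The joint indices are hypothetical, and mean per-joint position error (MPJPE) is used here as a widely adopted pose metric, not necessarily the one reported in the paper.

```python
import math

# Hypothetical joint layout for illustration only; real skeletons differ.
HEAD, LEFT_WRIST, RIGHT_WRIST = 0, 1, 2


def sparse_input(pose, points=3):
    """Select the tracked joints from a full-body pose.

    pose is a sequence of (x, y, z) joint positions. points=1 keeps the
    head only; points=3 keeps the head and both wrists (assumed meaning).
    """
    if points == 1:
        indices = [HEAD]
    elif points == 3:
        indices = [HEAD, LEFT_WRIST, RIGHT_WRIST]
    else:
        raise ValueError("points must be 1 or 3")
    return [pose[i] for i in indices]


def mpjpe(predicted, ground_truth):
    """Mean Euclidean distance between corresponding joints."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("poses must be non-empty and the same length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


if __name__ == "__main__":
    # Toy four-joint pose in metres: head, left wrist, right wrist, pelvis.
    truth = [(0.0, 0.0, 1.7), (-0.3, 0.0, 1.0), (0.3, 0.0, 1.0), (0.0, 0.0, 0.0)]
    guess = [(0.0, 0.0, 1.7), (-0.3, 0.0, 1.0), (0.3, 0.0, 1.0), (0.3, 0.4, 0.0)]
    print(sparse_input(truth, points=3))
    print(mpjpe(guess, truth))
```

In the toy example only the untracked pelvis is wrong, by 0.5 m, so the error averaged over four joints is about 0.125 m. This mirrors the core difficulty of the task: joints far from the tracked points must be inferred rather than observed.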
**Conclusion:**
Nymeria is the largest collection of human motion in the wild, providing rich multimodal data and detailed language descriptions. It offers significant research opportunities in egocentric motion understanding and related fields.