20 Sep 2024 | Lingni Ma, Yuting Ye, Fangzhou Hong, Vladimir Guzov, Yifeng Jiang, Rowan Postyeni, Luis Pesqueira, Alexander Gamino, Vijay Baiyya, Hyo Jin Kim, Kevin Bailey, David Soriano Fosas, C. Karen Liu, Ziwei Liu, Jakob Engel, Renzo De Nardi, and Richard Newcombe
Nymeria is a large-scale, diverse, and richly annotated human motion dataset collected in the wild with multiple multimodal egocentric devices. The dataset includes full-body ground-truth motion, multimodal egocentric data from Project Aria devices (videos, eye tracking, IMUs, etc.), and a third-person perspective from an additional observer. Nymeria is the first dataset of its kind, recorded with an XSens mocap suit, Project Aria glasses, and Aria-alike wristbands. All devices are synchronized by a non-intrusive hardware solution with sub-millisecond accuracy and localized into a single metric 3D world using the Project Aria Machine Perception Service (MPS). Novel algorithms retarget the XSens skeleton motion onto a full parametric human model and correct global drift through optimization. A hierarchical, coarse-to-fine narration schema adds in-context language descriptions of human motion at different granularities, from fine-grained motion narrations to simplified atomic actions and high-level activity summaries.
Nymeria is the world's largest collection of human motion in the wild: 300 hours of daily activities from 264 participants across 50 locations, with a total traveling distance of over 399 km, 260M body poses, 201.2M egocentric images, 11.7B IMU samples, and 10.8M gaze points. It provides accurate 6DoF tracking, 3D scene points, and gaze, with all modalities synchronized and aligned into one metric 3D world. With 310.5K sentences comprising 8.64M words from a vocabulary of 6,545 words, Nymeria also stands out as the world's largest motion-language dataset. The dataset is used to evaluate several state-of-the-art algorithms for egocentric body tracking, motion synthesis, and action recognition, and to explore novel research directions in motion tasks, multimodal spatial reasoning, video understanding, and simulation.
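Once all devices share a common clock, streams captured at different rates (e.g. 1 kHz IMU vs. 30 Hz video) can be associated by nearest-timestamp lookup. The following is a generic illustrative sketch of that alignment step, not the dataset's actual tooling; the sample rates and variable names are assumptions for the example.

```python
import numpy as np

# Hypothetical timestamps (seconds) on a shared clock:
# a 1 kHz IMU stream and a 30 Hz egocentric camera stream.
imu_t = np.arange(0.0, 1.0, 0.001)
img_t = np.arange(0.0, 1.0, 1.0 / 30.0)

# For each image timestamp, find the nearest IMU sample.
idx = np.searchsorted(imu_t, img_t)          # insertion indices
idx = np.clip(idx, 1, len(imu_t) - 1)        # keep a valid left neighbor
left, right = imu_t[idx - 1], imu_t[idx]
idx -= (img_t - left) < (right - img_t)      # step back if left is closer

# Worst-case alignment error is bounded by half the IMU period (0.5 ms).
max_err = np.abs(imu_t[idx] - img_t).max()
```

With sub-millisecond cross-device synchronization, this residual interpolation error, not clock offset, dominates the association between modalities.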