EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding

16 Jul 2024 | Yuan-Ming Li, Wei-Jin Huang, An-Lan Wang, Ling-An Zeng, Jing-Ke Meng, and Wei-Shi Zheng
EgoExo-Fitness is a new full-body action understanding dataset of fitness sequence videos recorded from synchronized egocentric (first-person) and exocentric (third-person) cameras. Unlike existing datasets, EgoExo-Fitness provides first-person views captured by two downward-facing ego-cameras alongside third-person views, together with rich annotations: two-level temporal boundaries for localizing single actions and their sub-steps, and novel annotations for interpretable action judgment, including technical keypoint verification, natural language comments on action execution, and action quality scores. In total, the dataset spans 32 hours and contains 1,276 cross-view action sequence videos covering more than 6,000 single fitness actions, offering new resources for studying egocentric and exocentric full-body action understanding along the dimensions of "what", "when", and "how well".

To facilitate research, benchmarks are constructed on action classification, action localization, cross-view sequence verification, cross-view skill determination, and a newly proposed task of guidance-based execution verification. The synchronized ego-exo videos and rich annotations enable studies on view characteristics, cross-view modeling, and action guiding. Compared with related datasets such as Ego-Exo4D, EgoExo-Fitness is distinguished by its new scenario, its interpretable action-judgment annotations, and other unique characteristics. Benchmark experiments on action classification, cross-view sequence verification, and guidance-based execution verification provide insights into the performance of baseline models and highlight the challenges of cross-view modeling with unbalanced data. The dataset and benchmarks are available at https://github.com/iSEE-Laboratory/EgoExo-Fitness/tree/main and are expected to inspire future research on egocentric and exocentric full-body action understanding.
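To make the annotation structure concrete, the sketch below shows how one annotated single action with two-level temporal boundaries, keypoint verification, a natural-language comment, and a quality score might be represented in Python. All field names and values are hypothetical illustrations for this summary, not the dataset's actual schema or file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical illustration of one annotated single action in an
# EgoExo-Fitness-style record; names do not reflect the real schema.
@dataclass
class SubStep:
    name: str                      # e.g. "descend" phase of the action
    boundary: Tuple[float, float]  # (start_sec, end_sec) within the sequence video

@dataclass
class SingleActionAnnotation:
    action_label: str                                   # "what": e.g. "squat"
    boundary: Tuple[float, float]                       # "when": action-level temporal boundary
    sub_steps: List[SubStep] = field(default_factory=list)          # finer, second-level boundaries
    keypoint_verification: Dict[str, bool] = field(default_factory=dict)  # "how well": pass/fail per technical keypoint
    comment: str = ""                                   # natural-language comment on execution
    quality_score: float = 0.0                          # overall action quality score

# Example record with made-up values.
example = SingleActionAnnotation(
    action_label="squat",
    boundary=(12.4, 18.9),
    sub_steps=[SubStep("descend", (12.4, 15.1)), SubStep("ascend", (15.1, 18.9))],
    keypoint_verification={"knees track over toes": True, "back kept straight": False},
    comment="Depth is good, but the back rounds near the bottom of the squat.",
    quality_score=7.5,
)
```

A record of this shape covers the three dimensions the dataset targets: the label answers "what", the two-level boundaries answer "when", and the keypoint checks, comment, and score answer "how well".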