**EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding**
**Authors:** Yuan-Ming Li, Wei-Jin Huang, An-Lan Wang, Ling-An Zeng, Jing-Ke Meng, and Wei-Shi Zheng
**Institution:** Sun Yat-sen University; Peng Cheng Laboratory; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education
**Abstract:**
EgoExo-Fitness is a new dataset for full-body action understanding, featuring synchronized egocentric and exocentric videos of fitness activities. Unlike existing datasets that primarily use exocentric cameras, EgoExo-Fitness provides rich annotations, including two-level temporal boundaries and interpretable action judgment. The dataset includes 1276 cross-view action sequence videos, spanning 32 hours, with over 6000 single fitness actions. It introduces novel annotations such as technical keypoint verification, natural language comments, and action quality scores. The dataset is used to benchmark five tasks: action classification, action localization, cross-view sequence verification, cross-view skill determination, and guidance-based execution verification. The results highlight the challenges and opportunities in egocentric and exocentric full-body action understanding.
**Keywords:**
Egocentric video dataset, Full-body action understanding, Fitness practising, Interpretable action judgment
**Introduction:**
The introduction presents EgoExo-Fitness, a dataset of synchronized egocentric and exocentric fitness videos for studying full-body action understanding. It outlines the dataset's scale (1276 cross-view action sequence videos, 32 hours, over 6000 single fitness actions), its annotations for action localization and interpretable action judgment (technical keypoint verification, natural language comments, and action quality scores), and the five benchmarked tasks: action classification, action localization, cross-view sequence verification, cross-view skill determination, and guidance-based execution verification.
**Contributions:**
1. EgoExo-Fitness, a new dataset of synchronized egocentric and exocentric videos for full-body action understanding.
2. Rich annotations for interpretable action judgment, including technical keypoint verification, natural language comments, and action quality scores.
3. Benchmarks on five relevant tasks: action classification, action localization, cross-view sequence verification, cross-view skill determination, and guidance-based execution verification.
**Related Work:**
The paper revisits current datasets for full-body action understanding and egocentric video understanding, highlighting the differences between EgoExo-Fitness and existing datasets. It also discusses relevant tasks such as action classification, sequence verification, and action assessment.
**Dataset Details:**
- **Recording System:** A multi-camera headset captures egocentric videos, while exocentric cameras are placed at the participant's front, left-front, and right-front sides.
- **Action Sequence and Recording Protocols:** 12 types of fitness actions are selected, and 86 action sequences are defined by combining 3 to 6 different actions; a small sketch illustrating this composition constraint appears after this list.
- **Annotations:** Two-level temporal boundaries, technical keypoint verification, natural language comments, and action quality scores, supporting interpretable action judgment; a hypothetical record layout is sketched below.
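To make the annotation fields concrete, here is a minimal sketch of how one annotated single action could be represented in Python. The schema, field names, example values, and the assumption that the two boundary levels correspond to sequence-level and single-action-level timestamps are illustrative guesses, not the dataset's released format.

```python
# Hypothetical record for one annotated single action in EgoExo-Fitness.
# Field names, units, and values are illustrative assumptions, not the released schema.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class KeypointCheck:
    keypoint: str        # a technical keypoint (execution criterion) in text
    satisfied: bool      # verification result: does the execution meet this keypoint?

@dataclass
class SingleActionAnnotation:
    action_label: str                        # one of the 12 fitness action types
    sequence_boundary: Tuple[float, float]   # coarse boundary of the enclosing sequence (seconds)
    action_boundary: Tuple[float, float]     # fine boundary of this single action (seconds)
    keypoint_checks: List[KeypointCheck] = field(default_factory=list)
    comment: str = ""                        # natural-language comment on the execution
    quality_score: float = 0.0               # action quality score

# Example instance with made-up values.
example = SingleActionAnnotation(
    action_label="squat",
    sequence_boundary=(0.0, 95.2),
    action_boundary=(12.4, 31.8),
    keypoint_checks=[KeypointCheck("Keep the back straight", True)],
    comment="Depth is slightly shallow on the last repetitions.",
    quality_score=7.5,
)
print(example.action_label, example.quality_score)
```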
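The composition constraint from the recording protocols (3 to 6 distinct actions per sequence, drawn from a pool of 12) can also be illustrated with a short sketch. The action names are placeholders and the random sampling is only for illustration; the paper's 86 sequences are predefined by the authors rather than sampled.

```python
# Illustrative sketch of the sequence-composition constraint: each recorded sequence
# combines 3 to 6 distinct actions from a pool of 12. Random sampling here only
# demonstrates the constraint, not the paper's actual sequence design.
import random

ACTION_POOL = [f"action_{i:02d}" for i in range(1, 13)]  # placeholder names for the 12 actions

def sample_sequence(rng: random.Random, min_len: int = 3, max_len: int = 6) -> list:
    """Draw one sequence of 3 to 6 distinct fitness actions."""
    length = rng.randint(min_len, max_len)
    return rng.sample(ACTION_POOL, k=length)

rng = random.Random(0)
for _ in range(5):
    print(" -> ".join(sample_sequence(rng)))
```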