2024 | Dan Guo, Kun Li, Bin Hu, Yan Zhang, Meng Wang
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications
This paper introduces a new micro-action dataset, Micro-action-52 (MA-52), and proposes a benchmark network, Micro-Action Network (MANet), for micro-action recognition (MAR). MA-52 contains 52 micro-action categories and 22,422 video instances collected from psychological interviews with 205 participants. The dataset covers whole-body movements, including gestures and upper- and lower-limb actions, and thus provides a comprehensive view of micro-action cues. MANet integrates squeeze-and-excitation (SE) and temporal shift module (TSM) components into a ResNet architecture to model the spatiotemporal characteristics of micro-actions, and a joint-embedding loss is designed to enhance semantic matching between videos and action labels. The dataset and source code are available at https://github.com/VUT-HFUT/Micro-Action.
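To make the two architectural ingredients concrete, the following PyTorch sketch shows a residual block that applies a temporal shift to frame features and an SE channel gate before the skip connection. The shapes, shift ratio (fold_div), reduction factor, and block layout are illustrative assumptions, not the authors' exact MANet configuration.

```python
import torch
import torch.nn as nn


def temporal_shift(x: torch.Tensor, n_segments: int, fold_div: int = 8) -> torch.Tensor:
    """Shift a fraction of channels forward/backward along the time axis.

    x: (batch * n_segments, channels, height, width)
    """
    nt, c, h, w = x.shape
    n_batch = nt // n_segments
    x = x.view(n_batch, n_segments, c, h, w)
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                   # one fold of channels shifted to the past
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]   # another fold shifted to the future
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # remaining channels untouched
    return out.view(nt, c, h, w)


class SEGate(nn.Module):
    """Squeeze-and-excitation channel reweighting."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.fc(x.mean(dim=(2, 3)))               # squeeze: global average pooling over space
        return x * w.view(x.size(0), -1, 1, 1)        # excite: per-channel rescaling


class ShiftSEBlock(nn.Module):
    """Illustrative residual block: temporal shift -> 3x3 conv -> SE gate -> skip connection."""

    def __init__(self, channels: int, n_segments: int):
        super().__init__()
        self.n_segments = n_segments
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.se = SEGate(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = temporal_shift(x, self.n_segments)
        out = torch.relu(self.bn(self.conv(out)))
        return torch.relu(x + self.se(out))
```

In this sketch, the temporal shift exchanges a small portion of channels between neighboring frames so a 2D convolution can capture short-range motion, while the SE gate emphasizes channels that respond to subtle body-part movements.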
Micro-action recognition aims to detect and distinguish ephemeral body movements, typically occurring within a temporal span of 1/25s to 1/3s. The main challenges of MAR are subtle visual changes, small inter-class differences, and a long-tailed category distribution. Existing generic action recognition methods target coarse-grained categories, and although recent efforts address fine-grained recognition, they remain limited to specific scenarios. The MA-52 dataset addresses these limitations by providing a whole-body dataset with a large number of video instances, diverse participants, and a wide range of body parts and action categories.
The proposed MANet outperforms existing methods by integrating SE and TSM into the ResNet backbone and using a joint-embedding loss to constrain the semantic distance between video features and action labels. The dataset is also applied to emotion analysis, demonstrating its practical value: the micro-actions captured in MA-52 serve as indicators of individual characteristics, emotional states, thought processes, and intentions.
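The summary describes the joint-embedding loss only at a high level, so the sketch below shows one plausible form: project video features and action-label (text) embeddings into a shared space and apply a symmetric contrastive objective that pulls matching pairs together and pushes mismatched pairs apart. The projection sizes, temperature, and symmetric cross-entropy formulation are assumptions; the paper's exact loss may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointEmbeddingLoss(nn.Module):
    """Illustrative video-label joint-embedding objective (not the paper's exact loss)."""

    def __init__(self, video_dim: int, label_dim: int, embed_dim: int = 256, temperature: float = 0.07):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, embed_dim)   # map video features to the shared space
        self.label_proj = nn.Linear(label_dim, embed_dim)   # map label embeddings to the shared space
        self.temperature = temperature

    def forward(self, video_feats: torch.Tensor, label_feats: torch.Tensor) -> torch.Tensor:
        # video_feats: (batch, video_dim); label_feats: (batch, label_dim); row i is a matching pair
        v = F.normalize(self.video_proj(video_feats), dim=-1)
        t = F.normalize(self.label_proj(label_feats), dim=-1)
        logits = v @ t.t() / self.temperature               # cosine similarities between all pairs
        targets = torch.arange(v.size(0), device=v.device)
        # symmetric cross-entropy: match videos to labels and labels to videos
        return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```

Such a loss would complement the standard classification objective by keeping semantically related action labels close to their video representations in the shared embedding space.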