26 Apr 2019 | Maosen Li, Siheng Chen, Xu Chen, Ya Zhang, Yanfeng Wang, and Qi Tian
The paper introduces the Actional-Structural Graph Convolutional Network (AS-GCN) for skeleton-based action recognition. It addresses the limitations of traditional methods that rely on fixed skeleton graphs, which capture only local physical dependencies among joints. To capture richer dependencies, the authors propose an encoder-decoder structure called the A-link inference module (AIM) to infer action-specific latent dependencies (actional links) directly from actions. They also extend the existing skeleton graphs to represent higher-order dependencies (structural links). The generalized skeleton graphs are then fed into the AS-GCN, which stacks actional-structural graph convolutions and temporal convolutions to learn both spatial and temporal features for action recognition. Additionally, a future pose prediction head is added to capture more detailed action patterns through self-supervision. The proposed AS-GCN is evaluated on two large-scale datasets, NTU-RGB+D and Kinetics, and shows significant improvements over state-of-the-art methods. The code for AS-GCN is available at https://github.com/limaosen0/AS-GCN.
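
To make the core building block more concrete, here is a minimal PyTorch sketch of one actional-structural graph convolution followed by a temporal convolution. It is an illustration under stated assumptions, not the authors' exact implementation: the class and function names (`ActionalStructuralGCBlock`, `structural_adjacency`), the layer sizes, and the use of a freely learned adjacency `A_act` as a stand-in for the AIM-inferred actional links are all assumptions made for this sketch; see the linked repository for the real code.

```python
# Minimal sketch: structural links as normalized powers of the skeleton adjacency,
# plus a learnable stand-in for the AIM-inferred actional links, mixed by a
# graph convolution and followed by a temporal convolution over frames.
import torch
import torch.nn as nn


def structural_adjacency(A, max_hop):
    """Stack row-normalized powers of the skeleton adjacency A (V x V) as structural links."""
    A_hat = A + torch.eye(A.size(0))                       # add self-loops
    deg_inv = torch.diag(1.0 / A_hat.sum(dim=1))
    A_norm = deg_inv @ A_hat                               # row-normalize
    powers = [torch.matrix_power(A_norm, k) for k in range(1, max_hop + 1)]
    return torch.stack(powers)                             # (max_hop, V, V)


class ActionalStructuralGCBlock(nn.Module):
    def __init__(self, in_channels, out_channels, A_skeleton, max_hop=3, t_kernel=9):
        super().__init__()
        # Structural links: fixed higher-order polynomials of the skeleton graph.
        self.register_buffer("A_struct", structural_adjacency(A_skeleton, max_hop))
        V = A_skeleton.size(0)
        # Actional links: learnable adjacency used here only as a placeholder;
        # in AS-GCN these links are inferred per action by the AIM encoder-decoder.
        self.A_act = nn.Parameter(torch.rand(1, V, V) * 1e-3)
        n_graphs = max_hop + 1
        # A 1x1 conv produces one feature map per graph; the adjacencies then mix joints.
        self.conv = nn.Conv2d(in_channels, out_channels * n_graphs, kernel_size=1)
        # Temporal convolution along the frame axis captures motion dynamics.
        self.tcn = nn.Sequential(
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, (t_kernel, 1), padding=(t_kernel // 2, 0)),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # x: (N, C_in, T, V) -- batch, channels, frames, joints
        N, _, T, V = x.size()
        A = torch.cat([self.A_struct, torch.softmax(self.A_act, dim=-1)], dim=0)  # (K, V, V)
        K = A.size(0)
        y = self.conv(x).view(N, K, -1, T, V)               # (N, K, C_out, T, V)
        y = torch.einsum("nkctv,kvw->nctw", y, A)           # aggregate over both link types
        return self.tcn(y)


# Usage on random data: 25 joints (as in NTU-RGB+D), 3 input channels (x, y, z coordinates).
A = torch.zeros(25, 25)  # fill with the dataset's bone connectivity in practice
block = ActionalStructuralGCBlock(3, 64, A)
out = block(torch.randn(8, 3, 300, 25))                     # -> (8, 64, 300, 25)
```

In the full model, several such spatial-temporal blocks are stacked, with the recognition head and the future pose prediction head sharing the learned features.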