Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

25 Jan 2018 | Sijie Yan, Yuanjun Xiong, Dahua Lin
This paper introduces a novel model called Spatial-Temporal Graph Convolutional Networks (ST-GCN) for skeleton-based action recognition. ST-GCN addresses the limitations of traditional methods that rely on hand-crafted parts or traversal rules, which often result in limited expressive power and difficulty in generalization. Instead, ST-GCN automatically learns both spatial and temporal patterns from data, yielding greater expressive power and stronger generalization. The model operates on dynamic skeletons represented as sequences of joint coordinates: each joint corresponds to a node in a graph, and edges encode both spatial relationships (bones connecting joints within a frame) and temporal relationships (the same joint across consecutive frames). Spatial-temporal graph convolution operations integrate information along both dimensions, enabling the model to capture hierarchical and local properties of skeleton sequences. The paper also proposes several partitioning strategies for designing graph convolution kernels, inspired by image models, to improve performance. Extensive experiments on two large datasets, Kinetics and NTU RGB+D, show that ST-GCN outperforms existing methods while requiring less manual design effort. The code and models are publicly available for further research.
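To make the core idea concrete, below is a minimal sketch of one spatial-temporal graph convolution block in PyTorch. It is an illustrative assumption, not the authors' released implementation: a fixed, normalized skeleton adjacency matrix aggregates features across neighboring joints (the spatial step), followed by a 1-D convolution along the frame axis (the temporal step). The paper's actual partitioning strategies would correspond to using several adjacency matrices, each with its own learnable weights; the class and variable names here are hypothetical.

```python
# Hypothetical sketch of a spatial-temporal graph convolution block (not the official ST-GCN code).
import torch
import torch.nn as nn


def normalize_adjacency(A: torch.Tensor) -> torch.Tensor:
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + torch.eye(A.size(0))
    d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
    D_inv_sqrt = torch.diag(d_inv_sqrt)
    return D_inv_sqrt @ A_hat @ D_inv_sqrt


class STGCNBlock(nn.Module):
    """Spatial graph convolution over joints, then temporal convolution over frames."""

    def __init__(self, in_channels: int, out_channels: int, A: torch.Tensor, t_kernel: int = 9):
        super().__init__()
        # Fixed skeleton graph: joints are nodes, bones are spatial edges.
        self.register_buffer("A", normalize_adjacency(A))
        # 1x1 convolution holds the learnable weights of the spatial graph convolution.
        self.spatial = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # Temporal convolution links the same joint across neighboring frames.
        self.temporal = nn.Conv2d(
            out_channels, out_channels,
            kernel_size=(t_kernel, 1), padding=((t_kernel - 1) // 2, 0),
        )
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, joints)
        x = self.spatial(x)
        # Aggregate each joint's features from its graph neighbors.
        x = torch.einsum("nctv,vw->nctw", x, self.A)
        x = self.temporal(self.relu(x))
        return self.relu(x)


if __name__ == "__main__":
    # Toy skeleton: 5 joints in a chain, 3 input channels (x, y, confidence), 16 frames.
    A = torch.zeros(5, 5)
    for i in range(4):
        A[i, i + 1] = A[i + 1, i] = 1.0
    block = STGCNBlock(in_channels=3, out_channels=8, A=A)
    clip = torch.randn(2, 3, 16, 5)   # (batch, channels, frames, joints)
    print(block(clip).shape)          # torch.Size([2, 8, 16, 5])
```

Stacking several such blocks and finishing with global pooling and a classifier gives the general shape of a skeleton-based action recognition network; the full model described in the paper additionally uses multiple kernel partitions and deeper channel configurations.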