24 Jul 2016 | Jun Liu, Amir Shahroudy, Dong Xu, and Gang Wang
This paper proposes a spatio-temporal long short-term memory (ST-LSTM) network for 3D human action recognition. The method extends traditional LSTM-based learning from the temporal domain to both the temporal and spatial domains in order to analyze the hidden sources of action-related information. Inspired by the structure of the human skeleton, a tree-structured traversal method is introduced to better model spatial dependencies. To handle noise and occlusion in 3D skeleton data, a new gating mechanism, called the "trust gate," is added to the LSTM to assess the reliability of the input at each spatio-temporal step, allowing the network to adjust how much that input contributes to the long-term context stored in the memory cell. The proposed method achieves state-of-the-art performance on four benchmark datasets for 3D human action analysis.
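To make the trust-gate idea concrete, the following is a minimal NumPy sketch of one ST-LSTM step in which a trust score, obtained by comparing the context's prediction of the current input against the input itself, scales how strongly that input is written into the memory cell. The weight names, the specific gating arithmetic, and the way the spatial and temporal cell states are blended are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class STLSTMCellSketch:
    """Simplified ST-LSTM step with a trust gate (illustrative, not the paper's exact equations).

    Each unit receives the joint feature x, the hidden/cell state of the previous
    joint in the spatial traversal (h_s, c_s), and of the same joint in the
    previous frame (h_t, c_t).
    """

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        concat_dim = input_dim + 2 * hidden_dim
        # One weight matrix per gate: input, spatial forget, temporal forget,
        # output, and candidate update.
        self.W = {k: rng.normal(0.0, 0.1, (hidden_dim, concat_dim))
                  for k in ("i", "fs", "ft", "o", "u")}
        # Weights for predicting the (projected) input from the context alone.
        self.W_p = rng.normal(0.0, 0.1, (hidden_dim, 2 * hidden_dim))
        self.W_x = rng.normal(0.0, 0.1, (hidden_dim, input_dim))
        self.lam = 0.5  # sharpness of the trust function (assumed value)

    def step(self, x, h_s, c_s, h_t, c_t):
        z = np.concatenate([x, h_s, h_t])
        i  = sigmoid(self.W["i"]  @ z)   # input gate
        fs = sigmoid(self.W["fs"] @ z)   # forget gate over the spatial context
        ft = sigmoid(self.W["ft"] @ z)   # forget gate over the temporal context
        o  = sigmoid(self.W["o"]  @ z)   # output gate
        u  = np.tanh(self.W["u"]  @ z)   # candidate cell update

        # Trust gate: compare the context's prediction of the input with the
        # actual (projected) input; a large mismatch yields low trust.
        p  = np.tanh(self.W_p @ np.concatenate([h_s, h_t]))
        x_ = np.tanh(self.W_x @ x)
        trust = np.exp(-self.lam * (x_ - p) ** 2)   # elementwise, in (0, 1]

        # Low trust suppresses the new input and leans on the stored context.
        c = trust * i * u + (1.0 - trust) * (fs * c_s + ft * c_t)
        h = o * np.tanh(c)
        return h, c

if __name__ == "__main__":
    cell = STLSTMCellSketch(input_dim=3, hidden_dim=8)
    x = np.array([0.1, 0.4, -0.2])      # one 3D joint coordinate
    h0 = c0 = np.zeros(8)
    h, c = cell.step(x, h0, c0, h0, c0)
    print(h.shape, c.shape)             # (8,) (8,)
```

The point of the soft gate is that a noisy or occluded joint is not discarded outright; it simply contributes less to the long-term memory than the spatial and temporal context already stored in the cell.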
The ST-LSTM model simultaneously captures spatial dependencies among joints and temporal dependencies across frames: each joint receives contextual information from its neighboring joints and from previous frames. A tree-based traversal method explores the kinematic relationships between joints for better spatial dependency modeling, while the trust gate mechanism lets the network discount unreliable inputs, improving robustness to noise and occlusion. The model is evaluated on four benchmark datasets (NTU RGB+D, SBU Interaction, UT-Kinect, and Berkeley MHAD) and outperforms existing state-of-the-art methods in both accuracy and robustness; additional experiments with noisy input on the MSR Action3D dataset show that the method handles unreliable data effectively. The approach is general and can be applied to other problems involving unreliable input, and the results validate the effectiveness of each of the proposed contributions.
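The tree-based traversal can be pictured with a short sketch: a depth-first walk over the skeleton tree that records a joint both when it is first reached and each time the walk returns to it from a child, so consecutive joints in the resulting sequence are always physically connected and context can flow both away from and back toward the body center. The joint names and tree layout below are illustrative and do not reproduce the exact skeleton definition used in the paper.

```python
# Illustrative skeleton tree: parent -> children (not the paper's exact joint set).
SKELETON = {
    "torso": ["neck", "left_hip", "right_hip"],
    "neck": ["head", "left_shoulder", "right_shoulder"],
    "left_shoulder": ["left_elbow"], "left_elbow": ["left_hand"],
    "right_shoulder": ["right_elbow"], "right_elbow": ["right_hand"],
    "left_hip": ["left_knee"], "left_knee": ["left_foot"],
    "right_hip": ["right_knee"], "right_knee": ["right_foot"],
    "head": [], "left_hand": [], "right_hand": [],
    "left_foot": [], "right_foot": [],
}

def tree_traversal(joint="torso"):
    """Walk every skeleton edge down and back up, starting from the root joint."""
    order = [joint]
    for child in SKELETON[joint]:
        order.extend(tree_traversal(child))  # descend into the child's subtree
        order.append(joint)                  # record the return to the parent
    return order

if __name__ == "__main__":
    seq = tree_traversal()
    print(len(seq), "steps:", " -> ".join(seq[:9]), "...")
```

Feeding joints to the spatial dimension of the ST-LSTM in this order means each step's "previous joint" is a true kinematic neighbor rather than an arbitrary neighbor in a flat joint-index list.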