18 April 2024 | Zengzhao Chen, Wenkai Huang, Hai Liu, Zhuo Wang, Yuqun Wen and Shengming Wang
This paper proposes ST-TGR, a teaching gesture recognition algorithm based on skeleton keypoints, to address the challenge of single-target dynamic gesture recognition in multi-person teaching scenarios. The algorithm uses human pose estimation to extract the coordinates of the teacher's skeleton keypoints from classroom teaching videos, then feeds the recognized skeleton sequence into the MoGRU action recognition network for gesture classification. The MoGRU module learns the spatio-temporal representation of target actions by stacking multi-scale bidirectional gated recurrent units (BiGRU) and applying an improved attention mechanism. The algorithm is validated on the NTU RGB+D 60, UT-Kinect Action3D, SBU Kinect Interaction, and Florence 3D datasets, where it outperforms most existing baseline models in both recognition accuracy and speed. Additionally, a teaching gesture action dataset (TGAD) is constructed from a real classroom teaching scenario; it contains four types of teaching gestures, totaling 400 samples, on which the proposed method achieves 93.5% recognition accuracy. Comparisons with existing baseline models and ablation experiments evaluate the effectiveness of the model structure, and the results show that the model generalizes well enough to recognize actions beyond teaching gestures. The algorithm is expected to improve teaching quality and enhance the student learning experience in education.
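To make the pipeline concrete, the sketch below shows the general shape of a bidirectional-GRU sequence classifier with attention pooling over skeleton-keypoint frames. It is a minimal NumPy illustration of the technique the abstract names, not the authors' MoGRU implementation: the hidden size, the 17-keypoint input layout, and all parameter names (`Wa`, `va`, `Wc`) are illustrative assumptions, and the weights are random, so the output is an untrained probability vector.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_sequence(xs, W, U, b):
    """Run a GRU over xs of shape (T, D); returns hidden states (T, H).
    W: (3H, D), U: (3H, H), b: (3H,), gates stacked as [z, r, n]."""
    H = U.shape[1]
    h = np.zeros(H)
    out = []
    for x in xs:
        gx = W @ x + b                        # input projections for all gates
        gh = U @ h                            # recurrent projections
        z = sigmoid(gx[:H] + gh[:H])          # update gate
        r = sigmoid(gx[H:2*H] + gh[H:2*H])    # reset gate
        n = np.tanh(gx[2*H:] + r * gh[2*H:])  # candidate state
        h = (1 - z) * n + z * h
        out.append(h)
    return np.stack(out)

def bigru_attention_classify(seq, params):
    """BiGRU over a keypoint sequence, additive attention pooling, softmax head."""
    fwd = gru_sequence(seq, *params["fwd"])
    bwd = gru_sequence(seq[::-1], *params["bwd"])[::-1]
    h = np.concatenate([fwd, bwd], axis=1)             # (T, 2H) per-frame features
    scores = np.tanh(h @ params["Wa"]) @ params["va"]  # (T,) attention scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                               # attention weights
    ctx = alpha @ h                                    # weighted sequence summary
    logits = params["Wc"] @ ctx
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()                         # class probabilities

# Toy shapes: 30 frames, 17 keypoints x 2 coords = 34-dim input, 4 gesture classes.
rng = np.random.default_rng(0)
D, H, C, T = 34, 16, 4, 30
mk = lambda shape: rng.standard_normal(shape) * 0.1
params = {
    "fwd": (mk((3*H, D)), mk((3*H, H)), np.zeros(3*H)),
    "bwd": (mk((3*H, D)), mk((3*H, H)), np.zeros(3*H)),
    "Wa": mk((2*H, 2*H)),
    "va": mk(2*H),
    "Wc": mk((C, 2*H)),
}
seq = rng.standard_normal((T, D))
probs = bigru_attention_classify(seq, params)
print(probs.shape)
```

The "multi-scale" aspect of MoGRU would stack several such BiGRU layers over differently strided views of the sequence; this sketch shows a single scale to keep the mechanics of the gated recurrence and attention pooling visible.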