The Kinetics Human Action Video Dataset


19 May 2017 | Will Kay, João Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman, Andrew Zisserman
The Kinetics Human Action Video Dataset is a large-scale dataset containing 400 human action classes, each with at least 400 video clips. The classes cover a wide range of human actions, including human-object interactions (e.g., playing instruments) and human-human interactions (e.g., shaking hands). Each clip is approximately 10 seconds long and taken from a different YouTube video, giving the collection considerable diversity. The dataset is designed for human action classification rather than temporal localization, and the clips include sound, making the dataset suitable for multi-modal analysis.

Kinetics was created to address the limitations of earlier human action datasets such as HMDB-51 and UCF-101, which are too small or lack sufficient variation to train modern deep learning models. It comprises 306,245 videos divided into three splits: training, validation, and test. The dataset was collected through a multi-step process: identifying candidate action classes, selecting candidate clips from YouTube, and manually labeling the clips with workers on Amazon Mechanical Turk.

The dataset was also analyzed for potential biases, including gender, age, and race imbalances. While some classes showed gender imbalance, the analysis found little evidence of classifier bias in action classification.

The paper further uses Kinetics to evaluate several neural network architectures, including ConvNets with LSTMs, two-stream networks, and 3D ConvNets. These models were tested on Kinetics and compared with their performance on UCF-101 and HMDB-51.
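To make the 3D-ConvNet baseline family concrete, here is a minimal sketch in TensorFlow/Keras. The clip shape (16 frames of 112x112 RGB) and the layer widths are illustrative assumptions, not the paper's exact configuration:

```python
import tensorflow as tf

NUM_CLASSES = 400  # Kinetics defines 400 action classes


def build_3d_convnet(frames=16, height=112, width=112, channels=3):
    """A small 3D ConvNet for clip-level action classification.

    The depth and filter counts are illustrative; the paper's
    baselines use their own (larger) configurations.
    """
    inputs = tf.keras.Input(shape=(frames, height, width, channels))
    x = inputs
    for filters in (64, 128, 256):
        # 3D convolutions learn spatio-temporal features jointly,
        # pooling over time as well as space.
        x = tf.keras.layers.Conv3D(filters, kernel_size=3,
                                   padding="same", activation="relu")(x)
        x = tf.keras.layers.MaxPool3D(pool_size=2)(x)
    x = tf.keras.layers.GlobalAveragePooling3D()(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


model = build_3d_convnet()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

By contrast, a ConvNet+LSTM baseline applies a 2D ConvNet to each frame and feeds the per-frame features to an LSTM, while a two-stream network adds a parallel pathway over optical flow; the 3D ConvNet instead convolves over space and time in a single network.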
The Kinetics dataset provides a valuable resource for research in human action classification, offering a large and diverse set of video clips for training and evaluating deep learning models. The dataset is publicly available, together with trained baseline models in TensorFlow, for further research and development.
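For readers who want to work with the released annotations, the sketch below groups a Kinetics-style CSV of clip records into splits. The column names used here (label, youtube_id, time_start, time_end, split) follow the commonly distributed annotation files, but treat them and the filename as assumptions and check the actual release:

```python
import csv
from collections import defaultdict


def load_annotations(path):
    """Group Kinetics-style clip annotations by dataset split.

    Assumes a CSV with columns label, youtube_id, time_start,
    time_end, and split; verify against the actual release.
    """
    splits = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            splits[row["split"]].append({
                "label": row["label"],
                "youtube_id": row["youtube_id"],
                # Each clip is a ~10 s segment of the source video.
                "start": float(row["time_start"]),
                "end": float(row["time_end"]),
            })
    return splits


splits = load_annotations("kinetics_400.csv")  # hypothetical filename
for name, clips in splits.items():
    print(f"{name}: {len(clips)} clips")
```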