Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

12 Feb 2018 | João Carreira† and Andrew Zisserman†,*
This paper addresses action recognition in videos by re-evaluating state-of-the-art architectures on the new Kinetics Human Action Video dataset, which contains two orders of magnitude more data than existing datasets such as UCF-101 and HMDB-51. The authors introduce a new model, the Two-Stream Inflated 3D ConvNet (I3D), which reuses successful ImageNet architecture designs and their parameters by inflating the filters and pooling kernels of 2D ConvNets into 3D. After pre-training on Kinetics, I3D models reach 80.9% on HMDB-51 and 98.0% on UCF-101, a considerable improvement in action classification. The paper also examines how well features learned on Kinetics transfer to these smaller benchmarks. The results show that pre-training on Kinetics improves the performance of video models, which underlines the value of large-scale video datasets for advancing action recognition.
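The inflation step mentioned above can be illustrated with a small sketch. It assumes the usual bootstrapping scheme for inflated filters: the 2D weights are repeated along a new time axis and divided by the temporal depth, so that a video made of one repeated frame gives the same filter response as the original 2D filter on that frame. The function names, shapes, and random test data here are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


def inflate_2d_kernel(kernel_2d: np.ndarray, time_depth: int) -> np.ndarray:
    """Inflate a (height, width) kernel into a (time_depth, height, width) kernel.

    The 2D weights are copied along a new leading time axis and rescaled by
    1 / time_depth (assumed bootstrapping scheme, see the lead-in).
    """
    if time_depth < 1:
        raise ValueError("time_depth must be at least 1")
    kernel_3d = np.repeat(kernel_2d[np.newaxis, ...], time_depth, axis=0)
    return kernel_3d / time_depth


def filter_response(patch: np.ndarray, kernel: np.ndarray) -> float:
    """Response of a filter at one location: elementwise product, then sum."""
    return float(np.sum(patch * kernel))


# Hypothetical 3x3 filter and image patch, used only to check the property.
rng = np.random.default_rng(0)
kernel_2d = rng.standard_normal((3, 3))
frame_patch = rng.standard_normal((3, 3))

time_depth = 3
kernel_3d = inflate_2d_kernel(kernel_2d, time_depth)

# A "static video": the same patch repeated over time_depth frames.
static_clip = np.repeat(frame_patch[np.newaxis, ...], time_depth, axis=0)

response_2d = filter_response(frame_patch, kernel_2d)
response_3d = filter_response(static_clip, kernel_3d)
responses_match = bool(np.isclose(response_2d, response_3d))
```

Here `responses_match` checks that the inflated filter reproduces the 2D response on a static clip, which is what allows ImageNet-trained weights to serve as a starting point for the 3D model. Real networks apply the same idea per input and output channel.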