YouTube-8M: A Large-Scale Video Classification Benchmark


27 Sep 2016 | Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, Sudheendra Vijayanarasimhan
The paper introduces YouTube-8M, a large-scale multi-label video classification dataset containing approximately 8 million videos, or about 500,000 hours of video. The dataset is annotated with a vocabulary of 4,800 visual entities, with labels derived from YouTube's video annotation system, which tags each video with its main topics. The authors pre-process the videos with a deep CNN (Inception) to extract frame-level features, which are then compressed and made available for download. With frame-level features for over 1.9 billion video frames across 8 million videos, it is, according to the authors, the largest public multi-label video dataset.
The paper presents several classification models trained on the dataset and evaluates them using popular metrics. Despite the size of the dataset, some models converge in less than a day on a single machine using TensorFlow. The authors show that representations pre-trained on YouTube-8M generalize to other datasets such as Sports-1M and ActivityNet, achieving state-of-the-art results on ActivityNet, where mAP improves from 53.8% to 77.6%. The dataset and experiments aim to advance video understanding and representation learning by giving researchers a resource for exploring new approaches at an unprecedented scale.
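The mAP figure quoted above is a mean of per-class average precision values. The summary does not spell out the exact evaluation protocol, so the following is only a minimal sketch of one common definition (precision averaged over the ranks at which positives appear, then averaged over classes). The function names and the toy scores are hypothetical and are not taken from the paper or its code.

```python
from typing import List, Optional, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> Optional[float]:
    """Average precision for one class (assumed definition, not the paper's code).

    Videos are ranked by descending score; precision is accumulated at each
    rank where a positive video appears, then divided by the positive count.
    Returns None when the class has no positive videos.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, y in ranked if y)
    if n_pos == 0:
        return None
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


def mean_average_precision(score_rows: List[Sequence[float]],
                           label_rows: List[Sequence[int]]) -> float:
    """Mean of per-class AP; rows are videos, columns are classes."""
    n_classes = len(score_rows[0])
    aps = []
    for c in range(n_classes):
        ap = average_precision([row[c] for row in score_rows],
                               [row[c] for row in label_rows])
        if ap is not None:  # skip classes with no positives
            aps.append(ap)
    if not aps:
        raise ValueError("no class has a positive example")
    return sum(aps) / len(aps)


# Toy data (hypothetical): 4 videos, 2 classes.
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_map = mean_average_precision(toy_scores, toy_labels)
```

Because each video can carry several labels, the metric is computed per class and then averaged, so rare classes count as much as frequent ones. Published evaluations may differ in details such as truncating the ranked list, which this sketch does not do.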