VoxCeleb: a large-scale speaker identification dataset

30 May 2018 | Arsha Nagrani†, Joon Son Chung†, Andrew Zisserman
The paper introduces VoxCeleb, a large-scale dataset for speaker identification and verification collected from open-source media. The authors propose an automated pipeline based on computer vision techniques: it obtains videos from YouTube, performs active speaker verification with a two-stream synchronization Convolutional Neural Network (CNN), and confirms the speaker's identity with facial recognition. The pipeline is used to curate VoxCeleb, which contains hundreds of thousands of real-world utterances for over 1,000 celebrities. The second contribution is the application and comparison of various state-of-the-art speaker identification techniques on the dataset. A CNN-based architecture performs best on both the identification and the verification task, outperforming traditional state-of-the-art methods. The dataset is available for download, and the paper provides detailed statistics along with a description of the experimental setup.
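The two filtering stages of the pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `FaceTrack` record, the score fields, and the threshold values are all hypothetical, and the scores are assumed to have been produced already by a synchronization model and a face recognition model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FaceTrack:
    """One candidate face track cut from a video (hypothetical record)."""
    video_id: str
    sync_confidence: float   # assumed output of an audio-visual sync model
    face_match_score: float  # assumed output of a face recognition model


def keep_track(track: FaceTrack,
               sync_threshold: float = 0.5,
               face_threshold: float = 0.5) -> bool:
    """Keep a track only if it passes both checks.

    Stage 1 (active speaker verification): the visible face must be the
    one speaking, approximated here by a sync confidence threshold.
    Stage 2 (identity confirmation): the face must match the target
    person, approximated here by a face match threshold.
    Both thresholds are illustrative values, not taken from the paper.
    """
    is_speaking = track.sync_confidence >= sync_threshold
    is_target = track.face_match_score >= face_threshold
    return is_speaking and is_target


def curate_utterances(tracks: List[FaceTrack],
                      sync_threshold: float = 0.5,
                      face_threshold: float = 0.5) -> List[FaceTrack]:
    """Return the tracks that survive both filtering stages, in order."""
    return [t for t in tracks
            if keep_track(t, sync_threshold, face_threshold)]


if __name__ == "__main__":
    candidates = [
        FaceTrack("a", sync_confidence=0.9, face_match_score=0.8),
        FaceTrack("b", sync_confidence=0.2, face_match_score=0.9),  # not speaking
        FaceTrack("c", sync_confidence=0.7, face_match_score=0.1),  # wrong person
    ]
    for track in curate_utterances(candidates):
        print(track.video_id)
```

Requiring both checks to pass favors precision over recall: a track is discarded when either the speaker or the identity is in doubt, which suits a dataset that is assembled without manual annotation.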