MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition

27 Jul 2016 | Yandong Guo, Lei Zhang, Yuxiao Hu, Xiaodong He, Jianfeng Gao
This paper introduces the MS-Celeb-1M dataset and benchmark for large-scale face recognition. The task is to recognize one million celebrities from their face images and link each of them to the corresponding entity key in a knowledge base. The dataset contains 10 million images, making it the largest publicly available dataset for this task. The benchmark provides a carefully labeled measurement set and a training set covering the top 100,000 celebrities, selected by how frequently they appear on the web. The measurement set includes both popular and less popular celebrities, so the evaluation is not limited to the most famous people. In the training data, face regions are cropped and aligned, and thumbnails of the original images are also provided. The benchmark task is designed to address two key gaps in current face recognition: disambiguation and scale. Because the task targets celebrities rather than a pre-selected group of people, it aligns with public interest and enables real-world applications. The benchmark also defines a challenging evaluation protocol that measures both precision and coverage. As a baseline, the paper reports a deep neural network that achieves 44.2% recognition accuracy on the measurement set. The dataset and benchmark aim to advance research in large-scale face recognition and related applications.
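Measuring precision together with coverage means a system may decline to answer low-confidence images. The sketch below shows one common way to compute coverage at a target precision. It assumes each measurement image receives a single prediction with a confidence score and a correct/incorrect flag, and that answers are accepted in decreasing order of confidence. The function name `coverage_at_precision`, the input format, and the 0.95 default target are illustrative assumptions, not the benchmark's official scoring tool.

```python
from typing import List, Tuple


def coverage_at_precision(
    predictions: List[Tuple[float, bool]], target_precision: float = 0.95
) -> float:
    """Hypothetical helper: largest fraction of images that can be answered
    while precision stays >= target_precision.

    predictions: one (confidence, is_correct) pair per measurement image.
    Answering images in decreasing confidence order means lowering the
    threshold raises coverage but may admit errors that lower precision.
    """
    if not predictions:
        return 0.0
    # Most confident predictions are answered first.
    # Simplification: ties in confidence are ordered arbitrarily, as if a
    # threshold could split them.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    correct = 0
    best_answered = 0
    for answered, (_, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        # Keep the largest cutoff whose running precision meets the target.
        if correct / answered >= target_precision:
            best_answered = answered
    return best_answered / len(predictions)


# Toy example with made-up predictions (not benchmark data).
preds = [(0.99, True), (0.97, True), (0.90, True),
         (0.80, False), (0.70, True), (0.60, False)]
print(coverage_at_precision(preds))  # 0.5
```

With the toy data, the first three answers are all correct, but the fourth drops precision to 0.75, so only 3 of 6 images can be answered at a 0.95 target. Relaxing the target to 0.75 would allow 5 of 6.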