30 Jan 2015 | Olga Russakovsky* · Jia Deng* · Hao Su · Jonathan Krause · Sanjeev Satheesh · Sean Ma · Zhiheng Huang · Andrej Karpathy · Aditya Khosla · Michael Bernstein · Alexander C. Berg · Li Fei-Fei
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark for object category classification and detection across hundreds of object categories and millions of images. It has been held annually since 2010, attracting participation from over fifty institutions. This paper describes the creation of the benchmark dataset and the advances in object recognition that have resulted from it. It discusses the challenges of collecting large-scale ground truth annotations, highlights key breakthroughs in categorical object recognition, provides a detailed analysis of the current state of large-scale image classification and object detection, and compares state-of-the-art computer vision accuracy with human accuracy. The paper also outlines lessons learned over five years of the challenge and proposes future directions and improvements.
The ILSVRC dataset consists of manually annotated training images and test images with withheld annotations. Participants train their algorithms using the training images and then automatically annotate the test images. These predicted annotations are submitted to the evaluation server. Results are revealed at the end of the competition period, and authors are invited to share insights at the workshop.
The dataset includes two types of annotations: image-level annotations for the presence or absence of an object class, and object-level annotations with tight bounding boxes and class labels. The creation of the dataset involved addressing challenges in scaling up from PASCAL VOC 2010 to ILSVRC 2010, including the need for novel crowdsourcing approaches for large-scale annotations.
The ILSVRC challenge comprises three main tasks: image classification, single-object localization, and object detection. Image classification asks algorithms to identify which object categories are present in an image. Single-object localization additionally requires a bounding box around one instance of the identified category. Object detection requires identifying and localizing every instance of every target category in an image.
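The evaluation criteria behind these tasks can be sketched in a few lines: classification is scored by top-5 error (an image counts as correct if the true label appears among the algorithm's five guesses), and a localization or detection is counted as correct when the predicted class matches and the predicted box overlaps the ground-truth box with intersection-over-union of at least 0.5. The following is a minimal illustrative sketch; the function names and box format `(x1, y1, x2, y2)` are assumptions, not part of the paper's official evaluation toolkit.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2).

    ILSVRC counts a localization as correct when this value is >= 0.5
    (and the predicted class matches the ground truth).
    """
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)


def top5_error(predictions, labels):
    """Fraction of images whose true label is absent from the top-5 guesses.

    `predictions` is a list of 5-element guess lists, one per image;
    `labels` is the list of ground-truth class labels.
    """
    misses = sum(1 for top5, y in zip(predictions, labels) if y not in top5)
    return misses / float(len(labels))
```

For example, two 10x10 boxes offset by 5 pixels in each direction overlap with IoU = 25/175 ≈ 0.14, which would not count as a correct localization under the 0.5 threshold.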
The dataset construction process involved defining the target object categories, collecting a diverse set of candidate images, and annotating them with labels and bounding boxes. The image classification dataset covers 1000 object classes with approximately 1.2 million training images, 50,000 validation images, and 100,000 test images. The single-object localization dataset uses the same 1000 classes and adds bounding boxes around every instance of the target class. The object detection dataset covers 200 object classes, again with bounding boxes around every instance.
The paper discusses the challenges of creating the benchmark dataset, the developments in object classification and detection that have resulted from the effort, and the current state of the field of categorical object recognition. It also provides an analysis of the statistical properties of objects and their impact on recognition algorithms. The paper concludes with lessons learned from the challenge and proposes future directions and improvements.