21 Feb 2020 | Alina Kuznetsova · Hassan Rom · Neil Alldrin · Jasper Uijlings · Ivan Krasin · Jordi Pont-Tuset · Shahab Kamali · Stefan Popov · Matteo Malloci · Alexander Kolesnikov · Tom Duerig · Vittorio Ferrari
The Open Images Dataset V4 is a comprehensive resource for image classification, object detection, and visual relationship detection, featuring 9.2 million images with unified annotations. The dataset is released under a Creative Commons Attribution license, allowing for broad sharing and adaptation. Key features include:
- **Scale**: 30.1 million image-level labels for 19,800 concepts, 15.4 million bounding boxes for 600 object classes, and 375,000 visual relationship annotations involving 57 classes.
- **Complexity**: Images often show complex scenes with several objects (8 annotated objects per image on average), which makes the dataset a demanding testbed for detection models.
- **Unified Annotations**: Annotations for image classification, object detection, and visual relationship detection coexist in the same images, enabling cross-task training and analysis.
- **Quality**: The paper reports in-depth statistics about the dataset and validates annotation quality, including the geometric accuracy of bounding boxes and the recall of image-level annotations.
- **Applications**: The authors demonstrate two applications enabled by the unified annotations: fine-grained object detection without fine-grained box labels, and zero-shot visual relationship detection.
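To make the idea of unified annotations concrete, the following is a minimal sketch of how the three annotation types could sit side by side for a single image. All class and field names here (`Box`, `Relationship`, `ImageAnnotations`, and the normalized coordinate convention) are hypothetical and chosen for illustration; they are not the dataset's official file schema.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Box:
    label: str      # object class name (hypothetical plain-string label)
    x_min: float    # coordinates assumed normalized to [0, 1]
    y_min: float
    x_max: float
    y_max: float


@dataclass(frozen=True)
class Relationship:
    subject: int    # index into ImageAnnotations.boxes
    predicate: str  # relationship name, e.g. "plays"
    obj: int        # index into ImageAnnotations.boxes


@dataclass
class ImageAnnotations:
    """All three annotation types for one image (illustrative layout only)."""
    image_id: str
    image_labels: Set[str] = field(default_factory=set)
    boxes: List[Box] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


def mean_boxes_per_image(images: List[ImageAnnotations]) -> float:
    """Average box count over images that have at least one box."""
    counts = [len(img.boxes) for img in images if img.boxes]
    return mean(counts) if counts else 0.0


def relationship_triplets(img: ImageAnnotations) -> List[Tuple[str, str, str]]:
    """Resolve box indices into (subject label, predicate, object label)."""
    return [
        (img.boxes[r.subject].label, r.predicate, img.boxes[r.obj].label)
        for r in img.relationships
    ]


# A made-up record, not taken from the dataset.
example = ImageAnnotations(
    image_id="img_0001",
    image_labels={"Woman", "Guitar", "Musical instrument"},
    boxes=[
        Box("Woman", 0.10, 0.05, 0.60, 0.95),
        Box("Guitar", 0.30, 0.40, 0.70, 0.80),
    ],
    relationships=[Relationship(subject=0, predicate="plays", obj=1)],
)

print(relationship_triplets(example))
print(mean_boxes_per_image([example]))
```

Keeping relationships as indices into the same image's box list is one way to reflect that the annotations coexist on the same images: a relationship can be traced back to its two boxes, and the image-level labels can be compared against the box labels for cross-task analysis.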
The dataset was collected from Flickr, avoiding predefined class names or tags, leading to natural class statistics and reducing bias. The acquisition process involved identifying CC-BY licensed images, removing those appearing elsewhere on the internet, and ensuring a high proportion of complex images with multiple objects. The dataset is available for research and innovation in computer vision, particularly in areas requiring structured reasoning and multi-type annotations.