21 Feb 2015 | Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár
The Microsoft Common Objects in Context (MS COCO) dataset is introduced to advance object recognition by placing it within the broader context of scene understanding. It contains 2.5 million labeled instances of 91 object categories in 328k images. The dataset focuses on non-iconic views of objects, contextual reasoning between objects, and precise 2D localization, and each object is labeled with a per-instance segmentation. The dataset was built with a novel annotation pipeline in which Amazon Mechanical Turk workers used custom user interfaces for category labeling, instance spotting, and instance segmentation. Compared with ImageNet, MS COCO has fewer categories but more instances per category. It also has considerably more instances per category than PASCAL VOC and SUN, and more object instances per image on average than ImageNet and PASCAL VOC. The paper presents a detailed statistical analysis of the dataset relative to PASCAL VOC, ImageNet, and SUN, and reports baseline bounding-box and segmentation detection results using a Deformable Parts Model (DPM).
The MS COCO dataset aims to address the challenges of detecting objects in natural environments, performing contextual reasoning, and achieving precise localization, all of which are crucial for advancing scene understanding in computer vision.
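The comparisons above rest on statistics such as instances per category and instances per image. The sketch below shows how such averages can be computed from annotation records. It assumes a COCO-style list of records with one dict per labeled object instance, each carrying `image_id` and `category_id` keys. The function name `dataset_stats` and the toy records are hypothetical and for illustration only. They are not taken from the paper and are not real dataset figures.

```python
from collections import Counter, defaultdict


def dataset_stats(annotations, num_images):
    """Summarize instance counts from per-instance annotation records.

    Assumption: each record is a dict with 'image_id' and 'category_id'
    keys, one record per labeled object instance (COCO-style).
    """
    if num_images <= 0:
        raise ValueError("num_images must be positive")

    # Number of labeled instances for each category.
    per_category = Counter(a["category_id"] for a in annotations)

    # Distinct categories present in each annotated image.
    categories_in_image = defaultdict(set)
    for a in annotations:
        categories_in_image[a["image_id"]].add(a["category_id"])

    return {
        # Averages are taken over all images; unlabeled images count as zero.
        "instances_per_image": len(annotations) / num_images,
        "categories_per_image": sum(len(c) for c in categories_in_image.values()) / num_images,
        "instances_per_category": dict(per_category),
    }


if __name__ == "__main__":
    # Hypothetical toy records: two images, two made-up category ids.
    toy = [
        {"image_id": 1, "category_id": 1},
        {"image_id": 1, "category_id": 1},
        {"image_id": 1, "category_id": 18},
        {"image_id": 2, "category_id": 18},
    ]
    stats = dataset_stats(toy, num_images=2)
    print(stats["instances_per_image"], stats["categories_per_image"], stats["instances_per_category"])
    # -> 2.0 1.5 {1: 2, 18: 2}
```

The function divides by the total image count rather than only the annotated images. This means images without labels lower the averages, which is one reasonable convention. Other conventions would give different numbers.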