28 Mar 2018 | Holger Caesar (University of Edinburgh), Jasper Uijlings (Google AI Perception), Vittorio Ferrari (University of Edinburgh, Google AI Perception)
The paper introduces COCO-Stuff, a dataset that augments the COCO dataset with pixel-wise annotations for 91 stuff classes. Stuff classes, the amorphous background regions of a scene, receive far less attention in computer vision research than thing classes (well-defined objects), yet they are crucial for scene understanding because they provide context and spatial constraints for things. The authors develop an efficient superpixel-based annotation protocol that leverages the existing thing annotations to achieve fast, high-quality labeling. Using COCO-Stuff, they analyze the relative importance of stuff and thing classes, their spatial relations, and the performance of semantic segmentation methods on both. The results show that stuff classes cover a large portion of the image surface and are frequently mentioned in human captions, underscoring their importance. Stuff classes also exhibit more varied spatial contexts than thing classes, and the dataset's larger size improves the performance of semantic segmentation models. The paper concludes by highlighting the value of COCO-Stuff for advancing scene understanding and semantic segmentation research.
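As a concrete illustration of how the pixel-wise annotations can be used for the kind of coverage analysis described above, the sketch below estimates the fraction of labeled pixels belonging to stuff versus thing classes in a single COCO-Stuff label map. It is a minimal sketch, not the authors' evaluation code, and it assumes the PNG label-map convention of the public COCO-Stuff release (0 = unlabeled, thing classes on the original COCO IDs 1-91, stuff classes on IDs 92-182); the file path is a placeholder.

```python
# Minimal sketch: stuff vs. thing pixel coverage for one COCO-Stuff label map.
# Assumed ID convention (per the public release): 0 = unlabeled,
# 1-91 = thing classes (original COCO IDs), 92-182 = stuff classes.
import numpy as np
from PIL import Image

def stuff_thing_coverage(label_map_path):
    labels = np.array(Image.open(label_map_path))  # H x W array of class IDs
    labeled = labels > 0                           # ignore unlabeled pixels
    n_labeled = labeled.sum()
    if n_labeled == 0:
        return 0.0, 0.0
    stuff = ((labels >= 92) & (labels <= 182)).sum() / n_labeled
    thing = ((labels >= 1) & (labels <= 91)).sum() / n_labeled
    return stuff, thing

# Usage (hypothetical path):
# stuff_frac, thing_frac = stuff_thing_coverage("annotations/000000000139.png")
# print(f"stuff: {stuff_frac:.1%}, things: {thing_frac:.1%}")
```

Averaging these per-image fractions over the dataset gives the kind of surface-coverage statistic the paper reports when arguing that stuff occupies a substantial share of the image area.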