COCONut is a new large-scale universal segmentation dataset that addresses the limitations of the COCO benchmark. COCO has been a standard computer-vision benchmark for over a decade, but its annotations are often inconsistent and of low quality, particularly its segmentation masks and semantic labels. COCONut provides high-quality, human-verified annotations for 383,000 images with over 5.18 million panoptic masks. It also includes a validation set of 25,000 images and 437,000 masks, carefully re-labeled from the COCO validation set and supplemented with additional images from Objects365.
COCONut was created through an assisted-manual annotation pipeline that combines machine-generated proposals with human editing. The pipeline has four stages: machine-generated pseudo labels, human inspection and editing, mask generation or refinement, and quality verification. It was designed to ensure high-quality annotations while remaining scalable. The project also uses a data engine that leverages the COCONut-S split to improve the annotation networks and bootstrap larger training sets, such as COCONut-B and COCONut-L.
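The four-stage pipeline above can be sketched as a simple loop. This is a minimal illustration only: every function name and data structure below is hypothetical, standing in for the stages described in the text rather than the authors' actual tooling.

```python
# Hypothetical sketch of an assisted-manual annotation pipeline.
# Each function corresponds to one of the four stages described above;
# the stub logic is illustrative, not the real implementation.

def machine_proposals(image):
    # Stage 1: a segmentation model produces pseudo-label mask proposals.
    return [{"mask": f"mask_for_{image}", "label": "person", "accepted": False}]

def human_inspect(proposals):
    # Stage 2: annotators accept, relabel, or reject each proposal.
    for p in proposals:
        p["accepted"] = True  # here we simply accept everything
    return proposals

def refine_masks(proposals):
    # Stage 3: accepted masks are regenerated or boundary-refined.
    return [dict(p, refined=True) for p in proposals]

def verify_quality(proposals):
    # Stage 4: a final verification pass keeps only accepted annotations.
    return [p for p in proposals if p["accepted"]]

def annotate(image):
    # Chain the four stages for a single image.
    return verify_quality(refine_masks(human_inspect(machine_proposals(image))))

annotations = annotate("coco_000001.jpg")
```

The key design point the source emphasizes is that machine proposals keep the pipeline scalable while the human stages keep quality high.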
The COCONut dataset includes 133 semantic classes, with 80 categorized as 'thing' and 53 as 'stuff'. It provides a more comprehensive and consistent set of annotations compared to COCO, which has been found to have inconsistencies and biases in its annotations. COCONut's annotations are more accurate and consistent, with sharper boundaries and fewer errors compared to COCO. The dataset is also more diverse, including images from both COCO and Objects365, providing a broader range of classes and masks.
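The class taxonomy above follows the standard COCO panoptic split, which can be stated directly (the example category names are well-known COCO classes used for illustration):

```python
# COCONut's 133 semantic classes follow the COCO panoptic taxonomy:
# 80 "thing" classes (countable objects) and 53 "stuff" classes
# (amorphous regions).
NUM_THING = 80
NUM_STUFF = 53
NUM_CLASSES = NUM_THING + NUM_STUFF  # 133 semantic classes in total

# Illustrative members of each group (standard COCO categories):
thing_examples = ["person", "car", "dog"]   # discrete, countable objects
stuff_examples = ["sky", "grass", "road"]   # uncountable background regions
```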
COCONut has been used to benchmark models across a range of tasks, including panoptic segmentation, instance segmentation, semantic segmentation, object detection, and open-vocabulary segmentation. The results show that COCONut provides a more challenging and accurate benchmark for these tasks than COCO. The dataset has also been used for training, with COCONut-B and COCONut-L providing larger and more diverse training sets; models trained on COCONut's high-quality human-verified annotations outperform those trained on pseudo-labels.