COCONut is a new large-scale universal segmentation dataset that addresses the limitations of the COCO benchmark. COCO has been a standard computer-vision benchmark for over a decade, but its annotations are often inconsistent and of low quality, particularly its segmentation masks and semantic labels. COCONut provides high-quality, human-verified annotations for 383,000 images with over 5.18 million panoptic masks. It also includes a validation set of 25,000 images and 437,000 masks, carefully re-labeled from the COCO validation set and supplemented with additional images from Objects365.
COCONut was created through an assisted-manual annotation pipeline that combines machine-generated proposals with human editing. The pipeline has four stages: machine-generated pseudo labels, human inspection and editing, mask generation or refinement, and quality verification. It was designed to ensure high-quality annotations while remaining scalable. The project also uses a data engine that leverages the COCONut-S split to improve the annotation networks and bootstrap larger training sets, such as COCONut-B and COCONut-L.
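The four-stage pipeline above can be sketched as a simple loop. This is a minimal illustration only: every function name and data structure below is hypothetical, standing in for the stages described in the text rather than the authors' actual tooling.

```python
# Hypothetical sketch of an assisted-manual annotation pipeline.
# Each function corresponds to one of the four stages described above;
# the stub logic is illustrative, not the real implementation.

def machine_proposals(image):
    # Stage 1: a segmentation model produces pseudo-label mask proposals.
    return [{"mask": f"mask_for_{image}", "label": "person", "accepted": False}]

def human_inspect(proposals):
    # Stage 2: annotators accept, relabel, or reject each proposal.
    for p in proposals:
        p["accepted"] = True  # here we simply accept everything
    return proposals

def refine_masks(proposals):
    # Stage 3: accepted masks are regenerated or boundary-refined.
    return [dict(p, refined=True) for p in proposals]

def verify_quality(proposals):
    # Stage 4: a final verification pass keeps only accepted annotations.
    return [p for p in proposals if p["accepted"]]

def annotate(image):
    # Chain the four stages for a single image.
    return verify_quality(refine_masks(human_inspect(machine_proposals(image))))

annotations = annotate("coco_000001.jpg")
```

The key design point the source emphasizes is that machine proposals keep the pipeline scalable while the human stages keep quality high.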
The COCONut dataset includes 133 semantic classes, with 80 categorized as 'thing' and 53 as 'stuff'. It provides a more comprehensive and consistent set of annotations compared to COCO, which has been found to have inconsistencies and biases in its annotations. COCONut's annotations are more accurate and consistent, with sharper boundaries and fewer errors compared to COCO. The dataset is also more diverse, including images from both COCO and Objects365, providing a broader range of classes and masks.
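The class taxonomy above follows the standard COCO panoptic split, which can be stated directly (the example category names are well-known COCO classes used for illustration):

```python
# COCONut's 133 semantic classes follow the COCO panoptic taxonomy:
# 80 "thing" classes (countable objects) and 53 "stuff" classes
# (amorphous regions).
NUM_THING = 80
NUM_STUFF = 53
NUM_CLASSES = NUM_THING + NUM_STUFF  # 133 semantic classes in total

# Illustrative members of each group (standard COCO categories):
thing_examples = ["person", "car", "dog"]   # discrete, countable objects
stuff_examples = ["sky", "grass", "road"]   # uncountable background regions
```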
COCONut has been used to benchmark models across a range of tasks, including panoptic segmentation, instance segmentation, semantic segmentation, object detection, and open-vocabulary segmentation. The results show that COCONut provides a more challenging and accurate benchmark for these tasks than COCO. The dataset has also been used for training, with COCONut-B and COCONut-L providing larger and more diverse training sets; models trained on COCONut's high-quality human-verified annotations outperform those trained on pseudo-labels.