2017 | Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, Antonio Torralba
This paper introduces the ADE20K dataset, which provides dense annotations of scenes, objects, and object parts. It contains 20,210 training images, 2,000 validation images, and 3,000 test images. A single expert annotated every image, which yields highly detailed and consistent labels for both objects and their parts. A scene parsing benchmark built on the dataset covers 150 object and stuff classes.
The paper also presents a network design called the Cascade Segmentation Module, which lets a neural network segment a scene into stuff, objects, and object parts in a cascaded manner. Evaluated on ADE20K, the module improves on existing segmentation baselines and raises scene parsing accuracy, particularly for discrete objects. The paper further shows that the trained networks can be applied to tasks such as image content removal and scene synthesis. It also compares ADE20K with other existing datasets, showing that ADE20K provides a more comprehensive set of annotations. The paper concludes that ADE20K and the Cascade Segmentation Module represent significant advances in scene parsing.
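
To make the idea of cascaded segmentation concrete, here is a minimal Python sketch of how two per-pixel predictions could be merged in sequence. It is an illustration under stated assumptions, not the paper's implementation. It assumes a first stage that scores the stuff classes plus one extra "foreground object" slot, and a second stage that scores discrete object classes and is consulted only where the first stage picks the foreground slot. The function names, the class names, and the score layout are all hypothetical.

```python
def argmax(scores):
    """Return the index of the largest score in a flat list."""
    return max(range(len(scores)), key=lambda i: scores[i])


def cascade_merge(stuff_scores, object_scores, stuff_names, object_names):
    """Merge two per-pixel predictions in a cascaded fashion.

    stuff_scores[p]  : scores over the stuff classes followed by one
                       trailing "foreground object" slot (assumed layout).
    object_scores[p] : scores over the discrete object classes.
    Returns one class name per pixel.
    """
    foreground = len(stuff_names)  # index of the extra foreground slot
    labels = []
    for stuff_row, object_row in zip(stuff_scores, object_scores):
        first_stage = argmax(stuff_row)
        if first_stage == foreground:
            # The first stage says "some object": defer to the second stage.
            labels.append(object_names[argmax(object_row)])
        else:
            # The first stage already chose a stuff class for this pixel.
            labels.append(stuff_names[first_stage])
    return labels


# Toy input: three pixels, two stuff classes, two object classes.
stuff_names = ["wall", "floor"]
object_names = ["chair", "lamp"]
stuff_scores = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.2, 0.6, 0.2]]
object_scores = [[0.5, 0.5], [0.1, 0.9], [0.8, 0.2]]

labels = cascade_merge(stuff_scores, object_scores, stuff_names, object_names)
print(labels)
```

A third stage for object parts could follow the same pattern, running only on pixels that the second stage assigned to an object. The point of the cascade is that each stage only has to separate the classes relevant at its own level.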