2017 | Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, Antonio Torralba
The paper introduces the ADE20K dataset, a comprehensive resource for scene parsing, which includes densely annotated images with detailed segmentation of scenes, objects, and object parts. The dataset aims to address the limitations of existing datasets like COCO and Pascal by providing a wider range of scenes and object categories. The authors describe the dataset's collection process, annotation details, and statistical analysis, highlighting its richness and diversity.
To improve scene parsing, the paper proposes a network design called the Cascade Segmentation Module, which segments stuff, objects, and object parts in a cascaded manner and improves the performance of existing semantic segmentation models. Evaluated on the ADE20K benchmark, the module shows significant improvements over baseline models in both pixel accuracy and mean IoU.
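For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how pixel accuracy and mean IoU are commonly computed from a confusion matrix. It is not the paper's evaluation code: the function names, the `ignore_index` convention, and the choice to skip classes that appear in neither prediction nor ground truth are all assumptions made for illustration.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=-1):
    """Count (target, pred) label pairs over flat sequences of class indices.

    ignore_index is a hypothetical convention for unlabeled pixels.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not count toward either metric
        matrix[t][p] += 1
    return matrix


def pixel_accuracy(matrix):
    """Fraction of labeled pixels whose predicted class matches the target."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def mean_iou(matrix):
    """Average of per-class intersection-over-union.

    Classes absent from both prediction and target are skipped (an assumption).
    """
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(n)) - tp  # predicted c, actually other
        fn = sum(matrix[c]) - tp                       # actually c, predicted other
        union = tp + fp + fn
        if union:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
m = confusion_matrix(pred, target, num_classes=3)
print(pixel_accuracy(m), mean_iou(m))
```

In the toy example, four of six pixels are correct, and the per-class IoUs are 1/3, 2/3, and 1/2. Mean IoU weights every class equally, so rare classes affect it far more than they affect pixel accuracy, which is one reason the two are usually reported together.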
The paper also explores the applications of scene parsing networks, including image content removal and scene synthesis. The results demonstrate the effectiveness of the proposed network design and its potential for practical computer vision tasks.