24 Jan 2018 | Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick
Mask R-CNN is a flexible, general framework for object instance segmentation. It extends Faster R-CNN by adding a branch that predicts an object mask for each detected instance, so a single model both detects objects and generates a high-quality segmentation mask for each one. It achieves top results on all three tracks of the COCO suite of challenges: instance segmentation, bounding-box object detection, and person keypoint detection. Mask R-CNN is simple to train, runs at 5 fps, and generalizes easily to other tasks such as human pose estimation. The method introduces RoIAlign, a layer that preserves exact spatial locations and improves mask accuracy by a relative 10% to 50%. Mask R-CNN outperforms all previous state-of-the-art single-model entries on the COCO instance segmentation task, including the 2016 challenge winners. Its flexibility and accuracy make it a solid baseline for future research in instance-level recognition.
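To make the idea behind RoIAlign concrete, here is a minimal NumPy sketch of pooling a region without rounding its coordinates. It is an illustration under stated assumptions, not the authors' implementation: the feature map is single-channel, the box is already expressed in feature-map coordinates, each output bin averages a 2×2 grid of bilinearly interpolated samples, and the names `bilinear` and `roi_align` are hypothetical.

```python
import numpy as np


def bilinear(feat, y, x):
    """Bilinearly interpolate a 2-D feature map at a real-valued (y, x)."""
    h, w = feat.shape
    # Clamp to the valid range so samples near the border stay defined.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0, x0] * (1 - dx) + feat[y0, x1] * dx
    bottom = feat[y1, x0] * (1 - dx) + feat[y1, x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feat, box, out_size=2, samples=2):
    """Pool `box` = (y1, x1, y2, x2) into an out_size x out_size grid.

    Assumption: box coordinates are real-valued feature-map coordinates
    and are never rounded, which is the point of the technique.
    """
    y1, x1, y2, x2 = box
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = np.zeros((out_size, out_size), dtype=np.float64)
    for i in range(out_size):
        for j in range(out_size):
            vals = []
            for a in range(samples):
                for b in range(samples):
                    # Regularly spaced sample points inside bin (i, j).
                    sy = y1 + (i + (a + 0.5) / samples) * bin_h
                    sx = x1 + (j + (b + 0.5) / samples) * bin_w
                    vals.append(bilinear(feat, sy, sx))
            out[i, j] = np.mean(vals)  # average aggregation (an assumption)
    return out


# Toy 4x4 feature map and a box whose corners fall between grid cells.
feat = np.arange(16, dtype=np.float64).reshape(4, 4)
pooled = roi_align(feat, (0.5, 0.5, 2.5, 2.5))
print(pooled)
```

Because the box corners at 0.5 and 2.5 are used as-is rather than snapped to integer cells, the pooled values stay aligned with the region they describe. A quantizing pooling layer would instead shift the box to whole cells, which is the kind of misalignment RoIAlign is meant to avoid.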