26 Jul 2016 | Pedro O. Pinheiro*, Tsung-Yi Lin*, Ronan Collobert, Piotr Dollár
This paper introduces an approach to object instance segmentation that augments feedforward networks with a top-down refinement module. The proposed method, called SharpMask, improves upon the DeepMask network by refining the initial object masks generated in a bottom-up pass. The refinement successively integrates information from earlier layers to generate high-fidelity object masks. The approach is simple, fast, and effective: SharpMask runs 50% faster than DeepMask while improving average recall by 10-20% across various setups.
The key idea is to first generate a coarse 'mask encoding' in a feedforward pass, then refine this encoding in a top-down pass using features from successively lower layers. This lets the model combine the spatially rich information in low-level features with the high-level semantic information encoded in upper layers. The refinement module is efficient and fully differentiable, so the whole network can be trained with backpropagation to produce sharper, more accurate object masks.
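The top-down pass described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`refine`, `top_down`), the channel counts, the use of 1x1 convolutions, and the nearest-neighbour upsampling are all simplifications chosen to keep the example short and dependency-free. It only shows how a coarse mask encoding might be merged with same-resolution bottom-up features and doubled in resolution at each step.

```python
import numpy as np

def conv1x1(x, w):
    """Mix channels at each pixel: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def relu(x):
    return np.maximum(x, 0.0)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling over the two spatial axes."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def refine(mask_enc, feats, w_skip, w_merge):
    """One top-down refinement step (hypothetical simplification).

    mask_enc: (C_m, H, W) coarse mask encoding from the step above
    feats:    (C_f, H, W) bottom-up features at the same resolution
    returns:  (C_out, 2H, 2W) refined mask encoding
    """
    skip = relu(conv1x1(feats, w_skip))                # compress bottom-up features
    merged = np.concatenate([mask_enc, skip], axis=0)  # stack along channels
    out = relu(conv1x1(merged, w_merge))               # fuse both sources
    return upsample2x(out)

def top_down(mask_enc, feature_pyramid, weights):
    """Apply refinement from the coarsest feature map to the finest."""
    for feats, (w_skip, w_merge) in zip(feature_pyramid, weights):
        mask_enc = refine(mask_enc, feats, w_skip, w_merge)
    return mask_enc

rng = np.random.default_rng(0)
# Toy pyramid, coarsest first: 4 feature channels at 4x4, then at 8x8.
pyramid = [rng.standard_normal((4, 4, 4)), rng.standard_normal((4, 8, 8))]
# Each step: skip conv 4 -> 2 channels, merge conv (3 + 2) -> 3 channels.
weights = [(rng.standard_normal((2, 4)), rng.standard_normal((3, 5)))
           for _ in pyramid]
coarse = rng.standard_normal((3, 4, 4))  # stand-in for the coarse mask encoding

refined = top_down(coarse, pyramid, weights)
print(refined.shape)  # (3, 16, 16)
```

Each step doubles the spatial resolution, so two steps take the 4x4 encoding to 16x16. A real model would use learned weights and larger spatial kernels; the random weights here only demonstrate the data flow.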
The paper also explores optimizations to the network architecture, such as reducing the number of channels, that improve inference speed. These optimizations yield a more efficient model that achieves state-of-the-art performance on object proposal generation and object detection. SharpMask outperforms previous methods in both speed and accuracy, particularly for small objects.
The proposed method is evaluated on the COCO dataset, where it achieves high accuracy in object segmentation and detection. The results show that SharpMask significantly improves the quality of object masks and enhances object detection performance when combined with the Fast R-CNN detector. The paper concludes that the proposed approach is general and can be applied to other pixel-labeling tasks.
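For readers unfamiliar with the average recall metric quoted in this summary, the sketch below shows one common way to compute it for mask proposals. It assumes the COCO-style convention of averaging recall over IoU thresholds from 0.5 to 0.95 in steps of 0.05; the helper names `mask_iou` and `average_recall` are hypothetical and this is not the official evaluation code.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection over union of two binary masks of equal shape."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def average_recall(gt_masks, proposals, thresholds=None):
    """Mean recall over IoU thresholds (assumed: 0.50, 0.55, ..., 0.95)."""
    if len(gt_masks) == 0:
        raise ValueError("need at least one ground-truth mask")
    if thresholds is None:
        thresholds = np.linspace(0.5, 0.95, 10)
    # Best IoU achieved by any proposal for each ground-truth object.
    best = np.array([
        max((mask_iou(g, p) for p in proposals), default=0.0)
        for g in gt_masks
    ])
    # Recall at a threshold = fraction of objects matched at that IoU or better.
    return float(np.mean([(best >= t).mean() for t in thresholds]))

# Toy example: one 2x2 ground-truth object on a 4x4 grid.
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True

loose = np.zeros((4, 4), dtype=bool)
loose[1:3, 1:4] = True  # covers the object plus two extra pixels, IoU = 4/6

print(average_recall([gt], [gt]))     # perfect proposal
print(average_recall([gt], [loose]))  # looser proposal scores lower
```

A proposal that only roughly covers an object counts as a hit at low IoU thresholds but not at high ones, which is why sharper masks raise average recall.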