24 Oct 2019 | Daniel Bolya, Chong Zhou, Fanyi Xiao, Yong Jae Lee
YOLACT is a real-time instance segmentation model that achieves 29.8 mAP on MS COCO at 33.5 fps on a single Titan Xp GPU, significantly faster than previous methods, and it is trained on only one GPU. The model breaks instance segmentation into two parallel tasks: generating a set of prototype masks and predicting per-instance mask coefficients. These are then linearly combined to produce instance masks. Because the approach does not repool features, it yields high-quality masks and temporal stability. The prototypes learn to localize instances in a translation-variant manner even though the network is fully convolutional, and the model also introduces Fast NMS, a drop-in alternative to standard NMS that is 12 ms faster with only a marginal performance penalty.
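The combination step is a linear combination of the prototypes weighted by each instance's coefficients, followed by a sigmoid. A minimal sketch of that idea is below; the function name, shapes, and use of NumPy are illustrative assumptions, not the authors' code:

```python
import numpy as np

def assemble_masks(prototypes, coeffs):
    """Combine k prototype masks of shape (h, w, k) with per-instance
    coefficients of shape (n, k) via a linear combination plus sigmoid:
    M = sigmoid(P @ C^T). Shapes are illustrative assumptions."""
    h, w, k = prototypes.shape
    # (h*w, k) @ (k, n) -> (h*w, n): one weighted sum of prototypes per instance
    lin = prototypes.reshape(-1, k) @ coeffs.T
    masks = 1.0 / (1.0 + np.exp(-lin))  # sigmoid squashes to (0, 1)
    return masks.reshape(h, w, -1)      # (h, w, n) soft instance masks
```

Since this is a single matrix multiply over shared prototypes, the per-instance cost is tiny, which is what lets the mask branch run in parallel with detection.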
The model builds on a one-stage detector, using ResNet-101 with an FPN as its backbone. It produces prototype masks and mask coefficients in parallel and combines them to form instance masks, a design lightweight enough that the mask branch takes only ~5 ms to evaluate. Because no repooling is involved, the resulting masks are of higher quality than those of competing methods, and the model achieves good performance on both small and large objects.
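Fast NMS, mentioned above, replaces sequential suppression with one matrix operation: boxes are sorted by score, the upper triangle of the pairwise IoU matrix is taken, and any box whose maximum IoU with a higher-scoring box exceeds a threshold is dropped. Unlike standard NMS, already-suppressed boxes are still allowed to suppress others, which is why it can run fully vectorized at a small cost in accuracy. A NumPy sketch under those assumptions (helper names are hypothetical):

```python
import numpy as np

def iou_matrix(boxes):
    # boxes: (n, 4) as [x1, y1, x2, y2]; returns (n, n) pairwise IoU
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area[:, None] + area[None, :] - inter
    return inter / union

def fast_nms(boxes, scores, iou_thresh=0.5):
    """Vectorized Fast NMS: keep a box only if its max IoU with every
    higher-scoring box is below the threshold. Assumes n >= 1."""
    order = np.argsort(-scores)         # descending score
    iou = iou_matrix(boxes[order])
    iou = np.triu(iou, k=1)             # keep only higher-scoring pairs
    keep = iou.max(axis=0) <= iou_thresh
    return order[keep]                  # original indices of survivors
```

The entire suppression step is one IoU matrix, one `triu`, and one column-wise max, so it maps cleanly onto GPU kernels instead of a sequential loop.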
YOLACT's prototypes exhibit emergent behavior, such as spatially partitioning the image, localizing instances, detecting contours, and encoding position-sensitive directional maps. The model's design allows for a distributed representation, where each instance is segmented using a combination of prototypes shared across categories. This leads to efficient and accurate instance segmentation.
The model is also fast and general, as it can be integrated into almost any modern object detector. It achieves competitive results on the challenging MS COCO dataset, with a speed-performance trade-off that is favorable for real-time applications. The model's performance is validated on the COCO and Pascal 2012 SBD datasets, with YOLACT outperforming other methods in terms of both speed and accuracy, and it produces temporally stable masks on videos. The model's design is efficient and effective, making it a strong candidate for real-time instance segmentation.