Bottom-up Object Detection by Grouping Extreme and Center Points

25 Apr 2019 | Xingyi Zhou, Jiacheng Zhuo, Philipp Krähenbühl
This paper introduces ExtremeNet, a bottom-up object detection framework that detects the four extreme points (top-most, left-most, bottom-most, right-most) and one center point of each object with a standard keypoint estimation network. The five points are grouped into a bounding box if they are geometrically aligned, which turns object detection into a purely appearance-based keypoint estimation problem. The method achieves a bounding box AP of 43.7% on COCO test-dev, outperforming all reported one-stage detectors and matching sophisticated two-stage detectors. In addition, the estimated extreme points directly span a coarse octagonal mask with a COCO Mask AP of 18.9%, significantly better than the Mask AP of the vanilla bounding boxes; extreme point guided segmentation further improves this to 34.6% Mask AP.

ExtremeNet uses a state-of-the-art keypoint estimation framework to find extreme points, predicting four multi-peak heatmaps (one per direction) for each object category. It also predicts one center heatmap per category, where an object's center is the geometric center of its box, i.e. the average of the x-coordinates of the left and right extreme points and of the y-coordinates of the top and bottom extreme points. Extreme points are grouped into objects with a purely geometric rule: a combination of four extreme points is kept only if their geometric center is predicted in the center heatmap with a score above a threshold.
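This grouping criterion lends itself to a compact sketch. Below is a minimal, illustrative NumPy version, assuming the per-category extreme-point candidates have already been extracted from their heatmaps as (x, y, score) tuples; the function name, argument names, threshold value, and scoring choice are assumptions for illustration, not the authors' released code.

```python
import numpy as np

def group_extreme_points(top_pts, left_pts, bottom_pts, right_pts,
                         center_heatmap, center_thresh=0.1):
    """Brute-force grouping sketch for one object category.

    Each *_pts list holds (x, y, score) tuples of candidate extreme points
    extracted from the corresponding heatmap; center_heatmap is a 2D array.
    The threshold value is illustrative.
    """
    detections = []
    for tx, ty, ts in top_pts:
        for lx, ly, ls in left_pts:
            for bx, by, bs in bottom_pts:
                for rx, ry, rs in right_pts:
                    # Discard geometrically inconsistent combinations
                    # (top below bottom, or left to the right of right).
                    if ty > by or lx > rx:
                        continue
                    # Geometric center of the four extreme points.
                    cx = int(round((lx + rx) / 2.0))
                    cy = int(round((ty + by) / 2.0))
                    # Keep the combination only if the center heatmap
                    # fires at that location.
                    if center_heatmap[cy, cx] < center_thresh:
                        continue
                    # One simple scoring choice: mean extreme-point score.
                    score = (ts + ls + bs + rs) / 4.0
                    detections.append((lx, ty, rx, by, score))
    return detections
```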
This grouping is a brute-force enumeration over all combinations of extracted extreme points, with a runtime of O(n^4), where n is the number of extreme points kept per cardinal direction; because n is small after peak extraction, the enumeration is fast in practice. A faster O(n^2) algorithm is also presented, but it is less efficient on GPU for the COCO dataset. To suppress false positives that span several objects, the method adds a ghost box suppression step, a simple soft-NMS-style rule that penalizes such high-confidence false detections. Edge aggregation strengthens the response of extreme points that lie on long axis-aligned edges, where the score would otherwise be spread along the edge rather than concentrated at a single point. For instance segmentation, the predicted extreme points are passed to Deep Extreme Cut (DEXTR), reaching a Mask AP of 34.6% on COCO val2017. On COCO test-dev, ExtremeNet achieves a 1.6% higher box AP than CornerNet, and together with its DEXTR-based segmentation results it is competitive with state-of-the-art methods in both object detection and instance segmentation.
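The ghost box suppression step can likewise be sketched compactly. The 3x containment threshold and the halving of the score follow the rule described in the ExtremeNet paper; the function name, array layout, and parameter names below are assumptions for illustration.

```python
import numpy as np

def suppress_ghost_boxes(boxes, scores, ratio=3.0, penalty=0.5):
    """Illustrative ghost-box suppression (soft-NMS-style).

    A box is penalized when the detections it fully contains are collectively
    much more confident than the box itself; such "ghost" boxes tend to span
    several smaller objects of the same class.
    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) array.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    for i in range(len(boxes)):
        x1, y1, x2, y2 = boxes[i]
        # Boxes fully contained in box i (excluding i itself).
        contained = (
            (boxes[:, 0] >= x1) & (boxes[:, 1] >= y1) &
            (boxes[:, 2] <= x2) & (boxes[:, 3] <= y2)
        )
        contained[i] = False
        if scores[contained].sum() > ratio * scores[i]:
            scores[i] *= penalty  # halve the ghost box's score
    return scores
```

Because the penalty is soft, a genuinely large object that happens to contain smaller detections is only down-weighted rather than discarded outright.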