25 Apr 2019 | Xingyi Zhou, Jiacheng Zhuo, Philipp Krähenbühl
This paper introduces ExtremeNet, a bottom-up object detection framework that uses a standard keypoint estimation network to detect four extreme points (top-most, left-most, bottom-most, right-most) and one center point per object. The five keypoints are grouped into a bounding box when they are geometrically aligned. This turns object detection into a purely appearance-based keypoint estimation problem and avoids the limitations of top-down methods, which often rely on rectangular bounding boxes. The method achieves a bounding box AP of 43.7% on COCO test-dev, on par with state-of-the-art region-based detection methods. The estimated extreme points also directly span a coarse octagonal mask with a COCO Mask AP of 18.9%, significantly better than the Mask AP of vanilla bounding boxes.
Further improvements are achieved through extreme point-guided segmentation, resulting in a Mask AP of 34.6%. The paper also discusses related work, including two-stage and one-stage object detectors, deformable part models, and implicit keypoint detection.
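
The geometric grouping step described above can be illustrated with a minimal sketch. It assumes the keypoint heatmaps have already been reduced to short lists of candidate peaks, and it is not the authors' implementation. The function name `group_extreme_points`, the `center_score` callable, and the `min_center` threshold are hypothetical names chosen for this example. Each combination of one top, left, bottom, and right candidate is kept only if the points are in a consistent order and the center heatmap responds at the geometric center of the resulting box.

```python
from itertools import product


def group_extreme_points(tops, lefts, bottoms, rights, center_score, min_center=0.1):
    """Brute-force grouping of extreme-point candidates into boxes.

    Each candidate is an (x, y, score) tuple in pixel coordinates, with y
    growing downward. `center_score(x, y)` is a hypothetical callable that
    returns the center-heatmap response at a pixel.
    """
    boxes = []
    for t, l, b, r in product(tops, lefts, bottoms, rights):
        # The top point must not lie below the left/right points, and the
        # bottom point must not lie above them.
        if t[1] > l[1] or t[1] > r[1] or b[1] < l[1] or b[1] < r[1]:
            continue
        # The left point must not lie to the right of the top/bottom points,
        # and the right point must not lie to their left.
        if l[0] > t[0] or l[0] > b[0] or r[0] < t[0] or r[0] < b[0]:
            continue
        # Geometric center of the box spanned by the four extreme points.
        cx = (l[0] + r[0]) // 2
        cy = (t[1] + b[1]) // 2
        c = center_score(cx, cy)
        if c < min_center:
            continue
        # Score the box as the mean of the five keypoint responses.
        score = (t[2] + l[2] + b[2] + r[2] + c) / 5
        boxes.append((l[0], t[1], r[0], b[1], score))
    return boxes


# Toy example: one real object plus a distractor top-point candidate.
center_heat = {(50, 40): 0.9}  # sparse stand-in for a center heatmap


def center_score(x, y):
    return center_heat.get((x, y), 0.0)


tops = [(55, 10, 0.8), (200, 5, 0.9)]
lefts = [(20, 45, 0.7)]
bottoms = [(45, 70, 0.9)]
rights = [(80, 35, 0.6)]

boxes = group_extreme_points(tops, lefts, bottoms, rights, center_score)
print(boxes)
```

In this toy input the distractor top point is rejected because it lies to the right of the right-most candidate, so a single box with corners (20, 10) and (80, 70) remains. Enumerating all combinations is quartic in the number of candidates per heatmap, which is only practical when each list is kept short.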