25 Apr 2019 | Xingyi Zhou, Dequan Wang, Philipp Krähenbühl
This paper introduces CenterNet, an object detection method that represents each object as a single point: the center of its bounding box. Instead of enumerating and classifying a near-exhaustive list of candidate boxes, CenterNet uses keypoint estimation to find center points and regresses all other object properties, such as size, 3D location, orientation, and pose, from them. The method is end-to-end differentiable and is reported to be simpler, faster, and more accurate than comparable bounding-box-based detectors. On the MS COCO dataset, CenterNet achieves the best speed-accuracy trade-off among the detectors compared, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS. The same approach is applied to 3D bounding box estimation on the KITTI benchmark and to human pose estimation on the COCO keypoint dataset, where it performs competitively with sophisticated multi-stage methods while running in real time.

A fully convolutional network generates a heatmap whose peaks correspond to object centers, and the image features at each peak predict the object's bounding box dimensions. The model is trained with standard dense supervised learning, and inference is a single network forward pass with no non-maximum suppression. The paper also reviews related work, including anchor-based one-stage detectors and keypoint estimation for object detection, and compares CenterNet with other state-of-the-art detectors on COCO and KITTI. It concludes that CenterNet is a simple, fast, and accurate detector that extends to other tasks with minor modifications.
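The inference step described in the summary (find heatmap peaks, then read the box size at each peak) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names `find_peaks` and `decode_boxes`, the 3x3 neighborhood, the score threshold, and the top-k limit are all assumptions chosen for the example. It works on a single-class heatmap in heatmap coordinates and ignores details such as output stride or sub-pixel offsets.

```python
import numpy as np


def find_peaks(heatmap, k=100, threshold=0.3):
    """Return up to k (score, y, x) local maxima of a 2D float heatmap.

    A cell counts as a peak if it is >= all 8 neighbors and >= threshold.
    The neighborhood size, threshold, and k are illustrative assumptions.
    """
    h, w = heatmap.shape
    # Pad with -inf so border cells are compared only with real neighbors.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Element-wise max over the 9 shifted views acts as a 3x3 max filter.
    neighborhood_max = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    is_peak = (heatmap == neighborhood_max) & (heatmap >= threshold)
    ys, xs = np.nonzero(is_peak)
    scores = heatmap[ys, xs]
    order = np.argsort(-scores)[:k]  # highest-scoring peaks first
    return [(float(scores[i]), int(ys[i]), int(xs[i])) for i in order]


def decode_boxes(heatmap, size_map, k=100, threshold=0.3):
    """Turn peaks into (x1, y1, x2, y2, score) boxes.

    size_map has shape (2, H, W): predicted width and height per location
    (a hypothetical layout for this sketch).
    """
    boxes = []
    for score, y, x in find_peaks(heatmap, k, threshold):
        bw = float(size_map[0, y, x])
        bh = float(size_map[1, y, x])
        boxes.append((x - bw / 2, y - bh / 2, x + bw / 2, y + bh / 2, score))
    return boxes
```

Because each object contributes a single peak, the peak test itself plays the role that non-maximum suppression plays in box-based detectors: a high-scoring cell next to a stronger one is simply not a local maximum and is dropped.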