25 Apr 2019 | Xingyi Zhou, Dequan Wang, Philipp Krähenbühl
The paper introduces CenterNet, an approach to object detection that models each object as a single point: the center of its bounding box. The detector uses keypoint estimation to find these center points and then regresses the remaining object properties, such as size, 3D location, orientation, and pose. CenterNet is end-to-end differentiable, and the authors report it to be simpler, faster, and more accurate than traditional bounding box-based detectors. On the MS COCO dataset it achieves the best speed-accuracy trade-off, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS. The same method is also applied to 3D object detection and human pose estimation, where it performs competitively with sophisticated multi-stage methods while maintaining real-time inference speed. The paper discusses the advantages of CenterNet over anchor-based detectors and gives implementation details, including architectural choices and hyperparameter settings.
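To make the "objects as points" idea concrete, the following is a minimal NumPy sketch of the decoding step for a single class. It assumes the network has already produced a center heatmap and a per-location size map. The function name `decode_centers`, the 3x3 local-maximum rule, the score threshold of 0.3, and the array layouts are illustrative assumptions, not details taken from the summary above.

```python
import numpy as np


def decode_centers(heatmap, sizes, threshold=0.3):
    """Turn a center heatmap and a size map into boxes (hypothetical helper).

    heatmap: (H, W) array of center scores in [0, 1] for one class.
    sizes:   (2, H, W) array holding predicted width and height per location.
    Returns a list of (x1, y1, x2, y2, score) tuples in heatmap coordinates.
    """
    h, w = heatmap.shape
    # Pad with -inf so border pixels can still count as local maxima.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Stack the nine shifted views that make up each 3x3 neighborhood.
    neighborhood = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )
    # A center is a location that is the maximum of its own neighborhood
    # and whose score clears the (assumed) threshold.
    is_peak = (heatmap >= neighborhood.max(axis=0)) & (heatmap >= threshold)

    boxes = []
    for y, x in zip(*np.nonzero(is_peak)):
        bw = float(sizes[0, y, x])
        bh = float(sizes[1, y, x])
        # The box is recovered from the center point plus the regressed size.
        boxes.append(
            (x - bw / 2, y - bh / 2, x + bw / 2, y + bh / 2, float(heatmap[y, x]))
        )
    return boxes


# Toy input: one strong center, one weaker neighbor, one low-score peak.
heatmap = np.zeros((5, 5))
heatmap[2, 3] = 0.9
heatmap[2, 2] = 0.5
heatmap[0, 0] = 0.1
sizes = np.zeros((2, 5, 5))
sizes[0, 2, 3] = 2.0  # width at the strong center
sizes[1, 2, 3] = 4.0  # height at the strong center
boxes = decode_centers(heatmap, sizes)
```

In this sketch each box comes directly from one peak and the size regressed at that location, with no anchor boxes involved. A full detector would also handle multiple classes and map heatmap coordinates back to image pixels, which is omitted here.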