This paper addresses bounding box regression, a crucial step in object detection. Traditional methods often use an $\ell_n$-norm loss, which is not tailored to the Intersection over Union (IoU) evaluation metric. To close this gap, the authors propose two new losses: Distance-IoU (DIoU) and Complete IoU (CIoU). DIoU incorporates the normalized distance between the central points of the predicted and target boxes, which leads to faster convergence than the IoU and Generalized IoU (GIoU) losses. CIoU extends DIoU by considering three geometric factors (overlap area, central point distance, and aspect ratio), improving both convergence speed and regression accuracy.
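The two losses can be sketched in a few lines of plain Python. The summary does not spell out the formulas, so this sketch assumes the commonly cited formulation: DIoU adds the squared center distance divided by the squared diagonal of the smallest enclosing box to the IoU loss, and CIoU adds a weighted aspect-ratio term on top. The function names and the `(x1, y1, x2, y2)` box layout are illustrative choices, not taken from the paper's code.

```python
import math


def _iou_terms(box_a, box_b):
    """Return (iou, rho2, c2) for boxes given as (x1, y1, x2, y2), x1 < x2, y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap area (zero when the boxes do not intersect).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Squared distance between the two box centers.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both inputs.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
       + (max(ay2, by2) - min(ay1, by1)) ** 2
    return iou, rho2, c2


def diou_loss(pred, target):
    """Assumed form: 1 - IoU + rho^2 / c^2."""
    iou, rho2, c2 = _iou_terms(pred, target)
    return 1.0 - iou + rho2 / c2


def ciou_loss(pred, target):
    """Assumed form: DIoU loss plus alpha * v, where v measures aspect-ratio mismatch."""
    iou, rho2, c2 = _iou_terms(pred, target)
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    tw, th = target[2] - target[0], target[3] - target[1]
    v = (4 / math.pi ** 2) * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
    denom = (1.0 - iou) + v
    alpha = v / denom if denom > 0 else 0.0  # guard for a perfect match
    return 1.0 - iou + rho2 / c2 + alpha * v
```

Under these assumptions, both losses are zero for identical boxes, and the CIoU loss equals the DIoU loss whenever the two boxes share the same aspect ratio.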
The proposed losses are integrated into state-of-the-art object detection algorithms such as YOLO v3, SSD, and Faster R-CNN, and evaluated on the popular PASCAL VOC and MS COCO datasets. The results show significant performance improvements under both the IoU and GIoU metrics. Additionally, DIoU can serve as the criterion in non-maximum suppression (NMS), making the suppression of redundant boxes more robust, particularly in occluded scenes.
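A minimal sketch of how DIoU might replace plain IoU inside greedy NMS follows. It assumes the usual formulation in which a candidate is suppressed when its IoU with the selected box, minus the normalized center-distance penalty, reaches a threshold. The names `diou_nms` and `threshold`, and the default value of 0.5, are hypothetical choices for illustration.

```python
def _diou(a, b):
    """IoU minus normalized squared center distance, for (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # Squared distance between centers.
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
         + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # Squared diagonal of the smallest enclosing box.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
       + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return inter / union - rho2 / c2


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS; returns indices of kept boxes in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Keep only candidates whose DIoU with the selected box is below the threshold.
        order = [i for i in order if _diou(boxes[best], boxes[i]) < threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = diou_nms(boxes, scores)  # the near-duplicate second box is suppressed
```

Because the center-distance penalty is subtracted from the IoU, two boxes that overlap substantially but have well-separated centers are less likely to suppress one another than under plain IoU, which is the behaviour one would want when distinct objects occlude each other.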
The paper also analyzes the limitations of the IoU and GIoU losses in detail, showing that they suffer from slow convergence and inaccurate regression, especially for non-overlapping boxes. The proposed losses address these issues by directly minimizing the normalized distance between central points, which yields faster and more accurate regression. Extensive experiments and comparisons with existing techniques validate the effectiveness of the proposed methods.
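The non-overlapping case can be made concrete with a short worked calculation. The formulas below are not given in this summary. They assume the commonly cited definitions, with $\rho$ the distance between box centers and $c$ the diagonal of the smallest enclosing box. When the predicted box $B$ and the target box $B^{gt}$ do not overlap, $IoU = 0$, so

$$
\mathcal{L}_{IoU} = 1 - IoU = 1,
\qquad
\mathcal{L}_{DIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} = 1 + \frac{\rho^2(b, b^{gt})}{c^2}.
$$

The IoU loss is constant here and provides no gradient, whereas the DIoU loss still depends on the center distance. For example, with unit boxes $B = (0, 0, 1, 1)$ and $B^{gt} = (3, 0, 4, 1)$, we get $\rho^2 = 9$ and $c^2 = 4^2 + 1^2 = 17$, so $\mathcal{L}_{DIoU} = 1 + 9/17 \approx 1.53$. Moving the target to $(2, 0, 3, 1)$ gives $\rho^2 = 4$ and $c^2 = 3^2 + 1^2 = 10$, so $\mathcal{L}_{DIoU} = 1.4$: the loss decreases as the boxes approach each other, even before they overlap.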