The paper "Cascade R-CNN: Delving into High Quality Object Detection" by Zhaowei Cai and Nuno Vasconcelos of UC San Diego introduces Cascade R-CNN, a multi-stage object detection architecture. It targets a problem tied to the intersection-over-union (IoU) threshold used to define positive and negative training examples: a detector trained with a low IoU threshold (e.g., 0.5) often produces noisy detections, yet raising the threshold tends to degrade detection performance. The paper attributes this degradation to two factors: overfitting, because far fewer proposals qualify as positives at a higher threshold, and a mismatch at inference between the IoU quality a detector was trained for and the quality of the hypotheses it actually receives.
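To make the role of the IoU threshold concrete, here is a minimal sketch that computes IoU for axis-aligned boxes and counts how many proposals qualify as positives at several thresholds. The box format `(x1, y1, x2, y2)`, the toy proposals, the thresholds 0.5, 0.6 and 0.7, and the helper names `iou` and `count_positives` are illustrative assumptions, not taken from the authors' code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes get an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_positives(proposals, gt_box, threshold):
    """Number of proposals whose IoU with the ground truth meets the threshold."""
    return sum(1 for p in proposals if iou(p, gt_box) >= threshold)


# Toy data (assumed): one ground-truth box and proposals shifted to the right.
gt = (0.0, 0.0, 10.0, 10.0)
proposals = [
    (0.0, 0.0, 10.0, 10.0),  # perfect overlap
    (1.0, 0.0, 11.0, 10.0),  # shifted by 1
    (2.0, 0.0, 12.0, 10.0),  # shifted by 2
    (3.0, 0.0, 13.0, 10.0),  # shifted by 3
    (6.0, 0.0, 16.0, 10.0),  # shifted by 6
]

positives_by_threshold = {
    t: count_positives(proposals, gt, t) for t in (0.5, 0.6, 0.7)
}
print(positives_by_threshold)  # {0.5: 4, 0.6: 3, 0.7: 2}
```

With a fixed set of proposals, the positive set shrinks as the threshold rises, which is the overfitting pressure described above.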
Cascade R-CNN addresses these issues by training a sequence of detectors with successively higher IoU thresholds, so that each stage is more selective against close false positives than the one before it. The detectors are trained stage by stage, with the output of one detector serving as the training input for the next, higher-quality detector. Because each stage receives progressively better-localized hypotheses, every detector still sees a positive set of comparable size despite its higher threshold, which reduces overfitting. The same cascade is applied at inference, so the hypotheses reaching each stage better match the quality that stage was trained for.
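The cascade idea can be sketched in a few lines under strong simplifications. In the sketch below, `toy_refine` is a hypothetical stand-in for a learned box regressor (it simply moves each box halfway toward the ground truth), and the thresholds 0.5, 0.6 and 0.7, the toy boxes, and the function names are assumptions for illustration rather than the authors' implementation. It only shows how feeding each stage's refined boxes to the next stage can keep the number of positives from collapsing as the threshold rises.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def toy_refine(box, gt_box, step=0.5):
    """Hypothetical regressor stand-in: move each coordinate a fraction toward the ground truth."""
    return tuple(c + step * (g - c) for c, g in zip(box, gt_box))


def run_cascade(proposals, gt_box, thresholds=(0.5, 0.6, 0.7)):
    """Count positives per stage, passing each stage's refined boxes to the next stage."""
    boxes = list(proposals)
    positives_per_stage = []
    for threshold in thresholds:
        # Label positives with this stage's (stricter) IoU threshold.
        positives_per_stage.append(
            sum(1 for b in boxes if iou(b, gt_box) >= threshold)
        )
        # The stage's output boxes become the next stage's input hypotheses.
        boxes = [toy_refine(b, gt_box) for b in boxes]
    return positives_per_stage, boxes


# Toy data (assumed): one ground-truth box and proposals shifted to the right.
gt = (0.0, 0.0, 10.0, 10.0)
proposals = [
    (0.0, 0.0, 10.0, 10.0),
    (1.0, 0.0, 11.0, 10.0),
    (2.0, 0.0, 12.0, 10.0),
    (3.0, 0.0, 13.0, 10.0),
    (6.0, 0.0, 16.0, 10.0),
]

# Baseline: the same proposals judged at each threshold, with no refinement.
positives_without_resampling = [
    sum(1 for p in proposals if iou(p, gt) >= t) for t in (0.5, 0.6, 0.7)
]
positives_with_cascade, final_boxes = run_cascade(proposals, gt)

print(positives_without_resampling)  # [4, 3, 2]
print(positives_with_cascade)        # [4, 4, 5]
```

In this toy setup, the unrefined proposals lose positives at every threshold increase, while the cascaded boxes do not. A real detector learns the refinement from data and has no access to the ground truth at inference, so the sketch illustrates the resampling effect only.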
The authors demonstrate that a simple implementation of Cascade R-CNN outperforms all single-model object detectors on the COCO dataset, achieving significant gains across various evaluation metrics. They also show that the architecture is widely applicable across different detector architectures, achieving consistent improvements regardless of the baseline detector's strength. The code for Cascade R-CNN is available at https://github.com/zhaoweicai/cascade-rcnn.