Cascade R-CNN is a multi-stage object detection architecture that improves detection quality by training a sequence of detectors with increasing IoU thresholds. The detectors are trained sequentially, each using the output of the previous one as its training data; this resampling progressively improves hypothesis quality, keeps enough positive examples available at the higher thresholds, and so reduces overfitting. Applying the same cascade at inference addresses the quality mismatch between each detector and the hypotheses it receives at test time. The paper motivates the design with the paradox of high-quality detection: simply raising the IoU threshold used to train a single detector tends to degrade its performance rather than improve it. It compares Cascade R-CNN with previous methods such as iterative bounding box regression and integral loss, and reports that Cascade R-CNN outperforms them. Cascade R-CNN achieves state-of-the-art performance on the COCO dataset and significantly improves high-quality detection on other datasets, including VOC, KITTI, CityPersons, and WiderFace. It is also generalized to instance segmentation, where it shows non-trivial improvements over Mask R-CNN. The architecture is trained end to end with a simple procedure, is implemented in Caffe and Detectron, and has been adopted by winning teams in several challenges.
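The resampling idea described above can be illustrated with a small sketch. This is not the paper's implementation: the learned box regressor is replaced by a hypothetical `toy_refine` function that moves each box halfway toward the ground truth, the thresholds 0.5, 0.6, and 0.7 are assumed stage values, and the boxes are made-up numbers. The point is only to show that once the hypotheses are refined by an earlier stage, more of them qualify as positives under a stricter IoU threshold.

```python
# Illustrative sketch only. The "regressor" is a hypothetical stand-in that
# moves each box a fixed fraction of the way toward the ground-truth box.

STAGE_IOU_THRESHOLDS = (0.5, 0.6, 0.7)  # assumed values, one per cascade stage


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_positives(boxes, gt, threshold):
    """Number of boxes whose IoU with the ground truth meets the threshold."""
    return sum(1 for box in boxes if iou(box, gt) >= threshold)


def toy_refine(box, gt, step=0.5):
    """Hypothetical regressor: move each coordinate `step` of the way to gt."""
    return tuple(b + step * (g - b) for b, g in zip(box, gt))


def run_cascade(proposals, gt, thresholds=STAGE_IOU_THRESHOLDS):
    """Return, per stage, (threshold, positives at that threshold, mean IoU)."""
    stats = []
    boxes = list(proposals)
    for thr in thresholds:
        positives = count_positives(boxes, gt, thr)
        mean_iou = sum(iou(b, gt) for b in boxes) / len(boxes)
        stats.append((thr, positives, mean_iou))
        # The refined boxes become the input hypotheses of the next stage.
        boxes = [toy_refine(b, gt) for b in boxes]
    return stats


# Made-up ground truth and proposals, chosen only for illustration.
gt_box = (10.0, 10.0, 50.0, 50.0)
proposals = [
    (14.0, 12.0, 56.0, 54.0),
    (4.0, 6.0, 44.0, 46.0),
    (18.0, 18.0, 60.0, 58.0),
    (8.0, 14.0, 52.0, 58.0),
]

cascade_stats = run_cascade(proposals, gt_box)
for thr, positives, mean_iou in cascade_stats:
    print(f"IoU threshold {thr:.1f}: {positives} positives, mean IoU {mean_iou:.3f}")
```

Comparing `count_positives(proposals, gt_box, 0.7)` on the raw proposals with the last entry of `cascade_stats` shows the effect: the strictest stage sees far more positives when it is fed refined hypotheses than it would if trained directly on the original proposals.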