4 Apr 2019 | Jiangmiao Pang, Kai Chen, Jianping Shi, Huajun Feng, Wanli Ouyang, Dahua Lin
Libra R-CNN: Towards Balanced Learning for Object Detection
This paper addresses the imbalance issues in the training process of object detectors, which can limit the performance of well-designed model architectures. The authors identify three levels of imbalance: sample level, feature level, and objective level. To mitigate these issues, they propose Libra R-CNN, a framework that integrates three novel components: IoU-balanced sampling, balanced feature pyramid, and balanced L1 loss.
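1. **IoU-balanced Sampling**: This method mines hard samples according to their intersection-over-union (IoU) with ground-truth boxes. It spreads the sampled negatives evenly across IoU intervals instead of drawing them uniformly at random, so that hard negatives are more likely to be selected.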
2. **Balanced Feature Pyramid**: This component rescales the multi-level features to a common resolution, integrates them into a single balanced semantic feature, and uses that feature to strengthen every level, so that each resolution in the pyramid receives equal information from the other resolutions.
3. **Balanced L1 Loss**: This localization loss promotes crucial gradients, namely those from samples with small regression error, while capping the gradients from outliers. This rebalances the classification and localization tasks in the overall training objective.
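
Two of the components above, IoU-balanced sampling and the balanced L1 loss, lend themselves to a short sketch. The code below is a minimal illustration, not the authors' implementation. The function names are hypothetical. The defaults (`alpha=0.5`, `gamma=1.5`, three IoU bins, a negative IoU ceiling of 0.5) are assumptions based on the settings reported in the paper and do not appear in this summary. The top-up step for under-filled bins is a simplification added here.

```python
import math
import random


def iou_balanced_sample(neg_ious, num_samples, num_bins=3, max_iou=0.5, rng=None):
    """Pick negative indices, drawing about num_samples / num_bins from each IoU bin."""
    rng = rng or random.Random()
    width = max_iou / num_bins
    bins = [[] for _ in range(num_bins)]
    for idx, iou in enumerate(neg_ious):
        # Clamp so that an IoU equal to max_iou falls in the last bin.
        k = min(int(iou / width), num_bins - 1)
        bins[k].append(idx)

    per_bin = num_samples // num_bins
    chosen = []
    for members in bins:
        # Uniform sampling inside a bin; take everything if the bin is short.
        chosen.extend(rng.sample(members, min(per_bin, len(members))))

    # Simplification (assumption): fill any shortfall from the unchosen candidates.
    chosen_set = set(chosen)
    remaining = [i for i in range(len(neg_ious)) if i not in chosen_set]
    shortfall = min(num_samples - len(chosen), len(remaining))
    chosen.extend(rng.sample(remaining, shortfall))
    return chosen


def balanced_l1_loss(x, alpha=0.5, gamma=1.5):
    """Balanced L1 loss for one regression error x, split at |x| = 1."""
    # b follows from the constraint alpha * ln(b + 1) = gamma, which keeps
    # the gradient continuous at |x| = 1.
    b = math.exp(gamma / alpha) - 1.0
    d = abs(x)
    if d < 1.0:
        # Inlier branch: its gradient is alpha * ln(b * d + 1).
        return alpha / b * (b * d + 1.0) * math.log(b * d + 1.0) - alpha * d
    # Outlier branch: gradient capped at gamma; the constant term keeps
    # the loss itself continuous at |x| = 1.
    return gamma * d + gamma / b - alpha


def balanced_l1_grad(x, alpha=0.5, gamma=1.5):
    """Derivative of balanced_l1_loss with respect to x."""
    b = math.exp(gamma / alpha) - 1.0
    d = abs(x)
    g = alpha * math.log(b * d + 1.0) if d < 1.0 else gamma
    return math.copysign(g, x)
```

With these assumed defaults, the inlier gradient `alpha * ln(b * |x| + 1)` exceeds the smooth L1 gradient `|x|` for errors between 0 and 1, which is one way to read "promoting crucial gradients". The sampler returns indices into `neg_ious`, so it can be applied to any list of candidate boxes whose IoU with the ground truth has already been computed.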
Libra R-CNN improves on state-of-the-art detectors on the MS COCO dataset, achieving 2.5 points higher Average Precision (AP) than FPN Faster R-CNN and 2.0 points higher AP than RetinaNet. The framework is effective for both two-stage and single-stage detectors, which demonstrates its versatility.