26 Apr 2021 | Peize Sun, Rufeng Zhang, Yi Jiang, Tao Kong, Chenfeng Xu, Wei Zhan, Masayoshi Tomizuka, Lei Li, Zehuan Yuan, Changhu Wang, Ping Luo
Sparse R-CNN is a novel approach to object detection that eliminates the need for dense object candidates, such as anchor boxes or reference points, by using a fixed set of learnable object proposals. This small set of learned proposals is fed directly to the object recognition head for classification and localization, avoiding the redundancy and post-processing complexity associated with dense prior methods. The key components of Sparse R-CNN are a backbone network (e.g., ResNet-50 with FPN), a dynamic instance interactive head, and a set prediction loss. The dynamic head uses each proposal feature to interact with its corresponding RoI feature, producing final predictions without non-maximum suppression. Sparse R-CNN achieves competitive performance on the COCO dataset, reaching 45.0 AP with a standard 3× training schedule while running at 22 fps with a ResNet-50 FPN model, and it converges faster in training than dense-to-sparse methods such as DETR. The paper discusses the advantages of sparse methods, including reduced redundancy and improved efficiency, and provides detailed experimental results to support its effectiveness.
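To make the dynamic instance interaction concrete, here is a minimal PyTorch-style sketch based on the paper's description: each proposal feature generates the parameters of two 1×1 convolutions that filter its own RoI feature. The class and variable names (`DynamicInstanceInteraction`, `param_gen`, `out_proj`) and the dimensions (256-d features, 7×7 RoI pooling, 64-d hidden channel) are illustrative assumptions, not the authors' actual code.

```python
import torch
import torch.nn as nn

class DynamicInstanceInteraction(nn.Module):
    """Sketch of the per-proposal dynamic interaction in Sparse R-CNN.

    Each of the N learnable proposal features conditions two 1x1 conv
    weight matrices that are applied to that proposal's own RoI feature.
    Dimensions here are assumptions for illustration.
    """

    def __init__(self, d_model: int = 256, d_hidden: int = 64, pool_size: int = 7):
        super().__init__()
        self.d_model, self.d_hidden = d_model, d_hidden
        # One proposal feature -> parameters of two 1x1 convs.
        self.param_gen = nn.Linear(d_model, 2 * d_model * d_hidden)
        self.norm1 = nn.LayerNorm(d_hidden)
        self.norm2 = nn.LayerNorm(d_model)
        self.act = nn.ReLU(inplace=True)
        # Flattened, filtered RoI feature -> one object feature per proposal.
        self.out_proj = nn.Linear(pool_size * pool_size * d_model, d_model)

    def forward(self, roi_feats: torch.Tensor, prop_feats: torch.Tensor) -> torch.Tensor:
        # roi_feats: (N, S*S, d_model), prop_feats: (N, d_model)
        params = self.param_gen(prop_feats)  # (N, 2 * d_model * d_hidden)
        split = self.d_model * self.d_hidden
        w1 = params[:, :split].view(-1, self.d_model, self.d_hidden)
        w2 = params[:, split:].view(-1, self.d_hidden, self.d_model)
        x = self.act(self.norm1(torch.bmm(roi_feats, w1)))  # (N, S*S, d_hidden)
        x = self.act(self.norm2(torch.bmm(x, w2)))          # (N, S*S, d_model)
        return self.out_proj(x.flatten(1))                  # (N, d_model)
```

For example, with 100 proposals, `head = DynamicInstanceInteraction(); obj = head(torch.randn(100, 49, 256), torch.randn(100, 256))` yields one 256-d object feature per proposal, from which lightweight classification and box-regression layers predict directly; a bipartite-matching set prediction loss (as in DETR) then makes non-maximum suppression unnecessary.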