26 Apr 2021 | Peize Sun, Rufeng Zhang, Yi Jiang, Tao Kong, Chenfeng Xu, Wei Zhan, Masayoshi Tomizuka, Lei Li, Zehuan Yuan, Changhu Wang, Ping Luo
Sparse R-CNN is a purely sparse object detection method that replaces dense object candidates with a small, fixed set of learnable proposals. Where existing detectors rely on dense anchor boxes or reference points enumerated across the image feature map, Sparse R-CNN feeds a sparse set of learned object proposals directly to the detection head, which reduces computational overhead. This removes the need for dense candidate design, many-to-one label assignment, and non-maximum suppression (NMS), simplifying the pipeline and supporting fast inference and training convergence. On the COCO dataset, Sparse R-CNN reaches 45.0 AP with a standard 3× training schedule and runs at 22 fps using a ResNet-50 FPN model. Both the proposal boxes and the proposal features are learnable parameters that are optimized together with the rest of the network. A dynamic instance interactive head lets each proposal interact only with its own corresponding region feature, which improves accuracy and flexibility. The resulting framework has a simple structure with sparse-in, sparse-out processing: a fixed number of proposals enters the head and the same number of predictions comes out. Experiments show that Sparse R-CNN is competitive with well-established detector baselines in accuracy, speed, and training convergence. The method also performs well on crowded scenes and applies to a range of detection scenarios. By departing from the dense-prior convention, the work offers a new direction for object detector design.
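To make the idea of learnable proposals and per-proposal interaction more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that each proposal feature is mapped by a shared layer to the weights of two small per-proposal layers, which are then applied to that proposal's region feature. All names (`proposal_boxes`, `proposal_feats`, `to_params`, `interact`), the sizes `N`, `C`, `S`, `H`, the whole-image box initialization, and the random stand-in region features are illustrative assumptions; in a real detector the region features would come from a backbone and the parameters would be trained.

```python
import random

random.seed(0)

N = 4        # number of learnable proposals (hypothetical; kept tiny for readability)
C = 8        # feature channels (hypothetical)
S = 2        # each region feature is an S x S grid (hypothetical)
H = C // 4   # hidden width of the per-proposal layers (hypothetical)


def rand_matrix(rows, cols, scale=0.1):
    """Random rows x cols list-of-lists, standing in for trained parameters."""
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Multiply an (n x k) list-of-lists by a (k x m) list-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


# Learnable state: N boxes (normalized cx, cy, w, h) and N feature vectors.
# Initializing every box to cover the whole image is an assumption of this sketch.
proposal_boxes = [[0.5, 0.5, 1.0, 1.0] for _ in range(N)]
proposal_feats = rand_matrix(N, C, scale=1.0)

# Shared layer that turns one proposal feature into per-proposal weights.
to_params = rand_matrix(C, 2 * C * H)


def interact(proposal_feat, roi_feat):
    """Filter one region feature ((S*S) x C) with weights generated
    from its own proposal feature (length C). Returns (S*S) x C."""
    flat = matmul([proposal_feat], to_params)[0]                   # length 2*C*H
    w1 = [flat[i * H:(i + 1) * H] for i in range(C)]               # C x H
    off = C * H
    w2 = [flat[off + i * C:off + (i + 1) * C] for i in range(H)]   # H x C
    hidden = relu(matmul(roi_feat, w1))                            # (S*S) x H
    return relu(matmul(hidden, w2))                                # (S*S) x C


# Stand-in region features, one per proposal (a real model would pool these
# from backbone feature maps using proposal_boxes).
roi_feats = [rand_matrix(S * S, C, scale=1.0) for _ in range(N)]

# Sparse in, sparse out: N proposals enter, N object features leave.
outputs = [interact(proposal_feats[i], roi_feats[i]) for i in range(N)]
print(len(outputs), len(outputs[0]), len(outputs[0][0]))
```

Each proposal touches only its own region feature, so the cost of this step grows with the number of proposals rather than with the number of locations on the feature map.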