23 Apr 2015 | Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun
The paper introduces spatial pyramid pooling (SPP), a pooling strategy for deep convolutional neural networks (CNNs) that removes the requirement of a fixed-size input image. An SPP layer produces a fixed-length representation from an image of arbitrary size or scale, which improves recognition accuracy and makes the network more robust to object deformations. The authors propose SPP-net, which places the SPP layer on top of the last convolutional layer, so the network can handle images of varying sizes during both training and testing. Experiments on the ImageNet 2012 dataset show that SPP-net improves the accuracy of several different CNN architectures. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. In object detection, SPP-net computes the convolutional feature maps once per image and pools the features for each candidate region from them, which significantly reduces computational cost compared with R-CNN (the paper reports a 24-102x speedup at test time) while achieving better or comparable accuracy. The method ranked second in object detection and third in image classification in the ILSVRC 2014 competition.
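
To make the fixed-length property concrete, here is a minimal NumPy sketch of spatial pyramid pooling over a single feature map. It is an illustration under stated assumptions, not the authors' implementation: the function name `spatial_pyramid_pool`, the pyramid levels `(1, 2, 4)`, and the floor/ceil bin boundaries are choices made for this example, and the paper's exact window and stride rule may differ.

```python
import math
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a vector of length C * sum(n*n).

    Hypothetical helper for illustration; `levels` lists the number of bins
    per side at each pyramid level (an assumed configuration).
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin edges scale with the map size, so each level always
            # yields n * n bins regardless of H and W.
            r0, r1 = (i * height) // n, math.ceil((i + 1) * height / n)
            for j in range(n):
                c0, c1 = (j * width) // n, math.ceil((j + 1) * width / n)
                # Max over the spatial extent of the bin, one value per channel.
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)

# Two feature maps with different spatial sizes give vectors of equal length.
rng = np.random.default_rng(0)
small = spatial_pyramid_pool(rng.standard_normal((8, 13, 13)))
wide = spatial_pyramid_pool(rng.standard_normal((8, 10, 17)))
print(small.shape, wide.shape)  # both (168,), i.e. 8 channels * 21 bins
```

Because the number of bins depends only on `levels`, the fully connected layers that follow always receive the same input dimension, which is what lets the network accept images of arbitrary size.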