Evaluation of Pooling Operations in Deep Architectures for Object Recognition

2010 | Dominik Scherer, Andreas Müller, and Sven Behnke
This paper evaluates pooling operations in convolutional architectures for object recognition. The authors compare several aggregation functions to determine which is most effective for vision tasks and find that max pooling significantly outperforms subsampling, while overlapping pooling windows offer no significant improvement over non-overlapping ones. Applying these findings, they achieve state-of-the-art error rates on two NORB datasets.

The paper motivates its study with the mammalian visual cortex, which has inspired many object recognition models. The visual area V1 consists of simple cells, which extract local features, and complex cells, which combine features from a small spatial neighborhood. This spatial pooling is crucial for obtaining translation-invariant features. Supervised models built on these findings include the Neocognitron and convolutional neural networks (CNNs), and many state-of-the-art feature extractors, including HOG, SIFT, Gist, and HMAX, use similar aggregation steps. Beyond comparing aggregation functions, the paper investigates whether signal-processing concepts such as overlapping receptive fields can improve recognition performance. The results confirm that max pooling yields the better performance on both NORB datasets and underline the importance of choosing the right aggregation function for vision tasks.
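To make the comparison concrete, here is a minimal NumPy sketch (not code from the paper) of the two aggregation functions it evaluates: max pooling keeps the strongest activation in each window, while subsampling averages the window. The `pool2d` helper and the toy feature map are illustrative assumptions, using non-overlapping 2×2 windows as in the paper's baseline setting.

```python
import numpy as np

def pool2d(x, size, op):
    """Aggregate non-overlapping size x size windows of a 2D feature map."""
    h, w = x.shape
    # Trim so both dimensions divide evenly, then put each window on its own axes.
    x = x[:h - h % size, :w - w % size]
    windows = x.reshape(x.shape[0] // size, size, x.shape[1] // size, size)
    return op(windows, axis=(1, 3))

# Toy 4x4 feature map (hypothetical activations).
feature_map = np.array([
    [0.1, 0.9, 0.0, 0.2],
    [0.3, 0.2, 0.8, 0.1],
    [0.0, 0.4, 0.1, 0.0],
    [0.7, 0.1, 0.2, 0.6],
])

max_pooled = pool2d(feature_map, 2, np.max)   # max pooling: strongest response wins
avg_pooled = pool2d(feature_map, 2, np.mean)  # subsampling-style averaging
```

Both operations reduce the 4×4 map to 2×2, but max pooling propagates only the dominant feature response in each neighborhood, which is what makes the resulting features more robust to small translations.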