CNN Features off-the-shelf: an Astounding Baseline for Recognition

12 May 2014 | Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, Stefan Carlsson
This paper presents the results of using off-the-shelf features from the OverFeat convolutional neural network (CNN) for a range of visual recognition tasks. The OverFeat network was trained for object classification on the ImageNet ILSVRC 2013 dataset. The study evaluates how well these features transfer to object classification, scene recognition, fine-grained recognition, attribute detection, and image retrieval. The features are a 4096-dimensional representation extracted from a layer of the network; they are fed to a linear SVM classifier for the recognition tasks and compared by L2 distance for retrieval.
The results show that the OverFeat features outperform state-of-the-art methods on most tasks, including image classification, scene recognition, and attribute detection. The features are also effective for instance retrieval, outperforming low-memory-footprint methods on the retrieval datasets, with the exception of the sculptures dataset. The study argues that deep learning features should be the primary candidate for most visual recognition tasks. Even without fine-tuning, the OverFeat features perform well across tasks, and further optimization could yield additional gains. The results confirm and extend previous findings that CNN features are a strong baseline for visual recognition.
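The pipeline summarized above (fixed CNN features, then a linear SVM for recognition or L2 distance for retrieval) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the 4-dimensional toy vectors stand in for 4096-dimensional OverFeat features, the features are assumed to be L2-normalized before use, the SVM is a small bias-free subgradient solver written here for self-containment rather than the solver used in the paper, and every function and variable name is hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a feature vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, database):
    """Return database indices ordered by increasing L2 distance to the query."""
    q = l2_normalize(query)
    dists = [l2_distance(q, l2_normalize(d)) for d in database]
    return sorted(range(len(database)), key=lambda i: dists[i])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_linear_svm(features, labels, lam=0.01, lr=0.1, epochs=100):
    """Binary linear SVM (labels +1 / -1, no bias term) trained by
    fixed-step subgradient descent on the L2-regularized hinge loss.
    A stand-in for a library solver, chosen only to avoid dependencies."""
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * dot(w, x)
            # Gradient step on the regularizer (weight decay).
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Hinge-loss subgradient is non-zero only inside the margin.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Classify a feature vector by the sign of its linear score."""
    return 1 if dot(w, x) >= 0 else -1

# Toy stand-ins for features that a pretrained CNN would produce.
RAW_FEATURES = [
    [1.0, 0.0, 0.1, 0.0],  # class +1
    [0.9, 0.1, 0.0, 0.1],  # class +1
    [0.0, 1.0, 0.0, 0.1],  # class -1
    [0.1, 0.9, 0.1, 0.0],  # class -1
]
LABELS = [1, 1, -1, -1]
FEATURES = [l2_normalize(f) for f in RAW_FEATURES]

weights = train_linear_svm(FEATURES, LABELS)
ranking = retrieve([0.8, 0.2, 0.0, 0.0], RAW_FEATURES)
```

In practice the toy vectors would be replaced by features extracted once from the pretrained network, and the hand-written solver by an off-the-shelf linear SVM; the point of the sketch is only that nothing in the network is retrained.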