Beyond bags of features: spatial pyramid matching for recognizing natural scene categories

Date:2006-06
Author:Svetlana Lazebnik, Cordelia Schmid, Jean Ponce
Pages:9
Summary:This paper introduces spatial pyramid matching, a method for recognizing natural scene categories that extends the orderless bag-of-features (BoF) representation. The method partitions an image into increasingly fine sub-regions and computes a histogram of local features within each sub-region; together these histograms form a "spatial pyramid." The approach significantly improves performance on challenging scene categorization tasks: it outperforms state-of-the-art methods on the Caltech-101 database and achieves high accuracy on a large database of fifteen natural scene categories. The spatial pyramid framework also offers insight into why other image descriptions, such as Torralba’s "gist" and Lowe’s SIFT descriptors, are effective. The paper discusses the advantages of the spatial pyramid over traditional BoF representations, including its ability to capture approximate global geometric correspondence and its robustness to clutter and viewpoint changes. Experiments on the fifteen-category scene database, Caltech-101, and Graz demonstrate the method's effectiveness in both global scene classification and object recognition.
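The partition-and-histogram step described in the summary can be illustrated with a minimal sketch. It assumes local features have already been quantized into visual-word indices and that their positions are scaled to the unit square. The function name `spatial_pyramid_histogram` and its parameters are hypothetical, and the sketch omits the per-level weighting and the matching kernel used in the paper, so it is not the authors' implementation.

```python
import numpy as np

def spatial_pyramid_histogram(xy, words, vocab_size, levels):
    """Concatenate per-cell visual-word histograms for levels 0..levels.

    xy:     (n, 2) array of feature positions scaled to [0, 1] (assumption).
    words:  (n,) array of integer visual-word indices in [0, vocab_size).
    """
    xy = np.asarray(xy, dtype=float)
    words = np.asarray(words, dtype=int)
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level  # the grid is cells x cells at this level
        # Map each position to its grid cell; clipping handles a coordinate of exactly 1.0.
        col = np.clip((xy[:, 0] * cells).astype(int), 0, cells - 1)
        row = np.clip((xy[:, 1] * cells).astype(int), 0, cells - 1)
        # Combine (cell, word) into one bin index and count occurrences.
        flat = (row * cells + col) * vocab_size + words
        parts.append(np.bincount(flat, minlength=cells * cells * vocab_size))
    return np.concatenate(parts)

# Three features, a two-word vocabulary, and a two-level pyramid (levels 0 and 1).
pyramid = spatial_pyramid_histogram(
    xy=[[0.1, 0.1], [0.9, 0.9], [0.1, 0.9]],
    words=[0, 1, 0],
    vocab_size=2,
    levels=1,
)
print(pyramid.shape)  # one histogram at level 0 plus four at level 1
```

Level 0 of the resulting vector is the ordinary bag-of-features histogram; the finer levels add the coarse spatial layout that an orderless BoF representation discards.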