Learning Object Categories from Google's Image Search | 2005 | R. Fergus, L. Fei-Fei, P. Perona, A. Zisserman
The paper presents a novel approach to object category recognition that learns visual models directly from the raw output of image search engines, specifically Google's Image Search, without manually curated training datasets. The authors propose TSI-pLSA, a model that extends probabilistic Latent Semantic Analysis (pLSA) to incorporate spatial information in a translation- and scale-invariant manner. This design addresses the high intra-class variability and the large proportion of unrelated images typically returned by search engines.
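For reference, the standard pLSA decomposition and the spatial extension built on it can be written as follows. This is a notational sketch: d indexes images treated as documents, w visual words, z latent topics, and x a quantized region location; the exact parameterization of x is what distinguishes the two spatial variants described below.

```latex
% Plain pLSA: each visual word w in image (document) d is generated by
% first picking a latent topic z, then drawing a word from that topic.
P(w \mid d) = \sum_{z} P(w \mid z)\, P(z \mid d)

% Spatial extension (sketch): each word also carries a location x, modeled
% jointly with appearance given the topic. ABS-pLSA quantizes x as an
% absolute image position; TSI-pLSA measures x relative to a proposed
% object centroid and scale, making the model translation and scale invariant.
P(w, x \mid d) = \sum_{z} P(w, x \mid z)\, P(z \mid d)
```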
The paper reviews the problem of object category recognition, highlighting the challenges of obtaining large and diverse training sets. It introduces the concept of using Google's image search as a source of training data, acknowledging the presence of visually unrelated images and the need for robust models to handle such noise. The authors extend pLSA to include spatial information, developing two models: ABS-pLSA, which uses absolute position, and TSI-pLSA, which is translation and scale invariant.
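As a concrete reference point, below is a minimal sketch of the EM fit for plain pLSA, the base model both variants extend. The spatial term of ABS-pLSA/TSI-pLSA is omitted, and the dense responsibility array is an illustrative simplification, not the paper's implementation.

```python
import numpy as np

def plsa(counts, n_topics, n_iters=100, seed=0, eps=1e-12):
    """Fit plain pLSA by EM on a document-by-word count matrix.

    counts  : (D, W) array of co-occurrence counts n(d, w)
    returns : p_w_z (W, K) = P(w|z), p_z_d (D, K) = P(z|d)
    """
    rng = np.random.default_rng(seed)
    D, W = counts.shape
    K = n_topics

    # Random initialization of the two conditional distributions.
    p_w_z = rng.random((W, K)); p_w_z /= p_w_z.sum(axis=0, keepdims=True)
    p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # E-step: responsibilities P(z|d,w) for every (d, w) pair.
        joint = p_z_d[:, None, :] * p_w_z[None, :, :]        # (D, W, K)
        resp = joint / (joint.sum(axis=2, keepdims=True) + eps)

        # M-step: re-estimate P(w|z) and P(z|d) from weighted counts.
        weighted = counts[:, :, None] * resp                  # (D, W, K)
        p_w_z = weighted.sum(axis=0)                          # (W, K)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True) + eps
        p_z_d = weighted.sum(axis=1)                          # (D, K)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_w_z, p_z_d
```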
The implementation covers image preprocessing, region detection, and SIFT descriptors for feature extraction. Experiments on the Caltech and PASCAL datasets evaluate the proposed models on classification and localization tasks; the results show that TSI-pLSA outperforms other methods in many cases, particularly in handling pose variability and multiple object instances.
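A hypothetical preprocessing pipeline along these lines, using OpenCV's SIFT and scikit-learn's MiniBatchKMeans to quantize descriptors into visual-word histograms; the vocabulary size and clustering method here are illustrative choices, not the paper's exact setup.

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def sift_descriptors(image_path):
    """Detect regions and extract 128-d SIFT descriptors from one image."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(gray, None)
    return desc if desc is not None else np.empty((0, 128))

def build_bow(image_paths, vocab_size=350):
    """Quantize SIFT descriptors into a visual vocabulary and return one
    histogram ('document') of visual-word counts per image."""
    all_desc = [sift_descriptors(p) for p in image_paths]
    kmeans = MiniBatchKMeans(n_clusters=vocab_size, n_init=3, random_state=0)
    kmeans.fit(np.vstack(all_desc))
    histograms = np.zeros((len(image_paths), vocab_size))
    for i, desc in enumerate(all_desc):
        if len(desc):
            words = kmeans.predict(desc)
            histograms[i] = np.bincount(words, minlength=vocab_size)
    # The histogram matrix can serve as the count matrix for the pLSA sketch above.
    return histograms
```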
The paper also discusses how the optimal number of topics is selected and how Google's image search itself can be improved by re-ranking results with the learned models. The authors conclude by highlighting the potential of their approach, noting the need for further research in feature selection, centroid proposal, and the use of more sophisticated LDA models.
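Re-ranking then amounts to scoring each returned image by how strongly the learned object topic explains it. A minimal sketch, assuming the pLSA fit above and a hypothetical topic-selection helper:

```python
import numpy as np

def rerank(p_z_d, object_topic):
    """Re-rank raw search results: score each image (document) by the
    posterior weight of the topic chosen to represent the object class,
    and return image indices sorted best-first."""
    scores = p_z_d[:, object_topic]
    return np.argsort(-scores)

# Hypothetical usage with the earlier sketches:
# p_w_z, p_z_d = plsa(histograms, n_topics=8)
# object_topic = select_topic_on_validation(p_z_d)  # hypothetical helper: pick
#                                                   # the topic scoring best on a
#                                                   # small validated image set
# order = rerank(p_z_d, object_topic)
```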