2007-03-01 | Carneiro, Gustavo; Chan, Antoni B; Moreno, Pedro J; et al.
This paper proposes a supervised learning approach to semantic image annotation and retrieval. The method formulates the problem as multiclass classification, where each semantic concept is treated as a class. Images are represented as bags of localized feature vectors, and a mixture density is estimated for each image. The mixtures of all images annotated with the same semantic label are then pooled into a density estimate for the corresponding class. This pooling is justified by a multiple-instance-learning argument and implemented efficiently with a hierarchical extension of expectation-maximization. The supervised formulation requires no prior semantic segmentation of training images, is robust to weakly labeled data and to parameter tuning, and provides a common framework for comparing semantic annotation and retrieval methods. The paper gives a detailed description of the training, annotation, and retrieval algorithms and evaluates them on large-scale databases against state-of-the-art semantic labeling and retrieval methods, showing higher annotation and retrieval accuracy than previously published methods at a fraction of their computational cost.
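The density-pooling idea can be sketched in a few lines: fit a Gaussian mixture to each image's bag of feature vectors, then combine the per-image mixtures of a semantic class into one class-level density. The sketch below is a simplified stand-in for the paper's hierarchical EM: it pools by concatenating components with uniform image weights rather than re-estimating a compact class mixture, and the 8-D synthetic "patch features" and component counts are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def fit_image_mixture(features, n_components=3):
    """Fit a diagonal-covariance GMM to one image's bag of feature vectors."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=0)
    gmm.fit(features)
    return gmm

# Synthetic "images": each is a bag of 8-D localized feature vectors.
images_for_class = [rng.normal(loc=1.0, scale=0.5, size=(200, 8))
                    for _ in range(4)]

# Pool per-image mixtures into one class-level mixture by concatenating
# components with uniform image weights (a crude approximation of the
# paper's hierarchical EM, which re-estimates a smaller class mixture).
per_image = [fit_image_mixture(x) for x in images_for_class]
weights = np.concatenate([g.weights_ / len(per_image) for g in per_image])
means = np.vstack([g.means_ for g in per_image])
covs = np.vstack([g.covariances_ for g in per_image])

def class_log_likelihood(x, weights, means, covs):
    """Total log-likelihood of feature vectors x under the pooled GMM."""
    d = x.shape[1]
    log_probs = []
    for w, mu, cv in zip(weights, means, covs):
        # log of a diagonal Gaussian density, evaluated per feature vector
        ll = -0.5 * (d * np.log(2 * np.pi) + np.log(cv).sum()
                     + (((x - mu) ** 2) / cv).sum(axis=1))
        log_probs.append(np.log(w) + ll)
    stacked = np.stack(log_probs)          # (n_components, n_points)
    m = stacked.max(axis=0)                # logsumexp over components
    return (m + np.log(np.exp(stacked - m).sum(axis=0))).sum()

# Annotation reduces to an argmax over class densities; here we score
# a test image under the single pooled class density.
test_image = rng.normal(loc=1.0, scale=0.5, size=(50, 8))
score = class_log_likelihood(test_image, weights, means, covs)
print(score)
```

In the full method this scoring is repeated for every semantic class and the image is annotated with the highest-scoring labels; retrieval inverts the same densities to rank images for a query concept.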