Understanding Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach

This paper presents a statistical modeling approach to automatic linguistic indexing of pictures. A dictionary of hundreds of statistical models is trained from categorized images, one model per concept. Images of a given concept are treated as instances of a stochastic process that characterizes the concept, and the likelihood that an image was generated by that process measures the image's association with the concept's textual description. The system is implemented with two-dimensional multiresolution hidden Markov models (2-D MHMMs) and tested on a database of 600 concepts, each with about 40 training images.
The architecture has three components: feature extraction, multiresolution statistical modeling, and statistical linguistic indexing. Feature extraction uses wavelets to obtain color and texture features. Statistical modeling builds a model for each concept from that concept's training images. Linguistic indexing selects annotation words according to the likelihood of the image under each concept's model. The main advantages are that models for different concepts can be trained and retrained independently, that a large number of concepts can be stored, and that spatial relations among image pixels are taken into account.
The system is evaluated on a controlled subset of the COREL database, using more than 4,600 images outside the training set, and compared with a random annotation scheme. It achieves a classification accuracy of 11.88% and a coverage percentage of 21.63% for annotation words, both higher than the random scheme, which indicates high potential for linguistic indexing of photographic images.
The system's limitations include the use of 2-D images without a sense of object size and the potential biases in the COREL database. Future work includes improving the system's indexing speed and processing automatically annotated words to eliminate conflicting semantics.
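
The three stages described in the summary (feature extraction, one model per concept, likelihood-based indexing) can be illustrated with a small sketch. This is not the paper's implementation, and several parts are assumptions made only for illustration: a diagonal Gaussian over independent image blocks stands in for the 2-D MHMM (so it ignores the spatial relations the real model captures), the images are grayscale rather than color, the word-selection rule is a simple frequency threshold, and all names (`block_features`, `ConceptModel`, `annotate`, the three concepts) and the synthetic data are hypothetical.

```python
from collections import Counter

import numpy as np


def block_features(image, block=4):
    """Return one feature row per block: mean plus three Haar detail energies."""
    rows = []
    for r in range(0, image.shape[0], block):
        for c in range(0, image.shape[1], block):
            b = image[r:r + block, c:c + block]
            # One-level 2-D Haar transform built from the four 2x2 sub-grids.
            tl, tr = b[0::2, 0::2], b[0::2, 1::2]
            bl, br = b[1::2, 0::2], b[1::2, 1::2]
            details = [(tl - tr + bl - br) / 2.0,
                       (tl + tr - bl - br) / 2.0,
                       (tl - tr - bl + br) / 2.0]
            rows.append([b.mean()] + [np.sqrt(np.mean(d ** 2)) for d in details])
    return np.array(rows)


class ConceptModel:
    """Assumed stand-in for a 2-D MHMM: independent diagonal Gaussian per block."""

    def __init__(self, words):
        self.words = words

    def fit(self, images):
        x = np.vstack([block_features(im) for im in images])
        self.mean = x.mean(axis=0)
        self.var = x.var(axis=0) + 1e-6  # small floor to avoid division by zero
        return self

    def log_likelihood(self, image):
        x = block_features(image)
        z = (x - self.mean) ** 2 / self.var
        return float(-0.5 * np.sum(z + np.log(2 * np.pi * self.var)))


def annotate(image, models, top_k=2, min_count=2):
    """Rank concepts by log-likelihood, then keep words shared by the top ones."""
    ranked = sorted(models, key=lambda n: models[n].log_likelihood(image),
                    reverse=True)
    top = ranked[:top_k]
    counts = Counter(w for name in top for w in models[name].words)
    # Assumed selection rule: keep words seen at least `min_count` times.
    words = sorted(w for w, n in counts.items() if n >= min_count)
    return top, words


rng = np.random.default_rng(0)


def synthetic(sigma, n):
    """Synthetic 16x16 'images': constant gray plus Gaussian noise."""
    return [rng.normal(0.5, sigma, size=(16, 16)) for _ in range(n)]


# Hypothetical concepts: (noise level, textual description words).
concepts = {
    "calm_water": (0.01, ["water", "smooth"]),
    "clear_sky": (0.02, ["sky", "smooth"]),
    "gravel": (0.30, ["rock", "texture"]),
}
# Each concept is trained only on its own images, so models are independent.
models = {name: ConceptModel(words).fit(synthetic(sigma, 20))
          for name, (sigma, words) in concepts.items()}

test_image = synthetic(0.01, 1)[0]
top, words = annotate(test_image, models)
print(top, words)
```

Because each model is fitted only on its own concept's images, adding or retraining one concept leaves the others untouched, which mirrors the independence advantage noted in the summary.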

Automatic Linguistic Indexing of Pictures By a Statistical Modeling Approach

Jia Li; James Z. Wang