Understanding Automatic Word Sense Discrimination

This paper introduces context-group discrimination, an unsupervised algorithm for word sense disambiguation based on clustering. The algorithm represents words, contexts, and senses in a high-dimensional, real-valued space called Word Space, where similarity is determined by second-order co-occurrence. Context vectors are formed from the words that co-occur with the ambiguous word in the training corpus, and these vectors are clustered into groups to form senses. The algorithm is evaluated on both natural and artificial ambiguous words, demonstrating good performance. The paper also discusses the application of context-group discrimination in information retrieval, showing that it can improve the relevance of documents retrieved by a query. The experiments highlight the importance of feature selection and the choice of clustering granularity, with globally selected features and fine-grained clustering generally performing better.
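
The pipeline summarized above (word vectors, second-order context vectors, clustering into sense groups) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy corpus, the function names (`word_vectors`, `context_vector`, `cluster`, `assign`), and the window size are all assumptions made for illustration, a simple cosine-based k-means stands in for whatever clustering procedure the paper uses, and feature selection and any dimensionality reduction are omitted.

```python
from collections import Counter
import math

def word_vectors(sentences, window=2):
    """First-order co-occurrence: map each word to counts of its neighbours."""
    vecs = {}
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs.setdefault(w, Counter())[sent[j]] += 1
    return vecs

def context_vector(context, vecs):
    """Second-order representation: sum the word vectors of the context words."""
    total = Counter()
    for w in context:
        total.update(vecs.get(w, Counter()))
    return total

def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(contexts, k=2, iters=10):
    """Cosine k-means (a stand-in clustering method, assumed for this sketch)."""
    # Farthest-first seeding keeps the toy example deterministic.
    centroids = [contexts[0]]
    while len(centroids) < k:
        centroids.append(
            min(contexts, key=lambda c: max(cosine(c, m) for m in centroids))
        )
    labels = []
    for _ in range(iters):
        labels = [max(range(k), key=lambda j: cosine(c, centroids[j]))
                  for c in contexts]
        for j in range(k):
            members = [c for c, lab in zip(contexts, labels) if lab == j]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)  # the sum has the same direction as the mean
                centroids[j] = total
    return labels, centroids

def assign(context, vecs, centroids):
    """Label a new context with the index of the closest sense centroid."""
    cv = context_vector(context, vecs)
    return max(range(len(centroids)), key=lambda j: cosine(cv, centroids[j]))

# Invented toy corpus with stop words already removed; "bank" is the ambiguous word.
corpus = [
    ["loan", "money", "interest"],
    ["deposit", "money", "loan"],
    ["river", "water", "mud"],
    ["fishing", "water", "river"],
    ["bank", "loan", "interest"],
    ["bank", "deposit", "money"],
    ["bank", "river", "mud"],
    ["bank", "fishing", "water"],
]
target = "bank"
vecs = word_vectors(corpus)
contexts = [[w for w in s if w != target] for s in corpus if target in s]
context_vecs = [context_vector(c, vecs) for c in contexts]
labels, centroids = cluster(context_vecs, k=2)
print(list(zip(contexts, labels)))
print(assign(["money", "interest"], vecs, centroids))
```

In this toy corpus the contexts `["loan", "interest"]` and `["deposit", "money"]` share no words, yet their second-order vectors overlap heavily because their words have similar neighbours, so they fall into the same group. This is the property that second-order co-occurrence is meant to provide. The number of clusters `k` corresponds to the clustering granularity mentioned above.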

Automatic Word Sense Discrimination

1998 | Hinrich Schütze