Automatic Word Sense Discrimination

1998 | Hinrich Schütze
This paper introduces context-group discrimination, an automatic, unsupervised algorithm for word sense disambiguation. The method clusters the occurrences of an ambiguous word into groups based on second-order co-occurrence: two contexts count as similar when the words they contain tend to co-occur with the same other words, even if the two contexts share few or no words themselves. Words, contexts, and senses are represented in a high-dimensional vector space called Word Space, in which similarity is determined by co-occurrence patterns. The algorithm clusters the contexts of the ambiguous word, and each cluster represents one sense. Sense vectors are derived from the cluster centroids, and a test context is assigned to the cluster with the closest centroid. Because the approach needs no external sense definitions, it is fully automatic.

The algorithm is tested on both natural and artificial ambiguous words and distinguishes their contexts well. It is particularly effective for words with fine-grained senses, such as 'space', which has multiple meanings.

The method is also applied to information retrieval, where it improves document-query similarity by matching word senses rather than words, which leads to better ranking of relevant documents. The algorithm can further inform the design of interfaces that account for word ambiguity, for example by letting users select the intended sense of a word.

The paper evaluates the algorithm with two feature selection strategies (local vs. global) and two clustering granularities (coarse, with 2 clusters, vs. fine, with 10 clusters). Global feature selection and fine clustering outperform local feature selection and coarse clustering. SVD-reduced representations also perform well, especially for sparse data. The algorithm achieves above-baseline performance, with accuracy ranging from 83% to 91%, although it is slightly less accurate than methods that require minimal manual intervention. The study highlights the importance of separating training and test sets to avoid overfitting. It also shows that context-group discrimination is effective for information retrieval, where it improves performance by reducing irrelevant matches caused by sense mismatches.
The algorithm's ability to handle fine-grained senses and its automatic nature make it a valuable tool for word sense disambiguation in computational linguistics.
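The second-order representation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy corpus, the hand-picked feature words, the window size, and the helper names `word_vectors`, `context_vector`, and `cosine` are all hypothetical. Each word vector counts how often each feature word appears near that word; a context vector is the centroid of the word vectors of the words in the context.

```python
import math
from collections import defaultdict


def word_vectors(sentences, features, window=2):
    """First-order word vectors: for every word, count how often each
    feature word occurs within `window` tokens of it."""
    index = {f: i for i, f in enumerate(features)}
    vecs = defaultdict(lambda: [0.0] * len(features))
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in index:
                    vecs[word][index[tokens[j]]] += 1.0
    return dict(vecs)


def context_vector(context, wvecs, dim):
    """Second-order context vector: the centroid of the word vectors of
    the words in the context (words without a vector are skipped)."""
    known = [wvecs[w] for w in context if w in wvecs]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


# Hypothetical toy corpus and feature words.
corpus = [
    ["factory", "workers", "build", "cars"],
    ["workers", "assemble", "machines", "factory"],
    ["leaves", "need", "water", "sun"],
    ["leaves", "grow", "water", "soil"],
]
features = ["factory", "workers", "machines", "cars", "water", "leaves", "soil", "sun"]
wv = word_vectors(corpus, features, window=2)

# Three contexts of the ambiguous word "plant" (which has no vector here).
c1 = context_vector(["plant", "workers", "machines"], wv, len(features))
c2 = context_vector(["factory", "plant", "cars"], wv, len(features))
c3 = context_vector(["plant", "water", "soil"], wv, len(features))

print(round(cosine(c1, c2), 2))  # 0.51: only "plant" is shared, yet similar
print(cosine(c1, c3))            # 0.0: no feature overlap at all
```

The first two contexts share no words apart from "plant" itself, yet their context vectors are similar because "workers", "machines", "factory", and "cars" co-occur with overlapping feature words. That is the effect second-order co-occurrence is meant to capture.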
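Clustering the training contexts into sense groups and assigning a test context to the nearest sense vector can be sketched as follows. Assumption: plain k-means under cosine similarity, with a deterministic farthest-first start, is used here as a stand-in; the paper's own clustering procedure may differ. The function names and the 2-D toy vectors are hypothetical; real context vectors would come from Word Space and have many more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_contexts(vectors, k, iterations=20):
    """Group context vectors into k clusters (k-means under cosine
    similarity) and return the cluster centroids as sense vectors."""
    # Farthest-first start: begin with the first vector, then repeatedly
    # add the vector least similar to every center chosen so far.
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centers)))
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            best = max(range(k), key=lambda i: cosine(v, centers[i]))
            groups[best].append(v)
        # Keep the previous center if a cluster ends up empty.
        new_centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def assign_sense(vector, sense_vectors):
    """Index of the sense vector closest (by cosine) to a test context."""
    return max(range(len(sense_vectors)), key=lambda i: cosine(vector, sense_vectors[i]))


# Hypothetical 2-D training contexts forming two obvious groups.
training = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
senses = cluster_contexts(training, k=2)
print(assign_sense([0.8, 0.3], senses))   # 0
print(assign_sense([0.05, 0.7], senses))  # 1
```

In this sketch the parameter `k` plays the role of the coarse (2 clusters) vs. fine (10 clusters) setting compared in the evaluation, and labeling a new context costs one cosine comparison per sense vector.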
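Artificial ambiguous words are commonly built as pseudowords: every occurrence of two unrelated words is replaced by one conflated token, and the original word serves as a gold-standard sense label. The sketch below assumes that construction and a simple scoring rule in which each cluster is labeled with its majority gold word; both, along with the function names, the cluster ids, and the toy sentences, are illustrative assumptions rather than the paper's exact protocol. It also computes a majority-class baseline, one common reference point for "above-baseline" accuracy.

```python
from collections import Counter


def make_pseudoword(sentences, w1, w2):
    """Conflate w1 and w2 into one artificial ambiguous token.
    Returns one context per occurrence plus the original word as gold label."""
    pseudo = w1 + "/" + w2
    contexts, gold = [], []
    for tokens in sentences:
        conflated = [pseudo if t in (w1, w2) else t for t in tokens]
        for t in tokens:
            if t in (w1, w2):
                contexts.append(conflated)
                gold.append(t)
    return contexts, gold


def cluster_accuracy(assignments, gold):
    """Label each cluster with its most frequent gold word and return the
    fraction of occurrences whose cluster label matches their gold word."""
    per_cluster = {}
    for cluster, label in zip(assignments, gold):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in per_cluster.values())
    return correct / len(gold)


def majority_baseline(gold):
    """Accuracy of always guessing the most frequent gold word."""
    return Counter(gold).most_common(1)[0][1] / len(gold)


sentences = [["eat", "ripe", "banana"], ["open", "the", "door"],
             ["banana", "peel"], ["door", "handle"]]
contexts, gold = make_pseudoword(sentences, "banana", "door")
print(contexts[0])  # ['eat', 'ripe', 'banana/door']

# Hypothetical cluster ids (in practice, the output of context-group discrimination).
assignments = [0, 1, 0, 0]
print(cluster_accuracy(assignments, gold))  # 0.75
print(majority_baseline(gold))              # 0.5
```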
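The retrieval benefit comes from comparing senses instead of word forms. A minimal sketch, assuming documents and queries are bags of tokens in which each occurrence of an ambiguous word has already been tagged with a discriminated sense id (the `word#sense` tag format, the sense ids, and the function names are hypothetical): a query and a document that share only the word "plant", used in different senses, match at the word level but not at the sense level. This does not reproduce the paper's retrieval model; it only shows why sense mismatches stop contributing to the similarity score.

```python
import math
from collections import Counter


def cosine_bow(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(n * n for n in ca.values()))
    nb = math.sqrt(sum(n * n for n in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def sense_tag(tokens, senses):
    """Replace tokens at the positions in `senses` (position -> sense id)
    with 'word#sense'; other tokens are left unchanged."""
    return [f"{t}#{senses[i]}" if i in senses else t for i, t in enumerate(tokens)]


query = ["plant", "workers"]
doc = ["plant", "water", "soil"]
# Hypothetical discriminator output: sense 1 in the query, sense 0 in the document.
q_tagged = sense_tag(query, {0: 1})
d_tagged = sense_tag(doc, {0: 0})

print(round(cosine_bow(query, doc), 3))  # 0.408: spurious match on "plant"
print(cosine_bow(q_tagged, d_tagged))    # 0.0: senses differ, no match
```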
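The SVD-reduced representation mentioned in the evaluation can be written as a rank-$k$ truncation. In standard notation, assuming $C$ is the matrix whose rows are the word vectors of Word Space (the symbols and the choice of $k$ are illustrative, not taken from the paper):

```latex
C = U \Sigma V^{\top}, \qquad
C_k = U_k \Sigma_k V_k^{\top}, \qquad
\tilde{\mathbf{w}}_i = (U_k \Sigma_k)_{i,:} = (C V_k)_{i,:} \in \mathbb{R}^{k},
\qquad
\tilde{\mathbf{c}} = \frac{1}{|c|} \sum_{w \in c} \tilde{\mathbf{w}}
```

Here $U_k$ and $V_k$ keep the first $k$ singular vectors and $\Sigma_k$ the $k$ largest singular values. Because $C V_k = U_k \Sigma_k$, a word vector is reduced by multiplying it by $V_k$, and a reduced context vector $\tilde{\mathbf{c}}$ is still the centroid of the reduced word vectors of the words in context $c$. The reduction merges correlated feature dimensions, so two words can receive similar reduced vectors even when their raw count vectors are sparse and barely overlap; this is one plausible reason it helps with sparse data.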