This paper presents an unsupervised learning algorithm for word sense disambiguation that matches the performance of supervised methods, which require costly hand annotations. The algorithm uses two key properties of language: one sense per collocation and one sense per discourse. These properties are exploited in an iterative bootstrapping process to identify collocations and discourse contexts that indicate word senses. The algorithm is robust and self-correcting, and performs well on a large, untagged corpus.
The one-sense-per-discourse hypothesis was tested on 37,232 examples and held with high accuracy, indicating that a word tends to keep a single sense throughout a document. The one-sense-per-collocation hypothesis also proved highly reliable, especially for content words. These properties are used to build a decision list algorithm that integrates diverse evidence sources and positional relationships to classify word senses.
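The decision-list idea can be sketched as follows: rank each collocational feature by a smoothed log-likelihood ratio between the two senses, then classify a new occurrence by the single highest-ranked rule that matches. This is a minimal illustration, not the paper's implementation; the set-of-features representation, the sense labels "A"/"B", and the smoothing constant are all assumptions made here for concreteness.

```python
import math
from collections import defaultdict

def build_decision_list(examples, smoothing=0.1):
    """Rank collocational features by smoothed log-likelihood ratio (a sketch).

    examples: iterable of (features, sense) pairs, where features is a set of
    collocations observed in the context and sense is "A" or "B".
    """
    counts = defaultdict(lambda: {"A": 0, "B": 0})
    for features, sense in examples:
        for f in features:
            counts[f][sense] += 1
    rules = []
    for f, c in counts.items():
        # evidence for sense A vs. sense B, smoothed to avoid log(0)
        score = math.log((c["A"] + smoothing) / (c["B"] + smoothing))
        rules.append((f, "A" if score > 0 else "B", abs(score)))
    # strongest evidence first
    rules.sort(key=lambda r: r[2], reverse=True)
    return rules

def classify(rules, features, default="A"):
    # apply only the single highest-ranked rule that matches the context
    for f, sense, _ in rules:
        if f in features:
            return sense
    return default
```

Deciding by the first matching rule, rather than combining all matching evidence, is the characteristic design choice of a decision list: the single most reliable collocation wins.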
The algorithm begins with a small set of seed examples and iteratively augments them with additional examples using the two key properties. This process continues until the training set converges on a stable residual set. The algorithm is robust to noisy or misleading seed examples and can correct its own errors through the one-sense-per-discourse property.
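The bootstrapping loop described above can be sketched roughly as: label the contexts that contain a seed collocation, train a decision list on the labeled portion, relabel everything the list classifies confidently, and repeat until the labeling stops changing. This is a simplified sketch under assumed data structures (contexts as feature sets, senses "A"/"B", a `train_rules` helper invented here), not the paper's exact procedure.

```python
import math
from collections import defaultdict

def train_rules(labeled, smoothing=0.1):
    """Score features by smoothed log-likelihood ratio (hypothetical helper)."""
    counts = defaultdict(lambda: {"A": 0, "B": 0})
    for feats, sense in labeled:
        for f in feats:
            counts[f][sense] += 1
    rules = []
    for f, c in counts.items():
        score = math.log((c["A"] + smoothing) / (c["B"] + smoothing))
        rules.append((f, "A" if score > 0 else "B", abs(score)))
    rules.sort(key=lambda r: r[2], reverse=True)
    return rules

def bootstrap(contexts, seeds, min_score=1.0, max_iters=20):
    """Grow a labeled set from seed collocations (a simplified sketch).

    contexts: list of feature sets, one per occurrence of the ambiguous word.
    seeds: dict mapping a seed collocation to a sense, e.g.
           {"life": "A", "manufacturing": "B"} for "plant".
    """
    labeled = {}
    # step 1: label any context containing a seed collocation
    for i, feats in enumerate(contexts):
        for f, sense in seeds.items():
            if f in feats:
                labeled[i] = sense
                break
    for _ in range(max_iters):
        # step 2: train on the currently labeled subset
        rules = train_rules([(contexts[i], s) for i, s in labeled.items()])
        # step 3: relabel each context with its strongest confident rule;
        # earlier labels may be overwritten, which is what makes the
        # procedure self-correcting
        new_labeled = dict(labeled)
        for i, feats in enumerate(contexts):
            for f, sense, score in rules:
                if f in feats and score >= min_score:
                    new_labeled[i] = sense
                    break
        if new_labeled == labeled:  # converged on a stable residual set
            break
        labeled = new_labeled
    return labeled, rules
```

In the full algorithm the one-sense-per-discourse property would additionally propagate a confident label to other occurrences in the same document; that step is omitted here for brevity.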
The algorithm was tested on a large corpus and achieved over 96% accuracy, outperforming previous unsupervised methods. When combined with the one-sense-per-discourse property, it achieves nearly the same performance as supervised methods. It is particularly effective on complex concepts and applies to a wide range of words, including polysemous words such as "plant."
Compared with previous work, the algorithm has a fundamental advantage over supervised methods: it requires no costly hand-tagged training data, thriving instead on raw, unannotated monolingual corpora. It also compares favorably with other unsupervised methods in both accuracy and efficiency. Overall, it is a powerful and effective approach to word sense disambiguation, applicable to a wide range of languages and contexts.