Class-Based n-gram Models of Natural Language

1992 | Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, Jenifer C. Lai
This paper presents class-based n-gram models of natural language. The authors study methods for predicting a word from the words that precede it in a text, focusing on n-gram models that group words into classes derived from their co-occurrence statistics. They find that the resulting classes can reflect either syntactic or semantic groupings, depending on the underlying data.

The paper first introduces language models and n-gram models, explaining how they assign probabilities to word sequences. It discusses the challenge of parameter estimation for n-gram models over large vocabularies, where most n-grams never occur in the training text, and presents sequential maximum likelihood estimation. The authors also describe interpolated estimation, a technique that combines the estimates of several language models, with mixing weights chosen on held-out data, to improve performance.

The paper then explores the use of word classes in n-gram models. It describes an algorithm that assigns words to classes so as to maximize the average mutual information between adjacent classes: starting from one class per word, it greedily merges the pair of classes whose merge loses the least mutual information, and then reassigns individual words to improve the result. The algorithm is shown to be efficient and effective even for large vocabularies.

The authors also discuss sticky pairs and semantic classes. Sticky pairs are word pairs that occur in sequence more often than chance would predict; semantic classes are groups of words that occur near one another more often than chance would predict. Both are used to improve language model performance.
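To make the class-merging step concrete, here is a minimal Python sketch of the greedy procedure described above. It recomputes the full average mutual information for every candidate merge, which is far slower than the incremental bookkeeping the paper uses, and it omits the final word-reassignment pass; the function names and the toy corpus are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict
from itertools import combinations
from math import log

def bigram_counts(tokens):
    """Count adjacent word pairs in a token stream."""
    counts = defaultdict(int)
    for w1, w2 in zip(tokens, tokens[1:]):
        counts[(w1, w2)] += 1
    return counts

def avg_mutual_info(bigrams, word2class):
    """Average mutual information between adjacent classes:
    sum over (c1, c2) of p(c1,c2) * log(p(c1,c2) / (p(c1,.) * p(.,c2)))."""
    joint, left, right = defaultdict(int), defaultdict(int), defaultdict(int)
    total = 0
    for (w1, w2), n in bigrams.items():
        c1, c2 = word2class[w1], word2class[w2]
        joint[(c1, c2)] += n
        left[c1] += n
        right[c2] += n
        total += n
    return sum(
        (n / total) * log(n * total / (left[c1] * right[c2]))
        for (c1, c2), n in joint.items()
    )

def brown_clusters(tokens, k):
    """Greedily merge the pair of classes whose merge loses the least
    average mutual information, until only k classes remain."""
    bigrams = bigram_counts(tokens)
    vocab = sorted(set(tokens))
    word2class = {w: i for i, w in enumerate(vocab)}  # one class per word
    classes = set(word2class.values())
    while len(classes) > k:
        best = None
        for c1, c2 in combinations(sorted(classes), 2):
            # Tentatively merge c2 into c1 and score the result.
            trial = {w: (c1 if c == c2 else c) for w, c in word2class.items()}
            ami = avg_mutual_info(bigrams, trial)
            if best is None or ami > best[0]:
                best = (ami, c1, c2)
        _, keep, drop = best
        word2class = {w: (keep if c == drop else c) for w, c in word2class.items()}
        classes.discard(drop)
    return word2class

tokens = "the dog ran the cat ran a dog sat a cat sat".split()
print(brown_clusters(tokens, 3))  # e.g. groups {the, a}, {dog, cat}, {ran, sat}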
The paper concludes with a discussion of the benefits of class-based n-gram models: they require less storage than word-based models and, especially when combined with interpolated estimation, can achieve better performance than traditional n-gram models. The authors expect further improvements from building on the insights these models provide.
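As a complement, a minimal Python sketch of the interpolation idea: a bigram relative-frequency estimate mixed with a unigram estimate. The Python class name `InterpolatedBigram`, the fixed weight `lam`, and the toy corpus are illustrative assumptions; the paper instead chooses the mixing weights to maximize the probability of held-out data.

```python
from collections import defaultdict

class InterpolatedBigram:
    """Linear interpolation of bigram and unigram relative frequencies.
    The weight lam is fixed here for simplicity; the paper tunes the
    interpolation weights on held-out data instead."""

    def __init__(self, tokens, lam=0.7):
        self.lam = lam
        self.total = len(tokens)
        self.uni = defaultdict(int)
        self.bi = defaultdict(int)
        for w in tokens:
            self.uni[w] += 1
        for w1, w2 in zip(tokens, tokens[1:]):
            self.bi[(w1, w2)] += 1

    def prob(self, w, history):
        """P(w | history) as a weighted mix of the two relative frequencies."""
        p_bi = self.bi[(history, w)] / self.uni[history] if self.uni[history] else 0.0
        p_uni = self.uni[w] / self.total
        return self.lam * p_bi + (1 - self.lam) * p_uni

model = InterpolatedBigram("the cat sat on the mat".split())
print(model.prob("cat", "the"))  # bigram estimate smoothed by the unigram estimate
```

Because the unigram term never vanishes, the mixture assigns nonzero probability to word pairs that were never seen in training, which is exactly why interpolation helps with sparse n-gram counts.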