Understanding A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text

The paper presents a stochastic part-of-speech (POS) tagging program and a noun phrase parser designed for unrestricted text. The author, Kenneth Ward Church, highlights the importance of POS tagging for applications such as speech synthesis, speech recognition, spelling correction, and machine translation. The program uses a linear-time dynamic programming algorithm to find the tag assignment that maximizes the product of lexical and contextual probabilities, both estimated from the Tagged Brown Corpus. The reported performance is encouraging, with 95-99% accuracy, and the method is particularly effective for simple noun phrases. The paper also discusses the challenges of lexical ambiguity and of estimating probabilities for rare words, which call for frequency counts combined with smoothing. The noun phrase parser is a stochastic analog of precedence parsing: it uses a table of probabilities to insert brackets into a sequence of parts of speech, thereby identifying noun phrases. The paper concludes with sample output and references to related research.
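To make the dynamic programming step concrete, the following is a minimal sketch of a Viterbi-style tagger that maximizes the product of lexical and contextual probabilities. It is an illustration under stated assumptions, not the paper's implementation: the function name `viterbi_tag`, the bigram form of the contextual model (each tag conditioned only on the previous tag), the `floor` constant standing in for smoothing, and every probability in the toy tables are invented here. The paper's own contextual model may condition on more context.

```python
import math


def viterbi_tag(words, tags, lexical, contextual, start="<s>"):
    """Return the tag sequence with the highest product of
    lexical[(word, tag)] and contextual[(previous_tag, tag)].

    Assumption: a bigram contextual model, chosen for brevity.
    """
    if not words:
        return []
    floor = 1e-12  # hypothetical stand-in for real smoothing of unseen events
    scores = {start: 0.0}  # best log score of any path ending in each tag
    backpointers = []      # one dict per word: tag -> best previous tag
    for word in words:
        new_scores, pointers = {}, {}
        for tag in tags:
            lex = math.log(lexical.get((word, tag), floor))
            # Pick the previous tag that gives the best score for this tag.
            prev, score = max(
                ((p, s + math.log(contextual.get((p, tag), floor)))
                 for p, s in scores.items()),
                key=lambda item: item[1],
            )
            new_scores[tag] = score + lex
            pointers[tag] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag to recover the path.
    tag = max(scores, key=scores.get)
    path = [tag]
    for pointers in reversed(backpointers[1:]):
        tag = pointers[tag]
        path.append(tag)
    path.reverse()
    return path


# Toy tables with made-up numbers, for illustration only.
tags = ["PPSS", "VB", "AT", "NN", "UH"]
lexical = {
    ("I", "PPSS"): 0.99, ("I", "NN"): 0.01,
    ("see", "VB"): 0.99, ("see", "UH"): 0.01,
    ("a", "AT"): 0.99, ("a", "NN"): 0.01,
    ("bird", "NN"): 1.0,
}
contextual = {
    ("<s>", "PPSS"): 0.5, ("<s>", "NN"): 0.2,
    ("PPSS", "VB"): 0.6, ("PPSS", "UH"): 0.01,
    ("VB", "AT"): 0.5, ("VB", "NN"): 0.2,
    ("AT", "NN"): 0.8, ("NN", "NN"): 0.1,
}

print(viterbi_tag("I see a bird".split(), tags, lexical, contextual))
# prints ['PPSS', 'VB', 'AT', 'NN']
```

Working in log space avoids numeric underflow on long sentences, and keeping one best score per tag at each word is what makes the running time linear in sentence length for a fixed tag set.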

A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text

Kenneth Ward Church