A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text

Kenneth Ward Church
A stochastic parts program and noun phrase parser for unrestricted text has been developed by Kenneth Ward Church of Bell Laboratories. The program tags each word in an input sentence with its most likely part of speech. For example, the word "table" can be a verb or a noun depending on context. A linear-time dynamic programming algorithm finds the assignment of parts of speech that maximizes the product of lexical and contextual probabilities. Lexical probabilities are derived from the Tagged Brown Corpus, while contextual probabilities are conditioned on the previous two parts of speech.

Reported performance is encouraging: 95-99% of tags are correct, depending on how "correct" is defined. The program is particularly useful for research projects that collect large corpora of text, and for speech synthesis applications, where part of speech affects pronunciation. For example, the word "wind" is pronounced differently as a noun and as a verb. Part of speech can also affect stress, as in "oily FLUID" versus "TRANSMISSION fluid."

The stochastic approach is simpler than previous methods, such as Marcus' "LR(k)-like" parser. It relies on bigram and trigram statistics rather than complex grammatical rules, and it can naturally take advantage of lexical probabilities, which are not easily captured by traditional parsers. It also handles lexical ambiguity effectively, because most words take a single part of speech in most contexts.

The program has also been applied to parsing noun phrases with high accuracy. It uses a stochastic analog of precedence parsing: brackets are inserted into a sequence of parts of speech to mark noun phrases, and a table of probabilities determines where each bracket goes.

The program also addresses smoothing issues, which are common in probabilistic models.
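
The bracket-insertion idea for noun phrases described above can be illustrated with a small sketch. This is not Church's implementation: the function name `bracket_noun_phrases`, the tables `p_open` and `p_close`, the `#` boundary tag, the tag names, and every probability value are assumptions invented for illustration. The sketch also assumes that noun phrases do not nest, and it scores each gap between adjacent tags by the probability of closing and of opening a bracket there.

```python
BOUNDARY = "#"  # hypothetical tag padding both ends of the sentence

# Toy tables (invented values): probability of inserting "[" or "]"
# between a pair of adjacent tags. Unlisted pairs default to 0.0.
p_open = {("#", "DET"): 0.9, ("VERB", "DET"): 0.9}
p_close = {("NOUN", "VERB"): 0.9, ("NOUN", "#"): 0.9}


def bracket_noun_phrases(tags, p_open, p_close):
    """Return the highest-scoring balanced, non-nested bracketing of `tags`."""
    padded = [BOUNDARY] + list(tags) + [BOUNDARY]
    # best[inside] = (score, output tokens); inside is True while a bracket is open
    best = {False: (1.0, [])}
    for left, right in zip(padded, padded[1:]):
        po = p_open.get((left, right), 0.0)
        pc = p_close.get((left, right), 0.0)
        new_best = {}
        for inside, (score, out) in best.items():
            # A bracket can only be closed if one is currently open.
            for close in ([False, True] if inside else [False]):
                still_inside = inside and not close
                # A bracket can only be opened if none is currently open.
                for opened in ([False] if still_inside else [False, True]):
                    cand = score
                    cand *= pc if close else 1.0 - pc
                    cand *= po if opened else 1.0 - po
                    tokens = out + (["]"] if close else []) + (["["] if opened else [])
                    if right != BOUNDARY:
                        tokens = tokens + [right]
                    state = opened or still_inside
                    if state not in new_best or cand > new_best[state][0]:
                        new_best[state] = (cand, tokens)
        best = new_best
    # Only bracketings that end with every bracket closed are accepted.
    return " ".join(best[False][1])


print(bracket_noun_phrases(["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"], p_open, p_close))
# prints: [ DET NOUN ] VERB [ DET ADJ NOUN ]
```

Tracking only whether a bracket is currently open keeps the search small. A real table would be estimated from bracketed training text rather than written by hand.
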
Smoothing techniques handle rare events and ensure that no probability is zero. Proper nouns and capitalized words are particularly challenging, and the program uses prepass techniques to label them correctly.

Overall, the program is a valuable tool for research and practical applications, including speech synthesis, speech recognition, and text processing. It provides accurate part-of-speech tagging and noun phrase parsing, which are essential for many natural language processing tasks.
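
The tagging step summarized above (choose the tag sequence that maximizes the product of lexical and contextual probabilities, with contexts of two previous tags) can be sketched as a small dynamic program. This is a minimal sketch under stated assumptions, not the program itself: the function name `tag_sentence`, the `START` padding tag, the tag names, and all probability values are invented for illustration. The `floor` parameter is a crude stand-in for smoothing of unseen contexts, not the technique the program actually uses. The sketch also assumes every input word appears in the `lexical` table.

```python
START = "<s>"  # hypothetical padding tag for the sentence boundary

# Toy tables (invented values).
# lexical[word][tag]        ~ P(tag | word)
# contextual[(t2, t1)][tag] ~ P(tag | two previous tags t2, t1)
lexical = {
    "they": {"PRON": 1.0},
    "the": {"DET": 1.0},
    "table": {"NOUN": 0.9, "VERB": 0.1},
    "motion": {"NOUN": 1.0},
}
contextual = {
    (START, START): {"PRON": 0.5, "DET": 0.5},
    (START, "PRON"): {"VERB": 0.9, "NOUN": 0.1},
    (START, "DET"): {"NOUN": 0.95, "VERB": 0.05},
    ("PRON", "VERB"): {"DET": 0.8},
    ("PRON", "NOUN"): {"DET": 0.2},
    ("VERB", "DET"): {"NOUN": 0.9},
    ("NOUN", "DET"): {"NOUN": 0.9},
}


def tag_sentence(words, lexical, contextual, floor=1e-6):
    """Return the tag sequence with the highest lexical x contextual product."""
    # best[(t2, t1)] = (score, tags so far) for the best path ending in t2, t1
    best = {(START, START): (1.0, [])}
    for word in words:
        new_best = {}
        for (t2, t1), (score, path) in best.items():
            for tag, p_lex in lexical[word].items():
                # Unseen contexts get a small floor instead of zero.
                p_ctx = contextual.get((t2, t1), {}).get(tag, floor)
                cand = score * p_lex * p_ctx
                key = (t1, tag)
                if key not in new_best or cand > new_best[key][0]:
                    # Paths are copied for brevity; backpointers would avoid this.
                    new_best[key] = (cand, path + [tag])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


print(tag_sentence("they table the motion".split(), lexical, contextual))
# prints: ['PRON', 'VERB', 'DET', 'NOUN']
print(tag_sentence("the table".split(), lexical, contextual))
# prints: ['DET', 'NOUN']
```

In the first sentence the context outweighs the lexical preference of "table" for the noun reading, and in the second the two agree. Because only the best path per pair of trailing tags is kept, the number of score updates grows linearly with sentence length for a fixed tag set.
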