A Maximum-Entropy-Inspired Parser


Eugene Charniak
This paper presents a new parser that achieves high precision and recall when parsing sentences into Penn treebank-style parse trees. When trained and tested on the standard sections of the Wall Street Journal treebank, the parser achieves 90.1% average precision/recall for sentences of length ≤ 40 and 89.5% for sentences of length ≤ 100, a 13% reduction in error rate over the best previous single-parser results on this corpus.

The parser is based on a probabilistic generative model that assigns a probability to each parse by visiting every constituent and guessing its pre-terminal, its lexical head, and its expansion; the probability of the whole parse is the product of these per-constituent probabilities. Expansions are assigned probabilities by a Markov grammar, which gives more flexibility and better performance than traditional PCFG approaches.

For conditioning and smoothing, the parser uses a "maximum-entropy-inspired" model that makes it easy to combine many conditioning events and handles sparse data by allowing features to be combined in different ways. Additional conditioning events, such as the grandparent label and the left-sibling label, improve performance by a further 0.45% in average precision/recall.

Evaluated on the Penn Wall Street Journal treebank, the parser outperforms previous parsers. Its success is attributed to the flexibility of the model, its ability to incorporate varied conditioning events, and its maximum-entropy-inspired smoothing.
The paper concludes that the parser's performance demonstrates the effectiveness of the maximum-entropy-inspired approach in parsing tasks.