Three Generative, Lexicalised Models for Statistical Parsing

17 Jun 1997 | Michael Collins
This paper introduces three new statistical parsing models for lexicalised context-free grammars. The first model is a generative version of the model described in Collins (1996); the second extends the parser to handle the complement/adjunct distinction through subcategorisation frames; the third gives a probabilistic treatment of wh-movement derived from Generalized Phrase Structure Grammar (GPSG).

On Wall Street Journal text the models achieve 88.1%/87.5% constituent precision/recall, an average improvement of 2.3% over Collins (1996). Model 1 alone performs significantly better than Collins (1996), and Models 2 and 3 give further improvements. The models also recover subcategorisation and wh-movement information, which is essential for many NLP applications.

Because the models are generative, they can condition on any structure that has already been generated, which improves parsing accuracy. The paper also discusses practical issues such as smoothing, unknown words, and part-of-speech tagging. Results show that the models outperform previous work, particularly in handling complex structures such as traces and wh-movement, and the models are compared with other approaches, including dependency models and decision-tree parsing. The paper concludes that these models provide a statistically grounded approach to parsing that incorporates linguistically fundamental concepts such as subcategorisation and wh-movement.
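To make the generative factorisation concrete, the following is a minimal Python sketch of the head-driven decomposition underlying Model 1: a lexicalised rule P(h) -> L_n ... L_1 H(h) R_1 ... R_m is scored by first generating the head child, then generating each left and right modifier independently, each conditioned on the parent label, head label, and head word, with a STOP symbol terminating each side. The class name, its count tables, and the plain maximum-likelihood estimates are illustrative assumptions only; the actual parser additionally conditions on distance features and backs off through coarser contexts for smoothing.

```python
# Illustrative sketch (not the paper's implementation) of a head-driven
# generative factorisation in the style of Collins' Model 1:
#   P(rule) = P_h(H | P, hw) * prod_i P_l(L_i | P, H, hw)
#                            * prod_j P_r(R_j | P, H, hw)
# Count tables and MLE estimation are hypothetical placeholders.

from collections import defaultdict

STOP = ("STOP", None)  # distinguished symbol ending each modifier sequence

class Model1Sketch:
    def __init__(self):
        # raw counts for maximum-likelihood estimates; a real model would
        # back off through coarser contexts (head word -> tag -> label)
        self.head_counts = defaultdict(lambda: defaultdict(int))
        self.mod_counts = defaultdict(lambda: defaultdict(int))

    def observe_rule(self, parent, head_label, head_word, left_mods, right_mods):
        """Record one lexicalised rule expansion from a training tree."""
        self.head_counts[(parent, head_word)][head_label] += 1
        for side, mods in (("L", left_mods), ("R", right_mods)):
            for mod in list(mods) + [STOP]:
                self.mod_counts[(side, parent, head_label, head_word)][mod] += 1

    def _mle(self, table, context, outcome):
        total = sum(table[context].values())
        return table[context][outcome] / total if total else 0.0

    def rule_prob(self, parent, head_label, head_word, left_mods, right_mods):
        """Score a rule: head child first, then each modifier independently."""
        p = self._mle(self.head_counts, (parent, head_word), head_label)
        for side, mods in (("L", left_mods), ("R", right_mods)):
            for mod in list(mods) + [STOP]:
                p *= self._mle(self.mod_counts,
                               (side, parent, head_label, head_word), mod)
        return p

# toy usage: S(bought) -> NP(IBM) VP(bought)
m = Model1Sketch()
m.observe_rule("S", "VP", "bought", left_mods=[("NP", "IBM")], right_mods=[])
print(m.rule_prob("S", "VP", "bought", [("NP", "IBM")], []))  # 0.25 on these toy counts
```

The key design point this sketch mirrors is the independence assumption: each modifier is generated conditioned only on the parent, head label, and head word, which is what lets the model condition on lexical heads while keeping the parameter space tractable.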