Sydney, July 2006 | Slav Petrov, Leon Barrett, Romain Thibaux, Dan Klein
The paper presents an automatic approach to tree annotation that learns a grammar which is both accurate and compact. Starting from a simple X-bar grammar, the method alternates between splitting and merging nonterminal symbols to maximize the likelihood of a training treebank. Unlike previous work, it can split different terminals to different degrees, as warranted by their actual complexity in the data. The learned grammars are more compact and more accurate than those from previous studies, and the best grammar achieves an F1 score of 90.2% on the Penn Treebank, higher than fully lexicalized systems. The method combines the strengths of manual and automatic annotation, using a split-and-merge strategy to allocate subsymbols adaptively where they are most effective. Splitting and merging are applied hierarchically to improve accuracy and keep grammar size under control, while smoothing prevents overfitting. The resulting parser, though unlexicalized, is competitive with the best lexicalized parsers and improves significantly on previous work.
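The split and merge steps described above can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the rule representation, the `noise` and `fraction` parameters, and the idea of passing in precomputed likelihood-loss estimates are all assumptions made for illustration. The EM re-estimation that would run between the two steps, and the smoothing, are omitted entirely.

```python
import random
from collections import defaultdict


def normalize(rules):
    """Rescale rule probabilities so that each parent's rules sum to 1."""
    totals = defaultdict(float)
    for (parent, _), p in rules.items():
        totals[parent] += p
    return {key: p / totals[key[0]] for key, p in rules.items()}


def split_grammar(rules, nonterminals, noise=0.01, seed=0):
    """Split every nonterminal X into two subsymbols X_0 and X_1.

    `rules` maps (parent, children) -> probability, where `children` is a
    tuple of symbols. Children that are not in `nonterminals` (words) are
    left unsplit. Each rule's mass is shared evenly among the child
    subsymbol combinations and then perturbed slightly, so that a later
    re-estimation pass could tell the two halves apart.
    """
    rng = random.Random(seed)
    split = {}
    for (parent, children), p in rules.items():
        # Enumerate every assignment of subsymbols to the children.
        combos = [()]
        for child in children:
            if child in nonterminals:
                options = [f"{child}_0", f"{child}_1"]
            else:
                options = [child]
            combos = [c + (o,) for c in combos for o in options]
        for i in (0, 1):
            for combo in combos:
                jitter = 1.0 + rng.uniform(-noise, noise)
                split[(f"{parent}_{i}", combo)] = p / len(combos) * jitter
    return normalize(split)


def choose_merges(loss_by_symbol, fraction=0.5):
    """Pick the split symbols whose merge is estimated to cost the least.

    `loss_by_symbol` maps a symbol to an estimated likelihood loss from
    undoing its split (assumed to be computed elsewhere); `fraction` is a
    hypothetical knob for how many splits to undo.
    """
    ranked = sorted(loss_by_symbol, key=loss_by_symbol.get)
    return ranked[: int(len(ranked) * fraction)]


toy_rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("dog",)): 0.6,
    ("NP", ("NP", "NP")): 0.4,
    ("VP", ("barks",)): 1.0,
}
toy_split = split_grammar(toy_rules, nonterminals={"S", "NP", "VP"})
to_merge = choose_merges({"NP": 0.1, "VP": 5.0, "S": 0.0, "PP": 2.0})
```

In this toy grammar the split doubles the number of nonterminals while keeping each subsymbol's rule distribution normalized, and the merge selection undoes the splits with the smallest estimated loss. This mirrors the idea of spending subsymbols only where they help likelihood.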