OntoNotes: The 90% Solution

June 2006 | Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, Ralph Weischedel
The OntoNotes project presents a methodology for creating a large, multilingual, richly annotated corpus with 90% inter-annotator agreement. The project focuses on representing literal meaning through predicate structure, word sense, ontology linking, and coreference. Initial data, including 300,000 words of English newswire and 250,000 words of Chinese newswire, will be available in 2007.
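The 90% figure is an inter-annotator agreement target. As a rough illustration, the sketch below computes simple observed agreement between two annotators who tagged the same instances, then checks it against a 0.90 threshold. It assumes plain percent agreement, because this summary does not say which agreement statistic the project reports. The names `percent_agreement`, `meets_target`, and `AGREEMENT_TARGET` and the toy tags are hypothetical.

```python
from typing import Sequence

# Hypothetical constant standing in for the project's 90% agreement target.
AGREEMENT_TARGET = 0.90


def percent_agreement(tags_a: Sequence[str], tags_b: Sequence[str]) -> float:
    """Fraction of instances on which two annotators chose the same tag.

    Assumes both sequences tag the same instances in the same order.
    """
    if len(tags_a) != len(tags_b):
        raise ValueError("annotators must tag the same instances")
    if not tags_a:
        raise ValueError("no instances to compare")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


def meets_target(tags_a: Sequence[str], tags_b: Sequence[str],
                 target: float = AGREEMENT_TARGET) -> bool:
    """True if observed agreement reaches the target threshold."""
    return percent_agreement(tags_a, tags_b) >= target


if __name__ == "__main__":
    # Made-up sense tags for ten occurrences of one word.
    annotator_1 = ["s1", "s1", "s2", "s1", "s3", "s2", "s1", "s1", "s2", "s1"]
    annotator_2 = ["s1", "s1", "s2", "s1", "s3", "s2", "s1", "s2", "s2", "s1"]
    score = percent_agreement(annotator_1, annotator_2)
    print(f"agreement = {score:.2f}, "
          f"meets target: {meets_target(annotator_1, annotator_2)}")
```

On the toy data the annotators disagree on one of ten instances, so agreement is exactly 0.90 and the target is met. Percent agreement does not correct for chance; a chance-corrected measure such as Cohen's kappa would be a different design choice.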
The annotation comprises several layers. Treebanking marks syntactic structure in the style of the Penn Treebank, whose function tags and empty categories make predicate-argument structure recoverable; PropBanking then labels the arguments of each verb. Word sense ambiguity is addressed by grouping fine-grained WordNet senses into coarser sense groups, which improves both inter-annotator agreement and automatic tagging performance; the same grouping process is applied across languages and genres. Coreference annotation links referring expressions, such as proper nouns and pronouns, to the discourse entities they denote. Word senses are also linked to the Omega ontology, which integrates several existing semantic resources.

By supplying a large amount of consistently annotated training data, the project aims to enable automated semantic analysis. Because the OntoNotes representation extends existing annotation layers rather than replacing them, additional semantic representations can be added later. The project is compatible with other semantic annotation efforts and may contribute to a larger multilingual corpus integration effort. It has been developed as a collaboration among multiple institutions.
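To see why coarser sense groups raise agreement, note that two annotators who pick different fine-grained senses still agree if both senses fall into the same group. The sketch below maps hypothetical fine-grained labels (`f1` to `f5`) to hypothetical groups (`G1` to `G3`) and compares agreement before and after grouping. The labels, the grouping, and the helper names are invented for illustration and are not OntoNotes data or WordNet sense numbers.

```python
from typing import Dict, List, Sequence

# Hypothetical grouping of five fine-grained senses into three coarse groups.
SENSE_GROUPS: Dict[str, str] = {
    "f1": "G1", "f2": "G1",
    "f3": "G2", "f4": "G2",
    "f5": "G3",
}


def percent_agreement(tags_a: Sequence[str], tags_b: Sequence[str]) -> float:
    """Fraction of instances on which two annotators chose the same label."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two non-empty tag sequences of equal length")
    return sum(a == b for a, b in zip(tags_a, tags_b)) / len(tags_a)


def coarsen(tags: Sequence[str], groups: Dict[str, str]) -> List[str]:
    """Replace each fine-grained sense label with its coarse group label."""
    return [groups[t] for t in tags]


if __name__ == "__main__":
    # Made-up fine-grained tags from two annotators on the same eight instances.
    annotator_1 = ["f1", "f2", "f3", "f4", "f5", "f1", "f3", "f2"]
    annotator_2 = ["f2", "f2", "f4", "f4", "f3", "f1", "f3", "f1"]

    fine = percent_agreement(annotator_1, annotator_2)
    coarse = percent_agreement(coarsen(annotator_1, SENSE_GROUPS),
                               coarsen(annotator_2, SENSE_GROUPS))
    print(f"fine-grained agreement: {fine:.3f}")
    print(f"grouped agreement:      {coarse:.3f}")
```

On the toy data, agreement rises from 0.5 on fine-grained senses to 0.875 on groups. Because grouping is a function from fine senses to groups, any pair that agreed before still agrees afterward, so grouped agreement can never be lower than fine-grained agreement; the one remaining disagreement (the fifth instance, `f5` versus `f3`) crosses a group boundary.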