Understanding The Penn Treebank: Annotating Predicate Argument Structure

The Penn Treebank has introduced a new syntactic annotation scheme designed to highlight predicate-argument structure, and this paper discusses how its key aspects are implemented. The scheme has four crucial aspects: a more consistent treatment of a range of grammatical phenomena; co-indexed null elements that mark underlying positions in wh-movement, passive, and infinitival constructions; non-context-free annotational mechanisms that allow discontinuous constituents to be recovered; and a clear distinction between verb arguments and adjuncts, supported by a concise tagging system for semantic roles.

A detailed style-book was created to keep annotators consistent and to make the annotation task easier. It is essential for achieving high levels of inter-annotator agreement.

The current treebank materials suffer from differing annotation regimes across syntactic categories. The new scheme unifies these analyses by treating the predicate as either the lowest VP or the phrasal structure under copular BE. Where the predicate cannot be identified by these criteria, it is tagged -PRD.

The scheme distinguishes arguments from adjuncts on both semantic and syntactic grounds, and it labels constituents with functional tags so that each can be identified as one or the other. Null elements mark positions where material is missing from the surface string, as in passives and infinitival complements. Discontinuous constituents are handled by co-indexing, which indicates where a constituent should be interpreted within the predicate-argument structure. This makes the predicate-argument structure recoverable even when WH-questions and passives are nested in complex ways.
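As a rough illustration of how co-indexed null elements and functional tags make predicate-argument structure recoverable, the sketch below reads a bracketed tree and links each null element back to its co-indexed antecedent. It assumes the familiar bracketed notation in which labels look like `NP-SBJ` or `WHNP-1` and null elements appear under `-NONE-` as `*T*-1` (wh-trace) or `*-1` (passive). The example sentence and every function name (`parse`, `split_label`, `resolve`) are hypothetical and chosen for illustration only; this is not the Treebank project's own tooling.

```python
import re

def parse(text):
    """Parse one bracketed tree into nested (label, children) tuples; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return node()

def split_label(label):
    """Split 'NP-SBJ-1' into (category, functional tags, co-index)."""
    if label.startswith("-"):         # special labels such as -NONE-
        return label, [], None
    parts = label.split("-")
    index = parts.pop() if len(parts) > 1 and parts[-1].isdigit() else None
    return parts[0], parts[1:], index

def words(tree):
    """Surface words under a node, skipping null elements."""
    label, children = tree
    if label == "-NONE-":
        return []
    out = []
    for child in children:
        out.extend(words(child) if isinstance(child, tuple) else [child])
    return out

def antecedents(tree, table=None):
    """Map each co-index to the constituent that carries it."""
    table = {} if table is None else table
    label, children = tree
    index = split_label(label)[2]
    if index is not None:
        table[index] = tree
    for child in children:
        if isinstance(child, tuple):
            antecedents(child, table)
    return table

def resolve(tree):
    """Return (null element type, antecedent words) for every co-indexed null element."""
    table = antecedents(tree)
    found = []

    def walk(node):
        label, children = node
        if label == "-NONE-":
            match = re.fullmatch(r"(\*\S*?)-(\d+)", children[0])
            if match and match.group(2) in table:
                found.append((match.group(1), " ".join(words(table[match.group(2)]))))
            return
        for child in children:
            if isinstance(child, tuple):
                walk(child)

    walk(tree)
    return found

# Illustrative wh-question: the object of "eating" is a trace co-indexed with "What".
WH_QUESTION = ("(SBARQ (WHNP-1 (WP What)) (SQ (VBZ is) (NP-SBJ (NNP Tim)) "
               "(VP (VBG eating) (NP (-NONE- *T*-1)))) (. ?))")
print(resolve(parse(WH_QUESTION)))    # [('*T*', 'What')]
```

The table built by `antecedents` is what makes the lookup non-context-free: the trace and its antecedent can sit arbitrarily far apart in the tree, and only the shared index connects them.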
The scheme also handles gapping by using structural templates to recover predicate-argument structure: the arguments of a gapped clause are mapped onto a template. The resulting treebank provides a consistent database for linguistic research, is useful for training stochastic parsers on surface syntax and beyond, and serves as a basis for evaluating parsing technology. The scheme is expected to give a more comprehensive and accurate representation of predicate-argument structure, making it a valuable resource for linguistic research and natural language processing.
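The template idea for gapping can be sketched in a few lines. The example assumes that constituents of the complete clause carry an ordinary co-index (`NP-SBJ-1`) and that constituents of the gapped clause point at the matching slot with an `=` index (`NP-SBJ=1`). The flat list-of-pairs layout, the example sentence ("Mary likes Bach and Susan, Beethoven"), and the helper names are hypothetical simplifications, not the annotation format itself.

```python
import re

def index_of(label, sep):
    """Return the numeric index that follows `sep` at the end of a label, or None."""
    match = re.search(re.escape(sep) + r"(\d+)$", label)
    return match.group(1) if match else None

def fill_template(template, gapped):
    """Map the arguments of a gapped clause onto the template clause.

    template: (label, text) constituents of the complete clause, in order.
    gapped:   (label, text) constituents of the gapped clause.
    """
    overrides = {}
    for label, text in gapped:
        index = index_of(label, "=")
        if index is not None:
            overrides[index] = text
    # Slots without a matching "=" index keep the template's own material.
    return [overrides.get(index_of(label, "-"), text) for label, text in template]

TEMPLATE = [("NP-SBJ-1", "Mary"), ("VBZ", "likes"), ("NP-2", "Bach")]
GAPPED = [("NP-SBJ=1", "Susan"), ("NP=2", "Beethoven")]

print(fill_template(TEMPLATE, GAPPED))   # ['Susan', 'likes', 'Beethoven']
```

The verb has no `=` counterpart in the gapped clause, so it is inherited from the template, which is how the missing predicate is recovered.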

THE PENN TREEBANK: ANNOTATING PREDICATE ARGUMENT STRUCTURE

Mitchell Marcus, Grace Kim, Mary Ann Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, Britta Schasberger