Understanding the Penn Discourse TreeBank 2.0

The Penn Discourse TreeBank (PDTB) is a large-scale corpus annotated with discourse relations and their arguments over the one-million-word Wall Street Journal (WSJ) corpus. The second version, PDTB-2.0, introduces a lexically grounded approach to discourse relation annotation, providing detailed annotations of discourse relations, their arguments, and their attributions. It includes a hierarchical classification of senses for discourse relations, as well as attribution annotations for both the relations and their arguments. The annotation process involves two annotators and a team of experts to ensure reliability and consistency. PDTB-2.0 also includes a detailed analysis of the distribution of discourse relations and their arguments, and of the differences between PDTB-1.0 and PDTB-2.0. The corpus is a valuable resource for researchers in natural language processing, particularly in discourse analysis, summarization, and natural language generation. It provides rich data for studying the relationship between sentence-level and discourse-level structures, and for developing and evaluating algorithms for discourse analysis. PDTB-2.0 is available through the Linguistic Data Consortium and is an important resource for the study of discourse structure and its relationship to syntax and semantics.

The Penn Discourse TreeBank 2.0

Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio Robaldo, Aravind Joshi, Bonnie Webber