2001 | Wee Meng Soon, Hwee Tou Ng, Daniel Chung Yong Lim
This paper presents a machine learning approach to coreference resolution of noun phrases in unrestricted text. The method learns from a small annotated corpus and resolves general noun phrases, not just pronouns. It also does not restrict the entity type of the noun phrases, which may be organizations, persons, or other types. Evaluated on the MUC-6 and MUC-7 coreference corpora, it achieves encouraging results, with accuracy comparable to that of non-learning approaches. The authors present it as the first learning-based system to reach performance comparable to state-of-the-art non-learning systems on these data sets.
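The coreference resolution process first determines markables (noun phrases, named entities, etc.) with a pipeline of NLP modules. It then generates a feature vector for each pair of markables. A decision tree learning algorithm (C5) builds a classifier from these vectors that decides whether the two markables in a pair corefer. Because the markables come from the preprocessing pipeline, the system's overall performance depends on the accuracy of those NLP modules, particularly named entity recognition and noun phrase identification.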
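The pairwise setup can be illustrated with a small sketch. It assumes the following pairing scheme, which is an assumption of this sketch rather than a detail stated above. In training, each anaphor is paired positively with its closest preceding coreferent markable and negatively with every markable in between. In resolution, each markable is linked to the closest preceding markable that the classifier accepts. The feature functions are crude, hypothetical approximations loosely named after features such as STR_MATCH and ALIAS. The hand-written `rule` merely stands in for a learned C5 decision tree, which Python's standard library does not provide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Determiners stripped before string comparison (simplified, hypothetical list).
DETERMINERS = {"a", "an", "the", "this", "that", "these", "those"}
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "its", "their"}

Features = Dict[str, int]


@dataclass
class Markable:
    text: str
    sentence: int  # index of the sentence containing the markable


def strip_determiners(text: str) -> str:
    return " ".join(w for w in text.lower().split() if w not in DETERMINERS)


def is_acronym(short: str, long: str) -> bool:
    # Crude alias test: "IBM" matches the capitalized initials of
    # "International Business Machines".
    initials = "".join(w[0] for w in long.split() if w[0].isupper())
    return len(short) > 1 and short.isupper() and short == initials


def features(i: Markable, j: Markable) -> Features:
    """Simplified feature vector for candidate antecedent i and anaphor j."""
    return {
        "DIST": j.sentence - i.sentence,
        "I_PRONOUN": int(i.text.lower() in PRONOUNS),
        "J_PRONOUN": int(j.text.lower() in PRONOUNS),
        "STR_MATCH": int(strip_determiners(i.text) == strip_determiners(j.text)),
        "ALIAS": int(is_acronym(j.text, i.text) or is_acronym(i.text, j.text)),
    }


def training_pairs(ms: List[Markable], gold: List[Optional[int]]) -> List[Tuple[Features, int]]:
    """gold[k] is the entity id of markable k, or None if it is in no chain."""
    pairs = []
    for j in range(len(ms)):
        if gold[j] is None:
            continue
        # Closest preceding markable in the same gold chain.
        i = next((k for k in range(j - 1, -1, -1) if gold[k] == gold[j]), None)
        if i is None:
            continue  # first mention of its entity: no positive pair
        pairs.append((features(ms[i], ms[j]), 1))
        # Every markable strictly between i and j yields a negative pair.
        pairs.extend((features(ms[k], ms[j]), 0) for k in range(i + 1, j))
    return pairs


def resolve(ms: List[Markable], classify: Callable[[Features], bool]) -> List[Optional[int]]:
    """Link each markable to the closest preceding markable classified as coreferent."""
    antecedents: List[Optional[int]] = []
    for j in range(len(ms)):
        antecedents.append(
            next((i for i in range(j - 1, -1, -1) if classify(features(ms[i], ms[j]))), None)
        )
    return antecedents


# Hand-written stand-in for a learned decision tree (hypothetical).
def rule(f: Features) -> bool:
    return bool(f["STR_MATCH"] or f["ALIAS"])


ms = [Markable("International Business Machines", 0), Markable("Friday", 0),
      Markable("IBM", 1), Markable("it", 1)]
gold = [0, None, 0, 0]
print([label for _, label in training_pairs(ms, gold)])  # [1, 0, 1]
print(resolve(ms, rule))  # [None, None, 0, None]
```

The toy rule leaves the pronoun "it" unresolved because it ignores the pronoun and distance features. A learned tree can combine those features with string and alias evidence.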
The evaluation shows that the system achieves a balanced F-measure of 62.6% on MUC-6 and 60.4% on MUC-7, outperforming several of the systems that took part in MUC-6 and MUC-7. An analysis of feature contributions identifies ALIAS, STR_MATCH, and APPOSITIVE as the most informative features. Error analysis identifies common error types, including string match errors, noun phrase identification errors, and errors in semantic class determination. The paper also compares the system's errors with those of the RESOLVE system, highlighting areas for improvement.
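A balanced F-measure is the harmonic mean of recall and precision. For the MUC coreference task, recall and precision are conventionally computed with the link-based scheme of Vilain et al. (1995). This scheme counts how many of the links needed to connect each key (gold) chain are present in the response. The sketch below assumes that scheme. The chain representation (sets of mention ids) and the toy data are hypothetical and do not reproduce the paper's numbers.

```python
from typing import Hashable, List, Set

Chain = Set[Hashable]


def muc_recall(key: List[Chain], response: List[Chain]) -> float:
    """Link-based MUC recall (Vilain et al., 1995) of `response` against `key`."""
    num = den = 0
    for chain in key:
        # Partition the key chain by the response chains. A mention that is
        # in no response chain forms its own singleton part.
        parts = set()
        for m in chain:
            owner = next((idx for idx, r in enumerate(response) if m in r), None)
            parts.add(owner if owner is not None else ("singleton", m))
        num += len(chain) - len(parts)  # key links recovered by the response
        den += len(chain) - 1           # links needed to connect the key chain
    return num / den if den else 0.0


def muc_precision(key: List[Chain], response: List[Chain]) -> float:
    # Precision is recall with the roles of key and response swapped.
    return muc_recall(response, key)


def f_measure(p: float, r: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Toy example: the key links mentions 1-4 into one entity; the response splits it.
key = [{1, 2, 3, 4}]
response = [{1, 2}, {3, 4}]
r, p = muc_recall(key, response), muc_precision(key, response)
print(round(r, 3), round(p, 3), round(f_measure(p, r), 3))  # 0.667 1.0 0.8
```

In this example, the response recovers two of the three links needed to join the key chain, so recall is 2/3. Precision is perfect because every response link is correct.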