Understanding Identifying Relations for Open Information Extraction

This paper introduces two simple constraints, one syntactic and one lexical, to improve Open Information Extraction (Open IE). Open IE systems extract arbitrary relations from text without requiring a pre-specified vocabulary. Previous systems, such as TEXTRUNNER and WOE$^{pos}$, often produce incoherent and uninformative extractions.

The constraints are implemented in the REVERB system. The syntactic constraint requires each relation phrase to be a contiguous sequence of words that begins with a verb: a verb alone, a verb followed immediately by a preposition, or a verb followed by nouns, adjectives, or adverbs and ending in a preposition. The lexical constraint requires each relation phrase to appear with a minimum number of distinct argument pairs in a large corpus, which filters out overspecified relation phrases.

The paper evaluates REVERB against TEXTRUNNER and WOE$^{pos}$ and finds that it achieves a substantially higher area under the precision-recall curve (AUC). More than 30% of REVERB's extractions are at precision 0.8 or higher, compared with virtually none for the earlier systems. The paper also analyzes the errors in REVERB's output, suggesting directions for future work. REVERB still has limitations, such as not handling n-ary relations and failing to extract the correct arguments in some cases.

The paper concludes that REVERB is a significant improvement over previous Open IE systems and that further research is needed to address its limitations. The REVERB system and the data used in the experiments are made available to the research community.
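The two constraints can be illustrated with a minimal Python sketch. It is not REVERB's implementation: the coarse tag names, the one-character encoding, the simplified pattern (a verb, optionally followed by other words and a closing preposition), the function names, and the threshold value are all assumptions made for illustration. Input is assumed to be already tokenized and part-of-speech tagged.

```python
import re
from collections import defaultdict

# Assumed coarse POS labels, mapped to one-character codes so that a regex
# can scan the tag sequence. Any tag not listed becomes "O" (other).
TAG_CODE = {
    "VERB": "V",
    "NOUN": "W", "ADJ": "W", "ADV": "W", "PRON": "W", "DET": "W",
    "ADP": "P", "PRT": "P",
}

# Simplified syntactic constraint: verb words* preposition, or a bare verb.
# Adjacent matches are merged by the outer "+".
RELATION = re.compile(r"(?:VW*P|V)+")


def relation_phrases(tagged):
    """Return contiguous relation phrases from a list of (word, tag) pairs."""
    codes = "".join(TAG_CODE.get(tag, "O") for _, tag in tagged)
    phrases = []
    for match in RELATION.finditer(codes):
        # One code per token, so regex offsets are token indices.
        words = [word for word, _ in tagged[match.start():match.end()]]
        phrases.append(" ".join(words))
    return phrases


def frequent_relations(extractions, min_pairs):
    """Simplified lexical constraint: keep relation phrases that occur with
    at least min_pairs distinct (arg1, arg2) pairs."""
    pairs = defaultdict(set)
    for arg1, rel, arg2 in extractions:
        pairs[rel].add((arg1, arg2))
    return {rel for rel, seen in pairs.items() if len(seen) >= min_pairs}


sentence = [
    ("Faust", "NOUN"), ("made", "VERB"), ("a", "DET"), ("deal", "NOUN"),
    ("with", "ADP"), ("the", "DET"), ("devil", "NOUN"),
]
print(relation_phrases(sentence))

corpus = [
    ("Paris", "is the capital of", "France"),
    ("Rome", "is the capital of", "Italy"),
    ("Obama", "is offering only modest cuts to", "the deficit"),
]
print(frequent_relations(corpus, min_pairs=2))
```

In this sketch the syntactic pattern keeps "made a deal with" together as one phrase rather than stopping at "made", and the lexical filter drops the relation phrase that occurs with only one argument pair. A real system would also normalize relation phrases before counting and would use a much larger corpus and threshold.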

Identifying Relations for Open Information Extraction

July 27-31, 2011 | Anthony Fader, Stephen Soderland, and Oren Etzioni