Distant supervision for relation extraction without labeled data

2-7 August 2009 | Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky
The paper introduces "distant supervision," an approach to relation extraction that requires no labeled data and scales to large corpora. It uses Freebase, a large semantic database, as the source of supervision: for each pair of entities that appears in a Freebase relation, the system finds the sentences in an unlabeled corpus that contain both entities, extracts textual features from those sentences, and trains a logistic regression classifier. This combines the benefits of supervised and unsupervised information extraction and makes it possible to extract a large number of relations from a very large amount of text. The system extracts 10,000 instances of 102 relations at a precision of 67.6%. The paper also compares syntactic and lexical features, finding that syntactic features are particularly useful for relations that are ambiguous or lexically distant. The evaluation uses both held-out data and human annotation, and shows that combining syntactic and lexical features significantly improves precision.
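The labeling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy knowledge base, the relation names, the pre-tagged sentences, and the helper names `words_between` and `build_training_data` are all assumptions made for the example, and the single "words between the mentions" feature stands in for the much richer lexical and syntactic features the paper uses.

```python
from collections import defaultdict

# Hypothetical toy knowledge base standing in for Freebase:
# (entity1, entity2) -> relation name. Names are illustrative only.
KB = {
    ("Steven Spielberg", "Saving Private Ryan"): "film/director",
    ("Edwin Hubble", "Marshfield"): "person/place_of_birth",
}

# Hypothetical unlabeled corpus. Entity mentions are assumed to be
# already identified (a real system would run an entity tagger).
SENTENCES = [
    ("Steven Spielberg 's film Saving Private Ryan is based on real events .",
     ["Steven Spielberg", "Saving Private Ryan"]),
    ("Saving Private Ryan was directed by Steven Spielberg .",
     ["Saving Private Ryan", "Steven Spielberg"]),
    ("Edwin Hubble was born in Marshfield , Missouri .",
     ["Edwin Hubble", "Marshfield"]),
]


def words_between(sentence, first, second):
    """Crude lexical feature: the text between the two mentions,
    prefixed with the order in which the mentions appear."""
    i = sentence.find(first)
    j = sentence.find(second)
    if i < 0 or j < 0:
        return None
    if i < j:
        return "fwd:" + sentence[i + len(first):j].strip()
    return "rev:" + sentence[j + len(second):i].strip()


def build_training_data(kb, sentences):
    """Label every sentence containing a KB entity pair with that pair's
    relation, and pool the features of all such sentences per pair."""
    features = defaultdict(list)
    for text, entities in sentences:
        for a in entities:
            for b in entities:
                if a != b and (a, b) in kb:
                    features[(a, b)].append(words_between(text, a, b))
    # One training example per entity pair: (pair, relation label, features).
    return [(pair, kb[pair], feats) for pair, feats in features.items()]


for pair, relation, feats in build_training_data(KB, SENTENCES):
    print(pair, relation, feats)
```

Each resulting tuple pairs a relation label with the features pooled from every sentence that mentions the entity pair; a classifier such as logistic regression would then be trained on these examples. Note that the labels are noisy by construction, since a sentence can mention both entities without expressing the relation.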