Modeling Relations and Their Mentions without Labeled Text


2010 | Sebastian Riedel, Limin Yao, and Andrew McCallum
This paper presents a novel approach to extracting relations from text without explicit training annotation. Recent approaches assume that every sentence that mentions two related entities expresses the corresponding relation. Motivated by the observation that this assumption frequently does not hold, especially when considering external knowledge bases, we propose to relax it. Instead, we assume that at least one sentence that mentions two related entities expresses the corresponding relation. To model this assumption, we make two contributions. First, we introduce a novel undirected graphical model that captures both the task of predicting relations between entities and the task of predicting which sentences express these relations. Second, we propose to train this graphical model by framing distant supervision as an instance of constraint-driven semi-supervision. In particular, we use SampleRank, a discriminative learning algorithm for large factor graphs, and inject the expressed-at-least-once assumption through a truth function. Empirically, this approach improves precision substantially. For the task of extracting 1000 Freebase relation instances from the New York Times, we measure a precision of 91% for at-least-once supervision and 87% for distant supervision. This amounts to an error reduction rate of 31%. A crucial aspect of our approach is its extensibility: framed exclusively in terms of factor graphs and truth functions, it is conceptually easy to apply to larger tasks such as the joint prediction of relations and entity types. In future work, we will exploit this aspect and extend our model to jointly perform other relevant tasks for the automatic construction of knowledge bases.
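To make the expressed-at-least-once idea concrete, here is a minimal Python sketch, not the authors' implementation. All names (`PairConfig`, `at_least_once_truth`, `samplerank_step`) are hypothetical, and the update rule is a simplified, perceptron-style stand-in for SampleRank over a linear model. It shows how a truth function can encode the at-least-once constraint and how it can steer weight updates when the model's ranking of two sampled configurations disagrees with it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical container for one entity pair: a relation variable Y and one
# binary mention variable Z_i per sentence mentioning the pair.
@dataclass
class PairConfig:
    relation: Optional[str]        # predicted relation (None = no relation)
    mention_expresses: List[bool]  # Z_i: does sentence i express the relation?

def at_least_once_truth(cfg: PairConfig, kb_relation: Optional[str]) -> bool:
    """Truth function for the expressed-at-least-once assumption: if the KB
    records a relation for the pair, the configuration must predict that
    relation and mark at least one mention as expressing it; pairs absent
    from the KB must predict no relation."""
    if kb_relation is None:
        return cfg.relation is None
    return cfg.relation == kb_relation and any(cfg.mention_expresses)

def samplerank_step(weights: Dict[str, float],
                    feats_cur: Dict[str, float],
                    feats_prop: Dict[str, float],
                    truth_cur: bool,
                    truth_prop: bool,
                    lr: float = 1.0) -> None:
    """One simplified SampleRank-style update: compare the current sample and
    a proposal; if the truth function prefers one but the linear model scores
    the other at least as high, move the weights toward the preferred
    sample's features."""
    score = lambda f: sum(weights.get(k, 0.0) * v for k, v in f.items())
    if truth_prop and not truth_cur and score(feats_prop) <= score(feats_cur):
        better, worse = feats_prop, feats_cur
    elif truth_cur and not truth_prop and score(feats_cur) <= score(feats_prop):
        better, worse = feats_cur, feats_prop
    else:
        return  # model already ranks the two samples consistently with the truth
    for k in set(better) | set(worse):
        weights[k] = weights.get(k, 0.0) + lr * (better.get(k, 0.0) - worse.get(k, 0.0))
```

In the full model, proposals would come from an MCMC sampler over the joint relation and mention variables of a factor graph, and the truth function would compare whole configurations; the sketch only illustrates the ranking-based update that injects the constraint during training.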