July 27–31, 2011 | Johannes Hoffart, Mohamed Amir Yosef, Ilaria Bordino, Hagen Fürstenau, Manfred Pinkal, Marc Spaniol, Bilyana Taneva, Stefan Thater, Gerhard Weikum
This paper presents a robust method for disambiguating named entities in natural language text by leveraging context from knowledge bases and using a new form of coherence graph. The method combines three measures: the prior probability of an entity being mentioned, the similarity between the contexts of a mention and a candidate entity, and the coherence among candidate entities for all mentions together. It builds a weighted graph of mentions and candidate entities and computes a dense subgraph that approximates the best joint mention-entity mapping. Experiments show that the new method significantly outperforms prior methods in terms of accuracy, with robust behavior across a variety of inputs.
The key contributions include a framework for combining popularity priors, similarity measures, and coherence, new measures for defining mention-entity similarity, a new algorithm for computing dense subgraphs in a mention-entity graph, and an empirical evaluation on a demanding corpus with significant improvements over state-of-the-art methods.
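To make the graph construction more concrete, the following is a minimal sketch of how the three measures could be combined and a joint mapping extracted greedily. It is an illustration under stated assumptions, not the paper's algorithm: the function name `disambiguate`, the weights `alpha`, `beta`, `gamma`, the linear combination, the greedy removal rule, and all scores in the toy example are hypothetical, since the abstract does not specify these details.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def disambiguate(
    candidates: Dict[str, List[str]],
    prior: Dict[Pair, float],
    similarity: Dict[Pair, float],
    coherence: Dict[Pair, float],
    alpha: float = 0.4,  # assumed weight of the popularity prior
    beta: float = 0.4,   # assumed weight of the context similarity
    gamma: float = 0.2,  # assumed weight of entity-entity coherence
) -> Dict[str, str]:
    """Greedy sketch: drop the weakest candidate until one remains per mention."""

    def me_weight(mention: str, entity: str) -> float:
        # Mention-entity edge: prior and context similarity combined linearly.
        return (alpha * prior.get((mention, entity), 0.0)
                + beta * similarity.get((mention, entity), 0.0))

    def ee_weight(e1: str, e2: str) -> float:
        # Entity-entity edge: coherence, treated as symmetric.
        score = max(coherence.get((e1, e2), 0.0), coherence.get((e2, e1), 0.0))
        return gamma * score

    # Working copy of the candidate lists, without duplicates.
    remaining = {m: list(dict.fromkeys(es)) for m, es in candidates.items()}

    def weighted_degree(mention: str, entity: str) -> float:
        # Own mention edge plus coherence edges to candidates of other mentions.
        total = me_weight(mention, entity)
        for other_mention, entities in remaining.items():
            if other_mention == mention:
                continue
            total += sum(ee_weight(entity, o) for o in entities if o != entity)
        return total

    while True:
        # A candidate may only be removed if its mention keeps an alternative.
        removable = [
            (weighted_degree(m, e), m, e)
            for m, es in remaining.items() if len(es) > 1
            for e in es
        ]
        if not removable:
            break
        _, mention, entity = min(removable)
        remaining[mention].remove(entity)

    return {m: es[0] for m, es in remaining.items() if es}


# Toy input with made-up scores: the priors favor the region and Larry Page,
# while context similarity and coherence favor the song and Jimmy Page.
candidates = {
    "Kashmir": ["Kashmir (region)", "Kashmir (song)"],
    "Page": ["Jimmy Page", "Larry Page"],
}
prior = {
    ("Kashmir", "Kashmir (region)"): 0.9, ("Kashmir", "Kashmir (song)"): 0.1,
    ("Page", "Jimmy Page"): 0.3, ("Page", "Larry Page"): 0.7,
}
similarity = {
    ("Kashmir", "Kashmir (region)"): 0.1, ("Kashmir", "Kashmir (song)"): 0.6,
    ("Page", "Jimmy Page"): 0.6, ("Page", "Larry Page"): 0.1,
}
coherence = {("Kashmir (song)", "Jimmy Page"): 0.9}

result = disambiguate(candidates, prior, similarity, coherence)
print(result)
```

With these toy scores the sketch maps "Kashmir" to the song and "Page" to Jimmy Page, even though the priors alone would pick the region and Larry Page. This is the kind of joint decision that the coherence measure is meant to enable. For simplicity the sketch treats each mention-candidate pair as its own node, whereas a full mention-entity graph would share entity nodes across mentions.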