July 27–31, 2011 | Johannes Hoffart¹, Mohamed Amir Yosef¹, Ilaria Bordino², Hagen Fürstenau³, Manfred Pinkal³, Marc Spaniol¹, Bilyana Taneva¹, Stefan Thater³, Gerhard Weikum¹
This paper presents a robust method for collective disambiguation of named entities in text. The approach combines three measures: prior probability of an entity being mentioned, similarity between the contexts of a mention and a candidate entity, and coherence among candidate entities for all mentions. The method builds a weighted graph of mentions and candidate entities, and computes a dense subgraph that approximates the best joint mention-entity mapping. Experiments show that the new method significantly outperforms prior methods in terms of accuracy, with robust behavior across a variety of inputs.
The paper introduces a framework that integrates popularity priors, similarity measures, and coherence into a robust disambiguation method. It also presents new measures for defining mention-entity similarity and a new algorithm for computing dense subgraphs in a mention-entity graph, which produces high-quality mention-entity mappings. The method is evaluated on a demanding corpus, showing significant improvements over state-of-the-art baselines.
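To make the graph idea concrete, the following is a minimal sketch of one way a dense subgraph could be approximated greedily. It is not the paper's algorithm, whose details this summary does not give. The assumptions are: mention-entity edges carry a combined prior and similarity weight, entity-entity edges carry a coherence weight, and the candidate with the lowest weighted degree is pruned repeatedly while every mention keeps at least one candidate. The function name `greedy_dense_mapping`, the data layout, and all weights are hypothetical.

```python
from typing import Dict, Tuple


def greedy_dense_mapping(
    me_weights: Dict[str, Dict[str, float]],
    ee_weights: Dict[Tuple[str, str], float],
) -> Dict[str, str]:
    """Prune weakly connected candidates until each mention keeps one entity.

    me_weights: mention -> {candidate entity -> mention-entity edge weight}
    ee_weights: (entity, entity) -> coherence weight, treated as symmetric
    """
    # Work on a copy so the caller's candidate lists stay untouched.
    candidates = {m: dict(cands) for m, cands in me_weights.items()}

    def coherence(e1: str, e2: str) -> float:
        return ee_weights.get((e1, e2), ee_weights.get((e2, e1), 0.0))

    def weighted_degree(mention: str, entity: str) -> float:
        # Mention-entity edge plus coherence edges to the candidates
        # that are still alive for all other mentions.
        total = candidates[mention][entity]
        for other, cands in candidates.items():
            if other != mention:
                total += sum(coherence(entity, e) for e in cands if e != entity)
        return total

    while True:
        # Only candidates of mentions with more than one option may be
        # removed, so no mention ever loses its last candidate.
        removable = [
            (weighted_degree(m, e), m, e)
            for m, cands in candidates.items()
            if len(cands) > 1
            for e in cands
        ]
        if not removable:
            break
        _, mention, entity = min(removable)
        del candidates[mention][entity]

    return {m: next(iter(cands)) for m, cands in candidates.items() if cands}


# Toy input with invented weights, for illustration only.
toy_me = {
    "Kashmir": {"Kashmir (song)": 0.3, "Kashmir (region)": 0.6},
    "Page": {"Jimmy Page": 0.4, "Larry Page": 0.5},
}
toy_ee = {("Kashmir (song)", "Jimmy Page"): 0.9}
print(greedy_dense_mapping(toy_me, toy_ee))
```

In this toy input each mention's locally strongest candidate is the wrong one, but the coherence edge between the song and the guitarist keeps that pair alive while the isolated candidates are pruned. That is the intuition behind resolving mentions jointly rather than one at a time.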
The framework considers mentions of named entities in text and maps them to their proper entries in a knowledge base. It uses existing knowledge bases like DBpedia or YAGO to identify entity candidates. The method combines popularity priors, similarity measures, and coherence to determine the best disambiguation. The framework includes robustness tests to selectively enable or disable components based on the input text.
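As an illustration of how a robustness test might gate one component, the sketch below scores the candidates of a single mention. It assumes a linear mix of prior and similarity, and it assumes the prior is used only when one candidate clearly dominates. The combination rule, the threshold, the mixing weight, and the name `local_scores` are all hypothetical and are not taken from the paper.

```python
from typing import Dict


def local_scores(
    priors: Dict[str, float],
    similarities: Dict[str, float],
    prior_weight: float = 0.5,     # hypothetical mixing weight
    prior_threshold: float = 0.9,  # hypothetical robustness threshold
) -> Dict[str, float]:
    """Score each candidate entity of one mention.

    priors: candidate -> popularity prior for this mention
    similarities: candidate -> context similarity with the mention
    """
    # Robustness test (assumed form): trust the prior only when a single
    # candidate is overwhelmingly popular; otherwise fall back to similarity.
    use_prior = bool(priors) and max(priors.values()) >= prior_threshold

    scores = {}
    for entity, sim in similarities.items():
        if use_prior:
            prior = priors.get(entity, 0.0)
            scores[entity] = prior_weight * prior + (1 - prior_weight) * sim
        else:
            scores[entity] = sim
    return scores


# Invented numbers: the prior is decisive in the first call, not in the second.
print(local_scores({"A": 0.95, "B": 0.05}, {"A": 0.2, "B": 0.4}))
print(local_scores({"A": 0.60, "B": 0.40}, {"A": 0.2, "B": 0.4}))
```

Scores of this kind could serve as the mention-entity edge weights of the graph, with coherence added as a separate entity-entity signal.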
The paper also discusses the state of the art in named entity disambiguation, including prior methods that use Wikipedia for explicit disambiguation and approaches that consider semantic coherence. The proposed method improves upon these by jointly considering multiple mentions in an input and aiming for a collective assignment of mentions to entities.
The framework is evaluated on a dataset derived from the CoNLL 2003 NER task, showing significant improvements in precision and recall. The results demonstrate that the proposed method outperforms existing approaches, particularly in handling ambiguous and long-tailed entity mentions. The method is implemented in a prototype system called AIDA, which provides an integrated NED method using popularity, similarity, and graph-based coherence. The system is fully implemented and accessible online. Future work will consider additional semantic properties between entities to further enhance the coherence algorithm.