This paper presents a large-scale system for named entity recognition and semantic disambiguation using information from Wikipedia and web search results. The system aims to identify and label named entities in text, such as people, locations, and organizations, while resolving ambiguities in their surface forms. The approach involves maximizing agreement between contextual information from Wikipedia and the document's context, as well as among category tags of candidate entities. The system achieves high disambiguation accuracy on both news stories and Wikipedia articles.
The paper situates the work within prior research on named entity recognition, including the ENAMEX, TIMEX, and NUMEX annotation categories defined in evaluations such as MUC and ACE. It highlights the importance of semantic disambiguation when scaling entity tracking to large document collections or the web, since many surface forms are ambiguous. For example, "Texas" can refer to multiple entities, including a U.S. state, a band, and a TV series.
The system uses Wikipedia as a comprehensive source of entity information, including entity pages, redirect pages, disambiguation pages, and list pages. It extracts surface forms, category tags, and contextual information from these sources. The system then uses these to disambiguate entities in text by maximizing agreement between contextual and category information.
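The mapping from surface forms to candidate entities can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation; the three input structures are hypothetical simplifications of what would be extracted from a Wikipedia dump.

```python
from collections import defaultdict

def build_surface_form_index(entity_pages, redirects, disambiguation_pages):
    """Map each surface form to the set of entities it may denote.

    Hypothetical simplified inputs:
      entity_pages:         {page_title: set_of_category_tags}
      redirects:            {alias_title: canonical_title}
      disambiguation_pages: {surface_form: [candidate_titles]}
    """
    index = defaultdict(set)
    # An entity page's own title is a surface form for that entity.
    for title in entity_pages:
        index[title].add(title)
    # Redirect pages contribute alternative surface forms (aliases).
    for alias, target in redirects.items():
        index[alias].add(target)
    # Disambiguation pages list several candidates for one ambiguous form.
    for form, candidates in disambiguation_pages.items():
        index[form].update(candidates)
    return index
```

For the "Texas" example from the paper, the index would map the single surface form to all of its candidate entities (the state, the band, the TV series), which downstream disambiguation then resolves.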
The disambiguation process uses a vector space model in which the document is compared with vectors representing Wikipedia entities. The system builds an extended document vector from the document's context together with the category tags of the candidate entities, and selects the entity assignments that maximize the agreement between this vector and the entity vectors. The approach is evaluated on both Wikipedia articles and news stories, showing high accuracy.
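The core comparison step can be sketched as choosing, for an ambiguous surface form, the candidate whose Wikipedia context vector has the highest cosine similarity with the document vector. This is a simplified sketch of the vector-space idea only; the paper's actual system also incorporates category-tag agreement and an extended document vector.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v.get(term, 0) for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def disambiguate(document_terms, candidate_contexts):
    """Pick the candidate entity whose context best agrees with the document.

    document_terms:     list of tokens from the document's context
    candidate_contexts: {entity_title: list of context tokens from Wikipedia}
    (hypothetical simplified inputs)
    """
    doc_vec = Counter(document_terms)
    return max(candidate_contexts,
               key=lambda entity: cosine(doc_vec, Counter(candidate_contexts[entity])))
```

With a document mentioning "Austin" and "Oklahoma", this sketch would resolve the surface form "Texas" to the U.S. state rather than the band or the TV series, since the state's context vector shares the most terms with the document.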
The system is implemented as a web browser application that can analyze any web page or client text document. It has the potential to move from a word-based space to a concept-based space, enabling new research directions in entity-based indexing, searching, and personalized web views. The system uses minimal language-dependent resources beyond Wikipedia, making it adaptable to other languages. The results show that the system achieves high accuracy in disambiguation, with 91.4% accuracy on news stories and 88.3% on Wikipedia articles.