Understanding Using Encyclopedic Knowledge for Named Entity Disambiguation

The paper presents a novel method for detecting and disambiguating named entities in open domain text using an online encyclopedia, Wikipedia. The approach leverages the high coverage and rich structure of Wikipedia to train a disambiguation SVM kernel, which significantly outperforms a baseline method. The method involves two main steps: detecting whether a proper name refers to a named entity in the dictionary and disambiguating between multiple entities that can be denoted by the same name. The paper discusses the structure of Wikipedia, including redirect and disambiguation pages, categories, and hyperlinks, which are used to create a dataset of disambiguated queries.

Two disambiguation methods are introduced: one based on cosine similarity and another using a taxonomy kernel that incorporates word-category correlations. Experimental results show that the taxonomy kernel outperforms the cosine similarity in most scenarios, confirming the effectiveness of using Wikipedia's taxonomy for named entity disambiguation. The paper also explores future work, including the potential benefits of using more training data and integrating word-category correlations with traditional word sense disambiguation techniques.
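
The two steps and the cosine-similarity method summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The dictionary `TOY_DICTIONARY`, its entries and texts, and the helper names `bag_of_words`, `cosine`, and `disambiguate` are all hypothetical, and raw term counts stand in for whatever term weighting the paper actually uses. Detection is reduced to a dictionary lookup, and disambiguation picks the candidate whose article text is most similar to the query context.

```python
import math
import re
from collections import Counter

# Hypothetical toy dictionary (an assumption for illustration only):
# each proper name maps to its candidate entities, and each entity
# carries a short stand-in for the text of its encyclopedia article.
TOY_DICTIONARY = {
    "john williams": {
        "John Williams (composer)":
            "american composer conductor film score music orchestra",
        "John Williams (wrestler)":
            "professional wrestler ring match championship wrestling",
    }
}


def bag_of_words(text):
    """Lowercase the text and count alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(name, context, dictionary=TOY_DICTIONARY):
    """Return the best-matching entity title, or None if the name is unknown.

    Step 1 (detection): look the name up in the dictionary.
    Step 2 (disambiguation): rank candidates by cosine similarity
    between the query context and each candidate's article text.
    """
    candidates = dictionary.get(name.lower())
    if not candidates:
        return None
    ctx = bag_of_words(context)
    return max(
        candidates,
        key=lambda title: cosine(ctx, bag_of_words(candidates[title])),
    )


if __name__ == "__main__":
    query = "The film score was written by John Williams for a full orchestra"
    print(disambiguate("John Williams", query))
```

A ranking of this kind only rewards words that appear in both the context and the article. The taxonomy kernel described in the summary addresses that limitation by also using correlations between context words and Wikipedia categories, which this sketch does not attempt to reproduce.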

Using Encyclopedic Knowledge for Named Entity Disambiguation

Razvan Bunescu, Marius Paşca