Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Pașca, Aitor Soroa. Pages 19–27, Boulder, Colorado, June 2009.
This paper presents and compares WordNet-based and distributional approaches to measuring semantic similarity and relatedness. The authors discuss the strengths and weaknesses of each approach and propose a combination of the two. Each method independently achieves the best results in its class on the RG and WordSim353 datasets, and a supervised combination of the methods yields the best published results on all datasets. The paper also explores cross-lingual similarity, demonstrating that the methods can be adapted to that setting with only minor performance losses. The WordNet-based method applies personalized PageRank over WordNet synsets, while the distributional methods induce similarities from a large Web corpus. The methods are evaluated on monolingual and cross-lingual tasks, learning curves are analyzed, and the differences between learning similarity and learning relatedness scores are discussed. The results show that distributional context-window approaches perform well on the RG dataset, while WordNet-based methods with disambiguated glosses perform better on the WordSim353 dataset. The paper concludes by highlighting the complementary nature of the methods and their effectiveness in cross-lingual tasks.
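To make the two similarity signals concrete, the sketch below implements a minimal version of each under stated assumptions: a tiny hand-built graph stands in for WordNet, a two-sentence list stands in for the Web corpus, and names such as `toy_graph`, `word_similarity_ppr`, and `context_vectors` are illustrative, not the authors' implementation. The personalized PageRank step uses the standard `networkx.pagerank` call with a personalization vector concentrated on the seed word; the distributional step builds context-window co-occurrence vectors and compares them by cosine.

```python
# Minimal sketch of the two similarity signals described in the summary.
# Assumptions: a toy graph replaces WordNet, a toy corpus replaces the Web
# corpus, and all function/variable names are hypothetical.

import math
import networkx as nx

# --- WordNet-style signal: personalized PageRank over a concept graph ---
# In the paper the nodes are WordNet synsets and relations; here we use a
# tiny hand-built graph so the example is self-contained.
toy_graph = nx.Graph([
    ("car", "automobile"), ("automobile", "vehicle"),
    ("vehicle", "bicycle"), ("car", "gasoline"),
    ("journey", "voyage"), ("voyage", "vehicle"),
])

def ppr_vector(graph, seed, alpha=0.85):
    """Personalized PageRank restarted at `seed`; returns a node -> mass dict."""
    return nx.pagerank(graph, alpha=alpha, personalization={seed: 1.0})

def cosine(u, v):
    """Cosine similarity of two sparse vectors represented as dicts."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def word_similarity_ppr(graph, w1, w2):
    """Similarity = cosine of the two personalized PageRank distributions."""
    return cosine(ppr_vector(graph, w1), ppr_vector(graph, w2))

# --- Distributional signal: context-window co-occurrence vectors ---
# The paper induces these from a large Web corpus; a two-sentence "corpus"
# stands in here purely for illustration.
corpus = [
    "the car needs gasoline for the journey",
    "the automobile is a vehicle like the bicycle",
]

def context_vectors(sentences, window=2):
    """Count co-occurring words within a fixed window around each token."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            vec = vectors.setdefault(word, {})
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vec[tokens[j]] = vec.get(tokens[j], 0.0) + 1.0
    return vectors

if __name__ == "__main__":
    print("PPR similarity(car, automobile):",
          round(word_similarity_ppr(toy_graph, "car", "automobile"), 3))
    vecs = context_vectors(corpus)
    print("Distributional similarity(car, automobile):",
          round(cosine(vecs["car"], vecs["automobile"]), 3))
```

In practice the two scores would be fed, together with other variants, as features to a supervised combiner trained on gold similarity judgments, which is the role the supervised combination plays in the paper; the toy inputs above only illustrate the shape of each signal.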