Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy

1997, Taiwan | Jay J. Jiang, David W. Conrath
This paper introduces a novel approach to measuring semantic similarity between words and concepts that combines a lexical taxonomy with statistical information derived from a corpus. By integrating edge-based and node-based approaches, the method better quantifies semantic distance in the semantic space generated by the taxonomy. Specifically, it takes the edge-counting scheme (an edge-based approach) and enhances it with information content calculated from corpus statistics (a node-based approach). Evaluated on a common dataset of human similarity ratings for word pairs, the method outperforms other computational models, reaching a correlation of 0.828 with the human ratings, close to the upper bound of 0.885 achieved when human subjects replicated the same task. The paper also discusses the strengths and weaknesses of edge-based and node-based approaches.
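To make the combination concrete, here is a minimal Python sketch of the simplified distance usually attributed to this paper, dist(c1, c2) = IC(c1) + IC(c2) - 2 * IC(lso(c1, c2)). Here IC(c) = -log P(c) is computed from corpus counts propagated up the taxonomy, and lso is the lowest super-ordinate (the most specific common ancestor). The taxonomy, the frequencies, and all function names are hypothetical; the sketch also assumes single inheritance and leaves out the depth and density factors of the full model.

```python
import math

# Hypothetical toy taxonomy: child -> parent (single inheritance for simplicity).
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

# Hypothetical corpus frequencies of words mapped to each concept.
FREQ = {"entity": 0, "animal": 10, "vehicle": 10, "dog": 30, "cat": 20, "car": 30}


def ancestors(concept):
    """Return the concept itself followed by all of its ancestors, up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def subsumed_count(concept):
    """Frequency of a concept plus the frequencies of every concept it subsumes."""
    return sum(f for c, f in FREQ.items() if concept in ancestors(c))


TOTAL = subsumed_count("entity")  # the root subsumes every concept


def ic(concept):
    """Information content IC(c) = -log P(c) (node-based component)."""
    return -math.log(subsumed_count(concept) / TOTAL)


def lso(c1, c2):
    """Lowest super-ordinate: the first ancestor of c1 that also subsumes c2."""
    up2 = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in up2)


def jc_distance(c1, c2):
    """Simplified Jiang-Conrath distance: IC(c1) + IC(c2) - 2 * IC(lso(c1, c2))."""
    return ic(c1) + ic(c2) - 2 * ic(lso(c1, c2))


print(round(jc_distance("dog", "cat"), 3))  # -> 1.792
print(round(jc_distance("dog", "car"), 3))  # -> 2.408
```

A smaller distance means greater similarity. Here dog and cat share the informative parent animal and come out closer than dog and car, whose only common ancestor is the root, with IC 0.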
The combined approach is presented as particularly useful for tasks such as word sense disambiguation and information retrieval. The study also examines how the structure of the taxonomy affects semantic distance calculations and identifies parameter settings that work best for the proposed model.
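The taxonomy-structure and parameter questions concern the full, weighted-edge form of the measure. The following minimal Python sketch assumes the formula as it is commonly cited: wt(c, p) = (beta + (1 - beta) * E_avg / E(p)) * ((d(p) + 1) / d(p)) ** alpha * (IC(c) - IC(p)) * T(c, p). In this formula, E(p) is the local density (number of child links) of the parent p, E_avg is the average density of the hierarchy, d(p) is the depth of p, and T(c, p) is a link-type factor. The distance sums these weights along the path through the lowest super-ordinate. The function names, the depth convention (root at depth 1, chosen to avoid division by zero), and all numbers are hypothetical.

```python
import math


def edge_weight(ic_child, ic_parent, parent_depth, parent_density, avg_density,
                alpha=0.0, beta=1.0, link_factor=1.0):
    """Weight of the link from a child concept to its parent p.

    parent_depth: d(p), counted here from 1 at the root (an assumption).
    parent_density: E(p), the number of child links of p.
    avg_density: average density over the whole hierarchy.
    """
    density_factor = beta + (1 - beta) * avg_density / parent_density
    depth_factor = ((parent_depth + 1) / parent_depth) ** alpha
    return density_factor * depth_factor * (ic_child - ic_parent) * link_factor


def path_distance(path_edges, **params):
    """Sum edge weights along the path from c1 up to the LSO and down to c2."""
    return sum(edge_weight(**edge, **params) for edge in path_edges)


# Hypothetical IC values, IC(c) = -log P(c), from made-up probabilities.
IC = {"animal": -math.log(0.6), "dog": -math.log(0.3), "cat": -math.log(0.2)}

# Path dog -> animal <- cat, where "animal" is the lowest super-ordinate.
# Hypothetical structure: animal sits at depth 2 with 2 children, and the
# average density of the hierarchy is taken to be 2.
edges = [
    dict(ic_child=IC["dog"], ic_parent=IC["animal"],
         parent_depth=2, parent_density=2, avg_density=2.0),
    dict(ic_child=IC["cat"], ic_parent=IC["animal"],
         parent_depth=2, parent_density=2, avg_density=2.0),
]

# alpha=0, beta=1: depth and density factors are 1, so the sum telescopes to
# IC(dog) + IC(cat) - 2 * IC(animal).
print(round(path_distance(edges), 3))  # -> 1.792
# alpha=1: each edge is scaled by (d(p) + 1) / d(p) = 1.5 here.
print(round(path_distance(edges, alpha=1.0), 3))  # -> 2.688
```

With alpha = 0, beta = 1, and T = 1, every weight reduces to IC(c) - IC(p). The path sum then collapses to the simplified IC-based distance, which is why the depth and density terms can be viewed as structural corrections layered on top of the node-based score.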