Using Information Content to Evaluate Semantic Similarity in a Taxonomy

This paper introduces a new method for measuring semantic similarity in an IS-A taxonomy based on information content. Compared with the traditional edge counting approach, it performs significantly better: a correlation of r = 0.79 with human similarity judgments, against r = 0.66 for edge counting. The underlying idea is that the more information two concepts share, the more similar they are. Similarity is calculated by finding the concept that subsumes both concepts and has the highest information content. This makes the measure more robust than edge counting to varying link distances in the taxonomy, and lets it adapt to different contexts by combining taxonomic structure with empirical probability estimates.

The method was evaluated using a large, independently constructed corpus and previously existing human subject data. The paper also discusses related work, including the Leacock and Chodorow method, which uses normalized path length instead of information content. While that method also performs well, the information content approach was found to be more effective in the experiments described.

The paper concludes that the information content approach is a promising way to measure semantic similarity in an IS-A taxonomy. It is a hybrid approach that combines corpus-based statistical methods with knowledge-based taxonomic information, and it is described as particularly useful for resolving syntactic ambiguity using semantic information. The paper also discusses the challenges of measuring semantic similarity, including the need to consider the relationship among word senses and the potential for inappropriate word senses to produce spuriously high similarity measures.
The paper suggests that future research should explore the relationship between the two algorithms and further study the effectiveness of the information content approach in different contexts.
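
The similarity computation summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy taxonomy, the counts, and all names (`PARENT`, `COUNTS`, `ancestors`, `concept_probability`, `information_content`, `resnik_similarity`) are hypothetical. Information content is taken to be the negative log of a concept's probability, where a concept's count includes the counts of everything it subsumes.

```python
import math

# Hypothetical toy IS-A taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "artifact",
}

# Hypothetical corpus counts of each concept's own occurrences.
COUNTS = {"entity": 0, "animal": 10, "artifact": 10, "dog": 40, "cat": 30, "car": 10}


def ancestors(concept):
    """Return the concept itself plus every concept that subsumes it."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def concept_probability(concept):
    """Estimate p(concept): count of the concept and everything below it, over the total."""
    total = sum(COUNTS.values())
    subsumed = sum(n for c, n in COUNTS.items() if concept in ancestors(c))
    return subsumed / total


def information_content(concept):
    """-log p(concept), written as log(1/p) so the root yields 0.0 rather than -0.0."""
    return math.log(1.0 / concept_probability(concept))


def resnik_similarity(c1, c2):
    """Highest information content among the concepts that subsume both c1 and c2."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(information_content(c) for c in common)


if __name__ == "__main__":
    print(resnik_similarity("dog", "cat"))  # shared subsumer: animal
    print(resnik_similarity("dog", "car"))  # shared subsumer: only the root
```

With these made-up counts, "dog" and "cat" share the subsumer "animal" and so receive a positive score, while "dog" and "car" share only the root, which has probability 1 and therefore zero information content. The score depends on the probability of the shared subsumer, not on how many links separate the two concepts, which is the property that distinguishes this measure from edge counting.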

Using Information Content to Evaluate Semantic Similarity in a Taxonomy

1995 | Philip Resnik