Understanding Sentence Similarity Based on Semantic Nets and Corpus Statistics

This paper addresses the challenge of computing sentence similarity, particularly for very short texts, by developing an algorithm that integrates semantic and syntactic information. The proposed method uses a lexical database and corpus statistics to calculate semantic similarity, and word order vectors to calculate word order similarity. The semantic similarity is derived from the hierarchical structure of a lexical knowledge base (e.g., WordNet) by considering both the shortest path length between two words and the depth of the concept that subsumes them. The word order similarity is measured using a normalized difference of word order vectors. The overall sentence similarity is a weighted combination of these two measures, with a single parameter setting their relative contribution. The method is evaluated in experiments with human participants, which indicate that it captures human intuition about sentence similarity. The paper also discusses the limitations of existing methods and highlights the adaptability and efficiency of the proposed approach.
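The three ingredients described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the path length and subsumer depth are passed in as plain numbers rather than looked up in WordNet, and the functional forms and default constants (`alpha`, `beta`, `delta`) are assumptions based on how the method is commonly stated, since the summary above does not give them.

```python
import math


def word_similarity(path_length, subsumer_depth, alpha=0.2, beta=0.45):
    """Similarity of two words from their positions in a lexical hierarchy.

    Assumed form: decays exponentially with the shortest path length and
    grows (saturating via tanh) with the depth of the subsuming concept.
    alpha and beta are assumed defaults, not values given in the summary.
    """
    return math.exp(-alpha * path_length) * math.tanh(beta * subsumer_depth)


def word_order_similarity(r1, r2):
    """Normalized difference of two equal-length word order vectors.

    Assumed form: 1 - ||r1 - r2|| / ||r1 + r2||.
    """
    if len(r1) != len(r2):
        raise ValueError("word order vectors must have the same length")
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    if total == 0:
        # Degenerate case (empty or all-zero vectors): treat as identical order.
        return 1.0
    return 1.0 - diff / total


def sentence_similarity(semantic_sim, order_sim, delta=0.85):
    """Weighted combination of semantic and word order similarity.

    delta is the weighting parameter; 0.85 is an assumed default.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return delta * semantic_sim + (1.0 - delta) * order_sim


# Illustrative word order vectors: the second sentence swaps two words.
r1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
r2 = [1, 2, 3, 9, 5, 6, 7, 8, 4]
order_sim = word_order_similarity(r1, r2)

# 0.9 stands in for a semantic similarity computed elsewhere.
overall = sentence_similarity(0.9, order_sim)
```

With a weighting parameter close to 1, the semantic term dominates and word order acts as a small correction, which matches the intuition that two sentences using the same words in a different order are still largely similar.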

Sentence Similarity Based on Semantic Nets and Corpus Statistics

Yuhua Li, David McLean, Zuhair Bandar, James D. O'Shea, Keeley Crockett