This paper evaluates five WordNet-based measures of lexical semantic relatedness, comparing their performance in detecting and correcting real-word spelling errors. The measures are those proposed by Jiang and Conrath, Hirst and St-Onge, Leacock and Chodorow, Lin, and Resnik. The study finds that Jiang and Conrath's measure is superior to the others. It also explains why distributional similarity is not a good proxy for lexical semantic relatedness.
The paper discusses various approaches to measuring semantic relatedness, including dictionary-based methods, methods based on Roget-structured thesauri, and WordNet-based approaches. It describes how WordNet is structured, with a subsumption hierarchy and various relations between synsets. It also discusses methods for computing taxonomic path length and for scaling the network to account for varying link distances.
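As a concrete illustration of the path-length idea (not code from the paper), the sketch below uses NLTK's WordNet interface to score a pair of nouns by taking the best-scoring synset pair; the example words, the noun-only restriction, and the max-over-synsets heuristic are assumptions made here for brevity.

    # Minimal sketch of path-based relatedness over the WordNet noun hierarchy.
    # Requires NLTK and its WordNet data (nltk.download('wordnet')).
    from nltk.corpus import wordnet as wn

    def max_relatedness(word1, word2, metric="path"):
        """Best score over all noun-synset pairs for the two words."""
        best = 0.0
        for s1 in wn.synsets(word1, pos=wn.NOUN):
            for s2 in wn.synsets(word2, pos=wn.NOUN):
                if metric == "path":
                    score = s1.path_similarity(s2)   # 1 / (shortest path length + 1)
                else:
                    score = s1.lch_similarity(s2)    # path length scaled by taxonomy depth
                if score is not None and score > best:
                    best = score
        return best

    print(max_relatedness("car", "automobile"))          # shared synset: maximal path score (1.0)
    print(max_relatedness("car", "fork", metric="lch"))  # Leacock-Chodorow score for a more distant pair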
The paper then presents information-based and integrated approaches, including Resnik's information-based approach, Jiang and Conrath's combined approach, and Lin's universal similarity measure. These approaches use information content and corpus statistics to compute semantic relatedness.
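For reference, the standard formulations of these three measures are reproduced below, where IC(c) = -log p(c) is the information content of concept c and lso(c1, c2) is the lowest common subsumer of c1 and c2; the notation is supplied here and is not quoted from the paper.

    % Standard formulations; notation supplied here, not quoted from the paper.
    \begin{align*}
      \mathrm{sim}_{\mathrm{Resnik}}(c_1, c_2) &= \mathrm{IC}\bigl(\mathrm{lso}(c_1, c_2)\bigr)
        = -\log p\bigl(\mathrm{lso}(c_1, c_2)\bigr) \\
      \mathrm{dist}_{\mathrm{JC}}(c_1, c_2) &= \mathrm{IC}(c_1) + \mathrm{IC}(c_2)
        - 2\,\mathrm{IC}\bigl(\mathrm{lso}(c_1, c_2)\bigr) \\
      \mathrm{sim}_{\mathrm{Lin}}(c_1, c_2) &= \frac{2\,\mathrm{IC}\bigl(\mathrm{lso}(c_1, c_2)\bigr)}{\mathrm{IC}(c_1) + \mathrm{IC}(c_2)}
    \end{align*}

Note that Jiang and Conrath's formula is a distance (smaller values mean greater relatedness), whereas Resnik's and Lin's are similarities.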
The paper evaluates these measures by comparing them with human ratings of semantic relatedness. It finds that Jiang and Conrath's measure has the highest correlation with human ratings. It also discusses the limitations of this analysis, including the small amount of data available and the difficulty of obtaining reliable human judgments.
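In practice, this kind of evaluation reduces to computing a correlation coefficient between a measure's scores and the human ratings over the same word pairs. The sketch below uses SciPy; the two short lists are hypothetical placeholders, not data from the paper.

    # Correlation-based evaluation sketch: the two lists are hypothetical
    # placeholders standing in for human ratings and measure scores over
    # the same word pairs (they are NOT data from the paper).
    from scipy.stats import pearsonr

    human_ratings  = [3.92, 3.05, 0.42]   # mean human judgments for three pairs
    measure_scores = [0.88, 0.61, 0.07]   # a measure's scores for the same pairs

    r, p = pearsonr(human_ratings, measure_scores)
    print(f"Pearson r = {r:.3f} (p = {p:.3f})")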
Finally, the paper presents an application-based evaluation of the measures, using the detection and correction of real-word spelling errors as a testbed. It describes a malapropism corrector that uses semantic relatedness measures to detect and correct such errors. Although the simplifying assumptions limit the corrector's overall performance, they affect all measures equally, so the comparison remains fair. The results show that Jiang and Conrath's measure performs best in this application.
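The following greatly simplified sketch illustrates the detection step of such a corrector: a word is flagged as suspect if it is unrelated to the other content words in its context while one of its spelling variations is related. The spelling_variations helper, the threshold value, and the use of NLTK's Jiang-Conrath similarity are assumptions made for illustration; they are not the paper's exact procedure.

    # Simplified malapropism-detection sketch (not the paper's exact algorithm).
    # Requires NLTK data: nltk.download('wordnet'); nltk.download('wordnet_ic').
    from nltk.corpus import wordnet as wn, wordnet_ic

    brown_ic = wordnet_ic.ic("ic-brown.dat")   # information-content counts from the Brown corpus

    def related(word1, word2, threshold=0.09):
        """True if some noun-synset pair exceeds an (assumed) Jiang-Conrath threshold."""
        for s1 in wn.synsets(word1, pos=wn.NOUN):
            for s2 in wn.synsets(word2, pos=wn.NOUN):
                if s1.jcn_similarity(s2, brown_ic) >= threshold:
                    return True
        return False

    def suspect(word, context_words, spelling_variations):
        """Flag `word` if its context supports a spelling variation better than the word itself.

        `spelling_variations` is a hypothetical helper returning real-word
        variations of `word` (e.g. words within one edit).
        """
        if any(related(word, c) for c in context_words):
            return None                        # word fits its context; not a malapropism
        for variant in spelling_variations(word):
            if any(related(variant, c) for c in context_words):
                return variant                 # suggested correction
        return None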