WordNet::Similarity - Measuring the Relatedness of Concepts

WordNet::Similarity is a Perl software package that measures the semantic similarity and relatedness between pairs of concepts (synsets) in the lexical database WordNet. It provides six measures of similarity and three measures of relatedness. Each measure is implemented as a Perl module that takes two concepts as input and returns a numeric value representing how similar or related they are.

Three of the similarity measures, res, lin, and jcn, are based on the information content of the least common subsumer (LCS) of the two concepts. The lin and jcn measures additionally use the sum of the information content of the two concepts themselves. The other three similarity measures, lch, wup, and path, are based on path lengths between concepts. WordNet::Similarity also supports hypothetical root nodes for noun and verb concepts.

The three measures of relatedness are hso, lesk, and vector. The hso measure classifies relations in WordNet as having direction and looks for a path between two concepts that is neither too long nor changes direction too often. The lesk measure finds overlaps between the glosses of the two concepts and the glosses of concepts related to them. The vector measure creates a co-occurrence matrix for the words used in WordNet glosses and represents each gloss as a vector.

WordNet::Similarity can be used through a command-line interface (similarity.pl) or a web interface, and it can be embedded in Perl programs as a module. The package is freely available under the GNU General Public License and is distributed via CPAN and SourceForge. It has been used in several research areas, including word sense disambiguation, semantic relatedness, and the evaluation of multiword expressions.
The package has been developed and maintained by researchers at the University of Minnesota and other institutions.
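To make the six similarity measures more concrete, the following is a minimal Python sketch, not the package's Perl implementation. It assumes the definitions commonly given in the literature (res as the information content of the LCS, lin and jcn combining that with the information content of both concepts, and path, wup, and lch derived from node counts and depths). The taxonomy, the corpus counts, and all function names are hypothetical and chosen only for illustration; real WordNet hierarchies allow multiple parents, which this toy tree does not model.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent); not real WordNet data.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

# Invented corpus counts for each concept, for illustration only.
FREQ = {"entity": 0, "animal": 2, "vehicle": 1, "dog": 5, "cat": 4, "car": 4}


def ancestors(c):
    """Return the chain from concept c up to the root, starting with c."""
    chain = []
    while c is not None:
        chain.append(c)
        c = PARENT[c]
    return chain


def depth(c):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(c))


def lcs(a, b):
    """Least common subsumer: the deepest concept that subsumes both a and b."""
    seen = set(ancestors(a))
    return next(c for c in ancestors(b) if c in seen)


def path_nodes(a, b):
    """Number of nodes on the shortest is-a path from a to b, inclusive."""
    return depth(a) + depth(b) - 2 * depth(lcs(a, b)) + 1


def subtree_count(c):
    """Count of c plus the counts of everything it subsumes."""
    return FREQ[c] + sum(subtree_count(k) for k, p in PARENT.items() if p == c)


TOTAL = subtree_count("entity")
MAX_DEPTH = max(depth(c) for c in PARENT)


def ic(c):
    """Information content: negative log of the concept's probability."""
    return math.log(TOTAL / subtree_count(c))


# Information-content measures.
def sim_res(a, b):
    return ic(lcs(a, b))


def sim_lin(a, b):
    total_ic = ic(a) + ic(b)
    return 2 * ic(lcs(a, b)) / total_ic if total_ic > 0 else 0.0


def sim_jcn(a, b):
    distance = ic(a) + ic(b) - 2 * ic(lcs(a, b))
    # Assumption: identical concepts (distance 0) are reported as infinity here.
    return 1 / distance if distance > 0 else math.inf


# Path-length measures.
def sim_path(a, b):
    return 1 / path_nodes(a, b)


def sim_wup(a, b):
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


def sim_lch(a, b):
    return -math.log(path_nodes(a, b) / (2 * MAX_DEPTH))


if __name__ == "__main__":
    for name, fn in [("res", sim_res), ("lin", sim_lin), ("jcn", sim_jcn),
                     ("path", sim_path), ("wup", sim_wup), ("lch", sim_lch)]:
        print(f"{name:5s} dog/cat={fn('dog', 'cat'):.3f}  dog/car={fn('dog', 'car'):.3f}")
```

In this toy example every measure scores dog and cat higher than dog and car, because the first pair shares the more specific subsumer (animal) while the second pair meets only at the root.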


March 2004 | Ted Pedersen, Siddharth Patwardhan, Jason Michelizzi