The Google Similarity Distance

VOL. 19, NO 3, MARCH 2007 | Rudi L. Cilibrasi and Paul M.B. Vitányi
The paper introduces a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity, using the World Wide Web as a database and Google as a search engine. The method, called the normalized Google distance (NGD), is derived from the number of web pages that Google returns for each search term alone and for the terms together; dividing these counts by the total number of indexed pages turns them into probabilities. The authors demonstrate the effectiveness of NGD through various applications, including hierarchical clustering, classification, and language translation. They validate the method by comparing it with the WordNet database, achieving an agreement of 87% in binary classification tasks. The paper also discusses the theoretical underpinnings of the method, including Kolmogorov complexity, information distance, and normalized compression distance (NCD). The authors argue that the NGD captures semantic relations between words and phrases by leveraging the vast and diverse information available on the web.
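To make the summary above concrete, the following is a minimal sketch of how the NGD can be computed from page counts. It assumes the standard form of the measure, NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}), where f(x) is the number of pages containing x, f(x, y) the number containing both terms, and N the total number of indexed pages. The function name `ngd` and the counts in the usage lines are illustrative assumptions, not live search results, and no real search API is called.

```python
import math


def ngd(f_x: float, f_y: float, f_xy: float, n_pages: float) -> float:
    """Normalized Google distance from raw page counts.

    f_x, f_y : pages containing term x and term y (assumed > 0)
    f_xy     : pages containing both terms
    n_pages  : total number of indexed pages (assumed > max(f_x, f_y))
    """
    if f_x <= 0 or f_y <= 0:
        raise ValueError("each term must occur on at least one page")
    if f_xy <= 0:
        # The terms never co-occur, so log f(x, y) is undefined;
        # treat the distance as infinite.
        return math.inf

    log_x, log_y = math.log(f_x), math.log(f_y)
    log_xy, log_n = math.log(f_xy), math.log(n_pages)

    # The base of the logarithm cancels in the ratio.
    return (max(log_x, log_y) - log_xy) / (log_n - min(log_x, log_y))


# Hypothetical counts for two related terms, chosen for illustration only.
distance = ngd(f_x=46_700_000, f_y=12_200_000, f_xy=2_630_000,
               n_pages=8_058_044_651)
print(f"NGD = {distance:.3f}")
```

Values near 0 indicate terms that almost always appear together, while larger values indicate terms that rarely share a page. Because the result depends only on ratios of counts, it is relatively insensitive to the exact value chosen for the total page count.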