Volume 68–No.13, April 2013 | Wael H. Gomaa, Aly A. Fahmy
This survey by Wael H. Gomaa and Aly A. Fahmy reviews existing approaches to measuring text similarity and groups them into three categories: String-based, Corpus-based, and Knowledge-based. For each category it gives an overview of the algorithms and methods involved. String-based measures operate on character and term sequences. Corpus-based measures derive semantic similarity from information gathered from large corpora. Knowledge-based measures use semantic networks, particularly WordNet, to assess the degree of similarity between words. The survey also covers hybrid methods that combine several similarity measures to improve performance. It concludes by highlighting the importance of text similarity across applications and the potential of hybrid approaches to improve accuracy.
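To make the String-based category concrete, the following is a minimal Python sketch that contrasts a character-sequence measure with a term-sequence measure. The two measures chosen here (Levenshtein edit distance and Jaccard overlap of whitespace-separated tokens) and the function names are assumptions picked for illustration; the summary above does not name specific algorithms, and this is not presented as the authors' implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-based: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def jaccard(a: str, b: str) -> float:
    """Term-based: shared tokens divided by all distinct tokens.
    Tokenization by lowercased whitespace split is a simplifying assumption."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(jaccard("the cat sat", "the cat ran"))
```

The character-based function returns a distance (lower means more similar), while the term-based function returns a similarity in the range 0 to 1, so the two scores would need normalizing before being combined in a hybrid measure.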