Fast Text Searching Allowing Errors

Fast Text Searching Allowing Errors

October 1992/Vol.35, No.10 | Sun Wu and Udi Manber
The article discusses the problem of string-matching in the presence of errors, a common scenario in various applications such as text editing, DNA sequence analysis, and music transcription. The authors present a new algorithm that is fast, simple to implement, and supports a wide range of variations of the approximate string-matching problem. The algorithm is based on a numeric scheme for exact string matching developed by Baeza-Yates and Gonnet. It can handle common query types, including regular expressions and different types of closeness measures, such as edit distance. The article also describes an improvement to the main algorithm using a partition approach, which can be more efficient when the number of errors is small compared to the pattern size. Additionally, the algorithm supports extensions like sets of characters, wild cards, unknown number of errors, and nonuniform costs. The authors provide experimental results showing the efficiency of their algorithm compared to other methods, and they describe agrep, a software package based on this algorithm that has been widely used since 1991. The article concludes by highlighting the practical applications and limitations of the algorithm, emphasizing its robustness and flexibility in handling approximate string matching.The article discusses the problem of string-matching in the presence of errors, a common scenario in various applications such as text editing, DNA sequence analysis, and music transcription. The authors present a new algorithm that is fast, simple to implement, and supports a wide range of variations of the approximate string-matching problem. The algorithm is based on a numeric scheme for exact string matching developed by Baeza-Yates and Gonnet. It can handle common query types, including regular expressions and different types of closeness measures, such as edit distance. The article also describes an improvement to the main algorithm using a partition approach, which can be more efficient when the number of errors is small compared to the pattern size. Additionally, the algorithm supports extensions like sets of characters, wild cards, unknown number of errors, and nonuniform costs. The authors provide experimental results showing the efficiency of their algorithm compared to other methods, and they describe agrep, a software package based on this algorithm that has been widely used since 1991. The article concludes by highlighting the practical applications and limitations of the algorithm, emphasizing its robustness and flexibility in handling approximate string matching.
Reach us at info@study.space