[slides and audio] A fast string searching algorithm

The paper presents a fast string searching algorithm that efficiently locates the first occurrence of a character string, "pat," within another string, "string." The algorithm starts matching characters from the end of "pat" and uses statistical information to make large jumps through "string," reducing the number of characters inspected on average. For a random English pattern of length 5, the algorithm typically inspects only about one-quarter of the characters in "string" before finding a match. The algorithm is implemented to execute fewer than \(i + \text{patlen}\) machine instructions on average, making it faster than both the Knuth-Morris-Pratt (KMP) algorithm and a simple search algorithm. The performance is supported by empirical evidence and theoretical analysis, which show that the algorithm is sublinear in the number of references to "string" and instructions executed. The worst-case behavior is linear in \(i + \text{patlen}\), assuming sufficient array space. The paper also discusses implementation considerations, empirical results, and theoretical analysis, highlighting the algorithm's efficiency and limitations.The paper presents a fast string searching algorithm that efficiently locates the first occurrence of a character string, "pat," within another string, "string." The algorithm starts matching characters from the end of "pat" and uses statistical information to make large jumps through "string," reducing the number of characters inspected on average. For a random English pattern of length 5, the algorithm typically inspects only about one-quarter of the characters in "string" before finding a match. The algorithm is implemented to execute fewer than \(i + \text{patlen}\) machine instructions on average, making it faster than both the Knuth-Morris-Pratt (KMP) algorithm and a simple search algorithm. The performance is supported by empirical evidence and theoretical analysis, which show that the algorithm is sublinear in the number of references to "string" and instructions executed. The worst-case behavior is linear in \(i + \text{patlen}\), assuming sufficient array space. The paper also discusses implementation considerations, empirical results, and theoretical analysis, highlighting the algorithm's efficiency and limitations.

A Fast String Searching Algorithm

October 1977 | Robert S. Boyer, J Strother Moore