June 1975 | Alfred V. Aho and Margaret J. Corasick
This paper introduces an efficient algorithm for locating all occurrences of a finite set of keywords in a text string. The algorithm involves constructing a finite state pattern matching machine from the keywords and then using this machine to process the text string in a single pass. The construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords, while the number of state transitions made by the machine is independent of the number of keywords. The authors demonstrate that this algorithm significantly improves the speed of a library bibliographic search program, reducing the running time by a factor of 5 to 10. The paper also discusses the construction of the goto, failure, and output functions for the pattern matching machine and provides algorithms for these constructions. Additionally, it explores the elimination of failure transitions using a deterministic finite automaton, which can further optimize the algorithm. The authors conclude by highlighting the suitability of the pattern matching scheme for applications involving large numbers of keywords and provide an example of its successful implementation in a library bibliographic search program.This paper introduces an efficient algorithm for locating all occurrences of a finite set of keywords in a text string. The algorithm involves constructing a finite state pattern matching machine from the keywords and then using this machine to process the text string in a single pass. The construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords, while the number of state transitions made by the machine is independent of the number of keywords. The authors demonstrate that this algorithm significantly improves the speed of a library bibliographic search program, reducing the running time by a factor of 5 to 10. The paper also discusses the construction of the goto, failure, and output functions for the pattern matching machine and provides algorithms for these constructions. Additionally, it explores the elimination of failure transitions using a deterministic finite automaton, which can further optimize the algorithm. The authors conclude by highlighting the suitability of the pattern matching scheme for applications involving large numbers of keywords and provide an example of its successful implementation in a library bibliographic search program.