Ken Thompson developed a regular expression search algorithm and implemented it as a compiler for the IBM 7094 computer. The algorithm efficiently searches for embedded character strings in text by examining each character sequentially and building new lists of possible next characters. This approach avoids backtracking, making it faster than previous search methods. The compiler translates regular expressions into IBM 7094 code, which then processes the text to find matches.
The compiler has three stages: syntax checking, converting to reverse Polish notation, and generating object code. The third stage uses a stack to process the regular expression, with binary and unary operators combining stack entries. The compiled code includes routines for matching characters (NNODE) and branching search paths (CNODE).
The algorithm is used in a text editor and other applications, such as symbol table searches in assemblers. The runtime routines manage two lists (CLIST and NLIST) to track possible matches and search paths. The code handles special cases, such as the null regular expression, by modifying the compiled code to avoid infinite loops.
The implementation is efficient and can be extended to recognize more complex patterns, including special characters and operators. The algorithm is fast and effective, with minimal overhead for redundant searches. The paper references related works and provides a detailed description of the compiler and runtime routines.Ken Thompson developed a regular expression search algorithm and implemented it as a compiler for the IBM 7094 computer. The algorithm efficiently searches for embedded character strings in text by examining each character sequentially and building new lists of possible next characters. This approach avoids backtracking, making it faster than previous search methods. The compiler translates regular expressions into IBM 7094 code, which then processes the text to find matches.
The compiler has three stages: syntax checking, converting to reverse Polish notation, and generating object code. The third stage uses a stack to process the regular expression, with binary and unary operators combining stack entries. The compiled code includes routines for matching characters (NNODE) and branching search paths (CNODE).
The algorithm is used in a text editor and other applications, such as symbol table searches in assemblers. The runtime routines manage two lists (CLIST and NLIST) to track possible matches and search paths. The code handles special cases, such as the null regular expression, by modifying the compiled code to avoid infinite loops.
The implementation is efficient and can be extended to recognize more complex patterns, including special characters and operators. The algorithm is fast and effective, with minimal overhead for redundant searches. The paper references related works and provides a detailed description of the compiler and runtime routines.