Understanding Programming Techniques: Regular expression search algorithm

The chapter discusses a method for locating specific character strings within text using regular expressions, implemented as a compiler that translates the regular expression into IBM 7094 machine code. The compiler consists of three stages: a syntax sieve, a converter to reverse Polish form, and an object code producer. The algorithm is highly parallel. Rather than trying one alternative at a time and backtracking on failure, it advances all potential matches together, one text character at a time, which keeps storage and bookkeeping to a minimum. The compiled code uses transfer instructions to search for all possible sequel characters in the regular expression. The implementation includes runtime routines that maintain lists of possible matches and drive the search. The chapter also addresses issues such as handling the null regular expression and optimizing the size of the lists to avoid redundant searches. The method is applicable in various contexts, including text editors and assemblers.
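To make the pipeline concrete, here is a minimal Python sketch of the same idea, under stated assumptions. The names `to_postfix`, `compile_regex`, and `search` are hypothetical. The supported syntax (literal characters, `|` for union, `*` for closure, parentheses, implicit concatenation, with `.` reserved as the explicit concatenation operator in reverse Polish form) is an assumption of this sketch, not the paper's exact grammar. The error checks are only a rough stand-in for the syntax sieve, and instead of emitting IBM 7094 machine code the sketch builds a state graph and interprets it. The sets `current` and `nxt` play the role of the runtime lists of possible matches.

```python
from typing import Optional

def to_postfix(regex: str) -> str:
    """Make concatenation explicit with '.', then convert to reverse Polish."""
    prec = {'|': 1, '.': 2}
    tokens = []
    for i, c in enumerate(regex):
        tokens.append(c)
        # Concatenate unless c opens a group/alternative or the next token
        # is an operator or a closing parenthesis.
        if c not in '(|' and i + 1 < len(regex) and regex[i + 1] not in '|)*':
            tokens.append('.')
    out, ops = [], []
    for c in tokens:
        if c == '(':
            ops.append(c)
        elif c == ')':
            while ops and ops[-1] != '(':
                out.append(ops.pop())
            if not ops:
                raise ValueError("unbalanced ')'")
            ops.pop()  # discard the matching '('
        elif c == '*':
            out.append(c)  # postfix unary: applies to the operand already output
        elif c in prec:
            while ops and ops[-1] != '(' and prec[ops[-1]] >= prec[c]:
                out.append(ops.pop())
            ops.append(c)
        else:
            out.append(c)  # literal character
    while ops:
        op = ops.pop()
        if op == '(':
            raise ValueError("unbalanced '('")
        out.append(op)
    return ''.join(out)

def compile_regex(regex: str):
    """Build a state graph from the postfix form; returns (eps, edge, start, accept)."""
    eps, edge = [], []  # eps[s]: moves without input; edge[s]: (char, target) or None
    stack = []          # fragments as (start, end); an end has no outgoing moves yet

    def new_state() -> int:
        eps.append([])
        edge.append(None)
        return len(eps) - 1

    def pop():
        if not stack:
            raise ValueError("syntax error: operator without operand")
        return stack.pop()

    for c in to_postfix(regex):
        if c == '.':
            s2, e2 = pop(); s1, e1 = pop()
            eps[e1].append(s2)
            stack.append((s1, e2))
        elif c == '|':
            s2, e2 = pop(); s1, e1 = pop()
            s, e = new_state(), new_state()
            eps[s] += [s1, s2]
            eps[e1].append(e); eps[e2].append(e)
            stack.append((s, e))
        elif c == '*':
            s1, e1 = pop()
            s, e = new_state(), new_state()
            eps[s] += [s1, e]   # enter the loop or skip it
            eps[e1] += [s1, e]  # repeat or leave
            stack.append((s, e))
        else:
            s, e = new_state(), new_state()
            edge[s] = (c, e)
            stack.append((s, e))
    if not stack:  # null regular expression: matches the empty string
        s = new_state()
        return eps, edge, s, s
    if len(stack) != 1:
        raise ValueError("syntax error: missing operator")
    s, e = stack[0]
    return eps, edge, s, e

def closure(eps, states):
    """Follow moves that consume no input; a set keeps each state once per list."""
    seen, work = set(states), list(states)
    while work:
        for t in eps[work.pop()]:
            if t not in seen:
                seen.add(t)
                work.append(t)
    return seen

def search(regex: str, text: str) -> Optional[int]:
    """Return the index just past the earliest-ending match, or None."""
    eps, edge, start, accept = compile_regex(regex)
    current = set()
    for pos in range(len(text) + 1):
        current = closure(eps, current | {start})  # also begin a new match here
        if accept in current:
            return pos
        if pos == len(text):
            break
        nxt = set()  # every live state advances on the same character
        for s in current:
            if edge[s] is not None and edge[s][0] == text[pos]:
                nxt.add(edge[s][1])
        current = nxt
    return None
```

For example, `search("a(b|c)*d", "xxabcbdz")` returns 7, the index just past the match `abcbd`. Because all live states advance together on each character, the text is read once, left to right, with no backtracking, and using sets bounds each list by the number of states. This sketch rejects empty alternatives such as `a|`; only the wholly empty pattern is treated as the null regular expression.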

Regular Expression Search Algorithm

Volume 11 / Number 6 / June, 1968 | KEN THOMPSON