This paper presents a new algorithm for constructing auxiliary digital search trees to aid in exact-match substring searching. The algorithm, called Algorithm M, has the same asymptotic running time as previously published algorithms but is more space-efficient. It is designed to construct an index structure that allows substring searches to be completed in time linear in the length of the substring. The algorithm is particularly useful for dynamically changing databases, such as text editors.
Algorithm M constructs a digital search tree T, where paths represent suffixes of the string S, and terminal nodes correspond to positions within S. The algorithm processes suffixes of S from longest to shortest, inserting them into the tree. It uses constraints to ensure that the tree is a multiway Patricia tree, which contains at most n nonterminal nodes, where n is the length of S. The algorithm uses suffix links to efficiently find the locus of a string, reducing the time required for searches.
The algorithm is implemented using a hash table to represent the tree, which is very compact and efficient in terms of space. The algorithm also includes an incremental update mechanism, allowing the tree to be modified efficiently in response to changes in the string S. This is particularly useful for applications where the string is frequently updated, such as text editors.
The paper also discusses the time complexity of Algorithm M, showing that it runs in linear time relative to the length of the input string. The algorithm's space efficiency is demonstrated by comparing it to other algorithms, showing that it requires about 25% less data space. The paper concludes that Algorithm M is a significant improvement over previous algorithms for constructing suffix trees, offering both time and space efficiency.This paper presents a new algorithm for constructing auxiliary digital search trees to aid in exact-match substring searching. The algorithm, called Algorithm M, has the same asymptotic running time as previously published algorithms but is more space-efficient. It is designed to construct an index structure that allows substring searches to be completed in time linear in the length of the substring. The algorithm is particularly useful for dynamically changing databases, such as text editors.
Algorithm M constructs a digital search tree T, where paths represent suffixes of the string S, and terminal nodes correspond to positions within S. The algorithm processes suffixes of S from longest to shortest, inserting them into the tree. It uses constraints to ensure that the tree is a multiway Patricia tree, which contains at most n nonterminal nodes, where n is the length of S. The algorithm uses suffix links to efficiently find the locus of a string, reducing the time required for searches.
The algorithm is implemented using a hash table to represent the tree, which is very compact and efficient in terms of space. The algorithm also includes an incremental update mechanism, allowing the tree to be modified efficiently in response to changes in the string S. This is particularly useful for applications where the string is frequently updated, such as text editors.
The paper also discusses the time complexity of Algorithm M, showing that it runs in linear time relative to the length of the input string. The algorithm's space efficiency is demonstrated by comparing it to other algorithms, showing that it requires about 25% less data space. The paper concludes that Algorithm M is a significant improvement over previous algorithms for constructing suffix trees, offering both time and space efficiency.