PatMaN: rapid alignment of short sequences to large databases

May 3, 2008 | Kay Prüfer, Udo Stenzel, Michael Dannemann, Richard E. Green, Michael Lachmann and Janet Kelso
The paper introduces PatMaN, a tool for rapidly aligning short nucleotide sequences to large databases while allowing a predefined number of gaps and mismatches. The program runs a non-deterministic automaton matching algorithm on a keyword tree built from the search strings, and it supports queries both with and without ambiguity codes. Search time is short for perfect matches but increases exponentially with the number of edits allowed. PatMaN is implemented in C++ and is available under the GNU General Public License. The authors demonstrate its functionality by aligning Affymetrix HGU95-A microarray probes to the chimpanzee genome, finding 15.9 million hits. The tool is particularly useful for mapping short sequences with limited edit distances and has potential applications in next-generation resequencing technology.
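
To make the keyword-tree idea concrete, here is a minimal Python sketch. It is not PatMaN's code: the function names `build_trie` and `search` are hypothetical, and the sketch is simplified to mismatches only, leaving out gaps and ambiguity codes. It builds one tree from all query strings, then scans the database text once while tracking a set of partial matches, each of which records how many mismatches it has used so far.

```python
def build_trie(patterns):
    """Build a keyword tree (trie) over all query strings."""
    root = {"children": {}, "patterns": []}
    for p in patterns:
        node = root
        for ch in p:
            node = node["children"].setdefault(
                ch, {"children": {}, "patterns": []}
            )
        node["patterns"].append(p)  # a query ends at this node
    return root


def search(text, patterns, max_mismatches):
    """Return sorted (pattern, start, mismatches) hits in text.

    Assumption: only substitutions are allowed; gaps and ambiguity
    codes are left out to keep the sketch short.
    """
    root = build_trie(patterns)
    hits = []
    active = []  # partial matches: (trie node, mismatches used, start)
    for i, ch in enumerate(text):
        active.append((root, 0, i))  # a new match may start here
        next_active = []
        for node, used, start in active:
            # Non-deterministic step: follow every outgoing edge,
            # charging one mismatch when the edge label differs.
            for base, child in node["children"].items():
                cost = used + (0 if base == ch else 1)
                if cost > max_mismatches:
                    continue  # prune this branch
                for p in child["patterns"]:
                    hits.append((p, start, cost))
                if child["children"]:
                    next_active.append((child, cost, start))
        active = next_active
    return sorted(hits)


if __name__ == "__main__":
    for hit in search("ACGTACGA", ["ACGT", "CGA"], 1):
        print(hit)
```

With `max_mismatches=0`, each partial match can follow at most one edge per text character. Each additional allowed mismatch lets partial matches branch into more edges, which gives an intuition for why the summary describes search time as growing quickly with the number of edits.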