PyNAST: a flexible tool for aligning sequences to a template alignment

November 13, 2009 | J. Gregory Caporaso¹, Kyle Bittinger², Frederic D. Bushman², Todd Z. DeSantis³, Gary L. Andersen³ and Rob Knight¹,*
PyNAST is an open-source reimplementation of the NAST algorithm that offers improved portability and flexibility. It provides three interfaces: a Mac OS X GUI, a command-line interface, and an API. PyNAST lets users align sequences to arbitrary template alignments, not just 16S rRNA genes. It is built on the PyCogent toolkit and parameterizes the pairwise alignment step, so users can choose among BLAST, MUSCLE, MAFFT, ClustalW, or a PyCogent HMM aligner. Its minimal dependencies make it easy to install on single machines or clusters.

The NAST algorithm aligns a candidate sequence to a template alignment so that the output sequence has the same length as the template alignment. In PyNAST, users can supply any template alignment as a standard FASTA file. The algorithm uses BLAST to identify the template sequence most similar to the candidate, removes the gaps from that template sequence, and pairwise-aligns the candidate to it. The template's gaps are then reintroduced into the pairwise alignment, and where insertions in the candidate make the result longer than the template, gap characters are removed so that the output matches the template length. The algorithm's complexity is determined by the pairwise alignment step. In the reported benchmark, PyNAST ran slightly faster than the original NAST: 1.46 seconds per sequence compared to 1.55 seconds.

Because PyNAST is available as an open-source application with multiple interfaces, the NAST algorithm can be applied more broadly, to larger datasets and to new domains. Funding was provided by various grants. No conflicts of interest were declared.
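The gap-handling steps of the NAST algorithm summarized above can be illustrated with a minimal Python sketch. It is not PyNAST's code or API: the function names (`expand_to_template`, `absorb_insertions`, `nast_like_align`) are hypothetical, the pairwise alignment is assumed to have been computed already by whichever aligner is chosen, and the rule for choosing which gap to remove (the nearest gap character in the candidate row, leftmost on ties) is a simplifying assumption.

```python
GAP = "-"


def expand_to_template(template_aligned, pw_template, pw_candidate):
    """Map a pairwise alignment onto the columns of the gapped template.

    Returns a list of (character, is_insertion) pairs for the candidate row.
    Insertions are candidate bases aligned against a gap in the degapped template.
    """
    if len(pw_template) != len(pw_candidate):
        raise ValueError("pairwise alignment rows differ in length")
    if pw_template.replace(GAP, "") != template_aligned.replace(GAP, ""):
        raise ValueError("pairwise template does not match the gapped template")

    cols = []
    j = 0  # current column in template_aligned
    for t_char, c_char in zip(pw_template, pw_candidate):
        if t_char == GAP:
            # Candidate insertion: adds a column the template does not have.
            cols.append((c_char, True))
            continue
        # Reintroduce the template gap columns that precede this template base.
        while template_aligned[j] == GAP:
            cols.append((GAP, False))
            j += 1
        cols.append((c_char, False))
        j += 1
    # Reintroduce any trailing template gap columns.
    cols.extend((GAP, False) for _ in range(len(template_aligned) - j))
    return cols


def absorb_insertions(cols):
    """Remove one gap per insertion so the row returns to template length."""
    cols = list(cols)
    while True:
        pos = next((i for i, (_, ins) in enumerate(cols) if ins), None)
        if pos is None:
            return "".join(ch for ch, _ in cols)
        gaps = [i for i, (ch, _) in enumerate(cols) if ch == GAP]
        if not gaps:
            # Assumption for this sketch: with no gap left, drop the inserted base.
            del cols[pos]
            continue
        nearest = min(gaps, key=lambda i: abs(i - pos))
        del cols[nearest]
        if nearest < pos:
            pos -= 1  # the insertion shifted left by one column
        cols[pos] = (cols[pos][0], False)  # mark this insertion as absorbed


def nast_like_align(template_aligned, pw_template, pw_candidate):
    """Return the candidate aligned to the template, at the template's length."""
    cols = expand_to_template(template_aligned, pw_template, pw_candidate)
    aligned = absorb_insertions(cols)
    assert len(aligned) == len(template_aligned)
    return aligned


# The candidate carries one extra base (T) relative to the template sequence.
print(nast_like_align("AC--GT-A", "AC-GTA", "ACTGTA"))  # ACT-GT-A
```

In the example, reintroducing the template's gaps yields a nine-column row (`ACT--GT-A`); removing the gap nearest the inserted base restores the eight-column template length. Note that removing a gap that is not adjacent to the insertion shifts the bases in between by one column, which is the price of keeping the output length fixed.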