MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability

January 16, 2013 | Kazutaka Katoh and Daron M. Standley
MAFFT version 7 is a major update of the multiple sequence alignment (MSA) program, offering improved performance and usability. New features include adding unaligned sequences to an existing alignment, adjusting the direction of nucleotide sequences, constrained alignment, and parallel processing. The paper explains how these features work and gives examples of misalignments along with strategies to avoid them. MAFFT is a similarity-based MSA method, but it incorporates evolutionary information. It assumes that all input sequences are homologous and preserves the order of residues within each sequence, though the sequences themselves can be reordered based on similarity. The program offers several alignment strategies, including progressive methods, iterative refinement, and structural alignment methods for RNAs. MAFFT version 7 also includes a subprogram, mafft-profile, for aligning two existing alignments; however, it is not suitable for adding new sequences to an existing alignment. The --addprofile option is a safer alternative and assumes a phylogenetic relationship between the alignments. The --add option adds unaligned sequences to an existing MSA, assuming they derive from a branch in the tree of the existing alignment. The --addfragments option adds short fragments without considering sequence relationships. Test cases such as fungal internal transcribed spacers (ITS) and bacterial SSU rRNA demonstrate how these options improve alignment accuracy and efficiency, and highlight the importance of selecting an alignment strategy appropriate to the problem at hand. The --addfragments option is particularly useful for large datasets with many fragmentary sequences. MAFFT version 7 also supports parallel processing through the --thread option, which improves performance on multi-core systems, and provides enhanced input/output options, such as adjusting sequence direction and preserving case.
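The difference between the three "add" options described above is easiest to see as command lines. The following Python sketch is an illustration only, not code from the paper: the helper name `mafft_add_command` and the file names are hypothetical, and the function merely assembles an argument list without running MAFFT. It assumes the usual MAFFT calling convention, in which the option is followed by the file of new material and the existing alignment is given as the last argument.

```python
from typing import List

# The three "add"-style options named in the summary above.
ADD_MODES = ("--add", "--addfragments", "--addprofile")


def mafft_add_command(mode: str, new_input: str, existing_alignment: str) -> List[str]:
    """Build (but do not run) a MAFFT command that extends an existing alignment.

    mode                one of --add, --addfragments, --addprofile
    new_input           file with the material to add (hypothetical name):
                        unaligned sequences, short fragments, or a second alignment
    existing_alignment  file holding the existing MSA (hypothetical name)
    """
    if mode not in ADD_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {ADD_MODES}")
    # Assumed layout: option, then the new input, then the existing alignment.
    # MAFFT writes the resulting alignment to standard output.
    return ["mafft", mode, new_input, existing_alignment]


def describe(mode: str) -> str:
    """Return a one-line reminder of what each mode assumes, per the summary."""
    notes = {
        "--add": "new full-length sequences that branch from the existing tree",
        "--addfragments": "short fragments, sequence relationships not considered",
        "--addprofile": "a second alignment phylogenetically related to the first",
    }
    return notes[mode]
```

A call such as `mafft_add_command("--addfragments", "reads.fa", "ssu_rrna_msa.fa")` yields a list that could be handed to `subprocess.run` with standard output redirected to a file, provided MAFFT is installed and the flags match the installed version.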
The program can estimate phylogenetic positions of new sequences, though more accurate methods may be needed for complex cases. The paper also discusses the use of structural information to improve MSA, including the integration of structural alignment programs like ASH with MAFFT. This approach helps align sequences with structural information, improving accuracy. The paper concludes with a discussion of future improvements and the importance of selecting appropriate parameters and strategies for different alignment tasks.
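The parallel-processing and input/output options mentioned in this summary can be combined in a single invocation. The sketch below is a hedged illustration under stated assumptions: the helper `mafft_io_command` and the file name are hypothetical, and the flag spellings `--adjustdirection` and `--preservecase` are not given in the summary itself; they are taken from MAFFT's command-line options and should be checked against the installed version. As before, the code only builds an argument list.

```python
from typing import List, Optional


def mafft_io_command(
    input_fasta: str,
    threads: Optional[int] = None,
    adjust_direction: bool = False,
    preserve_case: bool = False,
) -> List[str]:
    """Build (but do not run) a MAFFT command with threading and I/O options.

    input_fasta       unaligned input sequences (hypothetical file name)
    threads           number of threads passed to --thread; omitted if None
    adjust_direction  add --adjustdirection (assumed flag spelling) so that
                      nucleotide sequences are oriented consistently
    preserve_case     add --preservecase (assumed flag spelling) to keep the
                      upper/lower case of the input letters
    """
    cmd = ["mafft"]
    if threads is not None:
        cmd += ["--thread", str(threads)]
    if adjust_direction:
        cmd.append("--adjustdirection")
    if preserve_case:
        cmd.append("--preservecase")
    # The input file comes last; the alignment is written to standard output.
    cmd.append(input_fasta)
    return cmd
```

Keeping command construction separate from execution makes it possible to inspect or log the exact invocation before any alignment is run, which helps when comparing strategies on the same input.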