The article discusses recent developments in the MAFFT multiple sequence alignment (MSA) program, focusing on its accuracy and scalability. MAFFT was initially developed in 2002 to rapidly construct reasonable MSAs for large datasets. The authors highlight the challenges in both scalability and accuracy, particularly with the increasing amounts of sequence data from large-scale sequencing projects and the discovery of functional noncoding RNAs (ncRNAs). To address these issues, MAFFT was updated to Version 6 in 2007, incorporating two new techniques: the PartTree algorithm for improved scalability and a Four-way consistency objective function for enhanced accuracy in ncRNA alignment.
The PartTree algorithm is a scalable guide-tree construction method that reduces the time complexity of tree building from $O(N^2)$ to $O(N \log N)$, where $N$ is the number of sequences, making it suitable for aligning large datasets of up to 60,000 sequences. The Four-way consistency objective function, used in the Q-INS-i and X-INS-i options, incorporates secondary structure information to improve the accuracy of ncRNA alignments.
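The source of the $O(N \log N)$ behaviour can be illustrated with a toy recursive partitioning in Python. This is a minimal sketch of the general idea only, not MAFFT's PartTree implementation: the function names (`kmer_distance`, `partition_tree`, `leaves`), the k-mer Jaccard distance, the random choice of reference sequences, and the default of four references per level are all assumptions made for illustration. Each sequence is compared only with a handful of references at each level, so when the groups stay reasonably balanced the number of distance evaluations grows roughly as $N \log N$ rather than $N(N-1)/2$; badly unbalanced splits can still degrade toward quadratic cost.

```python
import random


def kmer_distance(a, b, k=3):
    """Toy alignment-free distance: 1 - Jaccard similarity of k-mer sets."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def partition_tree(seqs, n_refs=4, seed=0):
    """Recursively group sequences around a few reference sequences.

    Returns (tree, n_distances). The tree is a nested list of sequence
    indices; n_distances counts the distance evaluations performed.
    Hypothetical helper for illustration, not part of MAFFT.
    """
    if n_refs < 2:
        raise ValueError("n_refs must be at least 2")
    rng = random.Random(seed)
    n_distances = 0

    def build(ids):
        nonlocal n_distances
        if len(ids) <= n_refs:
            return list(ids)  # small group: stop splitting
        refs = rng.sample(ids, n_refs)
        ref_set = set(refs)
        # Each reference anchors its own group, so every group is
        # strictly smaller than its parent and the recursion terminates.
        groups = {r: [r] for r in refs}
        for i in ids:
            if i in ref_set:
                continue
            dists = [kmer_distance(seqs[i], seqs[r]) for r in refs]
            n_distances += n_refs
            nearest = refs[dists.index(min(dists))]
            groups[nearest].append(i)
        return [build(groups[r]) for r in refs]

    tree = build(list(range(len(seqs))))
    return tree, n_distances


def leaves(tree):
    """Yield every sequence index stored in a nested tree."""
    for node in tree:
        if isinstance(node, list):
            yield from leaves(node)
        else:
            yield node


if __name__ == "__main__":
    rng = random.Random(1)
    seqs = ["".join(rng.choice("ACGU") for _ in range(60)) for _ in range(500)]
    tree, n = partition_tree(seqs)
    all_pairs = len(seqs) * (len(seqs) - 1) // 2
    print(n, "distance evaluations, versus", all_pairs, "for all pairs")
```

Running the script prints the number of distance evaluations used by the partitioning next to the all-pairs count for the same random input, which makes the difference in scaling visible as the number of sequences is increased.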
The article also reviews other features of MAFFT, such as the iterative refinement method with the WSP score and the consistency-based method, and discusses future directions, including the consideration of RNA and protein structures, scalability improvements, and the integration of biological knowledge into MSA construction. The availability of MAFFT and its web server for alignment and phylogenetic inference is mentioned, along with benchmark results and comparisons with other alignment programs.
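To make the WSP (weighted sum-of-pairs) score mentioned above concrete, the following Python sketch scores an existing alignment as a weighted sum of the pairwise alignments it induces. It is a simplified illustration under stated assumptions: the function names (`pair_score`, `wsp_score`), the match, mismatch and gap values, the per-column gap cost, and the uniform default weights are all hypothetical choices. MAFFT's actual scoring (substitution matrices, its gap model, and its weighting scheme) is not reproduced here.

```python
from itertools import combinations


def pair_score(row_a, row_b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Score the pairwise alignment induced by two rows of an MSA.

    Scoring values are illustrative assumptions, not MAFFT's parameters.
    """
    score = 0.0
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            continue  # column is empty for this pair, so it is ignored
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


def wsp_score(msa, weights=None):
    """Weighted sum-of-pairs: sum over i < j of w[i][j] * pair_score(i, j).

    With weights=None every pair gets weight 1.0 (plain sum-of-pairs).
    """
    if len({len(row) for row in msa}) > 1:
        raise ValueError("all alignment rows must have the same length")
    total = 0.0
    for i, j in combinations(range(len(msa)), 2):
        w = 1.0 if weights is None else weights[i][j]
        total += w * pair_score(msa[i], msa[j])
    return total


if __name__ == "__main__":
    msa = ["AC-GU", "ACAGU", "AC-GA"]
    print(wsp_score(msa))
```

An iterative refinement procedure of the kind the article describes would repeatedly modify the alignment and keep a change only when a score of this type improves; the sketch covers the scoring step alone.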