MAFFT version 5: improvement in accuracy of multiple sequence alignment

January 20, 2005 | Kazutaka Katoh, Kei-ichi Kuma, Hiroyuki Toh and Takashi Miyata
MAFFT version 5.3 improves the accuracy of multiple sequence alignment. The new options H-INS-i, F-INS-i, and G-INS-i (collectively [GHF]-INS-i) take a TCoffee-like approach: they incorporate all pairwise alignment information into the objective function. In benchmark tests they achieved higher accuracy than existing methods such as TCoffee and CLUSTAL W, and they can handle hundreds of sequences on a standard desktop computer.

Including close homologues (E-value below a threshold of 10^-5 to 10^-20) significantly improves accuracy, especially for low-similarity alignments, and the effect is more pronounced for the new options. Accuracy increased with the number of homologues, and the gain was largest for MAFFT. A Ruby script, mafftE.rb, automatically aligns input sequences with their homologues from SwissProt using NCBI-BLAST.

The new parameter set, optimized for the TWIf+0 dataset, improved accuracy by about 5 percentage points. [GHF]-INS-i was slightly more accurate than FFT-NS-i, although FFT-NS-i remains accurate for less demanding alignments. For alignments with many sequences, the new strategies outperformed the other methods. The position-specific gap penalty was also found to be important for accurate alignment.

The results suggest that including many homologues is crucial for accurate sequence alignment. Further improvements in accuracy and speed are needed, including the integration of structural information and more efficient homologue selection.
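To make the idea of "incorporating pairwise alignment information into the objective function" concrete, here is a minimal Python sketch of a TCoffee-like consistency score. It is a toy illustration, not MAFFT's actual objective function: the function names (aligned_pairs, build_library, consistency_score), the uniform weights, and the three short sequences are all assumptions made for this example. The score simply rewards a multiple alignment for every residue pair it places in the same column that was also aligned in a pairwise alignment.

```python
from collections import defaultdict
from itertools import combinations


def aligned_pairs(row_a, row_b):
    """Yield (i, j) residue-index pairs that share a column in two gapped rows."""
    i = j = 0
    for ca, cb in zip(row_a, row_b):
        if ca != "-" and cb != "-":
            yield i, j
        if ca != "-":
            i += 1
        if cb != "-":
            j += 1


def build_library(pairwise):
    """Collect weighted residue pairs from pairwise alignments.

    `pairwise` maps (name_a, name_b) to (row_a, row_b, weight).
    Keys are normalized so that name_a sorts before name_b.
    """
    library = defaultdict(float)
    for (a, b), (row_a, row_b, weight) in pairwise.items():
        if a > b:
            a, b, row_a, row_b = b, a, row_b, row_a
        for i, j in aligned_pairs(row_a, row_b):
            library[(a, i, b, j)] += weight
    return library


def consistency_score(msa, library):
    """Sum library weights over residue pairs the MSA puts in the same column."""
    total = 0.0
    for a, b in combinations(sorted(msa), 2):
        for i, j in aligned_pairs(msa[a], msa[b]):
            total += library.get((a, i, b, j), 0.0)
    return total


# Hypothetical pairwise alignments of s1=ACGT, s2=AGT, s3=ACT (weight 1.0 each).
pairwise = {
    ("s1", "s2"): ("ACGT", "A-GT", 1.0),
    ("s1", "s3"): ("ACGT", "AC-T", 1.0),
    ("s2", "s3"): ("AG-T", "A-CT", 1.0),
}
library = build_library(pairwise)

# Two candidate multiple alignments of the same three sequences.
msa_good = {"s1": "ACGT", "s2": "A-GT", "s3": "AC-T"}
msa_alt = {"s1": "ACGT", "s2": "AG-T", "s3": "AC-T"}

print(consistency_score(msa_good, library))  # 8.0
print(consistency_score(msa_alt, library))   # 7.0
```

The first candidate agrees with all eight residue pairs in the pairwise library, while the second misplaces the G of s2 and loses one supporting pair. A real aligner would search over alignments to maximize a score of this kind, typically combined with other terms such as gap penalties.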
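The homologue-selection step described above (keep database hits whose E-value falls below a threshold, then align them together with the input) can be sketched as follows. This is not the mafftE.rb script, which is written in Ruby and whose internals are not described here. It is a small Python illustration that assumes BLAST results in the standard 12-column tab-separated format, where the second column is the subject ID and the eleventh is the E-value. The function name select_homologues and the sample identifiers are hypothetical.

```python
def select_homologues(blast_tabular, max_evalue=1e-5, exclude=()):
    """Return subject IDs whose best hit has E-value < max_evalue.

    Assumes 12-column tab-separated BLAST output:
    query, subject, %identity, length, mismatches, gap opens,
    q.start, q.end, s.start, s.end, E-value, bit score.
    Results are ordered from the lowest (best) E-value upward.
    """
    best = {}
    for line in blast_tabular.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a well-formed hit line
        subject, evalue = fields[1], float(fields[10])
        if subject in exclude:
            continue  # e.g. the query's own database entry
        if evalue < max_evalue and evalue < best.get(subject, float("inf")):
            best[subject] = evalue
    return sorted(best, key=best.get)


# Hypothetical hits for one query; IDs and numbers are made up for illustration.
rows = [
    ["q1", "sp|P1", "98.0", "100", "2", "0", "1", "100", "1", "100", "1e-50", "200"],
    ["q1", "sp|P2", "40.0", "90", "50", "4", "5", "94", "3", "92", "1e-08", "60"],
    ["q1", "sp|P3", "30.0", "50", "35", "0", "1", "50", "1", "50", "0.002", "35"],
]
sample = "\n".join("\t".join(r) for r in rows)

print(select_homologues(sample, max_evalue=1e-5))   # ['sp|P1', 'sp|P2']
print(select_homologues(sample, max_evalue=1e-20))  # ['sp|P1']
```

Tightening the threshold from 10^-5 to 10^-20 keeps only the closest homologue in this toy input, which mirrors the trade-off between the number of homologues added and how close they are to the query.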