The supplementary information provides a detailed guide on the construction of guide trees and profile Hidden Markov Models (HMMs) for sequence alignment using Clustal Omega. Guide tree construction employs the mBed scheme, which selects a small number of sequences as seeds and represents every sequence as a vector of its distances to those seeds, avoiding the quadratic cost of a full pairwise distance matrix. The mBed vectors are then clustered with bisecting K-Means into subclusters, so that full distance computation is only needed within each subcluster. For distance matrix construction, Clustal Omega uses the k-tuple distance measure and Kimura-corrected pairwise aligned identities.
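The embedding step described above can be sketched in a few lines. This is a minimal illustration, not Clustal Omega's implementation: `ktuple_distance` is a simplified set-based stand-in for the k-tuple measure, `mbed_vectors` is a hypothetical helper name, seeds are picked at random, and the default seed count of about (log2 N)^2 is an assumption based on the mBed description. The bisecting K-Means step is omitted.

```python
import math
import random


def ktuple_distance(a: str, b: str, k: int = 2) -> float:
    """Simplified alignment-free distance: 1 minus the fraction of shared k-tuples.

    Stand-in for the k-tuple measure; not the exact score Clustal Omega computes.
    """
    tuples_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    tuples_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not tuples_a or not tuples_b:
        return 1.0  # a sequence shorter than k shares nothing
    shared = len(tuples_a & tuples_b)
    return 1.0 - shared / min(len(tuples_a), len(tuples_b))


def mbed_vectors(seqs, n_seeds=None, rng=None):
    """Embed each sequence as its vector of distances to a few seed sequences.

    Returns (vectors, seed_indices). Only len(seqs) * n_seeds distances are
    evaluated, instead of all len(seqs)**2 pairs.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    n = len(seqs)
    if n_seeds is None:
        # Assumed default: on the order of (log2 N)^2 seeds, capped at N.
        n_seeds = min(n, max(1, round(math.log2(n) ** 2))) if n > 1 else 1
    seed_idx = sorted(rng.sample(range(n), n_seeds))
    seeds = [seqs[i] for i in seed_idx]
    vectors = [[ktuple_distance(s, t) for t in seeds] for s in seqs]
    return vectors, seed_idx
```

For very small inputs the default seed count equals the number of sequences, so the saving only appears as N grows. The resulting vectors are what a clustering step such as bisecting K-Means would operate on.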
For sequence/profile alignment, Clustal Omega uses an adapted version of the HHalign package, converting sequences/profiles into HMMs and aligning them using either the Maximum Accuracy (MAC) algorithm or the Viterbi algorithm, depending on memory constraints. External Profile Alignment (EPA) involves aligning sequences/profiles to an external HMM, transferring pseudo-count information, and re-aligning the sequences using the MAC or Viterbi algorithm.
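The memory-dependent choice between the two algorithms can be pictured as a simple budget check. The sketch below is purely illustrative: `choose_aligner`, its cost model (a fixed number of bytes per dynamic-programming cell, times a fixed number of matrices), and the default values are all assumptions, not taken from HHalign or Clustal Omega.

```python
def choose_aligner(len_a: int, len_b: int, mac_ram_mb: float,
                   bytes_per_cell: int = 8, matrices: int = 2) -> str:
    """Return "MAC" if the estimated DP memory fits the budget, else "Viterbi".

    Hypothetical cost model: `matrices` full len_a x len_b tables of
    `bytes_per_cell` bytes each. The real memory footprint may differ.
    """
    est_mb = len_a * len_b * bytes_per_cell * matrices / (1024 ** 2)
    return "MAC" if est_mb <= mac_ram_mb else "Viterbi"
```

Under these assumed numbers, two HMMs of length 100 fit easily in a small budget, while two of length 50,000 would exceed a 1 GB budget and fall back to Viterbi.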
The iteration process in Clustal Omega involves re-aligning sequences and optionally re-calculating the guide tree. The two steps can be decoupled, allowing independent control over the number of HMM iterations and guide tree iterations. The effect of HMM iteration is illustrated with a benchmark example, which shows an improvement in alignment quality.
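As an illustration of decoupled iteration control, the sketch below builds a `clustalo` command line in Python. The helper `clustalo_command` is hypothetical; the flag names (`--iter`, `--max-guidetree-iterations`, `--max-hmm-iterations`) are the ones listed in the Clustal Omega command-line help, but they should be checked against `clustalo --help` for the installed version. The function only assembles the argument list and does not run anything.

```python
def clustalo_command(infile, outfile, iterations=0,
                     max_guidetree_iterations=None,
                     max_hmm_iterations=None):
    """Assemble a clustalo argument list with optional iteration limits."""
    cmd = ["clustalo", "-i", infile, "-o", outfile]
    if iterations:
        # Number of combined guide tree / HMM iterations.
        cmd.append(f"--iter={iterations}")
    if max_guidetree_iterations is not None:
        # Cap on guide tree recalculations, independent of HMM iterations.
        cmd.append(f"--max-guidetree-iterations={max_guidetree_iterations}")
    if max_hmm_iterations is not None:
        # Cap on HMM re-alignment iterations.
        cmd.append(f"--max-hmm-iterations={max_hmm_iterations}")
    return cmd
```

The returned list could be passed to `subprocess.run` on a system where Clustal Omega is installed; for example, three iterations with the guide tree recalculated at most once would use `iterations=3, max_guidetree_iterations=1`.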
Clustal Omega's performance is benchmarked against various alignment engines using the BALiBASE 3, PREFAB 4.0, and HomFam datasets. The results show that Clustal Omega generally outperforms other programs in terms of accuracy and scalability, especially for large datasets. The software is licensed under the GNU Lesser General Public License and is available for multiple platforms.