Chuong B. Do, Mahathi S. P. Mahabhashyam, Michael Brudno, and Serafim Batzoglou
PROBCONS is a probabilistic consistency-based method for multiple sequence alignment of protein families. It introduces a novel scoring function for multiple sequence comparisons and provides a practical tool for progressive protein multiple sequence alignment. The method achieves statistically significant improvements over other leading methods on benchmark datasets such as BAliBASE, SABmark, and PREFAB while maintaining practical speed. PROBCONS is publicly available as a web resource and its source code is available under the GNU Public License.
The method uses a pair hidden Markov model (HMM) to specify the probability distribution over all alignments between a pair of sequences. It computes posterior-probability matrices, expected accuracies, and applies a probabilistic consistency transformation to incorporate multiple sequence conservation information during pairwise alignment. It then constructs a guide tree based on expected accuracy and performs progressive alignment. Post-processing steps such as iterative refinement are also used to improve alignment accuracy.
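The probabilistic consistency transformation mentioned above can be illustrated with a small sketch. It assumes the pairwise posterior matrices have already been computed (for example, by the forward-backward algorithm on the pair HMM) and re-estimates each pairwise posterior by averaging over every sequence z in the set, with the posterior of a sequence against itself taken to be the identity matrix. The function names, the dense list-of-lists representation, and the dictionary layout are illustrative assumptions, not the PROBCONS source code; a practical implementation would likely use sparse matrices.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    k, m = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(m)]
            for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def consistency_transform(
    post: Dict[Tuple[int, int], Matrix], lengths: List[int]
) -> Dict[Tuple[int, int], Matrix]:
    """Apply one round of the transformation to every pair (x, y) with x < y.

    post[(x, y)][i][j] is the posterior probability that residue i of
    sequence x is aligned to residue j of sequence y (hypothetical layout).
    """
    n_seqs = len(lengths)

    def get(x: int, y: int) -> Matrix:
        if x == y:
            return identity(lengths[x])  # a sequence aligns to itself exactly
        if x < y:
            return post[(x, y)]
        return transpose(post[(y, x)])

    updated = {}
    for x in range(n_seqs):
        for y in range(x + 1, n_seqs):
            total = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in range(n_seqs):
                # Sum over residues k of z: P(x_i ~ z_k) * P(z_k ~ y_j)
                prod = matmul(get(x, z), get(z, y))
                for i in range(lengths[x]):
                    for j in range(lengths[y]):
                        total[i][j] += prod[i][j]
            updated[(x, y)] = [[v / n_seqs for v in row] for row in total]
    return updated
```

In this sketch, a pair that is weakly supported by its own pairwise posterior gains weight when a third sequence aligns confidently to both residues, which is how information from the other sequences enters the pairwise comparison.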
PROBCONS was tested on three benchmarking suites: BAliBASE 2.01, PREFAB 3.0, and SABmark 1.63. It showed clear statistically significant improvements in accuracy over other alignment tools in every benchmark test while maintaining practical running times. All parameters for the program were derived through unsupervised training methods without manual adjustments.
On the BAliBASE benchmark alignment database, PROBCONS achieved the highest sum-of-pairs (SP) and column (CS) scores in every reference set. On the PREFAB database, PROBCONS and PROBCONS-EXT held a strong lead in SP score. On the SABmark database, PROBCONS achieved significantly higher fD and fM scores overall.
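For readers unfamiliar with these measures, the following is a minimal sketch of how SP and CS scores are commonly computed against a reference alignment. It assumes both alignments are lists of equal-length gapped strings over the same sequences in the same order, with "-" as the gap character. The function names are hypothetical. Benchmark suites typically restrict scoring to annotated core regions, which this sketch does not do.

```python
from itertools import combinations
from typing import List, Optional, Set, Tuple

Column = Tuple[Optional[int], ...]


def columns(alignment: List[str]) -> List[Column]:
    """Convert gapped rows into columns of residue indices (None for a gap)."""
    counters = [0] * len(alignment)
    cols = []
    for c in range(len(alignment[0])):
        col = []
        for s, row in enumerate(alignment):
            if row[c] == "-":
                col.append(None)
            else:
                col.append(counters[s])
                counters[s] += 1
        cols.append(tuple(col))
    return cols


def aligned_pairs(alignment: List[str]) -> Set[Tuple[int, int, int, int]]:
    """All (seq_s, residue_i, seq_t, residue_j) pairs placed in the same column."""
    pairs = set()
    for col in columns(alignment):
        for s, t in combinations(range(len(col)), 2):
            if col[s] is not None and col[t] is not None:
                pairs.add((s, col[s], t, col[t]))
    return pairs


def sp_score(reference: List[str], test: List[str]) -> float:
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(test)) / len(ref)


def cs_score(reference: List[str], test: List[str]) -> float:
    """Fraction of reference columns that the test alignment reproduces exactly."""
    ref_cols = columns(reference)
    test_cols = set(columns(test))
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)
```

For example, with the reference `["AC-G", "ACTG"]` and the test alignment `["ACG-", "ACTG"]`, two of the three reference residue pairs and two of the four reference columns are recovered.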
The comparison of PROBCONS variants showed that the use of maximum expected accuracy as an objective function and the application of the probabilistic consistency transformation were the two main features contributing to its accuracy. Of the two, the probabilistic consistency transformation provided the larger accuracy improvement. The methodology employed in developing the PROBCONS algorithm is straightforward and widely applicable, and the results indicate that posterior-based methods are a powerful general approach for improving alignment accuracy.
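The maximum expected accuracy objective can be sketched as a simple dynamic program over a posterior matrix: instead of seeking the single most probable alignment, it seeks the alignment whose matched residue pairs have the largest total posterior probability. The sketch below assumes a dense matrix `post[i][j]` of match posteriors and normalizes the result by the length of the shorter sequence. The function name and return format are illustrative assumptions rather than the PROBCONS implementation.

```python
from typing import List, Tuple


def mea_alignment(post: List[List[float]]) -> Tuple[List[Tuple[int, int]], float]:
    """Return the matched residue pairs and the expected accuracy estimate."""
    n, m = len(post), len(post[0])
    # score[i][j] = best total posterior for the first i and first j residues
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + post[i - 1][j - 1],  # match i-1 with j-1
                score[i - 1][j],                           # gap in second sequence
                score[i][j - 1],                           # gap in first sequence
            )

    # Trace back, preferring gaps on ties so that zero-posterior pairs
    # are never reported as matches.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j]:
            i -= 1
        elif score[i][j] == score[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()
    return pairs, score[n][m] / min(n, m)
```

No gap penalties appear in the recurrence because, in this formulation, only matched pairs contribute to the objective. The same expected accuracy values could then serve as the similarity measure for building a guide tree.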