Multiple Protein Structure Alignment at Scale with FoldMason

August 3, 2024 | Cameron L.M. Gilchrist, Milot Mirdita, and Martin Steinegger
FoldMason is a progressive multiple structural alignment (MSTA) method that uses the structural alphabet from Foldseek to align hundreds of thousands of protein structures. It represents each protein structure as a sequence over the 3Di+AA alphabet, which allows it to use fast string comparison algorithms. FoldMason first performs all-vs.-all comparisons with a striped, accelerated ungapped alignment, then generates a minimum spanning tree from those comparisons to guide MSTA construction. An iterative refinement step then adjusts the alignment to maximize its LDDT score. FoldMason also computes confidence scores and offers interactive visualizations.
FoldMason's MSTAs are validated on benchmark datasets, where they reach accuracy comparable to state-of-the-art structure-based aligners while running significantly faster, which makes the method suitable for large datasets. It is particularly effective at aligning flexible protein structures, and it supports structural phylogenetics, including phylogenetic analysis of proteins beyond the twilight zone of sequence similarity. FoldMason is free open-source software available at foldmason.foldseek.com and search.foldseek.com/foldmason, and it includes a webserver for interactive visualization. Its scalability makes it well suited to structural biology and phylogenetics in the post-AlphaFold era.
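To make the guide-tree step concrete, here is a minimal sketch of how a spanning tree over all-vs.-all scores can yield a merge order for progressive alignment. It is not FoldMason's implementation: the function name `spanning_tree_merge_order` and the toy score matrix are hypothetical, the scores stand in for the ungapped alignment scores, and the tree is built with Kruskal's algorithm on edges sorted by descending similarity (equivalent to a minimum spanning tree over distances defined as negative similarity).

```python
from typing import List, Tuple


def spanning_tree_merge_order(scores: List[List[float]]) -> List[Tuple[int, int]]:
    """Return spanning-tree edges (i, j) in the order Kruskal's algorithm accepts them.

    scores[i][j] is a hypothetical pairwise similarity between structures i and j
    (higher means more similar). Sorting by descending similarity is equivalent
    to building a minimum spanning tree over distances defined as -similarity.
    """
    n = len(scores)
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All unordered pairs, most similar first; ties broken by index for determinism.
    edges = sorted(
        ((scores[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda e: (-e[0], e[1], e[2]),
    )

    order: List[Tuple[int, int]] = []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # The edge joins two separate groups, so it belongs to the tree.
            parent[root_i] = root_j
            order.append((i, j))
    return order


# Toy symmetric similarity matrix for four structures (made-up values).
toy_scores = [
    [0.0, 9.0, 2.0, 1.0],
    [9.0, 0.0, 3.0, 1.5],
    [2.0, 3.0, 0.0, 7.0],
    [1.0, 1.5, 7.0, 0.0],
]
merge_order = spanning_tree_merge_order(toy_scores)
print(merge_order)
```

In this toy case the two most similar pairs, (0, 1) and (2, 3), are joined first, and the edge (1, 2) then links the two groups. A progressive aligner would align structures or sub-alignments in that order.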
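Since refinement is described as maximizing LDDT, a small worked example of the score may help. The sketch below is a simplified, Cα-only, pooled variant for two structures whose residues are already paired one-to-one. It uses the standard LDDT inclusion radius (15 Å) and tolerance thresholds (0.5, 1, 2, and 4 Å). The function name `ca_lddt` and the toy coordinates are hypothetical, and how FoldMason aggregates LDDT over alignment columns and structure pairs is not described here, so that part is left out.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_lddt(
    reference: Sequence[Coord],
    model: Sequence[Coord],
    radius: float = 15.0,
    thresholds: Sequence[float] = (0.5, 1.0, 2.0, 4.0),
) -> float:
    """Simplified C-alpha LDDT between two equally long, residue-paired structures.

    For every residue pair closer than `radius` in the reference, count how many
    thresholds the absolute distance difference stays below, then divide by the
    number of (pair, threshold) checks. Pairs are pooled over the whole structure.
    """
    if len(reference) != len(model):
        raise ValueError("reference and model must contain the same number of residues")

    preserved = 0
    total = 0
    n = len(reference)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= radius:
                continue  # only local contacts in the reference are scored
            d_model = math.dist(model[i], model[j])
            diff = abs(d_ref - d_model)
            preserved += sum(diff < t for t in thresholds)
            total += len(thresholds)
    return preserved / total if total else 0.0


# Toy example: three residues on a line, with the last one displaced by 3 Å in the model.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
distorted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.6, 0.0, 0.0)]

identical_score = ca_lddt(reference, reference)
distorted_score = ca_lddt(reference, distorted)
print(identical_score, distorted_score)
```

Because the score depends only on internal distances, it needs no superposition of the two structures, which is one reason a distance-based measure like this copes well with flexible proteins.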