Fast and accurate protein structure search with Foldseek

17 February 2022 | Michel van Kempen, Stephanie S. Kim, Charlotte Tumescheit, Milot Mirdita, Jeongjae Lee, Cameron L. M. Gilchrist, Johannes Söding, Martin Steinegger
Foldseek is a fast and accurate tool for searching large databases of protein structures. It describes the tertiary amino acid interactions within a protein as a sequence over a structural alphabet, which reduces structure comparison to much faster sequence alignment. Foldseek's 3Di alphabet encodes tertiary residue-residue interactions, and is more efficient and sensitive than traditional backbone structural alphabets. A search first runs a prefilter that identifies similar k-mers, then aligns the remaining candidates with a vectorized Smith-Waterman algorithm; global alignments using TM-align are also supported.
Foldseek outperforms other structural aligners in terms of speed and sensitivity, with a speed improvement of over 4,000 times compared to TM-align and Dali on a small benchmark set, and over 21,000 times on a larger dataset. Its performance is validated by benchmarks on the SCOPe dataset and a reference-free multi-domain benchmark, which demonstrate its ability to detect homologous structures and to improve structural analyses in biology and bioinformatics.
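The two-stage search described above (a k-mer prefilter followed by Smith-Waterman local alignment) can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not Foldseek's implementation: it counts exact shared k-mers rather than similar ones, uses a simple match/mismatch score with a linear gap penalty instead of a structural-alphabet substitution matrix, and is not vectorized. The function names, parameters, and example strings are all hypothetical placeholders standing in for structures already encoded as structural-alphabet sequences.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def prefilter(query, targets, k=3, min_shared=2):
    """Keep targets that share at least min_shared exact k-mers with the query.

    Assumption: exact k-mer matching is a simplification; the tool described
    above looks for *similar* k-mers.
    """
    query_kmers = kmers(query, k)
    return [name for name, seq in targets.items()
            if len(query_kmers & kmers(seq, k)) >= min_shared]


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score (toy scoring, linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, targets, k=3, min_shared=2):
    """Prefilter the targets, then rank the survivors by alignment score."""
    candidates = prefilter(query, targets, k, min_shared)
    scored = [(name, smith_waterman(query, targets[name])) for name in candidates]
    return sorted(scored, key=lambda hit: -hit[1])


# Made-up structural-alphabet strings, for illustration only.
query = "DVVLCPPSLV"
targets = {"t1": "AAVVLCPPSLQQ", "t2": "GGGGHHHHGGGG"}
print(search(query, targets))
```

The prefilter is what makes the approach scale: the quadratic-time alignment only runs on targets that already share k-mers with the query, so most of the database is discarded cheaply.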