January 9, 2024 | Caroline Puente-Lelievre, Ashar J. Malik, Jordan Douglas, David Ascher, Matthew Baker, Jane Allison, Anthony Poole, Daniel Lundin, Matthew Fullmer, Remco Bouckaert, Hyunbin Kim, Martin Steinegger, Nicholas Matzke
The paper discusses the application of tertiary interaction (3Di) characters in structural phylogenetics, particularly in the "twilight zone", where sequence similarity is highly decayed. The authors address the limitations of traditional structural phylogenetics, which is constrained by the lack of solved structures and by its reliance on distance methods that do not provide statistical robustness.
They introduce Foldseek, a program that encodes 3D tertiary structure interactions into a one-dimensional string of discrete states, enabling efficient homology searches and structural alignments. The study uses the ferritin-like superfamily as a test dataset to demonstrate the effectiveness of 3Di characters in phylogenetic inference. By combining amino acid and 3Di characters, partitioning, and custom models, the authors obtain trees that match the structural distances tree more closely than structure-free analyses do. The results suggest that structural phylogenetics, combined with AlphaFold structures, could become routine practice in protein phylogenetics, allowing fundamental phylogenetic problems to be re-explored. The paper also highlights the computational efficiency of the approach, which reduces analysis time from weeks to minutes on desktop computers. However, it acknowledges limitations such as the sensitivity of 3Di encoding to small perturbations in protein structures and the need for further research on the statistical properties of 3Di characters.
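To make the idea of combining amino acid and 3Di characters in a partitioned analysis more concrete, the following is a minimal Python sketch of how two alignments for the same taxa might be joined into one matrix with a separate column range per partition. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the partition labels `aa_part` and `tdi_part`, and the toy sequences are all hypothetical, and the NEXUS `sets` block is simply one common way to describe partitions to tree-inference software. The substitution model for each partition, including any custom 3Di model, is deliberately left out.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]              # taxon name -> aligned string
Partition = Tuple[str, int, int]        # label, first column, last column


def aligned_length(aln: Alignment) -> int:
    """Return the common length of all rows, or fail if they differ."""
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError("sequences in an alignment must have equal length")
    return lengths.pop()


def concatenate_partitions(
    aa_aln: Alignment, tdi_aln: Alignment
) -> Tuple[Alignment, List[Partition]]:
    """Join an amino acid alignment and a 3Di alignment taxon by taxon.

    Column ranges are 1-based and inclusive, as in NEXUS charsets.
    The labels 'aa_part' and 'tdi_part' are hypothetical names.
    """
    if set(aa_aln) != set(tdi_aln):
        raise ValueError("both alignments must contain the same taxa")

    aa_len = aligned_length(aa_aln)
    tdi_len = aligned_length(tdi_aln)

    combined = {name: aa_aln[name] + tdi_aln[name] for name in aa_aln}
    ranges = [
        ("aa_part", 1, aa_len),
        ("tdi_part", aa_len + 1, aa_len + tdi_len),
    ]
    return combined, ranges


def nexus_sets_block(ranges: List[Partition]) -> str:
    """Render the column ranges as a NEXUS 'sets' block (charsets only)."""
    lines = ["#nexus", "begin sets;"]
    for label, start, end in ranges:
        lines.append(f"    charset {label} = {start}-{end};")
    lines.append("end;")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy data only: these strings are made up for illustration.
    aa = {"taxonA": "MK-LV", "taxonB": "MKQLV"}
    tdi = {"taxonA": "DVPD", "taxonB": "DVLD"}

    matrix, parts = concatenate_partitions(aa, tdi)
    for name, row in matrix.items():
        print(f">{name}\n{row}")
    print(nexus_sets_block(parts))
```

Because each 3Di state is written as a single letter per residue, the two character types can sit side by side in one matrix while the partition ranges tell the inference program which columns belong to which data type, so that each can be given its own model.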