Chainsaw: protein domain segmentation with fully convolutional neural networks

8 May 2024 | Jude Wells, Alex Hawkins-Hooker, Nicola Bordin, Ian Sillitoe, Brooks Paige, Christine Orengo
Chainsaw is a supervised learning approach to protein domain segmentation that outperforms current state-of-the-art methods. It uses a fully convolutional neural network (CNN) to predict, for each pair of residues, the probability that they belong to the same domain. Domain assignments are then derived from these pairwise probabilities using a greedy algorithm. Chainsaw achieves 78% accuracy in matching CATH domain annotations, and when predicting on AlphaFold models, expert human evaluators preferred its output twice as often as that of the next-best method. It outperforms other supervised and unsupervised methods, including Merizo, EguchiCNN, UniDoc, PUU, and SWORD, on both CATH-annotated PDB structures and AlphaFold models. The confidence score of Chainsaw's predictions correlates well with ground-truth accuracy, and the method can handle proteins of any size without cropping or padding. The method is available on GitHub and can be used to infer functional annotations for uncharacterized proteins.
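To make the step from pairwise probabilities to domain assignments concrete, here is a minimal sketch of one possible greedy procedure. It is an illustration under stated assumptions, not Chainsaw's actual algorithm: the summary above says only that a greedy algorithm is used, so the function name `greedy_domains`, the 0.5 threshold, and the mean-probability joining rule are all hypothetical choices made for this example.

```python
def greedy_domains(pair_prob, threshold=0.5):
    """Assign a domain label to each residue from pairwise probabilities.

    pair_prob[i][j] is assumed to be the predicted probability that
    residues i and j belong to the same domain (symmetric, n x n).
    The joining rule and threshold are illustrative assumptions,
    not the procedure used by Chainsaw itself.
    """
    n = len(pair_prob)
    domains = []        # each domain is a list of residue indices
    labels = [0] * n    # 1-based domain label per residue

    for i in range(n):
        best_idx, best_score = None, threshold
        # Score residue i against every existing domain by its mean
        # same-domain probability with that domain's current members.
        for d_idx, members in enumerate(domains):
            score = sum(pair_prob[i][j] for j in members) / len(members)
            if score > best_score:
                best_idx, best_score = d_idx, score
        if best_idx is None:
            # No existing domain is a good enough match: open a new one.
            domains.append([i])
            best_idx = len(domains) - 1
        else:
            domains[best_idx].append(i)
        labels[i] = best_idx + 1
    return labels


# Toy input: residues 0-2 form one block and residues 3-5 another.
toy = [[0.9 if (i < 3) == (j < 3) else 0.1 for j in range(6)]
       for i in range(6)]
print(greedy_domains(toy))  # prints [1, 1, 1, 2, 2, 2]
```

Because each residue is compared with all current members of a domain rather than only its sequence neighbours, a rule of this kind can also recover discontinuous domains, where a segment leaves a domain and later returns to it.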