Chainsaw: protein domain segmentation with fully convolutional neural networks

2024 | Jude Wells, Alex Hawkins-Hooker, Nicola Bordin, Ian Sillitoe, Brooks Paige, Christine Orengo
Chainsaw is a supervised learning method for protein domain segmentation built on a fully convolutional neural network (CNN). For each pair of residues it predicts the probability that the two belong to the same domain, then uses a greedy algorithm to assign residues to domains based on these probabilities. It handles inputs of any size without cropping or padding.

Chainsaw was tested on CATH-annotated PDB structures and on AlphaFold models, and compared with Merizo, EguchiCNN, UniDoc, and SWORD. It achieves 78% agreement with CATH domain annotations, versus 72% for the next best method, and the authors report that it is not overfit to CATH assignments. On AlphaFold models it makes fewer domain prediction errors than the next best method: in a blind comparison of 200 AlphaFold models, expert evaluators preferred Chainsaw's predictions over UniDoc's in roughly twice as many cases.

Chainsaw's confidence score correlates with prediction accuracy and can be used to identify alternative valid assignments. Downstream uses include identifying uncharacterized proteins and, in combination with Foldseek, inferring functional annotations for them. Inference takes 0.6 seconds on CPU and 0.2 seconds on GPU. The method is available on GitHub.
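
To make the second stage concrete, the following is a minimal sketch of how pairwise same-domain probabilities could be turned into domain labels greedily. It is an illustration under stated assumptions, not the authors' implementation: the function name `greedy_domain_assignment`, the sequence-order traversal, and the log-odds scoring rule are all hypothetical choices made for this example, and the probability matrix is a toy input rather than network output.

```python
import math


def greedy_domain_assignment(pair_probs, eps=1e-6):
    """Greedily assign residues to domains from pairwise probabilities.

    pair_probs[i][j] is the (assumed) predicted probability that residues
    i and j share a domain. Residues are visited in sequence order. Each
    residue joins the existing domain with the largest positive summed
    log-odds, or opens a new domain if no existing domain has positive gain.
    This scoring rule is an assumption made for illustration.
    """
    def log_odds(p):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        return math.log(p) - math.log(1.0 - p)

    labels = []   # labels[i] = domain index of residue i
    domains = []  # domains[d] = residue indices already placed in domain d

    for i in range(len(pair_probs)):
        best_domain, best_gain = None, 0.0
        for d, members in enumerate(domains):
            # Evidence for (positive) or against (negative) joining domain d.
            gain = sum(log_odds(pair_probs[i][j]) for j in members)
            if gain > best_gain:
                best_domain, best_gain = d, gain
        if best_domain is None:
            domains.append([i])
            labels.append(len(domains) - 1)
        else:
            domains[best_domain].append(i)
            labels.append(best_domain)
    return labels


# Toy input: residues 0-2 form one domain and residues 3-5 another.
toy_probs = [[0.9 if (i < 3) == (j < 3) else 0.1 for j in range(6)]
             for i in range(6)]
print(greedy_domain_assignment(toy_probs))  # [0, 0, 0, 1, 1, 1]
```

Because each residue is scored against whole domains rather than only its sequence neighbours, a sketch like this can also group residues that are distant in sequence into the same domain, which is what a pairwise formulation makes possible.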