This research article presents a new method for predicting protein residue-residue contacts using support vector machines (SVMs) and a large set of informative features. The method, called SVMcon, outperforms the latest version of the CMAPpro contact map predictor on the same test dataset, achieving 4% higher accuracy at the break-even point. SVMcon was also evaluated in the seventh edition of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7) experiment, where it ranked among the top predictors, yielding the second-best coverage and accuracy for contacts with sequence separation ≥12 on 13 de novo domains.
The method integrates a range of features, including sequence profiles, secondary structure, relative solvent accessibility, contact potentials, and other useful information. It uses SVMs to predict medium- and long-range contacts, which are not captured by local secondary structure. SVMcon was trained on 485 proteins and tested on 48 proteins, with sequence identity below 25% between the training and test sets. Performance was measured by sensitivity and specificity; at the break-even point, SVMcon was 4% more accurate than CMAPpro.
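To make the evaluation above concrete, here is a minimal sketch of how sensitivity and specificity at a break-even point could be computed for one protein. It is not the authors' code. It assumes that sensitivity means the fraction of true contacts that are predicted, that specificity means the fraction of predicted contacts that are true (often called precision), and that the break-even point is reached by selecting as many top-scoring pairs as there are true contacts. The function name `evaluate_contacts`, its parameters, and the toy scores are hypothetical.

```python
def evaluate_contacts(scores, true_contacts, min_separation=12, top_k=None):
    """Return (sensitivity, specificity) for one protein.

    scores        -- dict mapping residue pairs (i, j), i < j, to a predicted score
    true_contacts -- set of residue pairs (i, j), i < j, in contact in the native structure
    min_separation -- minimum sequence separation |i - j| to consider (assumed value)
    top_k         -- number of top-scoring pairs to call contacts; defaults to the
                     number of true contacts, the assumed break-even point
    """
    # Keep only pairs at or beyond the requested sequence separation.
    candidates = [(pair, score) for pair, score in scores.items()
                  if pair[1] - pair[0] >= min_separation]
    relevant_true = {pair for pair in true_contacts
                     if pair[1] - pair[0] >= min_separation}

    if top_k is None:
        top_k = len(relevant_true)

    # Rank candidate pairs by score and call the top_k pairs contacts.
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    selected = {pair for pair, _ in ranked[:top_k]}

    true_positives = len(selected & relevant_true)
    sensitivity = true_positives / len(relevant_true) if relevant_true else 0.0
    specificity = true_positives / len(selected) if selected else 0.0
    return sensitivity, specificity


# Toy example with made-up scores; (2, 3) is ignored because it is a local pair.
toy_scores = {(0, 12): 0.9, (0, 13): 0.8, (1, 14): 0.7, (2, 3): 0.99, (3, 20): 0.1}
toy_true = {(0, 12), (1, 14), (2, 3), (3, 20)}
sensitivity, specificity = evaluate_contacts(toy_scores, toy_true)
print(sensitivity, specificity)
```

Under these assumptions the two measures coincide whenever the number of selected pairs equals the number of true contacts, which is why a single accuracy figure can be quoted at the break-even point.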
The results show that SVMcon performs well at predicting medium- to long-range contacts and can be incorporated as a module into a structure prediction pipeline. The method is also more accurate for proteins with beta-sheets than for those with alpha helices, likely because of the strong restraints between beta-strands. The study highlights the importance of contact prediction for protein structure prediction and folding, and suggests that further research is needed to improve the accuracy of contact map predictions. Although contact prediction accuracy is still low, the results mark an important step towards the milestone of about 30% accuracy required for deriving moderately accurate 3D protein structures from scratch.