This study proposes a method for predicting protein-protein interactions (PPIs) based solely on protein sequence information. The method combines a support vector machine (SVM) with a specially designed kernel function and a conjoint triad feature that describes amino acids. More than 16,000 diverse PPI pairs were used to construct a universal model. The method's advantage over other sequence-based PPI prediction methods is that it can predict entire PPI networks. Different types of PPI networks have been effectively mapped with this method. This suggests that, even with only sequence information, the method could be used to explore the networks of any newly discovered protein whose biological relationships are unknown. Supplementary experimental information can further enhance its prediction ability.
Protein-protein interactions are central to most biological processes, and determining protein interaction networks is a major goal of functional genomics. However, experimental methods cover only a fraction of complete PPI networks, which makes computational PPI prediction important. Several computational methods have been developed for this task, but they often require information about protein homology or interaction marks. The proposed method uses only sequence information. This approach is more universal but also more challenging in computational biology.
The method represents each protein sequence as a feature vector and concatenates the vectors of the two proteins to characterize a PPI pair. The 20 amino acids are clustered into seven classes according to their physicochemical properties. This reduces the dimensionality of the feature space: the number of possible triads drops from 20^3 = 8,000 to 7^3 = 343. Based on this classification, the conjoint triad method treats each amino acid together with its two neighbors as a unit (a triad) and describes each protein by how often each class triad occurs. A specially designed kernel function is then used for binary classification on the large dataset.
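The feature construction described above can be sketched in Python. This sketch relies on assumptions that the summary does not state. The seven-class grouping in `CLASSES` is the one commonly used in public reimplementations of conjoint triad features. Triads containing non-standard residues are skipped, and the 343 counts are min-max scaled. The symmetric pair kernel is also an assumption: it is a generic way to make an RBF kernel insensitive to the order of the two proteins in a pair, not the paper's kernel. All function names are hypothetical.

```python
import math

# Hypothetical 7-class grouping of the 20 amino acids (assumption: the grouping
# commonly used in public conjoint-triad reimplementations).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}
N_CLASSES = len(CLASSES)    # 7
N_TRIADS = N_CLASSES ** 3   # 343 possible class triads


def conjoint_triad(seq):
    """Return a 343-dim vector of scaled class-triad frequencies for one protein."""
    cls = [AA_TO_CLASS.get(aa) for aa in seq.upper()]
    counts = [0] * N_TRIADS
    # Slide a window of three consecutive residues along the sequence.
    for a, b, c in zip(cls, cls[1:], cls[2:]):
        if None in (a, b, c):
            continue  # skip triads containing non-standard residues (assumption)
        counts[(a * N_CLASSES + b) * N_CLASSES + c] += 1
    lo, hi = min(counts), max(counts)
    if hi == lo:
        return [0.0] * N_TRIADS  # nothing to scale, e.g. fewer than three residues
    # Min-max scaling to [0, 1] (assumption; the paper's normalization may differ).
    return [(n - lo) / (hi - lo) for n in counts]


def pair_vector(seq_a, seq_b):
    """Concatenate the two proteins' vectors into a 686-dim pair vector."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)


def rbf(x, y, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(x, y)))


def symmetric_pair_kernel(u, v, gamma):
    """Average the RBF kernel over both orderings of the two proteins in pair v."""
    v_swapped = v[N_TRIADS:] + v[:N_TRIADS]
    return 0.5 * (rbf(u, v, gamma) + rbf(u, v_swapped, gamma))


ab = pair_vector("MKTAYIAKQR", "GAVLIPFDE")
ba = pair_vector("GAVLIPFDE", "MKTAYIAKQR")
print(len(ab))  # 686
print(symmetric_pair_kernel(ab, ab, 0.25) == symmetric_pair_kernel(ab, ba, 0.25))  # True
```

Swapping the two halves of a pair vector only permutes its coordinates, so the RBF kernel is invariant under that swap. As a result, averaging over both orderings keeps the kernel positive semidefinite while making the pairs (A, B) and (B, A) indistinguishable to the classifier.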
The SVM parameters C and γ were optimized by grid search, which gave optimal values of C = 128 and γ = 0.25. Prediction performance was evaluated on five test sets, where the method showed high precision and sensitivity. The method was then tested on three types of PPI networks (one-core, multiple-core, and crossover networks) and predicted the interactions in each of them effectively. Compared with other kernel functions, the specially designed kernel gave more accurate predictions. The method can therefore be used to predict PPIs in complex networks, and additional experimental information can enhance its prediction ability. The study was supported by grants from the Chinese Academy of Sciences and the Shanghai Science and Technology Commission.
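The grid search over C and γ can be sketched as an exhaustive search over powers of two. The exponent ranges in `C_GRID` and `GAMMA_GRID` are assumptions chosen for illustration. They contain the reported optimum, since 128 = 2^7 and 0.25 = 2^-2. In practice, `score` would return the cross-validated accuracy of an SVM trained with the given C and γ. Here, `toy_score` is a purely synthetic surface, not data from the study, and all names are hypothetical.

```python
import itertools
import math


def log2_grid(lo_exp, hi_exp):
    """Powers of two from 2**lo_exp to 2**hi_exp inclusive (exponent step 1)."""
    return [2.0 ** e for e in range(lo_exp, hi_exp + 1)]


def grid_search(score, c_values, gamma_values):
    """Return (best_score, best_C, best_gamma) maximizing score(C, gamma)."""
    best = None
    for c, gamma in itertools.product(c_values, gamma_values):
        s = score(c, gamma)
        if best is None or s > best[0]:
            best = (s, c, gamma)
    return best


# Illustrative search ranges (assumption): C in 2^-5..2^15, gamma in 2^-15..2^3.
C_GRID = log2_grid(-5, 15)
GAMMA_GRID = log2_grid(-15, 3)


def toy_score(c, gamma):
    """Synthetic stand-in for cross-validated accuracy (NOT results from the study)."""
    return -((math.log2(c) - 3) ** 2 + (math.log2(gamma) + 1) ** 2)


best = grid_search(toy_score, C_GRID, GAMMA_GRID)
print(best[1], best[2])  # 8.0 0.5 for this synthetic surface
```

Spacing the candidates by powers of two covers several orders of magnitude with relatively few evaluations. A common follow-up is a finer search around the best coarse point.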