MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding

2024 | Lirong Wu, Yijun Tian, Yufei Huang, Siyuan Li, Haitao Lin, Nitesh V Chawla, Stan Z. Li
This paper proposes MAPE-PPI, a method for protein-protein interaction (PPI) prediction that integrates both protein sequence and structure information. The method defines the microenvironment of an amino acid residue by its sequence and structural contexts, which capture the surrounding chemical properties and geometric features. A microenvironment-aware protein embedding is then learned with a sufficiently large microenvironment "vocabulary" (a codebook) that encodes microenvironments into chemically meaningful discrete codes. A pre-training strategy, Masked Codebook Modeling (MCM), captures dependencies between different microenvironments by randomly masking the codebook and reconstructing the input. Once learned, the microenvironment codebook serves as an off-the-shelf tool for efficiently encoding proteins of different sizes and functions for large-scale PPI prediction.

Extensive experiments show that MAPE-PPI outperforms state-of-the-art competitors in both effectiveness and computational efficiency for PPI prediction with millions of PPIs. The method achieves superior performance on various datasets, including SHS27k and SHS148k, and demonstrates strong generalization and robustness to domain shifts and structural perturbations. The codebook learning process is shown to capture the distribution of amino acids and their local environments, and the method is further evaluated through ablation studies and hyperparameter sensitivity analysis. Overall, MAPE-PPI provides a more effective and efficient approach to PPI prediction by integrating microenvironment information into protein embeddings.
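To make the idea of encoding microenvironments into discrete codes more concrete, the following is a minimal sketch of nearest-neighbor codebook lookup followed by random masking of the resulting codes. It is not the authors' implementation: the function names `quantize` and `mask_codes`, the toy 4-dimensional features, the codebook of 8 entries, the mask ratio, and the use of `-1` as a mask placeholder are all assumptions chosen for illustration. In particular, masking code indices is only one plausible reading of "masking the codebook".

```python
import numpy as np

def quantize(features, codebook):
    """Map each residue feature vector to the index of its nearest code."""
    # Pairwise squared Euclidean distances, shape (n_residues, n_codes).
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    return np.argmin(dists, axis=1)

def mask_codes(codes, mask_ratio, mask_id, rng):
    """Replace a random subset of code indices with a placeholder mask id."""
    n_masked = int(round(mask_ratio * len(codes)))
    positions = rng.choice(len(codes), size=n_masked, replace=False)
    masked = codes.copy()
    masked[positions] = mask_id
    return masked, positions

# Hypothetical toy setup: 8 codes in a 4-dimensional space, where code k
# is the vector [k, k, k, k]. A real codebook would be learned from data.
codebook = np.arange(8, dtype=float)[:, None] * np.ones((1, 4))

# Five "residue microenvironment" features, each a slightly perturbed code.
features = codebook[[3, 0, 5, 3, 7]] + 0.1

codes = quantize(features, codebook)

# Mask 40% of the codes; a model would then be trained to reconstruct
# the original input at the masked positions (assumed training objective).
rng = np.random.default_rng(0)
masked, positions = mask_codes(codes, mask_ratio=0.4, mask_id=-1, rng=rng)
print(codes, masked, positions)
```

Because the lookup depends only on the codebook and not on protein length, the same `quantize` call applies to proteins of any size, which is the sense in which a learned codebook can act as an off-the-shelf encoder.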