2 April 2024 | Qiyu Liang, Nana Peng, Yi Xie, Nivedita Kumar, Weibo Gao & Yansong Miao
MolPhase is an algorithm for predicting protein phase separation (PS) behavior. It draws on diverse physicochemical features and extensive experimental datasets to improve accuracy and reliability. It provides a user-friendly interface for comparing distinct biophysical features along protein sequences and enables efficient prediction of new phase-separating proteins. Key contributing factors include electrostatic pi-interactions, disorder, and prion-like domains. MolPhase was trained on 606 experimentally derived PS proteins and outperformed existing predictors such as DeePhase, PSPredictor, FuzDrop, and PSPer. It can analyze large protein sequence datasets efficiently, which helps identify novel phase-separating proteins and their functional roles. MolPhase identified phytobacterial type III effectors (T3Es) as highly prone to homotypic PS, a prediction that was experimentally validated in vitro and in vivo. The physicochemical characteristics of T3Es dictate their association patterns and thereby influence the material properties of phase-separating droplets. By combining effective prediction with experimental validation, MolPhase shows potential for evaluating how biomolecular PS functions in biological systems.
MolPhase was developed using 606 sequences from public databases and additional manual curation, which form the positive PS dataset (POS), together with a negative PS dataset (NEG) drawn from the Protein Data Bank (PDB). Comparing the two sets showed that POS proteins have longer sequences, higher percentages of intrinsically disordered regions (IDRs) and low complexity regions (LCRs), and higher fractions of interactive domains. POS proteins also have lower fractions of charged residues, a more neutral net charge per residue, and are more hydrophilic than NEG proteins. MolPhase identified pi-interaction as the most significant factor in molecular condensation, followed by IDR percentage, with glycine ranking third. MolPhase's performance was evaluated on external datasets, where it showed high accuracy and reliability. Compared with four other predictors, MolPhase had the lowest false negative rate.
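Several of the sequence-level quantities compared above (fraction of charged residues, net charge per residue, glycine content) can be computed directly from an amino acid sequence. The following is a minimal sketch using common textbook definitions, not MolPhase's own implementation. It assumes that K and R count as positively charged and D and E as negatively charged at neutral pH, and that histidine is ignored; the function name `sequence_features` and the dictionary keys are hypothetical.

```python
def sequence_features(seq: str) -> dict:
    """Compute simple composition features for a protein sequence.

    Assumptions (illustrative, not taken from MolPhase):
    K/R are positive, D/E are negative, histidine is treated as neutral.
    """
    seq = seq.strip().upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")

    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")

    return {
        "length": n,
        # Fraction of charged residues: all charged positions over length.
        "fcr": (positive + negative) / n,
        # Net charge per residue: signed charge balance over length.
        "ncpr": (positive - negative) / n,
        # Share of glycine residues in the sequence.
        "glycine_fraction": seq.count("G") / n,
    }


if __name__ == "__main__":
    # Toy sequence for demonstration only, not a real protein.
    print(sequence_features("KKDEGGGG"))
```

A sequence with balanced positive and negative residues yields a net charge per residue near zero even when its fraction of charged residues is high, which is why the two quantities are reported separately.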
MolPhase was applied to evaluate phase separation of phytobacterial effectors in plant hosts. T3Es were found to undergo PS both in vitro and in vivo. However, the homotypic PS of recombinant proteins showed different material properties and dynamics than the condensates formed in living cells. This suggests that diverse microenvironments may alter how much each intrinsic feature matters in a particular setting. MolPhase's analysis of the proteomes of Xcc 8004 and Pst DC3000 revealed that T3Es have a higher propensity for PS than other proteins. This finding was further supported by analysis of the Pseudomonas syringae Type III Effector Compendium (PsyTEC), which contains 529 T3Es.