June 3, 2024 | Sören von Bülow, Giulio Tesei, and Kresten Lindorff-Larsen
A machine learning model was developed to predict the phase separation (PS) propensities of intrinsically disordered protein regions (IDRs) from their sequence. The model combines coarse-grained molecular dynamics simulations with active learning to estimate the transfer free energy and saturation concentration (c_sat) for PS directly from sequence. The predictions are accurate and efficient to compute, with RMSD < 1 for both the transfer free energy and ln c_sat; they correlate strongly with simulation results and were validated against experimental data. Applied to 27,663 IDRs in the human proteome, the model predicts that 1,420 (5%) undergo homotypic PS with transfer free energies < -2 k_BT. The predictions are interpretable: sequence features such as hydrophobicity, charge patterning, and the single-chain scaling exponent correlate with PS propensity, which refines established rules relating sequence features to PS. The model also reveals that a change from charge- to hydrophobicity-mediated interactions can break the symmetry between intra- and intermolecular interactions. Structural preferences at condensate interfaces were analyzed and show substantial heterogeneity, determined by the same sequence properties as PS. Because the model is transferable, it can be used to design IDRs with specific PS propensities, to interpret and design experiments on phase separation, and to study the structural properties of condensates and their interfaces.
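To make the -2 k_BT threshold concrete, the following minimal sketch converts between a transfer free energy and a concentration ratio. It assumes the common definition ΔG = k_BT ln(c_sat / c_dense), which the summary itself does not state; the function names and the `c_dense` parameter are hypothetical and chosen for illustration only.

```python
import math

# Assumption: transfer free energy in units of k_BT is defined as
#   dG = ln(c_sat / c_dense)
# where c_sat is the dilute-phase (saturation) concentration and
# c_dense is the dense-phase concentration. This definition is not
# given in the summary and is used here only as an illustration.

THRESHOLD_KBT = -2.0  # cutoff quoted in the summary


def transfer_free_energy(c_sat: float, c_dense: float) -> float:
    """Return dG in units of k_BT for the two coexisting concentrations."""
    if c_sat <= 0 or c_dense <= 0:
        raise ValueError("concentrations must be positive")
    return math.log(c_sat / c_dense)


def dense_to_dilute_ratio(dg_kbt: float) -> float:
    """Return c_dense / c_sat implied by a transfer free energy (in k_BT)."""
    return math.exp(-dg_kbt)


def below_threshold(dg_kbt: float, threshold: float = THRESHOLD_KBT) -> bool:
    """True if dG is more favorable (more negative) than the cutoff."""
    return dg_kbt < threshold


if __name__ == "__main__":
    ratio = dense_to_dilute_ratio(THRESHOLD_KBT)
    print(f"dG = {THRESHOLD_KBT} k_BT corresponds to c_dense/c_sat = {ratio:.2f}")
```

Under this assumed definition, a transfer free energy of -2 k_BT corresponds to a dense phase that is e^2 (roughly 7.4) times more concentrated than the dilute phase.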
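The kinds of sequence features mentioned above (hydrophobicity and charge patterning) can be illustrated with a few simple descriptors. This is a sketch under stated assumptions, not the feature set used by the model: the Kyte-Doolittle hydropathy scale is a stand-in for whatever hydrophobicity scale the authors use, sequence charge decoration (SCD) is one of several published charge-patterning measures, histidine is treated as neutral, and all function names are hypothetical.

```python
import math

# Assumption: fixed integer charges, histidine treated as neutral.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}

# Kyte-Doolittle hydropathy scale, used here only as a stand-in
# hydrophobicity scale (the model's own scale is not given in the summary).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _charges(seq: str) -> list[float]:
    if not seq:
        raise ValueError("sequence must not be empty")
    return [CHARGE.get(aa, 0.0) for aa in seq.upper()]


def net_charge_per_residue(seq: str) -> float:
    """Sum of residue charges divided by sequence length."""
    q = _charges(seq)
    return sum(q) / len(q)


def fraction_charged(seq: str) -> float:
    """Fraction of residues carrying a nonzero charge."""
    q = _charges(seq)
    return sum(1 for x in q if x != 0.0) / len(q)


def sequence_charge_decoration(seq: str) -> float:
    """SCD = (1/N) * sum over i<j of q_i * q_j * sqrt(j - i)."""
    q = _charges(seq)
    n = len(q)
    total = 0.0
    for i in range(n):
        if q[i] == 0.0:
            continue  # neutral residues contribute nothing
        for j in range(i + 1, n):
            total += q[i] * q[j] * math.sqrt(j - i)
    return total / n


def mean_hydropathy(seq: str) -> float:
    """Average Kyte-Doolittle hydropathy; raises KeyError on unknown residues."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq.upper()) / len(seq)


if __name__ == "__main__":
    for s in ("KKKKEEEE", "KEKEKEKE"):
        print(s, net_charge_per_residue(s), fraction_charged(s),
              round(sequence_charge_decoration(s), 3))
```

Both example sequences have the same composition and zero net charge, but the blocky arrangement gives a more negative SCD than the alternating one, which is how a patterning measure can separate sequences that composition-based features cannot.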