Geometric deep learning of protein–DNA binding specificity

September 2024 | Raktim Mitra, Jinsen Li, Jared M. Sagendorf, Yibei Jiang, Ari S. Cohen, Tsu-Pei Chiu, Cameron J. Glasscock & Remo Rohs
DeepPBS is a geometric deep learning model that predicts protein-DNA binding specificity from protein-DNA structures. It can be applied to experimental or predicted structures and provides interpretable scores for the protein residues involved in DNA binding. These scores have been validated through mutagenesis experiments, and the model predicts binding specificity for designed proteins targeting specific DNA sequences. DeepPBS bridges structure-determining and binding specificity-determining experiments, offering a foundation for machine-aided studies of molecular interactions.
Transcription factors regulate gene expression, so understanding their binding mechanisms is crucial. Protein-DNA binding involves several kinds of interaction, including electrostatic, stacking, and hydrogen-bonding interactions. Protein-DNA structures are typically obtained through experimental methods such as X-ray crystallography and deposited in the Protein Data Bank (PDB). However, these structures do not capture the full range of DNA sequences a protein can bind. Experimental methods such as protein-binding microarrays and SELEX-seq capture that range but provide no structural information. The two kinds of experiment are complementary, but relating structural and binding data has required manual correlation.
Predicting binding specificity across protein families remains challenging because of structural changes and mechanistic diversity. DeepPBS uses structural data to generalize across protein families. It represents a protein-DNA structure as a bipartite graph, aggregates atomic information, and applies geometric convolutions to predict binding specificity. DeepPBS can be used with predicted structures, and its predictions can serve as feedback to improve protein-DNA complex design. It is competitive with family-specific models such as rCLAMPS while generalizing across protein families and biological assemblies. DeepPBS provides interpretable scores for protein residues involved in DNA binding, validated against experimental data.
DeepPBS was applied to the p53-DNA interface, where its results agreed with existing knowledge and mutagenesis experiments. It was also applied to in silico-designed protein-DNA complexes, predicting binding specificity close to experimental data. It can analyze molecular simulation trajectories as well, which demonstrates its utility in computational studies.
DeepPBS offers a computational framework for understanding protein-DNA binding that connects structural and specificity data. It allows exploration of family-specific patterns and their effects on binding specificity. Although it requires a docked sym-helix (a symmetrized DNA helix) as input, it is a significant step toward solving the larger problem of predicting binding specificity. DeepPBS is applicable to both existing and synthetically designed proteins, making it a tool for experimental design and synthetic biology. It is efficient, suitable for high-throughput applications, and robust to conformational changes. It does not currently handle single-stranded DNA or RNA, but it shows promise for extension to other polymer-polymer interactions and to mutations. Overall, DeepPBS represents a significant advance in computational methods for protein-DNA binding studies.
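The bipartite-graph idea mentioned above (protein atoms on one side, DNA positions on the other, with information aggregated across the interface) can be illustrated with a minimal sketch. This is not the DeepPBS implementation: the function names, the 5 Å distance cutoff, and the Gaussian distance weighting are all assumptions chosen for illustration, and the weighted average here only stands in for the learned geometric convolutions the summary describes.

```python
import math

def bipartite_edges(protein_xyz, dna_xyz, cutoff=5.0):
    """Connect each protein atom to each DNA point closer than `cutoff`.

    The cutoff value is an illustrative assumption, not a DeepPBS setting.
    Returns a list of (protein_index, dna_index, distance) tuples.
    """
    edges = []
    for i, p in enumerate(protein_xyz):
        for j, d in enumerate(dna_xyz):
            dist = math.dist(p, d)  # Euclidean distance between two 3D points
            if dist < cutoff:
                edges.append((i, j, dist))
    return edges

def aggregate_to_dna(protein_feat, edges, n_dna, sigma=2.0):
    """Distance-weighted mean of protein-atom features for each DNA point.

    A hypothetical stand-in for a learned convolution: nearer atoms
    contribute more through a Gaussian weight with width `sigma`.
    DNA points with no neighbors receive a zero vector.
    """
    n_feat = len(protein_feat[0])
    sums = [[0.0] * n_feat for _ in range(n_dna)]
    norms = [0.0] * n_dna
    for i, j, dist in edges:
        w = math.exp(-((dist / sigma) ** 2))
        norms[j] += w
        for k in range(n_feat):
            sums[j][k] += w * protein_feat[i][k]
    return [
        [s / norms[j] if norms[j] > 0 else 0.0 for s in row]
        for j, row in enumerate(sums)
    ]

# Toy input: two protein atoms, two DNA points, two features per atom.
protein_xyz = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
dna_xyz = [(1.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
protein_feat = [[1.0, 2.0], [3.0, 4.0]]

edges = bipartite_edges(protein_xyz, dna_xyz)
dna_feat = aggregate_to_dna(protein_feat, edges, n_dna=len(dna_xyz))
```

In this toy input only the first protein atom lies within the cutoff of the first DNA point, so that point inherits the atom's features and the second DNA point stays at zero. A real model would replace the fixed weighting with trainable layers and would read coordinates from a structure file.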