May 2024 | Shuxin Zheng, Jiyan He, Chang Liu, Yu Shi, Ziheng Lu, Weitao Feng, Fusong Ju, Jiaxi Wang, Jianwei Zhu, Yaosen Min, He Zhang, Shidi Tang, Hongxia Hao, Peiran Jin, Chi Chen, Frank Noé, Haiguang Liu & Tie-Yan Liu
Distributional Graphormer (DiG) is a deep learning framework for predicting the equilibrium distribution of molecular systems. Whereas traditional methods predict a single structure, DiG generates diverse conformations and estimates state densities more efficiently. Inspired by the annealing process in thermodynamics, it uses deep neural networks to transform a simple distribution toward the equilibrium distribution, conditioned on a molecular descriptor such as a chemical graph or a protein sequence. The transformation is implemented as a diffusion process built on the Graphormer architecture, and the model can be trained with experimental or simulation data. For data-scarce cases, a physics-informed diffusion pre-training (PIDP) method is developed. DiG is evaluated on three tasks: protein structure distribution, ligand conformation distribution in binding pockets, and molecular adsorption on catalyst surfaces. It is also applied to property-guided structure generation. Across these tasks it generates realistic, diverse, and function-relevant structures that resemble experimental observations, predicts conformations and binding poses with high accuracy, and generalizes across molecular systems. Because it estimates the equilibrium distribution, DiG also enables efficient prediction of thermodynamic properties, and it supports inverse design by biasing the distribution toward desired properties.
The framework offers a substantial advancement in deep learning for molecular systems, opening new research opportunities in molecular sciences.
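The idea of transforming a simple distribution toward an equilibrium distribution, and then reading thermodynamic properties off the resulting samples, can be illustrated with a toy sketch. The code below is not DiG and does not use a neural network or the Graphormer architecture. It assumes a hand-written one-dimensional double-well potential and replaces the learned diffusion process with plain overdamped Langevin dynamics under a cooling schedule. All names and parameter values (`energy`, `annealed_langevin`, `KT_TARGET`, and so on) are hypothetical and chosen only for illustration.

```python
import math
import random

KT_TARGET = 0.5  # hypothetical thermal energy, arbitrary units


def energy(x: float) -> float:
    """Toy asymmetric double-well potential; the left well (x < 0) is lower."""
    return (x * x - 1.0) ** 2 + 0.25 * x


def force(x: float) -> float:
    """Negative gradient of energy()."""
    return -(4.0 * x * (x * x - 1.0) + 0.25)


def annealed_langevin(n_samples=300, n_steps=2000, dt=0.005,
                      kt_start=3.0, kt_end=KT_TARGET, seed=0):
    """Move samples from a standard Gaussian toward the Boltzmann
    distribution of energy() at kt_end.

    The temperature is cooled geometrically over the first half of the
    steps and then held at kt_end so the samples can equilibrate.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # simple start
    n_cool = max(1, n_steps // 2)
    for step in range(n_steps):
        frac = min(1.0, step / n_cool)
        kt = kt_start * (kt_end / kt_start) ** frac
        sigma = math.sqrt(2.0 * kt * dt)
        # Euler-Maruyama update of overdamped Langevin dynamics.
        xs = [x + force(x) * dt + sigma * rng.gauss(0.0, 1.0) for x in xs]
    return xs


def left_population(xs) -> float:
    """Fraction of samples in the left state (x < 0)."""
    return sum(1 for x in xs if x < 0.0) / len(xs)


def boltzmann_left_population(kt, lo=-3.0, hi=3.0, n=6000) -> float:
    """Reference left-state population from a Riemann sum over exp(-E/kT)."""
    h = (hi - lo) / n
    left = total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w = math.exp(-energy(x) / kt)
        total += w
        if x < 0.0:
            left += w
    return left / total


def free_energy_difference(p_left: float, kt: float) -> float:
    """G_right - G_left estimated from the two state populations."""
    return -kt * math.log((1.0 - p_left) / p_left)


if __name__ == "__main__":
    samples = annealed_langevin()
    p = left_population(samples)
    print("sampled left population:  ", round(p, 3))
    print("reference left population:", round(boltzmann_left_population(KT_TARGET), 3))
    print("free energy difference:   ", round(free_energy_difference(p, KT_TARGET), 3))
```

In this toy setting the state populations play the role of the state densities mentioned above: once samples approximate the equilibrium distribution, a free energy difference between two states follows directly from the ratio of their populations. A learned model such as DiG aims to produce such samples for real molecular systems without running long simulations for each one.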