Diffusion Models in De Novo Drug Design

7 Jun 2024 | Amira Alakhdar, Barnabas Poczos, and Newell Washburn
Diffusion models have emerged as powerful tools for molecular generation, particularly in the context of 3D molecular structures. These models, inspired by non-equilibrium statistical physics, can generate 3D molecular structures with specific properties crucial to drug discovery. They excel at learning complex probability distributions of 3D molecular geometries and their corresponding chemical and physical properties through forward and reverse diffusion processes. This review focuses on the technical implementation of diffusion models tailored for 3D molecular generation, comparing their performance, evaluation methods, and implementation details. It covers strategies for atom and bond representation, architectures of reverse diffusion denoising networks, and challenges in generating stable 3D molecular structures. The review also explores applications of diffusion models in de novo drug design and related areas of computational chemistry, including structure-based drug design, target-specific molecular generation, molecular docking, and molecular dynamics of protein-ligand complexes. It discusses conditional generation on physical properties, conformation generation, and fragment-based drug design. By summarizing the state-of-the-art diffusion models for 3D molecular generation, this review highlights their role in advancing drug discovery and their current limitations.
Diffusion models are probabilistic generative models that add noise to distort data and then reverse the process to generate samples. Current research focuses on three main formulations: denoising diffusion probabilistic models (DDPMs), score-based generative models (SGMs), and models motivated by stochastic differential equations (score SDEs). DDPMs are the most popular for molecular generation, but other formulations are also used.
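The noise-adding half of this process can be sketched in a few lines. The example below is a minimal illustration, not the implementation of any model covered by the review: it assumes the standard DDPM closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, a linear variance schedule, and hypothetical helper names (`linear_beta_schedule`, `cumulative_alphas`, `forward_noise`). The schedule endpoints and the toy coordinates are illustrative assumptions.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T (assumed linear schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in one shot and return the noise used."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps


# Toy input: flattened 3D coordinates of three atoms (illustrative values).
rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
x0 = [0.0, 0.0, 0.0, 0.96, 0.0, 0.0, -0.24, 0.93, 0.0]
t_last = len(alpha_bars) - 1
x_t, eps = forward_noise(x0, t_last, alpha_bars, rng)
```

At the final step `alpha_bars[t_last]` is close to zero, so `x_t` is almost pure Gaussian noise. A denoising network would then be trained to recover `eps` (or, equivalently up to scaling, the score) from `x_t` and `t`; that network is outside the scope of this sketch.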
The denoising diffusion probabilistic model (DDPM) incorporates two Markov chains: a forward chain that transforms data into Gaussian noise and a reverse chain that converts noise back to data by learning denoising transformations, recovering the data gradually. DDPMs are trained by minimizing the Kullback-Leibler divergence between the forward and reverse Markov chains. Score-based generative models (SGMs) perturb data with a sequence of Gaussian noise of increasing intensity and train a deep neural network to predict the score function, that is, the gradient of the log probability density of the noisy data. The training objective for SGMs is equivalent to that of DDPMs. Score SDEs extend DDPMs and SGMs to infinitely many time steps or noise levels: stochastic differential equations are used for both noise perturbation and sample generation, and denoising is accomplished by estimating the score function of the noisy data distributions.
Molecular representations include SMILES and 2D/3D graphs. SMILES is a notation that translates molecular structures into one-dimensional strings, while molecular graph representations describe a molecule as a graph. Both atom types and bond types are encoded using one-hot embeddings. Essential requirements for diffusion models in molecular graph generation include, among others, E(3) invariance and SE(3) equivariance, permutation-invariant graph generation, accounting for discreteness, and capturing the underlying data distributions.
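Two of the ideas above, one-hot encoding of atom types and invariance to rigid motions, can be illustrated with a small sketch. The atom vocabulary, the helper names, and the water-like coordinates below are assumptions chosen for illustration. The check covers only one rotation about the z axis plus a translation; pairwise distances are also unchanged by reflections, which is why they are a common E(3)-invariant input.

```python
import math

ATOM_TYPES = ["H", "C", "N", "O", "F"]  # hypothetical vocabulary


def one_hot(symbol, vocabulary):
    """Encode a categorical label as a one-hot vector over the vocabulary."""
    if symbol not in vocabulary:
        raise ValueError(f"unknown symbol: {symbol}")
    return [1.0 if symbol == v else 0.0 for v in vocabulary]


def pairwise_distances(coords):
    """Matrix of Euclidean distances between all pairs of atoms."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def rotate_z_and_translate(coords, angle, shift):
    """Apply a rotation about the z axis followed by a translation."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


# Toy water-like geometry (illustrative coordinates, in angstroms).
symbols = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

features = [one_hot(s, ATOM_TYPES) for s in symbols]
moved = rotate_z_and_translate(coords, angle=0.7, shift=(1.0, -2.0, 3.0))

# Largest change in any pairwise distance after the rigid motion.
max_change = max(
    abs(d0 - d1)
    for row0, row1 in zip(pairwise_distances(coords), pairwise_distances(moved))
    for d0, d1 in zip(row0, row1)
)
```

Here `max_change` stays at floating-point noise level even though every coordinate has moved. Bond types could be encoded the same way by calling `one_hot` with a bond vocabulary.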