September 14, 2011 | Matthias Rupp, Alexandre Tkatchenko, Klaus-Robert Müller, O. Anatole von Lilienfeld
This paper introduces a machine learning (ML) model that predicts atomization energies of organic molecules from nuclear charges and atomic positions alone. The model maps the quantum mechanical problem of solving the Schrödinger equation onto a non-linear statistical regression problem. It is trained on atomization energies calculated with hybrid density-functional theory (DFT) and reaches a mean absolute error of ~10 kcal/mol in cross-validation over more than 7,000 small organic molecules. The model can also predict molecular atomization potential energy curves, and a prediction takes milliseconds rather than the hours a conventional CPU needs for the reference calculation.
The model represents each molecule by a "Coulomb" matrix, which encodes both nuclear charges and atomic positions. The matrix is invariant to translations and rotations of the molecule; the paper obtains invariance to atom permutations by comparing molecules through the sorted eigenvalues of their Coulomb matrices, which makes the representation suitable for statistical modeling. The ML model is trained with kernel ridge regression and is more accurate than bond counting and semi-empirical quantum chemistry methods. Its performance is assessed by cross-validation, and it is shown to transfer to molecules outside the training set.
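The representation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the definition given in the original paper (diagonal entries 0.5·Z_i^2.4, off-diagonal entries Z_i·Z_j/|R_i − R_j|, in atomic units) and assumes eigenvalues are ordered by decreasing absolute value and zero-padded to a common length. The function names and the water-like geometry are hypothetical.

```python
import numpy as np

def coulomb_matrix(charges, positions):
    """Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal, Z_i * Z_j / |R_i - R_j| off it."""
    Z = np.asarray(charges, dtype=float)
    R = np.asarray(positions, dtype=float)
    # Pairwise interatomic distances, shape (n_atoms, n_atoms).
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder so the division below never hits zero
    M = np.outer(Z, Z) / dist
    np.fill_diagonal(M, 0.5 * Z ** 2.4)  # overwrite the placeholder diagonal
    return M

def sorted_eigenvalues(M, size):
    """Eigenvalues ordered by decreasing absolute value, zero-padded to `size` (assumed convention)."""
    eig = np.linalg.eigvalsh(M)  # M is real and symmetric
    eig = eig[np.argsort(-np.abs(eig))]
    return np.pad(eig, (0, size - len(eig)))

# Hypothetical water-like geometry (O, H, H); coordinates are illustrative only.
Z = [8, 1, 1]
R = np.array([[0.0, 0.0, 0.0], [1.8, 0.0, 0.0], [-0.45, 1.75, 0.0]])
descriptor = sorted_eigenvalues(coulomb_matrix(Z, R), size=5)

# Rotate, translate, and reorder the atoms: the descriptor should not change.
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
perm = [2, 0, 1]
R_moved = (R @ Q.T + 3.0)[perm]
Z_moved = [Z[i] for i in perm]
descriptor_moved = sorted_eigenvalues(coulomb_matrix(Z_moved, R_moved), size=5)
print(np.allclose(descriptor, descriptor_moved))
```

Zero-padding lets molecules with different numbers of atoms share one descriptor length, so they can be compared with an ordinary Euclidean distance.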
The model is tested on a diverse set of molecules drawn from the GDB database, spanning different chemical structures and compositions. It predicts atomization energies for molecules not used in training, which demonstrates that it applies across chemical compound space and not only to the training set. It also accurately predicts atomization energy curves for geometries away from equilibrium, across molecules with different bond types and functional groups.
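To make the train-then-predict-on-unseen-molecules procedure concrete, here is a minimal kernel ridge regression sketch with a held-out evaluation. It is an illustration under stated assumptions, not the paper's setup: the Gaussian kernel, the values of `sigma` and `lam`, and the synthetic data standing in for molecular descriptors and energies are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """k(a, b) = exp(-||a - b||^2 / (2 * sigma^2)) for every pair of rows of A and B."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def krr_fit(X, y, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression weights alpha."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma):
    """Prediction is a kernel-weighted sum over the training points."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Synthetic stand-in for (descriptor, energy) pairs; NOT molecular data.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 - 0.5 * X[:, 2]

# Hold out the last 50 points: they play the role of molecules unseen in training.
train, test = np.arange(150), np.arange(150, 200)
alpha = krr_fit(X[train], y[train], sigma=1.0, lam=1e-6)
pred = krr_predict(X[train], alpha, X[test], sigma=1.0)

mae = np.mean(np.abs(pred - y[test]))
baseline = np.mean(np.abs(y[train].mean() - y[test]))  # always predict the training mean
print(f"held-out MAE: {mae:.4f}  (mean-predictor baseline: {baseline:.4f})")
```

In practice `sigma` and `lam` would be chosen by cross-validation on the training data only, so that the held-out error remains an honest estimate of performance on new molecules.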
The results indicate that the ML approach is highly accurate and efficient, with potential applications in rational compound design, geometrical relaxations, chemical reactions, and molecular dynamics. The study highlights the potential of ML in modeling molecular properties and suggests that the Coulomb matrix or its improvements could be useful as descriptors in various applications.