Machine-Learned Potentials by Active Learning from Organic Crystal Structure Prediction Landscapes

January 26, 2024 | Patrick W. V. Butler, Roohollah Hafizi, and Graeme M. Day*
This article presents a method for improving the efficiency of organic molecular crystal structure prediction (CSP) by using machine-learned interatomic potentials (MLIPs) trained through active learning. The goal is to reduce the computational cost of high-level ab initio calculations, which are typically too expensive for large-scale CSP. MLIPs, trained to reproduce the results of ab initio methods at far lower cost, can rank crystal structures efficiently.

The study combines active learning with well-established CSP sampling methods to generate, in a highly automated way, potentials that are relevant across a wide range of crystal packing space. These potentials are demonstrated to efficiently rerank large, diverse crystal structure landscapes from force-field-based CSP to near-DFT accuracy, improving the reliability of the final energy ranking. The potentials are further extended to model structures far from lattice energy minima through additional on-the-fly training within Monte Carlo (MC) simulations.

The study investigates how best to develop MLIPs, specifically neural network potentials (NNPs), from organic CSP landscapes. It begins by examining active learning on a CSP landscape of oxalic acid, investigating the effects of hyperparameters and selection strategies on the size and quality of the training set. From this, an efficient approach combining active learning with Δ-learning is identified. The approach is then demonstrated by correcting the CSP landscapes of resorcinol and triptycene-tris(benzimidazolone) (TTBI), each containing thousands of structures, to the DFT level. Finally, the potentials are extended to describe structures far from the CSP minima by combining on-the-fly training with Monte Carlo simulations. The results show that combining active learning with Δ-learning significantly reduces the training set size required to reach a given accuracy, reducing computational cost.
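The Δ-learning idea described above can be sketched as follows: the model is fitted to the difference between high-level (DFT) and low-level (force field) energies, so a prediction is the cheap baseline energy plus a learned correction. The sketch below uses a linear least-squares model and synthetic data purely as stand-ins for the paper's NNP and CSP structures:

```python
import numpy as np

# Hypothetical data: descriptors X, low-level (force field) energies e_ff,
# and high-level (DFT) energies e_dft for a set of crystal structures.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))              # placeholder structure descriptors
w_ff = np.array([0.5, -1.0, 0.25, 2.0])
e_ff = X @ w_ff                           # baseline force-field energies
delta_true = 0.1 * X[:, 0]                # systematic FF error to be learned
e_dft = e_ff + delta_true                 # reference DFT energies

# Delta-learning: fit the model to the DIFFERENCE between the two levels of
# theory rather than to the DFT energies directly. A linear least-squares
# model stands in here for the neural network potential.
target = e_dft - e_ff
w_delta, *_ = np.linalg.lstsq(X, target, rcond=None)

def predict(x_new, e_ff_new):
    """Force-field energy plus the learned correction."""
    return e_ff_new + x_new @ w_delta
```

Because the correction is typically much smoother than the total energy, the difference is easier to learn, which is why the training set can be kept small.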
The study also demonstrates that the active learning workflow applies to other diverse, large-scale landscapes, highlighting the benefit of the correction even for landscapes where the low-level method is initially thought to perform reasonably well. The potentials achieve good accuracy across the entire landscape, with only a small number of structures having high uncertainties. On-the-fly training within MC simulations further improves the description of the energy surface, reducing the number of high-uncertainty structures and improving the accuracy of the potential.

The results indicate that the workflow is generally useful for organic CSP and can help address the often prohibitive costs associated with the DFT ranking of predicted structures. The methodology can be applied to existing legacy or published landscapes as well as to new CSP studies, although a "good enough" baseline model is required. The potentials generated with these methods are reliable at local minima on the lattice energy surface and accurate in their local region. However, properties and behaviors that require a broader description of the lattice energy surface, such as transitions between polymorphs, may require the potential to extrapolate beyond its training, risking a loss of accuracy.
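Uncertainty-driven selection of the kind used in active learning is commonly implemented with an ensemble (committee) of models: structures on which the ensemble disagrees are the ones sent for DFT labelling and added to the training set. A minimal sketch, with illustrative energies and an illustrative threshold (neither taken from the paper):

```python
import numpy as np

# Hypothetical predictions from a 3-model ensemble for three structures.
# Disagreement between ensemble members serves as the uncertainty estimate.
ensemble_energies = np.array([
    [10.1, 10.0, 10.2],   # structure A: models agree  -> low uncertainty
    [12.0, 15.5, 9.8],    # structure B: models differ -> high uncertainty
    [11.3, 11.2, 11.3],   # structure C: low uncertainty
])
uncertainty = ensemble_energies.std(axis=1)
threshold = 0.5                                  # illustrative cutoff
to_label = np.where(uncertainty > threshold)[0]  # indices to send to DFT
```

Only structure B would be labelled in this toy example; iterating this select-label-retrain loop over a CSP landscape is what keeps the final training set small.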
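On-the-fly training within Monte Carlo sampling can be sketched as a Metropolis loop that diverts high-uncertainty configurations to a labelling queue rather than trusting the model there. The helpers `energy`, `uncertainty`, and `perturb`, and the threshold, are hypothetical placeholders, not the paper's implementation:

```python
import math
import random

def mc_with_active_learning(x0, n_steps, beta, energy, uncertainty,
                            perturb, threshold):
    """Metropolis MC that flags configurations the model cannot trust.

    Flagged configurations would be labelled with DFT and used to retrain
    the potential; here they are simply collected and returned.
    """
    x, e = x0, energy(x0)
    to_label = []
    for _ in range(n_steps):
        x_new = perturb(x)
        if uncertainty(x_new) > threshold:
            to_label.append(x_new)   # queue for DFT labelling / retraining
            continue                 # do not accept untrusted moves
        e_new = energy(x_new)
        # Standard Metropolis acceptance criterion
        if e_new <= e or random.random() < math.exp(-beta * (e_new - e)):
            x, e = x_new, e_new
    return x, to_label
```

Because sampling visits strained, off-minimum configurations that never appear among CSP minima, this loop is what extends the potential's validity away from the lattice energy minima.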