Understanding Sample efficient reinforcement learning with active learning for molecular design

This article presents an approach that combines reinforcement learning (RL) with active learning (AL) to accelerate molecular design. The method, called RL-AL, improves sample efficiency by using AL to choose which generated compounds are evaluated with the oracle function. A surrogate model predicts oracle scores, and an acquisition function applied to those predictions selects the compounds that are sent to the oracle. Compared with traditional RL methods, RL-AL generates 5–66-fold more hits for a fixed oracle budget and needs 4–64-fold less computational time to find a specific number of hits. The compounds it discovers are substantially enriched in the multi-parameter scoring objective, so high-scoring compounds are curated more effectively without reducing output diversity.

RL-AL is designed to work with various oracle functions, and the study demonstrates improved oracle-call efficiency with docking and ROCS (Rapid Overlay of Chemical Structures). The method is particularly suited to computationally expensive oracles such as free energy perturbation methods, which have been largely overlooked in RL because of their cost. It is also applicable to other domains where oracle experiments or simulations are costly or time-consuming. The study highlights the importance of balancing exploration and exploitation, and shows that the system can be tuned through the size of the AL batch, the number of AL loops per RL epoch, and the ratio of AL to RL.
The optimal configuration was found to be a zero-weight update for surrogate-predicted compounds, with a batch size of 512 and an AL/RL ratio of 0.125. This configuration increased the number of hits 19.94-fold for the ROCS oracle and 66.46-fold for the docking oracle. The study also explores how different acquisition strategies affect the performance of the RL-AL system, showing that a probabilistic formulation of the MPO (multiparameter optimization) score can significantly improve the MPO scores obtained. This allows the system to better satisfy a more balanced MPO profile, which is essential for the successful design of new compounds. Overall, the results demonstrate that RL-AL substantially reduces the compute resources required to produce the same number and quality of hits, and that it could enable the incorporation of even more accurate and expensive physics-based methods in molecular design.
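
The selection step described above (a surrogate predicts oracle scores, an acquisition function ranks the candidates, and only the chosen compounds reach the oracle) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the `surrogate` and `oracle` callables, and the use of an upper-confidence-bound (UCB) acquisition are hypothetical and are not taken from the article, which may use a different surrogate and acquisition function.

```python
from typing import Callable, List, Sequence, Tuple


def ucb_scores(means: Sequence[float], stds: Sequence[float],
               beta: float = 1.0) -> List[float]:
    """UCB acquisition (an assumed choice): predicted score plus an uncertainty bonus."""
    return [m + beta * s for m, s in zip(means, stds)]


def select_batch(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the k highest acquisition scores (ties go to the lower index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:k]


def al_round(candidates: Sequence[str],
             surrogate: Callable[[Sequence[str]], Tuple[List[float], List[float]]],
             oracle: Callable[[str], float],
             k: int,
             beta: float = 1.0) -> List[Tuple[str, float]]:
    """One hypothetical AL round: rank candidates with the surrogate, then
    call the expensive oracle only on the k selected compounds."""
    means, stds = surrogate(candidates)
    picked = select_batch(ucb_scores(means, stds, beta), k)
    return [(candidates[i], oracle(candidates[i])) for i in picked]


if __name__ == "__main__":
    # Toy stand-ins: fixed surrogate predictions and a constant-valued oracle.
    toy_candidates = ["cpd_a", "cpd_b", "cpd_c", "cpd_d"]
    toy_surrogate = lambda cs: ([0.2, 0.9, 0.5, 0.4], [0.0, 0.0, 0.6, 0.1])
    toy_oracle = lambda c: 1.0
    print(al_round(toy_candidates, toy_surrogate, toy_oracle, k=2))
```

In this sketch `beta` controls the exploration/exploitation balance: `beta = 0` ranks purely by predicted score, while larger values favour compounds the surrogate is uncertain about. The oracle-labelled pairs returned by `al_round` would then be used to retrain the surrogate before the next round.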

Sample efficient reinforcement learning with active learning for molecular design

2024 | Michael Dodds, Jeff Guo, Thomas Löhr, Alessandro Tibo, Ola Engkvist and Jon Paul Janet