2024 | Michael Dodds, Jeff Guo, Thomas Löhr, Alessandro Tibo, Ola Engkvist and Jon Paul Janet
This paper introduces an active learning (AL) system integrated with a reinforcement learning (RL) model (RL-AL) for molecular design, with the aim of improving the sample efficiency of the optimization process. The authors address the challenge of bridging the gap between simulated episodes in computer games and real scientific problems with complex environments, particularly in drug discovery. They demonstrate that their approach significantly accelerates the search for novel solutions compared to baseline RL methods, achieving a 5- to 66-fold increase in hits generated for a fixed oracle budget and a 4- to 64-fold reduction in computational time. The compounds discovered through RL-AL show substantial enrichment in a multi-parameter scoring objective, indicating superior efficacy in curating high-scoring compounds without a reduction in output diversity. The method is applicable to various oracle functions, including free energy perturbation methods, and can be used in any RL domain. The study also explores the interplay between RL and AL, identifies unique challenges, and develops a novel AL approach to solve the multiparameter optimization (MPO) problem.
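To make the idea of a "fixed oracle budget" concrete, the following is a minimal toy sketch of an active learning loop in which a cheap surrogate decides which candidates are sent to an expensive oracle. It is not the authors' implementation: the one-dimensional candidates, the `oracle` function, the nearest-neighbour `surrogate`, the upper-confidence-bound `acquisition` rule, and all parameter names are assumptions chosen for illustration, and the RL policy update that the paper couples with AL is omitted entirely (candidates are drawn from a fixed random sampler instead).

```python
import random


def oracle(x):
    # Hypothetical stand-in for an expensive scoring call (e.g. a physics-based simulation).
    # The score peaks at x = 0.7 and lies in [0.3, 1.0] for x in [0, 1).
    return 1.0 - abs(x - 0.7)


def surrogate(x, labeled):
    # 1-nearest-neighbour surrogate: returns (predicted score, distance-based uncertainty).
    nearest_x, nearest_y = min(labeled, key=lambda p: abs(p[0] - x))
    return nearest_y, abs(nearest_x - x)


def acquisition(x, labeled, beta):
    # Upper-confidence-bound style rule: predicted score plus weighted uncertainty.
    mean, uncertainty = surrogate(x, labeled)
    return mean + beta * uncertainty


def run_al_loop(budget=20, rounds=5, pool_size=200, beta=1.0, seed=0):
    # Spends at most `budget` oracle calls, split evenly across `rounds`.
    rng = random.Random(seed)
    per_round = budget // rounds
    labeled = []        # (candidate, oracle score) pairs collected so far
    oracle_calls = 0
    for _ in range(rounds):
        # Stand-in for sampling candidates from a generative model (assumption).
        pool = [rng.random() for _ in range(pool_size)]
        if labeled:
            # Rank the pool with the cheap surrogate; the first round has no data
            # yet, so it simply takes the first samples.
            pool.sort(key=lambda x: acquisition(x, labeled, beta), reverse=True)
        for x in pool[:per_round]:
            labeled.append((x, oracle(x)))   # only the selected candidates cost an oracle call
            oracle_calls += 1
    return labeled, oracle_calls


if __name__ == "__main__":
    labeled, calls = run_al_loop()
    best_x, best_score = max(labeled, key=lambda p: p[1])
    print(f"oracle calls: {calls}, best candidate: {best_x:.3f}, score: {best_score:.3f}")
```

In this sketch the surrogate screens 200 candidates per round but only four of them reach the oracle, which is the sense in which an AL layer can raise the number of hits obtained per oracle call. How the surrogate's predictions feed back into the RL agent is the part the paper studies and is not modeled here.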