13 March 2024 | Shuan Chen, Sunggi An, Ramil Babazade, Yousung Jung
Atom-to-atom mapping (AAM) is crucial for understanding chemical reaction mechanisms and for improving the accuracy of machine learning (ML) models in retrosynthesis and reaction outcome prediction. Existing methods often rely on substructure alignments rather than chemical knowledge, which can lead to incorrect mappings. To address this, the authors present LocalMapper, an ML model that learns precise AAMs from chemist-labeled reactions through human-in-the-loop machine learning. LocalMapper achieves 98.5% calibrated accuracy after learning from chemist-labeled AAMs for only 2% of the reactions in the USPTO-50K dataset. Notably, its confident predictions, which cover 97% of the reactions, are 100% accurate on 3,000 randomly sampled reactions. In an out-of-distribution experiment, LocalMapper outperforms existing methods.
The model's ability to generate reliable AAMs is expected to enhance the quality of future ML-based reaction prediction models. The key contributions include a knowledge-based uncertainty identification method, state-of-the-art AAM prediction accuracy, and favorable performance in out-of-distribution tests.
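To make the notion of an AAM concrete, the following is a minimal sketch of how a mapping is commonly written in reaction SMILES, where each atom carries a map number such as `:1` that ties a reactant atom to the same atom in the product. It is an illustration only and is not taken from LocalMapper: the helper names `map_numbers` and `check_mapping` are hypothetical, and the example reaction (Fischer esterification of acetic acid with methanol) was chosen here because the mapping of the ester oxygen depends on chemical knowledge rather than on substructure similarity.

```python
import re

# Matches the map number inside a bracketed atom, e.g. the 1 in [CH3:1].
MAP_NUMBER = re.compile(r":(\d+)\]")

# Illustrative mapped reaction (an assumption of this sketch, not from the paper).
# The ester oxygen is mapped to the alcohol oxygen (:6); the acid OH (:4)
# leaves as water and therefore has no partner in the product.
RXN = (
    "[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6]"
    ">>"
    "[CH3:1][C:2](=[O:3])[O:6][CH3:5]"
)


def map_numbers(smiles: str) -> set:
    """Return the set of atom-map numbers found in a SMILES string."""
    return {int(n) for n in MAP_NUMBER.findall(smiles)}


def check_mapping(reaction: str) -> dict:
    """Compare map numbers on the reactant and product sides of a reaction SMILES."""
    reactants, _agents, products = reaction.split(">")
    reactant_maps = map_numbers(reactants)
    product_maps = map_numbers(products)
    return {
        # Product atoms whose map number never appears in the reactants.
        "unmapped_in_reactants": sorted(product_maps - reactant_maps),
        # Reactant atoms that do not appear in the recorded product (leaving groups).
        "lost_in_products": sorted(reactant_maps - product_maps),
    }


if __name__ == "__main__":
    print(check_mapping(RXN))
```

A purely structure-based aligner could just as easily pair the ester oxygen with the acid oxygen (`:4`), since both candidates look alike locally; this is the kind of ambiguity that chemist-labeled mappings are meant to resolve.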