2 January 2024 | Emma King-Smith, Simon Berritt, Louise Bernier, Xinjun Hou, Jacquelyn L. Klug-McLeod, Jason Mustakis, Neal W. Sach, Joseph W. Tucker, Qingyi Yang, Roger M. Howard & Alpha A. Lee
The article introduces a high-throughput experimentation (HTE) analyzer, a robust and statistically rigorous framework, designed to interpret large-scale HTE datasets and uncover hidden chemical insights. The framework, named HITEA, is applicable to any HTE dataset, regardless of size or scope, and yields interpretable correlations between starting materials, reagents, and outcomes. The authors disclose over 39,000 previously proprietary HTE reactions, covering a broad range of chemistry, including cross-coupling reactions and chiral salt resolutions. HITEA was validated on cross-coupling and hydrogenation datasets, revealing statistically significant hidden relationships between reaction components and outcomes, as well as highlighting areas of dataset bias and specific reaction spaces requiring further investigation.
The HITEA methodology is built on three orthogonal statistical analyses: random forests, Z-score ANOVA–Tukey, and principal component analysis (PCA). Respectively, they address which variables matter most for the reaction outcome, which reagents are best-in-class or worst-in-class, and how those reagents populate chemical space. The authors demonstrate the flexibility of HITEA by analyzing datasets ranging from 1,000 to 3,000 reactions that cover a wide range of substrates and reaction types.
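To make the Z-score step more concrete, here is a minimal sketch of one plausible reading of it: yields are normalized within each substrate so that easy and hard substrates become comparable, and reagents are then ranked by their mean z-score. The records, the ligand names, and the function names are hypothetical and are not taken from the article or its code. The ANOVA–Tukey significance testing, the random forests, and the PCA are deliberately left out.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical toy records: (substrate, ligand, yield in %).
# These values are invented for illustration only.
REACTIONS = [
    ("aryl_bromide", "L1", 80.0),
    ("aryl_bromide", "L2", 60.0),
    ("aryl_bromide", "L3", 40.0),
    ("aryl_chloride", "L1", 30.0),
    ("aryl_chloride", "L2", 20.0),
    ("aryl_chloride", "L3", 10.0),
]


def z_scores_by_substrate(records):
    """Convert each yield to a z-score relative to its own substrate."""
    yields = defaultdict(list)
    for substrate, _, y in records:
        yields[substrate].append(y)
    # Population mean and standard deviation per substrate.
    stats = {s: (mean(v), pstdev(v)) for s, v in yields.items()}
    scored = []
    for substrate, reagent, y in records:
        mu, sigma = stats[substrate]
        # Assumption: a substrate with no spread contributes a z-score of 0.
        z = (y - mu) / sigma if sigma > 0 else 0.0
        scored.append((reagent, z))
    return scored


def rank_reagents(records):
    """Return (reagent, mean z-score) pairs, best reagent first."""
    per_reagent = defaultdict(list)
    for reagent, z in z_scores_by_substrate(records):
        per_reagent[reagent].append(z)
    means = {r: mean(zs) for r, zs in per_reagent.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


ranking = rank_reagents(REACTIONS)
for reagent, z in ranking:
    print(f"{reagent}: mean z = {z:+.2f}")
```

In this toy data the raw yields for the aryl chloride are all lower than those for the aryl bromide, yet each ligand keeps the same relative position after normalization. That comparison across substrates of differing reactivity is what the z-score is meant to enable. A full analysis would still need a significance test before calling any reagent best-in-class.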
The article also discusses the potential applications of HITEA in mechanistic interrogation, bias identification for machine learning, and future HTE screens. HITEA can provide valuable insights for reaction optimization, help identify biases in datasets, and guide future HTE screens. The authors call for the chemical community to collect, publish, and analyze more HTE data to explore uncharted territories of the chemical reactome.