05 August 2014 | Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp & O. Anatole von Lilienfeld
The article presents a comprehensive computational study of 134,000 stable small organic molecules composed of CHONF atoms, selected from the GDB-17 chemical universe database. These molecules comprise the subset of all species with up to nine heavy atoms (C, O, N, F) among the 166 billion organic molecules in the database. The study aims to provide a rigorous and unbiased exploration of chemical compound space, which is crucial for the computational design of new drugs and materials. The calculated properties include equilibrium geometries, energies, electronic properties, and thermodynamic data. The calculations were performed at the B3LYP/6-31G(2df,p) level of quantum chemistry, with more accurate G4MP2 results provided for the isomers of the predominant stoichiometry, C₇H₁₀O₂.
The data set is intended to serve as a benchmark for method validation, the development of new methods like hybrid quantum mechanics/machine learning, and the systematic identification of structure-property relationships. The article also discusses the validation of the results through comparisons with more accurate theoretical methods and the assessment of geometry consistency using InChI identifiers.
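The composition criterion described above (only C, H, O, N, and F atoms, with at most nine heavy atoms) can be illustrated with a minimal Python sketch. The function names and the plain-ASCII formula format are assumptions made for this illustration, not part of the article or of GDB-17. The check covers elemental composition only and says nothing about the stability and feasibility rules the database itself applies.

```python
import re

# Elements allowed in the subset, as stated in the summary (CHONF).
ALLOWED_ELEMENTS = {"C", "H", "O", "N", "F"}
# Upper bound on non-hydrogen atoms, as stated in the summary.
MAX_HEAVY_ATOMS = 9


def parse_formula(formula):
    """Parse a plain molecular formula such as 'C7H10O2' into element counts."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def heavy_atom_count(formula):
    """Count every atom that is not hydrogen."""
    return sum(n for element, n in parse_formula(formula).items() if element != "H")


def in_subset(formula):
    """Return True if the formula satisfies the composition criterion only."""
    counts = parse_formula(formula)
    if not counts:
        return False
    only_allowed = set(counts) <= ALLOWED_ELEMENTS
    return only_allowed and heavy_atom_count(formula) <= MAX_HEAVY_ATOMS
```

For example, `in_subset("C7H10O2")` is true because the formula has exactly nine heavy atoms, while `in_subset("C10H22")` is false because it has ten, and `in_subset("CH3Cl")` is false because chlorine is outside the allowed element set.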