Analyzing Learned Molecular Representations for Property Prediction

20 Nov 2019 | Kevin Yang, Kyle Swanson, Wengong Jin, Connor Coley, Philipp Eiden, Hua Gao, Angel Guzman-Perez, Timothy Hopper, Brian Kelley, Miriam Mathea, Andrew Palmer, Volker Settels, Tommi Jaakkola, Klavs Jensen, and Regina Barzilay
This paper evaluates a graph convolutional model, the directed message passing neural network (D-MPNN), for molecular property prediction and compares it with existing models on both public and proprietary datasets. The D-MPNN centers messages on directed bonds rather than on atoms, which avoids unnecessary loops during message passing: a message sent along a bond is not passed straight back to the atom it came from. The model also combines computed molecule-level features with the molecular representation learned by the MPNN. It is evaluated on 19 public and 16 proprietary datasets covering a wide range of chemical endpoints, and it outperforms existing baselines on most of them, including those from Amgen, Novartis, and BASF.
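The directed-bond idea can be illustrated with a small sketch. This is not the authors' implementation: it shows only the aggregation step, with scalar hidden states and no learned weights, and the function name `directed_bond_messages` and the toy values are hypothetical. Each directed bond (v -> w) collects the states of bonds (k -> v) from every neighbor k of v except w, so nothing flows straight back along the same bond.

```python
def directed_bond_messages(hidden):
    """One aggregation step over directed bonds (illustrative only).

    hidden maps a directed bond (v, w) to a scalar hidden state.
    The message for (v, w) sums the states of bonds (k, v) for every
    neighbor k of v, excluding k == w (the reverse bond).
    """
    # For each atom v, list the atoms k that have a bond pointing into v.
    incoming = {}
    for (k, v) in hidden:
        incoming.setdefault(v, []).append(k)

    messages = {}
    for (v, w) in hidden:
        messages[(v, w)] = sum(
            hidden[(k, v)] for k in incoming.get(v, []) if k != w
        )
    return messages


# Toy chain of three atoms, 0 - 1 - 2, with one state per bond direction.
hidden = {(0, 1): 1.0, (1, 0): 2.0, (1, 2): 3.0, (2, 1): 4.0}
messages = directed_bond_messages(hidden)
```

In this toy chain the bond (1 -> 2) receives only the state of (0 -> 1), and the bond (0 -> 1) receives nothing, because the only bond entering atom 0 is its own reverse. A full model would additionally apply learned weights and a nonlinearity at each step and then pool bond states into atom and molecule representations.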
Overall, the D-MPNN performs comparably to or better than existing models on many datasets. The evaluation also uses scaffold-based splits, which better reflect the data splits used in real-world drug discovery. The study highlights the importance of choosing appropriate evaluation methods when assessing how well property prediction models generalize. The results suggest that the D-MPNN is a promising approach for molecular property prediction, with room for further improvements in accuracy and reproducibility.
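A scaffold-based split can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact procedure used in the paper: each molecule is assumed to already carry a precomputed scaffold string (for example a Bemis-Murcko scaffold from a cheminformatics toolkit), and the function name `scaffold_split` and the greedy fill rule are hypothetical choices. The key property is that all molecules sharing a scaffold land in the same subset, so the test set contains scaffolds the model never saw in training.

```python
from collections import defaultdict


def scaffold_split(scaffolds, train_frac=0.8):
    """Split molecule indices so no scaffold appears in both subsets.

    scaffolds: list of scaffold strings, one per molecule (assumed
    precomputed). Returns (train_indices, test_indices).
    """
    # Group molecule indices by their scaffold.
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest groups first; ties broken by scaffold string for determinism.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    train, test = [], []
    train_cap = train_frac * len(scaffolds)
    for _, members in ordered:
        # A whole scaffold group goes to train if it fits, else to test.
        if len(train) + len(members) <= train_cap:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


# Toy data: three benzene-scaffold molecules, two pyridine, one cyclohexane.
scaffolds = ["c1ccccc1", "c1ccccc1", "c1ccncc1", "C1CCCCC1", "c1ccccc1", "c1ccncc1"]
train_idx, test_idx = scaffold_split(scaffolds, train_frac=0.7)
```

With a random split, close analogs of a test molecule would often appear in the training set; grouping by scaffold removes that overlap, which is why such splits give a harder and more realistic estimate of generalization.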