29 February 2024 | Hao Chen, Frederick J. King, Bin Zhou, Yu Wang, Carter J. Canedy, Joel Hayashi, Yang Zhong, Max W. Chang, Lars Pache, Julian L. Wong, Yong Jia, John Joslin, Tao Jiang, Christopher Benner, Sumit K. Chanda & Yingyao Zhou
This study introduces FRoGS, a deep learning approach that represents gene signatures by their biological functions rather than their identities, much as word2vec represents words by their meaning in natural language processing. A deep learning model trained on FRoGS representations predicts compound-target interactions from the Broad Institute's L1000 datasets and outperforms models based on gene identities alone. Integrating additional pharmacological data sources significantly increases the number of high-quality compound-target predictions, many of which are supported by in silico or experimental evidence. FRoGS demonstrates general utility in machine learning-based bioinformatics applications, enabling the discovery of new relationships among gene signatures from large-scale OMICS studies.
FRoGS extracts weak pathway signals from gene signatures by encoding genes into high-dimensional vectors that reflect their functional roles. This approach outperforms traditional methods that rely on gene identities, as demonstrated by the ability to detect shared functionality between perturbation signatures. FRoGS vectors were validated using t-SNE projections, showing that genes are grouped based on their functions, similar to how synonyms are co-located in word2vec embeddings.
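The contrast between identity-based and function-based comparison can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the gene names and 3-dimensional vectors are invented, and mean pooling with cosine similarity is a generic stand-in for signature aggregation, not necessarily how FRoGS combines gene vectors.

```python
import math

# Hypothetical toy embeddings (invented for illustration, not FRoGS vectors).
# GENE_A* share one functional role, GENE_B* share another.
EMBEDDINGS = {
    "GENE_A1": [0.90, 0.10, 0.00],
    "GENE_A2": [0.80, 0.20, 0.10],
    "GENE_A3": [0.85, 0.15, 0.05],
    "GENE_A4": [0.70, 0.30, 0.00],
    "GENE_B1": [0.00, 0.10, 0.90],
    "GENE_B2": [0.10, 0.00, 0.80],
}


def jaccard(sig_a, sig_b):
    """Identity-based similarity: fraction of gene names shared by two signatures."""
    a, b = set(sig_a), set(sig_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def signature_vector(signature):
    """Mean-pool the gene embeddings of a signature (an assumed aggregation)."""
    vectors = [EMBEDDINGS[gene] for gene in signature]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


sig_1 = ["GENE_A1", "GENE_A2"]  # function "A"
sig_2 = ["GENE_A3", "GENE_A4"]  # function "A", but no gene in common with sig_1
sig_3 = ["GENE_B1", "GENE_B2"]  # function "B"

identity_overlap = jaccard(sig_1, sig_2)
same_function = cosine(signature_vector(sig_1), signature_vector(sig_2))
different_function = cosine(signature_vector(sig_1), signature_vector(sig_3))

print(f"Jaccard(sig_1, sig_2)        = {identity_overlap:.2f}")
print(f"cosine(sig_1, sig_2) pooled  = {same_function:.2f}")
print(f"cosine(sig_1, sig_3) pooled  = {different_function:.2f}")
```

In this toy setup the two signatures that share a function have no genes in common, so an identity-based measure sees no relationship, while the pooled embeddings remain close. That is the kind of weak shared signal the paragraph above describes.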
FRoGS significantly improves the recall of known compound targets, reaching 36.3% for compound-target predictions and outperforming the other methods it was compared against. FRoGS also predicts compound targets that are supported by structure and activity data sources, demonstrating that it can identify novel targets without relying on chemical structure features.
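For readers unfamiliar with the metric, the sketch below shows one common way to compute recall of known targets from ranked predictions. It rests on assumptions: the summary does not state the ranking cutoff or the exact recall definition used in the study, so the top-k criterion, the function name `recall_at_k`, and all compound and target names here are hypothetical.

```python
def recall_at_k(ranked_predictions, known_targets, k):
    """Fraction of compounds whose known target appears in their top-k predictions.

    ranked_predictions: dict mapping compound -> list of targets, best first.
    known_targets: dict mapping compound -> its single known target.
    """
    if not known_targets:
        return 0.0
    hits = 0
    for compound, target in known_targets.items():
        top_k = ranked_predictions.get(compound, [])[:k]
        if target in top_k:
            hits += 1
    return hits / len(known_targets)


# Hypothetical toy data: four compounds, each with three ranked candidate targets.
ranked = {
    "cpd_1": ["T1", "T2", "T3"],
    "cpd_2": ["T3", "T1", "T2"],
    "cpd_3": ["T2", "T3", "T1"],
    "cpd_4": ["T1", "T3", "T2"],
}
known = {"cpd_1": "T1", "cpd_2": "T2", "cpd_3": "T3", "cpd_4": "T2"}

for k in (1, 2, 3):
    print(f"recall@{k} = {recall_at_k(ranked, known, k):.2f}")
```

A stricter cutoff lowers recall and a looser one raises it, which is why a single recall figure is only meaningful alongside the cutoff it was measured at.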
The FRoGS-based model, Model L, outperforms activity-based target prediction models, including pQSAR, in both recall and accuracy. It predicts a high-quality compound-target network of 1,598 compounds, 682 genes, and 146,749 associations. The model's predictions are supported by multiple orthogonal data sources, including experimental data, structure similarity, and activity profiles.
FRoGS enables the discovery of ligands for the aryl hydrocarbon receptor (AhR): the model predicted 369 compounds potentially targeting AhR, of which 333 were confirmed to be AhR agonists or antagonists, demonstrating its effectiveness in identifying novel targets. The study highlights the potential of FRoGS to improve compound target prediction and advance drug discovery by integrating functional information into machine learning models.