27 July 2015 | Babak Alipanahi, Andrew Delong, Matthew T Weirauch & Brendan J Frey
DeepBind is a deep learning approach for predicting the sequence specificities of DNA- and RNA-binding proteins. It outperforms existing methods, even when models trained on in vitro data are tested on in vivo data. DeepBind uses deep convolutional neural networks to discover new sequence motifs and to predict binding scores. It can process millions of sequences per experiment and provides visualizations such as position weight matrices and mutation maps. DeepBind applies to both microarray and sequencing data, learns from large datasets, generalizes across technologies, tolerates noise and mislabeled training data, and trains fully automatically. It can be used to identify binding sites and to score the effects of mutations.
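The scoring pipeline behind a convolutional binding model can be illustrated with a small pure-Python sketch. It assumes one simple configuration: the sequence is one-hot encoded and padded with uninformative 0.25 columns, each motif detector is slid along the sequence (convolution), responses below a threshold are clamped to zero (rectification), the strongest response per detector is kept (max pooling), and a linear layer combines the pooled values into one binding score. The helper names (`one_hot`, `motif_from_consensus`, `conv`, `score`), the toy GATA detector, the threshold, and the weights are hypothetical illustrations, not DeepBind's trained parameters or API, and the sketch scans only the forward strand.

```python
# A minimal convolutional binding-score sketch (hypothetical toy parameters,
# not trained DeepBind weights). Forward strand only.

ALPHABET = "ACGT"


def one_hot(seq, pad):
    """Encode DNA as rows of 4 values; non-ACGT characters and the `pad`
    columns added at each end are uninformative (0.25 per base)."""
    rows = [[0.25] * 4 for _ in range(pad)]
    for base in seq.upper():
        if base in ALPHABET:
            rows.append([1.0 if base == b else 0.0 for b in ALPHABET])
        else:
            rows.append([0.25] * 4)
    rows.extend([0.25] * 4 for _ in range(pad))
    return rows


def motif_from_consensus(consensus, match=1.0, mismatch=-1.0):
    """Build an m x 4 detector that rewards the consensus base at each position."""
    return [[match if b == c else mismatch for b in ALPHABET] for c in consensus]


def conv(x, motif):
    """Slide an m x 4 detector along x; return one response per offset."""
    m = len(motif)
    return [
        sum(motif[j][k] * x[i + j][k] for j in range(m) for k in range(4))
        for i in range(len(x) - m + 1)
    ]


def score(seq, motifs, thresholds, weights, out_bias=0.0):
    """Convolution -> rectification -> max pooling -> linear output."""
    pooled = []
    for motif, t in zip(motifs, thresholds):
        x = one_hot(seq, pad=len(motif) - 1)
        rectified = [max(0.0, r - t) for r in conv(x, motif)]  # clamp weak matches to 0
        pooled.append(max(rectified))  # keep the strongest response per detector
    return sum(w * p for w, p in zip(weights, pooled)) + out_bias


detectors = [motif_from_consensus("GATA")]
print(score("TTGATATT", detectors, thresholds=[2.0], weights=[1.0]))  # 2.0
print(score("TTTTTTTT", detectors, thresholds=[2.0], weights=[1.0]))  # 0.0
```

With a single detector the score is simply how far the best motif match clears the threshold; a real model learns many detectors, thresholds, and output weights from data instead of fixing them by hand.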
DeepBind was evaluated on protein-binding microarray (PBM), RNAcompete, ChIP-seq, and HT-SELEX data. It outperformed other methods at predicting the sequence specificities of transcription factors and RNA-binding proteins. DeepBind models also performed well on in vivo data, even when trained on in vitro data. DeepBind was further used to analyze disease-associated genetic variants and to identify potential regulatory roles of RNA-binding proteins in alternative splicing.
DeepBind was trained on a combined 12 terabases of sequence data from thousands of experiments. In the DREAM5 challenge it outperformed 26 other methods. It was also applied to in vivo data, including ChIP-seq and CLIP-seq, where it compared favorably with existing methods. DeepBind models trained on in vitro data were able to identify in vivo bound sequences, suggesting that they capture genuine nucleic acid binding properties.
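A common way to quantify how well a model recognizes in vivo bound sequences is the area under the ROC curve (AUC): the probability that a randomly chosen bound sequence (for example, one from a ChIP-seq peak) receives a higher score than a randomly chosen background sequence. The sketch below assumes AUC as the metric and uses made-up scores; `auc` is a hypothetical helper, not part of DeepBind.

```python
def auc(bound_scores, background_scores):
    """Pairwise (Mann-Whitney) AUC: fraction of (bound, background) pairs in which
    the bound sequence scores higher, counting ties as half a win."""
    wins = 0.0
    for p in bound_scores:
        for n in background_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(bound_scores) * len(background_scores))


# Hypothetical model scores for three bound and three background sequences.
bound = [0.9, 0.8, 0.4]
background = [0.5, 0.3, 0.1]
print(auc(bound, background))  # 8 of 9 pairs ranked correctly, about 0.889
```

An AUC of 0.5 means the scores do no better than chance, and 1.0 means every bound sequence outranks every background sequence. The double loop is quadratic; for large experiments a rank-based computation gives the same value faster.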
DeepBind was also used to identify and visualize damaging genetic variants. It detected mutations that disrupt binding sites and can affect gene expression, potentially leading to disease. Tested on a range of genetic variants, DeepBind performed well at predicting their effects. It was also applied to the TERT promoter, where it identified mutations linked to cancer.
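A mutation map can be built by scoring the reference sequence, then rescoring every possible single-nucleotide substitution and recording the change. The sketch below uses the raw score difference and a hypothetical `toy_score` function (the best number of matches to a GATA consensus over all windows) as a stand-in for a trained model; both are illustrative assumptions, and the sketch does not reproduce any particular weighting the DeepBind mutation maps may apply.

```python
ALPHABET = "ACGT"


def toy_score(seq, consensus="GATA"):
    """Hypothetical stand-in for a trained model: the best number of positions
    matching `consensus` over all windows (a perfect site scores len(consensus))."""
    m = len(consensus)
    return max(
        sum(a == b for a, b in zip(seq[i:i + m], consensus))
        for i in range(len(seq) - m + 1)
    )


def mutation_map(seq, score_fn):
    """Map (position, alternative base) -> score(mutant) - score(reference)
    for every single-nucleotide substitution of `seq`."""
    ref = score_fn(seq)
    deltas = {}
    for i, base in enumerate(seq):
        for alt in ALPHABET:
            if alt != base:
                mutant = seq[:i] + alt + seq[i + 1:]
                deltas[(i, alt)] = score_fn(mutant) - ref
    return deltas


mmap = mutation_map("TTGATATT", toy_score)
damaging = sorted({i for (i, alt), d in mmap.items() if d < 0})
print(damaging)  # [2, 3, 4, 5]: only substitutions inside the GATA site lower the score
```

Laying these deltas out as a position-by-base grid gives the kind of map used to spot variants that disrupt (or create) a binding site.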
DeepBind models were consistent with known splicing patterns and were used to predict binding scores at exon junctions regulated by known splicing regulators. These predictions agreed with experimental CLIP-seq data and with the known binding profiles of the RNA-binding proteins (RBPs) studied.
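Scoring the regions around a regulated exon amounts to cutting out the intronic sequence on either side of it and running an RBP model on each window. The sketch below assumes 0-based, half-open exon coordinates, a 50-nt flank, and a hypothetical `count_motif` scorer (overlapping occurrences of a short RNA motif) in place of a trained DeepBind RBP model; the motif and sequence are made up.

```python
def count_motif(seq, motif):
    """Hypothetical stand-in for an RBP model: overlapping occurrences of `motif`."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def flank_scores(pre_mrna, exon_start, exon_end, score_fn, flank=50):
    """Score the intronic windows immediately upstream and downstream of the exon
    pre_mrna[exon_start:exon_end] (0-based, half-open coordinates)."""
    upstream = pre_mrna[max(0, exon_start - flank):exon_start]
    downstream = pre_mrna[exon_end:exon_end + flank]
    return score_fn(upstream), score_fn(downstream)


# Made-up pre-mRNA: upstream intron, exon, downstream intron.
pre_mrna = "AAUGCAUGAA" + "GGGGGG" + "CCCCCC"
print(flank_scores(pre_mrna, 10, 16, lambda s: count_motif(s, "UGCAUG")))  # (1, 0)
```

Comparing such flank scores between exons whose inclusion changes and control exons is one way to relate predicted binding to splicing regulation.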
DeepBind is based on deep learning, a scalable and modular pattern discovery method that does not rely on common application-specific heuristics. Deep learning also has an active research community that is attracting significant investment from academia and industry. DeepBind is available as a standalone software tool for analyzing the sequence specificities of DNA- and RNA-binding proteins.