DANN: a deep learning approach for annotating the pathogenicity of genetic variants

2015 | Daniel Quang, Yifei Chen, Xiaohui Xie
The paper introduces DANN (Deleterious Annotation of Genetic Variants using Neural Networks), a deep learning approach for annotating the pathogenicity of genetic variants, particularly non-coding variants. DANN is designed to address the limitations of CADD (Combined Annotation-Dependent Depletion), which uses a linear kernel support vector machine (SVM) to differentiate between likely benign and deleterious variants. DANN employs a deep neural network (DNN) to capture non-linear relationships among features, leveraging GPU acceleration and techniques like dropout and momentum training. The DNN model consists of an input layer, three hidden layers with 1000 nodes each, and a sigmoid output layer.

The models are trained on a large dataset of observed and simulated variants, with the DNN achieving a 19% reduction in error rate and a 14% improvement in the area under the curve (AUC) compared to CADD's SVM methodology. The study also evaluates the models' performance on a dataset of pathogenic mutations from the ClinVar database, showing that DANN outperforms both logistic regression (LR) and SVM in terms of accuracy and separation, especially for non-coding variants. The authors conclude that DANN is a valuable tool for prioritizing putative causal variants for further analysis.
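To make the described architecture concrete, below is a minimal sketch of a DANN-style classifier in PyTorch. It is not the authors' implementation (the paper used its own GPU-accelerated training code): the three 1000-unit hidden layers, dropout, sigmoid output, and SGD with momentum follow the summary above, while the input feature count, hidden activation, dropout rate, and learning rate are placeholders/assumptions.

```python
# Sketch of a DANN-like deep neural network for variant pathogenicity scoring.
# Assumptions (not taken from the paper): NUM_FEATURES, ReLU hidden activations,
# dropout probability, and optimizer hyperparameters.
import torch
import torch.nn as nn

NUM_FEATURES = 949  # placeholder for the CADD-style annotation feature dimension

class DANNLikeNet(nn.Module):
    def __init__(self, num_features: int = NUM_FEATURES, hidden: int = 1000, p_drop: float = 0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.ReLU(),            # assumption: the paper's exact hidden activation may differ
            nn.Dropout(p_drop),   # dropout regularization, as mentioned in the summary
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),         # sigmoid output: pathogenicity score in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = DANNLikeNet()
# Stochastic gradient descent with momentum, as described above (rates are placeholders).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.BCELoss()  # binary cross-entropy: benign (0) vs. deleterious (1) labels

# One illustrative training step on random stand-in data for observed/simulated variants.
x = torch.randn(32, NUM_FEATURES)
y = torch.randint(0, 2, (32, 1)).float()
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```

In practice, the observed (proxy-benign) and simulated (proxy-deleterious) variant annotations would replace the random tensors above, and the trained network's sigmoid output would serve as the DANN score used to rank variants.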