2019, Vol. 47, Database issue | Philipp Rentzsch, Daniela Witten, Gregory M. Cooper, Jay Shendure and Martin Kircher
The article reviews the latest updates to Combined Annotation-Dependent Depletion (CADD), a widely used measure for predicting the deleteriousness of genetic variants in the human genome. CADD integrates over 60 genomic features and uses a machine learning model trained on two classes of variants: simulated de novo variants and variants that have become fixed in human populations since the split from chimpanzees. The model distinguishes between proxy-neutral variants (the fixed ones) and proxy-deleterious variants (the simulated ones), with the latter being more likely to be harmful.
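The two-class training setup described above can be illustrated with a small sketch. This is not CADD's code: the two features, the synthetic data, and the function names (`make_toy_variants`, `train_logistic`, `score`) are all hypothetical, and plain logistic regression is used only as a stand-in, since the summary does not say which model type is used. The sketch labels toy "fixed" variants as proxy-neutral (0) and toy "simulated" variants as proxy-deleterious (1), fits a classifier, and scores new variants.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_variants(n_per_class, seed=0):
    # Hypothetical data: two made-up annotation features per variant.
    # Label 0 = proxy-neutral (stands in for fixed variants),
    # label 1 = proxy-deleterious (stands in for simulated variants).
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_class):
        data.append(([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)], 0))
        data.append(([rng.gauss(1.0, 1.0), rng.gauss(0.5, 1.0)], 1))
    return data


def train_logistic(data, n_features, lr=0.1, epochs=200, l2=0.01):
    # Batch gradient descent on L2-regularized logistic loss.
    w = [0.0] * n_features
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        for j in range(n_features):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def score(x, w, b):
    # Estimated probability that a variant belongs to the
    # proxy-deleterious class; higher means "looks more deleterious".
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


if __name__ == "__main__":
    variants = make_toy_variants(200, seed=0)
    weights, bias = train_logistic(variants, n_features=2)
    print("weights:", weights, "bias:", bias)
    print("score, high feature values:", score([3.0, 0.0], weights, bias))
    print("score, low feature values:", score([-3.0, 0.0], weights, bias))
```

Because the labels come from how each variant was obtained rather than from curated pathogenicity data, the resulting score separates the two proxy classes and is only an approximation of true pathogenicity.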
The article highlights the advantages of CADD, including its systematic and objective labeling of variants, ability to accommodate various features, and capacity to score both coding and non-coding variants. It also discusses the limitations, such as the imperfect approximation of variant pathogenicity provided by the proxy-deleterious label. The latest version, CADD v1.4, supports the human genome build GRCh38 and includes improvements to the website, such as simplified variant lookup, extended documentation, and an Application Program Interface (API). CADD has been widely adopted in genetic studies, particularly for prioritizing variants in Mendelian disorders and complex traits, and has inspired the development of other genome-wide predictors. The article concludes by outlining future directions for improving CADD, including the integration of domain-specific scores and more complex models.