[slides] CADD: predicting the deleteriousness of variants throughout the human genome

CADD (Combined Annotation Dependent Depletion) is a widely used method for predicting the deleteriousness of genetic variants across the human genome. It integrates more than 60 genomic features to score single nucleotide variants (SNVs) and short insertions and deletions (InDels) relative to the reference genome. Its machine learning model is trained to separate two classes: variants that have become fixed in human populations since the human-chimpanzee split, which serve as proxy-neutral (mostly benign) examples, and simulated de novo variants, which serve as proxy-deleterious (potentially harmful) examples. The resulting scores help prioritize variants that may contribute to disease.

The latest version, CADD v1.4, supports the human genome build GRCh38, provides scores for a larger genomic region than previous versions, and includes updated documentation, an API, and improved integration with other tools. Scores can be obtained through the web server or a local installation: users can upload VCF files or use pre-scored variant sets. They are also available through third-party resources such as dbNSFP, Ensembl VEP, and ANNOVAR. Scores are normalized to a PHRED-like scale, which allows comparison across studies. Raw scores provide higher resolution and are recommended for variant prioritization, while scaled scores are easier to interpret directly.

CADD has been widely adopted in genetic research, with more than 1,984 citations and 24,000 unique users. It has been used in both clinical and population studies to identify pathogenic variants, and it has influenced the development of other variant prediction tools, including deep neural networks and ensemble learners. CADD is not trained on curated variant sets, a design choice intended to reduce bias and improve generalizability.
The model is trained on a large dataset, and its performance is validated against multiple datasets, including ClinVar and ExAC. Scores are available for SNVs and InDels on autosomes and chromosome X; chromosome Y and mitochondrial variants are no longer supported. CADD is freely available for non-commercial use through the web server, API, or a local installation, and it is designed to integrate with other genomic tools and databases. Its combination of diverse genomic features and a large training set makes it a useful resource for variant prioritization and disease research in human genetics.
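
The PHRED-like scaling mentioned in the summary can be illustrated with a short sketch. It assumes the usual rank-based definition, in which a scaled score is -10 * log10(rank / total) and rank 1 is the highest raw score, so that 10 marks the top 10%, 20 the top 1%, and 30 the top 0.1%. CADD itself ranks a variant against all possible SNVs of the reference genome; the function name `phred_scale` and the toy input below are hypothetical and only show the arithmetic.

```python
import math


def phred_scale(raw_scores):
    """Rank-based PHRED-like scaling of a list of raw scores (illustrative only).

    Assumption: scaled = -10 * log10(rank / total), with rank 1 assigned to the
    highest raw score. Ties are broken by input order, which is a simplification.
    """
    total = len(raw_scores)
    # Indices sorted from highest to lowest raw score.
    order = sorted(range(total), key=lambda i: raw_scores[i], reverse=True)
    scaled = [0.0] * total
    for rank, idx in enumerate(order, start=1):
        scaled[idx] = -10.0 * math.log10(rank / total)
    return scaled


# Toy data: 1000 made-up raw scores, 0.0 (lowest) to 999.0 (highest).
toy_raw = [float(i) for i in range(1000)]
toy_scaled = phred_scale(toy_raw)
print(round(toy_scaled[999], 2))  # highest raw score, rank 1 of 1000
print(round(toy_scaled[900], 2))  # rank 100 of 1000, i.e. top 10%
```

Because the scaled value depends only on rank, it compresses differences among low-ranking variants, which is consistent with the summary's remark that raw scores retain higher resolution.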

CADD: predicting the deleteriousness of variants throughout the human genome

2019 | Philipp Rentzsch, Daniela Witten, Gregory M. Cooper, Jay Shendure, Martin Kircher