mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters

April 23, 2013 | Hákón Jónsson¹,*, Aurélien Ginolhac¹, Mikkel Schubert¹, Philip L. F. Johnson² and Ludovic Orlando¹
mapDamage2.0 is a fast, approximate Bayesian method for estimating ancient DNA (aDNA) damage parameters. It addresses two challenges of analyzing aDNA: samples are often contaminated with exogenous DNA, and endogenous molecules carry post-mortem damage. The method fits a statistical model of DNA damage to estimate key parameters: the average length of single-stranded overhangs (λ), the nick frequency (ν), and the cytosine deamination rates in double-stranded and overhang regions (δd and δs, respectively). The Bayesian framework also allows base quality scores to be rescaled according to the probability that a base is damaged, which improves the accuracy of downstream analyses such as SNP calling.
The approach builds on the DNA damage model of Briggs et al. (2007). It assumes that mutations and post-mortem damage occur independently within a fragment, and that their occurrence depends on the position relative to the sequence ends. Position-specific substitutions are described by a multinomial distribution that incorporates a DNA damage transition matrix, and damage probabilities are derived from the likelihood of deamination events in the different regions of the molecule.
mapDamage2.0 is compatible with various DNA library protocols and processes NGS datasets efficiently. When applied to a range of aDNA datasets, its posterior predictive intervals agreed well with the empirical substitution frequencies. These analyses also showed that tissue- and sample-specific micro-environmental characteristics influence DNA damage kinetics. Applying the quality rescaling scheme to the genome of an Australian Aboriginal individual increased the overlap between genotype calls and dbSNP v137, suggesting fewer false-positive SNP calls. The study concludes that mapDamage2.0 provides a robust method for inferring aDNA damage parameters, with potential applications in improving mapping procedures against reference genomes.
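To make the position dependence of the damage model concrete, the sketch below computes an expected C→T rate per read position as a mixture of the overhang rate (δs) and the double-stranded rate (δd), weighted by the probability that a position lies in a single-stranded 5' overhang. This is a simplified illustration under stated assumptions, not mapDamage2.0's exact formulation: the overhang length is assumed to be geometric with mean `lam` (matching the description of λ as the average overhang length), nicks (ν) are ignored, and the function names and parameter values are hypothetical.

```python
"""Sketch of a simplified Briggs et al. (2007)-style damage model.

Assumptions (hypothetical, not mapDamage2.0's exact parameterization):
- the 5' overhang length L is geometric on {0, 1, 2, ...} with mean `lam`,
  so P(L >= i) = (lam / (1 + lam)) ** i;
- cytosines in the overhang deaminate at rate `delta_s`, those in the
  double-stranded part at rate `delta_d`;
- nicks (nu) are ignored to keep the example short.
"""


def p_in_overhang(i: int, lam: float) -> float:
    """Probability that 1-based position i from the 5' end is single-stranded."""
    if lam <= 0:
        return 0.0
    q = lam / (1.0 + lam)  # chosen so that the mean overhang length equals lam
    return q ** i


def ct_rate(i: int, lam: float, delta_s: float, delta_d: float) -> float:
    """Expected C->T rate at position i: mixture of overhang and duplex rates."""
    p = p_in_overhang(i, lam)
    return delta_s * p + delta_d * (1.0 - p)


if __name__ == "__main__":
    # Illustrative parameter values only; not estimates from any dataset.
    lam, delta_s, delta_d = 0.5, 0.6, 0.02
    for i in range(1, 6):
        print(i, round(ct_rate(i, lam, delta_s, delta_d), 4))
```

The rate is highest at the first position and decays toward δd further into the read, reproducing the characteristic excess of C→T substitutions near the 5' end.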
The method's ability to rescale base quality scores helps mitigate the impact of nucleotide misincorporations in downstream analyses.
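The quality rescaling idea can be sketched as follows: a C→T (or G→A) mismatch is treated as a correct call only if it is neither a sequencing error nor a post-mortem deamination, so its Phred score drops as the damage probability rises. This is a minimal sketch assuming the two error sources are independent. The damage probability `p_damage` is taken as given (in practice it would be a posterior probability that the observed mismatch arises from damage), the cap of 60 is an arbitrary choice, and the function names are hypothetical; mapDamage2.0's exact computation may differ.

```python
import math


def phred_to_prob(q: float) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10.0 ** (-q / 10.0)


def prob_to_phred(p: float, max_q: int = 60) -> int:
    """Convert an error probability to an integer Phred score in [0, max_q]."""
    if p <= 0.0:
        return max_q
    return min(max_q, max(0, round(-10.0 * math.log10(p))))


def rescale_quality(q: int, p_damage: float) -> int:
    """Downweight a C->T or G->A mismatch by its damage probability.

    Hypothetical helper: the base counts as correct only if it carries neither
    a sequencing error nor a deamination, assuming the two are independent.
    """
    p_err = phred_to_prob(q)
    p_correct = (1.0 - p_err) * (1.0 - p_damage)
    return prob_to_phred(1.0 - p_correct)


if __name__ == "__main__":
    # A Q30 mismatch at positions with increasing (illustrative) damage probability.
    for p_dam in (0.0, 0.05, 0.2, 0.5):
        print(p_dam, rescale_quality(30, p_dam))
```

Because only mismatches consistent with deamination are rescaled, high-confidence C→T calls near read ends lose most of their weight in genotype likelihoods, while all other bases keep their original quality.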