March 16, 2024 | Yun Deng, Rasmus Nielsen, and Yun S. Song
SINGER is a novel Bayesian method for efficiently sampling Ancestral Recombination Graphs (ARGs) from the posterior distribution, enabling accurate inference and uncertainty quantification for large samples. It is at least an order of magnitude faster than existing methods, with enhanced accuracy and robustness to model misspecification. To explore the ARG space, SINGER combines a two-step threading algorithm, an improved Markov chain Monte Carlo (MCMC) scheme, and a new proposal called Sub-Graph Pruning and Re-grafting (SGPR). It also applies ARG re-scaling to adjust for algorithmic biases. In benchmarks against other ARG inference methods, SINGER is more accurate in coalescence times and tree topologies and converges faster. It is also more robust to common sources of model misspecification, such as population size changes and background selection, without requiring explicit modeling of these factors. Applied to African populations in the 1000 Genomes Project, SINGER revealed signals of local adaptation, ancient balancing selection, and archaic introgression. It provides more accurate estimates of population-specific fine-scale diversity, which is useful for studying local adaptation, and it uses a coalescence distribution heatmap to identify genomic regions consistent with a specific model of archaic introgression. SINGER is applicable to large samples and can be extended to include more complex demographic models. However, it requires phased genomes as input and assumes accurate phasing, which can be challenging for understudied populations.
It is a promising tool for population genetic analysis and can be used to study evolutionary signals in large genomic datasets.
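
To make the idea of posterior sampling and uncertainty quantification concrete, the following is a toy sketch only, not SINGER's algorithm: it does not implement threading, SGPR, or ARG re-scaling. It assumes a single pair of haplotypes with an Exponential(1) prior on their coalescence time and a Poisson mutation likelihood, then draws samples with a plain random-walk Metropolis-Hastings sampler. The names `log_posterior`, `metropolis_hastings`, `summarize`, and the parameters `k` and `theta` are hypothetical and chosen for illustration.

```python
import math
import random


def log_posterior(t, k, theta):
    """Unnormalized log posterior of a pairwise coalescence time t.

    Assumed toy model (not from the SINGER paper):
      prior:      t ~ Exponential(1), in coalescent time units
      likelihood: k observed mutations ~ Poisson(theta * t)
    """
    if t <= 0:
        return -math.inf
    return -t + k * math.log(theta * t) - theta * t


def metropolis_hastings(k, theta, n_iter=20000, burn_in=2000, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings over t; returns post-burn-in samples."""
    rng = random.Random(seed)
    t = 1.0  # arbitrary positive starting point
    samples = []
    for i in range(n_iter):
        proposal = t + rng.gauss(0.0, step)  # symmetric proposal
        log_ratio = log_posterior(proposal, k, theta) - log_posterior(t, k, theta)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            t = proposal
        if i >= burn_in:
            samples.append(t)
    return samples


def summarize(samples, level=0.95):
    """Posterior mean and an equal-tailed credible interval from samples."""
    s = sorted(samples)
    lo = s[int((1 - level) / 2 * len(s))]
    hi = s[int((1 + level) / 2 * len(s)) - 1]
    return sum(s) / len(s), (lo, hi)


if __name__ == "__main__":
    draws = metropolis_hastings(k=3, theta=2.0)
    mean, (lo, hi) = summarize(draws)
    print(f"posterior mean {mean:.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```

Under these assumptions the exact posterior is a Gamma distribution with shape k + 1 and rate 1 + theta, so the sample mean can be checked against (k + 1) / (1 + theta). The credible interval is the kind of uncertainty summary that a set of posterior samples supports and a single point estimate does not.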