Efficient Bayesian mixed model analysis increases association power in large cohorts

2015 March | Po-Ru Loh, George Tucker, Brendan K Bulik-Sullivan, Bjarni J Vilhjalmsson, Hilary K Finucane, Rany M Salem, Daniel I Chasman, Paul M Ridker, Benjamin M Neale, Nick Patterson, and Alkes L Price
The article presents BOLT-LMM, an efficient Bayesian mixed model method for genetic association analysis that improves statistical power over existing mixed model methods. Existing methods require O(MN²) time, where N is the number of samples and M is the number of SNPs; BOLT-LMM instead requires only a small number of O(MN)-time iterations, which lets it handle large cohorts. It gains power by placing a Bayesian mixture prior on marker effect sizes, which models non-infinitesimal genetic architectures. These are more realistic than the standard infinitesimal model, in which every marker has a small, normally distributed effect. The method was applied to nine quantitative traits in 23,294 samples from the Women's Genome Health Study (WGHS) and showed increased association power, equivalent to up to a 10% increase in effective sample size. Simulations demonstrated that BOLT-LMM's power gain grows with cohort size, making it well suited to large genome-wide association studies (GWAS).
BOLT-LMM consists of four main steps: (1) estimating variance parameters, (2) computing infinitesimal mixed model association statistics, (3) estimating Gaussian mixture parameters, and (4) computing Gaussian mixture model association statistics. It uses a variational approximation to compute phenotypic residuals and applies a retrospective score statistic for association testing. The method avoids computing or storing a genetic relationship matrix, which reduces memory usage. It calibrates its Gaussian mixture model statistic using LD Score regression.
BOLT-LMM was compared with existing methods in terms of computational efficiency and power. In power, it outperformed infinitesimal-model methods such as GCTA-LOCO and BOLT-LMM-inf (the infinitesimal-model variant of BOLT-LMM), especially for non-infinitesimal genetic architectures. Simulations showed that its power gains depend on the number of causal SNPs and increase with cohort size. It was also robust to confounding: its test statistics were well calibrated and showed no significant inflation of false positives.
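To illustrate how a mixed model computation can avoid forming a genetic relationship matrix, here is a minimal sketch in Python with NumPy. It solves a linear system involving the N×N phenotypic covariance matrix by conjugate gradient iteration, applying that matrix only through two matrix-vector products that each cost O(MN). The function name, parameter names, and the dense floating-point genotype matrix are assumptions made for illustration; this is not BOLT-LMM's actual code.

```python
import numpy as np

def solve_mixed_model_system(X, y, sigma2_g, sigma2_e, tol=1e-8, max_iter=200):
    """Solve (sigma2_g * X X^T / M + sigma2_e * I) v = y by conjugate gradient.

    X is an N x M matrix of standardized genotypes (hypothetical input).
    The N x N matrix X X^T / M is never formed: it is applied implicitly
    as X @ (X.T @ v), which costs O(MN) per iteration.
    """
    n, m = X.shape

    def apply_V(v):
        # Implicit product with the covariance matrix, O(MN) time.
        return sigma2_g * (X @ (X.T @ v)) / m + sigma2_e * v

    v = np.zeros(n)
    r = np.array(y, dtype=float)  # residual y - V v for the starting guess v = 0
    rs_old = r @ r
    if rs_old == 0.0:
        return v  # y is all zeros, so v = 0 is already the solution
    p = r.copy()
    y_norm = np.sqrt(rs_old)

    for _ in range(max_iter):
        Vp = apply_V(p)
        alpha = rs_old / (p @ Vp)
        v += alpha * p
        r -= alpha * Vp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * y_norm:
            break  # relative residual is small enough
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return v
```

When the iteration converges in a small number of steps, the total cost stays proportional to MN, whereas building the relationship matrix explicitly already costs O(MN²).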
The method was applied to real data from the WGHS, where it showed increased power for lipid traits, with a 10% increase in mean χ² statistics compared with PCA. BOLT-LMM also controlled Type I error and gave consistent results across different genetic architectures. It was more efficient than existing methods, with memory usage of only about MN/4 bytes. Its hybrid approach (leaving each chromosome out, fitting a Bayesian model on the remaining SNPs, and applying a retrospective hypothesis test for association of the left-out SNPs with the residual phenotype) lets it handle large datasets efficiently. It is recommended for GWAS, particularly for large, non-ascertained population cohorts and for diseases with prevalence ≥5%. The method is also suitable for analyzing large ascertained case-control studies of rarer diseases, though further research is needed to model ascertainment using posterior mean liabilities. BOLT-LMM's efficiency and power make it a promising tool for genetic association studies.
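The leave-one-chromosome-out scheme described above can be sketched as follows. This Python example is a simplified stand-in, not the authors' implementation: it replaces the Bayesian mixture fit with ordinary ridge regression (an infinitesimal-model assumption), computes an uncalibrated score-type statistic, and uses hypothetical names (`loco_statistics`, `ridge`). It assumes standardized genotype columns, a centered phenotype, and at least two chromosomes.

```python
import numpy as np

def loco_statistics(X, y, chrom, ridge=1.0):
    """Leave-one-chromosome-out association statistics (illustrative only).

    X:     N x M matrix of standardized genotypes
    y:     centered phenotype vector of length N
    chrom: length-M array giving the chromosome label of each SNP
    ridge: ridge penalty, a stand-in for the Bayesian prior in the text
    """
    n, m = X.shape
    chrom = np.asarray(chrom)
    stats = np.empty(m)

    for c in np.unique(chrom):
        held_out = chrom == c
        X_rest = X[:, ~held_out]  # SNPs on all other chromosomes
        X_out = X[:, held_out]    # SNPs on the left-out chromosome

        # Fit SNP effects on the remaining chromosomes (ridge regression).
        A = X_rest.T @ X_rest + ridge * np.eye(X_rest.shape[1])
        beta = np.linalg.solve(A, X_rest.T @ y)

        # Residual phenotype after removing the fitted genetic component.
        resid = y - X_rest @ beta

        # Score-type statistic per left-out SNP: n times the squared
        # uncentered correlation between the SNP and the residual.
        num = (X_out.T @ resid) ** 2
        denom = (X_out ** 2).sum(axis=0) * (resid @ resid)
        stats[held_out] = n * num / denom

    return stats
```

Leaving the tested chromosome out of the fit keeps a candidate SNP from being absorbed into the model that produces the residual, which would otherwise reduce its test statistic.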