Fast model-based estimation of ancestry in unrelated individuals

2009 | David H. Alexander, John Novembre, and Kenneth Lange
This paper introduces ADMIXTURE, a new algorithm and program for model-based estimation of ancestry in unrelated individuals. ADMIXTURE uses a likelihood model similar to that of the widely used program STRUCTURE but runs much faster, solving in minutes problems that take STRUCTURE hours. It is also as fast as EIGENSTRAT, a popular program based on principal component analysis. ADMIXTURE's maximum likelihood estimates of admixture coefficients and ancestral allele frequencies are as accurate as STRUCTURE's Bayesian estimates, and on real-world data its estimates are comparable to those from STRUCTURE and EIGENSTRAT. Its speed allows larger marker sets to be used in ancestry estimation, improving the accuracy of the estimates, and makes effective correction for population stratification in association studies practical.

ADMIXTURE estimates global ancestry, the genome-wide proportions of an individual's ancestry drawn from each ancestral population, rather than local ancestry at specific positions along the genome. The approach is model-based: ancestry coefficients are parameters of a statistical model. Whereas STRUCTURE takes a Bayesian approach and samples the posterior distribution by Markov chain Monte Carlo (MCMC), ADMIXTURE maximizes the likelihood directly. Optimizing instead of sampling avoids running long Markov chains, which is why ADMIXTURE is faster, particularly on high-dimensional data with many markers.

The optimization uses block relaxation, alternating between updates of the ancestry coefficient matrix Q (individuals by populations) and the population allele frequency matrix F (populations by markers), each with the other held fixed. Each block update is solved by sequential quadratic programming, a method suited to this constrained problem: every row of Q must be nonnegative and sum to 1, and every entry of F must lie between 0 and 1. A quasi-Newton acceleration technique speeds up convergence.
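For reference, the objective being maximized can be written as the binomial admixture log-likelihood. In the notation assumed here, g_ij ∈ {0, 1, 2} counts the copies of allele 1 carried by individual i at SNP j, q_ik is individual i's ancestry fraction from population k, and f_kj is the frequency of allele 1 in population k at SNP j. Up to an additive constant:

```latex
\mathcal{L}(Q, F) = \sum_{i} \sum_{j} \Big\{ g_{ij} \ln\Big[ \sum_{k} q_{ik} f_{kj} \Big]
  + (2 - g_{ij}) \ln\Big[ \sum_{k} q_{ik} \, (1 - f_{kj}) \Big] \Big\},
\qquad q_{ik} \ge 0, \quad \sum_{k} q_{ik} = 1, \quad 0 \le f_{kj} \le 1 .
```

Each genotype is treated as two independent allele draws, each of which is allele 1 with the ancestry-weighted probability \(\sum_k q_{ik} f_{kj}\).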
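The alternation between the Q and F blocks can be sketched in Python with NumPy. This sketch makes one plain substitution: instead of sequential quadratic programming, each block is updated with a simpler EM-style (minorize-maximize) update of the kind FRAPPE uses. That update also keeps each row of Q on the simplex and F in [0, 1], and it never decreases the log-likelihood. The function names, random initialization, and stopping rule are hypothetical, and missing genotypes are not handled.

```python
import numpy as np

def loglik(G, Q, F, eps=1e-12):
    """Binomial admixture log-likelihood (constant term omitted)."""
    P = np.clip(Q @ F, eps, 1.0 - eps)  # p_ij = sum_k q_ik f_kj, clipped to avoid log(0)
    return float(np.sum(G * np.log(P) + (2.0 - G) * np.log(1.0 - P)))

def expected_counts(G, Q, F, eps=1e-12):
    """E-step: split each individual's allele copies among the K populations."""
    wa = Q[:, None, :] * F.T[None, :, :]          # (n, J, K) weights for allele 1
    wb = Q[:, None, :] * (1.0 - F.T)[None, :, :]  # (n, J, K) weights for allele 2
    a = wa / np.maximum(wa.sum(axis=2, keepdims=True), eps)
    b = wb / np.maximum(wb.sum(axis=2, keepdims=True), eps)
    return G[:, :, None] * a, (2.0 - G)[:, :, None] * b

def update_Q(G, Q, F):
    """Q block with F fixed: share of each individual's 2J allele copies per population."""
    A, B = expected_counts(G, Q, F)
    return (A + B).sum(axis=1) / (2.0 * G.shape[1])

def update_F(G, Q, F, eps=1e-12):
    """F block with Q fixed: expected allele-1 fraction per population and SNP."""
    A, B = expected_counts(G, Q, F)
    return (A.sum(axis=0) / np.maximum((A + B).sum(axis=0), eps)).T  # (K, J)

def block_relaxation(G, K, max_iter=500, tol=1e-8, seed=0):
    """Alternate the Q and F updates until the log-likelihood gain drops below tol."""
    G = np.asarray(G, dtype=float)
    n, J = G.shape
    rng = np.random.default_rng(seed)
    Q = rng.dirichlet(np.ones(K), size=n)   # random start on the simplex
    F = rng.uniform(0.2, 0.8, size=(K, J))  # random start inside (0, 1)
    history = [loglik(G, Q, F)]
    for _ in range(max_iter):
        Q = update_Q(G, Q, F)  # F held fixed
        F = update_F(G, Q, F)  # Q held fixed (uses the freshly updated Q)
        history.append(loglik(G, Q, F))
        if history[-1] - history[-2] < tol:
            break
    return Q, F, history
```

The returned `history` lets one check that each sweep raises the log-likelihood. The `(n, J, K)` intermediate arrays keep the sketch short but would be too large for genome-scale data, where a real implementation works marker by marker.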
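The idea behind quasi-Newton acceleration of an iterative map can be illustrated with a single-secant (q = 1) step for a generic fixed-point map x ↦ step(x). Two plain iterations supply the differences u and v. The step then treats v ≈ M u for an approximate Jacobian M and takes the resulting Newton step toward the fixed point. This is a sketch under assumptions: ADMIXTURE's own scheme may use several secant pairs and different safeguards. Here `step` and `objective` are hypothetical callables; for the admixture problem, `step` would be one Q-then-F sweep on a packed parameter vector, and `objective` would be the log-likelihood, returning NaN or -inf for infeasible points.

```python
import numpy as np

def qn1_accelerated_step(step, x, objective=None):
    """One single-secant quasi-Newton step toward a fixed point of `step`.

    With u = step(x) - x and v = step(step(x)) - step(x), the candidate is
    step(x) + c * v, where c = (u.u) / (u.u - u.v). The plain double step
    step(step(x)) is returned instead if c is undefined, or if an objective
    to maximize is given and the candidate does not score at least as well
    (NaN or -inf, e.g. for an infeasible candidate, also triggers this).
    """
    x = np.asarray(x, dtype=float)
    fx = step(x)
    ffx = step(fx)
    u = (fx - x).ravel()
    v = (ffx - fx).ravel()
    denom = u @ u - u @ v
    if denom == 0.0 or not np.isfinite(denom):
        return ffx  # already converged or degenerate secant
    candidate = fx + (u @ u) / denom * (ffx - fx)
    if objective is not None and not (objective(candidate) >= objective(ffx)):
        return ffx
    return candidate
```

For the one-dimensional contraction x ↦ 0.9x + 1, a single accelerated step from 0 lands on the fixed point 10 (up to rounding). Plain iteration only approaches it geometrically, as 10 - 10(0.9)^n.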
ADMIXTURE is compared with STRUCTURE and with FRAPPE, which maximizes the same likelihood. It is faster than both, with accuracy at least comparable. Simulations and real data validate its performance: it accurately recovers admixture coefficients and ancestral allele frequencies, with results comparable to those of STRUCTURE and EIGENSTRAT.

On real data, ADMIXTURE identifies the expected admixture patterns in samples such as the HapMap Phase 3 MEX (Mexican ancestry in Los Angeles) and ASW (African ancestry in the southwestern United States) populations. In association studies, its ancestry estimates effectively correct for population structure.

ADMIXTURE is available as a freely downloadable standalone program. Its combination of speed and accuracy, with results comparable to those of other methods, makes it a reliable tool for ancestry estimation and for correcting population stratification.
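A simulation study of this kind can be illustrated by drawing genotypes directly from the model, g_ij ~ Binomial(2, Σ_k q_ik f_kj). The Dirichlet and uniform parameter choices below are hypothetical and are not the paper's simulation design. Ancestral populations are identified only up to relabeling, so the accuracy helper first matches estimated and true columns of Q, choosing the permutation that minimizes the error.

```python
import itertools
import numpy as np

def simulate_admixed(n, J, K, alpha=1.0, seed=0):
    """Draw (G, Q, F) under the admixture model with illustrative parameters."""
    rng = np.random.default_rng(seed)
    Q = rng.dirichlet(alpha * np.ones(K), size=n)  # (n, K) ancestry fractions
    F = rng.uniform(0.05, 0.95, size=(K, J))       # (K, J) allele-1 frequencies
    G = rng.binomial(2, Q @ F)                     # g_ij ~ Binomial(2, p_ij)
    return G, Q, F

def q_rmse(Q_true, Q_est):
    """RMSE between ancestry matrices, minimized over relabelings of the K columns."""
    K = Q_true.shape[1]
    return min(
        float(np.sqrt(np.mean((Q_true - Q_est[:, list(perm)]) ** 2)))
        for perm in itertools.permutations(range(K))
    )
```

Enumerating all K! permutations is fine for small K. Because the squared error decomposes column by column, larger K can be handled as an assignment problem, for example with `scipy.optimize.linear_sum_assignment`.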
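One common way to use ancestry estimates for stratification correction is to include the columns of Q as covariates when regressing a phenotype on each SNP. This is sketched here as an assumption, not the paper's exact procedure. Each row of Q sums to 1, so one column is dropped to avoid collinearity with the intercept. In practice Q might be loaded from ADMIXTURE's whitespace-delimited `.Q` output with `np.loadtxt`. The helper `snp_effect` and the toy data are hypothetical.

```python
import numpy as np

def snp_effect(y, g, covariates=None):
    """Least-squares effect of genotype g on phenotype y, optionally covariate-adjusted."""
    y = np.asarray(y, dtype=float)
    columns = [np.ones_like(y), np.asarray(g, dtype=float)]  # intercept, genotype
    if covariates is not None:
        C = np.asarray(covariates, dtype=float)
        columns.append(C[:, None] if C.ndim == 1 else C)
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])  # coefficient on the genotype column

# Toy confounding: the phenotype depends only on ancestry q, while the
# genotype also tracks q (hypothetical data, for illustration only).
q = np.linspace(0.0, 1.0, 200)
Q = np.column_stack([q, 1.0 - q])  # two-population ancestry matrix
g = np.round(2.0 * q)              # genotype correlated with ancestry
y = 3.0 * q                        # no true genotype effect
naive = snp_effect(y, g)
adjusted = snp_effect(y, g, covariates=Q[:, :-1])
```

Here `naive` reports a spurious positive effect (about 1.13) even though `y` does not depend on `g`. After adjusting for ancestry, `adjusted` is zero up to rounding.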