June 2014 | Anil Raj, Matthew Stephens, Jonathan K. Pritchard
The paper introduces fastSTRUCTURE, an efficient algorithm for inferring population structure from large SNP datasets using a variational Bayesian framework. The authors address the computational challenges posed by large modern data sets by developing fast inference tools based on recent advances in optimization theory. They propose heuristic scores to identify the number of populations in a dataset and a new hierarchical prior to detect weak population structure. fastSTRUCTURE is compared with STRUCTURE and ADMIXTURE on simulated data and the CEPH–Human Genome Diversity Panel, showing that it is almost two orders of magnitude faster than STRUCTURE and achieves comparable accuracy to ADMIXTURE. The heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations, with minimal bias when structure is very weak. The algorithm is freely available online.
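To make concrete what "inferring population structure" fits, here is a minimal sketch of the log-likelihood under the standard admixture model that tools of this kind are generally built on. The summary above does not spell out the model, so this is an illustration under that assumption, not the authors' implementation: the function name `admixture_log_likelihood` and the variables `genotypes`, `Q`, and `P` are hypothetical, and the variational Bayesian machinery itself is not shown.

```python
import math

def admixture_log_likelihood(genotypes, Q, P):
    """Log-likelihood of biallelic SNP genotypes under an admixture model.

    Illustrative assumptions (not taken from the summary):
      genotypes[n][l] -- copies (0, 1, or 2) of one allele carried by
                         individual n at SNP l, or None if missing
      Q[n][k]         -- admixture proportion of individual n from
                         population k (each row sums to 1)
      P[k][l]         -- frequency of that allele at SNP l in population k,
                         strictly between 0 and 1
    """
    num_pops = len(P)
    total = 0.0
    for n, row in enumerate(genotypes):
        for l, g in enumerate(row):
            if g is None:
                continue  # skip missing genotypes
            # Individual-specific allele frequency: mixture over populations.
            f = sum(Q[n][k] * P[k][l] for k in range(num_pops))
            # Two independent allele copies give a Binomial(2, f) genotype.
            total += (math.log(math.comb(2, g))
                      + g * math.log(f)
                      + (2 - g) * math.log(1.0 - f))
    return total

# Toy data: 2 individuals, 2 SNPs, K = 2 populations.
genotypes = [[2, 0], [1, None]]
Q = [[0.5, 0.5], [0.9, 0.1]]
P = [[0.2, 0.3], [0.6, 0.7]]
toy_ll = admixture_log_likelihood(genotypes, Q, P)
```

An inference method then searches for the `Q` and `P` (or a posterior over them) that explain the genotypes well for a chosen number of populations K; comparing fits across several values of K is where heuristic model-complexity scores come in.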