Inferring weak population structure with the assistance of sample group information


2009 September | Melissa J. Hubisz, Daniel Falush, Matthew Stephens, and Jonathan K. Pritchard
The paper introduces new models for the STRUCTURE program that improve the detection of population structure by incorporating sample group information. These models modify the prior distribution for individual population assignments, allowing the proportion of individuals in each cluster to vary by location. The new models are tested on simulated data and applied to microsatellite data from the CEPH Human Genome Diversity Panel. They detect population structure at lower levels of divergence, and with less data, than the original STRUCTURE models and principal components methods, and they are not biased towards detecting structure when none is present.

The paper discusses the limitations of the original STRUCTURE algorithm when the data contain little information about population structure. The original model assumes that all population partitions are equally likely a priori, which makes structure difficult to detect with limited data. The new models address this by placing more prior weight on clustering outcomes that are correlated with sampling locations, improving performance in data sets with few loci or individuals.

The new models are tested in simulations with and without admixture. They estimate ancestry and admixture proportions more accurately, especially in data sets with small sample sizes. They also perform well when sampling locations are uncorrelated with ancestry, and they give results similar to the original models when population structure is strong. In comparisons with other methods, such as principal components analysis, the new models detect population structure more accurately. The results indicate that the new models are particularly useful for data sets with limited information, where the original STRUCTURE models may fail to detect structure.
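
The effect of a location-dependent prior, as summarized above, can be illustrated with a toy calculation. The sketch below is not the STRUCTURE implementation: the function name `posterior_assignment`, the two clusters, the log-likelihood values, and the per-location cluster proportions are all hypothetical, and the proportions are fixed by hand purely for illustration. It only shows how, under Bayes' rule, a weak genetic signal combined with a prior that varies by sampling location can yield a more decisive cluster assignment than the same signal combined with a uniform prior.

```python
import math

def posterior_assignment(log_likelihoods, prior):
    """Posterior cluster probabilities for one individual via Bayes' rule.

    log_likelihoods: log P(genotype | cluster k) for each cluster k.
    prior: prior probability of each cluster for this individual.
    """
    log_post = [ll + math.log(p) for ll, p in zip(log_likelihoods, prior)]
    m = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(v - m) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical weak signal: the genotype only slightly favours cluster 0.
log_lik = [-10.0, -10.2]

# A prior that ignores sampling location: both clusters equally likely.
uniform_prior = [0.5, 0.5]

# Hypothetical location-specific cluster proportions (illustrative values only).
location_prior = {
    "site_A": [0.8, 0.2],
    "site_B": [0.2, 0.8],
}

post_uniform = posterior_assignment(log_lik, uniform_prior)
post_site_a = posterior_assignment(log_lik, location_prior["site_A"])
post_site_b = posterior_assignment(log_lik, location_prior["site_B"])

print("uniform prior:", [round(p, 3) for p in post_uniform])
print("site_A prior: ", [round(p, 3) for p in post_site_a])
print("site_B prior: ", [round(p, 3) for p in post_site_b])
```

With the uniform prior, the assignment stays close to even because the genetic signal is weak; with a location-specific prior, the same signal is pulled towards the cluster that is common at that location. If every location is given the same proportions, the location prior in this toy example carries no extra information and the result matches the location-free calculation.
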
The new models are implemented in a version of STRUCTURE that is freely available online. The paper concludes that the new models are recommended for most situations with limited data, especially when the original STRUCTURE models do not provide a clear signal of structure. The models are also useful for analyzing data where individuals can be classified into discrete groups based on phenotypic characteristics. The new models are described in detail, including their implementation in the STRUCTURE algorithm and their performance in various scenarios.