Discriminant analysis of principal components: a new method for the analysis of genetically structured populations

2010 | Thibaut Jombart, Sébastien Devillard, François Balloux
The paper introduces Discriminant Analysis of Principal Components (DAPC), a multivariate method for analyzing genetically structured populations. DAPC identifies and describes clusters of genetically related individuals; when group priors are lacking, it infers the clusters using sequential K-means and model selection. The method extracts rich information from genetic data, including the assignment of individuals to groups, a visual assessment of between-population differentiation, and the contribution of individual alleles to population structuring. DAPC is evaluated on simulated data and compared with STRUCTURE, showing better performance in characterizing population subdivision. It is also applied to real-world datasets: microsatellite polymorphism in worldwide human populations and hemagglutinin gene sequence variation in seasonal influenza. DAPC is faster than Bayesian clustering algorithms and handles large datasets efficiently. The paper concludes by discussing the advantages of DAPC, including its versatility, speed, and applicability to various types of data.
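
The workflow summarized above (reduce the data with PCA, infer groups with K-means plus model selection, then run a discriminant analysis on the retained components) can be sketched in a few lines. The following is a minimal illustration using scikit-learn, not the authors' implementation. The function names `kmeans_bic` and `dapc_sketch`, the BIC formula, the rule of picking the lowest BIC, the number of retained components, and the simulated genotypes are all assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def kmeans_bic(scores, labels, centers):
    """BIC-style criterion for a K-means solution (assumed form; lower is better)."""
    n, k = scores.shape[0], centers.shape[0]
    rss = np.sum((scores - centers[labels]) ** 2)  # within-cluster sum of squares
    return n * np.log(rss / n) + k * np.log(n)


def dapc_sketch(allele_matrix, max_k=6, n_pcs=5, seed=0):
    """Hypothetical helper: PCA, then K-means with BIC, then discriminant analysis."""
    # Step 1: summarize the allele data with a few principal components.
    pca = PCA(n_components=n_pcs)
    pcs = pca.fit_transform(allele_matrix)

    # Step 2: run K-means for increasing k and keep the solution with the
    # lowest BIC (a simplification; in practice the BIC curve is often inspected).
    best = None
    for k in range(2, max_k + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pcs)
        bic = kmeans_bic(pcs, km.labels_, km.cluster_centers_)
        if best is None or bic < best[0]:
            best = (bic, k, km.labels_)
    _, k, labels = best

    # Step 3: discriminant analysis on the retained components.
    lda = LinearDiscriminantAnalysis().fit(pcs, labels)
    coords = lda.transform(pcs)          # positions on the discriminant axes
    posterior = lda.predict_proba(pcs)   # group membership probabilities

    # Allele contributions: squared loadings of the original variables on each
    # discriminant axis, rescaled so that every axis sums to 1.
    loadings = pca.components_.T @ lda.scalings_[:, : coords.shape[1]]
    allele_contrib = loadings ** 2 / np.sum(loadings ** 2, axis=0)

    return {"k": k, "labels": labels, "coords": coords,
            "posterior": posterior, "allele_contrib": allele_contrib}


# Simulated example: 3 populations, 30 diploid individuals each, 40 biallelic
# loci coded as allele counts (0, 1, or 2). Purely illustrative data.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=(3, 40))
genotypes = np.vstack([rng.binomial(2, f, size=(30, 40)) for f in freqs]).astype(float)

result = dapc_sketch(genotypes)
print("inferred number of clusters:", result["k"])
```

The returned dictionary mirrors the three outputs named in the summary: `posterior` gives the assignment of individuals to groups, `coords` can be plotted to assess between-group differentiation visually, and `allele_contrib` ranks alleles by their contribution to each discriminant axis. Running the discriminant analysis on principal components rather than on raw alleles keeps the number of variables below the number of individuals and removes correlation among the input variables.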