METHODOLOGY ARTICLE Open Access

This paper introduces the Discriminant Analysis of Principal Components (DAPC), a new multivariate method for analyzing genetically structured populations. DAPC combines principal component analysis (PCA) with discriminant analysis (DA): the data are first transformed by PCA, and DA is then performed on the retained principal components to identify and describe clusters of genetically related individuals. It is particularly suited to large datasets and can be applied to both simulated and empirical data. When group priors are unknown, DAPC infers genetic clusters using sequential K-means clustering and model selection. It provides an assignment of individuals to groups, a visual assessment of between-population differentiation, and the contribution of individual alleles to population structuring.

DAPC was evaluated on simulated data and compared to STRUCTURE, a Bayesian clustering method. In these simulations, DAPC generally performed better than STRUCTURE at characterizing population subdivision. DAPC is also faster than Bayesian clustering algorithms and can be applied to a wider range of datasets. Because it represents between-group structures graphically, it is well suited to unraveling complex population structures.

The method was applied to two empirical datasets: microsatellite polymorphism in worldwide human populations and hemagglutinin gene sequence variation in seasonal influenza. For the human dataset, DAPC identified four clusters consistent with previous findings. For the influenza dataset, DAPC revealed a temporal pattern of genetic diversity and identified a significant discontinuity between the 2005 and 2006 epidemics, which was attributed to the emergence of new alleles.

DAPC is a versatile method that does not rely on a specific population genetics model and can be applied to various types of quantitative data. It is particularly useful for association studies, where population structuring can introduce spurious correlations. DAPC can also account for covariates, making it suitable for analyzing complex genetic patterns.
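The cluster-inference step mentioned above (K-means run for successive values of K, followed by model selection when group priors are unknown) can be illustrated with a short NumPy sketch. It is an assumption-laden illustration rather than the authors' procedure: `kmeans_wss` and `bic_by_k` are hypothetical helpers, the BIC is assumed to take the common K-means form n·ln(WSS/n) + K·ln(n), where WSS is the within-cluster sum of squares, and the input is synthetic data standing in for the principal component scores on which clustering would be run.

```python
import numpy as np


def kmeans_wss(X, k, rng, n_init=10, n_iter=100):
    """Lloyd's K-means with k-means++ seeding (hypothetical helper, not adegenet code).

    Returns the labels and within-cluster sum of squares (WSS) of the best of n_init runs.
    """
    n = X.shape[0]
    best_labels, best_wss = None, np.inf
    for _ in range(n_init):
        # k-means++ seeding: each new centre is drawn with probability proportional to
        # its squared distance from the nearest centre chosen so far.
        centres = X[[rng.integers(n)]]
        for _ in range(1, k):
            d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2).min(axis=1)
            centres = np.vstack([centres, X[rng.choice(n, p=d2 / d2.sum())]])
        # Lloyd iterations: assign to the nearest centre, then move centres to cluster means.
        for _ in range(n_iter):
            labels = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                            for j in range(k)])
            if np.allclose(new, centres):
                break
            centres = new
        labels = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        wss = ((X - centres[labels]) ** 2).sum()
        if wss < best_wss:  # keep the restart with the lowest WSS
            best_labels, best_wss = labels, wss
    return best_labels, best_wss


def bic_by_k(X, k_max, seed=0):
    """BIC for K = 1..k_max, ASSUMING the form BIC = n * ln(WSS / n) + K * ln(n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    return {k: n * np.log(kmeans_wss(X, k, rng)[1] / n) + k * np.log(n)
            for k in range(1, k_max + 1)}


# Synthetic stand-in for retained PC scores: 3 groups of 20 individuals in 50 dimensions.
rng = np.random.default_rng(42)
centres = np.zeros((3, 50))
centres[1, 0] = centres[2, 1] = 30.0
X = np.vstack([c + rng.normal(size=(20, 50)) for c in centres])

bic = bic_by_k(X, k_max=6)
best_k = min(bic, key=bic.get)  # K with the lowest BIC
print({k: round(float(v), 1) for k, v in bic.items()}, best_k)
```

K-means only reaches a local optimum, so each K is fitted from several seeded restarts and the run with the lowest WSS is kept. With groups this well separated, the lowest BIC should fall at K = 3.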
The method is implemented in the adegenet package for R, which provides tools for population genetics and phylogenetic analysis. DAPC is a fast, powerful, and flexible tool for analyzing genetically structured populations and has potential applications beyond the study of population genetics.
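As an illustration outside R, the core DAPC computation can be sketched with NumPy: PCA on the centred data, Fisher's discriminant analysis on the first `n_pc` principal component scores, and posterior group-membership probabilities from the standard LDA model with a pooled within-group covariance. The `dapc` function, its arguments, the priors proportional to group sizes, and the synthetic data are all assumptions made for this sketch; it is not adegenet's API, and the package's handling of allele data, scaling, and priors may differ.

```python
import numpy as np


def dapc(X, groups, n_pc):
    """Minimal DAPC-style sketch (hypothetical helper, not the adegenet API).

    X      : (n_individuals, n_variables) numeric matrix, e.g. allele frequencies
    groups : group label per individual (known priors or K-means output)
    n_pc   : number of principal components passed to the discriminant step
    Returns discriminant scores, their eigenvalues, and posterior membership probabilities.
    """
    groups = np.asarray(groups)
    labels = np.unique(groups)
    # 1. PCA via SVD of the column-centred matrix; keep the first n_pc axes.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pcs = Xc @ Vt[:n_pc].T  # PC scores; their column means are zero

    # 2. Within-group (Sw) and between-group (Sb) scatter matrices of the PC scores.
    means = np.array([pcs[groups == g].mean(axis=0) for g in labels])
    sizes = np.array([np.sum(groups == g) for g in labels])
    Sw = sum((pcs[groups == g] - m).T @ (pcs[groups == g] - m) for g, m in zip(labels, means))
    Sb = (means.T * sizes) @ means  # sum_g n_g m_g m_g^T, since the grand mean is zero

    # 3. Fisher discriminants: solve Sb w = lambda Sw w by whitening with chol(Sw).
    Linv = np.linalg.inv(np.linalg.cholesky(Sw))
    evals, V = np.linalg.eigh(Linv @ Sb @ Linv.T)
    keep = np.argsort(evals)[::-1][: len(labels) - 1]  # at most K - 1 discriminant functions
    scores = pcs @ (Linv.T @ V[:, keep])

    # 4. Posterior membership: Gaussian groups with a pooled covariance (standard LDA
    #    posterior), using priors proportional to group sizes (an assumption of this sketch).
    pooled_inv = np.linalg.inv(Sw / (len(groups) - len(labels)))
    diff = pcs[:, None, :] - means[None, :, :]
    logpost = np.log(sizes / sizes.sum()) - 0.5 * np.einsum("nkp,pq,nkq->nk", diff, pooled_inv, diff)
    post = np.exp(logpost - logpost.max(axis=1, keepdims=True))  # stabilised before normalising
    return scores, evals[keep], post / post.sum(axis=1, keepdims=True)


# Synthetic example: two groups of 20 individuals, 30 variables, shifted on one variable.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 30))
X[20:, 0] += 8.0
groups = np.repeat(["A", "B"], 20)

scores, eig, post = dapc(X, groups, n_pc=5)
print(scores.shape, post.shape)  # (40, 1) (40, 2)
```

Reducing the data to a few uncorrelated principal components before DA keeps the within-group scatter matrix invertible even when there are more variables (alleles) than individuals, which is why the PCA step comes first. Only K - 1 discriminant functions exist for K groups, so this two-group example yields a single discriminant axis.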

Discriminant analysis of principal components: a new method for the analysis of genetically structured populations

2010 | Thibaut Jombart¹, Sébastien Devillard², François Balloux¹