Population Structure and Eigenanalysis

December 2006 | Volume 2 | Issue 12 | e190 | Nick Patterson, Alkes L. Price, David Reich
The paper by Patterson, Price, and Reich discusses the application of principal components analysis (PCA) to genetic data for studying population structure. The authors place PCA on a formal statistical footing and develop significance tests from modern statistical theory, in particular Tracy-Widom theory for the distribution of the largest eigenvalue. They also describe a "phase change" phenomenon: for a fixed dataset size, divergence between populations below a certain threshold is essentially undetectable, but slightly above that threshold detection becomes easy. This makes it possible to estimate the dataset size needed to detect population structure.
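The testing idea summarized above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Genotypes are assumed to be an m x n array of 0/1/2 allele counts (m samples, n markers). Each marker is centred and scaled by a smoothed estimate of sqrt(p(1 - p)), and the m - 1 non-trivial eigenvalues of X = M M^T / n are kept. The effective number of markers is estimated from the first two eigenvalue moments. The largest eigenvalue is then centred and scaled Johnstone-style, giving a statistic to compare against tabulated Tracy-Widom (beta = 1) quantiles. The helper `markers_needed` assumes the phase-change threshold takes the form F_ST = 1/sqrt(nm) for two equally sized samples; treat that form as an assumption to check against the paper. All function names are hypothetical.

```python
import numpy as np


def normalize_genotypes(C):
    """Centre and scale an (m samples x n markers) matrix of 0/1/2 allele counts.

    The smoothed allele frequency p = (1 + sum) / (2 + 2m) is an assumption of
    this sketch; it keeps p(1 - p) away from zero for rare alleles.
    """
    C = np.asarray(C, dtype=float)
    m = C.shape[0]
    mu = C.mean(axis=0)                          # per-marker mean over samples
    p = (1.0 + C.sum(axis=0)) / (2.0 + 2.0 * m)  # smoothed allele frequency
    return (C - mu) / np.sqrt(p * (1.0 - p))


def sample_eigenvalues(C):
    """Non-trivial eigenvalues of X = M M^T / n, largest first."""
    M = normalize_genotypes(C)
    n = M.shape[1]
    X = M @ M.T / n
    eigs = np.sort(np.linalg.eigvalsh(X))[::-1]
    return eigs[:-1]  # centring forces one eigenvalue to ~0; drop it


def tracy_widom_statistic(eigs):
    """Centred, scaled largest eigenvalue, to compare with TW (beta = 1) quantiles.

    Assumptions: k = m - 1 eigenvalues, a moment estimate of the effective number
    of markers, and Johnstone-style centring/scaling for a k-dimensional Wishart.
    """
    lam = np.asarray(eigs, dtype=float)
    k = lam.size                                   # m - 1
    s1, s2 = lam.sum(), np.sum(lam ** 2)
    n_eff = (k + 2) * s1 ** 2 / (k * s2 - s1 ** 2)  # effective number of markers
    l1 = k * lam[0] / s1                            # rescale so mean eigenvalue is 1
    a, b = np.sqrt(n_eff - 1.0), np.sqrt(k)
    mu = (a + b) ** 2 / n_eff
    sigma = (a + b) / n_eff * (1.0 / a + 1.0 / b) ** (1.0 / 3.0)
    return (l1 - mu) / sigma


def markers_needed(fst, m_samples):
    """Markers at which the assumed threshold F_ST = 1 / sqrt(n m) is reached."""
    return 1.0 / (m_samples * fst ** 2)


# Usage: simulated genotypes with and without two-population structure.
rng = np.random.default_rng(0)
m, n = 50, 2000
p_anc = rng.uniform(0.1, 0.9, size=n)
null = rng.binomial(2, p_anc, size=(m, n))
p_a = np.clip(p_anc + rng.normal(0.0, 0.1, size=n), 0.01, 0.99)
p_b = np.clip(p_anc + rng.normal(0.0, 0.1, size=n), 0.01, 0.99)
split = np.vstack([rng.binomial(2, p_a, size=(m // 2, n)),
                   rng.binomial(2, p_b, size=(m // 2, n))])
x_null = tracy_widom_statistic(sample_eigenvalues(null))
x_split = tracy_widom_statistic(sample_eigenvalues(split))
print(f"TW statistic, no structure: {x_null:.2f}; two populations: {x_split:.2f}")
print(f"markers needed at F_ST = 0.01 with 100 samples: {markers_needed(0.01, 100):.0f}")
```

With no simulated structure the statistic should behave like an ordinary Tracy-Widom draw, while the two-population data should land far in the upper tail. A strong signal also inflates the second eigenvalue moment, so `n_eff` is only a rough figure when structure is present.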
The methods apply to both biallelic and highly polymorphic genetic markers and can accommodate linked markers. The paper also examines the relationship between PCA and cluster-based methods, showing that their underlying models are closely related. The authors validate the methods on simulated and real data, showing that they uncover population structure and can detect additional structure beyond what has already been identified. They conclude that their PCA-based approach is a robust, practical tool for analyzing genetic datasets, with a solid statistical basis for inferring population structure.
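The link between PCA and clustering can be illustrated with a small simulation. When samples come from K discrete populations, the centred population means span at most K - 1 directions. Roughly K - 1 eigenvalues should therefore stand clear of the noise bulk, and the samples should form K tight groups in the space of the top K - 1 eigenvectors. The sketch below uses plain k-means as a stand-in for a cluster-based method. That choice, the simulation parameters, the smoothed allele-frequency scaling, and all function names are assumptions for illustration rather than the paper's procedure.

```python
import numpy as np


def normalize_genotypes(C):
    """Centre each marker and scale by a smoothed sqrt(p(1 - p)) (an assumption)."""
    C = np.asarray(C, dtype=float)
    m = C.shape[0]
    p = (1.0 + C.sum(axis=0)) / (2.0 + 2.0 * m)
    return (C - C.mean(axis=0)) / np.sqrt(p * (1.0 - p))


def top_components(C, r):
    """Return all eigenvalues (descending) and the top-r eigenvector coordinates."""
    M = normalize_genotypes(C)
    w, V = np.linalg.eigh(M @ M.T / M.shape[1])  # eigh returns ascending order
    order = np.argsort(w)[::-1]
    return w[order], V[:, order[:r]]


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        # squared distance from each point to its nearest chosen centre
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


# Usage: K = 3 simulated populations with 30 samples each.
rng = np.random.default_rng(1)
K, per_pop, n = 3, 30, 3000
p_anc = rng.uniform(0.1, 0.9, size=n)
truth = np.repeat(np.arange(K), per_pop)
C = np.vstack([
    rng.binomial(2, np.clip(p_anc + rng.normal(0.0, 0.15, size=n), 0.01, 0.99),
                 size=(per_pop, n))
    for _ in range(K)
])
eigs, pcs = top_components(C, K - 1)
labels = kmeans(pcs, K)
print("leading eigenvalues:", np.round(eigs[:4], 2))
```

In this setting the first two eigenvalues should sit well above the third, and k-means on the two leading eigenvectors should recover the three simulated groups. This is the sense in which a PCA view and a discrete-cluster view describe the same underlying structure.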