2008 November 6 | John Novembre, Toby Johnson, Katarzyna Bryc, Zoltán Kutalik, Adam R. Boyko, Adam Auton, Amit Indap, Karen S. King, Sven Bergmann, Matthew R. Nelson, Matthew Stephens, and Carlos D. Bustamante
A study of 3,000 European individuals genotyped at over half a million variable DNA sites reveals a close correspondence between genetic and geographic distances within Europe. The results show that a geographical map of Europe can be used as an efficient two-dimensional summary of genetic variation. This finding highlights the importance of accounting for genetic structure when mapping the genetic basis of disease phenotypes, because spurious associations can arise if that structure is not properly addressed. The study also demonstrates that an individual's DNA can be used to infer their geographic origin with surprising accuracy, often within a few hundred kilometers.
The research analyzed genetic variation in 3,192 European individuals genotyped at 500,568 loci using the Affymetrix 500K SNP chip. After applying various stringency criteria, the researchers focused on 1,387 individuals whose geographic origins could be assigned with high confidence. Principal components analysis (PCA) produced a two-dimensional visual summary of the observed genetic variation that closely resembles a geographic map of Europe. Individuals from the same geographic region cluster together, and major populations are distinguishable. Structure is visible even among French-, German- and Italian-speaking groups within Switzerland, and between Ireland and the United Kingdom.
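To illustrate how PCA can turn genotypes into a map-like picture, the following Python sketch simulates a small, entirely hypothetical data set and extracts the first two principal components by power iteration. It is not the authors' pipeline: the 6 x 6 grid of individuals, the SNP counts, the linear allele-frequency gradient, and all function names are assumptions made for this example.

```python
import random

rng = random.Random(42)

# Hypothetical toy data: 36 individuals placed on a 6 x 6 grid of sampling sites.
coords = [(x / 5, y / 5) for x in range(6) for y in range(6)]
N_X_SNPS, N_Y_SNPS = 300, 150  # assumed: more SNPs vary along x than along y


def simulate_genotype(position):
    """Draw a 0/1/2 genotype whose allele frequency rises linearly with position."""
    freq = 0.5 + 0.6 * (position - 0.5)  # stays within [0.2, 0.8]
    return sum(rng.random() < freq for _ in range(2))


genotypes = [
    [simulate_genotype(x) for _ in range(N_X_SNPS)]
    + [simulate_genotype(y) for _ in range(N_Y_SNPS)]
    for x, y in coords
]


def standardize_columns(rows):
    """Center each SNP and scale it to unit variance; drop monomorphic SNPs."""
    n = len(rows)
    kept = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = (sum((g - mean) ** 2 for g in col) / n) ** 0.5
        if sd > 0:
            kept.append([(g - mean) / sd for g in col])
    return [list(row) for row in zip(*kept)]  # back to one row per individual


def gram_matrix(rows):
    """Individual-by-individual covariance matrix, averaged over SNPs."""
    m = len(rows[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / m for rj in rows] for ri in rows]


def top_eigenvectors(matrix, k=2, iterations=200, seed=0):
    """Leading eigenvectors by power iteration with deflation."""
    start = random.Random(seed)
    found = []
    for _ in range(k):
        v = [start.gauss(0, 1) for _ in matrix]
        for _ in range(iterations):
            w = [sum(a * b for a, b in zip(row, v)) for row in matrix]
            for u in found:  # remove components along eigenvectors already found
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = sum(a * a for a in w) ** 0.5
            v = [a / norm for a in w]
        found.append(v)
    return found


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5


# Each eigenvector holds one coordinate per individual (PC scores up to scale).
pc1, pc2 = top_eigenvectors(gram_matrix(standardize_columns(genotypes)), k=2)
xs = [c[0] for c in coords]
ys = [c[1] for c in coords]
print(abs(pearson(pc1, xs)), abs(pearson(pc2, ys)))
```

Because more of the simulated SNPs vary along x than along y, PC1 is expected to track x and PC2 to track y, each up to an arbitrary sign. The printed values are the absolute correlations between each PC and the corresponding grid coordinate.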
The first principal component (PC1) aligns north-northwest/south-southeast and accounts for approximately twice as much variation as PC2. The study suggests that the direction and relative strength of PC1 may reflect a special role for this geographic axis in the demographic history of Europeans. The results also indicate that European DNA samples can be very informative about the geographical origins of their donors: using a multiple-regression-based assignment approach, one can place 50% of individuals within 310 km of their reported origin and 90% within 700 km.
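A regression-based assignment of this kind can be sketched as follows. This is a minimal illustration, not the study's actual procedure: it assumes a model with an intercept and linear terms in PC1 and PC2 only, evaluates predictions in-sample rather than on held-out individuals, and uses simulated origins and PC coordinates. All function names and numeric settings are hypothetical.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_least_squares(pcs, target):
    """Fit target ~ intercept + PC1 + PC2 via the normal equations."""
    design = [[1.0, p1, p2] for p1, p2 in pcs]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * t for row, t in zip(design, target)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def predict(coef, pc):
    """Predicted coordinate for one individual's (PC1, PC2) pair."""
    return coef[0] + coef[1] * pc[0] + coef[2] * pc[1]


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, h)))


rng = random.Random(1)
# Hypothetical reported origins (latitude, longitude), roughly spanning Europe.
origins = [(rng.uniform(37, 60), rng.uniform(-9, 25)) for _ in range(200)]
# Hypothetical PC coordinates: a rotated, rescaled copy of the map plus noise.
pcs = [
    (0.002 * lat - 0.001 * lon + rng.gauss(0, 0.003),
     0.001 * lat + 0.002 * lon + rng.gauss(0, 0.003))
    for lat, lon in origins
]

lat_coef = fit_least_squares(pcs, [o[0] for o in origins])
lon_coef = fit_least_squares(pcs, [o[1] for o in origins])

# Distance between each reported origin and its predicted location.
errors = sorted(
    haversine_km(lat, lon, predict(lat_coef, pc), predict(lon_coef, pc))
    for (lat, lon), pc in zip(origins, pcs)
)
median_km = errors[len(errors) // 2]
share_within_700 = sum(e <= 700 for e in errors) / len(errors)
print(median_km, share_within_700)
```

Statistics such as "50% of individuals within 310 km" correspond to quantiles of the `errors` list. The numbers printed here describe only the simulated data and say nothing about the study's results.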
The study also highlights the importance of considering geographic distribution when evaluating genome-wide association studies among Europeans. The results suggest that population structure correction may be important even in seemingly closely related populations such as Europeans. The success of PCA-based correction is not unexpected here, because the PCs are excellent predictors of latitude and longitude. The study also notes that the geographic resolution presented here is only a lower bound on the performance possible in the near future. The results provide an important insight: the power to detect subtle population structure, and in turn the promise of genetic ancestry tests, may be more substantial than previously imagined.
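The reason a PC-based correction helps can be seen in a small simulation. In the Python sketch below, a hypothetical SNP has no effect on a phenotype, but both vary along PC1, so a naive test shows an association that disappears once PC1 is regressed out of both variables. The sample size, allele-frequency gradient, effect size, and function names are all assumptions chosen for illustration, and regressing out a single PC is only one simple form of such a correction.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5


def residualize(values, covariate):
    """Residuals from a simple linear regression of values on covariate."""
    n = len(values)
    mv, mc = sum(values) / n, sum(covariate) / n
    slope = (
        sum((c - mc) * (v - mv) for c, v in zip(covariate, values))
        / sum((c - mc) ** 2 for c in covariate)
    )
    return [(v - mv) - slope * (c - mc) for v, c in zip(values, covariate)]


rng = random.Random(7)
n = 2000

# Hypothetical PC1 score per individual, standing in for position along a cline.
pc1 = [rng.uniform(-1, 1) for _ in range(n)]

# Allele frequency changes along PC1 (0.2 to 0.8); genotype is 0, 1, or 2 copies.
genotype = [sum(rng.random() < 0.5 + 0.3 * p for _ in range(2)) for p in pc1]

# The phenotype depends on PC1 (for example, via environment), not on the SNP.
phenotype = [2.0 * p + rng.gauss(0, 1) for p in pc1]

# Naive test: correlate genotype and phenotype directly.
naive_r = pearson(genotype, phenotype)

# Corrected test: remove the PC1 trend from both variables first.
adjusted_r = pearson(residualize(genotype, pc1), residualize(phenotype, pc1))

print(naive_r, adjusted_r)
```

In this setup the naive correlation reflects shared geography only, which is the kind of spurious association that structure correction is meant to remove. Real analyses typically adjust for several PCs at once rather than one.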