1 OCTOBER 2015 | The 1000 Genomes Project Consortium
The 1000 Genomes Project set out to comprehensively describe common human genetic variation. It sequenced the genomes of 2,504 individuals from 26 populations across Africa, East Asia, Europe, South Asia, and the Americas, combining whole-genome sequencing, exome sequencing, and high-density SNP microarray genotyping. The project identified over 88 million genetic variants (84.7 million SNPs, 3.6 million indels, and 60,000 structural variants), all phased onto high-quality haplotypes. The resource includes more than 99% of SNP variants with a frequency above 1% for a variety of ancestries. It describes how genetic variation is distributed across the global sample and what that distribution implies for studies of common disease. The project has advanced the understanding of genetic diversity and disease biology, and has supported methods for array design, genotype imputation, variant cataloging, and filtering of neutral variants.
The analysis was expanded to include multi-allelic SNPs, indels, and structural variants, using an ensemble of 24 sequence analysis tools and machine-learning classifiers to identify high-quality variants. A haplotype scaffold was built from array genotypes using statistical methods; high-confidence bi-allelic variants were incorporated into it, and multi-allelic and structural variants were then placed onto the scaffold. In total, the project discovered, genotyped, and phased 88 million variant sites, contributing or validating 80 million of the 100 million variants in the public dbSNP catalogue. The data set provides a benchmark for surveys of human genetic variation and is a key component of human genetic studies. It represents human genetic variation broadly: a typical genome differs from the reference human genome at 4.1 to 5.0 million sites.
Most variants in the data set are rare, yet most of the variants carried by any single genome are common. The project also identified putatively functional variation, including variants affecting gene function and regulatory regions. The sharing of genetic variants among populations reflects population history and the genetic similarity of related populations. Analysis of demographic history shows a shared demographic history for all humans beyond 150,000 to 200,000 years ago. On the resolution of genetic association studies, the project showed that imputation accuracy varies by population and that African populations have greater genetic diversity, leading to more accurate imputation. Finally, the project assessed the impact of the new reference panel on GWAS, showing that it increased the number of imputed variants across the common, intermediate-frequency, and rare classes.
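The observation that most variant sites are rare while most variants in a single genome are common can look paradoxical. The following is a minimal sketch of why both hold, not the project's own analysis. It assumes a textbook neutral model in which the number of sites with derived-allele count k is proportional to 1/k, takes 5,008 haplotypes (two per each of the 2,504 individuals), and uses a 5% frequency cut-off chosen purely for illustration. The function names are hypothetical.

```python
def neutral_site_weights(n_haplotypes):
    """Relative number of variant sites at each derived-allele count k.

    Assumption: the standard neutral expectation, proportional to 1/k
    for k = 1 .. n_haplotypes - 1. Real data need not follow this model.
    """
    return {k: 1.0 / k for k in range(1, n_haplotypes)}


def low_frequency_shares(n_haplotypes, max_freq):
    """Return (share of all sites, share of one haplotype's variants)
    that fall below the allele-frequency cut-off max_freq."""
    weights = neutral_site_weights(n_haplotypes)
    total_sites = sum(weights.values())

    # A randomly chosen haplotype carries the derived allele at a site
    # with count k with probability k / n_haplotypes, so common sites
    # are far more likely to show up in any one genome.
    carried = {k: w * k / n_haplotypes for k, w in weights.items()}
    total_carried = sum(carried.values())

    low = [k for k in weights if k / n_haplotypes < max_freq]
    share_of_sites = sum(weights[k] for k in low) / total_sites
    share_in_genome = sum(carried[k] for k in low) / total_carried
    return share_of_sites, share_in_genome


# 2,504 diploid individuals give 5,008 haplotypes; 5% is an illustrative cut-off.
share_of_sites, share_in_genome = low_frequency_shares(5008, 0.05)
print(f"below 5% frequency: {share_of_sites:.0%} of sites, "
      f"{share_in_genome:.0%} of one haplotype's variants")
```

Under these assumptions, roughly two thirds of sites fall below the cut-off, but they make up only about 5% of what one haplotype carries, because each low-frequency allele is present in very few genomes. The toy model only illustrates the direction of the effect and says nothing about the actual proportions in the project's data.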