28 OCTOBER 2010 | The 1000 Genomes Project Consortium
The 1000 Genomes Project aimed to characterize human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Its pilot phase comprised three projects: low-coverage whole-genome sequencing of 179 individuals from four populations, high-coverage sequencing of two mother-father-child trios, and exon-targeted sequencing of 697 individuals from seven populations. Together they described approximately 15 million SNPs, 1 million short indels, and 20,000 structural variants, most of which were previously unknown. Because the vast majority of common variation was cataloged, over 95% of the currently accessible variants found in any individual are present in the data set. Each person was found to carry approximately 250–300 loss-of-function variants in annotated genes and 50–100 variants previously implicated in inherited disorders. The project estimated the de novo germline mutation rate at ~10⁻⁸ per base pair per generation and identified a marked reduction in genetic variation near genes, attributed to selection at linked sites. The data provided insights into functional variation, genetic association, and natural selection.
The project generated 4.9 terabases of DNA sequence, from which variant calls, genotypes, and haplotypes were derived. Reads were aligned to the NCBI36 reference genome, and analysis focused on the 'accessible genome' to reduce errors from incorrect alignments. The project introduced innovations in variant calling, including recalibration of base quality scores, local realignment, and consensus genotyping. The trio project identified 5.9 million SNPs, 650,000 short indels, and over 14,000 structural variants, while the low-coverage project identified 14.4 million SNPs, 1.3 million short indels, and over 20,000 structural variants. The exon project identified 12,758 SNPs and 96 indels.
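To make the idea of base quality recalibration concrete, here is a minimal sketch in Python. It is not the project's pipeline: it assumes the simplest possible scheme, in which bases at sites believed to be non-variant are grouped by their reported Phred score and each group's score is replaced by one derived from its observed mismatch rate. The function names and the pseudocount smoothing are illustrative choices, and production tools typically condition on additional covariates.

```python
import math
from collections import defaultdict


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)


def recalibrate(observations):
    """Build a reported-quality -> empirical-quality lookup table.

    observations: iterable of (reported_q, is_mismatch) pairs for bases
    at sites assumed to be non-variant, so any mismatch to the reference
    is counted as a sequencing error (an assumption of this sketch).
    """
    totals = defaultdict(int)
    mismatches = defaultdict(int)
    for reported_q, is_mismatch in observations:
        totals[reported_q] += 1
        if is_mismatch:
            mismatches[reported_q] += 1

    table = {}
    for reported_q, n in totals.items():
        # Add-one smoothing keeps the rate away from 0 for small bins.
        empirical_error = (mismatches[reported_q] + 1) / (n + 2)
        table[reported_q] = error_to_phred(empirical_error)
    return table
```

In this toy scheme, a bin whose bases claim Q30 (a 1 in 1,000 error rate) but mismatch about 1% of the time would be mapped to roughly Q20, so downstream variant callers would weight those bases less heavily.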
Variation was not evenly distributed across the genome: it was high in regions such as HLA and the subtelomeres and low in gene-dense regions. Novel variants were more common in African populations, and the project detected a large number of low-frequency variants. It also identified 68,300 non-synonymous SNPs, of which 34,161 were novel. Most common variants were already in dbSNP, but many low-frequency and rare variants were not. The data were used to improve imputation accuracy and to identify associations between genetic variants and diseases.
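As a quick check on the non-synonymous SNP counts quoted above, the following Python snippet computes the novel fraction. The function name is illustrative only; the two counts are the ones given in the text.

```python
def novel_fraction(novel, total):
    """Return the share of variants that were not previously known."""
    if total <= 0:
        raise ValueError("total must be positive")
    return novel / total


NONSYNONYMOUS_TOTAL = 68_300  # non-synonymous SNPs reported in the text
NONSYNONYMOUS_NOVEL = 34_161  # of which novel, as reported in the text

fraction = novel_fraction(NONSYNONYMOUS_NOVEL, NONSYNONYMOUS_TOTAL)
print(f"novel non-synonymous SNPs: {fraction:.1%}")
```

In other words, about half of the non-synonymous SNPs found were new.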
The data also provided insights into mutation, recombination, and natural selection. From de novo mutations identified in the trio project, the germline mutation rate was estimated at ~10⁻⁸ per base pair per generation. Natural selection was also found to reduce genetic variation near genes. The data were used to improve the understanding of genetic variation.
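To give a sense of scale for the mutation rate of ~10⁻⁸ per base pair per generation, the short Python sketch below converts it into an expected number of new substitutions per child. The haploid genome size of 3 × 10⁹ base pairs is a round-number assumption that does not come from the text, and the function name is illustrative.

```python
def expected_de_novo_mutations(rate_per_bp, haploid_genome_bp, ploidy=2):
    """Expected new substitutions per generation in one individual.

    Each of the `ploidy` genome copies inherited by a child accumulates
    mutations independently at `rate_per_bp` per base pair.
    """
    return rate_per_bp * haploid_genome_bp * ploidy


MUTATION_RATE = 1e-8               # per base pair per generation (from the text)
ASSUMED_HAPLOID_GENOME_BP = 3.0e9  # assumption: approximate human genome size

expected = expected_de_novo_mutations(MUTATION_RATE, ASSUMED_HAPLOID_GENOME_BP)
print(f"about {expected:.0f} de novo substitutions per child")
```

Under these assumptions the result is on the order of 60 new substitutions per child. Restricting the calculation to a smaller accessible genome would lower the figure proportionally.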