The complete sequence of a human genome

2022 April | Evan E. Eichler, Karen H. Miga, Adam M. Phillippy
The Telomere-to-Telomere (T2T) Consortium has published a complete sequence of a human genome, T2T-CHM13, covering regions that had never been sequenced. The assembly spans 3.055 billion base pairs (bp) and is gapless for all chromosomes except Y. It corrects errors in the prior reference and adds nearly 200 million bp of new sequence containing 1,956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, opening these complex regions to variational and functional studies.

The current human reference genome, GRCh38, has significant gaps in pericentromeric and subtelomeric regions, recent segmental duplications, and ribosomal DNA (rDNA) arrays. T2T-CHM13 closes these gaps by using long-read sequencing technologies to overcome the limitations of BAC-based assembly.

T2T-CHM13 is more complete, accurate, and representative than GRCh38. It includes 238 Mbp of sequence that does not co-linearly align to GRCh38 over a 1 Mbp interval, primarily comprising centromeric satellites, non-satellite segmental duplications, and rDNAs. It also includes 219 complete rDNA copies totaling 9.9 Mbp. The assembly has been validated and polished, and it comes with a comprehensive annotation of 63,494 genes and 233,615 transcripts, of which 19,969 genes (86,245 transcripts) are predicted to be protein coding. The work also includes a detailed analysis of the genomic structure of the short arms of the five acrocentric chromosomes, which had remained largely unsequenced because of their enrichment for satellite repeats and segmental duplications.

By providing a more complete and accurate reference for variant calling and other genomic studies, T2T-CHM13 has significant implications for genomics. It also shows how long-read sequencing can overcome the limitations of earlier sequencing methods. The assembly is expected to drive future discovery in human genomic health and disease.
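To put the headline figures in proportion, the short Python sketch below derives a few ratios from the numbers quoted in the summary. It is only illustrative arithmetic: the constant and function names are made up for this example, and the "nearly 200 million bp" figure is treated as exactly 200 Mbp, so the resulting share is a slight overestimate.

```python
# Figures quoted in the summary above. Names are illustrative only.
ASSEMBLY_BP = 3.055e9            # total T2T-CHM13 assembly length
NEW_SEQUENCE_BP = 200e6          # "nearly 200 million bp": treated as an upper bound
NEW_GENE_PREDICTIONS = 1956      # gene predictions in the new sequence
NEW_PROTEIN_CODING = 99          # of those, predicted protein coding
RDNA_COPIES = 219                # complete rDNA copies
RDNA_TOTAL_BP = 9.9e6            # total rDNA sequence
GENES_TOTAL = 63494              # annotated genes
GENES_PROTEIN_CODING = 19969     # annotated genes predicted protein coding


def derived_stats():
    """Return simple ratios computed from the quoted figures."""
    return {
        # Share of the assembly that is new relative to the prior reference.
        "new_sequence_fraction": NEW_SEQUENCE_BP / ASSEMBLY_BP,
        # Share of new gene predictions that are predicted protein coding.
        "new_protein_coding_fraction": NEW_PROTEIN_CODING / NEW_GENE_PREDICTIONS,
        # Mean length of one rDNA copy, in bp.
        "mean_rdna_copy_bp": RDNA_TOTAL_BP / RDNA_COPIES,
        # Share of all annotated genes predicted to be protein coding.
        "protein_coding_gene_fraction": GENES_PROTEIN_CODING / GENES_TOTAL,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:,.4f}")
```

Under these assumptions, the new sequence is roughly 6.5% of the assembly, about 5% of the newly predicted genes are protein coding, and the mean rDNA copy is on the order of 45 kbp.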