This paper presents a method for accurate whole human genome sequencing using reversible terminator chemistry. The approach involves attaching single DNA molecules to a surface, amplifying them in situ, and using synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analyzed to generate high-quality sequence data. The method was applied to human genome sequencing on flow-sorted X chromosomes and then scaled to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. The approach generated accurate consensus sequences from over 30× average depth of paired 35-base reads, characterizing four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown.
The method enables accurate, rapid, and economical whole-genome re-sequencing and has applications in many other biomedical areas. DNA sequencing provides an unparalleled resource of genetic information, allowing the characterization of individual genomes, transcriptional states, and genetic variation in populations and disease. Until recently, sequencing projects were limited by the cost and throughput of Sanger sequencing. The raw data for the three-billion-base human genome sequence, completed in 2004, were generated over several years for ~$300 million using several hundred capillary sequencers. More recently, an individual human genome sequence has been determined for ~$10 million by capillary sequencing.
The paper describes a massively parallel synthetic sequencing approach that transforms the ability to use DNA and RNA sequence information in biological systems. The method was used to re-sequence an individual human genome to high accuracy, delivering data at very high throughput and low cost, and enabling the extraction of genetic information of high biological value, including single-nucleotide polymorphisms (SNPs) and structural variants.
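To give a sense of the scale implied by these figures, average depth relates to read count through depth = (number of reads × read length) / genome size. The sketch below is illustrative arithmetic, not a calculation taken from the paper: the function name is hypothetical, and the genome size is the rounded three-billion-base figure quoted above.

```python
import math


def reads_for_depth(genome_size: int, read_length: int, depth: float) -> int:
    """Return the number of reads needed to reach a target average depth.

    Uses depth = reads * read_length / genome_size, solved for reads.
    Assumes every base of every read maps to the genome, which real
    data never quite achieves.
    """
    if genome_size <= 0 or read_length <= 0 or depth < 0:
        raise ValueError("genome_size and read_length must be positive, depth non-negative")
    return math.ceil(depth * genome_size / read_length)


# Rounded figures from the text: ~3 billion bases, 35-base reads, 30x depth.
total_reads = reads_for_depth(3_000_000_000, 35, 30)
read_pairs = math.ceil(total_reads / 2)  # reads are sequenced in pairs

print(f"reads needed: {total_reads:,}")  # 2,571,428,572
print(f"read pairs:   {read_pairs:,}")   # 1,285,714,286
```

Under these idealized assumptions, roughly 2.6 billion 35-base reads are needed, which is why a massively parallel approach matters at this depth.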
The method involves generating high-density single-molecule arrays of genomic DNA fragments attached to the surface of the reaction chamber (the flow cell) and using isothermal 'bridging' amplification to form DNA 'clusters' from each fragment. The DNA in each cluster is linearized by cleavage within one adaptor sequence and denatured, generating single-stranded template for sequencing by synthesis to obtain a sequence read. Paired-read sequencing involves removing the original strands, leaving the complementary strand as template for the second sequencing reaction.
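Because the second read is templated by the complementary strand, the two reads of a pair come from opposite ends of the fragment and from opposite strands. The sketch below illustrates that relationship under simplifying assumptions: each read starts exactly at a fragment end, there are no sequencing errors, and adaptors are ignored. The function names are hypothetical.

```python
from typing import Tuple

# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_read_pair(fragment: str, read_length: int = 35) -> Tuple[str, str]:
    """Return an idealized (read 1, read 2) pair for one fragment.

    Read 1 is the start of the given strand; read 2 is the start of the
    complementary strand, i.e. the reverse complement of the fragment's
    far end.
    """
    if len(fragment) < read_length:
        raise ValueError("fragment is shorter than the read length")
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2


# Toy fragment, far shorter than a real library insert.
r1, r2 = simulate_read_pair("AACCGGTTAC", read_length=4)
print(r1, r2)  # AACC GTAA
```

When both reads are aligned to a reference, they are expected to face each other at a separation close to the fragment length; deviations from that expectation are one signal used to detect structural variants.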
The method uses a set of four reversible terminators, each labeled with a different removable fluorophore, to ensure base-by-base nucleotide incorporation in a stepwise manner. After each cycle of incorporation, the identity of the inserted base is determined by laser-induced excitation of the fluorophores and imaging. The method was used to sequence a human bacterial artificial chromosome (BAC) clone containing 162,752 bp of the major histocompatibility complex, demonstrating high raw read accuracy.
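The per-cycle base identification can be illustrated with a deliberately simplified base caller: for each cycle, pick the brightest of the four fluorescence channels. This is a sketch under stated assumptions, not the paper's image-analysis pipeline. It assumes the intensities have already been extracted per cluster and corrected for channel crosstalk and phasing, and the purity score used here is an illustrative choice.

```python
from typing import List, Sequence, Tuple

BASES = "ACGT"  # assumed channel order of each intensity vector


def call_base(intensities: Sequence[float]) -> Tuple[str, float]:
    """Call one base from four channel intensities.

    Returns the base of the brightest channel and a purity score,
    brightest / (brightest + second brightest), between 0.5 and 1.0
    for positive signals. Higher purity means a cleaner call.
    """
    if len(intensities) != 4:
        raise ValueError("expected four channel intensities (A, C, G, T)")
    ranked = sorted(range(4), key=lambda i: intensities[i], reverse=True)
    top = intensities[ranked[0]]
    second = intensities[ranked[1]]
    total = top + second
    purity = top / total if total > 0 else 0.0
    return BASES[ranked[0]], purity


def call_read(cycles: Sequence[Sequence[float]]) -> Tuple[str, List[float]]:
    """Call a read from one intensity vector per sequencing cycle."""
    calls = [call_base(cycle) for cycle in cycles]
    sequence = "".join(base for base, _ in calls)
    purities = [purity for _, purity in calls]
    return sequence, purities


# Made-up intensities for a three-cycle read.
seq, purities = call_read([
    [900.0, 50.0, 30.0, 20.0],
    [10.0, 40.0, 800.0, 100.0],
    [5.0, 700.0, 20.0, 10.0],
])
print(seq)  # AGC
```

Because every terminator blocks further extension until it is removed, each cluster contributes exactly one such intensity vector per cycle, so a 35-cycle run yields a 35-base read.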
The study also identified 92,485 candidate SNPs in the X chromosome using ELAND, most of which matched previous entries in the public database dbSNP.
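Checking candidate SNPs against a database of known variants amounts to a set comparison. The sketch below shows one minimal way to split candidates into known and novel calls. The record layout (chromosome, position, alternate allele), the function name, and the toy data are assumptions made for illustration; they do not reflect dbSNP's actual format or the paper's results.

```python
from typing import Iterable, Set, Tuple

# Hypothetical SNP key: (chromosome, 1-based position, alternate allele).
Snp = Tuple[str, int, str]


def split_known_novel(candidates: Iterable[Snp], known: Set[Snp]) -> Tuple[Set[Snp], Set[Snp]]:
    """Partition candidate SNPs into those present in `known` and the rest."""
    candidate_set = set(candidates)
    matched = candidate_set & known
    novel = candidate_set - known
    return matched, novel


# Made-up toy data, for illustration only.
known_db = {("X", 1000, "A"), ("X", 2500, "T"), ("X", 4200, "G")}
calls = [("X", 1000, "A"), ("X", 2500, "T"), ("X", 3100, "C")]

matched, novel = split_known_novel(calls, known_db)
concordance = len(matched) / (len(matched) + len(novel))
print(f"{len(matched)} known, {len(novel)} novel, concordance {concordance:.2f}")
# 2 known, 1 novel, concordance 0.67
```

A high match rate against an independent database is evidence that the candidate calls are mostly real, while the unmatched remainder contains both novel variants and false positives that need further validation.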