De novo assembly of human genomes with massively parallel short read sequencing

2010 | Ruiqiang Li, Hongmei Zhu, Jue Ruan, Wubin Qian, Xiaodong Fang, Zhongbin Shi, Yingrui Li, Shengting Li, Gao Shan, Karsten Kristiansen, Songgang Li, Huanming Yang, Jian Wang, Jun Wang
This study presents SOAPdenovo, a method for de novo assembly of human genomes from short read sequencing data. The authors assembled an Asian and an African human genome, achieving N50 contig sizes of 7.4 kb and 5.9 kb and N50 scaffold sizes of 446.3 kb and 61.9 kb, respectively. The method represents overlaps among short reads in a de Bruijn graph, which allows efficient assembly of large genomes.

The assembly process comprised error correction, contig assembly, scaffolding, and gap closure. The method was tested on a large dataset of 200 Gb of Illumina GA reads. The resulting assemblies were validated against the NCBI reference genome and were used to identify structural variations, including small deletions and insertions. The study also compares the computational complexity and performance of the method with those of other assemblers and highlights the importance of paired-end information in improving assembly accuracy.

The study highlights the potential of de novo assembly methods for building reference sequences and analyzing unexplored genomes in a cost-effective manner. The method is available as open-source software and has been integrated into the SOAP package for short read alignment. The results show that the method can produce high-quality assemblies with high coverage and accuracy, making it a valuable tool for genome research.
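
Two ideas in the summary, the de Bruijn graph and the N50 statistic, can be illustrated with a minimal Python sketch. This is not SOAPdenovo's implementation: the function names `build_de_bruijn` and `n50` are hypothetical, the graph uses the common textbook formulation in which nodes are (k-1)-mers and each k-mer contributes one edge, and it omits error correction, reverse complements, scaffolding, and gap closure.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph (illustrative only, not SOAPdenovo's code).

    Each k-mer found in a read adds an edge from its (k-1)-base prefix
    to its (k-1)-base suffix. Repeated k-mers collapse into one edge.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Two overlapping toy reads; real data would be millions of reads.
    toy_graph = build_de_bruijn(["ACGTAC", "GTACG"], k=3)
    for node, successors in sorted(toy_graph.items()):
        print(node, "->", sorted(successors))
    print("N50:", n50([2, 3, 4, 5, 6]))
```

Because reads that share a k-mer map to the same edge, the graph grows with the number of distinct k-mers rather than with the number of reads, which is one reason this structure suits high-coverage short read data. An N50 contig size of 7.4 kb therefore means that contigs of at least 7.4 kb account for half of the assembled bases.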