January 25, 2011 | Sante Gnerre, Iain MacCallum, Dariusz Przybylski, Filipe J. Ribeiro, Joshua N. Burton, Bruce J. Walker, Ted Sharpe, Giles Hall, Terrance P. Shea, Sean Sykes, Aaron M. Berlin, Daniel Aird, Maura Costello, Riza Daza, Louise Williams, Robert Nicol, Andreas Gnirke, Eric S. Lander, and David B. Jaffe
This paper presents ALLPATHS-LG, an algorithm for de novo genome assembly from massively parallel sequencing data. The algorithm was applied to human and mouse genomes sequenced on the Illumina platform, yielding high-quality draft assemblies with good accuracy, contiguity, and coverage. The assemblies achieved scaffold sizes comparable to those obtained with older capillary-based sequencing methods. The study highlights the potential of improved sequencing technology and computational methods to significantly reduce the cost of de novo genome assembly.
The algorithm addresses challenges in assembling large, repeat-rich genomes by improving error correction, handling repetitive sequences, using jumping libraries, and optimizing memory usage. It also allows for efficient assembly of both large and small genomes. The resulting assemblies showed high base accuracy (≥99.95%) and good short- and long-range accuracy when compared to finished reference genomes. The assemblies covered most of the human and mouse genomes, with the exception of repetitive sequences and segmental duplications, which remain challenging for de novo assembly.
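To put the reported base accuracy in more familiar terms, the short sketch below converts an accuracy fraction into an error rate, an average number of bases per error, and a Phred-scaled quality value. It is only illustrative arithmetic and is not taken from the paper or from ALLPATHS-LG; the function names are hypothetical, and 0.9995 is used because it is the lower bound quoted above.

```python
import math


def error_rate(accuracy: float) -> float:
    """Fraction of bases that are wrong, given the fraction that are right."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return 1.0 - accuracy


def bases_per_error(accuracy: float) -> float:
    """Average number of bases between errors (undefined at perfect accuracy)."""
    err = error_rate(accuracy)
    if err == 0.0:
        raise ValueError("no errors: bases per error is undefined")
    return 1.0 / err


def phred_quality(accuracy: float) -> float:
    """Phred-scaled quality, Q = -10 * log10(error rate)."""
    err = error_rate(accuracy)
    if err == 0.0:
        raise ValueError("no errors: Phred quality is unbounded")
    return -10.0 * math.log10(err)


if __name__ == "__main__":
    acc = 0.9995  # lower bound on base accuracy quoted in the summary
    print(f"error rate:      {error_rate(acc):.4%}")
    print(f"bases per error: {bases_per_error(acc):.0f}")
    print(f"Phred quality:   Q{phred_quality(acc):.1f}")
```

Under this reading, a base accuracy of at least 99.95% corresponds to at most about one error per 2,000 bases, or roughly Q33 or better.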
The study also discusses the limitations of current de novo assembly methods and the need for further improvements to achieve higher accuracy and completeness. The results demonstrate that high-quality genome assemblies can be generated at a cost approximately 1,000-fold lower than with capillary-based sequencing. ALLPATHS-LG is available for public use, and its application to both human and mouse genomes shows that it can produce assemblies approaching the quality of those obtained with traditional sequencing methods. The study emphasizes the importance of continued research and development in genome assembly, particularly for complex regions such as segmental duplications.