January 25, 2011 | Sante Gnerre, Iain MacCallum, Dariusz Przybylski, Felipe J. Ribeiro, Joshua N. Burton, Bruce J. Walker, Ted Sharpe, Giles Hall, Terrance P. Shea, Sean Sykes, Aaron M. Berlin, Daniel Aird, Maura Costello, Riza Daza, Louise Williams, Robert Nicol, Andreas Gnirke, Chad Nusbaum, Eric S. Lander, and David B. Jaffe
The paper presents ALLPATHS-LG, an algorithm for de novo genome assembly from massively parallel DNA sequencing data. The authors demonstrate that it can produce high-quality draft assemblies of the human and mouse genomes, with good accuracy, short-range contiguity, long-range connectivity, and genome coverage. The resulting assemblies have a base accuracy of at least 99.95% and N50 scaffold sizes of 11.5 Mb for human and 7.2 Mb for mouse, approaching the quality of assemblies obtained with capillary-based sequencing. The study highlights the potential of combining improved sequencing technology with advanced computational methods to significantly reduce the cost of generating high-quality draft genome assemblies.
The ALLPATHS-LG program is freely available and can assemble mammalian genomes on commercial servers within a few weeks. The paper also discusses the limitations that segmental duplications and remaining gaps impose on the assemblies, emphasizing the need for further improvements in both algorithms and data to achieve even higher-quality assemblies.