ALLPATHS: De novo assembly of whole-genome shotgun microreads

2008 | Jonathan Butler, Iain MacCallum, Michael Kleber, Ilya A. Shlyakhter, Matthew K. Belmonte, Eric S. Lander, Chad Nusbaum, and David B. Jaffe
The paper presents a theoretical and computational solution to the challenge of de novo assembly from whole-genome shotgun "microreads" produced by new DNA sequencing technologies. The authors describe an algorithm called ALLPATHS, which is designed to handle the very short reads (25-50 bases) and high coverage (80×) generated by technologies like Illumina-Solexa. The algorithm is applied to simulated data based on real Solexa reads, and the results for small to mid-size genomes (up to 39 Mb) are presented. The key concepts in the ALLPATHS algorithm include finding all paths across a read pair and localization, which helps in isolating and assembling small regions of the genome independently.
The assemblies are presented as graphs that retain intrinsic ambiguities, such as those arising from polymorphism, providing a more comprehensive view of the genome sequence. The paper also discusses the limitations of unpaired-read assembly and the complexities of paired-read assembly, highlighting the computational challenges and strategies to overcome them. The results show that the ALLPATHS algorithm can produce high-quality assemblies with high completeness, continuity, and accuracy, even for complex genomes.
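
To make the idea of "finding all paths across a read pair" more concrete, the toy sketch below enumerates every sequence that connects two anchor k-mers in a small k-mer graph. It is not the ALLPATHS implementation: the function names build_kmer_graph and all_paths, the choice of k, and the max_len bound (standing in here for the length limit that a read pair's fragment size would impose) are all assumptions made for illustration. The two input reads differ at a single base, so the search returns two paths, which is one way an assembly graph can retain an ambiguity such as a polymorphism instead of collapsing it.

```python
from collections import defaultdict


def build_kmer_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read.

    Hypothetical helper for illustration; not taken from ALLPATHS.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def all_paths(graph, start, end, max_len):
    """Return every sequence that begins at node `start`, ends at node `end`,
    and is at most `max_len` bases long.

    The length bound (an assumed stand-in for the fragment-size constraint of
    a read pair) guarantees termination even if the graph contains cycles.
    """
    results = []
    stack = [(start, start)]  # (current node, sequence spelled so far)
    while stack:
        node, seq = stack.pop()
        if node == end:
            results.append(seq)
        if len(seq) < max_len:
            for nxt in graph.get(node, ()):
                # Each step along an edge adds exactly one base.
                stack.append((nxt, seq + nxt[-1]))
    return sorted(results)


# Two toy "reads" that differ at one base, mimicking a polymorphic site.
reads = ["ATGGCGTGCA", "ATGGCTTGCA"]
graph = build_kmer_graph(reads, k=4)

# Treat the first and last 3-mers as the two ends of a read pair.
for path in all_paths(graph, "ATG", "GCA", max_len=10):
    print(path)
```

Both alternatives are kept as separate paths between the same two anchors, rather than one being chosen arbitrarily. A real assembler would also have to cope with sequencing errors, reverse complements, and repeats, all of which this sketch ignores.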