ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads

1 October 2009 | Iain MacCallum, Dariusz Przybylski, Sante Gnerre, Joshua Burton, Ilya Shlyakhter, Andreas Gnirke, Joel Malek, Kevin McKernan, Swati Ranade, Terrance P Shea, Louise Williams, Sarah Young, Chad Nusbaum and David B Jaffe
ALLPATHS 2 is a method for assembling small genomes accurately and with high continuity from short paired reads. The study demonstrates that high-quality genome sequences can be generated from short paired reads, with results comparable to or better than the current standard of 'draft sequence' quality. Using 36-base and 26-base reads from five microbial genomes, ALLPATHS 2 produced assemblies with long, accurate contigs and scaffolds, outperforming Velvet and EULER-SR. For example, for Escherichia coli, the fraction of 10-kb stretches that were perfect was 99.8% for ALLPATHS 2, 68.7% for Velvet, and 42.1% for EULER-SR.
The study used Illumina reads from two libraries, which provide linking information over different distances. Adding libraries with longer fragments increased the potential contiguity of the assemblies. ALLPATHS 2 was tested on five finished microbial genomes: three bacteria and two fungi. The assemblies were evaluated for contiguity, completeness, and accuracy.
The bacterial assemblies were highly contiguous, with N50 contig sizes ranging from 156 to 477 kb and N50 scaffold sizes from 611 to 2,680 kb. They were also nearly complete, covering 98.5% to 99.3% of the genome. The fraction of perfect 10-kb chunks ranged from 99.3% to 99.8%, and the inferred base accuracy was approximately Q60, or about one error in 10^6 bases.
The fungal assemblies were accurate but less complete and less contiguous. The N50 contig sizes for S. pombe and N. crassa were 51 kb and 19 kb, respectively, with genome coverage of 95.9% and 89.5%. Their base quality was approximately Q40 (about one error in 10^4 bases), though this is a lower bound. Long-range validity was very good for both the bacterial and the fungal assemblies.
The study also compared ALLPATHS 2 with Velvet and EULER-SR and found that ALLPATHS 2 assemblies were significantly more accurate, with lower misassembly rates.
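N50, used above for contig and scaffold sizes, is the length L such that sequences of length L or longer together contain at least half of all assembled bases. The Python sketch below computes it from a list of sequence lengths. It illustrates the standard definition and is not taken from ALLPATHS 2; the function name `n50` and the toy lengths are hypothetical. Some evaluations instead use half of the reference genome size as the threshold (often called NG50), which can give a smaller value for incomplete assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    account for at least half of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest, accumulating bases
    # until at least half of the total has been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy contig lengths in bases (illustrative only, not data from the study).
contigs = [100_000, 80_000, 60_000, 40_000, 20_000]
print(n50(contigs))  # → 80000
```

Here the two longest contigs hold 180 kb of the 300 kb total, so crossing the halfway mark happens at the 80-kb contig. The same function applies to scaffold lengths.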
The study concluded that ALLPATHS 2 can produce high-quality assemblies of small genomes from short reads, with accuracy comparable to finished sequence. It further suggested that, with improvements in read length, representation, library construction, and computational methods, the approach could be extended to large and complex genomes such as those of mammals.
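The accuracy figures in this summary use the Phred scale, Q = -10 log10(p), where p is the per-base error probability; Q60 therefore means p = 10^-6 and Q40 means p = 10^-4. The sketch below converts between the two and computes a simplified version of the "perfect 10-kb chunks" metric. It assumes, hypothetically, that errors have already been located as 0-based positions along a single assembled sequence and that chunks are consecutive, non-overlapping windows. The study's own sampling and alignment procedure is not described here, and all function names are illustrative.

```python
import math


def phred_to_error_rate(q):
    """Per-base error probability for Phred quality q: p = 10^(-q/10)."""
    return 10 ** (-q / 10)


def error_rate_to_phred(p):
    """Phred quality for per-base error probability p: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def fraction_perfect_chunks(seq_length, error_positions, chunk=10_000):
    """Fraction of consecutive, non-overlapping windows of `chunk` bases
    that contain no errors (hypothetical simplification of the metric).

    A trailing partial window is ignored, and so are errors inside it.
    """
    n_chunks = seq_length // chunk
    if n_chunks == 0:
        raise ValueError("sequence shorter than one chunk")
    # Window index of each error that falls inside a complete window.
    bad = {pos // chunk for pos in error_positions if pos < n_chunks * chunk}
    return (n_chunks - len(bad)) / n_chunks


# Toy example: a 100-kb sequence with errors landing in windows 0, 1 and 9.
print(fraction_perfect_chunks(100_000, [5, 12_345, 12_999, 95_000]))  # → 0.7
print(round(error_rate_to_phred(1e-4)))  # → 40
```

Note that two errors in the same window (positions 12,345 and 12,999 above) spoil only one chunk, which is why this metric rewards assemblies whose rare errors are clustered less than a per-base error rate would suggest.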