Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads

June 8, 2017 | Ryan R. Wick*, Louise M. Judd, Claire L. Gorrie, Kathryn E. Holt
Unicycler is a new tool for assembling bacterial genomes from short and long reads, producing accurate, complete, and cost-effective assemblies. It uses SPAdes to build an initial assembly graph from short reads and then simplifies the graph using information from both short and long reads. Unicycler uses a semi-global aligner to align long reads to the assembly graph. Tests on synthetic and real reads show that Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low. Unicycler is open source (GPLv3) and available at github.com/rrwick/Unicycler.

Bacterial genomics is dominated by Illumina sequencing, which produces accurate but short reads. However, short reads alone cannot span long repeats, so short-read assemblies are typically fragmented rather than complete genomes. Long reads from PacBio and Oxford Nanopore Technologies can produce complete assemblies but are more expensive and more error-prone. Hybrid assembly, which combines short and long reads, offers a cost-effective alternative.

Unicycler takes a short-read-first approach: it builds an assembly graph from short reads and then uses long reads to resolve repeats and simplify the graph. A greedy algorithm assigns a copy number to each contig based on its read depth and its connections in the graph. Single-copy contigs are then joined by bridges, with bridges built from long reads providing greater accuracy than other bridge types. To do this, Unicycler aligns long reads to the single-copy contigs with semi-global alignment and uses those alignments to find the best path through the graph between contigs. Finally, the assembly is polished using short-read alignments to reduce remaining errors.

Unicycler can be run in three modes: conservative, normal, and bold. Conservative mode applies only high-quality bridges, normal mode applies bridges above a moderate quality threshold, and bold mode also applies lower-quality bridges. Lower thresholds can give more complete assemblies at a greater risk of misassembly.

Unicycler's performance was evaluated using simulated and real reads from eight species, including E. coli. It outperformed the other assemblers tested, with lower misassembly rates and higher NGA50 values. It was also tested on real E. coli K-12 reads and on a Klebsiella pneumoniae isolate, where it produced accurate assemblies.

Unicycler is open source and can be incorporated into other pipelines. It is designed to be efficient and accurate, with a focus on minimizing misassemblies and improving the quality of bacterial genome assemblies.
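
The greedy copy-number step described above can be illustrated with a minimal Python sketch. It is not Unicycler's implementation. It assumes that the median contig depth corresponds to one copy, that a contig whose depth is ambiguous has a copy number equal to the sum of its successors' copy numbers (true only when each successor is fed solely by that contig), and a hypothetical tolerance of 0.25. The function name `assign_copy_numbers` and its input format are illustrative.

```python
from statistics import median


def assign_copy_numbers(depths, successors, tolerance=0.25):
    """Greedily assign integer copy numbers to contigs (toy model).

    depths: dict mapping contig name -> read depth
    successors: dict mapping contig name -> list of downstream contig names
    """
    # Assumption: the median contig is single-copy, so its depth is the baseline.
    base = median(depths.values())
    ratios = {contig: depth / base for contig, depth in depths.items()}
    copies = {}

    # Pass 1: commit to contigs whose depth ratio is close to an integer.
    for contig, ratio in ratios.items():
        if abs(ratio - round(ratio)) <= tolerance:
            copies[contig] = max(1, round(ratio))

    # Pass 2: fill ambiguous contigs from already-assigned successors,
    # repeating until no further contig can be resolved this way.
    progress = True
    while progress:
        progress = False
        for contig in ratios:
            succ = successors.get(contig, [])
            if contig not in copies and succ and all(s in copies for s in succ):
                copies[contig] = sum(copies[s] for s in succ)
                progress = True

    # Pass 3: anything still unresolved falls back to rounding its depth ratio.
    for contig, ratio in ratios.items():
        copies.setdefault(contig, max(1, round(ratio)))
    return copies
```

For example, with depths `{"A": 30, "B": 31, "C": 62, "D": 29, "E": 45}` and `E` feeding into `A` and `D`, depth alone leaves `E` ambiguous (about 1.45 copies), but the graph connections resolve it to 2. Decisions are never revisited once made, which is what makes the scheme greedy.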
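
The difference between the conservative, normal, and bold modes can be sketched as a minimum quality threshold applied while bridges are chosen greedily, best first. The threshold values in `MODE_THRESHOLDS`, the `Bridge` record, and the rule that each single-copy contig takes at most one outgoing and one incoming bridge are assumptions made for illustration; they are not taken from the description above.

```python
from dataclasses import dataclass

# Hypothetical minimum bridge-quality thresholds per mode (illustrative values only).
MODE_THRESHOLDS = {"conservative": 25.0, "normal": 10.0, "bold": 1.0}


@dataclass(frozen=True)
class Bridge:
    start: str      # single-copy contig the bridge leaves from
    end: str        # single-copy contig the bridge arrives at
    quality: float  # higher means more trustworthy


def select_bridges(bridges, mode="normal"):
    """Apply bridges in descending quality, keeping only those at or above the
    mode's threshold and skipping any that conflict with a bridge already applied."""
    threshold = MODE_THRESHOLDS[mode]
    used_starts, used_ends = set(), set()
    applied = []
    for bridge in sorted(bridges, key=lambda b: b.quality, reverse=True):
        if bridge.quality < threshold:
            break  # list is sorted, so every remaining bridge is below threshold
        if bridge.start in used_starts or bridge.end in used_ends:
            continue  # conflicts with a higher-quality bridge
        applied.append(bridge)
        used_starts.add(bridge.start)
        used_ends.add(bridge.end)
    return applied
```

Sorting first means a lower-quality bridge that competes for the same contig is rejected rather than overriding a better one, so lowering the threshold only ever adds bridges.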
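
Semi-global alignment differs from global alignment in that overhanging ends are not penalised, which suits a long read that may run past the end of a contig, or a contig that extends beyond the read. The dynamic-programming sketch below implements one common variant (overlap alignment: free end gaps in both sequences, linear gap scores). The scoring values are arbitrary, and this is not Unicycler's own aligner.

```python
def semi_global_score(read, contig, match=1, mismatch=-1, gap=-1):
    """Best score of an alignment in which end gaps in either sequence are free."""
    rows, cols = len(read) + 1, len(contig) + 1
    # First row and column stay 0: leading overhangs are not penalised.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if read[i - 1] == contig[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align read[i-1] with contig[j-1]
                score[i - 1][j] + gap,       # gap in the contig
                score[i][j - 1] + gap,       # gap in the read
            )
    # Trailing overhangs are free too: take the best cell in the last row or column.
    best_last_row = max(score[rows - 1])
    best_last_col = max(score[i][cols - 1] for i in range(rows))
    return max(best_last_row, best_last_col)
```

For instance, `semi_global_score("ACGTTT", "GGGACGT")` scores the read's `ACGT` prefix against the contig's `ACGT` suffix without penalising the overhanging `GGG` or `TT`.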
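
NGA50, used in the comparison above, is the length L such that aligned blocks of length L or longer together cover at least half of the reference genome. The sketch below assumes the contigs have already been aligned to the reference and split into aligned blocks at misassemblies, as evaluation tools such as QUAST do; the function name `nga50` is hypothetical.

```python
def nga50(aligned_block_lengths, reference_length):
    """Return NGA50, or None if the aligned blocks cover less than half the reference."""
    target = reference_length / 2
    covered = 0
    # Walk from the longest block down until half the reference is covered.
    for length in sorted(aligned_block_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None
```

Because misassembled contigs are broken into shorter blocks before this calculation, NGA50 rewards contiguity and penalises misassemblies at the same time.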