A5-miseq: an updated pipeline to assemble microbial genomes from Illumina MiSeq data

June 5, 2014 | David Coil, Guillaume Jospin, and Aaron E. Darling
A5-miseq is an updated pipeline for assembling microbial genomes from Illumina MiSeq data. It automates adapter trimming, quality filtering, error correction, contig and scaffold generation, and misassembly detection. Unlike the original A5 pipeline, A5-miseq makes use of the longer reads produced by the MiSeq, incorporates read-pairing information during contig generation, and trims reads more effectively. These changes yield significantly better assemblies that recover more reference genes. A5-miseq is released under the GPL, runs on a laptop with minimal parameter tuning, and produces high-quality assemblies from as little as 20x sequence coverage.

The pipeline consists of five steps: read cleaning, contig assembly, crude scaffolding, misassembly correction, and final scaffolding. Compared with A5, it uses a more efficient contig assembly algorithm that exploits read-pairing information, which reduces the number of misassemblies. It also trims only the contaminated portions of reads instead of discarding entire reads.

Benchmarking on the GAGE-B data shows that A5-miseq outperforms A5 and other assemblers in assembly accuracy, with higher NGA50 values, fewer misassemblies, and more full-length genes. A5-miseq also needs less sequence data to reach comparable contiguity and is computationally efficient, which makes it suitable for researchers with limited bioinformatics experience or computing resources.

The authors note that A5-miseq was not tuned for the GAGE-B dataset and that other assemblers may produce better results. Its practical advantages include automated adapter trimming, recovery of more full-length genes, NCBI-ready output, and base-call quality scores. The authors encourage researchers to become familiar with the range of available assembly algorithms before choosing a specific approach.
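The 20x figure refers to mean sequencing depth: total sequenced bases divided by genome size. As a rough planning aid, the sketch below estimates how many read pairs reach a target depth. The function name, the 5 Mb genome, and the 2x250 bp read length are illustrative assumptions, and the calculation ignores bases lost to trimming, which would raise the real requirement.

```python
import math


def read_pairs_for_coverage(genome_size_bp: int, target_coverage: float, read_length_bp: int) -> int:
    """Estimate the read pairs needed for a target mean depth (hypothetical helper).

    Mean coverage C = (number of reads * read length) / genome size, so each
    paired-end read pair contributes 2 * read_length bases. Bases removed by
    trimming are ignored here.
    """
    bases_needed = target_coverage * genome_size_bp
    bases_per_pair = 2 * read_length_bp
    return math.ceil(bases_needed / bases_per_pair)


if __name__ == "__main__":
    # Hypothetical example: a 5 Mb bacterial genome sequenced with 2x250 bp reads at 20x.
    print(read_pairs_for_coverage(5_000_000, 20, 250))  # 200000
```

Rounding up with `math.ceil` keeps the estimate on the safe side when the division is not exact.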
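Trimming only the contaminated part of a read, rather than discarding the whole read, keeps the usable sequence upstream of an adapter. The sketch below illustrates that idea with exact string matching only; it is not A5-miseq's actual trimming implementation, which this summary does not describe. The adapter string, `min_overlap`, and `min_length` values are assumptions chosen for illustration.

```python
from typing import Optional

# Assumed adapter prefix (common Illumina adapter start), for illustration only.
ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Return the read with any adapter-derived 3' portion removed.

    First looks for a full adapter occurrence; otherwise checks whether the
    read ends with a prefix of the adapter at least min_overlap bases long
    (read-through into the adapter start). Exact matching only: real
    trimmers also tolerate sequencing errors.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Check the longest partial adapter prefix first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def clean_read(read: str, min_length: int = 30) -> Optional[str]:
    """Trim the adapter and keep the read only if enough sequence remains."""
    trimmed = trim_adapter(read)
    return trimmed if len(trimmed) >= min_length else None


if __name__ == "__main__":
    insert = "ACGT" * 9  # 36 bp of hypothetical genomic sequence
    print(clean_read(insert + ADAPTER + "ACGT"))  # keeps the 36 bp insert
    print(clean_read("ACGT" * 5 + ADAPTER))       # None: only 20 bp survive trimming
```

The short-overlap check is a trade-off: a lower `min_overlap` removes more partial adapters but occasionally clips genuine sequence that happens to match the adapter start.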
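NGA50, used in the benchmarking above, is built on NG50: the length L such that sequences of length L or longer cover at least half of the reference genome. For NGA50, contigs are first aligned to the reference and broken at misassemblies, and NG50 is taken over the resulting aligned blocks, so misassemblies lower the score. The sketch below computes only the NG50 step from a list of lengths; the alignment and breaking step is assumed to have been done elsewhere, and the example lengths are made up.

```python
from typing import Iterable, Optional


def ng50(lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Return the NG50 of a set of sequence lengths given a reference genome size.

    Sorts lengths in descending order and returns the length at which the
    running total first reaches half the genome size. Returns None if the
    sequences cover less than half the genome. Passing lengths of
    reference-aligned blocks instead of raw contig lengths yields NGA50.
    """
    half = genome_size / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None


if __name__ == "__main__":
    # Hypothetical contigs against a 1,000 bp reference.
    print(ng50([400, 300, 200, 100], 1000))       # 300
    # Same assembly, but the 400 bp contig is misassembled and splits into 250 + 150 aligned blocks.
    print(ng50([250, 150, 300, 200, 100], 1000))  # 250
```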