NOVOPlasty: de novo assembly of organelle genomes from whole genome data

2017 | Nicolas Dierckxsens, Patrick Mardulyn and Guillaume Smits
NOVOPlasty is a de novo assembler that efficiently assembles mitochondrial and chloroplast genomes from whole genome data. It uses a seed-and-extend algorithm that starts from a single seed sequence, which may come from a related or a distant species. The algorithm was tested on new and public whole genome data sets, where it outperformed existing assemblers in assembly accuracy and coverage. NOVOPlasty assembled all tested circular genomes in less than 30 minutes, with a maximum memory requirement of 16 GB and an accuracy over 99.99%. The authors describe it as the only de novo assembler that provides a fast and straightforward extraction of extranuclear genomes from whole genome data in one circular, high-quality contig. The software is open source and can be downloaded from https://github.com/ndierckx/NOVOPlasty.
NOVOPlasty is a seed-and-extend assembler similar to string overlap algorithms. It first stores the sequence reads in a hash table, which allows quick access to the reads. The assembly is initiated by a seed, but the seed itself is not the sequence that gets extended: it is used only to retrieve one sequence read of the targeted genome from the NGS data set. That read is then iteratively extended in both directions, and the algorithm is capable of extending one read into a complete circular genome. It also incorporates case-based adjustments to achieve higher quality assemblies by automatically detecting and resolving problematic regions caused by sequencing errors or inclusion of genomic elements.
NOVOPlasty was tested on several new and public whole genome data sets, including the mitochondrial genome of Gonioctena intermedia and the chloroplast genome of Avicennia marina. It successfully assembled 17 unpublished chloroplast genomes and four public data sets. For mitochondrial genomes, it was tested on seven mitochondrial genomes from three different species.
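The hash-table lookup, seed-based read retrieval, and extension described in this summary can be illustrated with a minimal Python sketch. It is not NOVOPlasty's actual implementation: every function name, the k-mer size, and the toy data are assumptions made for illustration. The sketch also simplifies heavily: it matches k-mers exactly, extends in one direction only (the summary describes bidirectional extension), takes a one-base majority vote, and has none of the case-based handling of problematic regions.

```python
from collections import Counter, defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def build_read_index(reads, k):
    """Hash table mapping each k-mer to the (read, offset) pairs containing it."""
    index = defaultdict(list)
    for read in reads:
        for strand in (read, reverse_complement(read)):
            for i in range(len(strand) - k + 1):
                index[strand[i:i + k]].append((strand, i))
    return index


def find_start_read(seed, index, k):
    """Use the seed only to fetch one read that shares a k-mer with it."""
    for i in range(len(seed) - k + 1):
        hits = index.get(seed[i:i + k])
        if hits:
            return hits[0][0]
    return None


def extend_right(contig, index, k, max_len):
    """Grow the contig one base at a time by majority vote of overlapping reads.

    Returns (contig, is_circular). The contig is reported as circular once
    its last k bases equal its first k bases, i.e. it has wrapped around.
    """
    start = contig[:k]
    while len(contig) < max_len:
        votes = Counter()
        for read, i in index.get(contig[-k:], []):
            if i + k < len(read):          # the read continues past the k-mer
                votes[read[i + k]] += 1
        if not votes:
            break                          # no read extends the contig further
        contig += votes.most_common(1)[0][0]
        if contig[-k:] == start:
            return contig[:-k], True       # trim the repeated start
    return contig, False


# Toy data (hypothetical): a 16 bp circular "genome" whose 5-mers are all unique.
genome = "AAAACAACCACACCCC"
reads = [(genome * 2)[i:i + 8] for i in range(len(genome))]  # error-free 8 bp reads
index = build_read_index(reads, k=5)
start_read = find_start_read("ACAACCAC", index, k=5)
contig, is_circular = extend_right(start_read, index, k=5, max_len=100)
print(contig, is_circular)
```

In this toy run the seed only selects the starting read. The contig is then built entirely from the reads, which mirrors the point that the seed is not itself extended. A real data set would need tolerance for sequencing errors and repeats, which this sketch does not attempt.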
NOVOPlasty achieved the highest coverage and quality for the four public data sets and showed 100% identity with previously known fragments. It was also able to assemble the mitochondrial genome of G. intermedia, which contains a highly repetitive section, using long PacBio reads and short Illumina reads. NOVOPlasty was compared with other assemblers, including MITObim, MIRA, SOAPdenovo2, CLC, and ARC, and performed best in genome coverage, accuracy, and contig count. It assembled the complete mitochondrial genome of G. intermedia in a single contig, while the other assemblers struggled with the repetitive region. NOVOPlasty also performed well in seed compatibility, meaning that it can use sequences from more distantly related species as seeds: it assembled the chloroplast genome of Arabidopsis thaliana using seed sequences from 12 different chloroplast genomes.
NOVOPlasty is an open-source tool that provides a fast and efficient way to assemble mitochondrial and chloroplast genomes from whole genome data. It is particularly useful