Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads

2014 | Rei Kajitani, Kouta Toshimoto, Hideki Noguchi, Atsushi Toyoda, Yoshitoshi Ogura, Miki Okuno, Mitsuru Yabana, Masayuki Harada, Eiji Nagayasu, Haruhiko Maruyama, Yuji Kohara, Asao Fujiyama, Tetsuya Hayashi, and Takehiko Itoh
Platanus is a de novo assembler designed to handle highly heterozygous genomes from whole-genome shotgun short reads. It constructs de Bruijn graphs with automatically optimized k-mer sizes and scaffolds the resulting contigs using paired-end information. During both contig assembly and scaffolding, the algorithm simplifies the complex graph structures that heterozygosity creates, merging haplotypes to resolve heterozygous regions that contain structural variations, repeats, and low-coverage sites. This makes it suitable for gigabase-sized, highly heterozygous genomes and an alternative to existing assemblers designed for genomes of lower heterozygosity.

Platanus consists of three subprograms: Contig-assembly, Scaffolding, and Gap-close. Contig-assembly constructs de Bruijn graphs, removes short branches and bubbles, and resolves repeats and heterozygous regions. Scaffolding determines the order of contigs using paired-end information and removes bubbles and branches at the scaffold level. Gap-close maps reads to the scaffolds and closes gaps using the reads collected around each gap. The algorithms are adapted to large and repetitive genomes, and performance was validated on both simulated and real data.

In benchmark tests, Platanus outperformed other assemblers in scaffold NG50 length and error rate, particularly on highly heterozygous data, without compromising accuracy. It produced the highest scaffold NG50 values for the S. venezuelensis and oyster genomes and resolved complex heterozygous regions better than the other assemblers tested. It also recorded the highest scaffold NG50 values for two of the three low-heterozygosity species in the Assemblathon 2 contest. Gene annotation on its assemblies of highly heterozygous genomes was accurate, with high coverage and identity in RNA-seq data alignment.
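To make the graph terminology above concrete, the following is a minimal sketch of how a single heterozygous site produces a "bubble" in a de Bruijn graph. It is not the Platanus implementation: the function names `build_de_bruijn` and `find_simple_bubbles`, the toy haplotypes, and the fixed k of 4 are all assumptions made for illustration, and the sketch ignores reverse complements, sequencing errors, coverage, and automatic k-mer selection.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def find_simple_bubbles(graph, max_len):
    """Find pairs of paths that leave one node and rejoin at a later node.

    Returns a list of (start, end, branch_a, branch_b) tuples. Only nodes
    with exactly two successors are considered (hypothetical simplification).
    """
    bubbles = []
    for start, successors in graph.items():
        if len(successors) != 2:
            continue
        paths = []
        for node in sorted(successors):
            path = [node]
            # Follow the path while the current node has exactly one successor.
            while len(graph.get(path[-1], ())) == 1 and len(path) < max_len:
                path.append(next(iter(graph[path[-1]])))
            paths.append(path)
        a, b = paths
        shared = set(a) & set(b)
        if shared:
            # The first shared node on path a is where the branches rejoin.
            end = next(n for n in a if n in shared)
            bubbles.append((start, end, a[:a.index(end)], b[:b.index(end)]))
    return bubbles


# Two toy haplotypes that differ at a single base (T versus A).
hap_a = "ATGGCGTACCTGA"
hap_b = "ATGGCGAACCTGA"
graph = build_de_bruijn([hap_a, hap_b], k=4)
bubbles = find_simple_bubbles(graph, max_len=20)
print(bubbles)
# One bubble: it opens at node GCG, closes at node ACC, and each branch
# holds the k-1 = 3 nodes that overlap the variant base.
```

In this toy case, merging the two branches into one path is what collapsing haplotypes amounts to. Real heterozygous regions with structural variations or repeats produce larger and more tangled structures than this single-SNP bubble.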

Platanus is efficient and does not require manual parameter optimization, which makes it practical for large-scale genome assembly projects. It handles highly heterozygous genomes, including those with complex variations and repetitive sequences, and is particularly useful for non-model organisms and wild-type samples. Because it also performs well on genomes with low heterozygosity, it can be applied to a wide range of organisms, and sequencing projects can use it without first inbreeding the samples. Platanus is therefore a promising alternative to existing assemblers for highly heterozygous genomes.
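Because scaffold NG50 is the headline metric in the summary above, a short sketch of how it is computed may help. NG50 is the scaffold length at which scaffolds of that length or longer cover at least half of the estimated genome size. N50 uses the total assembly length instead. The function name `ng50` and the example lengths below are illustrative assumptions, not values from the paper.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of an assembly, or 0 if it covers under half the genome."""
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        # Stop once the longest scaffolds reach half the genome size.
        if covered * 2 >= genome_size:
            return length
    return 0


# Hypothetical scaffold lengths (arbitrary units) and genome size estimate.
scaffold_lengths = [80, 70, 50, 40, 30, 20]
print(ng50(scaffold_lengths, genome_size=400))            # NG50 -> 50
print(ng50(scaffold_lengths, sum(scaffold_lengths)))      # N50  -> 70
```

Using the genome size rather than the assembly length as the denominator lets assemblies of different total length be compared on the same scale, which is why benchmarks across assemblers typically report NG50.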