Fast and accurate long-read assembly with wtdbg2

Wtdbg2 is a long-read assembler that is 2–17 times faster than other assemblers while maintaining comparable contiguity and accuracy. This speed makes it suitable for population-scale long-read assembly.

The algorithm assembles long reads with a fuzzy-Bruijn graph (FBG), a structure akin to the de Bruijn graph that tolerates mismatches and gaps and is therefore better suited to noisy long reads. Wtdbg2 loads all reads into memory, counts k-mers, and builds a hash table for efficient alignment. It then performs all-vs-all read alignment and constructs the FBG, from which the reads are assembled. The implementation is optimized for memory usage and can handle large genomes, including large non-human genomes, with relatively low memory requirements.

Wtdbg2 was evaluated on multiple datasets, including human, plant, and bacterial genomes, and compared with other assemblers such as Canu, Flye, and MECAT. It was faster and more efficient in assembling large genomes. Its contiguity and accuracy were comparable to, and in some cases better than, those of the other tools, especially for genomes with high heterozygosity. Wtdbg2 is open-source software hosted on GitHub. The study highlights the importance of efficient long-read assembly in genomics and the potential of Wtdbg2 to improve the analysis of sequence data.
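
The k-mer counting and hash-table steps described above can be illustrated with a small Python sketch. This is not wtdbg2's implementation (wtdbg2 is written in C and uses its own data structures). It is a toy illustration, under simplifying assumptions, of how a k-mer hash table can nominate read pairs for all-vs-all alignment. The function names (`build_index`, `candidate_pairs`) and the parameters `max_occ` and `min_shared` are hypothetical, and the sketch ignores reverse complements and sequencing errors.

```python
from collections import Counter, defaultdict
from itertools import combinations


def kmers(seq, k):
    """Yield (position, k-mer) for every k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def build_index(reads, k, max_occ):
    """Count k-mers over all reads, then index those that are not too frequent.

    Returns (counts, index); index maps k-mer -> list of (read_id, position).
    max_occ is a hypothetical cutoff used here to skip repetitive k-mers.
    """
    counts = Counter(km for seq in reads.values() for _, km in kmers(seq, k))
    index = defaultdict(list)
    for read_id, seq in reads.items():
        for pos, km in kmers(seq, k):
            if counts[km] <= max_occ:
                index[km].append((read_id, pos))
    return counts, index


def candidate_pairs(index, min_shared):
    """Return read pairs that share at least min_shared distinct indexed k-mers."""
    shared = Counter()
    for hits in index.values():
        read_ids = sorted({read_id for read_id, _ in hits})
        for a, b in combinations(read_ids, 2):
            shared[(a, b)] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Toy input: r2 begins with the last 12 bases of r1; r3 is unrelated.
reads = {
    "r1": "ACGTTGCATGGACCTA",
    "r2": "TGCATGGACCTAGGTT",
    "r3": "CCCCGGGGAAAATTTT",
}
counts, index = build_index(reads, k=5, max_occ=10)
pairs = candidate_pairs(index, min_shared=3)
print(pairs)  # {('r1', 'r2'): 8}
```

Only read pairs that share enough k-mers are passed on for alignment, which is what keeps an all-vs-all comparison tractable. A real assembler would additionally check that the shared k-mers are colinear before accepting an overlap.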

2020 February | Jue Ruan, Heng Li