The MaSuRCA genome assembler

Advance Access publication August 29, 2013 | Aleksey V. Zimin, Guillaume Marçais, Daniela Puiu, Michael Roberts, Steven L. Salzberg and James A. Yorke
The article introduces a new genome assembler called MaSuRCA, which combines the computational efficiency of de Bruijn graph methods with the flexibility of overlap-based assembly strategies. MaSuRCA transforms a large number of paired-end reads into a much smaller number of longer 'super-reads', which makes it possible to assemble Illumina reads of varying lengths together with longer reads from 454 and Sanger sequencing technologies. The method is evaluated against two widely used assemblers, Allpaths-LG and SOAPdenovo2, on datasets from the bacterium *Rhodobacter sphaeroides* and chromosome 16 of the mouse genome. MaSuRCA performs on par with or better than Allpaths-LG and significantly better than SOAPdenovo2, especially when the Illumina data are augmented with long reads. The article also discusses the theoretical foundation of the super-reads approach and describes the MaSuRCA assembler in detail, including its key modules: error correction, k-unitig creation, super-read generation, and gap filling.
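
The reduction from many reads to a few super-reads can be illustrated with a small sketch. It assumes a simplified rule: a read is extended one base at a time, in both directions, for as long as exactly one k-mer seen in the data continues it. Reads that extend to the same sequence then collapse into a single super-read. The function names (`count_kmers`, `extend_read`) and the `max_steps` cap are hypothetical and chosen for illustration. The sketch ignores reverse complements, sequencing errors, and mate-pair information, so it should be read as a toy model of the idea rather than as MaSuRCA's implementation.

```python
from collections import defaultdict


def count_kmers(reads, k):
    """Map every k-mer that occurs in the reads to its number of occurrences."""
    counts = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def unique_extension(kmers, anchor, forward):
    """Return the only base that extends `anchor` (length k-1) to a known k-mer.

    Returns None when there is no such base or more than one (a branch).
    """
    if forward:
        candidates = [b for b in "ACGT" if anchor + b in kmers]
    else:
        candidates = [b for b in "ACGT" if b + anchor in kmers]
    return candidates[0] if len(candidates) == 1 else None


def extend_read(read, kmers, k, max_steps=10_000):
    """Extend a read in both directions while the extension is unambiguous.

    `max_steps` is an illustrative safety cap so that circular sequences
    cannot cause an endless loop.
    """
    seq = read
    for _ in range(max_steps):  # extend to the right
        base = unique_extension(kmers, seq[-(k - 1):], forward=True)
        if base is None:
            break
        seq += base
    for _ in range(max_steps):  # extend to the left
        base = unique_extension(kmers, seq[:k - 1], forward=False)
        if base is None:
            break
        seq = base + seq
    return seq


# Toy data: twelve overlapping 10-base reads drawn from a 21-base sequence.
genome = "ACGTTGCATGGACCTAGGCTA"
k = 5
reads = [genome[i:i + 10] for i in range(12)]
kmers = count_kmers(reads, k)
super_reads = {extend_read(r, kmers, k) for r in reads}
print(len(reads), "reads ->", len(super_reads), "super-read(s)")
```

In this toy case every read extends to the same sequence because no (k-1)-mer repeats in the source. With repeats or errors in the data, extension stops at each branch, and several shorter super-reads result.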