2012 | Ruibang Luo, Binghang Liu, Yinlong Xie, Zhenyu Li, Weihua Huang, Jianying Yuan, Guangzhu He, Yanxiang Chen, Qi Pan, Yunjie Liu, Jingbo Tang, Gengxiong Wu, Hao Zhang, Yujian Shi, Yong Liu, Chang Yu, Bo Wang, Yao Lu, Changlei Han, David W Cheung, Siu-Ming Yiu, Shaoliang Peng, Zhu Xiaoqian, Guangming Liu, Xiangke Liao, Yingrui Li, Huanming Yang, Jian Wang, Tak-Wah Lam and Jun Wang
SOAPdenovo2 is an improved version of the SOAPdenovo assembler, designed to enhance memory efficiency and performance in de novo genome assembly. It addresses challenges in assembly continuity, accuracy, and coverage, particularly in repeat regions. The new algorithm reduces memory consumption during graph construction, resolves more repeat regions in contig assembly, increases scaffold coverage and length, improves gap closing, and is optimized for large genomes. Benchmarking on the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 outperforms its predecessor and is competitive with other assemblers in assembly length and accuracy. The updated assembly of the 2008 Asian (YH) genome using SOAPdenovo2 achieved a contig N50 of ~20.9 kbp and a scaffold N50 of ~22 Mbp, which are 3-fold and 50-fold longer, respectively, than in the first published version. Genome coverage increased from 81.16% to 93.91%, and peak memory consumption was about two-thirds lower.
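The contig and scaffold N50 values quoted above follow the usual definition: the length L such that sequences of length L or longer contain at least half of the total assembled bases. A minimal sketch of that calculation is given here; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from SOAPdenovo2 or its benchmarks.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Hypothetical contig lengths (in bp), for illustration only.
contig_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(contig_lengths))  # 80 + 70 = 150 covers half of 300, so N50 is 70
```

Because N50 depends only on the length distribution, a longer N50 indicates better continuity but says nothing about correctness, which is why the benchmarks also report accuracy and coverage.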
Key improvements in SOAPdenovo2 include enhanced error correction, reduced memory usage in de Bruijn graph (DBG) construction, better resolution of repeat regions, increased assembly length and coverage in scaffolding, and improved gap closure. The error correction module was redeveloped to support memory-efficient long-k-mer error correction. DBG construction was optimized using a sparse de Bruijn graph method. A multiple k-mer strategy was introduced to leverage the advantages of both large and small k-mers. Scaffold construction was improved by detecting heterozygous contig pairs and rectifying chimeric scaffolds. The GapCloser module was enhanced to better resolve gaps using information from previous cycles.
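To illustrate why the choice of k matters, the following toy sketch builds a plain de Bruijn graph from a single read and lists the nodes with more than one successor. It is a simplified illustration, not SOAPdenovo2's implementation: it ignores reverse complements, sequencing errors, and the sparse k-mer storage, and the helper names `build_dbg` and `branching_nodes` as well as the example read are assumptions made for this sketch.

```python
from collections import defaultdict


def build_dbg(reads, k):
    """Build a de Bruijn graph: each k-mer links its (k-1)-mer prefix
    to its (k-1)-mer suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Return nodes with more than one successor (ambiguous extensions)."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


# Toy read in which the 3-mer "CGT" occurs twice (a short repeat).
reads = ["AACGTCCGTGG"]

# Small k: the repeat collapses into shared nodes, so the path branches.
print(branching_nodes(build_dbg(reads, 3)))  # ['GT']

# Larger k: each k-mer spans the repeat, so the path is unambiguous.
print(branching_nodes(build_dbg(reads, 5)))  # []
```

Small k-mers tolerate low coverage and errors better but collapse repeats, while large k-mers span repeats but need deeper coverage; a multiple k-mer strategy aims to combine the two.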
SOAPdenovo2 was tested on the Assemblathon1 benchmark dataset and showed better performance than SOAPdenovo1 and SOAPdenovo v1.05. It produced longer contig and scaffold N50 values and had higher accuracy. The updated YH genome assembly showed improved coverage and reduced copy number errors. The new version also demonstrated better performance in assembling the GAGE dataset. The work highlights the improvements in SOAPdenovo2, making it a more effective tool for de novo genome assembly, especially for eukaryotic genomes. The software is available under the GNU General Public License version 3.0 and is compatible with Unix, Linux, and Mac operating systems.