SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing

Volume 19, Number 5, 2012 | ANTON BANKEVICH,1,2 SERGEY NURK,1,2 DMITRY ANTIPOV,1 ALEXEY A. GUREVICH,1 MIKHAIL DVORKIN,1 ALEXANDER S. KULIKOV,1,3 VALERY M. LESIN,1 SERGEY I. NIKOLENKO,1,3 SON PHAM,4 ANDREY D. PRIJBELSKI,1 ALEXEY V. PYSHKIN,1 ALEXANDER V. SIROTKIN,1 NIKOLAY VYAHHI,1 GLENN TESLER,5 MAX A. ALEKSEYEV,1,6 and PAVEL A. PEVZNER1,4
The paper introduces SPAdes, a new genome assembly algorithm designed to address the challenges of both single-cell sequencing (SCS) and standard multicell sequencing. SCS data are difficult to assemble because of highly non-uniform read coverage, elevated levels of sequencing errors, and chimeric reads. SPAdes improves upon existing assemblers like Velvet and SoapDeNovo, as well as specialized single-cell assemblers like E+V-SC.

The key contributions of SPAdes include:

1. **Multisized De Bruijn Graphs**: SPAdes uses multisized de Bruijn graphs, which combine several values of k, to handle non-uniform coverage more effectively than a standard de Bruijn graph built for a single k.
2. **k-bimer Adjustment**: It introduces k-bimer adjustment to derive accurate distance estimates between k-mers, improving the accuracy of read-pair information.
3. **Paired Assembly Graphs**: Inspired by Paired de Bruijn Graphs (PDBGs), SPAdes constructs paired assembly graphs to better handle variable insert sizes and chimeric reads.
4. **Error Correction**: SPAdes incorporates error correction tools to improve the quality of the final assemblies.

The paper outlines the four stages of SPAdes:

1. **Assembly Graph Construction**: Builds and simplifies the assembly graph using multisized de Bruijn graphs, removes bulges, tips, and chimeric reads, and aggregates read-pair information.
2. **k-bimer Adjustment**: Derives accurate distance estimates between k-mers using bireads.
3. **Paired Assembly Graph Construction**: Constructs a paired assembly graph to handle variable insert sizes and chimeric reads.
4. **Contig Construction**: Generates high-quality contigs by backtracking graph simplifications.

SPAdes is available online and is distributed as open-source software. The paper also includes benchmarking results showing that SPAdes outperforms other assemblers on both single-cell and multicell datasets.
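To make the role of k in a de Bruijn graph more concrete, here is a minimal Python sketch. It is not SPAdes code and does not implement the multisized construction itself. It only builds an ordinary de Bruijn graph for one k at a time, so the effect of changing k can be seen on a toy input. The function names `de_bruijn_graph` and `count_branching_nodes` and the two example reads are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a simple de Bruijn graph from reads (illustrative, not SPAdes).

    Nodes are (k-1)-mers. Every distinct k-mer found in a read adds one
    edge from its (k-1)-length prefix to its (k-1)-length suffix.
    Returns a dict mapping each node to the set of its successor nodes.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def count_branching_nodes(graph):
    """Count nodes with more than one successor (ambiguous extensions)."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


if __name__ == "__main__":
    # Two overlapping toy reads (assumed error-free) covering ATGGCGTGC.
    reads = ["ATGGCGT", "GGCGTGC"]
    for k in (3, 4):
        graph = de_bruijn_graph(reads, k)
        print(f"k={k}: {len(graph)} nodes, "
              f"{count_branching_nodes(graph)} branching")
```

On this toy input, the node `TG` has two successors at k=3 because the 2-mer `TG` occurs twice in the underlying sequence, while at k=4 the graph has no branching node. The general trade-off is that a larger k separates short repeats but needs longer overlaps between reads, which low-coverage regions may not provide. That trade-off is the usual motivation for combining several values of k under non-uniform coverage.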