2010 June; 95(6): 315–327 | Jason R. Miller, Sergey Koren, and Granger Sutton
The article reviews and compares assembly algorithms and software packages designed for de novo whole-genome shotgun assembly from next-generation sequencing (NGS) data. Compared to Sanger sequencing, NGS platforms such as Roche 454, Illumina/Solexa, and ABI SOLiD produce shorter reads at higher coverage and with different error profiles. The review focuses on packages like SSAKE, SHARCGS, VCAKE, Newbler, Celera Assembler, Euler, Velvet, ABySS, AllPaths, and SOAPdenovo, and compares two standard methods: the de Bruijn graph approach and the overlap/layout/consensus (OLC) approach.
The OLC approach, familiar from Sanger-era assemblers, involves overlap discovery, construction of an overlap graph, and multiple sequence alignment to determine the consensus sequence. The de Bruijn graph approach instead represents reads and their overlaps as a K-mer graph, in which each read is broken into consecutive, overlapping substrings of length K. K-mer graphs are more sensitive to repeats and sequencing errors than overlap graphs. The article discusses the challenges of assembly, including repeat resolution, sequencing error, and non-uniform coverage, and highlights the complexity of graph-based algorithms.
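The K-mer graph idea can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed packages: the function name `kmer_graph`, the toy reads, and the choice K = 4 are assumptions made purely for illustration. Nodes are (K-1)-mers, and each distinct K-mer contributes one edge from its prefix to its suffix.

```python
from collections import defaultdict


def kmer_graph(reads, k):
    """Build a toy K-mer graph: (k-1)-mer node -> set of successor (k-1)-mers."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each distinct K-mer becomes one edge: prefix -> suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def edge_count(graph):
    """Number of distinct edges, which equals the number of distinct K-mers."""
    return sum(len(successors) for successors in graph.values())


# Hypothetical error-free reads sampled from the toy sequence ATGGCGTGCA.
clean_reads = ["ATGGCGT", "GGCGTGC", "CGTGCA"]
# The same reads plus one read carrying a single substitution (G -> T).
noisy_reads = clean_reads + ["GGCTTGC"]

clean = kmer_graph(clean_reads, 4)
noisy = kmer_graph(noisy_reads, 4)

print(edge_count(clean))      # 7 distinct K-mers
print(edge_count(noisy))      # 11: the single error adds K = 4 new K-mers
print(sorted(noisy["GGC"]))   # ['GCG', 'GCT']: the error creates a branch
```

In this toy case a single substituted base introduces four spurious K-mers and a branch at node `GGC`, which hints at why K-mer graphs are sensitive to sequencing error. Real assemblers also handle reverse complements, K-mer multiplicity, and error correction, none of which this sketch attempts.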
Key algorithms and software are described in detail, including SSAKE, SHARCGS, VCAKE, Newbler, Celera Assembler, Edena, Shorty, Euler, Velvet, and ABySS. These tools employ various techniques to handle sequencing errors, resolve repeats, and exploit paired-end reads. The article also discusses the scalability and memory requirements of these tools, particularly for large genomes.
Overall, the review provides a comprehensive overview of the current state of NGS assembly algorithms and software, highlighting their strengths and limitations in handling the unique characteristics of NGS data.