August 14, 2001 | Pavel A. Pevzner, Haixu Tang, and Michael S. Waterman
The paper introduces an Eulerian path approach to DNA fragment assembly as an alternative to the traditional "overlap–layout–consensus" paradigm. It addresses the "repeat problem" by recasting fragment assembly as a variation of the Eulerian path problem, which allows accurate solutions of large-scale sequencing problems. The authors propose an Eulerian Superpath algorithm that handles genomes with repetitive sequences, reducing the number of contigs and improving assembly accuracy. The method also includes an error correction step that reduces sequencing errors in the reads, further improving the quality of the assembled genome. The approach is demonstrated on bacterial genomes, which it assembles with few errors. The paper highlights the advantages of this method over existing tools, particularly in handling repetitive regions and in improving the finishing process in large-scale DNA sequencing projects.
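
To make the reduction to an Eulerian path concrete, here is a minimal sketch in Python. It is not the authors' Eulerian Superpath algorithm. It assumes error-free, single-stranded reads and a genome in which no (k-1)-mer repeats, and it keeps each distinct k-mer once, so repeat multiplicity is ignored. Each k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix (a de Bruijn graph), and a path that uses every edge exactly once spells a candidate sequence. The function names and the toy reads are illustrative and do not come from the paper.

```python
from collections import defaultdict


def distinct_kmers(reads, k):
    """Collect every distinct k-mer that occurs in any read."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def build_graph(kmers):
    """Each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the result deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: return a node sequence using every edge once.

    Assumes the graph actually has an Eulerian path or cycle.
    """
    in_deg = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_deg[node] += 1
    # Start where out-degree exceeds in-degree by one; otherwise the graph
    # has an Eulerian cycle and any node with outgoing edges will do.
    start = next(
        (v for v in sorted(graph) if len(graph[v]) - in_deg[v] == 1),
        min(graph),
    )
    remaining = {v: list(successors) for v, successors in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    path.reverse()
    return path


def spell(path):
    """Overlap consecutive (k-1)-mers by k-2 characters to spell a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


def assemble(reads, k):
    """Toy assembly: reads -> de Bruijn graph -> Eulerian path -> sequence."""
    return spell(eulerian_path(build_graph(distinct_kmers(reads, k))))


# Three overlapping toy reads covering the sequence ATGGCGTGCA.
print(assemble(["ATGGCG", "GGCGTG", "CGTGCA"], 4))  # prints ATGGCGTGCA
```

When a (k-1)-mer does repeat, the graph can admit several Eulerian paths and this sketch simply returns one of them. Choosing the path that is consistent with the reads is the part the paper's Eulerian Superpath formulation addresses.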