An Eulerian path approach to DNA fragment assembly

An Eulerian path approach to DNA fragment assembly

August 14, 2001 | Pavel A. Pevzner*, Haixu Tang†, and Michael S. Waterman‡§
This paper introduces a new algorithm, EULER, for DNA fragment assembly that replaces the traditional "overlap–layout–consensus" approach. The EULER algorithm transforms the fragment assembly problem into a variation of the Eulerian path problem, allowing for accurate solutions of large-scale sequencing problems. Unlike the CELERA assembler, which masks repeats, EULER uses them as a powerful tool for fragment assembly. The algorithm works by breaking down the puzzle into smaller pieces and then solving the problem using Eulerian path algorithms, which are efficient and can be solved in linear time. The paper discusses the challenges of the traditional approach, particularly the "repeat problem," which is difficult to solve with the overlap–layout–consensus paradigm. The EULER algorithm addresses this by reducing the fragment assembly problem to an Eulerian path problem in a de Bruijn graph. This approach allows for the resolution of repeats and the generation of accurate assemblies. The paper also describes the error correction process used in EULER, which involves approximating the set of all l-tuples in the genome to correct sequencing errors. This process significantly reduces the number of errors in the data, leading to more accurate assemblies. The EULER algorithm is shown to produce error-free solutions for large-scale assembly projects. The paper also discusses the use of clone-end data (DB data) in the EULER-DB algorithm, which helps to untangle the de Bruijn graph and resolve tangles. EULER-DB is shown to be effective in reducing the number of contigs and improving the accuracy of assemblies. The paper concludes that the EULER algorithm provides a more efficient and accurate approach to DNA fragment assembly compared to traditional methods. It is particularly effective in resolving repeats and generating accurate assemblies, making it a valuable tool for large-scale sequencing projects.This paper introduces a new algorithm, EULER, for DNA fragment assembly that replaces the traditional "overlap–layout–consensus" approach. The EULER algorithm transforms the fragment assembly problem into a variation of the Eulerian path problem, allowing for accurate solutions of large-scale sequencing problems. Unlike the CELERA assembler, which masks repeats, EULER uses them as a powerful tool for fragment assembly. The algorithm works by breaking down the puzzle into smaller pieces and then solving the problem using Eulerian path algorithms, which are efficient and can be solved in linear time. The paper discusses the challenges of the traditional approach, particularly the "repeat problem," which is difficult to solve with the overlap–layout–consensus paradigm. The EULER algorithm addresses this by reducing the fragment assembly problem to an Eulerian path problem in a de Bruijn graph. This approach allows for the resolution of repeats and the generation of accurate assemblies. The paper also describes the error correction process used in EULER, which involves approximating the set of all l-tuples in the genome to correct sequencing errors. This process significantly reduces the number of errors in the data, leading to more accurate assemblies. The EULER algorithm is shown to produce error-free solutions for large-scale assembly projects. The paper also discusses the use of clone-end data (DB data) in the EULER-DB algorithm, which helps to untangle the de Bruijn graph and resolve tangles. EULER-DB is shown to be effective in reducing the number of contigs and improving the accuracy of assemblies. The paper concludes that the EULER algorithm provides a more efficient and accurate approach to DNA fragment assembly compared to traditional methods. It is particularly effective in resolving repeats and generating accurate assemblies, making it a valuable tool for large-scale sequencing projects.
Reach us at info@study.space