Fast and accurate long-read alignment with Burrows-Wheeler transform

January 15, 2010 | Heng Li and Richard Durbin
BWA-SW is a fast and accurate algorithm for aligning long sequences of up to 1 Mb against a large sequence database, such as the human genome, using FM-indexes. It is as accurate as SSAHA2, more accurate than BLAT, and significantly faster than both. The algorithm follows a seed-and-extend approach, combining dynamic programming with heuristics to reduce computation time. It builds FM-indexes for both the reference and the query sequences, implicitly representing the reference as a prefix trie and the query as a prefix directed acyclic word graph (DAWG), and aligns the two by dynamic programming. BWA-SW applies two heuristic rules to accelerate alignment: pruning low-scoring matches and reporting only alignments that do not overlap on the query. The pruning follows a Z-best strategy that keeps only the top-scoring matches, and reverse–reverse alignment (aligning the reversed query against the reversed reference) is used to recover accuracy lost to pruning.
BWA-SW is memory efficient and supports multi-threading. Evaluation on simulated and real data shows that it is faster than both BLAT and SSAHA2, more accurate than BLAT, and comparable in accuracy to SSAHA2, with the advantage most visible on long reads. It can detect chimeric reads, and its accuracy improves as reads get longer. BWA-SW is designed for practical use with large-scale real data and balances speed against accuracy. It is available at http://bio-bwa.sourceforge.net.
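To make the role of the FM-index more concrete, the following is a minimal Python sketch of exact-match backward search, the basic operation an FM-index supports. It is not BWA-SW's implementation: BWA-SW runs dynamic programming between a prefix trie and a prefix DAWG and allows mismatches and gaps, whereas this toy version only finds exact occurrences of a pattern. The function names `build_fm_index` and `backward_search` are hypothetical, and the suffix array is built by naive sorting, which is only suitable for short strings.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index for `text` (assumed not to contain '$')."""
    text += "$"  # sentinel, sorts before A/C/G/T
    # Naive suffix array: fine for a demo, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # C[c] = number of characters in text that are strictly smaller than c.
    counts = Counter(text)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of exact matches of `pattern`."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []  # interval is empty: no occurrence
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    sa, C, occ = build_fm_index("GATTACAGATTACA")
    print(backward_search("ATTA", sa, C, occ))
    print(backward_search("TTT", sa, C, occ))
```

Each loop iteration narrows a suffix-array interval using only the `C` and `occ` tables, so the search cost depends on the pattern length rather than on the size of the reference. That property is what makes FM-index-based seeding practical on a genome-sized database.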