Minimap2: pairwise alignment for nucleotide sequences

16 Mar 2018 | Heng Li
Minimap2 is a versatile pairwise aligner for nucleotide sequences. It maps DNA or long mRNA sequences against a large reference database and handles short reads, long genomic and RNA-seq reads, and assembly contigs. It supports split-read alignment, uses a concave gap cost for long insertions and deletions, and introduces new heuristics to reduce spurious alignments. Minimap2 is 3–4 times faster than mainstream short-read mappers at comparable accuracy and at least 30 times faster than long-read mappers at higher accuracy.

Minimap2 follows a seed-chain-align approach. It collects minimizers of the reference sequences and indexes them in a hash table. For each query, it finds exact minimizer matches (anchors) against the index and groups colinear anchors into chains. For base-level alignment, it applies dynamic programming to extend from the ends of chains and to close the regions between adjacent anchors.

The chaining algorithm uses dynamic programming to compute the maximum chaining score, with a heuristic to accelerate the search. Backtracking then identifies primary chains, and the chains are also used to estimate sequence divergence. For alignment, Minimap2 uses a 2-piece affine gap cost, which helps recover longer insertions and deletions. A Z-drop heuristic avoids misalignments, and a filtering step removes misplaced anchors. Minimap2 also supports spliced alignment, in which it distinguishes insertions to the reference from deletions from the reference.

Minimap2 is effective for long genomic reads, spliced reads, and short genomic reads, and it outperforms other aligners in speed and accuracy, especially for long reads and spliced sequences. It also aligns long-read assemblies efficiently and has been shown to give higher precision in variant calling than other tools.
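The seeding step described above (collect minimizers, index them in a hash table, look up exact matches as anchors) can be sketched in a few lines of Python. This is a simplified illustration, not Minimap2's implementation: it orders k-mers lexicographically on the forward strand only, whereas the real tool works with hashed k-mers on both strands. The function names `minimizers`, `build_index`, and `find_anchors`, and the small `k` and `w` values, are hypothetical choices for the example.

```python
from collections import defaultdict


def minimizers(seq, k=5, w=4):
    """Return sorted (position, k-mer) pairs: the smallest k-mer of every
    window of w consecutive k-mers (leftmost one on ties)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal element, i.e. the leftmost one
        offset = min(range(w), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return sorted(picked)


def build_index(references, k=5, w=4):
    """Hash table: k-mer -> list of (reference name, reference position)."""
    index = defaultdict(list)
    for name, seq in references.items():
        for pos, kmer in minimizers(seq, k, w):
            index[kmer].append((name, pos))
    return index


def find_anchors(query, index, k=5, w=4):
    """Anchors are exact minimizer matches: (name, ref_pos, query_pos)."""
    anchors = []
    for qpos, kmer in minimizers(query, k, w):
        for name, rpos in index.get(kmer, []):
            anchors.append((name, rpos, qpos))
    return anchors


references = {"chr": "GATTACAGATTTCCAGGACTA"}
index = build_index(references, k=3, w=3)
query = references["chr"][5:17]  # an error-free read taken at offset 5
anchors = find_anchors(query, index, k=3, w=3)
```

Because the query is an exact substring of the reference, every query minimizer yields an anchor on the diagonal `ref_pos - query_pos == 5`. Other anchors can also appear when a k-mer is repeated, which is why a chaining step is needed afterwards.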
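The chaining step can be illustrated with a small dynamic-programming sketch. It assumes all anchors lie on one strand of one reference, scores each anchor by the number of newly matched bases, and penalizes the difference between the reference and query gaps. The penalty form (a linear term plus a logarithmic term) is modelled loosely on the concave chaining cost in the Minimap2 paper; the constants, the fixed look-back window used as the speed-up heuristic, and the function name `chain_anchors` are assumptions made for this example.

```python
import math


def chain_anchors(anchors, k=15, max_lookback=50):
    """Find the best-scoring colinear chain.

    anchors: list of (ref_pos, query_pos) pairs for k-mer matches.
    Returns (best_score, chain) where chain is a list of anchors.
    """
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return 0, []
    score = [k] * n   # a chain consisting of a single anchor scores k
    prev = [-1] * n   # back-pointers for backtracking
    for i in range(n):
        ri, qi = anchors[i]
        # Heuristic: only consider the most recent max_lookback predecessors
        for j in range(i - 1, max(i - 1 - max_lookback, -1), -1):
            rj, qj = anchors[j]
            dr, dq = ri - rj, qi - qj
            if dr <= 0 or dq <= 0:
                continue  # not colinear with anchor i
            gap = abs(dr - dq)
            matched = min(dr, dq, k)  # new bases covered by anchor i
            penalty = 0.0 if gap == 0 else 0.01 * k * gap + 0.5 * math.log2(gap)
            candidate = score[j] + matched - penalty
            if candidate > score[i]:
                score[i] = candidate
                prev[i] = j
    # Backtrack from the highest-scoring anchor
    best = max(range(n), key=lambda i: score[i])
    chain = []
    i = best
    while i != -1:
        chain.append(anchors[i])
        i = prev[i]
    chain.reverse()
    return score[best], chain


best_score, chain = chain_anchors([(10, 0), (30, 20), (50, 40), (500, 5)])
```

In this toy input the first three anchors sit on one diagonal and chain together, while the anchor at reference position 500 is left out because joining it would incur a large gap penalty.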
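A 2-piece affine gap cost takes the minimum of two affine functions: one with a cheap opening and expensive extension, which governs short gaps, and one with an expensive opening and cheap extension, which governs long gaps. The sketch below shows why this helps recover long insertions and deletions. The parameter values (`q=4, e=2, q2=24, e2=1`) are assumed here for illustration and are not stated in this summary; `gap_cost` is a hypothetical helper name.

```python
def gap_cost(length, q=4, e=2, q2=24, e2=1):
    """Cost of a gap of the given length under a 2-piece affine model.

    Piece 1 (q, e): low open cost, high extension cost (short gaps).
    Piece 2 (q2, e2): high open cost, low extension cost (long gaps).
    """
    return min(q + length * e, q2 + length * e2)


def one_piece_cost(length, q=4, e=2):
    """Ordinary affine gap cost, for comparison."""
    return q + length * e


short_gap = gap_cost(10)    # piece 1 applies: 4 + 10 * 2
long_gap = gap_cost(100)    # piece 2 applies: 24 + 100 * 1
```

With these values the two pieces cross at a gap length of 20. Beyond that point each extra gap base costs 1 instead of 2, so a 100-base gap costs 124 rather than the 204 an ordinary affine model with the first piece's parameters would charge.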
Minimap2 is available as open-source software and is widely used for various sequencing applications.