HISAT: a fast spliced aligner with low memory requirements

HISAT is a fast, memory-efficient spliced aligner for RNA sequencing (RNA-seq) data. It uses a hierarchical indexing strategy based on the Burrows-Wheeler transform and the Ferragina-Manzini (FM) index: one global FM index covers the whole genome and anchors each alignment, and many small local FM indexes allow that alignment to be extended rapidly. For the human genome, the hierarchical index contains 48,000 local FM indexes, each covering ~64,000 bp, which lets HISAT align reads that span introns efficiently. Despite the large number of indexes, HISAT requires only 4.3 GB of memory, so it can run on a conventional desktop computer. It supports genomes of any size, including those larger than 4 billion bases.

For improved accuracy, HISAT can run in two passes: the first pass identifies splice sites, and the second pass uses them to align reads with short anchors. A hybrid version combines the efficiency of the one-pass approach with the accuracy of the two-pass method, achieving high sensitivity with minimal additional runtime. HISAT also includes an option to use known splice sites from gene annotations.

RNA-seq experiments generate large volumes of data, so analysis depends on fast and scalable alignment tools. In the authors' tests, HISAT was the fastest system available, with sensitivity and precision comparable to or better than other methods. Compared with STAR, GSNAP, and TopHat2, it ran 49% faster than STAR, 8 times faster than GSNAP, and 62 times faster than TopHat2. It also needed less memory: 3.7–4.3 GB of RAM, compared with 28 GB for STAR and 20.2 GB for GSNAP. HISAT's hierarchical indexing gives it high sensitivity for reads with short anchors: it correctly aligns over 95% of reads with intermediate-length anchors and over 92% of reads with the shortest anchors. On real data, HISAT aligned the greatest number of reads and found the highest number of spliced alignments.

HISAT is open-source and available at http://www.ccb.jhu.edu/software/hisat/. It is designed to serve as the core alignment engine for the next major version of TopHat, TopHat3. Other methods could adopt its hierarchical indexing strategy, provided their data structures are redesigned.
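The hierarchical indexing idea described above can be illustrated with a small Python sketch. This is not HISAT's implementation. The function names (`build_fm_index`, `find`, `build_local_indexes`, `align_spliced`), the toy genome, the 32-base window, and the 8-base overlap are all assumptions made for illustration, and the rule of taking the nearest downstream hit is a simplification. The sketch anchors the longest matching prefix of a read with a global FM index, then looks up the short remaining anchor only in the local indexes near that position, where it is far less ambiguous than it is genome-wide. As a rough sanity check on the figures in the summary, 48,000 local indexes of ~64,000 bp each span about 3.07 billion bases, which is close to the size of the human genome.

```python
def build_fm_index(text):
    """Build a toy FM index (suffix array, C table, occurrence table)."""
    text += "$"  # sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # first[c] = number of characters in text that sort before c
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return {"sa": sa, "first": first, "occ": occ, "n": len(bwt)}


def find(index, pattern):
    """Exact-match lookup by FM-index backward search; returns sorted positions."""
    lo, hi = 0, index["n"]
    for c in reversed(pattern):
        if c not in index["first"]:
            return []
        lo = index["first"][c] + index["occ"][c][lo]
        hi = index["first"][c] + index["occ"][c][hi]
        if lo >= hi:
            return []
    return sorted(index["sa"][lo:hi])


def build_local_indexes(genome, window, overlap):
    """One small FM index per window; windows overlap their right neighbour."""
    return [(start, build_fm_index(genome[start:start + window + overlap]))
            for start in range(0, len(genome), window)]


def align_spliced(read, global_index, local_indexes, window, max_windows=2):
    """Return (left_pos, split, right_pos), or None if the read cannot be placed."""
    # Step 1: anchor the longest matching prefix with the global index.
    for split in range(len(read), 0, -1):
        hits = find(global_index, read[:split])
        if hits:
            break
    else:
        return None
    left_pos = hits[0]
    if split == len(read):
        return left_pos, split, None  # the whole read matched, no splice
    # Step 2: look for the short remainder only in nearby local indexes.
    left_end, rest = left_pos + split, read[split:]
    first_window = left_end // window
    last_window = min(first_window + max_windows, len(local_indexes))
    for k in range(first_window, last_window):
        start, index = local_indexes[k]
        hits = [start + p for p in find(index, rest) if start + p >= left_end]
        if hits:
            return left_pos, split, min(hits)  # toy rule: nearest downstream hit
    return None


# Hypothetical toy genome: two exons separated by a 20-base intron.
upstream = "TTGACCTA"
exon1 = "ACGTTGCATCAGGCTA"
intron = "GTAAGTCCTTTCCTTTCCAG"
exon2 = "CATGGATCCGATTACA"
downstream = "TTCATGAAGGCATGTTAACCGGTTCATGAAGGTTAA"
genome = upstream + exon1 + intron + exon2 + downstream

WINDOW, OVERLAP = 32, 8
global_index = build_fm_index(genome)
local_indexes = build_local_indexes(genome, WINDOW, OVERLAP)

# A read with a 12-base anchor in exon 1 and a 4-base anchor in exon 2.
read = exon1[-12:] + exon2[:4]
result = align_spliced(read, global_index, local_indexes, WINDOW)
print("global hits for short anchor:", find(global_index, read[-4:]))
print("spliced alignment (left_pos, split, right_pos):", result)
```

In this toy genome the 4-base anchor occurs several times genome-wide, so a global lookup alone cannot place it, while the local lookup near the long anchor resolves it to the start of the second exon. A real aligner also has to handle mismatches, both strands, and splice-site signals, all of which this sketch leaves out.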

HISAT: a fast spliced aligner with low memory requirements

2015 April | Daehwan Kim, Ben Langmead, and Steven L Salzberg