STAR: ultrafast universal RNA-seq aligner

October 25, 2012 | Alexander Dobin, Carrie A. Davis, Felix Schlesinger, Jorg Drenkow, Chris Zaleski, Sonali Jha, Philippe Batut, Mark Chaisson and Thomas R. Gingeras
STAR is an ultrafast RNA-seq aligner designed to address the challenges of aligning high-throughput RNA-seq data. It uses a novel algorithm based on sequential maximum mappable seed search in uncompressed suffix arrays, followed by seed clustering and stitching. STAR is more than 50 times faster than other aligners, mapping 550 million 2×76 bp paired-end reads per hour on a 12-core server, and it also improves alignment sensitivity and precision. It detects both canonical and non-canonical splice junctions, as well as chimeric transcripts. Experimental validation using Roche 454 sequencing confirmed its high precision in detecting novel splice junctions.
STAR is implemented as standalone C++ code and is free, open-source software distributed under the GPLv3 license. It can also align the long reads produced by third-generation sequencing technologies, which are longer and have higher error rates than short reads.
The algorithm has two phases. The seed search phase finds maximal mappable prefixes (MMPs). An MMP is the longest prefix of the read, or of its still-unmapped remainder, that exactly matches one or more locations in the genome. Using uncompressed suffix arrays makes this search fast. The clustering/stitching/scoring phase then combines the seeds into an alignment of the entire read sequence, which allows reads containing mismatches, insertions, and deletions to be aligned accurately.
STAR was evaluated on both simulated and experimental RNA-seq data. On simulated data it showed high sensitivity and low false-positive rates, outperforming the other aligners tested. On experimental data from the ENCODE project, it aligned the largest percentage of reads and detected splice junctions with high precision.
Its speed and accuracy make STAR suitable for large-scale sequencing projects, and its ability to handle long reads makes it particularly useful for emerging sequencing technologies. STAR is optimized for mammalian genomes, so other species may require parameter adjustments.
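The two-phase algorithm described above can be illustrated with a toy Python model. This is a minimal sketch under simplifying assumptions, not STAR's C++ implementation:

- The suffix array is built naively over one short forward-strand sequence.
- A plain binary search stands in for STAR's seed search.
- Seeds that map to more than one locus are dropped instead of being clustered.
- The gap between consecutive seeds is classified with fixed rules rather than STAR's full scoring scheme.

All function names (`build_suffix_array`, `sa_range`, `mmp`, `seed_search`, `stitch`) are hypothetical, as are the parameters `MIN_INTRON`, `NONCANON_PENALTY` and `INDEL_PENALTY` and their values.

```python
# Toy sketch of STAR's two phases; all names and parameter values are hypothetical.

MIN_INTRON = 10         # hypothetical: genomic gaps at least this long count as introns
NONCANON_PENALTY = -8   # hypothetical score for a junction without a GT...AG motif
INDEL_PENALTY = -2      # hypothetical score per insertion or deletion event


def build_suffix_array(genome):
    """Uncompressed suffix array: every suffix start, sorted lexicographically (naive)."""
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def sa_range(genome, sa, pattern):
    """Half-open interval [lo, hi) of suffix-array entries whose suffix starts with pattern."""
    m = len(pattern)

    def first_index(pred):  # binary search for the first entry where pred is False
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if pred(genome[sa[mid]:sa[mid] + m]):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return first_index(lambda s: s < pattern), first_index(lambda s: s <= pattern)


def mmp(genome, sa, read, start):
    """Maximal mappable prefix of read[start:]: its length and all genomic loci."""
    length, loci = 0, []
    while start + length < len(read):
        lo, hi = sa_range(genome, sa, read[start:start + length + 1])
        if lo == hi:  # one more base no longer matches anywhere in the genome
            break
        length += 1
        loci = sorted(sa[lo:hi])
    return length, loci


def seed_search(genome, sa, read):
    """Phase 1: repeat the MMP search on the unmapped remainder of the read."""
    seeds, pos = [], 0
    while pos < len(read):
        length, loci = mmp(genome, sa, read, pos)
        if length == 0:  # base absent from the genome (e.g. 'N'): skip it
            pos += 1
            continue
        seeds.append((pos, length, loci))
        pos += length
    return seeds


def stitch(genome, read, seeds):
    """Phase 2, simplified: chain uniquely mapped seeds into a CIGAR string and a score.

    Multi-mapping seeds are dropped and unaligned read ends are not reported.
    """
    chain = [(r, n, loci[0]) for r, n, loci in seeds if len(loci) == 1]
    ops, score = [], 0

    def emit(length, op):
        if ops and ops[-1][1] == op:
            ops[-1][0] += length  # merge with the previous operation of the same type
        else:
            ops.append([length, op])

    for i, (r, n, g) in enumerate(chain):
        if i > 0:
            pr, pn, pg = chain[i - 1]
            read_gap, genome_gap = r - (pr + pn), g - (pg + pn)
            if read_gap == genome_gap >= 0:  # equal gaps: align base by base
                mism = sum(a != b for a, b in zip(read[pr + pn:r], genome[pg + pn:g]))
                emit(read_gap, "M")
                score += read_gap - 2 * mism  # +1 per match, -1 per mismatch
            elif read_gap == 0 and genome_gap >= MIN_INTRON:  # splice junction
                motif = (genome[pg + pn:pg + pn + 2], genome[g - 2:g])
                emit(genome_gap, "N")
                score += 0 if motif == ("GT", "AG") else NONCANON_PENALTY
            elif read_gap == 0 and genome_gap > 0:  # short genomic gap: deletion
                emit(genome_gap, "D")
                score += INDEL_PENALTY
            elif genome_gap == 0 and read_gap > 0:  # extra read bases: insertion
                emit(read_gap, "I")
                score += INDEL_PENALTY
            else:
                raise ValueError("gap pattern not handled by this sketch")
        emit(n, "M")
        score += n  # seeds are exact matches: +1 per base
    return "".join(f"{length}{op}" for length, op in ops), score


# Exon 1 + 12-bp GT...AG intron + exon 2, and a read spliced across the intron.
genome = "CATGCA" + "GT" + "A" * 8 + "AG" + "TCCGAT"
read = "CATGCA" + "TCCGAT"
sa = build_suffix_array(genome)
seeds = seed_search(genome, sa, read)
print(seeds)                        # [(0, 6, [0]), (6, 6, [18])]
print(stitch(genome, read, seeds))  # ('6M12N6M', 12)
```

In this example the first MMP stops exactly at the exon boundary, because `CATGCAT` (exon 1 followed by the first base of exon 2) occurs nowhere in the toy genome. The search then restarts on the unmapped remainder of the read. The 12-base genomic gap between the two seeds is flanked by GT...AG, so it is reported as an intron (`N`) rather than a deletion.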
STAR is also compatible with various sequencing platforms and can be used in continuous streaming mode. Because the algorithm extends to long reads, it has the potential to serve as a universal alignment tool across sequencing platforms. Its ability to align long reads, combined with its high performance, makes STAR a valuable tool for RNA-seq analysis.