Understanding Salmon: fast and bias-aware quantification of transcript expression using dual-phase inference

Salmon is a novel method for quantifying transcript abundance from RNA-seq reads, designed to be both accurate and fast. It is the first transcriptome-wide quantifier to correct for fragment GC content bias, which significantly improves the accuracy of abundance estimates and the reliability of differential expression analysis. Salmon combines a dual-phase parallel inference algorithm with feature-rich bias models and an ultra-fast read mapping procedure. The method consists of three components: a lightweight-mapping model, an online phase that estimates initial expression levels and model parameters, and an offline phase that refines the expression estimates. This two-phase inference procedure allows Salmon to build a probabilistic model of the sequencing experiment, incorporating information such as fragment-transcript agreement and sequence-specific biases. Salmon outperforms existing methods such as kallisto and eXpress in accuracy and speed, while also providing sample-specific bias models that account for sequence-specific, fragment-GC, and positional biases. Salmon's ability to compute high-quality estimates of transcript abundances at the scale of thousands of samples, while accounting for prevalent technical biases, will enable more comprehensive comparisons of experimental data across large populations and different conditions.
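To make the idea of an offline refinement phase more concrete, the following is a minimal sketch of an expectation-maximization (EM) update over fragment equivalence classes, the general family of algorithm used for this kind of refinement. It is not Salmon's implementation: the function names (`em_abundance`, `to_tpm`), the input layout, and the toy data are all assumptions made for illustration, and the sketch omits the online phase, the bias models, and any per-fragment conditional probabilities beyond effective length.

```python
from typing import List, Sequence, Tuple

# Hypothetical input layout: each equivalence class is a tuple of transcript
# indices plus the number of fragments compatible with exactly that set.
EqClass = Tuple[Tuple[int, ...], int]


def em_abundance(eq_classes: Sequence[EqClass],
                 eff_lengths: Sequence[float],
                 n_iter: int = 200,
                 tol: float = 1e-10) -> List[float]:
    """Toy EM: estimate the expected fragment count of each transcript."""
    n = len(eff_lengths)
    total = float(sum(count for _, count in eq_classes))
    alpha = [total / n] * n  # uniform starting point (an assumption)
    for _ in range(n_iter):
        new_alpha = [0.0] * n
        for tids, count in eq_classes:
            # Responsibility of transcript t for this class is proportional
            # to its current count divided by its effective length.
            weights = [alpha[t] / eff_lengths[t] for t in tids]
            denom = sum(weights)
            if denom == 0.0:
                continue
            for t, w in zip(tids, weights):
                new_alpha[t] += count * w / denom
        delta = max(abs(a - b) for a, b in zip(alpha, new_alpha))
        alpha = new_alpha
        if delta < tol:
            break
    return alpha


def to_tpm(alpha: Sequence[float], eff_lengths: Sequence[float]) -> List[float]:
    """Convert expected counts to transcripts per million (TPM)."""
    rates = [a / length for a, length in zip(alpha, eff_lengths)]
    norm = sum(rates)
    return [1e6 * r / norm for r in rates]


# Toy data: 10 fragments map uniquely to transcript 0, 90 are ambiguous
# between transcripts 0 and 1, and both transcripts have equal length.
classes = [((0,), 10), ((0, 1), 90)]
lengths = [1000.0, 1000.0]
alpha = em_abundance(classes, lengths)
tpm = to_tpm(alpha, lengths)
```

In this toy example transcript 1 has no uniquely mapping fragments, so the iterations shift essentially all of the ambiguous fragments toward transcript 0. A bias-aware quantifier would, in addition, adjust each transcript's effective length or fragment weights using the learned bias models, which this sketch leaves out.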

Salmon: fast and bias-aware quantification of transcript expression using dual-phase inference

2017 April | Rob Patro¹,*, Geet Duggal²,†, Michael I Love³,†, Rafael A Irizarry³,§, and Carl Kingsford¹,⁴