Understanding voom: precision weights unlock linear model analysis tools for RNA-seq read counts

The voom method is a new approach for analyzing RNA-seq read counts with linear modeling techniques originally developed for microarray data. It estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation, and feeds these weights into the limma empirical Bayes analysis pipeline. This gives RNA-seq analysts access to a wide range of statistical methods developed for microarrays. Simulation studies show that voom performs as well as or better than count-based RNA-seq methods, even when the data are generated according to the assumptions of those earlier methods. Two case studies illustrate the use of linear modeling and gene set testing methods.
Gene expression profiling is a common technique in biological research. DNA microarrays have been the primary technology for genome-wide gene expression experiments for over 16 years, and a large body of mature statistical methods has been developed for analyzing microarray data. These include methods for differential expression analysis, random effects, gene set enrichment, and gene set testing. The limma software package provides a popular differential expression pipeline that combines linear modeling, quantitative weights, and empirical Bayes statistical methods. RNA-seq has emerged as a revolutionary technology for expression profiling. One common approach is to count the number of sequence reads mapping to each gene or other genomic feature, so RNA-seq profiles consist of integer counts, unlike microarray intensities, which are continuous. Early RNA-seq publications applied microarray statistical methods directly to read counts; for example, the limma package was used to analyze log-counts after normalizing for sequencing depth. Later statistical publications argued that RNA-seq data should instead be analyzed with methods designed specifically for counts.
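The log-count scale that limma works on can be sketched in a few lines. The voom paper defines log-counts per million as log2((r + 0.5) / (R + 1) × 10^6) for a read count r in a library of size R. The NumPy function below is a minimal sketch of that transformation, assuming a genes × samples count matrix and library sizes taken as raw column sums (no TMM or other normalization factors). The name `log_cpm` is hypothetical and is not a limma or edgeR function.

```python
import numpy as np


def log_cpm(counts, lib_size=None):
    """Log2 counts per million with voom-style offsets (hypothetical helper).

    counts:   genes x samples array of non-negative read counts.
    lib_size: per-sample library sizes; defaults to column sums
              (assumption: no normalization factors are applied).
    """
    counts = np.asarray(counts, dtype=float)
    if lib_size is None:
        lib_size = counts.sum(axis=0)
    lib_size = np.asarray(lib_size, dtype=float)
    # +0.5 keeps log2 finite for zero counts; +1 on the library size
    # keeps the ratio strictly between 0 and 1.
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)


# Usage: two genes, two samples; the zero count still maps to a finite value.
example = log_cpm(np.array([[0, 10], [100, 90]]))
```

Dividing by library size before taking logs is what makes samples sequenced to different depths comparable on one scale.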
The negative binomial (NB) distribution is a common model for read counts, and methods have been proposed for estimating biological variability in experiments with small numbers of replicates. However, the mathematical theory of count distributions is less tractable than that of the normal distribution, which tends to limit the performance and usefulness of count-based RNA-seq analysis methods. The voom method addresses error rate control at small sample sizes by estimating the mean-variance relationship of the log-counts and generating a precision weight for each observation; this gives accurate type I error rate control even when there are few replicates. The method also handles heterogeneous data and complex experimental designs well, and it facilitates pathway analysis and gene set testing. voom is compared with other RNA-seq analysis methods, including edgeR, DESeq, baySeq, TSPM, PoissonSeq, and DSS. In the simulations, voom performs at least as well as these methods in terms of power and error rate control. When sequencing depths are the same across samples, voom and limma-trend perform almost equally well; when sequencing depths differ, voom is the clear best performer. It also has the lowest false discovery rate among the methods compared, and it is faster than the specialist RNA-seq methods. The voom method has been applied to RNA-seq datasets with small
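The weight calculation described above can be sketched as follows. This is a minimal NumPy illustration of the steps as the voom paper describes them, not the limma implementation: fit an ordinary least-squares model to each gene's log-cpm values, smooth the square-root residual standard deviations against the average log2 count, then look up each observation's predicted log2 count on that trend and convert it to a weight w = 1 / trend^4. Assumptions: library sizes are raw column sums, genes with zero counts in every sample are left out of the trend, and a hand-written tricube local linear smoother (a non-robust LOWESS with span 0.5) stands in for R's `lowess()`. The names `voom_weights` and `_local_linear` are hypothetical.

```python
import numpy as np


def _local_linear(x, y, frac=0.5):
    """Tricube-weighted local linear fit at each x (a non-robust LOWESS)."""
    n = len(x)
    k = min(n, max(2, int(np.ceil(frac * n))))  # points per neighbourhood
    fitted = np.empty(n)
    for j, x0 in enumerate(x):
        d = np.abs(x - x0)
        h = np.sort(d)[k - 1]                   # distance to k-th nearest point
        if h > 0:
            w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3
        else:                                   # all neighbours tied at x0
            w = (d == 0).astype(float)
        xm = np.sum(w * x) / np.sum(w)
        ym = np.sum(w * y) / np.sum(w)
        sxx = np.sum(w * (x - xm) ** 2)
        slope = np.sum(w * (x - xm) * (y - ym)) / sxx if sxx > 0 else 0.0
        fitted[j] = ym + slope * (x0 - xm)
    return fitted


def voom_weights(counts, design, span=0.5):
    """Sketch of voom-style precision weights, returned as genes x samples."""
    counts = np.asarray(counts, dtype=float)
    X = np.asarray(design, dtype=float)         # samples x coefficients
    n, p = X.shape
    lib = counts.sum(axis=0)                    # assumption: raw library sizes
    y = np.log2((counts + 0.5) / (lib + 1.0) * 1e6)   # log-cpm

    # Ordinary least squares for every gene at once.
    coef, *_ = np.linalg.lstsq(X, y.T, rcond=None)
    mu = (X @ coef).T                           # fitted log-cpm
    s = np.sqrt(((y - mu) ** 2).sum(axis=1) / (n - p))  # residual SD per gene

    # Average log2 count per gene: mean log-cpm rescaled by the
    # geometric mean of (library size + 1).
    r_bar = y.mean(axis=1) + np.mean(np.log2(lib + 1.0)) - np.log2(1e6)

    # Trend of sqrt(SD) against average log-count, skipping all-zero genes.
    keep = counts.sum(axis=1) > 0
    order = np.argsort(r_bar[keep])
    sx = r_bar[keep][order]
    sy = np.sqrt(s[keep])[order]
    trend = _local_linear(sx, sy, frac=span)

    # Predicted log2 count per observation, read off the trend;
    # np.interp holds the end values constant outside the fitted range.
    fitted_logcount = mu + np.log2(lib + 1.0) - np.log2(1e6)
    return 1.0 / np.interp(fitted_logcount, sx, trend) ** 4
```

Because the variance of log-counts falls as counts rise, observations with larger predicted counts receive larger weights. In limma itself the corresponding step is `voom(counts, design)`, and `lmFit()` then uses the weights carried by the returned object for weighted least squares before empirical Bayes moderation.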

voom: precision weights unlock linear model analysis tools for RNA-seq read counts

2014 | Charity W Law, Yunshun Chen, Wei Shi and Gordon K Smyth