[slides] Differential expression analysis for sequence count data

The paper by Simon Anders and Wolfgang Huber introduces a method for differential expression analysis of sequence count data, such as those obtained from RNA-Seq, ChIP-Seq, or barcode counting. The method is based on the negative binomial distribution, with variance and mean linked by local regression, and is implemented in the R/Bioconductor package DESeq.

The background section explains that high-throughput sequencing assays provide quantitative readouts in the form of count data, and that the goal is to infer differential signal correctly and with good statistical power. The Poisson distribution, commonly used for count data, is noted to be too restrictive: it forces the variance to equal the mean, so it underestimates the variability between replicates and inflates type-I error rates. To address this, the authors use the negative binomial distribution, which allows for overdispersion. Their model has three sets of parameters: size factors (one per sample), expression strength parameters (one per gene and condition), and smooth functions that describe how the raw variance depends on the mean.

The results and discussion section demonstrates the effectiveness of the method through simulations and real data applications. The authors show that their method controls the type-I error rate, whereas the Poisson-based χ² test does not. In a comparison with edgeR, they find that DESeq provides a more balanced selection of differentially expressed genes across the dynamic range of counts.

The paper concludes by discussing the advantages of the proposed method over existing approaches, including its ability to handle small numbers of replicates and its flexibility in estimating variance-mean relationships.
The authors also provide guidelines for experiment design and highlight the importance of considering both biological and technical variability in differential expression analysis.
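
The roles of the size factors and the variance model can be illustrated with a small sketch. It assumes the median-of-ratios estimate for size factors and the decomposition of the count variance into shot noise plus scaled raw variance, σ² = s·q + s²·v(q), as described in the DESeq paper. The function names, the toy counts, and the quadratic `toy_raw_variance` are hypothetical choices for illustration only; this is not the DESeq API, and DESeq estimates the raw-variance function from the data by local regression rather than fixing it in advance.

```python
import math
from statistics import median


def estimate_size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch).

    counts: list of rows, one row per gene, one column per sample.
    Genes with a zero count in any sample are skipped; this is an
    assumption of the sketch, since their geometric mean would be zero.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")
    # Log of the per-gene geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of each gene's count in sample j to its geometric mean.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def toy_raw_variance(q):
    """Hypothetical smooth raw-variance function; DESeq fits this from data."""
    return 0.05 * q ** 2


def model_variance(q, size_factor, raw_variance_fn=toy_raw_variance):
    """Count variance: shot noise (the mean) plus scaled raw variance."""
    mu = size_factor * q
    return mu + size_factor ** 2 * raw_variance_fn(q)


# Toy data: the second sample is sequenced twice as deeply as the first.
toy_counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],  # skipped: contains a zero
]
size_factors = estimate_size_factors(toy_counts)
print("size factors:", [round(s, 3) for s in size_factors])

q = 1000.0  # expression strength of one hypothetical gene
for s in size_factors:
    print(f"s={s:.3f}  Poisson variance={s * q:.0f}  "
          f"model variance={model_variance(q, s):.0f}")
```

Under a Poisson model the variance would equal the mean s·q. The extra s²·v(q) term is what lets the negative binomial model absorb variability between replicates, and it dominates at high counts with the toy function used here.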

Differential expression analysis for sequence count data

2010 | Simon Anders*, Wolfgang Huber