2012, Vol. 40, No. 10 | Davis J. McCarthy, Yunshun Chen, Gordon K. Smyth
The article presents a flexible statistical framework for analyzing read counts from RNA-Seq gene expression studies, particularly complex experiments involving multiple treatment conditions and blocking variables. The framework accounts for both biological and technical variation, estimating biological variation separately from the technical variation introduced by the sequencing technology. Novel empirical Bayes methods allow each gene to have its own specific variability, even when relatively few biological replicates are available. The methods are implemented in the edgeR package of the Bioconductor project.
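The separation of biological from technical variation can be illustrated with the negative binomial (NB) mean-variance relationship used in edgeR, var(y) = μ + φμ², where φ is the gene's dispersion and √φ is its biological coefficient of variation (BCV). Dividing by μ² gives CV² = 1/μ + φ: the technical (Poisson) part shrinks as sequencing depth grows, while the biological part does not. The following is a minimal sketch of that arithmetic only; the function names and the dispersion value 0.04 are hypothetical choices for illustration, not edgeR's API or estimates from the article.

```python
import math


def nb_variance(mu: float, phi: float) -> float:
    """NB variance: Poisson (technical) term mu plus biological term phi * mu**2.

    phi is the NB dispersion; this is an illustrative helper, not an edgeR function.
    """
    return mu + phi * mu ** 2


def cv_components(mu: float, phi: float) -> dict:
    """Split the squared coefficient of variation as CV^2 = 1/mu + phi."""
    technical = 1.0 / mu   # Poisson sampling noise; shrinks as the mean count grows
    biological = phi       # BCV^2; does not depend on sequencing depth
    return {
        "technical_cv2": technical,
        "biological_cv2": biological,
        "total_cv2": technical + biological,
        "bcv": math.sqrt(phi),
    }


if __name__ == "__main__":
    phi = 0.04  # hypothetical dispersion, i.e. BCV = 0.2
    for mu in (10, 100, 10_000):
        parts = cv_components(mu, phi)
        # The decomposition must agree with variance / mean^2.
        assert math.isclose(parts["total_cv2"], nb_variance(mu, phi) / mu ** 2)
        print(f"mu={mu:>6}: technical={parts['technical_cv2']:.4f} "
              f"biological={parts['biological_cv2']:.4f} BCV={parts['bcv']:.2f}")
```

At a mean count of 10,000 the technical term is negligible and almost all remaining variability is biological, which is why more replicates, not deeper sequencing, are what improve the estimate of φ.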
A case study on a carcinoma dataset demonstrates the effectiveness of generalized linear model (GLM) methods for detecting differential expression in a paired design and for identifying tumor-specific expression changes. The study shows that allowing for gene-specific variability focuses the analysis on changes that are consistent across biological replicates. The article also describes computational methods that make the non-linear model fitting faster and more reliable, and simulations demonstrate the accuracy of adjusted profile likelihood estimators of the dispersion in complex experimental designs. The methods also apply to other count-based genomic data, including DNA-Seq applications such as ChIP-Seq and DNA methylation analyses.
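In a paired design, each patient contributes one normal and one tumor sample, and the GLM includes patient as a blocking factor so that the tumor-versus-normal effect is estimated within patients. edgeR's GLMs use a log link with the library size as an offset, log μ = xᵀβ + log N. The sketch below assumes treatment coding equivalent to R's `model.matrix(~Patient + Tissue)`; it is written in Python purely for illustration, and the helper names, patient labels and coefficient values are hypothetical rather than taken from the article's data.

```python
import math


def paired_design(patients: list[str], tissues: list[str]) -> list[list[int]]:
    """Treatment-coded design for ~ Patient + Tissue (hypothetical helper).

    Column 0 is the intercept, then one indicator per non-reference patient,
    and the last column is 1 for tumor ("T") samples; normal is the reference.
    """
    levels = sorted(set(patients))
    rows = []
    for patient, tissue in zip(patients, tissues):
        row = [1]                                             # intercept
        row += [int(patient == lvl) for lvl in levels[1:]]    # patient blocking effects
        row.append(int(tissue == "T"))                        # tumor vs normal effect
        rows.append(row)
    return rows


def fitted_means(design: list[list[int]], beta: list[float],
                 lib_sizes: list[float]) -> list[float]:
    """Log-linear mean with a library-size offset: mu_i = N_i * exp(x_i . beta)."""
    return [n * math.exp(sum(x * b for x, b in zip(row, beta)))
            for row, n in zip(design, lib_sizes)]


if __name__ == "__main__":
    patients = ["P1", "P1", "P2", "P2", "P3", "P3"]
    tissues = ["N", "T", "N", "T", "N", "T"]
    design = paired_design(patients, tissues)
    # Hypothetical coefficients: patient baselines differ, tumor doubles expression.
    beta = [math.log(1e-5), 0.7, -0.3, math.log(2)]
    mu = fitted_means(design, beta, [1e6] * len(patients))
    for i in range(0, len(mu), 2):
        print(patients[i], "tumor/normal ratio:", round(mu[i + 1] / mu[i], 6))
```

Whatever the patient baselines are, the tumor/normal ratio within each patient equals exp of the last coefficient, which is how the blocking columns absorb between-patient differences and leave the tissue coefficient to measure the consistent tumor effect.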