2012 | Davis J. McCarthy, Yunshun Chen and Gordon K. Smyth
This paper presents a statistical framework for analyzing RNA-Seq data that focuses on biological variation. The method uses generalized linear models (GLMs) to detect differential gene expression in complex experiments with multiple treatment conditions and blocking variables. It accounts for both biological and technical variation and allows for gene-specific variability even when few biological replicates are available. Gene-specific dispersions are estimated with empirical Bayes methods and used to model the variability in the data. The method is implemented in the edgeR package of the Bioconductor project. In a case study of carcinoma data, the method detected differential expression in a paired design and identified tumour-specific expression changes. The results show that allowing for gene-specific variability is crucial for accurate detection of differential expression. The paper also stresses the importance of estimating biological variation accurately, because Poisson-based models may underestimate the variability between biological replicates. Together, the methods provide a flexible pipeline for analyzing arbitrarily complex RNA-Seq experiments with some degree of biological replication, and they are applicable to other types of genomic data, including DNA-Seq applications.
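The point about Poisson-based models underestimating variability can be illustrated with a small numerical sketch. It assumes the negative binomial mean-variance relationship commonly used for RNA-Seq counts, in which the variance is mu + phi * mu^2 and the square root of the dispersion phi is read as the biological coefficient of variation. The function names and the example values are hypothetical and are not part of the edgeR API.

```python
import math


def nb_variance(mu: float, dispersion: float) -> float:
    """Variance of a count with mean `mu` under a negative binomial model.

    Assumed relationship: var = mu + dispersion * mu**2.
    With dispersion = 0 this reduces to the Poisson variance (var = mu).
    """
    return mu + dispersion * mu ** 2


def squared_cv(mu: float, dispersion: float) -> float:
    """Squared coefficient of variation: 1/mu (technical) + dispersion (biological)."""
    return nb_variance(mu, dispersion) / mu ** 2


def biological_cv(dispersion: float) -> float:
    """Biological coefficient of variation implied by a dispersion value."""
    return math.sqrt(dispersion)


if __name__ == "__main__":
    mu = 100.0   # hypothetical expected read count for one gene
    phi = 0.04   # hypothetical dispersion, i.e. a biological CV of about 0.2

    print("Poisson variance:          ", nb_variance(mu, 0.0))
    print("Negative binomial variance:", nb_variance(mu, phi))
    print("Biological CV:             ", biological_cv(phi))
```

For a gene with a large mean count, the 1/mu term becomes negligible and the dispersion dominates, which is why a Poisson model (dispersion fixed at zero) can badly understate the variability between biological replicates.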