12(2002) | Sandrine Dudoit, Yee Hwa Yang, Matthew J. Callow, Terence P. Speed
This paper presents statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. The methods include new pre-processing steps for image analysis and normalization. After normalization, the problem of identifying differentially expressed genes is restated as a multiple hypothesis testing problem: for each gene, the null hypothesis is that there is no association between the gene's expression level and the treatment/control status.
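The summary mentions normalization but does not specify a method. As a minimal illustration only, assuming two-channel spot intensities R (red) and G (green) on each slide, the sketch below computes the log-ratio M = log2(R/G) and the average log-intensity A = (1/2) log2(RG), then applies global median normalization, which shifts a slide's M-values by one constant so that their median is zero. This is a deliberately simple stand-in, not the paper's procedure; intensity-dependent approaches (for example, fitting a loess curve of M on A and subtracting it) replace the constant with a fitted curve. The function names `ma_values` and `median_normalize` are hypothetical.

```python
import math
import statistics


def ma_values(red, green):
    """Return (M, A) for one slide.

    M = log2(R / G) is the log-ratio of the two channels;
    A = (1/2) * log2(R * G) is the average log-intensity of the spot.
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def median_normalize(m):
    """Global median normalization: shift M so its median is zero.

    Simplifying assumption (not from the source): the dye bias on a
    slide is a single constant on the log scale, and most genes are
    not differentially expressed.
    """
    center = statistics.median(m)
    return [x - center for x in m]


# Toy intensities for four spots on one hypothetical slide.
m, a = ma_values([200, 400, 800, 1600], [100, 100, 400, 1600])
normalized = median_normalize(m)
```

Here the raw log-ratios are [1, 2, 1, 0] with median 1, so after normalization they become [0, 1, 0, -1].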
Adjusted p-values are used to control the family-wise Type I error rate (the probability of at least one false positive among all genes tested) while accounting for the dependence structure between gene expression levels. The adjusted p-values are estimated by a permutation procedure that reassigns the treatment/control labels across slides while keeping each slide's expression profile intact, so the dependence between genes is preserved. Several data displays are suggested for visually identifying differentially expressed genes. The methods are applied to microarray data from a study of gene expression in the livers of mice with very low HDL cholesterol levels, and the genes identified using data from multiple slides are compared with those identified by recently published single-slide methods.

The paper discusses the role of normalization, the use of adjusted p-values for multiple testing, and how the multiple-slide results compare with single-slide results. The results show that the methods developed here identify differentially expressed genes with higher specificity and accuracy than single-slide methods. The paper also highlights the importance of replication in microarray experiments and the need for statistical methods that account for the complex dependencies between gene expression levels.
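The text states that adjusted p-values controlling the family-wise error rate are estimated by permutation, but it does not give the algorithm. A minimal sketch of one standard procedure of this kind, Westfall and Young's step-down maxT method, follows. It assumes a two-sample Welch t-statistic per gene, at least two slides in each group, and complete enumeration of all relabelings of the slides. The function names are hypothetical, and this is not presented as the paper's implementation. Because whole slides are relabeled, the correlation between genes on the same slide carries into every permuted statistic, which is how the adjusted p-values reflect the dependence structure.

```python
import itertools
import math
import statistics


def welch_t(values, labels):
    """Welch two-sample t-statistic, group 1 minus group 0 (assumed statistic).

    Requires at least two values in each group.
    """
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    diff = statistics.mean(g1) - statistics.mean(g0)
    se = math.sqrt(statistics.variance(g1) / len(g1)
                   + statistics.variance(g0) / len(g0))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def all_labelings(labels):
    """Yield every 0/1 relabeling of the slides with the same group sizes."""
    n, n1 = len(labels), sum(labels)
    for ones in itertools.combinations(range(n), n1):
        chosen = set(ones)
        yield [1 if i in chosen else 0 for i in range(n)]


def maxt_adjusted_pvalues(data, labels):
    """Step-down maxT adjusted p-values estimated by complete permutation.

    data: one list of expression values per gene, one entry per slide.
    labels: 1 = treatment slide, 0 = control slide.
    """
    m = len(data)
    observed = [abs(welch_t(gene, labels)) for gene in data]
    # Genes ordered from most to least significant observed |t|.
    order = sorted(range(m), key=lambda j: observed[j], reverse=True)
    counts = [0] * m
    n_perm = 0
    for perm in all_labelings(labels):
        n_perm += 1
        stats = [abs(welch_t(data[j], perm)) for j in order]
        # Successive maxima, from the least significant gene upward:
        # running = max of stats[k] over k >= i.
        running = -math.inf
        for i in range(m - 1, -1, -1):
            running = max(running, stats[i])
            if running >= observed[order[i]]:
                counts[i] += 1
    adjusted = [0.0] * m
    previous = 0.0
    for i, j in enumerate(order):
        # Enforce monotonicity so adjusted p-values follow the ordering.
        previous = max(previous, counts[i] / n_perm)
        adjusted[j] = previous
    return adjusted


# Toy data: three genes on six hypothetical slides (3 treatment, 3 control).
data = [[10, 11, 12, 1, 2, 3], [5, 1, 3, 4, 2, 6], [2, 4, 6, 3, 5, 1]]
labels = [1, 1, 1, 0, 0, 0]
p_adj = maxt_adjusted_pvalues(data, labels)
```

With three treatment and three control slides there are only C(6,3) = 20 labelings, and the observed labeling and its mirror image give the same |t| for every gene, so the smallest attainable adjusted p-value is 2/20 = 0.1. Complete enumeration grows quickly with the number of slides; a random sample of labelings is the usual substitute in larger designs.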