Understanding statistical significance for genomewide studies

The article by Storey and Tibshirani proposes the false discovery rate (FDR) as the measure of statistical significance for genomewide studies and introduces the q value as its feature-by-feature counterpart. The FDR is the expected proportion of false positives among all features called significant. The q value of a feature is the expected proportion of false positives incurred when calling that feature significant, that is, when every feature at least as extreme is called significant as well. Whereas the p value measures significance in terms of the false positive rate, the q value measures it in terms of the FDR. This approach balances the numbers of true and false positives. It avoids the flood of false positives that results from applying a conventional p-value cutoff to thousands of features at once, while offering a more liberal criterion than those used in genome scans for linkage. Because a q value is computed for every feature, calling significant all features with q values at or below a cutoff such as 0.05 means that about 5% of the significant features are expected to be false positives. This makes the results easy to interpret and apply in genomewide studies.
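A minimal Python sketch of these definitions follows. It assumes independent features whose null p values are uniform on [0, 1]. The helper names `estimated_fdr` and `q_values_from_definition` are hypothetical. The proportion of truly null features, `pi0`, is simply passed in, and its default of 1.0 is the most conservative choice. The estimated FDR at a threshold t is π0·m·t divided by the number of p values at or below t. Each feature's q value is the smallest estimated FDR over all thresholds at or above its own p value.

```python
from typing import List


def estimated_fdr(pvalues: List[float], t: float, pi0: float = 1.0) -> float:
    """Estimated FDR when every feature with p value <= t is called significant."""
    m = len(pvalues)
    n_called = sum(1 for p in pvalues if p <= t)
    if n_called == 0:
        return 0.0  # nothing is called significant, so there are no false positives
    # About pi0 * m * t null p values are expected at or below t,
    # because null p values are assumed uniform on [0, 1]; cap the ratio at 1.
    return min(1.0, pi0 * m * t / n_called)


def q_values_from_definition(pvalues: List[float], pi0: float = 1.0) -> List[float]:
    """q value of each feature: minimum estimated FDR over thresholds t >= its p value.

    Only observed p values need to be tried as thresholds: between them the
    number of calls is fixed while pi0 * m * t keeps growing.
    This brute-force version is quadratic in the number of features.
    """
    return [
        min(estimated_fdr(pvalues, t, pi0) for t in pvalues if t >= p)
        for p in pvalues
    ]


pvals = [0.01, 0.02, 0.03, 0.5]
q = q_values_from_definition(pvals)
significant = [p for p, qv in zip(pvals, q) if qv <= 0.05]
print(q)            # approximately [0.04, 0.04, 0.04, 0.5]
print(significant)  # [0.01, 0.02, 0.03]
```

With these four p values, calling the three smallest significant has an estimated FDR of about 4%. All three therefore get q values near 0.04 and pass a 0.05 cutoff, while the largest p value does not.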
The authors give a procedure for estimating q values from the p values of all tested features. They illustrate it with examples from DNA microarray experiments, identification of exonic splicing enhancers, studies of transcriptional regulation, and discovery of transcriptional regulator binding sites. They also establish theoretical properties of the procedure, showing that its estimates are conservative and that it can be applied to data with weak dependence among features.
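The estimation procedure can be sketched in Python, assuming independent p values. The paper estimates π0, the proportion of truly null features, from the p values above a tuning parameter λ and smooths these estimates over a range of λ. This sketch instead uses a single fixed λ = 0.5, so it is a simplification rather than the authors' exact algorithm. The function names `estimate_pi0` and `q_values` are hypothetical. After sorting, q values are filled in from the largest p value downward, so that each q value is the minimum of its own FDR estimate and the q value of the next larger p value.

```python
from typing import List, Tuple


def estimate_pi0(pvalues: List[float], lam: float = 0.5) -> float:
    """Estimate the proportion of truly null features from p values above lam.

    Null p values are assumed uniform, so about pi0 * m * (1 - lam) of them
    exceed lam; true positives mostly have small p values and rarely do.
    Simplification: one fixed lam instead of smoothing over a range of lam.
    """
    if not pvalues:
        raise ValueError("need at least one p value")
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


def q_values(pvalues: List[float], lam: float = 0.5) -> Tuple[float, List[float]]:
    """Return (pi0 estimate, q value for each p value in input order)."""
    m = len(pvalues)
    pi0 = estimate_pi0(pvalues, lam)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = float("inf")
    # Walk from the largest p value (rank m) down to the smallest (rank 1),
    # keeping q values monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(pi0 * m * pvalues[i] / rank, running_min)
        q[i] = running_min
    return pi0, q


pvals = [0.001, 0.002, 0.003, 0.004, 0.01, 0.2, 0.7, 0.9]
pi0, q = q_values(pvals)
print(pi0)  # 0.5
print([p for p, qv in zip(pvals, q) if qv <= 0.05])  # [0.001, 0.002, 0.003, 0.004, 0.01]
```

Here π0 is estimated at 0.5 because only two of the eight p values exceed 0.5, and the five features with q values at or below 0.05 are called significant. Sorting dominates the cost, so the procedure scales to the tens of thousands of features typical of genomewide studies.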

Statistical significance for genomewide studies

August 5, 2003 | John D. Storey and Robert Tibshirani