Statistical significance for genomewide studies

August 5, 2003 | John D. Storey* and Robert Tibshirani*
The article proposes the q value as a measure of statistical significance for genomewide studies, where thousands of features are tested simultaneously and false discoveries are a central concern. Thresholds chosen to guard against any false positive at all are often too strict and miss many true findings, while conventional p-value cutoffs applied to each test separately are too liberal and yield an excessive number of false positives. The q value, based on the false discovery rate (FDR), strikes a balance between the two.
The FDR is the expected proportion of false positives among all features called significant. The q value plays a role analogous to that of the p value: where the p value measures significance in terms of the false positive rate, the q value of a feature measures it in terms of the FDR, that is, the expected proportion of false positives incurred when that feature is called significant. This makes the measure easier to interpret when many features are tested at once.
The article discusses several motivating examples in which the q value is applied to genomewide data: detecting differentially expressed genes, identifying exonic splice enhancers, genetically dissecting transcriptional regulation, and finding binding sites of transcriptional regulators. In each case, the q value gives a more interpretable measure of significance than a traditional p-value threshold.
To estimate q values, the FDR is estimated over a range of significance thresholds, and each feature is assigned a q value from these estimates. Calling significant all features whose q value falls below a chosen cutoff then keeps the expected proportion of false positives among them near that cutoff. The article also discusses the theoretical properties of the q value, including the conservative nature of its estimates.
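The estimation procedure summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's reference implementation: the function names `estimate_pi0` and `qvalues` and the parameter `lam` are hypothetical, and the proportion of truly null features is estimated at a single fixed tuning value (`lam=0.5`), a simplification that the article's own procedure may handle differently. The sketch estimates the FDR at each ordered p value as (estimated null proportion × number of features × threshold) / (number of features at or below the threshold), then takes a running minimum so that q values never decrease as p values increase.

```python
import numpy as np


def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of truly null features.

    Assumption: null p values are uniform on [0, 1], so p values above
    `lam` come mostly from null features. A single fixed `lam` is a
    simplification chosen for this sketch.
    """
    p = np.asarray(pvalues, dtype=float)
    pi0 = np.mean(p > lam) / (1.0 - lam)
    return float(min(pi0, 1.0))  # a proportion cannot exceed 1


def qvalues(pvalues, lam=0.5):
    """Return one q value per p value, in the original input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    pi0 = estimate_pi0(p, lam)

    order = np.argsort(p)          # indices that sort p ascending
    ranks = np.arange(1, m + 1)    # number of features called at each threshold

    # Estimated FDR when the threshold is set at each ordered p value.
    fdr = pi0 * m * p[order] / ranks

    # q value = smallest estimated FDR over all thresholds at or above this p.
    q_sorted = np.minimum.accumulate(fdr[::-1])[::-1]
    q_sorted = np.minimum(q_sorted, 1.0)

    q = np.empty(m)
    q[order] = q_sorted            # undo the sort
    return q


if __name__ == "__main__":
    # Simulated data (hypothetical): 900 null features and 100 with signal.
    rng = np.random.default_rng(0)
    null_p = rng.uniform(size=900)
    signal_p = rng.beta(0.5, 20.0, size=100)
    q = qvalues(np.concatenate([null_p, signal_p]))
    print("features with q <= 0.05:", int(np.sum(q <= 0.05)))
```

Calling significant every feature with `q <= 0.05` in this sketch corresponds to accepting that roughly 5% of the called features are expected to be false positives. Note that if no p value exceeds `lam`, the null proportion is estimated as zero and every q value becomes zero, so a real analysis would need a more careful estimate.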
The q value is particularly useful when the number of features is large and the goal is to identify as many significant features as possible while keeping the proportion of false positives low. Because it is easier to interpret in this setting than a traditional p-value threshold, it is a valuable tool for genomewide studies.