Cryptic multiple hypotheses testing in linear models: overestimated effect sizes and the winner's curse

2010 | Wolfgang Forstmeier · Holger Schielzeth
This paper discusses the problem of multiple hypothesis testing in linear models, particularly in evolutionary and behavioural research. When fitting generalised linear models (GLMs) with multiple predictors, researchers often start with a complex full model and simplify it by removing non-significant terms. This stepwise simplification amounts to testing many hypotheses at once, which inflates effect sizes and produces the 'winner's curse': significant effects are overestimated and often cannot be reproduced in follow-up studies.

Even when all null hypotheses are true, the probability of finding at least one significant effect is high. This probability is close to theoretical expectations when the sample size N is large relative to the number of predictors k, but type I error rates often exceed those expectations when the model is over-fitted (low N/k ratio). The authors argue that full model tests and P value adjustments can control how frequently type I errors arise from sampling variation. They favour presenting full models, because full models better reflect the range of predictors investigated and give balanced weight to non-significant results.

The authors also highlight the overestimation of effect sizes and call for greater statistical conservatism to reduce publication bias. They recommend that researchers report all effects along with their standard errors, as this is most valuable to the scientific community. The paper further discusses the importance of the number of predictors and the sample size in model selection, as well as potential problems with correlated predictors. Overall, it emphasizes careful model selection and reporting to avoid inflated effect sizes and the winner's curse.
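The two core claims above can be illustrated with a small Monte Carlo sketch (the function names and parameter values are illustrative, not taken from the paper): first, the analytic probability of at least one type I error among k independent tests at alpha = 0.05; second, the winner's curse, where the average estimate among results that happen to reach significance overshoots the true effect.

```python
import random

def fwer(k, alpha=0.05):
    """Analytic probability of at least one type I error among k
    independent tests, each run at significance level alpha."""
    return 1 - (1 - alpha) ** k

def winners_curse(true_effect=1.0, se=1.0, n_sims=50000, seed=2):
    """Monte Carlo sketch of the winner's curse: repeatedly draw a noisy
    estimate of a true effect, keep only the 'significant' ones, and
    return their mean absolute value (illustrative parameters)."""
    rng = random.Random(seed)
    crit = 1.959963984540054  # two-sided 5% critical value of the z-test
    significant = []
    for _ in range(n_sims):
        estimate = true_effect + rng.gauss(0.0, se)
        if abs(estimate) / se > crit:
            significant.append(abs(estimate))
    return sum(significant) / len(significant)

print(fwer(5))          # ~0.226: roughly one in four all-null models
                        # yields at least one "finding"
print(winners_curse())  # noticeably larger than the true effect of 1.0
```

With five predictors tested under the null, the chance of at least one spurious significant term is already about 23%, and the conditional mean of the significant estimates is well above the true effect, which is the inflation the authors warn about.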