21 MARCH 2019 | Valentin Amrhein, Sander Greenland, Blake McShane and more than 800 signatories
Scientists are calling for the retirement of statistical significance, arguing that it leads to misleading conclusions. Valentin Amrhein, Sander Greenland, Blake McShane and more than 800 signatories urge researchers to stop categorizing results as 'statistically significant' or 'non-significant'. Sorting results this way can mislead researchers and the public, because it implies a sharp divide between results that support a hypothesis and results that refute it, when in reality statistical measures such as P values and confidence intervals vary continuously.
The authors argue that statistical significance thresholds, such as P < 0.05, are not reliable indicators of truth or importance. Instead, they advocate treating confidence intervals as 'compatibility intervals': ranges of effect sizes that are reasonably compatible with the data, given the statistical assumptions behind the analysis. Reading intervals this way would curb overconfidence in results and prevent non-significant results from being misread as evidence of 'no effect'.
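To make the point about thresholds concrete, here is a minimal sketch. It assumes a normal approximation and uses two hypothetical studies (the estimates and standard errors are invented for illustration and do not come from the article). Both studies report the same point estimate, yet they land on opposite sides of P < 0.05, while their intervals overlap heavily.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def summarize(estimate, std_error, level=0.95):
    """Return a two-sided P value (null: effect = 0) and a
    normal-approximation interval for the given estimate."""
    z = estimate / std_error
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    half_width = STD_NORMAL.inv_cdf(0.5 + level / 2) * std_error
    return p_value, (estimate - half_width, estimate + half_width)


# Hypothetical studies: identical point estimate, different precision.
studies = {"study A": (0.20, 0.08), "study B": (0.20, 0.12)}

for name, (est, se) in studies.items():
    p, (low, high) = summarize(est, se)
    print(f"{name}: estimate={est:.2f}, P={p:.3f}, "
          f"95% interval=({low:.2f}, {high:.2f})")
```

Labelling study A 'significant' and study B 'non-significant' would suggest that they conflict, although the observed effect is identical and only the precision differs.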
The authors also highlight that a focus on statistical significance biases the literature. When results are selected for attention because they cross the threshold, the estimates that pass tend to overstate the magnitude of an effect, while those that fall short tend to understate it. This can result in misleading conclusions and a distorted scientific literature. They emphasize that pre-registering studies and publishing all results can help to mitigate these issues.
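The bias described above can be sketched with a small simulation. The setup is an assumption for illustration only: many hypothetical studies estimate the same true effect with the same standard error, and each estimate is sorted by whether it clears a two-sided 5% threshold. The function name and all parameter values are invented here, not taken from the article.

```python
import random
from statistics import NormalDist, mean


def simulate_filter(true_effect=0.2, std_error=0.15,
                    n_studies=20_000, alpha=0.05, seed=1):
    """Simulate repeated studies of one true effect and return the
    average estimated magnitude among 'significant' and
    'non-significant' results."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    significant, non_significant = [], []
    for _ in range(n_studies):
        # Each study's estimate is the true effect plus sampling noise.
        estimate = rng.gauss(true_effect, std_error)
        passed = abs(estimate / std_error) > z_crit
        bucket = significant if passed else non_significant
        bucket.append(abs(estimate))
    return mean(significant), mean(non_significant)


sig_mean, nonsig_mean = simulate_filter()
print(f"true effect magnitude:        0.200")
print(f"mean among 'significant':     {sig_mean:.3f}")
print(f"mean among 'non-significant': {nonsig_mean:.3f}")
```

Under these assumptions, the 'significant' group averages well above the true effect and the 'non-significant' group averages below it, even though every simulated study measures the same quantity.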
The call to retire statistical significance is not a ban on P values or other statistical measures, but rather a rejection of their categorical use. The authors stress the need for humility in interpreting statistical results, acknowledging the limitations of statistical assumptions and the importance of considering other factors such as background evidence, study design, and data quality.
In conclusion, the authors advocate for a shift away from the binary interpretation of statistical results and towards a more nuanced understanding of uncertainty. This would help to prevent overconfident claims and ensure that scientific conclusions are based on a comprehensive consideration of all available evidence.