2002 | Thomas Lumley, Paula Diehr, Scott Emerson, and Lu Chen
The article "The Importance of the Normality Assumption in Large Public Health Data Sets" by Thomas Lumley, Paula Diehr, Scott Emerson, and Lu Chen addresses the misconception that the t-test and linear regression are valid only for Normally distributed outcomes. The authors argue that these methods are valid in large samples regardless of the distribution of the outcome variable, as long as the variance assumption holds. They demonstrate this through simulations using extremely non-Normal data and discuss the limitations of alternative methods such as the Wilcoxon rank-sum test and ordinal logistic regression. The article emphasizes that the primary concern with these methods is not the distribution of the outcome but the ability to detect and estimate differences in means.
The authors provide criteria for choosing between summary measures and statistical techniques, based on clinical or scientific relevance, plausibility of differences, and statistical precision. They conclude that the t-test and linear regression are useful default tools in public health research, especially with large samples, and that formal statistical tests for Normality are often unnecessary.
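The large-sample claim can be checked informally with a small simulation. The sketch below is not the authors' simulation or data: it assumes a lognormal outcome as a stand-in for a heavily skewed variable, draws two groups from the same distribution (so the null hypothesis of equal means is true), and counts how often a Welch-type t statistic exceeds the large-sample critical value of 1.96. The function names, sample size, and number of replications are illustrative choices.

```python
import math
import random


def mean_and_var(values):
    """Return the sample mean and the unbiased sample variance."""
    n = len(values)
    m = sum(values) / n
    v = sum((x - m) ** 2 for x in values) / (n - 1)
    return m, v


def welch_t(x, y):
    """Two-sample t statistic that does not assume equal variances."""
    mx, vx = mean_and_var(x)
    my, vy = mean_and_var(y)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))


def type_i_error_rate(n, reps=1000, seed=0):
    """Fraction of null simulations in which |t| > 1.96.

    Both groups are drawn from the same lognormal distribution
    (an assumed example of a skewed outcome), so every rejection
    is a false positive. A valid 5% test should reject in
    roughly 5% of replications.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        y = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        if abs(welch_t(x, y)) > 1.96:
            rejections += 1
    return rejections / reps
```

Calling `type_i_error_rate(500)` returns the observed false-positive rate for groups of 500; a value near 0.05 is consistent with the t-test remaining valid for a skewed outcome at that sample size. Because 1.96 is the Normal critical value, the sketch is only meaningful for large `n`.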