2002 | Thomas Lumley, Paula Diehr, Scott Emerson, and Lu Chen
The t-test and linear regression are often mistakenly believed to require a normally distributed outcome variable. In fact, these methods are valid in large samples regardless of the outcome's distribution, a point supported by simulations using extremely non-normal data in which both procedures performed well. Their large-sample validity depends on the response having a finite variance, not on its distributional shape. The t-test and linear regression remain useful for inference about associations; their major limitation is not distributional but whether estimating the mean answers the scientific question.
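The claim about large-sample validity with non-normal data can be checked directly. Below is a minimal simulation sketch, not taken from the paper: it uses lognormal data (an assumed stand-in for skewed outcomes such as costs) and an illustrative sample size, and measures how often the usual t-based 95% confidence interval for the mean covers the true mean.

```python
# Sketch: coverage of the standard t-based CI for a mean under skewed data.
# Distribution, sample size, and replication count are illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 500, 2000
true_mean = np.exp(0.5)          # mean of a lognormal(0, 1) distribution
covered = 0
for _ in range(reps):
    x = rng.lognormal(mean=0.0, sigma=1.0, size=n)
    se = x.std(ddof=1) / np.sqrt(n)
    tcrit = stats.t.ppf(0.975, df=n - 1)
    lo, hi = x.mean() - tcrit * se, x.mean() + tcrit * se
    covered += (lo <= true_mean <= hi)
print(f"Empirical coverage of nominal 95% CI: {covered / reps:.3f}")
```

With a sample this large, the empirical coverage should sit close to the nominal 95% despite the strong skewness of the raw data.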
The t-test and linear regression are often preferred over nonparametric tests such as the Wilcoxon rank-sum test because they are more convenient and practical. Under extreme heteroscedasticity, however, summary measures other than the mean may be more appropriate. The Central Limit Theorem guarantees that the average of a large number of independent variables is approximately normally distributed, and this is what underlies the large-sample validity of the t-test and linear regression. The same argument also underlies logistic regression and rank tests such as the Wilcoxon, whose test statistics are likewise averages.
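The Central Limit Theorem behavior described here is easy to visualize numerically. The sketch below, with illustrative parameter choices of my own, draws strongly skewed exponential data and shows that the sampling distribution of the mean is nearly symmetric even though the raw data are not.

```python
# Sketch of the CLT at work: skewed raw data, nearly symmetric sample means.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 200, 5000
samples = rng.exponential(scale=1.0, size=(reps, n))
means = samples.mean(axis=1)

print(f"skewness of the raw data:     {stats.skew(samples.ravel()):.2f}")  # roughly 2
print(f"skewness of the sample means: {stats.skew(means):.2f}")            # near 0
```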
The t-test comes in two versions: one that assumes equal variances in the two groups (the pooled test) and one that does not (the Welch test). The unequal-variance version is more general and is the appropriate choice in large samples; the pooled version can be misleading when the group variances differ, especially with unequal group sizes. Linear regression likewise assumes constant variance of the outcome, and heteroscedasticity affects the standard errors rather than the coefficient estimates. The Central Limit Theorem ensures that the regression coefficients are approximately normally distributed in large samples, allowing valid inference.
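A short sketch of the two t-test versions follows, using scipy's `ttest_ind` with and without the equal-variance assumption. The group sizes and variances are invented to make the contrast visible: a small, noisy group against a large, quiet one is exactly the setting where the pooled test can mislead.

```python
# Sketch: pooled vs. Welch t-test under unequal variances and group sizes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
small_noisy = rng.normal(loc=0.0, scale=5.0, size=30)    # small group, large variance
large_quiet = rng.normal(loc=0.0, scale=1.0, size=300)   # large group, small variance

pooled = stats.ttest_ind(small_noisy, large_quiet, equal_var=True)
welch = stats.ttest_ind(small_noisy, large_quiet, equal_var=False)
print(f"pooled t-test p-value: {pooled.pvalue:.3f}")
print(f"Welch  t-test p-value: {welch.pvalue:.3f}")
```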
The literature on the robustness of the t-test focuses mostly on small samples, but simulations show that the t-test and linear regression perform well in large samples even with markedly non-normal data. The t-test is robust to non-normality in large samples, although heteroscedasticity can still cause problems for linear regression. The bootstrap offers another way to compute confidence intervals and significance levels based on the t-statistic, and it performs well with non-normal data.
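One common way to combine the bootstrap with the t-statistic is the bootstrap-t interval: resample the data, recompute the studentized statistic on each resample, and use its bootstrap quantiles in place of the t-table. The sketch below is a simplified illustration with assumed data and replication counts, not a reproduction of the paper's procedure.

```python
# Sketch: bootstrap-t confidence interval for a mean of skewed data.
import numpy as np

rng = np.random.default_rng(3)
x = rng.lognormal(mean=0.0, sigma=1.5, size=200)   # illustrative skewed sample
n, B = len(x), 2000
mean_hat, se_hat = x.mean(), x.std(ddof=1) / np.sqrt(n)

t_star = np.empty(B)
for b in range(B):
    xb = rng.choice(x, size=n, replace=True)
    t_star[b] = (xb.mean() - mean_hat) / (xb.std(ddof=1) / np.sqrt(n))

lo_q, hi_q = np.quantile(t_star, [0.025, 0.975])
# The quantiles swap places when the interval is formed.
ci = (mean_hat - hi_q * se_hat, mean_hat - lo_q * se_hat)
print(f"bootstrap-t 95% CI for the mean: ({ci[0]:.3f}, {ci[1]:.3f})")
```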
Linear regression is often introduced with a normality assumption, but normality is not strictly necessary for inference. The Central Limit Theorem ensures that the regression coefficients are approximately normally distributed in large samples, allowing valid inference; the constant-variance assumption matters more than normality. In public health research, the t-test and linear regression are therefore often preferred over nonparametric tests because of their simplicity and effectiveness in large samples.
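To make the variance point concrete, the sketch below fits an ordinary least squares regression to simulated data whose error variance grows with the covariate, and reports both the classical standard error and a heteroscedasticity-robust (sandwich) standard error for the slope. The variable names, data-generating values, and use of statsmodels are my own illustrative choices.

```python
# Sketch: OLS slope inference with classical vs. robust (sandwich) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 1000
x = rng.uniform(0, 10, size=n)
# Error variance grows with x, so the constant-variance assumption fails.
y = 1.0 + 0.5 * x + rng.normal(scale=0.5 + 0.3 * x, size=n)

X = sm.add_constant(x)
fit_classic = sm.OLS(y, X).fit()                 # assumes constant variance
fit_robust = sm.OLS(y, X).fit(cov_type="HC1")    # sandwich standard errors
print(f"classical SE for slope: {fit_classic.bse[1]:.4f}")
print(f"robust    SE for slope: {fit_robust.bse[1]:.4f}")
```

The point estimate of the slope is the same either way; only the standard errors differ, which is why heteroscedasticity is an inference problem rather than an estimation problem.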
In summary, the t-test and linear regression are valid in large samples regardless of the distribution of the outcome variable, and they are often preferred over nonparametric tests for their convenience and practicality; under extreme heteroscedasticity, however, summary measures other than the mean may be more appropriate.