REGRESSION WITH MISSING Y'S: AN IMPROVED STRATEGY FOR ANALYZING MULTIPLY IMPUTED DATA

2007 | Paul T. von Hippel
Paul T. von Hippel (2007) discusses a strategy for analyzing data with missing values of the dependent variable Y, called multiple imputation, then deletion (MID). In conventional multiple imputation (MI), missing values are imputed and all cases are used in the analysis, including those with imputed Y values. Von Hippel argues that this approach can add needless noise to the estimates. MID modifies MI by using all cases for imputation and then deleting the cases with imputed Y values before analysis. When something is wrong with the imputed Y values, this strategy protects the estimates from the problematic imputations. When the imputed Y values are acceptable, it usually yields more efficient estimates.

MID's efficiency advantage shows up as less variable point estimates, more accurate standard error estimates, and shorter confidence intervals with equal or higher coverage rates. MID is also robust to problems in the imputation model: because cases with imputed Y values are excluded before analysis, imputation errors have less impact on the final estimates. The paper explains that cases with missing Y values are useful for imputation but not for analysis. They supply information for imputing other variables, but their imputed Y values are drawn from a model fitted to the cases with observed Y, so they add no new information about the regression of interest, only random noise. MID's advantage over MI is therefore largest when many values are missing and only a few imputed data sets are available.

The paper also discusses the limitations of MID, chiefly its assumption that the imputed Y values contain no useful information. Even so, in most practical situations MID offers at least a small advantage over MI. The paper illustrates MID with examples from social research, including studies of sexual harassment and longitudinal data analysis. It compares MI and MID in terms of efficiency, standard errors, and confidence intervals, showing that MID often produces more accurate results. Finally, the paper asks whether MID can ever be worse than conventional MI. It argues that although MI may be superior in special circumstances, under most practical conditions MID is at least as good as MI and often better. The paper concludes that MID is a valuable strategy for analyzing data with missing Y values.
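The MI-versus-MID contrast described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not von Hippel's own code. It uses a single complete predictor X and a Y that is missing at random given X. Imputations come from a regression of Y on X that is refitted to a bootstrap resample of the observed cases; this is an assumed way of reflecting parameter uncertainty. Rubin's rules then pool the results across the M imputed data sets. The function names (`ols_fit`, `impute_y`, `rubin_pool`, `mi_vs_mid`, `simulate`) and all simulation settings are hypothetical.

```python
import math
import random
import statistics


def ols_fit(xs, ys):
    """Regress ys on xs; return (intercept, slope, slope_se, residual_sd)."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    resid_sd = math.sqrt(rss / (n - 2))
    return intercept, slope, resid_sd / math.sqrt(sxx), resid_sd


def impute_y(xs, ys, rng):
    """Return one imputed copy of ys, filling each None from a regression on x.

    Assumption: parameter uncertainty is approximated by refitting on a
    bootstrap resample of the observed cases (one of several MI schemes).
    """
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    boot = [rng.choice(obs) for _ in obs]
    a, b, _, s = ols_fit([p[0] for p in boot], [p[1] for p in boot])
    return [y if y is not None else a + b * x + rng.gauss(0.0, s)
            for x, y in zip(xs, ys)]


def rubin_pool(estimates, ses):
    """Combine per-imputation results with Rubin's rules: (estimate, total SE)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(se ** 2 for se in ses)
    between = statistics.variance(estimates) if m > 1 else 0.0
    return q_bar, math.sqrt(within + (1 + 1 / m) * between)


def mi_vs_mid(xs, ys, m, rng):
    """Pooled (slope, SE) under MI and under MID from the same m imputations."""
    observed = [y is not None for y in ys]
    kept_x = [x for x, o in zip(xs, observed) if o]
    mi, mid = ([], []), ([], [])
    for _ in range(m):
        y_imp = impute_y(xs, ys, rng)
        # MI: analyze every case, including cases whose Y was imputed.
        _, b, se, _ = ols_fit(xs, y_imp)
        mi[0].append(b)
        mi[1].append(se)
        # MID: same imputed data, but delete cases whose Y was imputed.
        kept_y = [y for y, o in zip(y_imp, observed) if o]
        _, b, se, _ = ols_fit(kept_x, kept_y)
        mid[0].append(b)
        mid[1].append(se)
    return rubin_pool(*mi), rubin_pool(*mid)


def simulate(n=200, m=3, reps=400, seed=1):
    """SD of the pooled MI and MID slope estimates over repeated samples."""
    rng = random.Random(seed)
    mi_slopes, mid_slopes = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [1.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in xs]
        # Missing at random given x: Y is missing more often when x > 0.
        ys = [None if rng.random() < (0.6 if x > 0 else 0.2) else y
              for x, y in zip(xs, ys)]
        (mi_b, _), (mid_b, _) = mi_vs_mid(xs, ys, m, rng)
        mi_slopes.append(mi_b)
        mid_slopes.append(mid_b)
    return statistics.stdev(mi_slopes), statistics.stdev(mid_slopes)


if __name__ == "__main__":
    sd_mi, sd_mid = simulate()
    print(f"SD of slope across samples: MI {sd_mi:.4f}, MID {sd_mid:.4f}")
```

Running `simulate()` compares how much the pooled slope varies across repeated samples under each strategy. With few imputations (M = 3 here), the MI estimates should vary more, because each imputed Y adds random noise that MID discards. Because X is complete in this sketch, MID keeps the same cases in every imputed data set. It therefore reduces to analyzing the cases with observed Y, and its between-imputation variance is zero. In MID, the imputations only matter when other variables, such as X, also have missing values.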