Understanding Regression with missing Ys: An improved strategy for analyzing multiply imputed data

The paper introduces a strategy called *multiple imputation, then deletion* (MID) for analyzing data with missing values of the dependent variable Y in generalized linear models. Under MID, all cases are used for imputation, but cases whose Y values were imputed are excluded from the analysis. Ordinary multiple imputation (MI), by contrast, keeps the imputed Y values in the analysis. Deleting them removes the noise that imputed Y values add to the estimates. The paper shows that MID usually gives more efficient estimates than MI, with smaller standard errors and shorter confidence intervals, especially when many values are missing and the number of imputed datasets is small. MID also makes the analysis more robust to problems in the imputation model, because any problematic imputed Y values are dropped before the analysis. The paper supports these claims with theoretical explanations, simulations, and real-world examples from social research. It also discusses extensions of MID to multiple parameters and to multivariate outcomes, such as repeated measures. Overall, MID is presented as a valuable alternative to ordinary MI, offering improved precision and robustness when dependent-variable values are missing.
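The deletion step described above can be illustrated with a minimal Python sketch. It is not the paper's code: the helper names (`ols`, `pool_rubin`, `analyze`), the simulated data, and the toy imputation (a single regression fitted to the observed cases, with random noise added) are all assumptions made for illustration, and the pooling uses the standard Rubin's rules for multiply imputed estimates. The only difference between the MI and MID analyses is whether rows with an imputed Y are kept.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and their estimated covariance matrix."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, sigma2 * np.linalg.inv(X.T @ X)

def pool_rubin(betas, covs):
    """Combine per-imputation estimates with Rubin's rules."""
    betas = np.asarray(betas)
    m = len(betas)
    within = np.mean([np.diag(c) for c in covs], axis=0)
    between = betas.var(axis=0, ddof=1)
    return betas.mean(axis=0), np.sqrt(within + (1 + 1 / m) * between)

def analyze(completed, y_was_missing, delete_imputed_y):
    """Fit OLS to each completed dataset; under MID, drop imputed-Y rows first."""
    keep = ~y_was_missing if delete_imputed_y else np.ones(len(y_was_missing), dtype=bool)
    fits = [ols(X[keep], y[keep]) for X, y in completed]
    return pool_rubin([b for b, _ in fits], [c for _, c in fits])

# Simulated data (hypothetical): y = 1 + 2x + noise, with 40% of y missing at random.
rng = np.random.default_rng(0)
n, m = 500, 5
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
y_was_missing = rng.random(n) < 0.4
X = np.column_stack([np.ones(n), x])

# Toy imputation (an assumption, and deliberately simplistic): predict y from
# the observed-case regression and add random residual noise, m times.
obs = ~y_was_missing
b_obs, _ = ols(X[obs], y[obs])
resid_sd = np.std(y[obs] - X[obs] @ b_obs, ddof=2)
completed = []
for _ in range(m):
    y_imp = y.copy()
    noise = rng.normal(scale=resid_sd, size=y_was_missing.sum())
    y_imp[y_was_missing] = X[y_was_missing] @ b_obs + noise
    completed.append((X, y_imp))

mi_beta, mi_se = analyze(completed, y_was_missing, delete_imputed_y=False)
mid_beta, mid_se = analyze(completed, y_was_missing, delete_imputed_y=True)
print("MI :", mi_beta, mi_se)
print("MID:", mid_beta, mid_se)
```

In this toy setup only Y has missing values, so after deletion every completed dataset contains the same fully observed rows and the MID estimate matches the observed-case regression. A realistic application would also have missing values in the predictors, which is where imputing with all cases before deleting the imputed-Y cases would matter.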

REGRESSION WITH MISSING Y'S: AN IMPROVED STRATEGY FOR ANALYZING MULTIPLY IMPUTED DATA

2007 | Paul T. von Hippel