Understanding Multiple Imputation After 18+ Years

Donald B. Rubin's 1996 paper reviews the development and application of multiple imputation for handling missing data in public-use databases. It argues that multiple imputation is the preferred method when the database constructor and the ultimate users are distinct entities. The goal is statistically valid inference for users who may lack specialized knowledge of nonresponse mechanisms. Rubin outlines the theoretical framework of multiple imputation, its advantages over alternative methods, and its validity under different statistical paradigms. He argues that the method is particularly suitable for complex surveys whose data are shared among many users. The paper also addresses criticisms of multiple imputation, including concerns about its computational demands, its storage requirements, and the validity of inferences when the imputation model is not perfectly specified. Rubin concludes that multiple imputation is a robust and effective method for handling missing data, especially when Bayesian and frequentist approaches are combined. The paper stresses that all relevant variables should be included in the imputation model to ensure proper inference, and it discusses the practical considerations of implementing multiple imputation in real-world settings.

Multiple Imputation After 18+ Years

June 1996 | Donald B. Rubin