Multiple Imputation After 18+ Years

June 1996 | Donald B. Rubin
Donald B. Rubin's article "Multiple Imputation After 18+ Years" discusses multiple imputation as a way to handle missing data in public-use databases, where the database constructor and the ultimate users are distinct entities. The primary objective is valid frequency inference for ultimate users who have limited knowledge of the specific reasons and models for nonresponse. Rubin argues that multiple imputation is the method of choice in this setting because it supports statistically valid inference in complex survey contexts.

The article begins by describing the assumed statistical computing environment and objectives. It then reviews the multiple imputation framework and its standard results, including the concept of repeated imputations and the evaluation of procedures under the randomization-based paradigm.

Key points include:

1. **Assumed Environment**: Public-use databases are analyzed by many users with varying degrees of expertise and computing power. Ultimate users have access to standard complete-data techniques but often lack the tools to handle missing data effectively.
2. **Achievable Basic Objective**: The goal is to let users apply standard complete-data statistical tools to incomplete data sets, with the same command structure and output standards as if there were no missing data.
3. **Scientific Estimands**: These are quantities of scientific interest that can be calculated in the population and do not change with the data collection design.
4. **Statistical Validity**: This is defined as frequency validity, averaging over randomization distributions generated by known sampling mechanisms and posited distributions for nonresponse.
5. **Proper Multiple Imputation**: A key concept is "proper multiple imputation," which involves approximately unbiased estimation of complete-data statistics and their variance-covariance matrices.
6. **Implementation Issues**: The article addresses concerns about implementing multiple imputation, including operational difficulties and the acceptability of answers obtained through simulation.
7. **Validity of Inferences**: Despite criticisms, the author argues that repeated imputations under an appropriate Bayesian model can lead to valid inferences, provided the imputation model is correctly specified.

Overall, the article reviews multiple imputation, its theoretical foundations, and its practical considerations, emphasizing its importance for handling missing data in public-use databases.
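The summary above refers to the framework's standard results without stating them. As a minimal sketch, the Python below assumes the combining rules usually attributed to Rubin for a scalar estimand: average the m complete-data estimates, then add the average within-imputation variance to an inflated between-imputation variance. The function name `pool_estimates` and the input numbers are hypothetical and are not taken from the article.

```python
import math
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Combine m complete-data results for one scalar estimand.

    estimates: point estimates, one per imputed data set
    variances: the matching complete-data variance estimates
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")

    q_bar = mean(estimates)   # pooled point estimate
    u_bar = mean(variances)   # average within-imputation variance
    b = variance(estimates)   # between-imputation variance (divisor m - 1)

    # Total variance: within part plus between part inflated for finite m.
    total = u_bar + (1 + 1 / m) * b

    # Degrees of freedom for the t reference distribution.
    # With no between-imputation variability the reference is normal.
    if b == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, total, df


# Hypothetical results from m = 5 imputed data sets (illustrative numbers only).
example_estimates = [10.0, 11.0, 9.0, 10.5, 9.5]
example_variances = [1.0, 1.2, 0.8, 1.1, 0.9]
pooled, total_var, dof = pool_estimates(example_estimates, example_variances)
print(f"estimate={pooled:.3f} variance={total_var:.3f} df={dof:.1f}")
```

The point of the sketch is that an ultimate user only runs an ordinary complete-data analysis on each imputed data set and then applies a few lines of arithmetic, which is consistent with the "achievable basic objective" described in the summary.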