Multiple Imputation of Missing Values

This article discusses the implementation in Stata of multiple imputation for handling missing data, following the seminal work of Rubin. The author, Patrick Royston, introduces five ado-files (`mvis`, `uvis`, `micombine`, `mijoin`, and `misplit`), which facilitate the creation and analysis of multiply imputed datasets. The core method described is the "switching regression" technique, also known as multivariate imputation by chained equations (MICE). The basic idea is to create several copies of the data, in each of which the missing values are imputed, and then to analyze each completed dataset independently. Parameter estimates are averaged across the imputed datasets to give a single estimate, and their standard errors are computed using Rubin's rules, which combine the within-imputation and between-imputation variability. The article gives detailed syntax for each ado-file, with examples based mainly on a breast cancer dataset.

It also discusses the importance of choosing an appropriate number of imputations, \( m \), to ensure reliable confidence intervals. The author proposes a rule of thumb for selecting \( m \) based on the coefficient of variation of the confidence coefficient, suggesting that \( m \) should be large enough to keep this coefficient of variation below 5%. The article concludes with a discussion of the limitations of multiple imputation and practical considerations in its use, emphasizing the importance of the missing at random (MAR) assumption.
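The combination step described above can be sketched for a single scalar parameter as follows. This is a minimal Python illustration of Rubin's rules, not the `micombine` implementation; the function name `pool_rubin` and the example numbers are hypothetical. It assumes each of the \( m \) imputed datasets has already been analyzed, giving a point estimate \( Q_i \) and standard error. The pooled estimate is \( \bar{Q} \), the mean of the \( Q_i \); the total variance is \( T = \bar{U} + (1 + 1/m)B \), where \( \bar{U} \) is the average within-imputation variance and \( B \) the between-imputation variance of the estimates.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine m per-imputation results for one scalar parameter (Rubin's rules).

    estimates  -- point estimates Q_1..Q_m, one per imputed dataset (hypothetical helper)
    std_errors -- their standard errors; U_i = se_i**2 is the within-imputation variance
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations and one SE per estimate")

    q_bar = mean(estimates)                      # pooled point estimate
    u_bar = mean([se ** 2 for se in std_errors]) # average within-imputation variance
    b = variance(estimates)                      # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b                  # total variance

    # Rubin's (1987) degrees of freedom for the reference t distribution,
    # nu = (m - 1) * (1 + U_bar / ((1 + 1/m) * B))**2
    if b > 0:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    else:
        df = math.inf                            # estimates agree exactly: no extra uncertainty
    return q_bar, math.sqrt(t), df
```

For example, with made-up results from \( m = 5 \) imputations, `pool_rubin([0.52, 0.48, 0.55, 0.50, 0.45], [0.10, 0.11, 0.10, 0.12, 0.10])` gives a pooled estimate of 0.5 and a pooled standard error of about 0.114, larger than the average per-imputation SE of 0.106 because the spread between imputations is added. A confidence interval would use a t quantile with `df` degrees of freedom, which the standard library does not provide; it is omitted here to keep the sketch dependency-free.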

2004 | Patrick Royston