Multiple Imputation by Chained Equations (MICE): Implementation in Stata

December 2011 | Patrick Royston, Ian R. White
This paper describes 'ice', an implementation of the Multiple Imputation by Chained Equations (MICE) method in Stata. MICE is a practical approach to handling missing data: it creates imputed datasets from a set of imputation models, one for each variable with missing values. The paper illustrates the use of ice with a real dataset from an observational study on ovarian cancer. It also briefly discusses the new database architecture and procedures for multiple imputation in Stata 11 and 12.

The MICE algorithm iteratively imputes the missing values in each variable using a regression model based on the other variables. The process is repeated for several cycles so that the results stabilize. Around 10 cycles is typical, though more may be needed if variables are highly correlated. The number of imputed datasets (M) is usually between 3 and 5, but recent recommendations suggest larger values, especially for studies comparing methods.

The paper outlines the implementation of MICE in Stata, including the 'ice' command and related programs. It discusses the imputation models available for different types of variables, such as continuous, binary, and categorical. It also describes the ovarian cancer dataset used for illustration, focusing on imputing missing values of albumin, a prognostic factor.

The paper provides examples of imputing a single variable with missing values, using methods such as imputation under a normality assumption, ordinal logistic regression, predictive mean matching, and interval-censored regression. It also discusses the format of multiply imputed datasets and how to fit analysis models using the 'mim' program. The paper concludes with a discussion of other features of ice, including stratified imputation, conditional imputation, monotone imputation, and handling of perfect prediction.
It emphasizes the importance of checking the quality of imputations and the limitations of MICE, such as the need for careful model specification and the potential for estimation difficulties with large numbers of variables. The paper highlights the flexibility and wide range of options available in ice, making it a valuable tool for handling missing data in real-world datasets.
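
The chained-equations cycle summarized above can be illustrated with a minimal Python sketch. It is not the ice implementation and makes strong simplifying assumptions: exactly two continuous variables, a simple linear regression as each imputation model, and residual noise drawn around fitted coefficients that are held fixed (a proper multiple imputation procedure would also reflect uncertainty in the regression parameters). The function names `fit_line`, `impute_once`, and `impute_many`, and the toy values, are hypothetical and are not taken from the paper or its ovarian cancer dataset.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y on x: returns (intercept, slope, residual SD)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (rss / max(len(xs) - 2, 1)) ** 0.5


def impute_once(data, cycles=10, rng=None):
    """Return one completed copy of a two-variable dataset (None = missing)."""
    rng = rng or random.Random()
    a, b = data  # exactly two variable names (simplifying assumption)
    missing = {v: [i for i, x in enumerate(data[v]) if x is None] for v in data}
    # Starting values: fill each gap with a randomly chosen observed value.
    filled = {}
    for v in data:
        observed = [x for x in data[v] if x is not None]
        filled[v] = [rng.choice(observed) if x is None else x for x in data[v]]
    for _ in range(cycles):
        for target, predictor in ((a, b), (b, a)):
            # Fit the imputation model on rows where the target was observed.
            rows = [i for i, x in enumerate(data[target]) if x is not None]
            b0, b1, sd = fit_line([filled[predictor][i] for i in rows],
                                  [filled[target][i] for i in rows])
            # Replace only the originally missing entries: prediction + noise.
            for i in missing[target]:
                filled[target][i] = (b0 + b1 * filled[predictor][i]
                                     + rng.gauss(0, sd))
    return filled


def impute_many(data, m=5, cycles=10, seed=1):
    """Create m completed datasets, each from its own run of the cycles."""
    rng = random.Random(seed)
    return [impute_once(data, cycles, rng) for _ in range(m)]


# Toy values for illustration only (not from the paper's dataset).
toy = {"albumin": [3.1, None, 4.0, 3.6, None, 2.9, 4.2, 3.8],
       "age": [61.0, 55.0, None, 47.0, 70.0, 66.0, 42.0, None]}
completed = impute_many(toy, m=3, cycles=10, seed=42)
```

Each element of `completed` is one imputed dataset in which the observed values are unchanged and only the originally missing entries have been filled in. The analysis model would then be fitted to each dataset separately and the results combined, which is the role the summary attributes to the 'mim' program.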