mice: Multivariate Imputation by Chained Equations in R

December 2011 | Stef van Buuren, Karin Groothuis-Oudshoorn
The R package mice implements multiple imputation by chained equations (MICE) for incomplete multivariate data. It was first released in 2000 as an S-PLUS library and later as an R package. Version 2.9 extends the functionality of mice 1.0 by allowing more general analysis of imputed data and by extending the range of models under which pooling works. It adds new features for imputing multilevel data, automatic predictor selection, data handling, post-processing of imputed values, specialized pooling routines, model selection tools, and diagnostic graphs. It also improves the imputation of categorical data to avoid problems caused by perfect prediction.
The mice package takes a modular approach to multiple imputation, consisting of three main steps: imputation, analysis, and pooling. The imputation step generates multiple imputed datasets by replacing each missing value with plausible values drawn from a distribution specific to that missing entry. The analysis step estimates the quantity of interest (Q) on each imputed dataset. The pooling step combines the estimates from the imputed datasets into a single estimate and estimates its variance.
The mice algorithm uses a chained equations approach, in which each incomplete variable is imputed conditional on the other variables. The imputation model should account for the process that created the missing data, preserve the relations in the data, and preserve the uncertainty about these relations. The algorithm iteratively samples from conditional distributions to obtain the posterior distribution of the parameters.
The package includes functions for imputation, analysis, and pooling. It supports a wide range of imputation methods, including predictive mean matching, linear regression, logistic regression, and multinomial regression. It also includes functions for handling multilevel data, categorical data, and other complex data types.
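To make the chained equations idea described above more concrete, here is a minimal Python sketch. It is not the mice implementation or its API: the function names `fit_line` and `impute_once`, the toy data, and the restriction to two numeric columns are all assumptions made for illustration. Each incomplete variable is regressed on the other, and missing entries are replaced by the prediction plus random noise, cycling over the variables for a few iterations. A proper imputation method would also draw the regression parameters from their posterior distribution, which this sketch omits.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / max(len(xs) - 2, 1)) ** 0.5


def impute_once(data, n_iter=10, seed=0):
    """Return one completed copy of `data`, a dict of two columns with None for missing."""
    rng = random.Random(seed)
    first, second = list(data)  # this sketch handles exactly two columns
    other = {first: second, second: first}
    n = len(data[first])

    # Starting point: fill each missing entry with a random observed value.
    filled = {}
    for name, column in data.items():
        observed = [v for v in column if v is not None]
        filled[name] = [v if v is not None else rng.choice(observed) for v in column]

    for _ in range(n_iter):
        for target in (first, second):
            predictor = other[target]
            # Fit on rows where the target was actually observed.
            rows = [i for i in range(n) if data[target][i] is not None]
            a, b, sd = fit_line([filled[predictor][i] for i in rows],
                                [filled[target][i] for i in rows])
            # Replace missing target values by prediction plus residual noise.
            for i in range(n):
                if data[target][i] is None:
                    filled[target][i] = a + b * filled[predictor][i] + rng.gauss(0.0, sd)
    return filled


# Toy data (made up for illustration); None marks a missing value.
data = {
    "x": [1.0, 2.0, None, 4.0, 5.0, 6.0],
    "y": [2.1, None, 6.2, 8.1, None, 12.0],
}
# m = 5 imputed datasets, one per random seed.
imputed_sets = [impute_once(data, seed=s) for s in range(5)]
```

Observed values are never changed; only the missing entries differ between the five completed datasets, and that variation is what carries the uncertainty about the missing data into the later analysis and pooling steps.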
The mice package is widely used in various fields, including medicine, social sciences, and economics, for handling missing data. It provides a hands-on, stepwise approach to solving applied incomplete data problems. The package is available from the Comprehensive R Archive Network and is compatible with previous versions. It is designed for applied researchers who want to address problems caused by missing data using multiple imputation. The package assumes basic familiarity with R and provides a simple architecture that allows easy access to all program code from within the R environment.
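The pooling step described earlier combines the per-dataset estimates of Q into one estimate with a variance. The summary does not spell out the formulas, so the sketch below assumes the standard approach known as Rubin's rules for a scalar quantity: the pooled estimate is the mean of the estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name `pool_rubin` and the input numbers are hypothetical and are not taken from the package.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool m scalar estimates and their variances using Rubin's rules.

    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (divides by m - 1)
    total = u_bar + (1 + 1 / m) * b       # total variance of the pooled estimate
    return q_bar, total


# Hypothetical estimates of Q and their variances from m = 3 imputed datasets.
q_hat = [10.0, 12.0, 11.0]
u_hat = [1.0, 1.0, 1.0]
pooled, total_var = pool_rubin(q_hat, u_hat)
print(pooled, total_var)
```

The between-imputation term is what distinguishes this from averaging the variances alone: it inflates the total variance to reflect the extra uncertainty that comes from not knowing the missing values.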