The prevention and handling of the missing data

Missing data are common in research, including medical studies, and can undermine the validity of conclusions. This review discusses the types of missing data, the mechanisms that produce them, and methods for handling them. Missing data can reduce statistical power, bias parameter estimates, and make the sample less representative of the population.

There are three main types of missing data: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Under MCAR, the probability that a value is missing is unrelated to any observed or unobserved values. Under MAR, that probability depends only on observed values. Under MNAR, neither condition holds: missingness depends on the unobserved values themselves, even after accounting for the observed data.

Techniques for handling missing data include listwise deletion, pairwise deletion, mean substitution, regression imputation, last observation carried forward (LOCF), maximum likelihood, expectation-maximization, and multiple imputation. Each has advantages and limitations. Listwise deletion is simple but discards every case with a missing value, which reduces power and can introduce bias unless the data are MCAR. Mean substitution adds no new information, understates variability, and can lead to inconsistent bias. LOCF is easy to use but assumes that a value does not change after the last observation, so it may produce biased results. Maximum likelihood and expectation-maximization are more complex but can provide unbiased estimates when their assumptions hold. Multiple imputation is considered the most robust method because it reflects the uncertainty about the missing values instead of treating imputed values as if they had been observed. Sensitivity analysis is also recommended to assess how robust the results are under different assumptions about the missing data.

The review concludes that preventing missing data through careful study design and data collection is ideal. If prevention is not possible, appropriate statistical methods should be used to handle the missing data. Single imputation methods such as LOCF are not optimal for the final analysis because they can lead to biased results. Multiple imputation is generally preferred because it provides valid statistical inference and is robust to violations of normality assumptions.
Researchers should also consider the reasons for missing data and include relevant variables in the analysis.
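
To make the contrast between the simpler methods concrete, the following is a minimal sketch and is not taken from the review. It assumes missing values are represented as `None`, and all function names (`listwise_delete`, `mean_substitute`, `locf`, `pool_rubin`) are hypothetical helpers chosen for illustration. The last function pools per-imputation estimates with Rubin's rules, which is the usual way multiple imputation carries imputation uncertainty into the final variance; the imputation step itself is not shown.

```python
from statistics import mean, variance


def listwise_delete(rows):
    """Keep only the rows that have no missing (None) entries."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_substitute(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def locf(values):
    """Carry the last observed value forward; leading gaps stay None."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their squared standard errors.

    Rubin's rules: total variance = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total


# Illustrative data only (made up for this sketch).
series = [10.0, None, 14.0, None, 18.0]
observed = [v for v in series if v is not None]

substituted = mean_substitute(series)  # [10.0, 14.0, 14.0, 14.0, 18.0]
carried = locf(series)                 # [10.0, 10.0, 14.0, 14.0, 18.0]

# Mean substitution keeps the mean but shrinks the sample variance.
var_observed = variance(observed)        # 16.0
var_substituted = variance(substituted)  # 8.0

complete_rows = listwise_delete([(1, 2.0), (2, None), (3, 4.0)])

# Three hypothetical imputed-data estimates, each with squared SE 0.25.
pooled_estimate, total_variance = pool_rubin([2.0, 2.5, 3.0], [0.25, 0.25, 0.25])
```

In this toy series, mean substitution halves the sample variance (16.0 to 8.0) while leaving the mean unchanged, which is one way to see why it understates variability. The pooled variance from `pool_rubin` is larger than the within-imputation variance alone because the between-imputation spread is added back in.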

2013 | Hyun Kang