Principled missing data methods for researchers


2013 | Yiran Dong and Chao-Ying Joanne Peng
The paper discusses three principled methods for handling missing data: multiple imputation (MI), full information maximum likelihood (FIML), and expectation-maximization (EM). These methods are contrasted with listwise deletion (LD) and applied to a real-world dataset, where they yield more accurate parameter estimates than LD, which is known to produce biased results. The paper emphasizes the importance of the statistical assumptions underlying each method and recommends that researchers explicitly acknowledge missing data, use principled methods, and incorporate appropriate treatment of missing data into manuscript review standards.

Missing data are common in quantitative research: 36% of surveyed studies reported no missing data, 48% had missing data, and 16% were unclear about missing data. Among the studies with missing data, 97% used LD or pairwise deletion (PD), which are ad hoc and inefficient. The three principled methods (MI, FIML, and EM), by contrast, take the conditions under which data are missing into account and provide better estimates than LD or PD. These methods do not replace missing values directly; instead, they use all available data together with explicit statistical assumptions to estimate parameters and the missing data mechanism.

The paper also reviews quantitative studies published in the Journal of Educational Psychology (JEP) between 2009 and 2010. It finds that 67.6% of articles explicitly acknowledged missing data, 16.2% had no missing data, and 11 articles did not provide sufficient information to tell. Of the 46 articles with missing data, 37% did not report using any method, 28.3% used LD or PD, 26.1% used FIML, 8.7% used EM, 6.5% used MI, and 2.2% used both EM and LD. Only two articles explained the rationale for their choice of FIML or LD, and one misinterpreted FIML as an imputation method. Compared with studies from 1998 to 2004, practice has improved: the use of LD and PD has decreased while the use of FIML, EM, and MI has increased.
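To make concrete how a principled method uses all available data rather than discarding incomplete cases, the sketch below compares listwise deletion with a small EM algorithm for a bivariate normal model in which one variable is missing at random (MAR) conditional on the other. The data, model, and missingness mechanism are invented for illustration and are not from the paper; this is a minimal sketch, not the authors' analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
x = rng.normal(0.0, 1.0, n)
y = 0.8 * x + rng.normal(0.0, 0.6, n)      # true E[y] = 0

# MAR missingness: larger x -> y more likely to be missing
miss = rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * x))
y_obs = np.where(miss, np.nan, y)
obs = ~miss

# Listwise deletion estimate of E[y]: biased under MAR,
# because the retained cases over-represent small x (and hence small y)
ld_mean = np.nanmean(y_obs)

# EM for the bivariate normal parameters (x fully observed, so its
# MLEs are fixed; starting values for y's parameters come from the
# complete cases)
mu_x, var_x = x.mean(), x.var()
mu_y, var_y = np.nanmean(y_obs), np.nanvar(y_obs)
cov_xy = np.cov(x[obs], y_obs[obs])[0, 1]
for _ in range(100):
    # E-step: expected y and y^2 for missing cases, given x and
    # the current parameter estimates
    beta = cov_xy / var_x
    resid_var = var_y - beta * cov_xy
    y_fill = np.where(obs, y_obs, mu_y + beta * (x - mu_x))
    y2_fill = np.where(obs, y_obs**2, y_fill**2 + resid_var)
    # M-step: update parameters from the completed sufficient statistics
    mu_y = y_fill.mean()
    var_y = y2_fill.mean() - mu_y**2
    cov_xy = (x * y_fill).mean() - mu_x * mu_y
em_mean = mu_y
```

With this mechanism, `ld_mean` is pulled well below the true mean of zero, while `em_mean` recovers it closely: the EM estimate borrows information from the fully observed `x` through the assumed joint distribution instead of throwing away the incomplete cases.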
However, several research practices from a decade ago persist, such as not explicitly acknowledging missing data, not describing the method used, and not testing assumptions. These findings suggest that researchers in educational psychology have not yet fully embraced principled missing data methods. Treating missing data is usually not the focus of a substantive study, but failing to handle them properly causes serious problems. Missing data can bias parameter estimates and weaken the generalizability of results. Discarding cases with missing data loses information, which decreases statistical power and inflates standard errors. Moreover, most statistical procedures are designed for complete data, so improperly edited data may be unsuitable for analysis. The paper promotes the three principled methods, MI, FIML, and EM, by illustrating them with an empirical dataset and discussing their applications.
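Of the three methods, MI is the only one that creates several completed datasets and then pools the results. The sketch below shows the core of that workflow on invented data (not from the paper): fit an imputation model on the complete cases, draw M "proper" imputations that reflect both residual noise and uncertainty in the model's coefficients, and combine the M estimates with Rubin's rules. Everything here, including the regression imputation model, is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.6, size=n)            # true E[y] = 0
miss = rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * x))  # MAR: depends on x
y[miss] = np.nan
obs = ~miss

# Imputation model fitted on complete cases: y = b0 + b1*x + e
X = np.column_stack([np.ones(obs.sum()), x[obs]])
coef, *_ = np.linalg.lstsq(X, y[obs], rcond=None)
resid = y[obs] - X @ coef
sigma = resid.std(ddof=2)
coef_cov = sigma**2 * np.linalg.inv(X.T @ X)

M = 20                       # number of imputed datasets
means, within = [], []
for m in range(M):
    # Proper imputation: draw coefficients, then draw residual noise,
    # so the imputations reflect model uncertainty as well
    b_m = rng.multivariate_normal(coef, coef_cov)
    y_imp = y.copy()
    y_imp[miss] = b_m[0] + b_m[1] * x[miss] + rng.normal(scale=sigma,
                                                         size=miss.sum())
    means.append(y_imp.mean())               # per-imputation estimate of E[y]
    within.append(y_imp.var(ddof=1) / n)     # its within-imputation variance

# Rubin's rules: pool the M estimates and their uncertainty
q_bar = float(np.mean(means))                # pooled point estimate
w_bar = float(np.mean(within))               # within-imputation variance
b_var = float(np.var(means, ddof=1))         # between-imputation variance
total_var = w_bar + (1.0 + 1.0 / M) * b_var  # total variance of q_bar
```

The pooled estimate `q_bar` lands near the true mean of zero, whereas the complete-case mean is biased downward under this mechanism; `total_var` is larger than the naive within-imputation variance because it also carries the between-imputation uncertainty.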