2018 | VICTOR CHERNOZHUKOV, DENIS CHETVERIKOV, MERT DEMIRER, ESTHER DUFLO, CHRISTIAN HANSEN, WHITNEY NEWEY AND JAMES ROBINS
The paper presents a method for estimating and performing inference on a low-dimensional parameter θ₀ in the presence of high-dimensional nuisance parameters η₀. The method, called double/debiased machine learning (DML), addresses the regularization bias and overfitting that arise when machine learning (ML) methods are used to estimate η₀. DML combines Neyman-orthogonal moments/scores, which reduce sensitivity to errors in the nuisance estimates, with cross-fitting, an efficient form of data splitting. The resulting estimators are N⁻¹/²-consistent, approximately unbiased, and approximately normally distributed, allowing valid confidence statements. The approach accommodates a wide range of ML methods, including random forests, lasso, ridge, deep neural nets, and boosted trees. The paper illustrates the method in several models, including partially linear regression, partially linear instrumental variables models, and models for average treatment effects. It also discusses the role of sample splitting in removing the bias induced by overfitting and provides theoretical results in high-dimensional settings. Comparisons with conventional ML-based estimators show that DML can substantially reduce bias and improve efficiency. The method is robust to high-dimensional nuisance parameters and applies across a wide range of econometric and statistical settings.
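As a concrete instance, the paper's leading example is the partially linear model; the sketch below states that model and the Neyman-orthogonal "partialling-out" score used for it (notation follows the summary above; the nuisance decomposition η = (ℓ, m) is as in the paper):

$$
Y = D\theta_0 + g_0(X) + U, \qquad D = m_0(X) + V, \qquad E[U \mid X, D] = 0,\ E[V \mid X] = 0,
$$

$$
\psi(W; \theta, \eta) = \big(Y - \ell(X) - \theta\,(D - m(X))\big)\,\big(D - m(X)\big), \qquad \ell_0(X) = E[Y \mid X].
$$

The defining property is that the Gateaux derivative of $E[\psi]$ with respect to the nuisance η vanishes at $(\theta_0, \eta_0)$, so first-order errors in the ML estimates of ℓ₀ and m₀ have only a second-order effect on the estimate of θ₀.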
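To make the cross-fitting recipe concrete, here is a minimal sketch of the K-fold procedure for the partially linear model. The random-forest learners, the 5-fold split, and the simulated data-generating process are illustrative assumptions for this sketch, not the paper's specific configuration.

```python
# Minimal sketch of cross-fitted DML for the partially linear model
#   Y = theta0 * D + g0(X) + U,   D = m0(X) + V,
# using the Neyman-orthogonal partialling-out score. Learners and DGP
# below are illustrative choices, not the paper's empirical setup.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p, theta0 = 1000, 20, 0.5                      # simulated example (assumption)
X = rng.normal(size=(n, p))
D = np.sin(X[:, 0]) + rng.normal(size=n)          # D = m0(X) + V
Y = theta0 * D + np.cos(X[:, 1]) + rng.normal(size=n)

res_D = np.empty(n)  # cross-fitted residuals D - m_hat(X)
res_Y = np.empty(n)  # cross-fitted residuals Y - l_hat(X), where l0(X) = E[Y|X]
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # Fit nuisance learners on the complement of each fold, predict on the fold
    m_hat = RandomForestRegressor(random_state=0).fit(X[train], D[train])
    l_hat = RandomForestRegressor(random_state=0).fit(X[train], Y[train])
    res_D[test] = D[test] - m_hat.predict(X[test])
    res_Y[test] = Y[test] - l_hat.predict(X[test])

# Final stage: regress residualized Y on residualized D (orthogonal score)
theta_hat = res_D @ res_Y / (res_D @ res_D)

# Plug-in standard error from the score psi = V * (res_Y - theta * V)
psi = res_D * (res_Y - theta_hat * res_D)
se = np.sqrt(np.mean(psi**2) / np.mean(res_D**2) ** 2 / n)
print(f"theta_hat = {theta_hat:.3f} +/- {1.96 * se:.3f}")
```

Because each observation's residuals come from learners fit on the other folds, the overfitting bias that sample splitting removes never enters the final-stage regression, while using every fold as a holdout recovers full-sample efficiency.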