March 2002 | Marianne Bertrand, Esther Duflo, Sendhil Mullainathan
This paper examines the reliability of standard errors in Difference-in-Differences (DD) estimation, a widely used method for estimating the causal effects of policy interventions. The authors show that conventional OLS standard errors for DD estimates are often severely biased because both the dependent variable and the treatment variable are serially correlated over time, leading to over-rejection of the null hypothesis of no effect.
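For concreteness, the prototypical DD regression the authors study explains an outcome for individual i in state s and year t with state fixed effects, year fixed effects, individual controls, and a dummy for whether the law is in effect in that state and year; the equation below is a paraphrase of that setup, with beta the DD effect of interest:

```latex
Y_{ist} = A_s + B_t + c\, X_{ist} + \beta\, I_{st} + \varepsilon_{ist}
```

The inference problem arises because both the error term and the law dummy I_{st} tend to be highly serially correlated within a state across the many years of a typical panel.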
The authors demonstrate this by generating placebo laws and applying DD estimation to them. With 21 years of data, up to 45% of the placebo laws yield a statistically significant effect at the 5% level, even though by construction these laws have no real effect, indicating a serious problem with the conventional standard errors.
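The placebo exercise is easy to reproduce in spirit. The sketch below uses a synthetic state-year panel with AR(1) errors rather than the CPS wage data the authors work with, and the state count, AR(1) coefficient, and placebo-assignment rule are illustrative assumptions; it only shows the mechanism behind the over-rejection.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_states, n_years, n_sims = 50, 21, 200
rho = 0.8  # assumed serial correlation of the state-year outcome

def simulate_panel():
    # AR(1) errors within each state over 21 years
    eps = np.zeros((n_states, n_years))
    eps[:, 0] = rng.normal(size=n_states)
    for t in range(1, n_years):
        eps[:, t] = rho * eps[:, t - 1] + rng.normal(size=n_states)
    df = pd.DataFrame({
        "state": np.repeat(np.arange(n_states), n_years),
        "year": np.tile(np.arange(n_years), n_states),
        "y": eps.ravel(),
    })
    # placebo "law": half the states treated from a random mid-sample year onward
    treated = rng.choice(n_states, n_states // 2, replace=False)
    start = rng.integers(5, 16)
    df["D"] = ((df["state"].isin(treated)) & (df["year"] >= start)).astype(int)
    return df

rejections = 0
for _ in range(n_sims):
    df = simulate_panel()
    fit = smf.ols("y ~ D + C(state) + C(year)", data=df).fit()
    rejections += abs(fit.tvalues["D"]) > 1.96  # conventional OLS t-test

print(f"OLS rejection rate at the 5% level: {rejections / n_sims:.2f}")
```

Because the placebo laws have no effect, a correctly sized test should reject about 5% of the time; with serially correlated outcomes and a persistent treatment dummy, the OLS rejection rate is far higher.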
To address this issue, the authors propose three solutions. The first two are simple and work well when the number of states is large: collapsing the data into a single pre- and post-intervention period, and allowing for an arbitrary covariance structure over time within each state. The third, based on randomization inference, works well regardless of sample size: it uses the empirical distribution of estimated effects for placebo laws as the test distribution.
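Under the same assumed data layout (a DataFrame with columns y, state, year, and the law dummy D), the three fixes might be sketched as follows. These are minimal illustrations rather than the authors' exact implementations (the paper also considers a block bootstrap), and the column and function names are my own.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def dd_collapsed(df):
    """Fix 1: collapse the data into one pre- and one post-law observation
    per state, then estimate DD on the resulting two-period panel."""
    start = df.loc[df["D"] == 1, "year"].min()  # first year the law is in effect
    df = df.assign(post=(df["year"] >= start).astype(int),
                   treated=df.groupby("state")["D"].transform("max"))
    collapsed = (df.groupby(["state", "post"], as_index=False)
                   .agg(y=("y", "mean"), treated=("treated", "max")))
    return smf.ols("y ~ treated * post", data=collapsed).fit()

def dd_clustered(df):
    """Fix 2: keep the full panel but allow an arbitrary covariance structure
    over time within each state (cluster-robust standard errors by state)."""
    return smf.ols("y ~ D + C(state) + C(year)", data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["state"]})

def randomization_inference_pvalue(df, n_draws=500, seed=0):
    """Fix 3: compare the actual DD estimate to the empirical distribution
    of estimates obtained under randomly assigned placebo laws."""
    rng = np.random.default_rng(seed)
    actual = smf.ols("y ~ D + C(state) + C(year)", data=df).fit().params["D"]
    states = df["state"].unique()
    start_year = df.loc[df["D"] == 1, "year"].min()
    placebo_estimates = []
    for _ in range(n_draws):
        fake_treated = rng.choice(states, size=len(states) // 2, replace=False)
        fake_D = ((df["state"].isin(fake_treated)) &
                  (df["year"] >= start_year)).astype(int)
        fit = smf.ols("y ~ fake_D + C(state) + C(year)",
                      data=df.assign(fake_D=fake_D)).fit()
        placebo_estimates.append(fit.params["fake_D"])
    # two-sided p-value: share of placebo estimates at least as large in magnitude
    return np.mean(np.abs(placebo_estimates) >= abs(actual))
```

The randomization-inference p-value simply asks where the actual estimate falls in the distribution of estimates produced by laws that are known to be fictitious, so its validity does not rest on a parametric model of the serial correlation.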
The authors also find that the magnitude of the problem depends on the length of the time series and the degree of serial correlation in the dependent variable. It is particularly severe when the treatment variable is itself serially correlated, which is typical in DD settings because a law, once passed, usually stays in effect for the rest of the sample period.
The paper concludes that the standard errors from OLS estimation in DD models are often severely underestimated due to serial correlation, leading to over-rejection of the null hypothesis. The proposed solutions, particularly the randomization inference method, can help address this issue.