When Should You Adjust Standard Errors for Clustering?

September 21, 2022 | Alberto Abadie, Susan Athey, Guido W. Imbens, Jeffrey M. Wooldridge
When should standard errors be adjusted for clustering? This paper addresses the question by proposing a new framework for clustered inference on average treatment effects. The authors argue that conventional clustering methods, which assume that clusters are sampled randomly from an infinite population, can produce severely inflated standard errors when the clusters in the sample make up a non-negligible fraction of the clusters in the population, and they propose new variance estimators that correct for this bias.

The paper highlights three common misconceptions about clustering adjustments: first, that clustering is needed only when residuals are correlated within clusters; second, that clustering adjustments are harmless even when they are not required; and third, that researchers face an all-or-nothing choice between fully adjusting for clustering and not adjusting at all.

The proposed framework incorporates both a sampling component and a design component. The design component accounts for between-cluster variation in treatment assignments, which matters because such variation often motivates the use of clustered standard errors in empirical studies. The framework also shifts the focus from features of infinite super-populations to average treatment effects defined for the finite population at hand, so the presence of cluster-level unobserved components of the outcome variable becomes irrelevant for the choice of clustering level.

The authors derive large-sample variances for the least squares and fixed effects estimators under the proposed framework and show that these differ in general from both the robust and the cluster variances. They also propose two estimators for the large-sample variances, one analytic and one based on a resampling (bootstrap) approach. The resulting framework provides actionable guidance on when standard errors should be clustered and at what level.
It is particularly useful in settings where it is difficult to justify a particular error component structure. The authors argue that the new framework is more appropriate for many of the data sets economists and other social scientists analyze than the conventional model-based econometric framework.