ESTIMATING LOG MODELS: TO TRANSFORM OR NOT TO TRANSFORM?

November 1999 | Willard G. Manning, John Mullahy
This paper by Willard G. Manning and John Mullahy examines the finite-sample behavior of two families of estimators for outcomes that are nonnegative (including zeros) and positively skewed: generalized linear models (GLM), and least-squares estimators applied to the logarithm of the outcome, ln(y). The study evaluates how these estimators perform under common data problems such as skewness, kurtosis, and heteroscedasticity.

Using simulation-based evidence, the authors compare the first- and second-order behavior of the estimators under alternative assumptions about the data-generating process. They find that the choice of estimator can have significant implications for empirical results, with some estimators biased and others losing precision under particular conditions. For instance, OLS on ln(y) can yield biased estimates of E(y|x) under heteroscedasticity unless the predictions are appropriately retransformed, while GLM estimators can be imprecise when the error term is heavy-tailed on the log scale.

The paper also proposes relatively easy-to-implement tests for selecting the estimation method best suited to a given dataset, that is, for deciding whether OLS-based log models or GLM models are more appropriate. The authors illustrate the approach using data on doctor visits from the National Health Interview Survey.
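To make the retransformation problem concrete, here is a minimal numpy sketch (not taken from the paper): it simulates a log-normal outcome whose log-scale error variance grows with x, fits OLS to ln(y), and compares the naive prediction exp(x'b) and Duan's homoscedastic smearing estimator with the true E(y|x). The data-generating process, the evaluation point x0, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Illustrative data-generating process (an assumption, not the paper's design):
# ln(y) = b0 + b1*x + e, with Var(e) increasing in x (heteroscedastic on the log scale).
x = rng.uniform(0.0, 2.0, n)
b0, b1 = 0.5, 1.0
sigma = 0.5 + 0.5 * x                      # log-scale standard deviation depends on x
e = rng.normal(0.0, sigma)
y = np.exp(b0 + b1 * x + e)

# OLS of ln(y) on x: the coefficients remain consistent for the log-scale slope/intercept.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
resid = np.log(y) - X @ beta_hat

# Predicting E(y|x) at an illustrative point x0 = 1.5:
x0 = np.array([1.0, 1.5])
naive = np.exp(x0 @ beta_hat)                                # ignores the error term entirely
smearing = np.exp(x0 @ beta_hat) * np.mean(np.exp(resid))    # Duan smearing, assumes homoscedasticity
truth = np.exp(b0 + b1 * 1.5 + 0.5 * (0.5 + 0.5 * 1.5) ** 2)  # lognormal mean with the true sigma(x0)

print(f"true E(y|x=1.5):     {truth:8.3f}")
print(f"naive exp(xb_hat):   {naive:8.3f}")    # biased downward: drops E[exp(e)] altogether
print(f"homoscedastic smear: {smearing:8.3f}")  # still off here because Var(e) varies with x
```

The sketch illustrates the point made above: the naive retransformation omits E[exp(e)] entirely, and even the smearing correction is biased when the log-scale error variance depends on the covariates, which is exactly the case the paper flags for OLS-based log models.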
In conclusion, the choice of estimator for models of ln(E(y)) can have substantial implications for empirical results. OLS with a logged dependent variable is resilient to many data problems, but not to heteroscedasticity, which requires a correctly specified retransformation. GLM estimators, such as nonlinear least squares (NLS), Poisson-like, and Gamma models, offer an alternative by estimating E(y) or ln(E(y)) directly, avoiding retransformation and the need to model the variance of the log-scale error. Their precision, however, can be substantially diminished by high variance and kurtosis on the log scale. The authors recommend that analysts consider the specific data-generating mechanism and use appropriate diagnostics to select the estimator best suited to their application.
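As a companion sketch, the GLM alternative can be fit with standard software. The example below uses statsmodels (assuming a recent version, where the link class is spelled `Log`) to fit a Poisson-like GLM and a Gamma GLM with a log link to simulated skewed, nonnegative data; the simulated design and variable names are illustrative assumptions, not the paper's empirical specification.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50_000

# Illustrative design (an assumption, not the paper's data): skewed, nonnegative y
# with an exponential conditional mean, E(y|x) = exp(1.0 + 0.5*x).
x = rng.uniform(0.0, 2.0, n)
mu = np.exp(1.0 + 0.5 * x)
y = rng.gamma(shape=0.5, scale=mu / 0.5)   # gamma draws with mean mu and heavy right skew

X = sm.add_constant(x)

# Poisson-like GLM: the log link is the Poisson default, and the estimates are consistent
# for ln E(y|x) whenever the mean is correctly specified, even though y is not count data
# (a quasi-likelihood interpretation). No retransformation step is needed.
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# Gamma GLM with an explicit log link: typically more efficient when Var(y|x) is
# roughly proportional to E(y|x)^2, as in this design.
gamma_fit = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log())).fit()

print("Poisson-like GLM coefficients:", poisson_fit.params)  # should be near (1.0, 0.5)
print("Gamma (log link) coefficients:", gamma_fit.params)
```

Because both fits model ln(E(y|x)) directly, the coefficients are interpretable without any retransformation; the trade-off noted in the paper is that such GLM fits can be much less precise than OLS on ln(y) when the log-scale error is heavy-tailed.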