2008 | Achim Zeileis, Christian Kleiber, Simon Jackman
This paper introduces R functions for count data regression, particularly hurdle() and zeroinfl() from the countreg package. These functions extend the classical generalized linear models (GLMs) for count data, such as Poisson and negative binomial regression, to accommodate over-dispersion and excess zeros. The hurdle model combines a zero-truncated count component for positive counts with a hurdle component that models zero versus positive counts, while the zero-inflated model is a mixture of a count component and a point mass at zero. Both models are implemented with an interface similar to that of the base R GLM functions, which makes them easy to use within the R statistical computing environment.
The paper illustrates how these models can be applied to real-world data, using the demand for medical care as an example, and compares how well they capture over-dispersion and zero counts. The results show that hurdle and zero-inflated models fit count data with excess zeros better than the classical models do. The paper also discusses implementation details of these models in R, including estimation by maximum likelihood and the availability of diagnostic and inference tools.
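
To make the difference between the two model types concrete, the following is a minimal sketch of their probability functions. It is written in Python for illustration only and is not the R implementation the paper describes. It assumes a Poisson count component and treats the zero probabilities as fixed numbers rather than as functions of regressors. The names poisson_pmf, hurdle_poisson_pmf, and zero_inflated_poisson_pmf, and the parameters mu, p_zero, and pi, are hypothetical.

```python
import math


def poisson_pmf(y: int, mu: float) -> float:
    """Poisson probability P(Y = y) for a count component with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def hurdle_poisson_pmf(y: int, mu: float, p_zero: float) -> float:
    """Hurdle model: every zero comes from the hurdle component.

    p_zero is the hurdle component's probability of observing a zero
    (an assumed fixed value here); positive counts follow a Poisson
    distribution truncated at zero.
    """
    if y == 0:
        return p_zero
    truncated = poisson_pmf(y, mu) / (1.0 - poisson_pmf(0, mu))
    return (1.0 - p_zero) * truncated


def zero_inflated_poisson_pmf(y: int, mu: float, pi: float) -> float:
    """Zero-inflated model: mixture of a point mass at zero (weight pi)
    and an ordinary, untruncated Poisson count component (weight 1 - pi)."""
    point_mass = pi if y == 0 else 0.0
    return point_mass + (1.0 - pi) * poisson_pmf(y, mu)


if __name__ == "__main__":
    mu, p = 2.0, 0.4  # illustrative values, not estimates from any data
    print("P(Y = 0) under Poisson:      ", poisson_pmf(0, mu))
    print("P(Y = 0) under hurdle:       ", hurdle_poisson_pmf(0, mu, p))
    print("P(Y = 0) under zero-inflated:", zero_inflated_poisson_pmf(0, mu, p))
```

In the hurdle sketch the probability of a zero is governed entirely by the hurdle component, whereas in the zero-inflated sketch zeros arise from two sources: the point mass and the count component itself.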