1 September 2016 | Aki Vehtari, Andrew Gelman, Jonah Gabry
This paper introduces efficient computations for leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) for Bayesian models. LOO and WAIC estimate out-of-sample predictive accuracy using posterior simulations. LOO computes predictive accuracy by leaving out one data point at a time, while WAIC approximates LOO using the full posterior distribution of the parameters. Both methods are more accurate than simpler estimates such as AIC and DIC but require more computation.

To improve the stability of LOO, the authors introduce Pareto-smoothed importance sampling (PSIS), which regularizes the importance weights and reduces their variance. PSIS-LOO is more robust than plain importance-sampling LOO in cases with weak priors or highly influential observations. The authors also provide approximate standard errors for predictive errors and model comparisons. They implement the computations in an R package called loo and demonstrate it on models fit with Stan. The paper also discusses K-fold cross-validation as an alternative when PSIS-LOO fails. The authors show that PSIS-LOO performs well in a range of examples, including hierarchical, linear, nonlinear, logistic, and multilevel regression models. They also demonstrate that WAIC can be biased in cases with weak priors, and that PSIS-LOO provides a more accurate estimate. The paper concludes that PSIS-LOO is a reliable method for Bayesian model evaluation suitable for routine statistical practice.
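To make the PSIS-LOO idea concrete, here is a minimal Python sketch (the paper's implementation is the R package loo, not this code). It uses a toy conjugate normal-mean model as an assumed setup so exact posterior draws are available without Stan, and a simplified method-of-moments generalized Pareto fit in place of the more robust estimator the paper recommends. The importance ratio for point i is 1/p(y_i | theta), the largest weights are replaced by quantiles of the fitted Pareto, and the smoothed weights give the leave-one-out predictive density:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (an assumption, not from the paper): normal-mean model with
# known sigma = 1 and conjugate prior mu ~ N(0, 10^2), so we can draw
# exact posterior samples instead of running MCMC.
n, S = 30, 4000
y = rng.normal(0.5, 1.0, size=n)
post_prec = 1.0 / 100.0 + n                      # posterior precision
mu = rng.normal(y.sum() / post_prec, np.sqrt(1.0 / post_prec), size=S)

# S x n matrix of pointwise log-likelihoods log p(y_i | mu_s)
loglik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu[:, None]) ** 2

def gpd_fit(x):
    """Method-of-moments fit of a generalized Pareto to exceedances x > 0.
    Simplified stand-in for the more robust estimator used in the paper."""
    m, v = x.mean(), x.var()
    k = 0.5 * (1.0 - m * m / v)   # shape; large k signals unreliable weights
    sigma = m * (1.0 - k)         # scale
    return k, sigma

def psis(log_ratios):
    """Pareto-smooth one vector of log importance ratios.
    Returns smoothed weights and the estimated shape k-hat."""
    w = np.exp(log_ratios - log_ratios.max())
    S = len(w)
    M = int(min(0.2 * S, 3 * np.sqrt(S)))        # tail size, as in the paper
    order = np.argsort(w)
    tail, cut = order[-M:], w[order[-M - 1]]     # M largest weights, cutoff
    k, sigma = gpd_fit(w[tail] - cut)
    # Replace tail weights by quantiles of the fitted generalized Pareto
    p = (np.arange(1, M + 1) - 0.5) / M
    if abs(k) > 1e-8:
        q = sigma / k * ((1.0 - p) ** (-k) - 1.0)
    else:
        q = -sigma * np.log(1.0 - p)
    w[tail] = np.minimum(cut + q, w.max())       # truncate at the raw maximum
    return w, k

# PSIS-LOO: for each point, weight the posterior draws by 1 / p(y_i | mu_s)
elpd_i, khats = np.empty(n), np.empty(n)
for i in range(n):
    w, k = psis(-loglik[:, i])
    khats[i] = k
    elpd_i[i] = np.log(np.sum(w * np.exp(loglik[:, i])) / np.sum(w))
elpd_psis_loo = elpd_i.sum()
```

The per-point shape estimates (k-hat) act as a diagnostic: when the fitted tail is too heavy for a given observation, the paper recommends falling back to K-fold cross-validation or refitting without that point.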
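For comparison, WAIC needs no importance weights: it is computed directly from the same matrix of pointwise log-likelihoods as the log pointwise predictive density (lppd) minus an effective-number-of-parameters penalty, the posterior variance of the log-likelihood summed over observations. A short sketch, using the same assumed toy conjugate model as above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy setup: conjugate normal-mean model, exact posterior draws.
n, S = 30, 4000
y = rng.normal(0.5, 1.0, size=n)
post_prec = 1.0 / 100.0 + n
mu = rng.normal(y.sum() / post_prec, np.sqrt(1.0 / post_prec), size=S)
loglik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu[:, None]) ** 2

# lppd: log of the posterior-mean likelihood per point, summed
# (shift by the per-column max for numerical stability)
m = loglik.max(axis=0)
lppd = np.sum(m + np.log(np.mean(np.exp(loglik - m), axis=0)))

# p_waic: posterior variance of the log-likelihood, summed over points
p_waic = np.sum(np.var(loglik, axis=0, ddof=1))

elpd_waic = lppd - p_waic
```

In this one-parameter model p_waic comes out near 1, matching the interpretation of p_waic as an effective number of parameters; the bias the paper documents appears when priors are weak and individual observations dominate their own posterior term.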