A Practical Probabilistic Benchmark for AI Weather Models


27 Jan 2024 | Noah D. Brenowitz, Yair Cohen, Jaideep Pathak, Ankur Mahesh, Boris Bonev, Thorsten Kurth, Dale R. Durran, Peter Harrington, Michael S. Pritchard
A practical probabilistic benchmark for AI weather models evaluates the probabilistic skill of AI weather models against an operational baseline using lagged ensembles, i.e., ensembles assembled from deterministic forecasts initialized at successive earlier times but all verifying at the same valid time. Because no perturbation parameters need to be tuned, the method allows a parameter-free comparison of leading AI models such as GraphCast and Pangu with deterministic forecasts. The results show that while GraphCast outperforms Pangu on deterministic metrics, the two are effectively tied on the probabilistic continuous ranked probability score (CRPS). The study also finds that multi-step loss functions, though beneficial for deterministic scores, increase dissipation and reduce probabilistic skill. Tests with the Spherical Fourier Neural Operator (SFNO) approach show that a model's effective resolution affects ensemble dispersion, and that lagged ensembles can be used to assess how long-lead-time training and effective resolution influence forecast skill. The findings suggest that ensemble methods, including lagged ensembles, provide a practical way to evaluate probabilistic skill in AI weather models and offer insights for improving future forecasts. The work emphasizes the importance of probabilistic scoring in AI weather prediction and provides a framework for comparing models using lagged ensembles.
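
To make the method concrete, the sketch below shows how a lagged ensemble might be assembled from archived deterministic runs and scored with the empirical CRPS. It is a minimal illustration under stated assumptions, not the paper's implementation: the `get_forecast` accessor, the 6-hourly lags, and the 120-hour nominal lead are hypothetical placeholders, and in practice fields would be area-weighted on the latitude-longitude grid before averaging scores.

```python
import numpy as np


def crps_ensemble(members: np.ndarray, obs: np.ndarray) -> np.ndarray:
    """Empirical CRPS of an ensemble forecast against a verifying field.

    members: array of shape (n_members, ...) holding the ensemble.
    obs:     array of shape (...) holding the analysis/observation.
    Returns the CRPS with the same shape as `obs` (lower is better).
    """
    # E|X - y|: mean absolute error of the members against the observation.
    skill = np.abs(members - obs[None]).mean(axis=0)
    # E|X - X'|: mean absolute difference between all member pairs.
    spread = np.abs(members[:, None] - members[None, :]).mean(axis=(0, 1))
    # Standard (biased) estimator; a "fair" CRPS would rescale the spread
    # term by n / (n - 1).
    return skill - 0.5 * spread


def lagged_ensemble(get_forecast, valid_time, base_lead_hours=120,
                    lags_hours=(0, 6, 12, 18, 24)):
    """Assemble a lagged ensemble valid at `valid_time`.

    get_forecast: hypothetical accessor (init_time, lead_hours) -> field array,
                  standing in for an archive of deterministic runs.
    Member k is initialized `lags_hours[k]` earlier than the nominal
    initialization and run out `base_lead_hours + lags_hours[k]` hours, so
    every member verifies at the same valid time.
    """
    nominal_init = valid_time - np.timedelta64(base_lead_hours, "h")
    members = []
    for lag in lags_hours:
        init_time = nominal_init - np.timedelta64(lag, "h")
        members.append(get_forecast(init_time, base_lead_hours + lag))
    return np.stack(members, axis=0)
```

Because every member is just an existing deterministic forecast, the same scoring code can be applied unchanged to any AI model or NWP baseline, which is what makes the benchmark parameter-free.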