Interpretation of Rank Histograms for Verifying Ensemble Forecasts

23 May 2000 and 9 August 2000 | THOMAS M. HAMILL
The rank histogram is a tool for evaluating ensemble forecasts. It is used to assess the reliability of ensemble forecasts and to diagnose errors in their mean and spread. The histogram is generated by repeatedly tallying the rank of the verification (usually an observation) relative to values from an ensemble sorted from lowest to highest. However, an uncritical use of the rank histogram can lead to misinterpretations of the qualities of that ensemble. For example, a flat rank histogram, usually taken as a sign of reliability, can still be generated from unreliable ensembles. Similarly, a U-shaped rank histogram, commonly understood as indicating a lack of variability in the ensemble, can also be a sign of conditional bias. It is also shown that flat rank histograms can be generated for some model variables if the variance of the ensemble is correctly specified, yet if covariances between model grid points are improperly specified, rank histograms for combinations of model variables may not be flat. Further, if imperfect observations are used for verification, the observational errors should be accounted for; otherwise, the shape of the rank histogram may mislead the user about the characteristics of the ensemble. If a statistical hypothesis test is to be performed to determine whether the differences from uniformity of rank are statistically significant, then samples used to populate the rank histogram must be located far enough away from each other in time and space to be considered independent. The rank histogram is a relatively new tool, and collective experience with it is limited. Some initial guidance is provided on its suggested use. We also explain some ways in which its uncritical use can lead to an inaccurate understanding of the characteristics of ensemble forecasts.

Section 2 provides a general overview of the rank histogram and its link to other probabilistic verification tools. Section 3 describes some of the common problems in the interpretation of rank histograms. Section 4 describes the manner in which samples should be generated if one is to perform a formal hypothesis test of the uniformity of a rank histogram. Section 5 concludes.

The rank histogram permits a quick examination of some qualities of the ensemble. Consistent biases in the ensemble forecast will show up as a sloped rank histogram; a lack of variability in the ensemble will show up as a U-shaped, or concave, population of the ranks. Further, the rank histogram may be useful for more than just evaluating forecast quality. Hamill and Colucci (1997, 1998) and Eckel and Walters (1999) also show how rank histograms provide information that may be used to recalibrate ensemble forecasts with systematic errors, thus achieving improved probabilistic forecasts. While it is common for operational centers to produce probabilistic forecasts from their ensembles as if the ensembles were random samples from the same distribution as the truth, in fact many operational centers construct their ensembles under different assumptions. For example, the singular vector method used at the European Centre for Medium-Range Weather Forecasts (Molteni et al.
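The tallying procedure described above, and the sloped and U-shaped signatures it can produce, may be easier to see in a small simulation. The following is a minimal sketch in Python with NumPy, not the paper's own code. The function names `rank_histogram` and `chi_square_statistic`, the Gaussian synthetic data, the sample sizes, and the choice of a chi-square statistic as the measure of departure from uniformity are all assumptions made for illustration. Ties between the verification and ensemble members are ignored, which is reasonable only for continuous variables.

```python
import numpy as np


def rank_histogram(ensembles, verifications):
    """Tally the rank of each verification within its ensemble.

    ensembles:     array of shape (n_cases, n_members)
    verifications: array of shape (n_cases,)
    Returns an integer array of length n_members + 1.
    """
    ensembles = np.asarray(ensembles, dtype=float)
    verifications = np.asarray(verifications, dtype=float)
    n_members = ensembles.shape[1]
    # Rank = number of members strictly below the verification (0..n_members).
    # Counting members below is equivalent to sorting the ensemble and
    # locating the verification in it. Ties are not treated specially.
    ranks = np.sum(ensembles < verifications[:, None], axis=1)
    return np.bincount(ranks, minlength=n_members + 1)


def chi_square_statistic(counts):
    """Chi-square distance of the tallies from a flat histogram.

    Interpreting this as a formal test assumes the cases are independent.
    """
    counts = np.asarray(counts, dtype=float)
    expected = counts.sum() / counts.size
    return float(np.sum((counts - expected) ** 2 / expected))


# Synthetic, independent cases (hypothetical setup for illustration only).
rng = np.random.default_rng(0)
n_cases, n_members = 20000, 10
truth = rng.normal(0.0, 1.0, n_cases)

# Members drawn from the same distribution as the truth.
reliable = rng.normal(0.0, 1.0, (n_cases, n_members))
# Too little spread: the truth often falls outside the ensemble.
underdispersive = rng.normal(0.0, 0.5, (n_cases, n_members))
# Consistent positive bias: the truth tends to rank low.
biased = rng.normal(1.0, 1.0, (n_cases, n_members))

hist_reliable = rank_histogram(reliable, truth)
hist_under = rank_histogram(underdispersive, truth)
hist_biased = rank_histogram(biased, truth)

for name, hist in [("reliable", hist_reliable),
                   ("underdispersive", hist_under),
                   ("biased", hist_biased)]:
    print(name, hist, round(chi_square_statistic(hist), 1))
```

In this setup the first histogram should be close to flat, the second U-shaped, and the third sloped toward the low ranks. The converse does not hold: as the text cautions, a flat histogram does not by itself establish reliability, and a U shape can also arise from conditional bias rather than from insufficient spread.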