Ensemble Forecasting, Forecast
Calibration, and Evaluation
• Current state-of-the-art in ensemble
forecasting (EF), and where are we headed?
• Brief review of EF evaluation techniques
• Deficiencies in current approaches for
estimation/calibration/evaluation of EFs
• New EF calibration techniques
• Methods for summarizing probabilistic
information for end users and forecasters
What does it take to get a "perfect"
probabilistic forecast from an ensemble?
• IF you start with an ensemble of initial
conditions that samples the distribution of
analysis uncertainty, and
• IF your forecast model is perfect (error growth
only due to chaos), and
• IF your ensemble is infinite in size,
• THEN the probabilistic forecast is "perfect."
… Big IFs
Examining the IFs:
(1) Do we sample the distribution of analysis uncertainty?
(Figure: NCEP SREF rank histogram, 39-h forecast.)
Not enough spread in the forecasts. Consequently, the focus in 1st-generation
EF systems has been to design initial perturbations that grow apart from
each other quickly, not to sample the analysis-uncertainty distribution.
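A rank histogram like the one above can be computed directly from an archive of ensemble forecasts and verifying observations. A minimal Python/NumPy sketch with synthetic data (the `rank_histogram` helper is illustrative, not SREF code); the under-dispersive case reproduces the U-shape seen in the figure:

```python
import numpy as np

def rank_histogram(ensemble, obs):
    """Tally the rank of each observation within its sorted ensemble.

    ensemble: (ncases, nmembers) array of forecasts
    obs:      (ncases,) array of verifying observations
    Returns counts for the nmembers+1 rank bins.
    """
    ncases, nmem = ensemble.shape
    counts = np.zeros(nmem + 1, dtype=int)
    for fc, ob in zip(ensemble, obs):
        # rank = number of members falling below the observation
        counts[np.sum(fc < ob)] += 1
    return counts

rng = np.random.default_rng(0)
# Well-calibrated 10-member ensemble: obs drawn from the same
# distribution as the members -> roughly flat histogram.
ens = rng.normal(size=(5000, 10))
obs = rng.normal(size=5000)
flat = rank_histogram(ens, obs)

# Under-dispersive ensemble (too little spread, as in the SREF
# example) -> U-shaped histogram with heavily populated end bins.
ens_tight = 0.3 * rng.normal(size=(5000, 10))
ushaped = rank_histogram(ens_tight, obs)
```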
The Breeding Technique (NCEP)
Breeding takes a pair of forecast perturbations and periodically
rescales them down and adds them to the new analysis state.
The perturbations only grossly reflect analysis uncertainty.
Wang and Bishop showed that a larger ensemble formed from
independent pairs of bred members will be comprised of pairs
that have almost identical perturbation structures.
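The breed cycle can be sketched with a fixed linear map standing in for a real forecast model (all names and numbers below are illustrative). Repeated regrow-and-rescale cycles align the perturbation with the fastest-growing direction, which is exactly why independently bred pairs converge to nearly identical structures:

```python
import numpy as np

def breed_step(control_fc, perturbed_fc, rescale_amp):
    """One breeding cycle: take the grown difference between a
    perturbed and a control forecast, rescale it to a fixed RMS
    amplitude, and return the perturbation for the next analysis."""
    diff = perturbed_fc - control_fc
    norm = np.sqrt(np.mean(diff ** 2))   # RMS amplitude as the rescaling norm
    return rescale_amp * diff / norm

# Toy "model": a linear map whose first direction grows fastest,
# standing in for chaotic error growth in a real forecast model.
A = np.diag([1.5, 1.1, 0.8])

rng = np.random.default_rng(1)
x = rng.normal(size=3)                   # analysis state
pert = rng.normal(size=3)
pert *= 0.01 / np.sqrt(np.mean(pert ** 2))

for _ in range(40):                      # repeated breed cycles
    ctrl = A @ x
    pertd = A @ (x + pert)
    pert = breed_step(ctrl, pertd, rescale_amp=0.01)
    x = ctrl
# pert is now dominated by the leading (fastest-growing) direction.
```

For a linear model this is just power iteration; any independently seeded pair would converge to the same structure.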
Singular Vectors (ECMWF)
Singular vectors (SVs) are expected to grow the
most rapidly over a short optimization interval.
The SV structure depends upon a choice of norm.
(Figure: the leading three SVs, which here have
magnitude only in one area over the globe.)
Singular vectors, continued
(Figure panels: Case 1, 9 Jan 1993; Case 2, 8 Feb 1993.)
SVs tend to have their initial perturbations in the mid-troposphere, with little
amplitude near the surface or tropopause. If they're meant to sample analysis
errors, are analysis errors really near-perfect at these levels?
As SVs evolve, they grow to have amplitudes aloft and near the surface.
Expect SV ensemble forecast spread to be unrepresentative in the early
hours of the forecast (e.g., spread of EFs too small near the surface).
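In the linear approximation, singular vectors are the SVD of the tangent-linear propagator under the chosen norm. A minimal NumPy sketch with a random matrix standing in for the propagator (illustrative only; operational SVs come from iterating the tangent-linear and adjoint models):

```python
import numpy as np

# Toy tangent-linear propagator M over the optimization interval.
# The leading right singular vector of M is the initial perturbation
# that grows most under the L2 norm over that interval.
rng = np.random.default_rng(2)
M = rng.normal(size=(6, 6))

U, s, Vt = np.linalg.svd(M)
leading_sv = Vt[0]            # initial-time structure of SV 1 (unit norm)
evolved = M @ leading_sv      # final-time structure: s[0] * U[:, 0]

# Norm dependence: weighting the initial state by a matrix W changes
# which perturbation is "fastest growing" -- one instead takes the
# SVD of M @ W^{-1/2}, so different norms give different SVs.
```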
A better way to construct initial conditions?
Ensemble-based data assimilation
(Schematic: observations and ensemble error statistics combine
First Guess 1 → Analysis 1, First Guess 2 → Analysis 2, …,
First Guess N → Analysis N.)
An ensemble of forecasts is used to define the error statistics
of the first-guess forecast. An ensemble of analyses is
produced. If designed correctly, the analyses sample the
analysis uncertainty, can be used for initializing EFs, and
are lower in error.
Why might initial conditions from
ensemble data assimilation be better?
Flow-dependent error statistics for the first guess
improve the blending of observations and forecasts.
Example: 500 hPa height analyses assimilating only surface-pressure obs.
(Figure panels: Ensemble Filter, RMS = 39.8 m; Interpolation,
RMS = 82.4 m, with 3D-Var worse. Black dots show the 214
surface pressure obs.)
Perturb the land surface in EFs?
Examining the IFs:
(2) Is the forecast model anywhere
close to perfect?
• Model errors
– due to limited resolution (truncation)
– due to parameterizations
– due to numerical methods choices
– biases, especially near the surface
– slow growth of forecast differences among
ensemble members due to coarse grid spacing
Dealing with model errors
(1) Better models (e.g., 4-km, 60-h WRF for Katrina)
(2) Introduce a stochastic element into the model
(3) Multi-model ensembles
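Point (2) can be illustrated with a multiplicative stochastic perturbation of model tendencies, loosely in the spirit of ECMWF's stochastic physics (the function name and amplitude below are illustrative, not the operational scheme):

```python
import numpy as np

def stochastic_tendency(tend, rng, amplitude=0.3):
    """Scale each member's physics tendency by an independent random
    factor in [1 - amplitude, 1 + amplitude], so that identical states
    still produce diverging forecasts."""
    return tend * (1.0 + amplitude * rng.uniform(-1.0, 1.0, size=tend.shape))

rng = np.random.default_rng(6)
tend = np.array([1.0, -2.0, 0.5])        # one time step's tendencies
members = [stochastic_tendency(tend, rng) for _ in range(20)]
spread = np.std([m[0] for m in members])  # spread injected by the noise
```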
Examining the IFs:
(3) Can we run a nearly infinite ensemble?
Clearly not; CPU availability is finite.
-- large ensemble ↔ low resolution
(small sampling error, larger model error).
-- small ensemble ↔ higher resolution
(large sampling error, smaller model error).
-- optimal size/resolution tradeoff may be different
for different variables (large ensemble for 500 hPa
anomaly correlation, smaller ensemble for fine-scale
precipitation events).
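The sampling-error side of this tradeoff is easy to quantify: a probability estimated by counting members has standard error sqrt(p(1-p)/n), so halving the error costs four times the members. A quick Monte Carlo check (illustrative):

```python
import numpy as np

def prob_sampling_std(p, nmem, ntrials=20000, seed=5):
    """Std. deviation of an event probability estimated by counting
    how many of nmem independent members forecast the event."""
    rng = np.random.default_rng(seed)
    hits = rng.random(size=(ntrials, nmem)) < p   # member "hits"
    return hits.mean(axis=1).std()                # spread of the estimates

s10 = prob_sampling_std(0.3, 10)   # ~ sqrt(0.3*0.7/10) ≈ 0.145
s40 = prob_sampling_std(0.3, 40)   # ~ sqrt(0.3*0.7/40) ≈ 0.072
```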
Probabilistic Forecast Verification
(Figure: relative economic value diagram.)
Many verification methods exist now and won't be
reviewed here. Often diagnostics are needed
to understand specific errors in ensemble forecasts.
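As one concrete example of such a method, the Brier score and its skill relative to climatology take only a few lines (a sketch, not a full verification package):

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared error of probability forecasts vs. 0/1 outcomes."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return np.mean((probs - outcomes) ** 2)

def brier_skill_score(probs, outcomes):
    """Skill relative to always forecasting the climatological frequency.
    1 = perfect, 0 = no better than climatology, < 0 = worse."""
    clim = float(np.mean(outcomes))
    bs_clim = brier_score(np.full(len(outcomes), clim), outcomes)
    return 1.0 - brier_score(probs, outcomes) / bs_clim
```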
A tool for exploring calibration issues:
CDC's "reforecast" data set
• Definition: a data set of retrospective numerical forecasts made with
the same model that is used to generate the real-time forecasts
• Model: T62L28 NCEP global forecast model, circa 1998
(http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/refcst for details).
• Initial States: NCEP-NCAR reanalysis plus 7 ± bred modes
(Toth and Kalnay 1993).
• Duration: 15-day runs every day at 00Z from 1978/11/01 to now.
• Data: Selected fields (winds, hgt, temp on 5 press levels, precip,
t2m, u10m, v10m, pwat, prmsl, rh700, heating). NCEP/NCAR
reanalysis verifying fields included (Web form to download at …)
Why reforecast? Bias structure can be difficult
to evaluate with small forecast samples
New calibration technique:
Reforecasting with analogs
Analog calibration results
(Figure: skill components, resolution and reliability.)
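The analog technique itself is simple to sketch: find the past reforecasts closest to today's forecast, and use the observations that verified on those dates as the calibrated ensemble. A toy version with synthetic data and a deliberately biased model (the `analog_forecast` helper and all numbers are illustrative):

```python
import numpy as np

def analog_forecast(today_fc, reforecasts, reforecast_obs, nanalogs=25):
    """Return the verifying observations of the nanalogs past
    reforecasts closest (RMS) to today's forecast."""
    dist = np.sqrt(np.mean((reforecasts - today_fc) ** 2, axis=1))
    idx = np.argsort(dist)[:nanalogs]
    return reforecast_obs[idx]

rng = np.random.default_rng(7)
truth = rng.normal(size=(1000, 4))                  # past true states
reforecasts = truth + 2.0 + 0.1 * rng.normal(size=(1000, 4))  # biased model
reforecast_obs = truth[:, 0]                        # verifying obs, one variable

today_truth = np.zeros(4)
today_fc = today_truth + 2.0                        # today's (biased) forecast
members = analog_forecast(today_fc, reforecasts, reforecast_obs)
# The analog members cluster around the truth: because the analogs are
# chosen in forecast space, the constant model bias cancels out.
```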
What are appropriate methods for
summarizing probabilistic information
for end users and forecasters?
(Figure: box-and-whisker plume for a confident cold spell.
Box shows the 25-75% range; whiskers show the full range,
or the 95% range after calibration; the central bar shows
the median.)
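The quantities behind such a box-and-whisker display are just ensemble percentiles; a small helper (illustrative) might look like:

```python
import numpy as np

def box_whisker_summary(members, calibrated=False):
    """Percentiles for a box-and-whisker display of one ensemble:
    median, 25-75% box, and full-range whiskers (or the central
    95% range once the ensemble has been calibrated)."""
    lo, hi = (2.5, 97.5) if calibrated else (0.0, 100.0)
    return {
        "median": float(np.percentile(members, 50)),
        "box": (float(np.percentile(members, 25)),
                float(np.percentile(members, 75))),
        "whiskers": (float(np.percentile(members, lo)),
                     float(np.percentile(members, hi))),
    }

summary = box_whisker_summary(np.arange(101.0))
```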
Methods for summarizing
probabilistic information, continued
What isn’t very helpful: the spaghetti diagram.
(Figure: two spaghetti diagrams, each with 6 dm of spread.)
Where EF is headed
• Use of ensembles in data assimilation / better
methods for initializing EFs
• Sharing of data across countries (TIGGE) to
do multi-model ensembling
• Higher-resolution ensembles with model-
• Improved calibration using reforecasts
• Increased use by sophisticated users (e.g.,
coupling into hydrologic models)
References and notes
• Page 3: Getting a “perfect” probabilistic forecast from an ensemble
Ehrendorfer, M., 1994: The Liouville equation and its potential usefulness for the prediction of forecast skill. Part 1:
Theory. Mon. Wea. Rev., 122, 703-713.
• Page 4: Do we sample the distribution of analysis uncertainty?
Rank histograms from NCEP’s SREF web page, http://wwwt.emc.ncep.noaa.gov/mmb/SREF/SREF.html
• Page 5: The breeding technique
Toth, Z. and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125, 3297-
• Pages 6-7: Singular vectors.
Buizza, R., and T. N. Palmer, 1995: The singular-vector structure of the atmospheric global circulation. J. Atmos.
Sci., 52, 1434-1456.
Barkmeijer, J., M. Van Gijzen, and F. Bouttier, 1998: Singular vectors and estimates of the analysis-error covariance
metric. Quart. J. Royal Meteor. Soc., 124, 1697-1713.
• Pages 8-10: Ensemble-based data assimilation
Hamill, T. M., 2005: Ensemble-based atmospheric data assimilation. To appear in Predictability of Weather and Climate,
Cambridge Press, T. N. Palmer and R. Hagedorn, eds. Available at
Whitaker, J. S., G. P. Compo, X. Wei, and T. M. Hamill, 2004: Reanalysis without radiosondes using ensemble data
assimilation. Mon. Wea. Rev., 132, 1190-1200.
• Page 11: Perturbing the land surface
Sutton, C., T. M. Hamill, and T. T. Warner, 2005: Will perturbing soil moisture improve warm-season ensemble forecasts? A
proof of concept. Mon. Wea. Rev., in review. Available from http://www.cdc.noaa.gov/people/tom.hamill/land_sfc_perts.pdf .
• Pages 12-13 Model errors
WRF forecast from http://wrf-model.org/plots/realtime_hurricane.php (try 2005-08-27, and rain mixing ratio, 60-h forecast)
Stochastic element picture from Judith Berner’s presentation at the ECMWF workshop on the representation of sub-grid
processes using stochastic-dynamic models. http://www.ecmwf.int/newsevents/meetings/workshops/2005/Sub-
Hagedorn, R., F. J. Doblas-Reyes, and T. N. Palmer, 2005: The rationale behind the success of multi-model ensembles in
seasonal forecasting – I. Basic concept. Tellus, 57A, 219-233. (multi-model ensemble picture)
Hamill, T. M., J. S. Whitaker, and S. L. Mullen, 2005: Reforecasts, an important data set for improving weather predictions,
Bull. Amer. Meteor. Soc., in press. Available at http://www.cdc.noaa.gov/people/tom.hamill/reforecast_bams4.pdf
• Page 14: Can we run a nearly infinite ensemble?
Mullen, S. L., and R. Buizza, 2002: The impact of horizontal resolution and ensemble size on probabilistic forecasts of
precipitation by the ECMWF ensemble prediction system. Wea. Forecasting, 17, 173-191.
• Page 15, probabilistic forecast verification
Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. Mon. Wea. Rev., 129, 550-
Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, 467 pp (section 7.4).
Mason, I. B., 1982: A model for the assessment of weather forecasts. Aust. Meteor. Mag., 30, 291-303. (for the
relative operating characteristic).
Richardson, D. S. , 2000: Skill and relative economic value of the ECMWF ensemble prediction system. Quart. J.
Royal Meteor. Soc., 126, 649-667.
• Pages 16-19: Calibration using reforecasts.
Hamill, T. M., J. S. Whitaker, and S. L. Mullen, 2005: Reforecasts, an important data set for improving weather
predictions, Bull. Amer. Meteor. Soc., in press. Available at
• Page 20: Summarizing probabilistic information for end users:
From Ken Mylne, UK Met Office; also ECMWF newsletter 92, available from www.ecmwf.int .
• Page 21: Summarizing continued, probability maps
• Page 22: What isn’t very helpful (spaghetti diagrams)
From http://www.cdc.noaa.gov/map/images/ens/ens.html .