Ensemble Forecasting, Forecast Calibration, and Evaluation

Tom Hamill
• Current state of the art in ensemble forecasting (EF), and where are we headed?
• Brief review of EF evaluation techniques
• Deficiencies in current approaches to estimation, calibration, and evaluation of EFs
• New EF calibration techniques
• Methods for summarizing probabilistic information for end users and forecasters (time permitting)

What does it take to get a “perfect” probabilistic forecast from an ensemble?
• IF you start with an ensemble of initial
  conditions that samples the distribution of
  analysis uncertainty, and
• IF your forecast model is perfect (error growth
  only due to chaos), and
• IF your ensemble is infinite in size,
• THEN the probabilistic forecast is “perfect.”

… Big IFs

Examining the IFs:
(1) Do we sample the distribution of analysis uncertainty?

[Figure: NCEP SREF rank histogram, 39-h forecast]

The forecasts have too little spread. Consequently, the focus in first-generation EF systems was on designing initial perturbations that quickly grow apart from one another, not on sampling the analysis-uncertainty distribution.
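A rank histogram like the SREF example is built by ranking each verifying observation among its sorted ensemble members and tallying the ranks over many cases; a U shape means observations fall outside the ensemble too often, i.e., too little spread. The sketch below is illustrative only: it assumes NumPy, synthetic data, and random tie-breaking, and the function name is hypothetical rather than part of any NCEP tool.

```python
import numpy as np

def rank_histogram(ensemble, obs, seed=None):
    """Tally the rank of each observation within its ensemble.

    ensemble: (n_cases, n_members) array; obs: (n_cases,) array.
    Returns counts for the n_members + 1 possible ranks. Ties between the
    observation and members are broken at random (one common convention).
    """
    rng = np.random.default_rng(seed)
    ensemble = np.asarray(ensemble, dtype=float)
    obs = np.asarray(obs, dtype=float)[:, None]
    below = (ensemble < obs).sum(axis=1)       # members strictly below the observation
    ties = (ensemble == obs).sum(axis=1)       # members equal to the observation
    ranks = below + rng.integers(0, ties + 1)  # random slot among any tied members
    return np.bincount(ranks, minlength=ensemble.shape[1] + 1)

# Synthetic check: members are drawn from a narrower distribution (std 0.5)
# than the truth (std 1), so the observation often falls outside the ensemble.
rng = np.random.default_rng(0)
truth = rng.normal(size=5000)
members = rng.normal(scale=0.5, size=(5000, 10))
hist = rank_histogram(members, truth, seed=1)
print(hist)  # extreme ranks are overpopulated: a U shape
```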
The Breeding Technique (NCEP)

Breeding takes a pair of forecast perturbations, periodically rescales them down, and adds them to the new analysis state. The perturbations reflect analysis errors only crudely.

Wang and Bishop showed that a larger ensemble formed from independent pairs of bred members will be composed of pairs that have almost identical perturbation structures.
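The breeding cycle described above can be sketched on a toy model. This is a minimal illustration, not NCEP's implementation: it assumes the three-variable Lorenz (1963) system as a stand-in for a forecast model, a perfect-model trajectory as hypothetical "analyses" (no data assimilation), and an arbitrary rescaling size and cycle length.

```python
import numpy as np

def lorenz63(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz (1963) tendencies: a toy stand-in for a forecast model."""
    return np.array([sigma * (x[1] - x[0]),
                     x[0] * (rho - x[2]) - x[1],
                     x[0] * x[1] - beta * x[2]])

def integrate(x, n_steps, dt=0.01):
    """Advance the state n_steps with fourth-order Runge-Kutta."""
    for _ in range(n_steps):
        k1 = lorenz63(x)
        k2 = lorenz63(x + 0.5 * dt * k1)
        k3 = lorenz63(x + 0.5 * dt * k2)
        k4 = lorenz63(x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

def breed(analyses, size=0.1, steps_per_cycle=8, seed=0):
    """Breeding sketch: each cycle, run a control forecast from the analysis and a
    perturbed forecast from analysis + perturbation, rescale their difference back
    down to `size`, and carry it to the next analysis. `steps_per_cycle` must match
    the spacing of the analyses. Returns a +/- pair around the last analysis."""
    rng = np.random.default_rng(seed)
    pert = rng.normal(size=3)
    pert *= size / np.linalg.norm(pert)              # arbitrary starting perturbation
    for analysis in analyses[:-1]:
        control = integrate(analysis, steps_per_cycle)
        perturbed = integrate(analysis + pert, steps_per_cycle)
        diff = perturbed - control                   # perturbation after growth
        pert = diff * (size / np.linalg.norm(diff))  # rescale down
    last = analyses[-1]
    return last + pert, last - pert

# Hypothetical "analyses": states 8 steps apart along one model trajectory.
state = integrate(np.array([1.0, 1.0, 1.0]), 1000)  # spin up onto the attractor
analyses = []
for _ in range(50):
    analyses.append(state)
    state = integrate(state, 8)
plus, minus = breed(analyses)
print(plus, minus)
```

Because the perturbation is repeatedly grown and rescaled, it converges toward the fastest-growing directions of recent flow, which is why bred pairs from independent cycles can end up with nearly the same structure.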
Singular Vectors (ECMWF)

Singular vectors (SVs) here indicate the perturbation fields that are expected to grow most rapidly in a short-range forecast. The SV structure depends on the choice of norm.

Here, the leading three SVs have amplitude in only one area of the globe.
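Singular vectors are the right singular vectors of the linear propagator over the optimization interval, computed in whatever norm measures perturbation size, which is why the SV structure depends on the norm. A minimal sketch, assuming a small hypothetical propagator matrix in place of a tangent-linear model, the same norm at initial and final time, and simple diagonal weightings as the candidate norms:

```python
import numpy as np

def singular_vectors(M, W, k=1):
    """Leading k singular vectors of a linear propagator M when perturbation size
    is measured by ||x||_W = ||W x|| (same norm at both ends, an assumption here).

    Returns the growth factors and the initial-time SVs (columns) in model space."""
    W_inv = np.linalg.inv(W)
    _, s, vt = np.linalg.svd(W @ M @ W_inv)  # propagator in norm-weighted coordinates
    v = vt[:k].T                              # leading right singular vectors
    return s[:k], W_inv @ v                   # map back to model variables

# Hypothetical non-normal 2x2 propagator and two candidate norms.
M = np.array([[1.0, 3.0],
              [0.0, 1.0]])
g_a, sv_a = singular_vectors(M, np.eye(2))
g_b, sv_b = singular_vectors(M, np.diag([1.0, 0.1]))
print(g_a, sv_a[:, 0])
print(g_b, sv_b[:, 0] / np.linalg.norm(sv_b[:, 0]))
```

Under the second norm, amplitude in the second variable is counted as "cheap," so the leading SV places nearly all of its initial amplitude there.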

Singular vectors, continued

[Figure: Case 1 (9 Jan 1993); Case 2 (8 Feb 1993)]

SV initial perturbations tend to be concentrated in the mid-troposphere, with little amplitude near the surface or the tropopause. If SVs are meant to sample analysis errors, are analysis errors really near zero at these levels?

As SVs evolve, they develop amplitude aloft and near the surface.

Expect SV ensemble forecast spread to be unrepresentative in the early hours of the forecast (e.g., EF spread too small near the surface).
A better way to construct initial conditions?
Ensemble-based data assimilation

[Figure: schematic mapping First Guess 1, 2, …, N (with error stats) to Analysis 1, 2, …, N]

An ensemble of forecasts is used to define the error statistics of the first-guess forecast, and an ensemble of analyses is produced. If the system is designed correctly, these analyses sample the analysis uncertainty, can be used to initialize EFs, and have lower error.
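One way to produce such an ensemble of analyses is an ensemble square-root filter (EnSRF). Below is a minimal serial EnSRF update for observations with uncorrelated errors: the ensemble mean is updated with the usual Kalman gain, computed from flow-dependent ensemble covariances, and the perturbations with a reduced gain so the posterior spread is consistent. This is a sketch under stated assumptions: each observation directly samples one state variable (a hypothetical identity operator), and there is no covariance localization or inflation, which real systems need.

```python
import numpy as np

def ensrf_update(ens, obs, obs_idx, obs_var):
    """Serial EnSRF: assimilate observations one at a time.

    ens: (n_state, n_members) prior ensemble (first guesses), one member per column.
    obs, obs_idx, obs_var: observed values, the state index each one observes
    (a hypothetical identity observation operator), and error variances.
    No localization or inflation is applied."""
    ens = np.array(ens, dtype=float)            # work on a copy
    n_members = ens.shape[1]
    for y, j, r in zip(obs, obs_idx, obs_var):
        mean = ens.mean(axis=1)
        pert = ens - mean[:, None]
        hx = pert[j]                             # prior perturbations at the ob location
        hpht = hx @ hx / (n_members - 1)         # prior variance at the ob location
        pht = pert @ hx / (n_members - 1)        # flow-dependent covariances with the ob
        gain = pht / (hpht + r)                  # Kalman gain for the mean
        alpha = 1.0 / (1.0 + np.sqrt(r / (hpht + r)))  # reduction factor for perturbations
        mean = mean + gain * (y - mean[j])
        pert = pert - alpha * np.outer(gain, hx)
        ens = mean[:, None] + pert
    return ens

rng = np.random.default_rng(42)
prior = rng.normal(size=(3, 20))
posterior = ensrf_update(prior, obs=[0.5], obs_idx=[0], obs_var=[0.25])
print(posterior.mean(axis=1), posterior.var(axis=1, ddof=1))
```

For the observed variable, the posterior ensemble variance equals P R / (P + R), the Kalman-filter value, without perturbing the observation.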
Why might initial conditions from ensemble data assimilation be more accurate?

Flow-dependent error statistics for the first guess improve the blending of observations and forecasts.

Example: 500 hPa height analyses assimilating only surface pressure (SfcP) observations

[Figure: Reanalysis (3D-Var, 120,000+ obs); Ensemble filter “EnSRF” (214 surface pressure obs), RMS = 39.8 m; Interpolation (214 surface pressure obs), RMS = 82.4 m (3D-Var worse). Black dots show pressure ob locations.]
Perturb the land surface in EFs?

          Examining the IFs:
 (2) Is the forecast model anywhere
           close to perfect?
• Model errors
  –   due to limited resolution (truncation)
  –   due to parameterizations
  –   due to numerical methods choices
  –   etc.
• Manifestations
  – biases, especially near the surface, and in
  – slow growth of forecast differences among
    ensemble members due to coarse grid spacing
    and reduced scale interaction.

Dealing with model errors
(1) Better models (e.g., 4-km, 60-h WRF forecast for Katrina)
(2) Introduce a stochastic element into the model
(3) Multi-model ensembles
(4) Calibration

Examining the IFs:
(3) Can we run a nearly infinite ensemble?

Clearly not; computing resources are finite.
   -- large ensemble ↔ low resolution (small sampling error, larger model error).
   -- small ensemble ↔ higher resolution (large sampling error, smaller model error).
   -- the optimal size/resolution tradeoff may differ by variable (large ensemble for 500 hPa anomaly correlation, smaller ensemble for fine-scale precipitation events).
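The sampling-error side of this tradeoff can be quantified. If members are independent draws from the true forecast distribution, an event probability estimated as the fraction of N members exceeding a threshold has standard error sqrt(p(1 - p)/N), so halving the sampling error requires four times as many members. A short Monte Carlo check, assuming NumPy and independent members (the function name is illustrative):

```python
import numpy as np

def sampling_std(p_true, n_members, n_trials=20000, seed=0):
    """Spread of the probability estimated as (members exceeding) / n_members,
    when each member independently exceeds the threshold with probability p_true."""
    rng = np.random.default_rng(seed)
    counts = rng.binomial(n_members, p_true, size=n_trials)
    return (counts / n_members).std()

for n in (10, 50, 200):
    theory = np.sqrt(0.2 * 0.8 / n)             # binomial standard error
    print(f"{n:4d} members: Monte Carlo {sampling_std(0.2, n):.3f}, theory {theory:.3f}")
```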
Probabilistic Forecast Verification

[Figure: Relative Economic Value]

Many well-established methods now exist; they won't be reviewed here. Often, diagnostics are needed to understand specific errors in ensemble
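Relative economic value (Richardson 2000) compares a forecast's expense with that of climatology and of a perfect forecast, for a user who can pay a cost C to protect against a loss L. The sketch assumes the static cost-loss model, a cost-loss ratio strictly between 0 and 1, and a sample containing both events and non-events; for probability forecasts, each user is credited with the best probability threshold. Function names are hypothetical.

```python
import numpy as np

def relative_economic_value(hit_rate, false_alarm_rate, cost_loss, base_rate):
    """Value V in the static cost-loss model: 0 = climatology, 1 = perfect.

    Expenses are per unit loss L, with cost_loss = C/L (0 < C/L < 1) and
    base_rate = climatological frequency of the event."""
    a, s = cost_loss, base_rate
    e_climate = min(a, s)                 # cheaper of always / never protecting
    e_perfect = a * s                     # protect only on event days
    e_forecast = (a * (s * hit_rate + (1.0 - s) * false_alarm_rate)  # warnings cost C
                  + s * (1.0 - hit_rate))                            # misses cost L
    return (e_climate - e_forecast) / (e_climate - e_perfect)

def ensemble_value(probs, outcomes, cost_loss):
    """Best value over probability thresholds (protect when prob >= threshold)."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=bool)
    s = outcomes.mean()
    best = -np.inf
    for t in np.unique(probs):
        warn = probs >= t
        h = (warn & outcomes).sum() / outcomes.sum()       # hit rate
        f = (warn & ~outcomes).sum() / (~outcomes).sum()   # false alarm rate
        best = max(best, relative_economic_value(h, f, cost_loss, s))
    return best

print(relative_economic_value(0.8, 0.1, cost_loss=0.2, base_rate=0.1))
```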

A tool for exploring calibration issues:
     CDC’s “reforecast” data set
• Definition: a data set of retrospective numerical forecasts made with
  the same model used to generate real-time forecasts
• Model: T62L28 NCEP global forecast model, circa 1998
  (http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/refcst for details).
• Initial States: NCEP-NCAR reanalysis plus 7 +/- bred modes
  (Toth and Kalnay 1993).
• Duration: 15-day runs every day at 00Z from 1978/11/01 to the present.
• Data: Selected fields (winds, hgt, temp on 5 press levels, precip,
  t2m, u10m, v10m, pwat, prmsl, rh700, heating). NCEP/NCAR
  reanalysis verifying fields included (Web form to download at

Why reforecast? Bias structure can be difficult
  to evaluate with small forecast samples

New calibration technique:
Reforecasting with analogs
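The slide gives only the title, so the following is a generic illustration of analog calibration with a reforecast training set, not the published procedure: find the past reforecasts most similar to today's forecast over a small patch, and use the observations on those dates as a calibrated ensemble. The synthetic data, patch size, RMS similarity measure, and 50 analogs are all assumptions, and the function name is hypothetical.

```python
import numpy as np

def analog_probability(today, train_fcsts, train_obs, threshold, n_analogs=50):
    """Analog calibration sketch.

    today:       (n_points,) current forecast over a small patch.
    train_fcsts: (n_days, n_points) past reforecasts for the same patch and lead time.
    train_obs:   (n_days,) verifying observations at the patch center.
    Returns the probability of exceeding `threshold` and the analog ensemble."""
    rms = np.sqrt(((train_fcsts - today) ** 2).mean(axis=1))  # closeness to today
    nearest = np.argsort(rms)[:n_analogs]                     # best-matching past days
    analog_obs = train_obs[nearest]                           # what actually happened
    return (analog_obs > threshold).mean(), analog_obs

# Synthetic training data with a hypothetical systematic error: observations at
# the patch center are about half of what the model forecast.
rng = np.random.default_rng(1)
n_days, n_points = 5000, 9
base = rng.gamma(2.0, 3.0, size=n_days)
train_fcsts = base[:, None] + rng.normal(0.0, 0.5, size=(n_days, n_points))
train_obs = 0.5 * base + rng.normal(0.0, 0.5, size=n_days)
today = np.full(n_points, 10.0)
prob, analogs = analog_probability(today, train_fcsts, train_obs, threshold=8.0)
print(prob, analogs.mean())
```

The raw forecast of 10 exceeds the threshold of 8 everywhere, but in the training data such forecasts were followed by values near 5, so the calibrated probability is near zero. A long reforecast record is what makes enough close analogs available.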

Analog calibration results

$$\mathrm{BSS} = \frac{\text{resolution} - \text{reliability}}{\text{uncertainty}}$$
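The BSS expression follows from the standard decomposition of the Brier score, BS = reliability - resolution + uncertainty, with sample climatology as the reference forecast. When forecasts take a finite set of values (e.g., k/N from an N-member ensemble), the decomposition is exact, which the sketch below checks. It assumes NumPy and binary outcomes with both events and non-events present; the function name is illustrative.

```python
import numpy as np

def brier_decomposition(p, o):
    """Reliability, resolution, uncertainty, and BSS for probability forecasts p
    of binary outcomes o, binning by each distinct forecast value."""
    p = np.asarray(p, dtype=float)
    o = np.asarray(o, dtype=float)
    n = p.size
    obar = o.mean()                          # sample climatological frequency
    rel = res = 0.0
    for pk in np.unique(p):
        in_bin = p == pk
        nk = in_bin.sum()
        ok = o[in_bin].mean()                # observed frequency when pk was forecast
        rel += nk * (pk - ok) ** 2           # forecast vs. observed frequency
        res += nk * (ok - obar) ** 2         # departure of bin frequency from climatology
    rel, res = rel / n, res / n
    unc = obar * (1.0 - obar)
    bss = (res - rel) / unc
    return rel, res, unc, bss

p = np.array([0.0, 0.0, 0.5, 0.5, 1.0, 1.0, 0.5, 0.0])
o = np.array([0, 1, 1, 0, 1, 1, 0, 0])
rel, res, unc, bss = brier_decomposition(p, o)
print(rel, res, unc, bss)
```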

What are appropriate methods for summarizing information for end users? Forecasters?

[Figure: box-and-whisker forecast plot, annotated “Confident cold spell”]
• Box shows the 25-75% range
• Whiskers show the full range (or the 95% range after calibration)
• Central bar shows the median
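The numbers behind each box-and-whisker glyph are quantiles of the ensemble at that lead time. A minimal sketch, assuming NumPy's default linear percentile interpolation and reading "95% after calibration" as the central 2.5th-97.5th percentile range of an already calibrated sample; the function and key names are hypothetical.

```python
import numpy as np

def box_whisker_summary(members, calibrated=False):
    """Quantiles for one box-and-whisker glyph from ensemble members.

    Box: 25th-75th percentiles; bar: median; whiskers: full range, or the
    2.5th-97.5th percentiles (assumed meaning of "95%") when calibrated=True."""
    x = np.asarray(members, dtype=float)
    if calibrated:
        lo, hi = np.percentile(x, [2.5, 97.5])
    else:
        lo, hi = x.min(), x.max()
    q25, q50, q75 = np.percentile(x, [25, 50, 75])
    return {"whisker_low": lo, "box_low": q25, "median": q50,
            "box_high": q75, "whisker_high": hi}

raw = box_whisker_summary(np.arange(1, 52))
cal = box_whisker_summary(np.arange(1, 52), calibrated=True)
print(raw)
print(cal)
```

A narrow box and short whiskers, as in the "confident cold spell" annotation, indicate members in close agreement.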

Methods for summarizing probabilistic information, continued

           Probability Maps
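In its simplest form, a probability map is the fraction of ensemble members exceeding a threshold at each grid point. The sketch below computes this raw relative frequency, assuming an (n_members, ny, nx) NumPy array; calibrated maps would instead take their probabilities from a calibration method. The function name is hypothetical.

```python
import numpy as np

def exceedance_probability(ensemble, threshold):
    """Fraction of members above `threshold` at each grid point.

    ensemble: array of shape (n_members, ny, nx). Returns an (ny, nx) map."""
    return (np.asarray(ensemble) > threshold).mean(axis=0)

ens = np.array([[[0, 5], [10, 2]],
                [[1, 6], [0, 3]],
                [[2, 7], [9, 0]],
                [[3, 8], [8, 1]]], dtype=float)   # 4 members on a 2x2 grid
prob_map = exceedance_probability(ens, threshold=4.5)
print(prob_map)
```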

What isn’t very helpful to the end forecaster…

Spaghetti diagram: does widely spread spaghetti indicate unpredictability or a slack gradient?

[Figure: spaghetti diagrams of the 570 contour; two panels, each annotated “6 dm spread”]
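To first order, a height spread σ_z moves a contour by about σ_z divided by the local gradient magnitude, so the same 6 dm spread produces widely spread spaghetti where the gradient is slack and tight spaghetti where it is strong. A worked example, with hypothetical gradient values:

$$
\Delta x \approx \frac{\sigma_z}{\lvert \nabla z \rvert}, \qquad
\frac{6\ \mathrm{dm}}{6\ \mathrm{dm}\,/\,1000\ \mathrm{km}} = 1000\ \mathrm{km}, \qquad
\frac{6\ \mathrm{dm}}{6\ \mathrm{dm}\,/\,200\ \mathrm{km}} = 200\ \mathrm{km}.
$$

The same height uncertainty thus looks five times more spread out where the gradient is five times weaker.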
       Where EF is headed
• Use of ensembles in data assimilation / better
  methods for initializing EFs
• Sharing of data across countries (TIGGE) to
  do multi-model ensembling
• Higher-resolution ensembles with model-
  error parameterizations
• Improved calibration using reforecasts
• Increased use by sophisticated users (e.g.,
  coupling into hydrologic models)

                 References and notes
•   Page 3: Getting a “perfect” probabilistic forecast from an ensemble

    Ehrendorfer, M., 1994: The Liouville equation and its potential usefulness for the prediction of forecast skill. Part I:
    Theory. Mon. Wea. Rev., 122, 703-713.

•   Page 4: Do we sample the distribution of analysis uncertainty?

    Rank histograms from NCEP’s SREF web page, http://wwwt.emc.ncep.noaa.gov/mmb/SREF/SREF.html

•   Page 5: The breeding technique

    Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125, 3297-

•   Pages 6-7: Singular vectors.

    Buizza, R., and T. N. Palmer, 1995: The singular-vector structure of the atmospheric global circulation. J. Atmos.
    Sci., 52, 1434-1456.

    Barkmeijer, J., M. Van Gijzen, and F. Bouttier, 1998: Singular vectors and estimates of the analysis-error covariance
    metric. Quart. J. Royal Meteor. Soc., 124, 1697-1713.

•   Pages 8-10: Ensemble-based data assimilation

    Hamill, T. M., 2005: Ensemble-based atmospheric data assimilation. To appear in Predictability of Weather and Climate,
    Cambridge University Press, T. N. Palmer and R. Hagedorn, eds. Available at
    http://www.cdc.noaa.gov/people/tom.hamill/efda_review5.pdf .

    Whitaker, J. S., G. P. Compo, X. Wei, and T. M. Hamill, 2004: Reanalysis without radiosondes using ensemble data
    assimilation. Mon. Wea. Rev., 132, 1190-1200.

•   Page 11: Perturbing the land surface

    Sutton, C., T. M. Hamill, and T. T. Warner, 2005: Will perturbing soil moisture improve warm-season ensemble forecasts? A
    proof of concept. Mon. Wea. Rev., in review. Available from http://www.cdc.noaa.gov/people/tom.hamill/land_sfc_perts.pdf .

•   Pages 12-13: Model errors

    WRF forecast from http://wrf-model.org/plots/realtime_hurricane.php (try 2005-08-27, and rain mixing ratio, 60-h forecast)

    Stochastic element picture from Judith Berner’s presentation at the ECMWF workshop on the representation of sub-grid
    processes using stochastic-dynamic models.
    http://www.ecmwf.int/newsevents/meetings/workshops/2005/Sub-grid_Processes/Presentations.html .

    Hagedorn, R., F. J. Doblas-Reyes, and T. N. Palmer, 2005: The rationale behind the success of multi-model ensembles in
    seasonal forecasting – I. Basic concept. Tellus, 57A, 219-233. (multi-model ensemble picture)

    Hamill, T. M., J. S. Whitaker, and S. L. Mullen, 2005: Reforecasts, an important data set for improving weather predictions,
    Bull. Amer. Meteor. Soc., in press. Available at http://www.cdc.noaa.gov/people/tom.hamill/reforecast_bams4.pdf

•   Page 14: Can we run a nearly infinite ensemble?

    Mullen, S. L., and R. Buizza, 2002: The impact of horizontal resolution and ensemble size on probabilistic forecasts of
    precipitation by the ECMWF ensemble prediction system. Wea. Forecasting, 17, 173-191.

•   Page 15, probabilistic forecast verification

    Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. Mon. Wea. Rev., 129, 550-

    Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, 467 pp (section 7.4).

    Mason, I. B., 1982: A model for the assessment of weather forecasts. Aust. Meteor. Mag., 30, 291-303. (for the
    relative operating characteristic).

    Richardson, D. S., 2000: Skill and relative economic value of the ECMWF ensemble prediction system. Quart. J.
    Royal Meteor. Soc., 126, 649-667.

•   Pages 16-19: Calibration using reforecasts.

    Hamill, T. M., J. S. Whitaker, and S. L. Mullen, 2005: Reforecasts, an important data set for improving weather
    predictions, Bull. Amer. Meteor. Soc., in press. Available at http://www.cdc.noaa.gov/people/tom.hamill/reforecast_bams4.pdf

•   Page 20: Summarizing probabilistic information for end users:

    From Ken Mylne, UK Met Office; also ECMWF newsletter 92, available from www.ecmwf.int .

•   Page 21: Summarizing continued, probability maps

    From www.cdc.noaa.gov/reforecast/narr

•   Page 22: What isn’t very helpful (spaghetti diagrams)

    From http://www.cdc.noaa.gov/map/images/ens/ens.html .