                    Ensemble Verification
                    Multi-model ensembles
              A review of developments and plans

                      By Yuejian Zhu
                       August 2005
          Ensemble Verification – current status
•   NCEP global ensemble verification package used since 1995
     –   Comprehensive verification statistics computed against analysis fields
     –   Most skill scores posted on the web site and updated daily/frequently
     –   Inter-comparison with other NWP centers (mainly MSC and ECMWF)
     –   Statistics saved in ASCII (text) format for each initial forecast (and verifying analysis)
     –   Many of them use climatology
•   NCEP regional ensemble (SREF) verification package
     – Basic measures computed routinely since 1998
     – Probabilistic measures being developed independently from global ensemble
•   Unification of global and regional
     – Need to unify computation of global and regional ensemble verification measures
     – Run the unified code in NCEP daily operations (2005)
     – Unified framework must facilitate wide-scale national/international collaboration:
           • North American Ensemble Forecast System (collaboration with Met. Service Canada)
           • THORPEX International Research Program
           • WRF meso-scale ensemble developmental and operational activities
     – Facilitate wider community input in further development/enhancements
            • For example, establishing a basis for collaboration with NCAR, the statistical community, etc.
    Ensemble Verification – design specifications
•   Computation based on daily/6-hourly/3-hourly cycle
     – Store the computed scores
•   Compute statistics selected from a list of available measures
     – Point-wise measures, including:
            • RMS errors and PAC for individual ensemble members, the ensemble mean, and the median
           • Measure of reliability (Talagrand distribution and outlier, spread vs. RMS error, reliability
             part of BS, RPSS, etc)
           • Measure of resolution (ROC, IC, resolution part of BS, RPSS, potential EV, etc)
           • Combined measures of reliability and resolution (BSS, RPSS, etc)
     – Multivariate statistics, such as PECA (Perturbation versus Error Correlation
       Analysis; reference: Wei)
     – Variables and lead times – make available all those used by the ensemble
•   Aggregate statistics as chosen in time, space, etc
     –   Select time periods (seasonal, monthly, etc)
     –   Select spatial domain (pre-designed or user specified area)
     –   Select lead-time (optional)
     –   Select variables
•   Verify against observation/analysis
     – Scripts running the verification codes should handle both observations and analyses
     – Use the same subroutine to compute statistics for either one
     – Account for effect of observation/analysis uncertainty? (if possible)
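The "same subroutine for either" idea above can be sketched as follows; this is a minimal Python/NumPy illustration with hypothetical function names, not the operational code:

```python
import numpy as np

def point_stats(fcst_vals, verif_vals):
    """Core statistics routine: operates on matched (forecast, verifying
    value) pairs, regardless of whether the verifying values come from
    observations or an analysis."""
    err = np.asarray(fcst_vals) - np.asarray(verif_vals)
    return {"bias": err.mean(), "rmse": np.sqrt((err ** 2).mean())}

def verify_against_analysis(fcst_grid, anal_grid):
    # Against an analysis: both fields are on the same grid.
    return point_stats(fcst_grid.ravel(), anal_grid.ravel())

def verify_against_obs(fcst_grid, obs_vals, obs_idx):
    # Against observations: sample the forecast at the observation
    # locations first (nearest-grid-point indices here, a simplification).
    return point_stats(fcst_grid.ravel()[obs_idx], obs_vals)
```

Only the thin wrappers differ; the statistics themselves are computed by one shared routine, which keeps results comparable between the O and A cases.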
    Ensemble Verification – design specifications
•   Verify for different ensemble sizes
     – Combinations of multi-model ensembles
     – Variable ensemble sizes, for comparability
•   Define forecast/verification events by:
     – Observed/analyzed climatology, such as
          • 10 percentile thresholds in climate distribution
          • above/below normal
     – User specified thresholds – compute corresponding climate percentiles, such as
          • Precipitation greater than 1 inch per 24 hours
           • Temperature below freezing
          • Wind shear greater than 10m/s
     – Based on ensemble members (like Talagrand stats) – compute climate percentiles
•   Facilitate the use of benchmark
     – Climatology, persistence, extreme or user specified
     –   Short-, medium-, and long-range and climate forecasts
     – Operational and research community
•   Prioritize and find balance between
     – Flexibility vs. complexity
     – Operational vs. research use
     – Easy format vs. display
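The event-definition step above (user threshold mapped to a climate percentile, ensemble members turned into an event probability) can be sketched as follows; hypothetical function names, NumPy assumed:

```python
import numpy as np

def climate_percentile(threshold, climo_sample):
    """Map a user-specified physical threshold (e.g. precipitation of
    1 inch / 25.4 mm per 24 h) to its percentile in a climatological
    sample of the same variable."""
    return 100.0 * np.mean(np.asarray(climo_sample) <= threshold)

def event_probability(members, threshold):
    """Forecast probability of exceeding the threshold: the fraction
    of ensemble members above it."""
    return np.mean(np.asarray(members) > threshold)
```

With these two pieces, a user-specified event and a climatology-defined event reduce to the same probabilistic verification problem.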
    Ensemble Verification – development and plan
•   Design unified ensemble verification framework
     – Input data handling
          • Use standard WMO formats as data input
                 –   GRIB format for analysis, forecast and climatology
                 –   BUFR format for observation
           • Option to allow non-standard, user/institution-specified formats
     – Computation of statistics
          •   Establish required software functionalities (scripts)
          •   Build up required verification statistics program/subroutines (source codes)
          •   Jointly develop and share scripts/subroutines with standard input/output
          •   Comparable scientific results from independent investigators
     – Output daily statistics (discussion)
           • Adopt WMO-recommended format (if any)
           • VSDB format (as used by SREF)
          • User specified
     – Display of output statistics (optional)
          • Develop/adapt display software for interactive interrogation of output statistics
                 –   FVS display system
                 –   FSL approach to WRF verification
                 –   Others
    Ensemble Verification – development and plan
•   Develop and implement new verification framework
     – Utilize existing software and infrastructure where possible
          • Combine current global and regional verification software
          • Use existing climatology, develop new climatological distribution (anomalies)
     – NCEP new ensemble-related verification efforts
          • Direct all of them toward new framework if possible
     – Share NCEP work with the Meteorological Service of Canada for
           • North American Ensemble Forecast System (NAEFS) project
          • Mainly exchange the subroutines
     – Share NCEP work with interested collaborators
           • Forecast Systems Laboratory (FSL: statistical display tools)
          • Other institutions
     – Make new software available to national/international community
          • THORPEX international research program
          • WRF ensemble verification
          • Coordinate further development with wider community (WMO, other NWP centers)
                 Ensemble Forecasts
1. Why do we need ensemble forecasts?
   Consider the following schematic diagrams:
                   Ensemble Forecasts (continued)
                       Deterministic forecast
[Schematic: a single deterministic forecast starting from within the initial
uncertainty, compared against the verifying analysis]

           Prob. Evaluation (cost-loss analysis)
Based on hit rate (HR) and false alarm (FA) analysis
→ Economic Value (EV) of forecasts
[Figure: EV of the ensemble vs. the deterministic forecast; the ensemble shows
an average 2-day advantage]

                 Prob. Evaluation (useful tools)
Distinguishing cases of small and large uncertainty:
   1 day (large uncertainty) = 4 days (control) = 10-13 days (small uncertainty)
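The economic value derived from hit rate and false-alarm rate can be sketched as follows; a minimal illustration of the standard cost-loss formulation (hypothetical function name), not the operational code:

```python
def economic_value(hit_rate, false_alarm_rate, s, alpha):
    """Potential economic value for a user with cost/loss ratio alpha
    (0 < alpha < 1), given hit rate H, false-alarm rate F, and
    climatological event frequency s. V = 1 for a perfect forecast,
    V <= 0 when the forecast adds nothing over climatology."""
    num = (min(alpha, s) - false_alarm_rate * alpha * (1.0 - s)
           + hit_rate * s * (1.0 - alpha) - s)
    den = min(alpha, s) - s * alpha
    return num / den
```

Sweeping alpha over (0, 1) for each probability threshold traces out the EV curves compared in the figure.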
[Figure: Northern Hemisphere 500hPa geopotential height; pattern anomaly
correlation and root-mean-square error. By these simple measures the
ensemble mean has a one-day advantage]
[Figures: Talagrand-distribution outliers, partly due to model imperfection;
too many outliers when the spread is too small or biased, too few when the
spread is too big. ROC area, May-July 2002]
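The Talagrand (rank) histogram and its outlier statistic can be sketched as follows; a minimal NumPy illustration (hypothetical function name), not the operational code:

```python
import numpy as np

def rank_histogram(members, verif):
    """Talagrand (rank) histogram. members: (n_members, n_cases) array;
    verif: (n_cases,) verifying values. Each case gets a rank 0..n, i.e.
    how many sorted members fall below the verifying value (n+1 bins).
    A flat histogram indicates reliable spread; heavy end bins
    ('outliers') indicate too little spread or bias."""
    n = members.shape[0]
    ranks = (members < verif).sum(axis=0)      # rank per case, 0..n
    hist = np.bincount(ranks, minlength=n + 1)
    outlier_freq = (hist[0] + hist[-1]) / hist.sum()
    return hist, outlier_freq
```

For a perfectly reliable n-member ensemble the expected outlier frequency is 2/(n+1); values well above that signal the too-small-spread/bias case in the figure.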
           Prob. Evaluation (multi-categories)
4. Reliability and possible calibration (bias removal):
   evaluated for period precipitation
[Reliability diagram: calibrated vs. raw forecast, with the skill line,
resolution line, and climatological probability marked]
              Brier Skill Scores and decomposition
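The Brier score decomposition into reliability, resolution, and uncertainty can be sketched as follows; a minimal NumPy illustration (hypothetical function name). Note the identity BS = reliability - resolution + uncertainty holds exactly only when forecast probabilities are constant within each bin:

```python
import numpy as np

def brier_decomposition(probs, outcomes, n_bins=11):
    """Decompose the Brier score of probability forecasts (probs in
    [0, 1]) against binary outcomes (0/1) into its reliability and
    resolution parts plus the climatological uncertainty term."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    N = probs.size
    obar = outcomes.mean()                      # climatological frequency
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    which = np.clip(np.digitize(probs, edges) - 1, 0, n_bins - 1)
    rel = res = 0.0
    for k in range(n_bins):
        sel = which == k
        nk = sel.sum()
        if nk == 0:
            continue
        pk, ok = probs[sel].mean(), outcomes[sel].mean()
        rel += nk * (pk - ok) ** 2              # reliability part
        res += nk * (ok - obar) ** 2            # resolution part
    return rel / N, res / N, obar * (1.0 - obar)
```

The reliability part is what calibration (bias removal) reduces; the resolution part is unchanged by any monotone recalibration.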


ENSEMBLE SIZE (an important issue)
[Figures: RMS error and mean bias vs. ensemble size. For the 5-day forecast
the control (CTL) is better (dominant); for the 8-day forecast CTL is better
but less dominant. Over 1 year of data the lower-resolution ensemble is
better, reflecting the resolution difference]
NCEP ensemble mean performance for the past 5 years
                Ranked probability skill scores

NCEP ensemble probabilistic performance for the past 5 years

        Economic value for a 1:10 cost/loss ratio
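The ranked probability skill score behind these multi-year comparisons can be sketched as follows; a minimal NumPy illustration (hypothetical function names), not the operational code:

```python
import numpy as np

def rps(cum_probs, obs_category, n_cat):
    """Ranked probability score for one forecast: squared distance
    between the forecast and observed cumulative distributions over
    n_cat ordered categories. cum_probs is the forecast cumulative
    probability per category (last entry = 1)."""
    obs_cum = np.zeros(n_cat)
    obs_cum[obs_category:] = 1.0               # observed step function
    return np.sum((np.asarray(cum_probs) - obs_cum) ** 2)

def rpss(rps_fcst, rps_clim):
    """Skill score relative to a reference (e.g. climatology):
    1 = perfect, 0 = no better than the reference."""
    return 1.0 - rps_fcst / rps_clim
```

In practice both RPS values are averaged over many cases before forming the skill score; averaging the ratio case by case is a known pitfall.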
Multi-model ensemble (early study)
        Individual model performances

                            • EXPs: NCEP operational
                            • EXPd: DAO/NASA
                            • EXPm: MSC
                            • EXPn: NOGAPS
                            • Top: PAC for 500hPa 5-day;
                              NCEP is best
                            • Bottom left: RMS error
                            • Bottom right: bias; nearly
                              perfect bias for NOGAPS
Multi-model ensemble (early study)

                 • EXPs: NCEP operational
                 • EXPnd: NCEP+DAO
                 • EXPnm: NCEP+MSC
                 • EXPnn: NCEP+NOGAPS
                 • EXPnp: NCEP ensemble
                   (random one pair, lower
                 • All three multi-model
                   ensembles are better than
                   NCEP’s deterministic fcst
                 • NCEP+MSC is best
Daily comparison of NH 500 hPa height 5-day PAC scores
[Figure annotations: Diff. > 0.1 (10%) in 18 cases (MSC = 0.791);
 Diff. > 0.1 (10%) in 56 cases]
       Synoptic example of 500hPa height forecast
               Init: 2003021200  Valid: 2003021700
[Panels: NCEP analysis, NCEP forecast, DAO forecast, MSC forecast
 (MSC = 0.6068)]
Based on re-analysis monthly climatology
Based on 10 climatologically equally likely bins
       Multi-model ensembles
• NCEP and ECMWF (T12Z cycle only)
  – NCEP 10m ensemble vs. NCEP analysis
  – ECMWF 10m ensemble vs. ECMWF analysis
  – NCEP(6)+ECMWF(4) ensemble vs. NCEP analysis
  – ECMWF(6)+NCEP(4) ensemble vs. ECMWF analysis
• NCEP and CMC (T00Z cycle only)
  – NCEP 10m ensemble vs. NCEP analysis
  – CMC 10m ensemble vs. CMC analysis
  – NCEP(6)+CMC(4) ensemble vs. NCEP analysis
  – CMC(6)+NCEP(4) ensemble vs. CMC analysis
