Introduction to Forecast Verification

Laurence J. Wilson
MSC, Dorval, Quebec
                        Outline
• Why verify: Principles, goodness and goals of verification
• General Framework for verification (Murphy-Winkler)
   – Joint and conditional distributions
   – Scores and measures in context of the framework
   – Murphy’s attributes of forecasts
• Value of forecasts
• Model Verification
   – Data matching issues
   – Verification issues
• Verification of probability distributions
• Tricks of the trade
• Conclusion
                                    References
     •   Wilks, D.S., 1995: Statistical methods in the atmospheric sciences. Academic
         Press. Chapter 7.
     •   Stanski, H.R., L.J. Wilson, and W.R. Burrows, 1990: Survey of common
         verification methods in meteorology. WMO WWW Technical Report No. 8.
         Also available on the web – see below.
     •   Jolliffe, I.T. and D.B. Stephenson, 2003: Forecast verification: A practitioner’s
         guide in atmospheric science. Wiley.
     •   Murphy, A.H. and R.L. Winkler, 1987: A general framework for forecast
         verification. Mon. Wea. Rev. 115, 1330-1338.
     •   Murphy, A.H., 1993: What is a good forecast? An essay on the nature of
         goodness in weather forecasting. Wea. Forecasting 8, 281-293.

     Web Page of the Joint WGNE/WWRP Verification Working Group:

http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/verif_web_page.html
     It has many links and a wealth of information (and is updated frequently).
     ANNOUNCEMENT: This group is organizing a workshop on Verification
        methods in or near Montreal, Quebec, September 13 to 17, 2004.
           Evaluation of forecasts

• Murphy’s “goodness”
  – CONSISTENCY: forecasts agree with forecaster’s true
    belief about the future weather [strictly proper]
  – QUALITY: correspondence between observations and
    forecasts [verification]
  – VALUE: increase or decrease in economic or other
    kind of value to someone as a result of using the
    forecast [decision theory]
        Evaluation of forecast system

• Evaluation of forecast “goodness”
• Evaluation of delivery system
   – timeliness (are forecasts issued in time to be useful?)
   – relevance (are forecasts delivered to intended users in a
     form they can understand and use?)
   – robustness (level of errors or failures in the delivery of
     forecasts)
      Principles of (Objective) Verification
• Verification activity has value only if the information generated leads
  to a decision about the forecast or system being verified
    – User of the information must be identified
    – Purpose of the verification must be known in advance
• No single verification measure provides complete information about
  the quality of a forecast product.
• Forecast must be stated in such a way that it can be verified
    – “chance” of showers
    – What does that gridpoint value really mean?
• Except for specific validation studies, verification should be carried
  out independently of the issuer of the product.
               Goals of Verification
• Administrative
   – Justify cost of provision of weather services
   – Justify additional or new equipment
   – Monitor the quality of forecasts and track changes
• Scientific
   – To identify the strengths and weaknesses of a forecast
     product in sufficient detail that actions can be specified
      that will lead to improvements in the product, i.e., to
     provide information to direct R&D.
          Verification Model (cont’d)

• Predictand Types
  – Continuous: Forecast is a specific value of the variable
     • wind
     • temperature
     • upper air variables
  – Categorical/probabilistic: Forecast is the probability of
    occurrence of ranges of values of the variable
    (categories)
      •   POP and other weather elements (fog, etc.)
     •   Precipitation type
     •   cloud amount
     •   precipitation amount
  – Probability distributions (ensembles)
Framework for Verification (Murphy-Winkler)
• All verification information contained in the joint distribution of
  forecasts and observations
• Factorizations:
   – calibration-refinement
      p(f,x) = p(x|f) p(f)
      p(x|f) = conditional distribution of observation given forecast
      (calibration/reliability)
      p(f) = marginal distribution of forecasts (refinement)

    – likelihood-base rate
      p(f,x) = p(f|x)p(x)
      p(f|x) = conditional distribution of forecasts given observations
      (likelihood/discrimination)
      p(x) = marginal distribution of observations (base rate/climatology)
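To make the two factorizations concrete, here is a minimal Python sketch (all data hypothetical) that estimates the joint distribution of binary forecasts and observations and derives both factorizations from it:

```python
import numpy as np

# Hypothetical binary forecasts f and observations x (1 = event, 0 = no event)
f = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
x = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])
n = len(f)

# Joint distribution p(f, x): relative frequency of each (f, x) pair
p_joint = np.zeros((2, 2))
for fi, xi in zip(f, x):
    p_joint[fi, xi] += 1.0 / n

p_f = p_joint.sum(axis=1)  # marginal p(f): refinement
p_x = p_joint.sum(axis=0)  # marginal p(x): base rate / climatology

# Calibration-refinement factorization: p(x|f) = p(f, x) / p(f)
p_x_given_f = p_joint / p_f[:, None]
# Likelihood-base rate factorization: p(f|x) = p(f, x) / p(x)
p_f_given_x = p_joint / p_x[None, :]

print("p(x|f):\n", p_x_given_f)
print("p(f|x):\n", p_f_given_x)
```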
              Verification Samples

• Joint distribution of forecasts and observations may be:
   – A time series at points
   – One or more spatial fields
   – A combination of these
• In meteorological applications:
   – The events of the sample are not usually even close to
     independent in the statistical sense
   – Importance of carefully assessed confidence limits of
     verification results
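One common way to obtain such confidence limits when the cases are serially dependent is a moving-block bootstrap (a standard technique, not specific to this lecture). A minimal sketch, with hypothetical errors:

```python
import numpy as np

rng = np.random.default_rng(0)

def block_bootstrap_ci(errors, block_len=10, n_boot=1000, alpha=0.05):
    """Percentile confidence interval for the MAE of a serially
    correlated error series, via a moving-block bootstrap: resampling
    whole blocks roughly preserves the dependence within blocks."""
    errors = np.asarray(errors, float)
    n = len(errors)
    starts = np.arange(n - block_len + 1)
    n_blocks = int(np.ceil(n / block_len))
    maes = []
    for _ in range(n_boot):
        idx = rng.choice(starts, size=n_blocks)
        resample = np.concatenate([errors[s:s + block_len] for s in idx])[:n]
        maes.append(np.mean(np.abs(resample)))
    return np.quantile(maes, [alpha / 2, 1 - alpha / 2])

# Hypothetical year of daily temperature forecast errors
errors = rng.normal(0.5, 1.5, size=365)
print("95% confidence interval for MAE:", block_bootstrap_ci(errors))
```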
Spot Temperature Scatter Plot Exercise
Contingency Tables - Basic 2 x 2

                     Forecast Yes        Forecast No            Row total
Observed Yes         HITS                MISSES                 Total events observed
Observed No          FALSE ALARMS        CORRECT NEGATIVES      Total non-events observed
Column total         Total events fcst   Total non-events fcst  Sample size
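All the common yes/no scores follow from the four cells of this table. A minimal sketch, with hypothetical counts:

```python
# Hypothetical 2 x 2 contingency-table counts
hits, misses, false_alarms, correct_negatives = 82, 38, 23, 222
n = hits + misses + false_alarms + correct_negatives

hit_rate = hits / (hits + misses)                    # POD: fraction of observed events forecast
false_alarm_rate = false_alarms / (false_alarms + correct_negatives)  # POFD
false_alarm_ratio = false_alarms / (hits + false_alarms)
frequency_bias = (hits + false_alarms) / (hits + misses)
proportion_correct = (hits + correct_negatives) / n

print(f"POD={hit_rate:.2f}  POFD={false_alarm_rate:.2f}  "
      f"FAR={false_alarm_ratio:.2f}  bias={frequency_bias:.2f}  "
      f"PC={proportion_correct:.2f}")
```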
Verification - A general model: Murphy's attributes

1. Bias: correspondence between the mean forecast and the mean observation.
   Related measures: bias (mean forecast probability minus sample observed
   frequency).

2. Association: strength of the linear relationship between pairs of
   forecasts and observations.
   Related measures: covariance, correlation.

3. Accuracy: average correspondence between individual pairs of
   observations and forecasts.
   Related measures: mean absolute error (MAE), mean squared error (MSE),
   root mean squared error (RMSE), Brier score (BS).

4. Skill: accuracy of the forecasts relative to the accuracy of forecasts
   produced by a standard method.
   Related measures: Brier skill score, others in the usual skill score
   format.
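A minimal sketch (hypothetical forecast-observation pairs) computing the measures attached to attributes 1-3 for a continuous predictand:

```python
import numpy as np

# Hypothetical matched pairs of continuous forecasts and observations
fcst = np.array([12.1, 8.4, 15.0, 10.2, 7.7, 13.3])
obs  = np.array([11.0, 9.1, 13.8, 10.9, 6.5, 14.0])

bias = np.mean(fcst) - np.mean(obs)       # attribute 1: bias
cov  = np.cov(fcst, obs)[0, 1]            # attribute 2: association (covariance)
corr = np.corrcoef(fcst, obs)[0, 1]       # attribute 2: association (correlation)
mae  = np.mean(np.abs(fcst - obs))        # attribute 3: accuracy (MAE)
mse  = np.mean((fcst - obs) ** 2)         # attribute 3: accuracy (MSE)
rmse = np.sqrt(mse)

print(f"bias={bias:.2f}  cov={cov:.2f}  r={corr:.2f}  "
      f"MAE={mae:.2f}  RMSE={rmse:.2f}")
```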
                     Skill Scores

• Format:

     Skill = (Sc - SSc) / (PSc - SSc)

• where Sc = score of the forecast system (MAE, Brier, etc.)
• PSc = score for a perfect forecast
• SSc = score for a standard (unskilled) forecast
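The same format in code, a minimal sketch (the Brier scores below are hypothetical; for the Brier score the perfect value is 0 and climatology is a common standard):

```python
def skill_score(score, perfect_score, standard_score):
    """Generic skill score: the fraction of the possible improvement over
    the standard (unskilled) forecast actually achieved.
    1 = perfect, 0 = no better than the standard, < 0 = worse."""
    return (score - standard_score) / (perfect_score - standard_score)

# Example: Brier skill score with climatology as the standard forecast
bs_forecast, bs_climatology = 0.10, 0.16
print(skill_score(bs_forecast, 0.0, bs_climatology))  # 0.375
```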
5. Reliability: correspondence of the conditional mean observation and the
   conditioning forecast, averaged over all forecasts.
   Related measures: reliability component of the BS; MAE, MSE of binned
   data from the reliability table.

6. Resolution: difference between the conditional mean observation and the
   unconditional mean observation, averaged over all forecasts.
   Related measures: resolution component of the BS.

7. Sharpness: variability of the forecasts as described by the distribution
   of forecasts.
   Related measures: variance of the forecasts.

8. Discrimination: difference between the conditional mean forecast and the
   unconditional mean forecast, averaged over all observations.
   Related measures: area under the ROC; measures of separation of the
   conditional distributions; MAE, MSE of the scatter plot, binned by
   observation value.

9. Uncertainty: variability of the observations as described by the
   distribution of observations.
   Related measures: variance of the observations.
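Reliability, resolution, and uncertainty (attributes 5, 6, and 9) are the three terms of the standard Murphy decomposition of the Brier score. A minimal sketch for binned probability forecasts (all data hypothetical; with binning the decomposition is approximate):

```python
import numpy as np

def brier_decomposition(prob, outcome, bins=np.linspace(0, 1, 11)):
    """Murphy decomposition of the Brier score:
    BS = reliability - resolution + uncertainty (approximately,
    once the forecasts are binned)."""
    prob, outcome = np.asarray(prob, float), np.asarray(outcome, float)
    n, obar = len(prob), outcome.mean()
    rel = res = 0.0
    which = np.digitize(prob, bins[1:-1])   # bin index for each forecast
    for k in range(len(bins) - 1):
        in_bin = which == k
        nk = in_bin.sum()
        if nk == 0:
            continue
        fk = prob[in_bin].mean()            # mean forecast in the bin
        ok = outcome[in_bin].mean()         # conditional observed frequency
        rel += nk * (fk - ok) ** 2 / n      # attribute 5: reliability
        res += nk * (ok - obar) ** 2 / n    # attribute 6: resolution
    unc = obar * (1 - obar)                 # attribute 9: uncertainty
    return rel, res, unc

# Hypothetical probability forecasts, well calibrated by construction
rng = np.random.default_rng(1)
p = rng.uniform(0, 1, 500)
y = (rng.uniform(0, 1, 500) < p).astype(float)
rel, res, unc = brier_decomposition(p, y)
bs = np.mean((p - y) ** 2)
print(f"BS={bs:.3f}  rel={rel:.3f}  res={res:.3f}  unc={unc:.3f}  "
      f"rel-res+unc={rel - res + unc:.3f}")
```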
ROC - ECMWF Ensemble Forecasts, 24 h POP (> 1 mm)

[Figure: Relative Operating Characteristic for 24 h precipitation > 1 mm
(European obs): hit rate vs. false alarm rate for the 96 h, 144 h, and
240 h forecasts, with the 0-skill diagonal. Az (area under the ROC) and
Da: 96 h, 0.839 and 1.400; 144 h, 0.777 and 1.077; 240 h, 0.709 and
0.780. Side panels: likelihood diagrams (cases vs. forecast probability,
pcpn and no pcpn) for the 96 h, 144 h, and 240 h precipitation
forecasts.]
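A minimal sketch (hypothetical data) of how such a ROC is built from probability forecasts: step a threshold through the probabilities, compute hit rate and false alarm rate from the resulting 2 x 2 tables, and integrate the curve by trapezoids for the area:

```python
import numpy as np

def roc_points(prob, event, thresholds=np.arange(0.05, 1.0, 0.1)):
    """Hit rate and false alarm rate for each probability threshold."""
    prob, event = np.asarray(prob), np.asarray(event, bool)
    far, hr = [1.0], [1.0]              # threshold 0: always forecast "yes"
    for t in sorted(thresholds):
        yes = prob >= t
        hr.append((yes & event).sum() / event.sum())
        far.append((yes & ~event).sum() / (~event).sum())
    far.append(0.0)
    hr.append(0.0)                      # threshold > max: never forecast "yes"
    return np.array(far), np.array(hr)

def roc_area(far, hr):
    """Area under the ROC curve (Az) by the trapezoid rule."""
    order = np.argsort(far)
    f, h = far[order], hr[order]
    return np.sum((f[1:] - f[:-1]) * (h[1:] + h[:-1]) / 2.0)

# Hypothetical probabilities that discriminate moderately well
rng = np.random.default_rng(2)
event = rng.uniform(size=1000) < 0.3
prob = np.clip(np.where(event, 0.55, 0.25) + rng.normal(0, 0.2, 1000), 0, 1)
far, hr = roc_points(prob, event)
print(f"Area under ROC (Az) = {roc_area(far, hr):.3f}")
```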
Current ROC – Canadian EPS
        Spring 2003
                      Summary

• The choice of verification measure depends on:
   – The purpose of the verification (administrative vs. scientific)
   – The nature of the predictand
   – The attributes of the forecast to be measured
           The meaning of ‘Value’

• “Weather forecasts possess no intrinsic value in an
  economic sense. They acquire value by
  influencing the behaviour of individuals or
  organizations (“users”) whose activities are
  sensitive to weather.”
   – Allan Murphy, Conference on economic benefits of
     Meteorological and Hydrological services (Geneva,
     1994)



                  Types of “Value”
• Social value - minimization of hazards to human
  life and health
   – Value to individual users
• Economic value of forecasts
   –   Value to a specific business
   –   Value to a weather-sensitive industry
   –   Value to a weather-sensitive sector
   –   Value to the economy of a country
   –   Market value (e.g. futures)
• Environmental value
   – minimizing risk to the environment
   – optimal use of resources
               Value vs. Quality

• Quality refers only to forecast verification; value
  involves a user
• A perfect forecast may have no value if no one
  cares about it
• An imperfect forecast will have less value than a
  perfect forecast
• See Murphy and Ehrendorfer 1987


                 Measuring value
• The cost-loss decision model (see the sketch after this list)
   –   focuses on maximizing gain or loss avoidance
   –   requires objective cost information from the user
   –   user-specific, difficult to generalize
   –   economic value to a weather-sensitive operation only
   –   easy to evaluate relative value
• Contingent-valuation method
   – focuses on demand for service and “willingness to pay”
   – requires surveys of users to determine variations in demand as
     function of variations in price and/or quality of service
   – less user-specific; a larger cross-section of users/industries can be
     evaluated in one study
   – measures in terms of perception rather than actual accuracy
   – e.g., evaluation of ATADs: Rollins and Shaykewich, Met. Apps.,
     March 2003
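As an illustration of the cost-loss model (the standard static formulation; the counts and costs below are hypothetical), relative value compares the user's mean expense when following the forecasts with the expenses of the best climatological strategy and of perfect forecasts:

```python
def relative_value(hits, misses, false_alarms, correct_negatives, cost, loss):
    """Relative economic value of yes/no forecasts for a user who can
    protect at a given cost against a potential loss (simple static
    cost-loss model). 1 = value of perfect forecasts, 0 = no better
    than the best climatological strategy."""
    n = hits + misses + false_alarms + correct_negatives
    base_rate = (hits + misses) / n
    # Mean expense per occasion when acting on the forecasts:
    e_forecast = (cost * (hits + false_alarms) + loss * misses) / n
    # Best fixed strategy from climatology: always protect or never protect
    e_climate = min(cost, base_rate * loss)
    # Perfect forecasts: protect only when the event occurs
    e_perfect = base_rate * cost
    return (e_climate - e_forecast) / (e_climate - e_perfect)

# Hypothetical 2 x 2 counts and a user with cost/loss ratio 0.25
print(relative_value(82, 38, 23, 222, cost=25.0, loss=100.0))
```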
             Model Verification

• Data matching issues
• Verification issues
  Model Verification – Data Matching Issues
• Typical situation: Model gridded forecasts, observations at
  points
   – Point in situ observations undersample the field, yet contain
     information on all spatial and most temporal scales
     (“representativeness error”? Not really.)
• Alternatives:
   – Model to data: What does the model predict at the obs point?
       • Interpolation – if the model gives point predictions
            – Gives answers at all verification points
       • Nearest gridpoint value – if the model prediction is a grid box average
            – Verify only those grid boxes where there is at least one obs
            – UPSCALING: - estimate grid box average using all obs in grid
              box.
   – Data to model: analysing the point data:
       • NOT RECOMMENDED, because it treats networks of point
         observations as if they contain information only on the scales
         represented by the grid on which the analysis is done.
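A minimal sketch of the upscaling option above (the helper is hypothetical, for illustration only): estimate a grid-box average from all observations falling in the box, and verify only boxes that contain at least one observation:

```python
import numpy as np

def upscale_to_grid(obs_lat, obs_lon, obs_val, lat_edges, lon_edges):
    """Grid-box averages estimated as the mean of all point observations
    in each box; boxes with no observations stay NaN and are skipped
    in the verification."""
    nlat, nlon = len(lat_edges) - 1, len(lon_edges) - 1
    grid = np.full((nlat, nlon), np.nan)
    i = np.digitize(obs_lat, lat_edges) - 1   # box row of each observation
    j = np.digitize(obs_lon, lon_edges) - 1   # box column of each observation
    for bi in range(nlat):
        for bj in range(nlon):
            in_box = (i == bi) & (j == bj)
            if in_box.any():
                grid[bi, bj] = obs_val[in_box].mean()
    return grid

# Hypothetical stations upscaled to a small 2-degree grid
rng = np.random.default_rng(4)
lat, lon = rng.uniform(44, 50, 200), rng.uniform(-80, -72, 200)
vals = rng.normal(10, 3, 200)
print(upscale_to_grid(lat, lon, vals,
                      np.arange(44, 51, 2.0), np.arange(-80, -71, 2.0)))
```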
                Model Verification - Issues




• Resolution – scale separation
• Spatial verification – object-oriented
  Model Verification – Scale separation
• Mesoscale verification: Separating model errors
  due to resolution from other errors.
• If high-resolution spatial data are available:
   – Scale separation, wavelets or other method (Mike
     Baldwin)
   – Repeat verification on same dataset at different scales
     to get performance curve
   – Data combination: Use high resolution data to “inform”
     statistical estimates such as grid box averages.
                Spatial Verification
• Object-oriented methods
• The calculation of displacement and size errors for
  specific objects (e.g., rain areas, fronts)
   – Hoffman, 1995; Ebert and McBride 2000 CRA method
      • Decomposition of errors into location, shape, size components
   – Others (Mike Baldwin)
   – The problem is always the matching of the forecast and
     the observed object.
 Verification of probability distributions

• Problem:
   – Comparison of distribution with a single outcome
• Solutions:
   – Verify density in vicinity of observation (Wilson, Burrows and
     Lanzinger, 1999)
   – Verify cdf against observation represented as cdf (CRPS, Hersbach
     2000)
   – Extract probabilities from distribution and verify as probability
     forecasts (sample several thresholds)
   – Compare parameters of pooled distribution with sample
     climatology (Talagrand diagram)
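To make the CDF-against-CDF comparison concrete, here is a minimal sketch of the CRPS for an ensemble forecast, using the standard kernel form CRPS = E|X - y| - 0.5 E|X - X'| for the empirical ensemble distribution (Hersbach 2000 gives an equivalent decomposition; the numbers below are hypothetical):

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of an ensemble forecast against a single observation,
    treating the ensemble as an empirical distribution:
    CRPS = E|X - y| - 0.5 * E|X - X'|."""
    x = np.asarray(members, float)
    term1 = np.mean(np.abs(x - obs))                      # E|X - y|
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))  # 0.5 E|X - X'|
    return term1 - term2

# Hypothetical 10-member temperature ensemble and verifying observation
members = [11.2, 12.5, 10.8, 13.1, 12.0, 11.7, 12.9, 10.5, 13.4, 12.2]
print(f"CRPS = {crps_ensemble(members, 12.8):.3f}")
```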
Ensemble verification - distribution: CRPS example

[Figure: CDFs of the forecast (ensemble) and of the observation, the
latter a step function at the observed value X; the CRPS is the integral
of the squared difference between the two curves.]
Rank Histogram example
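A minimal sketch (hypothetical data) of how a rank histogram (Talagrand diagram) is built: for each case, find the rank of the observation within the sorted ensemble, then histogram the ranks; a flat histogram suggests the observations are statistically indistinguishable from the members:

```python
import numpy as np

def rank_histogram(ens, obs):
    """Counts of the observation's rank within each sorted ensemble
    (ties ignored); ens has shape (n_cases, n_members), obs (n_cases,)."""
    ens, obs = np.asarray(ens), np.asarray(obs)
    ranks = np.sum(ens < obs[:, None], axis=1)        # rank 0..n_members
    return np.bincount(ranks, minlength=ens.shape[1] + 1)

# Hypothetical: 1000 cases, 10 members, observations drawn from the
# same distribution as the members, so the histogram should be near-flat
rng = np.random.default_rng(3)
ens = rng.normal(size=(1000, 10))
obs = rng.normal(size=1000)
print(rank_histogram(ens, obs))
```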
                         Tricks of the trade
• “How can I get a better (higher) number?”
    – Remove the bias before calculating scores (works really well for quadratic
      scoring rules) and don’t tell anyone.
    – Claim that the model predicts grid-box averages, even if it doesn’t. Make
      the boxes as large as possible.
    – Never use observation data; it only contains a lot of “noise”. As an
      alternative:
         • Verify against an analysis that uses the model being verified as a trial field.
           Works best in data-sparse areas.
         • Use a shorter-range forecast from the model being verified and call it
           observation data.
    – Design a new or modified score. Don’t be bothered by restrictions such as
      strict propriety. Then the values can be as high as desired.
    – Stratify the verification data using a posteriori rules. One can always get
      rid of pathological cases that bring down the average.
    – When comparing the performance of your model against others, make
      sure it is your analysis that is used as the verifying one.
    – Always insist on doing the verification of your own products….
• Remember, you already know you have a good product. The goal of
  the verification is to show it “objectively”.
                  Conclusions

• Quick survey of verification and value assessment
• A data-oriented perspective

				