Forecast Verification
Presenter: Neil Plummer, National Climate Centre
Lead Author: Scott Power, Bureau of Meteorology Research Centre
Acknowledgements: A. Watkins, D. Jones, P. Reid, NCC
Introduction
Verification: what is it and why is it important?
Terminology
Potential problems
Comparing various measures
Assisting users of climate information
What is verification?
“check truth or correctness of”
“process of determining the quality of forecasts”
“objective analysis of degree to which a series of forecasts compares and contrasts with the equivalent observations of a given period”
Why bother with verification?
Scientific and administrative support
o is a new system better?
o assist with consensus forecasts
Application of forecasts
o “how good are your forecasts?”
o “should I use them?”
o can be used to help estimate value
Terminology can be confusing
Verification is made a little tricky by the fact that everyday words are used to describe quantities with a precise statistical meaning. Common words include:
o accuracy
o skill
o reliability
o bias
o value
o hit rates, percent consistent, false alarm rate, ...
all have special meanings in statistics
Accuracy
Average correspondence between forecasts and observations
Measures
o mean absolute error, root mean square error
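As a minimal sketch of the two accuracy measures named above, the following computes both for a handful of forecast/observation pairs. The function names and the temperature values are made up for illustration and do not come from the slides.

```python
import math

def mean_absolute_error(forecasts, observations):
    """Average of |forecast - observation| over all pairs."""
    errors = [abs(f - o) for f, o in zip(forecasts, observations)]
    return sum(errors) / len(errors)

def root_mean_square_error(forecasts, observations):
    """Square root of the average squared error; large errors count more."""
    squared = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up temperature forecasts and observations (deg C), illustration only.
forecasts = [26.0, 24.0, 30.0, 22.0]
observations = [25.0, 27.0, 30.0, 20.0]

mae = mean_absolute_error(forecasts, observations)
rmse = root_mean_square_error(forecasts, observations)
print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

RMSE is never smaller than MAE for the same data, because squaring gives the largest errors more weight.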
Bias
Correspondence between the average forecast and the average observation
o e.g. average forecast - average value of observation
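The difference of averages above can be sketched in a few lines. The data are made up and the function name `mean_bias` is hypothetical; the values were chosen so that the bias is zero even though individual forecasts miss by up to 3 degrees, which is one reason bias alone does not describe accuracy.

```python
def mean_bias(forecasts, observations):
    """Average forecast minus average observation."""
    mean_forecast = sum(forecasts) / len(forecasts)
    mean_observation = sum(observations) / len(observations)
    return mean_forecast - mean_observation

# Made-up temperature forecasts and observations (deg C), illustration only.
forecasts = [26.0, 24.0, 30.0, 22.0]
observations = [25.0, 27.0, 30.0, 20.0]

bias = mean_bias(forecasts, observations)
largest_miss = max(abs(f - o) for f, o in zip(forecasts, observations))
print(f"bias = {bias:.2f}, largest single error = {largest_miss:.1f}")
```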
Skill
Accuracy of forecasts relative to accuracy of forecasts using a reference method (e.g. guessing, persistence, climatology, damped persistence, …)
Measures
o numerous!
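The slides do not single out one skill measure. As one common form, assumed here purely for illustration, a skill score can be written as 1 minus the ratio of the forecast's mean square error to that of the reference method. The data are made up, and the climatology reference is computed from the same observations for brevity, which a real verification would avoid.

```python
def mean_square_error(forecasts, observations):
    """Average squared difference between forecasts and observations."""
    pairs = zip(forecasts, observations)
    return sum((f - o) ** 2 for f, o in pairs) / len(observations)

def skill_score(forecasts, reference_forecasts, observations):
    """1 = perfect, 0 = no better than the reference, negative = worse."""
    mse_forecast = mean_square_error(forecasts, observations)
    mse_reference = mean_square_error(reference_forecasts, observations)
    return 1.0 - mse_forecast / mse_reference

# Made-up data, illustration only.
observations = [25.0, 27.0, 30.0, 20.0]
forecasts = [26.0, 24.0, 30.0, 22.0]

# Reference method: always forecast the long-term average (climatology).
# Simplification: the average is taken from the same observations.
climatology = [sum(observations) / len(observations)] * len(observations)

score = skill_score(forecasts, climatology, observations)
print(f"skill relative to climatology = {score:.2f}")
```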
Reliability
Degree of correspondence between the average observation, given a particular forecast, and that forecast taken over all forecasts
e.g. suppose forecasts of “10% or 30% or … or 70% or … chance of rain tomorrow” are routinely issued for many years
if we go back through all of the forecasts, looking for occasions when a forecast probability of 70% was issued, then we would expect to find rainfall on 70% of those occasions if the forecast system is “reliable”
this is often not the case
[Figure: Reliability Graph. Axes: Probability and Reliability, each from 0 to 1. Series: Seasonal Forecast, Reference.]
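The check described above (go back through the record, group the occasions by the probability that was issued, and see how often the event occurred) can be sketched as follows. The function name and the 20-day record are made up for illustration.

```python
from collections import defaultdict

def reliability_table(forecast_probs, outcomes):
    """Map each issued probability to the observed frequency of the event."""
    counts = defaultdict(lambda: [0, 0])  # prob -> [times issued, times event occurred]
    for prob, occurred in zip(forecast_probs, outcomes):
        counts[prob][0] += 1
        counts[prob][1] += int(occurred)
    return {prob: hits / issued for prob, (issued, hits) in sorted(counts.items())}

# Made-up record: probability of rain issued each day, and whether it rained.
forecast_probs = [0.7] * 10 + [0.3] * 10
outcomes = [True] * 5 + [False] * 5 + [True] * 3 + [False] * 7

table = reliability_table(forecast_probs, outcomes)
for prob, freq in table.items():
    print(f"forecast {prob:.0%}: rain observed on {freq:.0%} of occasions")
```

In this made-up record the 30% forecasts are reliable, while the 70% forecasts are over-confident: rain followed on only half of those occasions.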
Value
Impact that prudent use of a given forecast scheme has on the user’s profits, COMPARED WITH profits made using a reference strategy
Measures
o $, lives saved, disease spread reduced, …
Contingency Table
                FORECAST
                YES            NO
OBSERVED  YES   Hit            Miss
          NO    False Alarm    Correct Rejection
HIT RATE = Hits/(Hits + Misses)
FALSE ALARM RATE = False Alarms/(False Alarms + Correct Rejections)
PERCENT CONSISTENT = 100*(Hits + Correct Rejections)/Total
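The three formulas above translate directly into code. This is a minimal sketch: the function name and the counts passed to it are hypothetical, and it assumes at least one observed event and one observed non-event so that neither denominator is zero.

```python
def contingency_scores(hits, misses, false_alarms, correct_rejections):
    """Return the three scores defined from the four contingency-table cells."""
    total = hits + misses + false_alarms + correct_rejections
    return {
        "hit_rate": hits / (hits + misses),
        "false_alarm_rate": false_alarms / (false_alarms + correct_rejections),
        "percent_consistent": 100 * (hits + correct_rejections) / total,
    }

# Hypothetical counts, illustration only.
scores = contingency_scores(hits=30, misses=10, false_alarms=20, correct_rejections=40)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```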
Accuracy measures
Hit rates
o Proportion of observed events correctly forecast
False alarm rates
o Proportion of observed non-events incorrectly forecast as events
Percent Correct (also called percent consistent)
o 100 x (proportion of all forecasts that are correct)
1. Forecast performance: 2x2 contingency table
                          Obs
                    Event  Nonevent  Total
Forecast  Event        28        72    100
          Nonevent     23      2680   2703
          Total        51      2752   2803
Is this a good scheme?
1. Original Scheme:
Percent correct = 100(28 + 2680)/2803
= 96.6%
so it is a very accurate scheme!
or is it?
2. Performance of 2nd (reference) forecast method: never predict a tornado = a “lazy” forecast scheme!
                          Obs
                    Event  Nonevent  Total
Forecast  Event         0         0      0
          Nonevent     51      2752   2803
          Total        51      2752   2803
Performance measures
Percent Correct:
1. Original Scheme:
Percent correct = 100(28 + 2680)/2803 = 96.6%
2. Reference Lazy Scheme:
Percent correct = 100(0 + 2752)/2803 = 98.2% !!
Performance measures
Hit rates:
1) 28/51 … so over half the tornadoes predicted
2) reference scheme: 0/51 … no tornadoes predicted
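The comparison of the two schemes can be reproduced from the counts in the two contingency tables. The sketch below reads the table rows as forecasts and the columns as observations, which is an assumption about the slide layout but is consistent with the 28/51 hit rate quoted above. Function and variable names are hypothetical.

```python
def percent_correct(hits, misses, false_alarms, correct_rejections):
    """100 x (proportion of all forecasts that are correct)."""
    total = hits + misses + false_alarms + correct_rejections
    return 100 * (hits + correct_rejections) / total

def hit_rate(hits, misses):
    """Proportion of observed events that were forecast."""
    return hits / (hits + misses)

# Counts taken from the two contingency tables in the slides.
original = dict(hits=28, misses=23, false_alarms=72, correct_rejections=2680)
lazy = dict(hits=0, misses=51, false_alarms=0, correct_rejections=2752)

for name, table in (("original", original), ("lazy", lazy)):
    pc = percent_correct(**table)
    hr = hit_rate(table["hits"], table["misses"])
    print(f"{name}: percent correct = {pc:.1f}%, hit rate = {hr:.2f}")
```

The lazy scheme wins on percent correct because non-events dominate the record, yet it never forecasts a single tornado.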
Value
Suppose an unexpected (unpredicted) tornado causes $500 million damage and that an expected (predicted) tornado results in only $100 million damage.
So forecast scheme (1) saves 28 x $400 million = $11.2 billion compared to forecast scheme (2)
a huge saving - highly valuable!!
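The arithmetic behind this saving can be laid out explicitly. The damage figures are the ones supposed in the slide; the sketch assumes, as the slide implicitly does, that false alarms cost nothing, since no cost is given for them. Names are hypothetical.

```python
COST_UNPREDICTED = 500  # $ million damage per unpredicted tornado (from the slide)
COST_PREDICTED = 100    # $ million damage per predicted tornado (from the slide)

def total_damage(hits, misses):
    """Total damage in $ million; false alarms are assumed to cost nothing."""
    return hits * COST_PREDICTED + misses * COST_UNPREDICTED

damage_original = total_damage(hits=28, misses=23)  # scheme (1)
damage_lazy = total_damage(hits=0, misses=51)       # scheme (2), never predicts
saving = damage_lazy - damage_original

print(f"scheme (1): ${damage_original} million")
print(f"scheme (2): ${damage_lazy} million")
print(f"saving:     ${saving} million")
```

If responding to each of the 72 false alarms carried a cost, that cost would reduce the saving, so a fuller value estimate would need that figure as well.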
Categorical versus probabilistic
Categorical
o “The temperature will be 26ºC tomorrow”
Probabilistic
o “There is a 30% chance of rain tomorrow”
o “There is a 90% chance that wet season rainfall will be above median”
Artificial Skill
danger of too many inputs
danger of trying too many inputs
independent data, cross-validation
importance of supporting evidence
o simple plausible hypothesis
o climate models
o process studies
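The danger of trying too many inputs can be illustrated with pure noise. In this hedged sketch, 50 random candidate predictors are screened against a random target over a training period, and the best-looking one is then checked on independent data. This is a simple split-sample check rather than full cross-validation, and the sample sizes, seed and names are all arbitrary choices for illustration.

```python
import random

def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)
n_train, n_holdout, n_candidates = 20, 20, 50
n_total = n_train + n_holdout

# Target and candidates are all pure noise: no real relationship exists.
target = [rng.gauss(0, 1) for _ in range(n_total)]
candidates = [[rng.gauss(0, 1) for _ in range(n_total)]
              for _ in range(n_candidates)]

def train_corr(series):
    """Correlation with the target over the training period only."""
    return correlation(series[:n_train], target[:n_train])

def holdout_corr(series):
    """Correlation with the target over the independent period only."""
    return correlation(series[n_train:], target[n_train:])

# Screen every candidate on the training period and keep the best-looking one.
best = max(candidates, key=lambda s: abs(train_corr(s)))

print(f"best training correlation:   {train_corr(best):+.2f}")
print(f"same predictor, independent: {holdout_corr(best):+.2f}")
```

Any apparent skill found in the training period here is artificial by construction, which is why the independent-period figure is the one to trust.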
How do users verify predictions?
No single answer, however:
o some switch from probabilistic to categorical
o media prefer categorical forecasts
o assessments made on a single season
o extrapolation
How can we assist users in verification?
Increase access to verification information
Simplify information
Build partnerships
o media
o users & user groups
o other government departments
Education (booklets, web, …)
Summary
Verification is crucial but care is needed!
Familiarise yourself with the terminology used
o skill, accuracy, value, …
No single measure tells the whole story
Importance of using independent data in verification
Keep it simple
Communicating verification results is challenging
o Users sometimes do their own verification - sobering
o Most people like to think categorically - challenging
o Dialogue with end-users is very important