Verification Introduction



Holly C. Hartmann

Department of Hydrology and Water Resources

University of Arizona

hollyoregon@juno.com





RFC Verification Workshop, 08/14/2007


Goals





• General concepts of verification
• Think about how to apply to your operations
• Be able to respond to and influence NWS verification program
• Be prepared as new tools become available
• Be able to do some of your own verification
• Be able to work with researchers on verification projects
• Contribute to development of verification tools (e.g., look at various options)
• Avoid some typical mistakes

Agenda



1. Introduction to Verification
   - Applications, Rationale, Basic Concepts
   - Data Visualization and Exploration
   - Deterministic Scalar measures
2. Categorical measures – KEVIN WERNER
   - Deterministic Forecasts
   - Ensemble Forecasts
3. Diagnostic Verification
   - Reliability
   - Discrimination
   - Conditioning/Structuring Analyses
4. Lab Session/Group Exercise
   - Developing Verification Strategies
   - Connecting to Forecast Operations and Users

Why Do Verification? It depends…





Administrative: logistics, selected quantitative criteria

Operations: inputs, model states, outputs, quick!

Research: sources of error, targeting research

Users: making decisions, exploit skill, avoid mistakes







Concerns about verification?










Need for Verification Measures





Verification statistics identify
- accuracy of forecasts
- sources of skill in forecasts
- sources of uncertainty in forecasts
- conditions where and when forecasts are skillful or not skillful, and why

Verification statistics can then inform
- improvements in terms of forecast skill and decision making with alternate forecast sources (e.g., climatology, persistence, new forecast systems)

Adapted from: Regonda, Demargne, and Seo, 2006

Skill versus Value





Assess quality of forecast system, i.e., determine skill and value of forecast

A forecast has skill if it predicts the observed conditions well according to some objective or subjective criteria.

A forecast has value if it helps the user to make better decisions than without knowledge of the forecast.

• Forecasts with poor skill can be valuable (e.g., extreme event forecasted in wrong place)
• Forecasts with high skill can be of little value (e.g., blue sky desert)

Credit: Hagedorn (2006) and Julie Demargne

Stakeholder Use of HydroClimate Info & Forecasts



Common across all groups
- Uninformed, mistaken about forecast interpretation
- Use of forecasts limited by lack of demonstrated forecast skill
- Have difficulty specifying required accuracy

Common across many, but not all, stakeholders
- Have difficulty distinguishing between “good” & “bad” products
- Have difficulty placing forecasts in historical context

Unique among stakeholders
- Relevant forecast variables, regions (location & scale), seasons, lead times, performance characteristics
- Technical sophistication: base probabilities, distributions, math
- Role of forecasts in decision making

What is a Perfect Forecast?





Forecast evaluation concepts

All happy families are alike;

each unhappy family

is unhappy in its own way.

-- Leo Tolstoy (1876)





All perfect forecasts are alike;

each imperfect forecast

is imperfect in its own way.

-- Holly Hartmann (2002)










Different Forecasts, Information, Evaluation



“Today’s high will be 76 degrees, and it will be partly cloudy, with a 30% chance of rain.”

Deterministic: high of 76 degrees
Categorical: partly cloudy
Probabilistic: 30% chance of rain

[Figure: example panels for each forecast type. Deterministic: 76°; Categorical: No rain / Rain; Probabilistic: 30%]

How would you evaluate each of these?

[Figure: standard hydrograph, an example of a deterministic forecast]

ESP Forecasts: User preferences influence verification









From: California-Nevada River Forecast Center


So Many Evaluation Criteria!





Deterministic
- Bias
- Correlation
- RMSE
  • Standardized RMSE
  • Nash-Sutcliffe
- Linear Error in Probability Space

Categorical
- Hit Rate
- Surprise Rate
- Threat Score
- Gerrity Score
- Success Ratio
- Post-agreement
- Percent Correct
- Peirce Skill Score
- Gilbert Skill Score
- Heidke Skill Score
- Critical Success Index
- Percent N-class Errors
- Modified Heidke Skill Score
- Hanssen and Kuipers Score
- Gandin and Murphy Skill Scores…

Probabilistic
- Brier Score
- Ranked Probability Score
- Distributions-oriented Measures
  • Reliability
  • Discrimination
  • Sharpness

RFC Verification System: Metrics



Each category lists its deterministic and probabilistic forecast verification metrics.

1. Categorical (predefined threshold, range of values)
   Deterministic: Probability of Detection (POD), False Alarm Ratio (FAR), Probability of False Detection (POFD), Lead Time of Detection (LTD), Critical Success Index (CSI), Peirce Skill Score (PSS), Gerrity Score (GS)
   Probabilistic: Brier Score (BS), Rank Probability Score (RPS)

2. Error (accuracy)
   Deterministic: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Error (ME), Bias (%), Linear Error in Probability Space (LEPS)
   Probabilistic: Continuous RPS

3. Correlation
   Deterministic: Pearson Correlation Coefficient, Ranked correlation coefficient, scatter plots

4. Distribution Properties
   Deterministic: Mean, variance, higher moments for observation and forecasts
   Probabilistic: Wilcoxon rank sum test, variance of forecasts, variance of observations, ensemble spread, Talagrand Diagram (or Rank Histogram)

Source: Verification Group, courtesy J. Demargne

RFC Verification System: Metrics



Each category lists its deterministic and probabilistic forecast verification metrics.

5. Skill Scores (relative accuracy over reference forecast)
   Deterministic: Root Mean Squared Error Skill Score (SS-RMSE) (with reference to persistence, climatology, lagged persistence), Wilson Score (WS), Linear Error in Probability Space Skill Score (SS-LEPS)
   Probabilistic: Rank Probability Skill Score, Brier Skill Score (with reference to persistence, climatology, lagged persistence)

6. Conditional Statistics (based on occurrence of specific events)
   Deterministic: Relative Operating Characteristic (ROC), reliability measures, discrimination diagram, other discrimination measures
   Probabilistic: ROC and ROC Area, other resolution measures, reliability diagram, discrimination diagram, other discrimination measures

7. Confidence (metric uncertainty)
   Deterministic: Sample size, Confidence Interval (CI)
   Probabilistic: Ensemble size, sample size, Confidence Interval (CI)

Source: Verification Group, courtesy J. Demargne

Possible Performance Criteria



Accuracy - overall correspondence between forecasts and observations

Bias - difference between average forecast and average observation

Consistency - forecasts don’t waffle around









[Figure: example of good consistency]

Sharpness/Refinement - ability to make bullish forecast statements

[Figure: examples of a forecast that is not sharp and a forecast that is sharp]

What makes a forecast “good”?



Accuracy: Forecasts should agree with observations, with few large errors

Bias: Forecast mean should agree with observed mean

Association: Linear relationship between forecasts and observations

Skill: Forecast should be more accurate than low-skilled reference forecasts (e.g., random chance, persistence, or climatology)

Adapted from: Ebert (2003)

What makes a forecast “good”?





Reliability: Binned forecast values should agree with binned observations (agreement between categories)

Resolution: Forecast can discriminate between events & non-events

Sharpness: Forecast can predict with strong probabilities (i.e., 100% for event, 0% for non-event)

Spread (Variability): Forecast represents the associated uncertainty

Adapted from: Ebert (2003)
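
The reliability criterion (binned forecast values should agree with binned observations) can be checked with a simple table. The sketch below is only an illustration, not the workshop's or IVP's method: the function name `reliability_table`, the equal-width binning, and the sample data are all assumptions made for this example.

```python
def reliability_table(probs, outcomes, n_bins=10):
    """Group probability forecasts into equal-width bins.

    probs:    forecast probabilities in [0, 1]
    outcomes: 1 if the event was observed, 0 otherwise
    Returns one (mean forecast probability, observed relative frequency,
    count) tuple per non-empty bin. A reliable forecast has the first
    two values close to each other in every bin.
    """
    sums = [0.0] * n_bins
    events = [0] * n_bins
    counts = [0] * n_bins
    for p, o in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        sums[b] += p
        events[b] += o
        counts[b] += 1
    return [
        (sums[b] / counts[b], events[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


# Hypothetical forecasts and observations, invented for illustration.
probs = [0.1] * 4 + [0.5] * 4 + [0.9] * 4
outcomes = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0]
for mean_p, obs_freq, n in reliability_table(probs, outcomes, n_bins=5):
    print(f"forecast {mean_p:.2f}  observed {obs_freq:.2f}  n = {n}")
```

The per-bin counts also give a rough view of sharpness: a sharp forecast puts most of its cases in the bins near 0% and 100%.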

Forecasting Tradeoffs



Forecast performance is multi-faceted

False Alarms: warning without event (“False Alarm Ratio”)
Surprises: event without warning (“Probability of Detection”)

A forecaster’s fundamental challenge is balancing these two.
Which is more important?
Depends on the specific decision context…
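
The two measures named on this slide can be computed from counts of warnings and events. This is a minimal sketch using the standard definitions of Probability of Detection and False Alarm Ratio; the function name `pod_far` and the counts are hypothetical, chosen only to illustrate the tradeoff.

```python
def pod_far(hits, misses, false_alarms):
    """Return (POD, FAR) from contingency-table counts.

    hits:         warning issued and event occurred
    misses:       event occurred without a warning ("surprises")
    false_alarms: warning issued but no event occurred
    """
    pod = hits / (hits + misses)                # fraction of events that were warned
    far = false_alarms / (hits + false_alarms)  # fraction of warnings that were false
    return pod, far


# Hypothetical counts, invented for illustration only.
pod, far = pod_far(hits=30, misses=10, false_alarms=20)
print(f"POD = {pod:.2f}, FAR = {far:.2f}")  # POD = 0.75, FAR = 0.40
```

Issuing warnings more readily tends to raise POD (fewer surprises) but also raises FAR, which is the balance the slide describes. The sketch does not guard against zero denominators (no events, or no warnings).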

How Good? Compared to What?



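\[
\text{Skill Score} = \frac{S_{\text{Forecast}} - S_{\text{Baseline}}}{S_{\text{Perfect}} - S_{\text{Baseline}}}
\]

When the perfect score is zero (e.g., for an error measure such as RMSE), this reduces to

\[
\text{Skill Score} = 1 - \frac{S_{\text{Forecast}}}{S_{\text{Baseline}}}
\]

Skill Score: (0.50 – 0.54)/(1.00 – 0.54) ≈ -8.7%
~worse than guessing~

What is the appropriate Baseline?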
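
The skill score formula on this slide translates directly into code. The sketch below is an illustration only; the function name `skill_score` and its argument names are assumptions made for this example.

```python
def skill_score(s_forecast, s_baseline, s_perfect):
    """Generic skill score: improvement over the baseline, as a fraction
    of the improvement a perfect forecast would achieve."""
    return (s_forecast - s_baseline) / (s_perfect - s_baseline)


# The slide's example: forecast scores 0.50, baseline 0.54, perfect 1.00.
ss = skill_score(0.50, 0.54, 1.00)
print(f"{ss:.1%}")  # -8.7%

# For an error measure, the perfect score is 0, so the same function
# gives 1 - s_forecast / s_baseline (values here are hypothetical).
ss_error = skill_score(80.0, 100.0, 0.0)
```

A negative value means the forecast scored worse than the baseline, so the choice of baseline (random chance, persistence, climatology) directly changes the result.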

Graphical Forecast Evaluation

Basic Data Display

Historical seasonal water supply outlooks, Colorado River Basin

Morrill, Hartmann, and Bales, 2007

Scatter plots





Historical seasonal water supply outlooks, Colorado River Basin

Morrill, Hartmann, and Bales, 2007

Histograms







Historical seasonal water supply outlooks, Colorado River Basin

Morrill, Hartmann, and Bales, 2007

IVP Scatterplot Example









Source: H. Herr

Cumulative Distribution Function (CDF): IVP



Cat 1 = No Observed Precipitation
Cat 2 = Observed Precipitation (>0.001”)

Empirical distribution of forecast probabilities for different observation categories

Goal: Widely separated CDFs

Source: H. Herr, IVP Charting Examples, 2007
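
The idea behind this chart, comparing the empirical CDF of forecast probabilities for each observed category, can be sketched as follows. This is not IVP code: the function names, thresholds, and data are hypothetical and serve only to illustrate what "widely separated CDFs" means.

```python
def empirical_cdf(values, thresholds):
    """Fraction of values less than or equal to each threshold."""
    n = len(values)
    return [sum(v <= t for v in values) / n for t in thresholds]


def cdfs_by_category(probs, observed_event, thresholds):
    """Empirical CDFs of forecast probability, split by what was observed.

    Returns (cdf for cases with no event, cdf for cases with the event).
    """
    no_event = [p for p, o in zip(probs, observed_event) if not o]
    event = [p for p, o in zip(probs, observed_event) if o]
    return empirical_cdf(no_event, thresholds), empirical_cdf(event, thresholds)


# Hypothetical precipitation probabilities and observations (1 = precipitation).
probs = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
observed = [0, 0, 0, 1, 1, 1]
thresholds = [0.25, 0.5, 0.75]
cdf_cat1, cdf_cat2 = cdfs_by_category(probs, observed, thresholds)

# Largest vertical gap between the two curves at the chosen thresholds.
separation = max(abs(a - b) for a, b in zip(cdf_cat1, cdf_cat2))
```

In this invented case the forecasts discriminate well: low probabilities were issued when no precipitation occurred and high probabilities when it did, so the two curves sit far apart.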

Probability Density Function (PDF): IVP



Cat 1 = No Observed Precipitation
Cat 2 = Observed Precipitation (>0.001”)

Empirical distribution for 10 bins for IVP GUI

Goal: Widely separated PDFs

Source: H. Herr, IVP Charting Examples, 2007

“Box-plots”: Quantiles and Extremes









Based on summarizing CDF computation and plot

Goal: Widely separated box-plots

Cat 1 = No Observed Precipitation
Cat 2 = Observed Precipitation (>0.001”)

Source: H. Herr, IVP Charting Examples, 2007

Scalar Forecast Evaluation

Standard Scalar Measures



Bias
Mean forecast = Mean observed (no bias)

Correlation Coefficient
Variance shared between forecast and observed (r²)
Says nothing about bias or whether forecast variance = observed variance

Pearson correlation coefficient: assumes normal distribution, can be + or – (Rank correlation: also + or –, non-normal ok)







Root Mean Squared Error
Distance between forecast/observation values
Better than correlation, poor when error is heteroscedastic
Emphasizes performance for high flows
Alternative: Mean Absolute Error (MAE)
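
The scalar measures on this slide can be computed in a few lines. The sketch below uses the standard definitions with only the Python standard library; the function name `scalar_measures` and the sample values are hypothetical. The data were chosen so that every forecast is exactly 10% too high, which shows why correlation says nothing about bias.

```python
import math


def scalar_measures(forecasts, observations):
    """Bias, MAE, RMSE, and Pearson correlation for paired values."""
    n = len(forecasts)
    mean_f = sum(forecasts) / n
    mean_o = sum(observations) / n
    errors = [f - o for f, o in zip(forecasts, observations)]

    bias = mean_f - mean_o                            # mean forecast minus mean observed
    mae = sum(abs(e) for e in errors) / n             # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # weights large errors more heavily

    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecasts, observations))
    var_f = sum((f - mean_f) ** 2 for f in forecasts)
    var_o = sum((o - mean_o) ** 2 for o in observations)
    corr = cov / math.sqrt(var_f * var_o)             # Pearson r

    return {"bias": bias, "mae": mae, "rmse": rmse, "corr": corr}


# Hypothetical seasonal volumes: each forecast is 10% above the observation.
forecasts = [110.0, 220.0, 330.0, 440.0]
observations = [100.0, 200.0, 300.0, 400.0]
m = scalar_measures(forecasts, observations)
print(m)
```

Here the correlation is essentially 1 even though the bias is 25, and the RMSE (about 27.4) exceeds the MAE (25) because the largest error belongs to the largest flow.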

Standard Scalar Measures (with Scatterplot)



1943-99 April 1 Forecasts for Apr-Sept Streamflow at Stehekin R at Stehekin, WA
Bias = 22, Corr = 0.92, RMSE = 74.4

1954-97 January 1 Forecasts for Jan-May Streamflow at Verde R blw Tangle Crk, AZ
Bias = -87.5, Corr = 0.58, RMSE = 228.3

[Figure: scatterplots of Forecast (1000’s ac-ft) against Observed (1000’s ac-ft) for both sites]

IVP: Deterministic Scalar Measures





ME: smallest; + and – errors cancel
MAE vs. RMSE: RMSE influenced by large errors for large events
MAXERR: largest
Sample Size: small samples have large uncertainty

Source: H. Herr, IVP Charting Examples, 2007

IVP: RMSE – Skill Scores









Skill compared to Persistence Forecast

Source: H. Herr, IVP Charting Examples, 2007
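
An RMSE skill score against persistence can be sketched as below. This is not IVP's implementation: the function names, the definition of persistence as "repeat the latest observation available at issue time", and the data are all assumptions made for this example.

```python
import math


def rmse(forecasts, observations):
    """Root mean squared error of paired forecasts and observations."""
    n = len(forecasts)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / n)


def rmse_skill_vs_persistence(forecasts, observations, last_observed):
    """1 - RMSE(forecast) / RMSE(persistence).

    last_observed[i] is the most recent observation available when
    forecast i was issued; the persistence forecast simply repeats it.
    """
    return 1.0 - rmse(forecasts, observations) / rmse(last_observed, observations)


# Hypothetical values, invented for illustration only.
observations = [12.0, 15.0, 11.0, 14.0]
last_observed = [10.0, 12.0, 15.0, 11.0]
forecasts = [11.0, 14.0, 12.0, 13.0]
skill = rmse_skill_vs_persistence(forecasts, observations, last_observed)
print(f"RMSE skill vs. persistence: {skill:.2f}")
```

A positive value means the forecast has lower RMSE than persistence, zero means no improvement, and a negative value means persistence would have done better.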


