Z - TU Delft CiTG

Review of statistics and frequency analysis
Academic year 2009 - 2010
Associate Professor: Dr. P.H.A.J.M. van Gelder
TU Delft, Faculty of Civil Engineering and Geosciences
UNESCO-IHE, Guest Lecturer


  Year 2009 Lectures on:
  October 27th 8.45 – 10.30h
  October 28th 15.45 – 17.30h
  November 5th 15.45 – 17.30h
  November 6th 8.45 – 10.30h


  A selection of these slides will be
  presented during the course



  contact for students
  Dr.ir. P.H.A.J.M. van Gelder room: 3.87 ext: 86544
Overall outline of the course
Review of statistics and frequency
 analysis

Data analysis, random variables,
 classification, stat. moments, frequency
 distributions; samples, populations and
 probability models; parameter
 estimation and confidence intervals.
  Introduction to this Course
1 + 1 lecture periods (basic statistics), 1 + 1
  lecture periods on frequency analysis and a
  written exam
Dialog instead of monolog (ask questions!)
Complete power point presentations can be
  found on Van Gelder’s website
Brief self-introduction:
  – Background (name, country, university education etc.)
  – Interests, experiences with statistics?
  – What do you hope to get out of this part of the course?
  – Who has experienced hydrological extremes?
Outline for today

• Introduction on natural hazards and
  probabilistic design
• Data sets for river and coastal engineers
• Theory on:
  – Probability and events
  – Random (or stochastic) variables
  – Transformations
  – Multivariate distributions
Number of Floods worldwide




  Source: Dartmouth Flood Observatory, 2003
Regional Distribution of Large Floods
[Figure: two maps, one for 1985-1988 and one for 1999-2002]
Source: Dartmouth Flood Observatory, 2003
Reasons for Concern




                      Source: Smith et al 2001, TAR IPCC WG II
Does Global Warming Lead to an Intensification of the
Hydrological Cycle and More Extreme Events?
More flood-producing rain (but regional differences)
Longer rainless periods and higher evaporation demand
  of the atmosphere
There are indications for:
    Changing regimes, flood timing
    More frequent floods, landslides etc.
    No reduction of droughts, but decreased summer low flows
     e.g. in western Europe
          Effects of changing Mean and Variance of
                Precipitation on Stream Flow
               (i.e. non-linear processes, thresholds, etc.)




(from Middelkoop 2005, after Arnell 1996)
Frequency analyses can be done for …
High flows
    Flood peak discharge
    Flood volume
    Occurrence of high flows in certain periods/months
Low flows
    Minimum low flow discharge
    Runoff deficits
Groundwater levels, groundwater fluctuations etc.
Estimated cost of damage caused by natural hazards
Simulated variables (model outputs)
Daily rainfall, rainfall intensities etc.
….

      Let’s start with examples of flood peak discharges
How do we measure floods?




                    (From: Hornberger et al., 1998)
Rating curve




               (From: Hornberger et al., 1998)
Application of rating curve to measured
        water levels at a gauge




                              (From: Hornberger et al., 1998)
…. that is not always that easy!




               (The big flood in the HJ Andrews 1996)
Continuous measurement of discharges
       (incl. floods and low flows)




                               (From: Hornberger et al., 1998)
Floods in 2002; river Elbe,
city of Dresden, Germany
Flood defence structures


Storm Surge Barrier Oosterschelde (NL)
Maeslantkering - storm surge barrier (NL)
Variability in annual precipitation (NL)
[Figure: Precipitation in the Netherlands. Annual precipitation [mm] (axis 500 to 1100) versus year (1940 to 2000)]
Variability in daily river
discharges (Meuse river, NL)
[Figure: Daily discharges of the River Maas at Lith, with mean, minimum and maximum per year. River discharge [m3/s] (axis 0 to 2500) versus day number (0 to 5000)]
Example: Evaluation of hydrological extremes
(flood runoff); Case in Austria 2005
[Figure: discharge [m3/s] (axis 0 to 600) over the calendar year (1 Jan to 1 Dec). Series: Min (71-02), Max (71-02), Mean (71-02), 2005. Annotations: 2005: max. ca. 515 m3/s; 1999: max. ca. 230 m3/s]
Example: Evaluation of floods in a regional
context (Case study: Austria 2005)
Ongoing EU projects on Flood Risks
 EFFS: A European Flood Forecasting System
   – to develop a prototype of a European flood
     forecasting system for 4-10 days in advance,
     which could provide daily information on potential
     floods for large rivers, and flash floods in small
     basins.

 SPHERE: Systematic, Palaeoflood and Historical data
   for the improvement of flood Risk Estimation
    – to develop a new approach which complements
       hydrologic modelling and the application of
       historical and paleoflood hydrology to increase the
       temporal framework of the largest floods over time
       spans from decades to millennia, in order to
       improve the estimation of extreme flood occurrences.
THARMIT: Torrent Hazard Control in the European
  Alps
   – to develop practical tools and methodologies for
     hazard assessment, prevention and mitigation,
     and to devise methods for saving and monitoring
     potentially dangerous areas.

CARPE DIEM: Critical assessment of Available Radar
  Precipitation Estimation techniques and Development
  of Innovative approaches for Environmental
  Management.
   – to improve real-time estimation of radar rainfall
      fields for flood forecasting, by coupling multi-
      parameter polarisation radar data and NWP, and
      exploiting NWP results in order to improve the
      interpretation of radar observations.
IMPACT: Investigation of Extreme Flood Processes and
  Uncertainty
   – to investigate extreme flood and defense failure
     processes, their risk and uncertainty. Will consider
     dam breach formation, sediment movement, flood
     propagation and predictive models, within an
     overall framework of flood risk management.

GLACIORISK: Survey and Prevention of Extreme
  Glaciological Hazards in European Mountainous
  Regions
   – to develop scientific studies for detection, survey
     and prevention of glacial disasters in order to
     save lives and reduce damages.
SAFERELNET: Risk assessment of natural hazards in
  Europe

MITCH: Mitigation of Climate Induced Hazards
   – dealing with the mitigation of natural hazards with
     a meteorological cause, in order to assist planning
     and management. The main focus will be on flood
     forecasting and warning, but it will also include
     other flood related hazards, such as landslips and
     debris flow, and longer term climate hazards, such
     as drought, and the possible impact of climate
     change on the frequency and magnitude of hazards

ADC-RBM: Advanced Study Course in River Basin
  Modelling for Flood Risk Mitigation - June 2002
FLOODMAN: Near real-time flood forecasting, warning
  and management system based on satellite radar
  images, hydrological and hydraulic models and in-situ
  data
   – near real-time monitoring of flood extent using
     spaceborne SAR, optical data & in-situ
     measurements, hydrological and hydraulic model
     data. The result will be an expert decision system
     for monitoring, management and forecast of floods
     in selected areas in Europe. The monitoring will
     also be used to update the hydrological/hydraulic
     models and thereby improve the quality of flood
     forecasts.
FLOODSITE: The FLOODsite project covers the
  physical, environmental, ecological and socio-
  economic aspects of floods from rivers, estuaries
  and the sea. The project is arranged into seven
  themes covering:

Risk analysis – hazard sources, pathways and
   vulnerability of receptors.
Risk management – pre-flood measures and flood
   emergency management.
Technological integration – decision support and
   uncertainty.
Pilot applications – for river, estuary and coastal sites.
Training and knowledge uptake – guidance for
   professionals, public information and educational
   material.
Networking, review and assessment.
Co-ordination and management.
Extreme Events;
Two very realistic simulations
1. River dike failure in the Netherlands
2. Asteroid impact in the Atlantic Ocean
The role of Statistics in Water Engineering
Properties of the hydrological system
Data analysis and statistics
 (later also modeling etc.)
Statistics of hydrological variables
 for decision support
We need Information about … (Statistical Analysis !!)
  Water balance: P = R + ET + dS/dt
  Variability and heterogeneity of hydrological
    variables (groundwater levels, precip. patterns etc.)
  Hydrological extremes: droughts, floods, x-year flood
  Scenarios for:
     Land use change
     Climate change
     Different water management strategies
  ETC.
4 types of error occur when measuring a
hydrological variable
     1. Operation and function errors: malfunction of measuring
        instrument, personal (human) error.
     2. Random error: caused by numerous minor impacts partly
        independent from each other. If repeated frequently, the
        values fluctuate around the true value.
     3. Constant systematic error: inherent in any kind of
        equipment (e.g., wrong installation of instruments, wrongly
        indicated zero point, incorrect rating curve etc.); constant
        in respect of time, but may vary according to the
        measuring range.
     4. Variable systematic error: usually caused by insufficient
        control during the measuring period; mostly originating in
        the instrument (e.g., "drift" of the device, growth of plants
        at the location of measurement etc.). Can be avoided
        through continuous comparison of the measurements and
        repeated calibration of the instruments.
   Systematic errors cannot be reduced by increasing the
    number of measurements, if equipment and measuring
    conditions remain the same!
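The difference between random and constant systematic error can be illustrated with a small simulation. This is a minimal Python sketch, not part of the course material: the true water level, the zero-point offset and the noise level are hypothetical values chosen only for illustration.

```python
import random

def simulate_gauge_readings(true_value, systematic_bias, random_sd, n, seed=1):
    """Return n readings = true value + constant bias + random error."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [true_value + systematic_bias + rng.gauss(0.0, random_sd)
            for _ in range(n)]

def mean(values):
    return sum(values) / len(values)

TRUE_LEVEL = 100.0  # hypothetical true water level [cm]
BIAS = 2.0          # hypothetical wrongly indicated zero point [cm]
NOISE_SD = 5.0      # hypothetical standard deviation of the random error [cm]

mean_small = mean(simulate_gauge_readings(TRUE_LEVEL, BIAS, NOISE_SD, 10))
mean_large = mean(simulate_gauge_readings(TRUE_LEVEL, BIAS, NOISE_SD, 100_000))

# The random error averages out, the bias does not:
# the mean of many readings approaches TRUE_LEVEL + BIAS, not TRUE_LEVEL.
print(f"mean of 10 readings:      {mean_small:.2f}")
print(f"mean of 100000 readings:  {mean_large:.2f}")
```

With many readings the mean settles near 102 cm rather than 100 cm, which is the point of the last remark on the slide.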
Structural design principles
Old methods:
  – determine a worst case load
  – determine a worst case strength
  – determine the geometry of the structure
Disadvantages of old method
Unknown how safe the structure is
No insight in contribution of different individual
 failure mechanisms
No insight in importance of different input
 parameters
Uncertainties in variables cannot be taken into
 account
Uncertainties in the physical models cannot be
 taken into account
Failure mechanisms of a dike
Design of a structure




   Random boundary conditions
Fault tree with AND and OR
                        flood defence fails


                               OR

         flood defence collapses         overtopping


                 AND

    piping develops   inspection fails
Mathematics of AND and OR
In case of an AND-gate, you should
  multiply the probabilities
In case of an OR-gate, you should add
  the probabilities (and subtract the
  product of the two probabilities)
Important condition:
  This is only true when both mechanisms
  are fully (statistically) independent
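The gate rules can be applied to the small fault tree of the flood defence. The sketch below is in Python and assumes independence, as the slide requires; the three basic-event probabilities are hypothetical numbers, not values from the course.

```python
def p_and(p_a, p_b):
    """AND-gate for two independent events: multiply."""
    return p_a * p_b

def p_or(p_a, p_b):
    """OR-gate for two independent events: add, then subtract the product."""
    return p_a + p_b - p_a * p_b

# Hypothetical probabilities of the basic events (per year)
p_piping = 0.01
p_inspection_fails = 0.1
p_overtopping = 0.001

# flood defence collapses = piping develops AND inspection fails
p_collapse = p_and(p_piping, p_inspection_fails)

# flood defence fails = flood defence collapses OR overtopping
p_fail = p_or(p_collapse, p_overtopping)

print(f"P(collapse) = {p_collapse:.6f}")
print(f"P(failure)  = {p_fail:.6f}")
```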
Example of dependence
Modern wave run-up formula is:

  $R = 1.6 \, H \, \xi$
  $\xi = \tan\alpha / \sqrt{H / L}$
  $L = g T^2 / (2\pi)$

(for shallow water, the last equation is somewhat different and implicit)
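A minimal Python sketch of the run-up formula, assuming deep water so that the explicit wavelength equation applies. The function and argument names are illustrative, and the input values in the example call are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]

def deep_water_wavelength(period):
    """L = g T^2 / (2 pi), wave period T in seconds."""
    return G * period ** 2 / (2.0 * math.pi)

def breaker_parameter(height, period, tan_alpha):
    """xi = tan(alpha) / sqrt(H / L)."""
    return tan_alpha / math.sqrt(height / deep_water_wavelength(period))

def run_up(height, period, tan_alpha):
    """R = 1.6 H xi."""
    return 1.6 * height * breaker_parameter(height, period, tan_alpha)

# Hypothetical example: H = 2 m, T = 6 s, slope 1:4
example_run_up = run_up(2.0, 6.0, 0.25)
print(f"L = {deep_water_wavelength(6.0):.1f} m, R = {example_run_up:.2f} m")
```

Both H and T enter the result, which is why their statistical dependence matters.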
Example of dependence (2)
So the answer depends on H and T
But in a single wave field, T = f(H), for
 example: T = 3.9 * H^0.376
This can be modelled as:
 T = A * H^B, in which both A and B are
 stochastic variables with a mean and
 standard deviation
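The model T = A * H^B with stochastic A and B can be sketched in Python as follows. The means 3.9 and 0.376 come from the slide. The normal distribution and the standard deviations 0.2 and 0.02 are assumptions made only for this illustration.

```python
import random

def wave_period(height, a=3.9, b=0.376):
    """T = A * H^B with the mean values of A and B as defaults."""
    return a * height ** b

def sample_wave_periods(height, n, a_sd=0.2, b_sd=0.02, seed=1):
    """Draw n wave periods for one wave height, with A and B normally
    distributed around their means (assumed standard deviations)."""
    rng = random.Random(seed)
    return [wave_period(height, rng.gauss(3.9, a_sd), rng.gauss(0.376, b_sd))
            for _ in range(n)]

periods = sample_wave_periods(2.0, 10_000)
mean_period = sum(periods) / len(periods)

print(f"T for mean A and B: {wave_period(2.0):.2f} s")
print(f"mean of sampled T:  {mean_period:.2f} s")
print(f"range of sampled T: {min(periods):.2f} to {max(periods):.2f} s")
```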
Example of dependence (3)
But this is only true in case of a single
 wave field (wind waves OR swell
 waves)
When there are more wave fields H and T
 are NOT statistically independent
There is no good model for run-up due to
 double peaked spectra, but there is an
 approximation by Van der Meer
Example of dependence (4)

                    Run-up larger than allowed


                              OR

  Run-up due to           Run-up due to              Run-up due to
   wind waves              swell waves           double peaked spectrum


 Computation with       Computation with          Computation with the
   statistics of          statistics of            approximation of
   wind waves             swell waves                Van der Meer
Two approaches
First approach:
  start at bottom and calculate the
  probability of failure according to normal
  design practice
Second approach:
  start at top and assign probability to
  failure mechanisms
Two approaches (2)
Usually with a start at bottom, you do not
 reach the required overall failure
 probability
Usually with a start at top, you cannot
 construct some elements
So in practice, you have to mix the two
 approaches
Sensitivity analysis
What is the effect of a 10% change in input
 on the output?
This determines how important an input
 parameter is
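A one-at-a-time sensitivity check can be sketched in a few lines of Python. The deep-water wave run-up formula serves as the example model here; the base values of H, T and tan(alpha) are hypothetical, and the helper `sensitivity` is an illustrative name, not an established routine.

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]

def run_up(H, T, tan_alpha):
    """Deep-water run-up: R = 1.6 H xi, xi = tan(alpha)/sqrt(H/L), L = g T^2/(2 pi)."""
    wavelength = G * T ** 2 / (2.0 * math.pi)
    xi = tan_alpha / math.sqrt(H / wavelength)
    return 1.6 * H * xi

def sensitivity(func, inputs, change=0.10):
    """Relative change of the output when each input is raised by `change`,
    one input at a time, all others kept at their base value."""
    base = func(**inputs)
    effects = {}
    for name, value in inputs.items():
        perturbed = dict(inputs)
        perturbed[name] = value * (1.0 + change)
        effects[name] = (func(**perturbed) - base) / base
    return effects

base_inputs = {"H": 2.0, "T": 6.0, "tan_alpha": 0.25}  # hypothetical base case
effects = sensitivity(run_up, base_inputs)

for name, effect in effects.items():
    print(f"+10% in {name:9s} -> {100 * effect:+.1f}% in run-up")
```

For this model a 10% change in T or in the slope gives a 10% change in run-up, while a 10% change in H gives about 5%, because R is proportional to the square root of H.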
End of introduction
Data availability
The Internet offers a huge source of past and
  real-time data
The theory will be explained with
examples and data sets taken from river
engineering:
The Global Runoff Data Centre
  http://grdc.bafg.de/servlet/is/910/
Mediterranean Hydrological Cycle Observing System
  (Med-HYCOS project)
  http://medhycos.mpl.ird.fr
UNESCO International Hydrological Programme
  http://webworld.unesco.org/water/ihp/db/
The Global River Discharge Database (RIVDIS)
  http://www.rivdis.sr.unh.edu
Local Websites
Ministry of Water Resources of China
  http://www.mwr.gov.ch/english/index.asp
Ministry of Water Resources of India
  http://mowr.gov.in
Water Commission of India
  http://cwc.nic.in
These sites were useful for obtaining basic
  information about river basins, not so useful
  in downloading discharge data
Apart from websites, data is also published in
  National Water Resources Books
Datasets for river engineers
The longest period of observation is recorded for the
  river Nemunas at Smalininkai: 1812-2003 (LT). The
  majority of European rivers have observation records
  dating from the period 1910-1920 and continuing to
  1999-2004.
On most Asian rivers water discharges have been
  observed since the period 1930-1940, although the
  river Bia at Biisk (Russia) has a record 108 year
  period of observation (1895 to 2003). The shortest
  period of observation is found on the Indian rivers
  (1939 -1979), the Chinese rivers (1930-1985) and the
  Iranian rivers (1963-1985).
Your data for the exercise is available at:

  http://www.citg.tudelft.nl/live/pagina.jsp?id=418a276e-b63e-4cec-a6fe-763feb04f984&lang=en
Some snap shots
Data for coastal engineers

www.oceanor.no/
www.knmi.nl/onderzk/oceano/waves/era40/license.cgi
www.globalwavestatisticsonline.com/
http://www.golfklimaat.nl
http://www.actuelewaterdata.nl
http://www.hydraulicengineering.tudelft.nl/public/gelder/paper56-data3.zip
Hm0, H1/3, HTE3, Tm02, TH1/3, Th0,
wind direction, wind speed, water
     level, surge
Important data source for Dutch data
Some snap shots
real time wave data; significant wave heights
wave periods
Water level gauges in Mid West Netherlands
Water levels and astronomical tide
The wave data on the wave climate site is available in data files per
  year (now : 1979 - 2002). The files contain wave data in the
  following format :
19880221 0100 90 4 84 12 52 66 351 10 -6 3433402
19880221 0400 76 3 73 15 53 65 339 -95 -2 3433402
19880221 0700 67 3 66 10 42 53 346 -31 -1 3433402
19880221 1000 67 3 64 12 40 48 344 51 -8 3433402
19880221 1300 66 3 65 11 36 44 308 -3 0 3433402
19880221 1600 75 3 73 12 40 47 298 -93 6 3433402
19880221 1900 91 4 79 13 41 48 325 0 3 3433402
19880221 2200 86 4 80 11 41 47 334 95 0 3433402
19880222 0100 84 5 81 13 42 50 323 38 4 0400402
…….. etc.
The files are arranged as follows:

Column nr.  Name                                                        Unit
1           date                                                        [yyyymmdd]
2           time                                                        [hhmm] MET !
3           wave height Hm0                                             [cm]
4           accuracy wave height Hm0 (standard deviation)               [cm]
5           wave height H1/3                                            [cm]
6           wave height HTE3                                            [cm]
7           wave period Tm02                                            [0.1 s]
8           wave period TH1/3                                           [0.1 s]
9           wave direction Th0                                          [deg], nautical [1]
10          water level                                                 [cm] NAP/MSL
11          surge                                                       [cm]
12          code number which indicates the origin of the given value   [-]

[1] Nautical degrees: (from) North = 0 degrees, (from) East = 90, South = 180, West = 270 and
    North again = 360.
Probability

P(A) = probability of event A

Mathematical definition
Frequentistic definition
Mathematical definition

Axioms:
1. P(A) ≥ 0
2. P(Ω) = 1
3. P(A or B) = P(A) + P(B)
   (if A and B are mutually exclusive)
Frequentistic definition

P(A) = N(A) / N

in which:
N(A)        number of experiments leading to A
N           total number of experiments

example: probability that a consumer product fails
  within 1 year after production
example interpretation
P(A) = n(A) / N
P      probability
n(A)   number of outcomes in event A
N      total number of outcomes

          x      x      x   x   x       x
          x      x      x   x A x       x
          x      x      x   x   x       x
          x      x      x   x   x       x

P(A) = 4 / 24 = 1 / 6
example dice

      1    2      3   4   5   6
      x    x      x   x   x   x


P(x=4)    = 1/6
P(x ≥ 5) = 2/6
P(x even) = 3/6
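The dice probabilities can be checked both by counting outcomes and by the frequentistic definition P(A) = N(A) / N. A minimal Python sketch; the number of simulated throws and the random seed are arbitrary choices.

```python
import random
from fractions import Fraction

OUTCOMES = range(1, 7)  # faces of a fair die

def exact_probability(event):
    """Count the outcomes in the event and divide by the total number."""
    favourable = sum(1 for x in OUTCOMES if event(x))
    return Fraction(favourable, len(OUTCOMES))

def relative_frequency(event, n, seed=1):
    """Frequentistic estimate N(A) / N from n simulated throws."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if event(rng.randint(1, 6)))
    return hits / n

p_eq_4 = exact_probability(lambda x: x == 4)
p_ge_5 = exact_probability(lambda x: x >= 5)
p_even = exact_probability(lambda x: x % 2 == 0)
f_ge_5 = relative_frequency(lambda x: x >= 5, 100_000)

print(p_eq_4, p_ge_5, p_even)  # fractions are printed in lowest terms
print(f"simulated P(x >= 5) = {f_ge_5:.3f}")
```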
Some history

1650         Pascal / Fermat
1750         Bernoulli / Bayes
1850         Venn / Boole

1920         Von Mises
1960         Savage / Lindley   decision making
1970         Benjamin / Cornell decision making
1960’s and onwards

 ‘Years ago a statistician might have
 claimed that statistics deals with the
 processing of data;
 today statisticians will be more likely to
 say that statistics is concerned with
 decision making in the face of
 uncertainty.’
probability calculation

calculation of a probability from other
  probabilities
Joint events (Venn diagrams of events A and B within the sample space Ω)
Union: A or B
Intersection: A and B
Implication: A in B (A lies within B)
Complement: not A
Union
P(A or B)

        x   x     x    x   x       x
        x   x     x    x   x       x
        x   x     x    x   x       x B
          A
        x   x     x    x   x       x


P(A or B) = P(A) + P(B) - P(A and B)
 13/24      = 6/24 + 9/24 - 2/24
Intersection
P(A and B)

        x   x       x     x     x    x
        x   x       x     x     x    x
        x   x       x     x     x    x B
          A
        x   x       x     x     x    x

P(A and B) = nAB / n
           = (nA / n) * (nAB / nA)
           = P(A) * P(B | A)
           = 6/24 * 2/6 = 2/24
Conditional probability
P(A | B) =
  probability of A given the fact that event
  B has occurred

P(A and B) = P(B) P(A | B)

P(A | B) = P(A and B) / P(B)
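The counting examples for the union, the intersection and the conditional probability can be reproduced with sets. In this Python sketch the sizes match the slides (24 outcomes, 6 in A, 9 in B, 2 in both), but the exact position of A and B in the grid is an assumption.

```python
from fractions import Fraction

ROWS, COLS = 4, 6
omega = {(r, c) for r in range(ROWS) for c in range(COLS)}  # 24 outcomes

# Assumed layout: A is a 3 x 2 block, B a 3 x 3 block, overlapping in 2 cells
A = {(r, c) for r in range(0, 3) for c in range(0, 2)}
B = {(r, c) for r in range(1, 4) for c in range(1, 4)}

def prob(event):
    """P(event) = number of outcomes in the event / total number of outcomes."""
    return Fraction(len(event), len(omega))

p_union = prob(A | B)                          # P(A or B)
p_intersection = prob(A & B)                   # P(A and B)
p_b_given_a = Fraction(len(A & B), len(A))     # P(B | A)
p_a_given_b = p_intersection / prob(B)         # P(A | B) = P(A and B) / P(B)

print(p_union, p_intersection, p_b_given_a, p_a_given_b)
```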
Conditional probability


P(rain in Delft on September 18, 2024)?

P(rain in Delft on September 18, 2024 | rain in
  Amsterdam on September 18, 2024)?

P(rain in Delft on September 18, 2024 | rain in Cape
  Town on September 18, 2024)?
example: dice
             1            2        3        4         5         6
             x            x        x        x         x         x


P(x = 2 | x even) = P(x = 2 and x even) / P(x even) = (1/6) / (1/2) = 1/3

P(x = 2 | x odd) = P(x = 2 and x odd) / P(x odd) = 0
Independence
A and B are independent if
   P(A | B) = P(A)

In that case:
    P(A and B) = P(A) P(B)

    P(A or B) = P(A) + P(B) - P(A) P(B)
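Independence can be checked by enumeration. The Python sketch below uses two dice, an example that is not in the slides: "first die even" and "second die at least 5" are independent, while "first die even" and "sum equals 12" are not.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def prob(event):
    return Fraction(sum(1 for o in omega if event(o)), len(omega))

def first_even(o):
    return o[0] % 2 == 0

def second_at_least_5(o):
    return o[1] >= 5

def sum_is_12(o):
    return o[0] + o[1] == 12

p_a = prob(first_even)
p_b = prob(second_at_least_5)
p_a_and_b = prob(lambda o: first_even(o) and second_at_least_5(o))
p_a_or_b = prob(lambda o: first_even(o) or second_at_least_5(o))

# Independent pair: the product rule and the simplified OR rule hold
print(p_a_and_b == p_a * p_b, p_a_or_b == p_a + p_b - p_a * p_b)

# Dependent pair: the product rule fails
p_c = prob(sum_is_12)
p_a_and_c = prob(lambda o: first_even(o) and sum_is_12(o))
print(p_a_and_c == p_a * p_c)
```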
Important rules
Theorem of total probability
  P(A) = P(A | B) P(B) + P(A | not B) P(not B)

  P(A) = Σ_i P(A | B_i) P(B_i)   if the B_i are mutually exclusive
                                 (and together cover all possibilities)

Generalisation to continuous integral “in
 which the uncertainty is integrated out”

Theorem of Bayes
  P(A | B) = P(B | A) P(A) / P(B)
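Written out, the continuous generalisation replaces the sum over the events B_i by an integral over the values of an uncertain quantity. In the sketch below, X and its probability density f_X are notation assumed for this illustration.

```latex
% Discrete form: mutually exclusive events B_i that cover all possibilities
P(A) = \sum_i P(A \mid B_i) \, P(B_i)

% Continuous form: the uncertain quantity X is "integrated out"
P(A) = \int_{-\infty}^{\infty} P(A \mid X = x) \, f_X(x) \, dx
```

For example, A could be failure of a structure and X the uncertain load: the failure probability given each load level is weighted by how likely that load level is.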
example: quiz dilemma
Car in A, B or C

      A            B          C




You: choose A
QM (quiz master): Good that you didn’t choose B,
 because it is empty. Would you still like
 to switch to C?
Quiz-dilemma
Theorem of Bayes:

 P(A | info) = P(info | A) P(A) / P(info) = (1/2 * 1/3) / P(info) = 1/3

 P(C | info) = P(info | C) P(C) / P(info) = (1 * 1/3) / P(info) = 2/3

Yes, switch to C!
   Notes: "info" is the event that the QM opens the empty box B after you chose A.
   P(info) = P(info | A) P(A) + P(info | B) P(B) + P(info | C) P(C)
           = 1/2 * 1/3 + 0 * 1/3 + 1 * 1/3 = 0.5
   P(info | A) = 1/2, because if the car is in A the QM can open either B or C.
   P(info | C) = 1, because if the car is in C and you chose A, B is the only
   empty box the QM can open.
   A clever student is invited to write a simulation programme to find out if this is
                                                                        indeed true
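Before simulating, the Bayes calculation can be verified with exact fractions. A minimal Python sketch, assuming that the player chose A and that the QM always opens an empty box other than the chosen one, choosing at random when two are available.

```python
from fractions import Fraction

# Prior: the car is equally likely to be in A, B or C
prior = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}

# Likelihood of "info" (QM opens box B) for each car position, player chose A
likelihood = {
    "A": Fraction(1, 2),  # QM can open B or C
    "B": Fraction(0),     # QM never opens the box with the car
    "C": Fraction(1),     # B is the only box the QM can open
}

# Theorem of total probability
p_info = sum(likelihood[box] * prior[box] for box in prior)

# Theorem of Bayes
posterior = {box: likelihood[box] * prior[box] / p_info for box in prior}

print("P(info) =", p_info)
for box, p in posterior.items():
    print(f"P({box} | info) = {p}")
```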
Solution of Jeroen van den Bos

         Development of 2 Matlab scripts
Quiz_noinfo.m (choose a box and check if the car is
      there with no information from the QM)
Quiz_info.m (choose a box and update your choice
when the QM gives his information on another box)
Quiz_noinfo.m
clear;
N = 2000; NoOfBoxes = 3;
NoSuccess = 0;
Fr_Success = zeros(1,N); % preallocate the running frequency of success

for i = 1:N
   Box_Car = fix(1+rand(1)*(NoOfBoxes)); % car is put randomly in a box
   Box_Guess = fix(1+rand(1)*(NoOfBoxes)); % random choice of a box

   % if the guess is right, increase the number of successful attempts
   NoSuccess = NoSuccess + (Box_Car == Box_Guess);

   Fr_Success(i) = NoSuccess/i; % frequency of success after i attempts
end

P_Success = Fr_Success(N) % final result

% output
% ------

plot(1:N,Fr_Success,'.',[0 N],[1 1]/NoOfBoxes)
axis([0 N 0 1])
legend('Simulation result',['P = 1/' num2str(NoOfBoxes)])
Quiz_info.m
clear;
N = 2000; NoOfBoxes = 3;
NoSuccess_Stay = 0;
NoSuccess_Switch = 0;
Fr_Success_Stay = zeros(1,N);   % preallocate the running frequencies of success
Fr_Success_Switch = zeros(1,N);

for i = 1:N
   Box_Car = fix(1+rand(1)*(NoOfBoxes)); % car is put randomly in a box
   Box_1stGuess = fix(1+rand(1)*(NoOfBoxes)); % random choice of a box

   k = 0; % select possible empty boxes
   Empty_Boxes = [];
   for j = 1:NoOfBoxes
      if (Box_Car ~= j) && (Box_1stGuess ~= j) % empty box cannot be 'car' or 'guess'
          k = k + 1;
          Empty_Boxes(k) = j; % vector of empty boxes
      end
   end
   Box_Empty = Empty_Boxes(fix(1+rand(1)*k)); % choose randomly from empty boxes

   k = 0; % select possible alternatives
   Alternative_Boxes = [];
   for j = 1:NoOfBoxes
      if (Box_Empty ~= j) && (Box_1stGuess ~= j) % alt box cannot be 'empty' or 'guess'
          k = k + 1;
          Alternative_Boxes(k) = j; % vector of alternative boxes
      end
   end
   Box_Alternative = Alternative_Boxes(fix(1+rand(1)*k)); % choose randomly from alternative boxes

   % if the 1st guess is right, increase # of successful attempts of the 'stay' strategy
   NoSuccess_Stay = NoSuccess_Stay + (Box_Car == Box_1stGuess);
   % if the alternative is right, increase # of successful attempts of the 'switch' strategy
   NoSuccess_Switch = NoSuccess_Switch + (Box_Car == Box_Alternative);

   Fr_Success_Stay(i) = NoSuccess_Stay/i; % frequency of success after i attempts
   Fr_Success_Switch(i) = NoSuccess_Switch/i; % frequency of success after i attempts
end

P_Success_Stay = Fr_Success_Stay(N) % final result
P_Success_Switch = Fr_Success_Switch(N) % final result

% output
% ------

plot(1:N,Fr_Success_Stay,'r.',1:N,Fr_Success_Switch,'b.',[0 N],[1 1]*P_Success_Stay,'r--',[0 N],[1 1]*P_Success_Switch,'b--')
axis([0 N 0 1])
%legend('Simulation result Stay','Simulation result Switch',['P_{Stay} = ' num2str(P_Success_Stay)],['P_{Switch} =', num2str(P_Success_Switch)],'location','EastOutside')

legend(['"Stay" strategy (P_{success} = ' num2str(P_Success_Stay) ')'],['"Switch" strategy (P_{success} = ' num2str(P_Success_Switch) ')']);
[Figure "No info": simulated frequency of success (0 to 1) versus number of attempts (0 to 2000); legend: Simulation result, P = 1/3]
[Figure "With info from QM": simulated frequency of success (0 to 1) versus number of attempts (0 to 2000); legend: "Stay" strategy (P success = 0.3295), "Switch" strategy (P success = 0.6705)]
Updating process

If new information becomes available,
   new estimates can be made about the
   failure probability of a system
stochastic variables
•   What is a stochastic variable?
•   probability distributions
•   Fast characteristics
•   Distribution types
•   Two stochastic variables
stochastic variable
 Quantity with uncertainty:
 – Natural variation
 – Lack of statistical data
 – Schematizations

 example:
 – strength of concrete
 – outcome of a die roll
 – temperature in Delft on September 16th, 2014
Relation with events
uncertainty can be expressed with probabilities

• probability that stochastic variable X
   –   is smaller than x
   –   larger than x
   –   equal to x
   –   is in the interval [x, x + Δx]
   –   etc.
   probability distribution
   probability distribution function = probability
     $P(X \le \xi)$:

     $F_X(\xi) = P(X \le \xi)$

   X is the stochastic variable (stochast), $\xi$ is a dummy variable.
   [Figure: $F_X(\xi)$ versus $\xi$, rising from 0 to 1]
probability density
[Figure: bell-shaped curve over $\xi$ from -3 to 5, with values between 0 and 0.5]
This is a probability density function
probability density
Differentiation of F with respect to $\xi$:

$f_X(\xi) = dF_X(\xi) / d\xi$

f = probability density function

$f_X(\xi) \, d\xi = P(\xi < X \le \xi + d\xi)$
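Both relations can be checked numerically. The Python sketch below uses the standard normal distribution as an example (an assumption; the slide does not fix a distribution), with the distribution function written in terms of the error function.

```python
import math

def normal_cdf(x):
    """F_X(x) of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    """f_X(x) of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def numerical_derivative(func, x, h=1e-5):
    """Central difference approximation of d func / dx."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

xi = 1.0
d_xi = 0.01

density_from_cdf = numerical_derivative(normal_cdf, xi)   # dF/dxi
strip_probability = normal_cdf(xi + d_xi) - normal_cdf(xi)  # P(xi < X <= xi + d_xi)

print(f"dF/dxi       = {density_from_cdf:.6f}")
print(f"f(xi)        = {normal_pdf(xi):.6f}")
print(f"P(strip)     = {strip_probability:.6f}")
print(f"f(xi) * d_xi = {normal_pdf(xi) * d_xi:.6f}")
```

The strip probability and f(xi) * d_xi agree only approximately for a finite d_xi; the agreement improves as d_xi becomes smaller.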
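[Figure: $F_X(\xi)$ with $P(X \le \xi)$ read off on the vertical axis, above $f_X(\xi)$ with $P(\xi < X \le \xi + d\xi)$ as the area of a strip of width $d\xi$]
[Figure: $F_X(\xi)$ with $P(X \le \xi)$ read off on the vertical axis, above $f_X(\xi)$ with the same probability as the area under the curve up to $\xi$]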
Discrete and continuous
discrete variable:
[Figure: probability mass function $p_X(x)$ and distribution function $F_X(x)$ for x = 0 to 6]

continuous variable:
[Figure: probability density $f_X(x)$ and (cumulative) probability distribution $F_X(x)$ for x = -4 to 6]
 Complementary probability
 distribution
Complementary cumulative distribution function (ccdf) of variable S:
$P(S > x) = 1 - P(S \le x) = 1 - F_S(x)$

[Figure: tail of the distribution; the design value $S_d$ is chosen such that $P\{S > S_d\} = P_{target}$]
Fast characteristics
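[Figure: probability density $f_X(x)$, x from -4 to 6, with the mean $\mu_X$ marked on the x-axis and the standard deviation $\sigma_X$ shown as a width]

 $\mu_X$      mean
 $\sigma_X$   standard deviation, an indication of the spread

[Figure: skewed probability density $f_X(x)$, x from 0 to 5, with $\mu_X$ and $\sigma_X$ marked]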
Mean ≠ maximum (mode)
Median is the value m for which P(X < m) = 50%

• Mean: $\mu_X = \int_{-\infty}^{\infty} x\, f_X(x)\, dx$

• Variance: $\sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\, dx$

• Standard deviation: $\sigma_X$

• Variation coefficient: $V_X = \sigma_X / \mu_X$
distribution types

•   Uniform distribution
•   Normal distribution
•   Lognormal distribution
•   Gumbel distribution
•   Weibull distribution
•   Gamma distribution
•   ….
Uniform distribution

[Figure: uniform density $f_X(\xi) = 1/(b-a)$ for $a \le \xi \le b$; area = total probability = 1]

 Mean: $\mu = (a+b)/2$

 Standard deviation: $\sigma = (b-a)/\sqrt{12}$
Matlab demonstration
Generate numbers from a uniform distribution on [0, 1]
Make a histogram (observe the variability around its mean)
Calculate the mean value and standard deviation
They should converge to 0.5 and 0.2887 (1/sqrt(12))
This is indeed confirmed by the simulation
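The same experiment can be sketched outside Matlab. The following is a minimal Python version using only the standard library; the function name `simulate_uniform`, the seed and the sample sizes are illustrative assumptions, and the histogram step is left out.

```python
import random
import statistics

def simulate_uniform(n, seed=1):
    """Draw n samples from Uniform(0, 1); return (mean, standard deviation)."""
    rng = random.Random(seed)
    samples = [rng.random() for _ in range(n)]
    return statistics.fmean(samples), statistics.pstdev(samples)

# The estimates should approach 0.5 and 1/sqrt(12) as n grows.
for n in (100, 10_000, 100_000):
    mean, std = simulate_uniform(n)
    print(f"n = {n:>7}: mean = {mean:.4f}, std = {std:.4f}")
```

For small n the estimates scatter noticeably around the theoretical values, which is the variability the histogram step is meant to show.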
Normal distribution
[Figure: strength of timber; probability density function of a normal distribution with mean = 37 N/mm2 and standard deviation = 8.6 N/mm2; x-axis: bending strength (N/mm2), y-axis: probability density; $\mu_X$ and $\sigma_X$ marked]
Normal distribution in CDF domain
[Figure: strength of timber; cumulative distribution function of a normal distribution with mu = 37 N/mm2 and sigma = 8.6 N/mm2; x-axis: x (N/mm2), y-axis: probability that bending strength <= x]
Normal distribution in linearised CDF domain
[Figure: strength of timber; normal probability plot; x-axis: x (N/mm2) from 15 to 60, y-axis: probability that bending strength <= x, on a normal probability scale from 0.003 to 0.997]
normal distribution
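probability density:
$f_X(\xi) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\xi - \mu}{\sigma}\right)^2\right)$

probability distribution:
$F_X(y) = \int_{-\infty}^{y} f_X(\xi)\, d\xi$

in which:
$\mu$     mean
$\sigma$  standard deviation ($\sigma > 0$)
$\xi$     dummy variable ($-\infty < \xi < \infty$)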
standard normal distribution
normal distributed variable X:
$X = \mu_X + \sigma_X u$   or   $u = (X - \mu_X)/\sigma_X$

standard normal distributed variable u:
$\mu_u = 0$
$\sigma_u = 1$

probability density:
$f_u(\xi) = \frac{1}{\sqrt{2\pi}} e^{-\xi^2/2}$

probability distribution (from table):
$F_u(y) = \int_{-\infty}^{y} f_u(\xi)\, d\xi$
This table can be used in both directions:
1. Given an x value, what is the corresponding exceedance probability?
2. Given a probability, what is the corresponding x value?



Note that the table only describes the right
hand tail of the standard normal
distribution. The left hand tail can be
obtained by symmetry around the point (0,
0.5).

For ordinary normal distributions, always
scale back to a standard normal distribution
(by subtracting the mean value, and
dividing by the standard deviation)
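Both directions of the table lookup, together with the scaling to a standard normal variable, can be sketched in Python. This is only an illustration: `statistics.NormalDist` from the standard library stands in for the printed table, the numbers are those of the timber example (mean 37 N/mm2, standard deviation 8.6 N/mm2), and the helper names are assumptions.

```python
from statistics import NormalDist

MU, SIGMA = 37.0, 8.6          # timber bending strength example (N/mm2)
std_normal = NormalDist()      # standard normal: mean 0, standard deviation 1

def prob_below(x, mu=MU, sigma=SIGMA):
    """Direction 1: x value -> probability P(X <= x)."""
    u = (x - mu) / sigma       # subtract the mean, divide by the standard deviation
    return std_normal.cdf(u)

def value_at(p, mu=MU, sigma=SIGMA):
    """Direction 2: probability p -> x value with P(X <= x) = p."""
    u = std_normal.inv_cdf(p)
    return mu + sigma * u      # scale back to the ordinary normal distribution

print(prob_below(20.0))        # non-exceedance probability of 20 N/mm2
print(1.0 - prob_below(50.0))  # exceedance probability of 50 N/mm2
print(value_at(0.05))          # 5% quantile of the bending strength
```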
normal distribution
Why so popular?

Central limit theorem
  Sum of many variables (i.i.d.) is (almost)
  normally distributed.

Y = X1 + X2 + X3 + X4 + ….

                       i.i.d. = independent identically distributed
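The central limit theorem can be checked numerically. The sketch below, in Python with an assumed seed and assumed sample sizes, sums 12 i.i.d. Uniform(0, 1) variables; such a sum has mean 6 and standard deviation 1, and roughly 68% of the sums should fall within one standard deviation of the mean if the result is close to normal.

```python
import random
import statistics

def sum_of_uniforms(n_terms, rng):
    """Y = X1 + X2 + ... + Xn with the Xi i.i.d. Uniform(0, 1)."""
    return sum(rng.random() for _ in range(n_terms))

def clt_experiment(n_terms=12, n_sums=20_000, seed=2):
    rng = random.Random(seed)
    sums = [sum_of_uniforms(n_terms, rng) for _ in range(n_sums)]
    mean = statistics.fmean(sums)
    std = statistics.pstdev(sums)
    # Fraction of sums within one standard deviation of the mean
    within_one_sigma = sum(abs(s - mean) <= std for s in sums) / n_sums
    return mean, std, within_one_sigma

print(clt_experiment())
```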
       Normal Distributions

A continuous rv X is said to have a normal distribution with parameters $\mu$ and $\sigma$, where $-\infty < \mu < \infty$ and $0 < \sigma$, if the pdf of X is

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty$
   Standard Normal Distributions
The normal distribution with parameter values $\mu = 0$ and $\sigma = 1$ is called a standard normal distribution. The random variable is denoted by Z. The pdf is
$f(z; 0, 1) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty$
The cdf is
$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} f(y; 0, 1)\, dy$
 Standard Normal Cumulative Areas

[Figure: standard normal curve; the shaded area to the left of z equals $\Phi(z)$]
    Standard Normal Distribution
Let Z be the standard normal variable.
Find (from table)
  a. $P(Z \le 0.85)$
     Area to the left of 0.85 = 0.8023

  b. $P(Z > 1.32)$
     $1 - P(Z \le 1.32) = 0.0934$

  c. $P(-2.1 \le Z \le 1.78)$
     Find the area to the left of 1.78, then subtract the area to the left of -2.1.
     $= P(Z \le 1.78) - P(Z \le -2.1)$
     $= 0.9625 - 0.0179$
     $= 0.9446$
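           $z_\alpha$ Notation

$z_\alpha$ will denote the value on the measurement axis for which the area $\alpha$ under the z curve lies to the right of $z_\alpha$.
[Figure: standard normal curve; shaded area to the right of $z_\alpha$, equal to $P(Z \ge z_\alpha) = \alpha$]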
Ex. Let Z be the standard normal variable. Find z if
a. $P(Z < z) = 0.9278$.
    Look at the table and find an entry = 0.9278, then read back to find z = 1.46.

b. $P(-z < Z < z) = 0.8132$
    $P(-z < Z < z) = 2P(0 < Z < z)$
                   $= 2[P(Z < z) - 1/2]$
                   $= 2P(Z < z) - 1 = 0.8132$
    $P(Z < z) = 0.9066$
    z = 1.32
 Nonstandard Normal Distributions

If X has a normal distribution with mean $\mu$ and standard deviation $\sigma$, then
$Z = \frac{X - \mu}{\sigma}$
has a standard normal distribution.
          Normal Curve
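Approximate percentage of area within a given number of standard deviations of the mean (empirical rule):
  68% within 1 standard deviation
  95% within 2 standard deviations
  99.7% within 3 standard deviations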
Ex. Let X be a normal random variable with $\mu = 80$ and $\sigma = 20$.
Find $P(X \le 65)$.

$P(X \le 65) = P\left(Z \le \frac{65 - 80}{20}\right)$
            $= P(Z \le -0.75)$
            $= 0.2266$
Ex. A particular rash has shown up at an elementary school. It has been determined that the length of time that the rash will last is normally distributed with $\mu = 6$ days and $\sigma = 1.5$ days.
Find the probability that for a student selected at random, the rash will last for between 3.75 and 9 days.

$P(3.75 \le X \le 9) = P\left(\frac{3.75 - 6}{1.5} \le Z \le \frac{9 - 6}{1.5}\right)$
                    $= P(-1.5 \le Z \le 2)$
                    $= 0.9772 - 0.0668$
                    $= 0.9104$
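The two worked examples (mean 80 with standard deviation 20, and the rash duration with mean 6 and standard deviation 1.5 days) can be reproduced without a table. A minimal Python sketch, in which the helper name `prob_between` is an assumption and `statistics.NormalDist` plays the role of the standard normal table:

```python
from statistics import NormalDist

def prob_between(lo, hi, mu, sigma):
    """P(lo <= X <= hi) for X ~ Normal(mu, sigma), via Z = (X - mu) / sigma."""
    z = NormalDist()  # standard normal
    return z.cdf((hi - mu) / sigma) - z.cdf((lo - mu) / sigma)

p_first = NormalDist().cdf((65 - 80) / 20)        # P(X <= 65) = P(Z <= -0.75)
p_rash = prob_between(3.75, 9, mu=6, sigma=1.5)   # P(-1.5 <= Z <= 2)
print(round(p_first, 4), round(p_rash, 4))        # prints 0.2266 0.9104
```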
    Percentiles of an Arbitrary Normal Distribution

(100p)th percentile for normal $(\mu, \sigma)$ $= \mu +$ [(100p)th percentile for standard normal] $\cdot\, \sigma$
Lognormal distribution

[Figure: lognormal probability density $f_X(\xi)$, $\xi$ from 0 to 5, with $\mu_X$ and $\sigma_X$ marked]
  Lognormal distribution
[Figure: the transformation $y = \ln \xi$ or $\xi = \exp(y)$ maps the normal density $f_Y(y)$ onto the lognormal density $f_X(\xi)$]

If X is lognormally distributed, then Y = ln(X) is normally distributed
Lognormal distribution
X lognormal distributed $\Leftrightarrow$ Y = ln(X) normal distributed

probability density function for X:

$f_X(\xi) = \frac{1}{\xi\, \sigma_Y \sqrt{2\pi}} \exp\left(-\frac{(\ln \xi - \mu_Y)^2}{2\sigma_Y^2}\right), \quad \xi > 0$

in which $\mu_Y$ and $\sigma_Y$ are the parameters of the lognormal distribution:
$\mu_Y$     mean value of Y (not of X !!)
$\sigma_Y$  standard deviation of Y (not of X !!)
Lognormal distribution
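X lognormal distributed $\Leftrightarrow$ Y = ln(X) normal distributed

$\mu_X = \exp(\mu_Y + \tfrac{1}{2}\sigma_Y^2)$

$\sigma_X = \mu_X \sqrt{\exp(\sigma_Y^2) - 1}$

$\mu_Y = \ln \mu_X - \tfrac{1}{2}\sigma_Y^2$

$\sigma_Y = \sqrt{\ln\left(1 + \frac{\sigma_X^2}{\mu_X^2}\right)} = \sqrt{\ln(1 + V_X^2)}$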
Lognormal distribution
As a consequence of the central limit theorem:
  Product of many variables is (almost) lognormally distributed

  $y = x_1 x_2 \cdots x_n$
  $\log y = \log x_1 + \log x_2 + \dots + \log x_n$
  so log y is (almost) normally distributed.

  Definition: log y normal $\Rightarrow$ y lognormal
Example: salaries in a country are lognormally distributed.
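For a lognormal variable X with Y = ln(X) normal, the mean and standard deviation of X and of Y are linked by $\mu_X = \exp(\mu_Y + \sigma_Y^2/2)$ and $\sigma_Y^2 = \ln(1 + V_X^2)$. A minimal Python sketch of the conversion in both directions; the salary figures are hypothetical and only serve as input.

```python
import math

def lognormal_params(mean_x, std_x):
    """Mean and standard deviation of X -> parameters (mu_Y, sigma_Y) of Y = ln(X)."""
    v = std_x / mean_x                          # variation coefficient V_X
    sigma_y = math.sqrt(math.log(1.0 + v * v))
    mu_y = math.log(mean_x) - 0.5 * sigma_y ** 2
    return mu_y, sigma_y

def lognormal_moments(mu_y, sigma_y):
    """Parameters (mu_Y, sigma_Y) -> mean and standard deviation of X."""
    mean_x = math.exp(mu_y + 0.5 * sigma_y ** 2)
    std_x = mean_x * math.sqrt(math.exp(sigma_y ** 2) - 1.0)
    return mean_x, std_x

# Hypothetical salary distribution: mean 30000, standard deviation 15000
mu_y, sigma_y = lognormal_params(30_000.0, 15_000.0)
print(mu_y, sigma_y)
print(math.exp(mu_y))   # median of X, which lies below the mean
```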
Asymptotic distributions

Normal:     $X = Y_1 + Y_2 + Y_3 + \dots$
Lognormal:  $X = Y_1 \cdot Y_2 \cdot Y_3 \cdots$
Weibull:    $X = \min(Y_1, Y_2, Y_3, \dots)$
Gumbel:     $X = \max(Y_1, Y_2, Y_3, \dots)$
Asymptotic distributions return in the 5th year course CT5310 by Vrijling and Van Gelder
Discrete distributions
       Exponential Distribution

A continuous rv X has an exponential distribution with parameter $\lambda$ if the pdf is

$f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$, and 0 otherwise
       Mean and Variance

The mean and variance of a random variable X having the exponential distribution are

$\mu = \frac{1}{\lambda}, \qquad \sigma^2 = \frac{1}{\lambda^2}$
 Applications of the Exponential
 Distribution
Suppose that the number of events occurring in any time interval of length t has a Poisson distribution with parameter $\alpha t$ and that the numbers of occurrences in nonoverlapping intervals are independent of one another. Then the distribution of elapsed time between the occurrences of two successive events is exponential with parameter $\lambda = \alpha$.
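The link between exponential waiting times and Poisson counts can be illustrated by simulation. The sketch below, in Python with an assumed rate of 2 events per unit time and an assumed seed, generates exponential gaps between events and counts the events per unit interval; the counts should have mean and variance close to the rate, as expected for a Poisson distribution.

```python
import random
import statistics

def interarrival_times(rate, n_events, seed=3):
    """Exponential gaps between successive events, with the given rate parameter."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n_events)]

def counts_per_interval(gaps, length):
    """Count events in consecutive, nonoverlapping intervals of the given length."""
    counts = []
    t, end, count = 0.0, length, 0
    for gap in gaps:
        t += gap
        while t > end:            # close every interval that ended before this event
            counts.append(count)
            count = 0
            end += length
        count += 1
    return counts                 # the last, incomplete interval is dropped

gaps = interarrival_times(rate=2.0, n_events=50_000)
counts = counts_per_interval(gaps, length=1.0)
print(statistics.fmean(gaps), statistics.pstdev(gaps))         # both near 1/rate
print(statistics.fmean(counts), statistics.pvariance(counts))  # both near rate
```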
Two stochasts
[Figure: joint probability density of two stochasts, drawn as a surface over x and y (both from -2 to 2); vertical axis: probability density, from 0 to 0.2]
Contour lines of the joint density
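[Figure: contour lines of the joint density $f_{XY}(\xi, \eta)$, with the marginal densities $f_X(\xi)$ and $f_Y(\eta)$ drawn along the $\xi$ and $\eta$ axes]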
Two stochasts
Relation with events
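$f_{XY}(\xi, \eta) = P(\xi < X \le \xi + \Delta\xi \text{ and } \eta < Y \le \eta + \Delta\eta) / (\Delta\xi\, \Delta\eta)$

$F_{XY}(\xi, \eta) = P(X \le \xi \text{ and } Y \le \eta)$

[Figure: small rectangle with sides $\Delta\xi$ and $\Delta\eta$ at the point $(\xi, \eta)$]
Also here:
f     density
F     (cumulative) distribution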
Example

• Length: [Figure: probability density (1/m) against length (m), from 1.2 to 2.6 m]
• Weight: [Figure: probability density (1/kg) against weight (kg), from 50 to 110 kg]
Corresponding contour plot?
[Figure: axes for the contour plot; weight (kg) from 50 to 110 against length (m) from 1.4 to 2.2]
Scatter plot results of a large survey
[Figure: health investigation, 1000 observations; scatter plot of weight (kg), 50 to 110, against length (m), 1.4 to 2.2]
Dependence
[Figure: contour plot of weight (kg) against length (m), with the marginal density of weight (1/kg) and the marginal density of length (1/m) drawn along the corresponding axes]
Characteristics


$\mu_X$, $\mu_Y$

$\sigma_X$, $\sigma_Y$

Dependence
$cov_{XY}$                                     covariance, or
$\rho_{XY} = cov_{XY} / (\sigma_X \sigma_Y)$   correlation, between -1 and 1
Covariance
Cov(X,Y)=E((X-EX)(Y-EY))

calculation example on black board
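A small numerical illustration of the definition, written in Python. The five length and weight pairs are hypothetical, and the sample versions of the expectations (averages over the data) stand in for E(.).

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample version of Cov(X, Y) = E((X - EX)(Y - EY))."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    sx = covariance(xs, xs) ** 0.5
    sy = covariance(ys, ys) ** 0.5
    return covariance(xs, ys) / (sx * sy)

lengths = [1.60, 1.70, 1.80, 1.90, 2.00]   # hypothetical lengths (m)
weights = [58.0, 66.0, 75.0, 83.0, 95.0]   # hypothetical weights (kg)
print(covariance(lengths, weights))        # in m*kg
print(correlation(lengths, weights))       # dimensionless, between -1 and 1
```

The covariance carries the units of both variables, which is why the dimensionless correlation is usually the quantity reported.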
Correlation
[Figure: six scatter plots, for rho = 0, rho = 0.3, rho = 0.7, rho = 0.9, rho = 1 and rho = -0.9]
Correlations between wave height
and wave period (data from
golfklimaat.nl website)
  Engineering: structural reliability
Reliability:
Determined by the probability that the structure falls apart (failure probability)
The smaller this probability, the larger the reliability
Risk = probability x consequences
Structure:
Strength
Load
Falls apart if strength < load            Introduction
Design value - principle
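    Probability density function (pdf) of the load S: $f_S(x)$
[Figure: pdf of S with mean $\mu$ and standard deviation $\sigma$ marked; the design value $S_d$ cuts off an upper tail with area $P\{S > S_d\} = P_{target}$]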
Cumulative distribution function
[Figure: top panel, cdf $F_X(\xi)$ with $P(X \le \xi)$ marked; bottom panel, the corresponding pdf $f_X(\xi)$]
Design value - principle
Cumulative probability distribution function (cdf) of the load S
[Figure: cdf of S; at the design value $S_d$, $P\{S \le S_d\} = 1 - P_{target}$]
 Design value - principle
Complementary cumulative distribution function (ccdf) of the load S $= 1 - F_S(x)$
[Figure: ccdf of S; at the design value $S_d$, $P\{S > S_d\} = P_{target}$]
Design value load $S_d$
Design value load $S_d$ or quantile is defined as:
$P\{S > S_d\} = P_{target}$ during reference period $T_{ref}$

Target probability $P_{target}$:
Depends on consequences of structural failure
Is specified in building codes
Typical:
$P_{target} = 10^{-4}$ to $10^{-1}$ (structural collapse)
$T_{ref}$ = 15 to 100 years
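Once a distribution for the load is chosen, the design value follows by inverting the cdf at $1 - P_{target}$. The Python sketch below assumes, purely for illustration, a normally distributed load with hypothetical mean 100 and standard deviation 20 (arbitrary units); the function name `design_value` is also an assumption.

```python
from statistics import NormalDist

def design_value(load, p_target):
    """Return S_d such that P(S > S_d) = p_target for the given load distribution."""
    return load.inv_cdf(1.0 - p_target)

load = NormalDist(mu=100.0, sigma=20.0)   # hypothetical load S
for p_target in (1e-1, 1e-2, 1e-3, 1e-4):
    s_d = design_value(load, p_target)
    print(f"P_target = {p_target:.0e}: S_d = {s_d:.1f}")
```

A smaller target probability pushes the design value further into the upper tail, so the result becomes increasingly sensitive to the assumed distribution type.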
Peaks over Threshold analysis for quantile estimation
Let $X_1, X_2, \dots, X_n$ be a series of independent random observations of a random variable X with the distribution function F(x). To model the upper tail of F(x), consider k exceedances of X over a threshold u and let $Y_1, Y_2, \dots, Y_k$ denote the excesses (or peaks), i.e. $Y_i = X_i - u$.
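Extracting the excesses is a one-line filter. A minimal Python sketch, in which the observations and the threshold are hypothetical values chosen only to show the mechanics:

```python
def excesses_over_threshold(observations, u):
    """Return the excesses Y = X - u for every observation X that exceeds u."""
    return [x - u for x in observations if x > u]

observations = [1.0, 3.5, 2.0, 4.25, 2.75, 3.25]   # hypothetical observations
u = 3.0                                            # hypothetical threshold
peaks = excesses_over_threshold(observations, u)
print(len(peaks), peaks)                           # k exceedances and their excesses
```

Raising u leaves fewer exceedances (larger statistical uncertainty), while lowering it includes observations that are less representative of the tail.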
Extreme value statistics




If we know the distribution of a random variable
   (for instance, monthly water level, daily wave
   height, etc), how does the distribution of the
   maximum of n random variables behave?
Beware of inhomogeneities
                  Stationary Time Series

Exhibits stationarity in that it fluctuates around a constant long
run mean

Has a finite variance that is time invariant

Has a theoretical covariance between values of yt that depends
only on the difference apart in time

$E(y_t) = \mu$

$Var(y_t) = E[(y_t - \mu)^2] = \gamma(0)$

$Cov(y_t, y_{t-\tau}) = E[(y_t - \mu)(y_{t-\tau} - \mu)] = \gamma(\tau)$
                     Stationary time series

WHITE NOISE PROCESS

$X_t = u_t, \qquad u_t \sim IID(0, \sigma^2)$

[Figure: white noise series, fluctuating between about -0.6 and 0.6]
Examples of non-stationary series
[Figure: three time series: share prices, exchange rate, and income (1960 to 1975)]
Unit Root Tests

    How do you find out if a series is
          stationary or not?
        Order of Integration of a Series

A series which is stationary after being differenced once is said to be integrated of order 1 and is denoted by I(1). In general, a series which is stationary after being differenced d times is said to be integrated of order d, denoted I(d). A series which is stationary without differencing is said to be I(0).
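Differencing can be tried out on a simulated random walk with drift, a standard example of an I(1) series. In the Python sketch below the drift, noise level, series length and seed are assumed values; the first differences recover the drift plus the white-noise shocks, which form a stationary series.

```python
import random

def random_walk_with_drift(n, b0, sigma, seed=4):
    """Simulate Y_t = b0 + Y_{t-1} + e_t with Y_0 = 0; return the series and the shocks."""
    rng = random.Random(seed)
    y, shocks = [0.0], []
    for _ in range(n):
        e = rng.gauss(0.0, sigma)
        shocks.append(e)
        y.append(b0 + y[-1] + e)
    return y, shocks

def difference(series):
    """First differences: Y_t - Y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]

y, shocks = random_walk_with_drift(n=500, b0=0.02, sigma=1.0)
dy = difference(y)
print(sum(dy) / len(dy))   # sample mean of the differences, an estimate of b0
```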



$Y_t = b_0 + Y_{t-1} + e_t \sim I(1)$
$\Delta Y_t = Y_t - Y_{t-1} = b_0 + e_t \sim I(0)$
Informal procedures to identify non-stationary processes
(1) Eyeball the data
(a) Constant mean?
[Figure: series RW2, values between 0 and 12, 500 observations]
(b) Constant variance?
[Figure: series var, values between -200 and 200, 500 observations]
    Statistical Tests for stationarity: Simple t-test



Set up AR(1) process with drift ($b_0$):
        $Y_t = b_0 + b_1 Y_{t-1} + e_t, \qquad e_t \sim iid(0, \sigma^2)$            (1)

Simple approach is to estimate eqn (1) using OLS and examine the estimated $b_1$

Use a t-test with null $H_0$: $b_1 = 1$ (non-stationary)
  against alternative $H_a$: $b_1 < 1$ (stationary).

Test statistic: TS = ($b_1$ - 1) / (Std. Err.($b_1$))
Reject the null hypothesis when the test statistic is large and negative
        - 5% critical value is -1.65
Distribution of a maximum of random
variables
Therefore...
Extreme value distribution of a uniform
distribution
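A short worked derivation may help here. It assumes the variables are independent and identically distributed with cdf $F_X$, and uses a uniform parent on $[0, b]$ as the example; the notation $F_{\max}$, $f_{\max}$ and $b$ is introduced only for this sketch.

```latex
% Assumption: X_1, ..., X_n are independent with the same cdf F_X
F_{\max}(x) = P(\max(X_1,\dots,X_n) \le x)
            = P(X_1 \le x) \cdots P(X_n \le x)
            = [F_X(x)]^n

% Uniform parent on [0, b]: F_X(x) = x/b for 0 <= x <= b
F_{\max}(x) = (x/b)^n, \qquad f_{\max}(x) = n\,x^{n-1}/b^n
```

As n grows, the probability mass of the maximum concentrates near the upper bound b.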
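From an ‘operational point of view’ rather than a conceptual point of view: Matlab code ct53100.m
 n = 12;                  % observations per year (time periods i = 1..12)
 y = zeros(1, 100);       % preallocate the 100 yearly maxima
 for j = 1:100
     x = 5 * rand(1, n);  % n water levels, uniformly distributed between 0 and 5 m
     y(j) = max(x);       % maximum water level in year j
 end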
[Figure: six panels. Left column (one year): water level [m] against time period i (1 to 12); histogram of monthly water level [m]; sorted monthly water level [m]. Right column (100 years): maximum water level [m] against year j; histogram of annual maximum water level [m]; sorted annual water level [m].]
Back to the operational viewpoint
- Change the number of observations
- Change the distribution type
- From minimum to maximum
- etc.
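These variations can be tried with a parametrized version of the simulation. The sketch below is written in Python rather than Matlab and is only an illustration: the function name `simulate_extremes`, the seed, and the normal parent with mean 3 and standard deviation 0.5 are assumptions, while the default case mirrors the uniform example on [0, 5] with 12 observations per year.

```python
import random
import statistics

def simulate_extremes(n_obs=12, n_years=100, sampler=None, extreme=max, seed=5):
    """Return one extreme (max or min) of n_obs random draws for each simulated year."""
    rng = random.Random(seed)
    if sampler is None:
        sampler = lambda r: 5.0 * r.random()   # uniform on [0, 5]
    return [extreme(sampler(rng) for _ in range(n_obs)) for _ in range(n_years)]

maxima = simulate_extremes()                                   # baseline case
more_obs = simulate_extremes(n_obs=365)                        # change the number of observations
normal_parent = simulate_extremes(sampler=lambda r: r.gauss(3.0, 0.5))  # change the distribution type
minima = simulate_extremes(extreme=min)                        # minima instead of maxima

for name, data in [("maxima", maxima), ("more_obs", more_obs),
                   ("normal_parent", normal_parent), ("minima", minima)]:
    print(name, round(statistics.fmean(data), 3))
```

For the uniform parent the mean of the maximum is 5n/(n+1), which gives a quick check on the simulated values.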
From a visual point of view

Statistics wind:
[Figure: left, Weibull probability density of wind speed V (m/s); right, wind speed V (m/s) against time (1/1/64 - 31/12/64). Source: KNMI (Schiphol, 1964)]

     Mean                   : 5 m/s
     Standard deviation     : 2.5 m/s
visual point of view

[Figure: three densities along the wind speed axis: instantaneous (around 5 m/s), maximum over 1 year (around 20 m/s), maximum over 50 years (around 28 m/s)]
visual point of view
Procedure for minima of r.v.’s
The extreme value distribution


Distribution type
Distribution parameters
Quantile values (design values)
Example: wind loads



                      Approach 1: extreme value distributions
 The extreme value distribution
Limit theorem (Fisher-Tippett):

Maximum of many random variables has distribution:
Reverse Weibull (bounded maximum) or
Gumbel or
Frechet (bounded minimum)
regardless of parent distribution

Conditions:
Random variables are independent
Random variables have the same parent distribution
Extreme value distributions

[Figure: the three extreme value distributions on a probability plot in which Gumbel is a straight line: Frechet (concave), Gumbel (straight), Reverse Weibull (convex)]
  Generalized Extreme Value distribution
All three extreme value distributions are special cases of the Generalized Extreme Value distribution (GEV), distinguished by the sign of the shape parameter:

shape parameter = 0: Gumbel            (EV type I for maxima)
shape parameter > 0: Frechet           (EV type II for maxima)
shape parameter < 0: Reverse Weibull   (EV type III for maxima)
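One common parametrization of the GEV cdf is $F(x) = \exp(-(1 + \xi (x - \mu)/\sigma)^{-1/\xi})$, with the Gumbel form $\exp(-\exp(-(x - \mu)/\sigma))$ in the limit $\xi \to 0$. The Python sketch below implements this form; the symbols (location $\mu$, scale $\sigma$, shape $\xi$) and the sign convention are assumptions chosen so that a positive shape gives Frechet and a negative shape gives Reverse Weibull, and they may differ from the lecture's notation.

```python
import math

def gev_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """GEV cdf with location mu, scale sigma > 0 and shape xi (assumed convention)."""
    s = (x - mu) / sigma
    if abs(xi) < 1e-12:                  # Gumbel limit (EV type I)
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:                         # x lies outside the support
        return 0.0 if xi > 0 else 1.0    # below the Frechet bound / above the Reverse Weibull bound
    return math.exp(-t ** (-1.0 / xi))

for xi in (-0.3, 0.0, 0.3):
    print(xi, [round(gev_cdf(x, xi=xi), 4) for x in (-1.0, 0.0, 2.0, 5.0)])
```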
Domain of attraction
 Parent distribution                              Asymptotic distribution type of maximum (domain of attraction)
 uniform, beta (short tail)                       Reverse Weibull
 normal, exponential, gamma, lognormal, Weibull   Gumbel
 Pareto, Cauchy, Student-t (fat tail)             Frechet
Example: wind load
Maximum over 50 years
Quantile value at $P_{target}$ = 0.15 (design load)
Steps:
  – Determine extreme value distribution type
  – Determine distribution parameters
  – Calculate requested quantile (design load)
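The last two steps can be sketched for the case where the first step gives a Gumbel distribution for the annual maximum. Under independence between years, the maximum over N years is again Gumbel, with the same scale and a location shifted by (scale × ln N). In the Python sketch below the annual parameters (location 20 m/s, scale 2 m/s) are hypothetical, wind speed is used in place of wind load, and the function names are assumptions.

```python
import math

def gumbel_quantile(p, mu, beta):
    """x with F(x) = p for the Gumbel cdf F(x) = exp(-exp(-(x - mu) / beta))."""
    return mu - beta * math.log(-math.log(p))

def shift_to_n_years(mu, beta, n_years):
    """Gumbel parameters of the maximum over n_years independent annual maxima."""
    return mu + beta * math.log(n_years), beta

mu_1, beta_1 = 20.0, 2.0      # hypothetical annual-maximum parameters (m/s)
mu_50, beta_50 = shift_to_n_years(mu_1, beta_1, 50)
p_target = 0.15
design_speed = gumbel_quantile(1.0 - p_target, mu_50, beta_50)
print(round(mu_50, 2), round(design_speed, 2))
```

The design value is the speed exceeded with probability 0.15 by the 50-year maximum, so it lies above the location parameter of that maximum.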
Distribution type
Statistics: parent distribution is approx. Weibull
EV-theory: domain of attraction is Gumbel
Plot: monthly maxima of hourly averaged wind speeds




[Figure: plot of monthly maxima of hourly averaged wind speeds; annotation: slightly convex?]
Gumbel probability plot

[Figure: CDF on Gumbel probability paper; annotation: Reverse Weibull-like deviation (poor convergence)]
Wind speed annual maxima Schiphol




Wind pressure monthly maxima
Schiphol




                     pressure:

                     $q = 0.5\, \rho\, U^2$
                     with:
                     $\rho$  air density
                     U  wind speed
Transformations
Presenting large datasets
In a histogram
On probability paper
 Classify your data

order the n observations
Number of classes: 1 + 1.33 ln(n)
All classes have preferably the same
  width
Histogram
Wave height (cm’s):
25 45 35 25 30 70
  20 45 65 30 40 40
  35 45 55 35 32 37
  28 45 49 39 40 60
  29 34 47 35 45 49
  35 45 34 28 34 54
  48 38 32 39 45 58
Histogram
Number of classes: 1 + 1.33 ln(42) ≈ 6
Range: highest - lowest = 70 - 20 = 50
Class width: about 50 / 6 ≈ 8 (we take 5 to choose a round number)
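The arithmetic on this slide can be checked with a few lines of Python. The data are the 42 wave heights listed above; the helper name `suggested_class_count` is only illustrative.

```python
import math

# Wave heights in cm, copied from the slide.
wave_heights_cm = [
    25, 45, 35, 25, 30, 70,
    20, 45, 65, 30, 40, 40,
    35, 45, 55, 35, 32, 37,
    28, 45, 49, 39, 40, 60,
    29, 34, 47, 35, 45, 49,
    35, 45, 34, 28, 34, 54,
    48, 38, 32, 39, 45, 58,
]


def suggested_class_count(n):
    """Rule of thumb from the slides: 1 + 1.33 ln(n), rounded to an integer."""
    return round(1 + 1.33 * math.log(n))


n = len(wave_heights_cm)
n_classes = suggested_class_count(n)
data_range = max(wave_heights_cm) - min(wave_heights_cm)
print(n, n_classes, data_range, round(data_range / n_classes, 1))
```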
Histogram 4
class (cm)      frequency   freq/width (width in units of 5 cm)
17.5 - 27.5         3          3/2
27.5 - 32.5         7          7/1
32.5 - 37.5         9          9/1
37.5 - 42.5         6          6/1
42.5 - 47.5         8          8/1
47.5 - 57.5         5          5/2
57.5 - 77.5         4          4/4
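When the classes have unequal widths, the bar height should be the frequency divided by the class width, as in the table above. A minimal sketch that reproduces the counts and the freq/width column from the wave height data (the variable names are illustrative):

```python
from bisect import bisect_right

# Wave heights in cm, copied from the slide.
wave_heights_cm = [
    25, 45, 35, 25, 30, 70, 20, 45, 65, 30, 40, 40,
    35, 45, 55, 35, 32, 37, 28, 45, 49, 39, 40, 60,
    29, 34, 47, 35, 45, 49, 35, 45, 34, 28, 34, 54,
    48, 38, 32, 39, 45, 58,
]

# Class limits from the table; no observation falls exactly on a limit.
edges = [17.5, 27.5, 32.5, 37.5, 42.5, 47.5, 57.5, 77.5]
UNIT = 5.0  # widths are expressed in units of 5 cm

counts = [0] * (len(edges) - 1)
for h in wave_heights_cm:
    counts[bisect_right(edges, h) - 1] += 1

# Frequency per unit width: this is the quantity to plot as bar height.
densities = [
    count / ((upper - lower) / UNIT)
    for count, lower, upper in zip(counts, edges[:-1], edges[1:])
]
for lower, upper, count, density in zip(edges[:-1], edges[1:], counts, densities):
    print(f"{lower:5.1f} - {upper:5.1f}  {count:2d}  {density:4.1f}")
```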
Case study: Groundwater chemistry data set
(from: Y. Zhou, 2006: Hydrogeostatistics. UNESCO-IHE lecture note.)

Table 2.1 Groundwater chemistry from Noord-Brabant, The Netherlands
     X       Y      EC      Ca      Cl    HCO3       K      Mg     NO3      Na     SO4
   (m)     (m)  (mS/m)  (mg/l)  (mg/l)  (mg/l)  (mg/l)  (mg/l)  (mg/l)  (mg/l)  (mg/l)
133875  407525   11.60    9.63   19.70    3.00    1.72    0.88    0.28    9.90   21.20
158975  413725   83.20  159.00   46.50  290.00    1.58   14.23    0.28   23.26  189.00
156675  419725   72.10  135.00   37.40  260.00    4.00   13.29    1.81   18.02  149.00
145325  401775   39.70   51.20   19.50   11.00    8.28    7.50   13.20   13.97  112.00
143200  410625   33.10   59.30   14.60  207.00    1.16    4.66    0.28   10.88    2.10
148238  411875   67.90   74.20   85.00  252.00    3.73    3.87    0.28   78.83   44.50
165225  422650   40.10   67.70   22.10  125.00    1.13    3.45    0.28   10.41   83.10
171500  414450   24.00    3.84   15.40    3.00    1.67    2.08    9.35    8.63   61.20
163875  407250   14.30   17.70   11.70   44.00    0.77    2.06    0.28    7.84   19.60
168463  402625   49.80   48.20   52.30  147.00    4.54   13.15    0.28   40.53   81.10
177075  405600   19.50    8.34   11.10    3.00    9.21    6.25    2.17   12.75   61.00
184550  415063   40.30   52.70   31.40    9.10    2.78    5.63    0.28   21.81  150.00
182500  403638   65.40   63.40   46.50    3.00   25.81   15.67   31.60   20.15  137.00
194688  404850   62.80   84.10   31.40    9.10   16.37    9.82   45.50   22.87   94.60
137412  391325   70.10   83.80   36.20   13.00   10.99   22.27   57.20   17.48   83.50
146000  394525   26.40   49.20    6.67  173.00    1.24    4.22    0.28    7.46    1.80
157725  399378   11.60   15.70   10.50   61.00    0.51    1.83    0.28    8.20    2.50
158600  393350   88.40  167.00   88.30  371.00    1.47    6.58    0.28   31.94   76.90
143290  376185   36.50   23.30   54.50    3.00   10.40    2.68    7.29   38.94   57.70
150705  382290   14.20    6.69   10.50    3.00    4.09    6.06    0.28    5.49   41.40
167325  392060   27.70   45.30   18.30  127.00    0.72    5.57    0.28    6.93   21.40
179563  395513   68.20   51.30   58.90    3.00   43.23   15.59    0.28   32.86  242.00
173640  388415   98.50  172.00  109.00  450.00    1.77   19.10    0.28   47.29   60.40
162600  378238   15.70   11.20    6.82    3.00    2.57    0.81    0.34    7.52   45.90
171335  380745   12.10    2.12    4.72    3.00    1.55    2.27    0.28    4.67   27.10
182675  393300   25.20   11.90   12.60    3.00    2.93    7.02    9.31    7.56   62.70
192750  397975   53.00   55.50   18.80    3.00   24.71   10.38   40.50   19.79   81.20
182065  382505   23.20   35.50    7.45   77.00    1.15    5.70    0.28    6.13   49.10
146050  367045   93.00   82.10   43.70    3.00   41.72   21.87   87.30   23.15   90.60
155915  372595   63.90   69.00   38.60    3.00   10.99   21.53   44.40   21.83  105.00
163750  368850    9.90    7.63    4.93    3.00    1.70    2.15    1.28    6.75   31.10
168200  365138   70.80   64.80  112.00   56.00    2.42   14.02    0.28   59.13  166.00
177325  372138   13.50   18.30   18.50   57.00    1.51    1.64    0.28   10.69    2.10
169813  362163   33.20   10.40   15.30    3.00    1.83    2.46    0.28   15.75   72.10
 82245  404445   87.10   34.50  160.00  293.00    2.73    3.32    0.28  150.70    2.20
 85975  401085   62.30  109.00   56.30  349.00    2.16   12.57    0.28   19.78    1.70
 93840  406235   45.90   41.50   90.70  130.00    4.21    9.11    0.28   18.86    2.10
115480  420695  112.60  147.00  181.00  433.00    2.71   13.30    0.28   49.11    1.00
117915  407045   76.00   73.30   60.20  232.00   42.34   19.69    0.28   49.06  144.00
119620  402050   26.50   18.40   18.10    3.00    3.33    7.49    0.28   11.38   86.50
127360  419875  115.10  121.00  184.00  467.00    3.17    9.94    0.28  147.70   11.10
135725  416615   62.90  125.00   17.80  305.00    0.32   11.59    0.28   11.49  105.00
121645  408175   30.50   54.90   16.30  159.00    1.05    4.55    0.28    7.62   16.40
125395  401095   39.00   52.90   48.90   65.00    1.66    4.53    0.28   21.33   74.00
 76360  392710   16.00    3.68   37.20   29.00    2.18    2.13    0.28   22.38    2.20
 84060  393685   35.30   30.10   61.40    3.00    2.10    8.63    0.28   25.51   79.20
 94950  391050   23.60    4.31   20.00    3.00    2.32    6.13    2.30   11.43   76.60
 96000  383625   63.30   66.10   30.00    3.00   44.80   13.07   16.30   13.18  207.00
104570  391570   51.10   15.00   60.90    3.00    4.18    6.67    0.28   26.06   80.90
101830  398440   51.60   33.30   20.10    3.00   13.39   15.59    0.28   12.54  173.00
113238  399300   61.20   60.00   73.30   74.00   19.38   11.81    0.28   45.19  151.00
Frequency tables

 Step 1: Range of data
     R = xmax - xmin
     R(Cl) = 184 - 4.72 = 179.28 mg/l
Frequency tables
Step 2: Number of class intervals, m
Number of classes:
     6 < m < 25
     m = 1 + 1.33 ln(N)
     Example Cl: 1 + 1.33 ln(53) ≈ 6.3, so m should be around 6-7 classes
Class width Δx:
     Δx > R/m
     Example Cl (with m = 13 classes): 15 > 179.28/13 ≈ 13.8
Class limits $x_j^-$ (lower limit) and $x_j^+$ (upper limit):
     $x_j^- = x_0 + (j-1)\,\Delta x < \text{values in class } j \le x_j^- + \Delta x = x_j^+$
  Frequency tables

Step 3: Number of measurements per class

         nj:      absolute frequency
         fj=nj/n: relative frequency
Frequency tables
Step 4: Creating frequency table
   Class interval   Absolute frequency   Relative frequency
    0 < Cl ≤ 15                    11              20.80%
   15 < Cl ≤ 30                    15              28.30%
   30 < Cl ≤ 45                     8              15.10%
   45 < Cl ≤ 60                     7              13.20%
   60 < Cl ≤ 75                     4               7.50%
   75 < Cl ≤ 90                     2               3.80%
   90 < Cl ≤ 105                    1               1.90%
  105 < Cl ≤ 120                    2               3.80%
  120 < Cl ≤ 135                    0               0.00%
  135 < Cl ≤ 150                    0               0.00%
  150 < Cl ≤ 165                    1               1.90%
  165 < Cl ≤ 180                    0               0.00%
  180 < Cl ≤ 195                    2               3.80%
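Steps 1 to 4 can be reproduced with a short script. This is a sketch, not the procedure used to make the slides: the 53 Cl values are taken from the ranked list shown later in this lecture, the class width of 15 mg/l and the origin x0 = 0 follow the table, and the upper class limit is assumed to be inclusive (the value 30.0 has to fall in the 15 to 30 class to give the count of 15).

```python
import math

# Cl concentrations in mg/l (53 values, from the ranked list in the slides).
CL_MG_PER_L = [
    184, 181, 160, 112, 109, 90.7, 88.3, 85, 73.3, 61.4, 60.9, 60.2, 58.9,
    56.3, 54.5, 52.3, 48.9, 46.5, 46.5, 43.7, 39.2, 38.6, 37.4, 37.2, 36.2,
    31.4, 31.4, 30, 22.1, 20.9, 20.1, 20, 19.7, 19.5, 18.8, 18.5, 18.3, 18.1,
    17.8, 16.3, 15.4, 15.3, 14.6, 12.6, 11.7, 11.1, 10.5, 10.5, 7.45, 6.82,
    6.67, 4.93, 4.72,
]
CLASS_WIDTH = 15.0  # mg/l, classes start at x0 = 0

n = len(CL_MG_PER_L)
data_range = max(CL_MG_PER_L) - min(CL_MG_PER_L)        # Step 1
n_classes = math.ceil(max(CL_MG_PER_L) / CLASS_WIDTH)   # Step 2, for width 15

counts = [0] * n_classes                                 # Step 3
for value in CL_MG_PER_L:
    j = math.ceil(value / CLASS_WIDTH) - 1  # class j covers (15 j, 15 (j + 1)]
    counts[j] += 1

for j, count in enumerate(counts):                       # Step 4
    lower, upper = j * CLASS_WIDTH, (j + 1) * CLASS_WIDTH
    print(f"{lower:4.0f} < Cl <= {upper:3.0f}  {count:3d}  {100 * count / n:5.1f}%")
```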
Creating a Histogram
 Step 5: Absolute and relative frequencies
 (Figures: absolute frequency histogram and relative frequency (%) histogram;
  horizontal axis Cl (mg/l), class limits from 0 to 195 in steps of 15)
Frequency tables
Step 6: Cumulative frequency table
 Upper class limit   Cumulative absolute frequency   Cumulative relative frequency
        Cl ≤ 15                     11                          20.80%
        Cl ≤ 30                     26                          49.10%
        Cl ≤ 45                     34                          64.20%
        Cl ≤ 60                     41                          77.40%
        Cl ≤ 75                     45                          84.90%
        Cl ≤ 90                     47                          88.70%
        Cl ≤ 105                    48                          90.60%
        Cl ≤ 120                    50                          94.30%
        Cl ≤ 135                    50                          94.30%
        Cl ≤ 150                    50                          94.30%
        Cl ≤ 165                    51                          96.20%
        Cl ≤ 180                    51                          96.20%
        Cl ≤ 195                    53                         100.00%
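The cumulative table is a running sum of the absolute frequencies from Step 4. A minimal sketch, using the class counts from the frequency table as input:

```python
from itertools import accumulate

# Absolute frequencies per class of width 15 mg/l, from the Step 4 table.
counts = [11, 15, 8, 7, 4, 2, 1, 2, 0, 0, 1, 0, 2]
upper_limits = [15 * (j + 1) for j in range(len(counts))]

cumulative = list(accumulate(counts))
n = cumulative[-1]

for upper, total in zip(upper_limits, cumulative):
    print(f"Cl <= {upper:3d}  {total:3d}  {100 * total / n:6.1f}%")
```

Computing the percentages from the cumulative counts, rather than summing the rounded class percentages, avoids accumulating rounding error.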
Frequency distribution
Step 7: Cumulative frequency distribution curve
 (Figure: cumulative frequency distribution; cumulative frequency (%)
  from 0 to 100 against Cl (mg/l) from 0 to 200)
Some typical frequency distributions
 (Figure: four panels of relative frequency (%) against the X variable:
  symmetrical or bell-shaped distribution; positively skewed distribution;
  bimodal distribution; negatively skewed distribution)
Statistical descriptors
Descriptors of central tendency
Mode: the value with the largest frequency; for grouped data, the
    average of the measurements in the class with the largest
    frequency

   For Cl: the mode is 19.4 mg/l
   Not applicable for distributions with several peaks

   (Figure: relative frequency histogram; relative frequency (%) against Cl (mg/l))
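The grouped-data definition of the mode can be illustrated with the Cl data. This is a sketch that assumes the same classes of width 15 mg/l with inclusive upper limits as in the frequency table; with a different class width the result would change.

```python
import math
import statistics

# Cl concentrations in mg/l (53 values, from the ranked list in the slides).
cl = [
    184, 181, 160, 112, 109, 90.7, 88.3, 85, 73.3, 61.4, 60.9, 60.2, 58.9,
    56.3, 54.5, 52.3, 48.9, 46.5, 46.5, 43.7, 39.2, 38.6, 37.4, 37.2, 36.2,
    31.4, 31.4, 30, 22.1, 20.9, 20.1, 20, 19.7, 19.5, 18.8, 18.5, 18.3, 18.1,
    17.8, 16.3, 15.4, 15.3, 14.6, 12.6, 11.7, 11.1, 10.5, 10.5, 7.45, 6.82,
    6.67, 4.93, 4.72,
]
CLASS_WIDTH = 15.0

# Group the measurements by class index; class j covers (15 j, 15 (j + 1)].
classes = {}
for value in cl:
    classes.setdefault(math.ceil(value / CLASS_WIDTH) - 1, []).append(value)

# The modal class is the one holding the most measurements.
modal_index, modal_values = max(classes.items(), key=lambda item: len(item[1]))
mode = statistics.fmean(modal_values)
print(f"modal class index {modal_index}, mode = {mode:.1f} mg/l")
```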
Statistical descriptors
Descriptors of central tendency
Median: the value corresponding to 50% cumulative frequency;
     the middle measurement for an odd number of samples, or the
     average of the two middle measurements for an even number
     For Cl: median = 31.4 mg/l

Insensitive to the tails or outliers of the distribution;
preferable for data sets with exceptional values

(Figure: cumulative frequency distribution; cumulative frequency (%) against Cl (mg/l))
Statistical descriptors
 1. Descriptors of central tendency
 Quartiles: split the data into quarters
   – Lower quartile: 25% cumulative frequency
     For Cl: lower quartile = 16.3 mg/l
   – Upper quartile: 75% cumulative frequency
     For Cl: upper quartile = 56.3 mg/l
     In practice also the 1%, 5%, 10%, 90%, 95% and 99% values are used
     (e.g. discharge data).

 Ranked Cl values (mg/l):
     Cl   Rank   Percent
    184      1   100.00%
    181      2    98.00%
    160      3    96.10%
    112      4    94.20%
    109      5    92.30%
   90.7      6    90.30%
   88.3      7    88.40%
     85      8    86.50%
   73.3      9    84.60%
   61.4     10    82.60%
   60.9     11    80.70%
   60.2     12    78.80%
   58.9     13    76.90%
   56.3     14    75.00%
   54.5     15    73.00%
   52.3     16    71.10%
   48.9     17    69.20%
   46.5     18    65.30%
   46.5     18    65.30%
   43.7     20    63.40%
   39.2     21    61.50%
   38.6     22    59.60%
   37.4     23    57.60%
   37.2     24    55.70%
   36.2     25    53.80%
   31.4     26    50.00%
   31.4     26    50.00%
     30     28    48.00%
   22.1     29    46.10%
   20.9     30    44.20%
   20.1     31    42.30%
     20     32    40.30%
   19.7     33    38.40%
   19.5     34    36.50%
   18.8     35    34.60%
   18.5     36    32.60%
   18.3     37    30.70%
   18.1     38    28.80%
   17.8     39    26.90%
   16.3     40    25.00%
   15.4     41    23.00%
   15.3     42    21.10%
   14.6     43    19.20%
   12.6     44    17.30%
   11.7     45    15.30%
   11.1     46    13.40%
   10.5     47     9.60%
   10.5     47     9.60%
   7.45     49     7.60%
   6.82     50     5.70%
   6.67     51     3.80%
   4.93     52     1.90%
   4.72     53     0.00%
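The median and quartiles quoted for Cl can be recomputed from the ranked list. The sketch below uses Python's `statistics.quantiles` with `method="inclusive"`, which places the p-quantile at position p * (n - 1) in the sorted data; this is assumed to match the "Percent" column above, where 16.3 sits at 25% and 56.3 at 75%. Other quantile definitions can give slightly different values.

```python
import statistics

# Cl concentrations in mg/l (53 values, from the ranked list above).
cl = [
    184, 181, 160, 112, 109, 90.7, 88.3, 85, 73.3, 61.4, 60.9, 60.2, 58.9,
    56.3, 54.5, 52.3, 48.9, 46.5, 46.5, 43.7, 39.2, 38.6, 37.4, 37.2, 36.2,
    31.4, 31.4, 30, 22.1, 20.9, 20.1, 20, 19.7, 19.5, 18.8, 18.5, 18.3, 18.1,
    17.8, 16.3, 15.4, 15.3, 14.6, 12.6, 11.7, 11.1, 10.5, 10.5, 7.45, 6.82,
    6.67, 4.93, 4.72,
]

median = statistics.median(cl)
# n=4 asks for the three cut points that split the data into quarters.
lower_q, middle_q, upper_q = statistics.quantiles(cl, n=4, method="inclusive")
print(f"lower quartile = {lower_q} mg/l")
print(f"median         = {median} mg/l")
print(f"upper quartile = {upper_q} mg/l")
```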
Statistical descriptors
1. Descriptors of central tendency
Arithmetic mean: average value of the measurements

        $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

        For Cl: arithmetic mean = 43.72 mg/l

• More representative of the sample, but sensitive to outliers in a
small sample.
• Most distributions are sufficiently characterised by the mean
and the variance.
Statistical descriptors
 1. Descriptors of central tendency
 Geometric mean

             $x_g = \sqrt[n]{x_1 x_2 \cdots x_n}$

             $\log(x_g) = \frac{1}{n} \sum_{i=1}^{n} \log(x_i)$

  For Cl: geometric mean = 29.53 mg/l
Not applicable for negative values. Hydrogeological variables are often
not symmetrically distributed, while their logarithms are; the geometric
mean is then the appropriate descriptor.
The radius of a grain with main axes a, b, and c is best characterised
by the cube root of the product a*b*c.
Statistical descriptors
 1. Descriptors of central tendency
 Harmonic mean

               $x_h = \frac{1}{\frac{1}{n} \sum_{i=1}^{n} \frac{1}{x_i}}$

       For Cl: harmonic mean = 20.08 mg/l

Appropriate for phenomena where small values are more
important (e.g. hydraulic conductance; see lecture notes from
Zhou, 2006)
Statistical descriptors
Descriptors                         Summary of properties
---------------------------------------------------------------------------------------------------------------------
Mode                    indication of abundant values, isolated property,
            not applicable for distributions with several peaks.
---------------------------------------------------------------------------------------------------------------------
Median                  insensitive to the tails of the distribution, preferable
            for data sets with exceptional values.
---------------------------------------------------------------------------------------------------------------------
Arithmetic Mean more representative of the sample, sensitive to exceptional
            values in a small sample. Most distributions are sufficiently
            characterised by the mean and the variance.
---------------------------------------------------------------------------------------------------------------------
Geometric mean not applicable for negative values. The radius of a grain with
            main axes a, b, and c is characterized best by the third root
            of the product a b c.
---------------------------------------------------------------------------------------------------------------------
Harmonic mean   more appropriate for phenomena where small values are
            more important.
Statistical descriptors
 Relations between central tendency descriptors

 $x_h \le x_g \le \bar{x}$: for positive data the harmonic mean is no larger
 than the geometric mean, which in turn is no larger than the arithmetic
 mean. They are equal only if x1 = x2 = ... = xn.

 $\bar{x} = \text{median}$: holds if the frequency distribution is
 symmetrical; the two generally differ when it is not.

 $\log(x_g) = \overline{\log x}$: the mean of the log x distribution is
 equal to the logarithm of the geometric mean of x.
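
The three means and their ordering can be checked numerically. The following is a minimal sketch in Python; the function names and the sample values are illustrative assumptions, not the Cl data set used in the slides.

```python
import math


def arithmetic_mean(xs):
    """Sum of the values divided by their number."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """n-th root of the product, computed via the mean of the logarithms.

    Only defined here for strictly positive values.
    """
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    """Reciprocal of the mean of the reciprocals (values must be non-zero)."""
    return len(xs) / sum(1.0 / x for x in xs)


# Hypothetical, right-skewed concentrations in mg/l (not the Cl data).
sample = [4.0, 9.0, 12.0, 35.0, 120.0]

am = arithmetic_mean(sample)
gm = geometric_mean(sample)
hm = harmonic_mean(sample)
print(f"harmonic {hm:.2f} <= geometric {gm:.2f} <= arithmetic {am:.2f}")
```

For a skewed sample like this one the three values differ clearly, which is the same pattern as in the Cl example (20.08, 29.53 and 43.72 mg/l).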
Statistical descriptors
2. Descriptors of dispersion (variation)
Sample variance
       $s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$, equivalently $s^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2$

     For Cl: variance = 1768.72 [mg/l]^2
 Standard deviation: the square root of the
 variance
      For Cl: standard deviation = 42.06 mg/l
Statistical descriptors
   2. Descriptors of dispersion (variation)
   [Figure: frequency (%) against x from -8 to 8 for two distributions with mean = 0, one with standard deviation = 1 and one with standard deviation = 2]
Statistical descriptors
  2. Descriptors of dispersion (variation)
Coefficient of variation
               $C_v = \frac{s}{\bar{x}}$
     For Cl: coefficient of variation = 0.96

  Useful to compare the variations of two or more
  data sets.
Statistical descriptors
         3. Descriptors of asymmetry
Coefficients of skewness

    $P_1 = \frac{\bar{x} - \text{mode}}{s}$   or   $P_2 = \frac{3(\bar{x} - \text{median})}{s}$

Based on moments (product moments):

    $a_3 = \frac{m_3}{m_2^{3/2}}$,   $\alpha_3 = \frac{n^2}{(n-1)(n-2)} a_3$

         For Cl: coefficient of skewness α3 = 1.97
Statistical descriptors
         Moments or product moments
Central moments
                    $m_k = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^k$
 Moments are statistical descriptors of a data set:
    – 1st moment: mean or expected value, µx (“central tendency”)
    – 2nd moment: variance, σx²; standard deviation σx is the square root of the
      variance (“spread around the central value”)
    – 3rd moment: skewness, γx
      (“measure of symmetry”)
    – 4th moment: kurtosis, κx
      (“peakedness of central portion of distribution”)
Statistical descriptors
           3. Descriptors of asymmetry
   [Figure: three sketches of frequency distributions of a variable X]
   Symmetrical distribution: α3 = 0; mode = mean = median
   Skewed to the right: α3 > 0; mode < median < mean
   Skewed to the left: α3 < 0; mean < median < mode
Statistical descriptors
     4. Descriptor of flatness/’peakedness’
Coefficient of kurtosis
        $a_4 = \frac{m_4}{m_2^2}$,   $\alpha_4 = \frac{n^3}{(n-1)(n-2)(n-3)} a_4$

        For Cl: coefficient of kurtosis = 7.04

      α4=3: for Normal distribution
      α4>3: steeper than Normal distribution
      α4<3: flatter than Normal distribution
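
The descriptors of dispersion, asymmetry and peakedness all follow from the central moments m2, m3 and m4. A minimal Python sketch, assuming the 1/n definitions and the sample-size correction factors given on the slides (the function names and the test sample are illustrative):

```python
import math


def central_moment(xs, k):
    """k-th central moment m_k = (1/n) * sum((x_i - mean)^k)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n


def describe(xs):
    """Variance, standard deviation, Cv, skewness and kurtosis of a sample.

    Needs at least four values because of the (n-1)(n-2)(n-3) factor.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = central_moment(xs, 2)
    m3 = central_moment(xs, 3)
    m4 = central_moment(xs, 4)
    s = math.sqrt(m2)
    a3 = m3 / m2 ** 1.5
    a4 = m4 / m2 ** 2
    return {
        "mean": mean,
        "variance": m2,
        "std": s,
        "cv": s / mean,
        "a3": a3,
        "alpha3": n ** 2 / ((n - 1) * (n - 2)) * a3,
        "a4": a4,
        "alpha4": n ** 3 / ((n - 1) * (n - 2) * (n - 3)) * a4,
    }


# Hypothetical symmetric sample: the skewness should vanish.
stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
for name, value in stats.items():
    print(f"{name:9s} {value:8.3f}")
```

Note that the correction factors are large for small n, so α3 and α4 of very short samples should be read with care.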
Normal probability paper
Sort the data from small to large
Assign each observation the plotting position i/(N+1), in which
 i is the order number (rank) and N the total
 number of data points
Daily Mean Run-Off Anomalies at Achleiten, Danube River
[Figure: deseasonalised daily mean run-off against year, with peaks over threshold (POT) marked]
   Flood frequency analysis (peak flows)
Annual maximum series (more common)
   – One can miss a large event if there is more than one per year; but the series is
     continuous and easy to process
   – Often used for estimating extremes in long records (>10 years)
Partial duration series (“Peaks-Over-Threshold, POT”)
   – Definition of the threshold is tricky and requires experience
   – Often used for short records (<10 years)

   [Figure: daily discharge (m3/s) at Stah, 25-Mar-60 to 19-Apr-01, with the threshold marked]
Flood frequency analysis (peak flows): Annual max. series vs. partial duration series

         (Davie, 2002)
Annual max. series vs. partial
duration series (POT)
Langbein showed the following relationship
  (Chow 1964):

             $1/T = 1 - e^{-1/T_p}$
   T : return period using annual max. series
   Tp: return period using partial duration series

Differences get smaller for larger return periods
  (at a 10-year recurrence interval the two exceedance
  probabilities 1/T and 1/Tp differ by less than 0.01)!
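
The Langbein relationship is easy to tabulate. A small Python sketch (function names are illustrative) that converts between the two return periods:

```python
import math


def annual_from_partial(tp):
    """Return period T of the annual max. series for a given Tp (POT series)."""
    return 1.0 / (1.0 - math.exp(-1.0 / tp))


def partial_from_annual(t):
    """Inverse relationship; only defined for T > 1 year."""
    return -1.0 / math.log(1.0 - 1.0 / t)


print("   Tp        T    1/Tp - 1/T")
for tp in (1, 2, 5, 10, 50, 100):
    t = annual_from_partial(tp)
    print(f"{tp:5d} {t:8.2f} {1.0 / tp - 1.0 / t:12.5f}")
```

In this table T stays roughly half a year above Tp, so the relative difference between the two return periods, and the difference between the two exceedance probabilities, shrinks as the return period grows.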
  Assumptions of frequency analysis
All data points are correct and precisely measured
   – Be aware of the uncertainty of peak flow data (uncertainty and errors come
     later in this course!)
Independent events: peak flows are not part of the same event
   – Carefully check the data set; plot the whole record, in particular all events of
     the POT series
   – Problems with events at the transition of the year (31 Dec – 1 Jan) in humid
     temperate or some tropical climates
Random sample: Every value in the population has an equal chance of being
   included in the sample
The hydrological regime has remained static during the complete time period
   of the record
   – No land use change, no climate change, no changes in the river channels, no
     change in the flood water management etc. in the catchment (often not the
     case for long records!)
All floods originate from the same statistical population (homogeneity)
   – Different flood generating mechanisms (e.g. rain storms, snow melt,
     snow-on-ice etc.) might cause floods with different frequencies/recurrence intervals
Describing the frequency mathematically:
Probability Distribution Function
 Typically defined in either of two forms:
 • Probability density function (PDF)
 • Cumulative distribution function (CDF)

 Discrete
   PDF:  $P(X = x) = p(x) = p_x$
   CDF:  $F(x) = \sum_{i=0}^{x} p(i)$

 Continuous
   PDF:  $P(a \le X \le b) = \int_a^b f(x)\,dx$
   CDF:  $F(x) = \int_{-\infty}^{x} f(m)\,dm$
 Basics (examples using measured flow, Q)
Probability of exceedance, P(X): probability that the flow Q is
  greater than or equal to X; P(X) ∈ [0,1]
Relative frequency, F(X): probability of the flow Q being less than
  a value X; F(X) ∈ [0,1]. Can be read from a cumulative
  probability curve, but be careful with the selected class
  intervals.
Average recurrence interval or return period, T(X): the average
  interval, in years, between exceedances of X
  over a long record (time step is usually one year).
   – Not exactly the number of years between events of a certain
     size!
   – Rather the average number of years between years in which the flow is greater than X!
   – No regularity or periodicity in occurrences of exceedances
     (assumption)
                       P(X) = 1 – F(X)
                       T(X) = 1/P(X) = 1/(1 – F(X))
   [Figure: PDF and CDF sketches marking the relative frequency F(X) and the probability of exceedance P(X)]
   Recurrence intervals for design
   purposes (flood protection) in Germany
   Class 1
      – Settlements, urban areas, important infrastructure:
        50-100 years
   Class 2
      – Single buildings, settlements that are not permanently inhabited:
        25-50 years
   Class 3
      – Farm land, intensively used: 10-25 years
   Class 4
      – Farm land: 5-10 years

What about large dams, nuclear power plants etc.? PMF (probable maximum flood)

                                        (according to DIN 19700, part 99)
Exceedence probability for a specified number of
time intervals (see Box C-4, page 561 in Dingman, 2002)
Examples: Exceedence probability and return
period (based on Box C-4 in Dingman, 2002)
What is the probability that a flood greater than or equal to a 100-year flood will occur
     next year?
         P(X) = 1/T(X) = 0.01
What is the probability that we will not have a flood greater than or equal to the
     50-year flood next year?
         F(X) = 1-P(X) = 1-0.02 = 0.98
What is the probability that we will not have a flood greater than or equal to the
     20-year flood in the next 5 years?
         p = (1-P(X))^n = 0.95^5 = 0.774
What is the probability that the next exceedance of the 100-year flood will occur
     in the 10th year from now?
         p = (1-0.01)^9 x 0.01 = 0.00914
What is the probability that the 100-year flood will be exceeded at least once in
     the next 40 years?
         p = 1-(1-0.01)^40 = 0.331
What is the probability that the 50-year flood will be exceeded twice in a row
     (two independent events in one year), and how many 50-year floods can
     be expected on average in 1000 years?
         p = 0.02 x 0.02 = 0.0004; and on average 20 floods in 1000 years.
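
These worked examples can be reproduced with a few lines of Python. This is a minimal sketch that assumes independent years with a constant exceedance probability 1/T; the function names are illustrative.

```python
def p_exceed(T):
    """Annual probability of exceedance for a T-year flood."""
    return 1.0 / T


def p_no_exceedance(T, n):
    """Probability of no exceedance in n consecutive years."""
    return (1.0 - p_exceed(T)) ** n


def p_first_exceedance_in_year(T, k):
    """Probability that the first exceedance falls in year k (geometric law)."""
    return (1.0 - p_exceed(T)) ** (k - 1) * p_exceed(T)


def p_at_least_one(T, n):
    """Probability of at least one exceedance in n years."""
    return 1.0 - p_no_exceedance(T, n)


print(f"no 20-year flood in 5 years:        {p_no_exceedance(20, 5):.3f}")
print(f"first 100-year flood in year 10:    {p_first_exceedance_in_year(100, 10):.5f}")
print(f"100-year flood within 40 years:     {p_at_least_one(100, 40):.3f}")
print(f"expected 50-year floods in 1000 yr: {1000 * p_exceed(50):.0f}")
```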
                    (Bedient and Huber, 2002)
But how do we estimate P(X) and F(X) from data?

Example: Flood frequency analysis

   [Figure: daily discharge (m3/s) at Stah, 25-Mar-60 to 19-Apr-01]
Example: Annual max. series for the river Wye (1971-97)
(not Normal-distributed!)




                                                  (Davie, 2002)
 Plotting position – Weibull formula
Rank the annual maximum series data from low to high
  (the data are treated as independent, so the year of occurrence is irrelevant)
Calculate F(X) with the rank r and N total data points (i.e.
  length of record: N years)

               F(X) = r/(N+1)

   – For example: The largest value of a 25-year record would plot at
     a recurrence interval of 26 years.
   – F(X) can never reach 1
   – If you rank from high to low, r/(N+1) gives P(X) = 1-F(X) instead
    Gringorten formula
[Please note: F(X) = p in the Workshop course notes]

                  F(X) = (r-0.44) / (N+0.12)

   – Difference to the Weibull formula is often not great
   – Use is often down to personal preference
   – Empirical constants (0.44 and 0.12) are valid for the
     Gumbel distribution

  [Figure: comparison of Weibull and Gringorten formulae (Davie, 2002)]
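
Both plotting position formulae can be applied to a ranked series in the same way. A minimal Python sketch; the function name and the synthetic 25-value series are assumptions for illustration only.

```python
def plotting_positions(values, formula="weibull"):
    """Return (value, F, T) tuples for a series ranked from low to high.

    F is the relative frequency (non-exceedance) and T = 1 / (1 - F).
    """
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for r, x in enumerate(ordered, start=1):
        if formula == "weibull":
            f = r / (n + 1)
        elif formula == "gringorten":
            f = (r - 0.44) / (n + 0.12)
        else:
            raise ValueError(f"unknown formula: {formula}")
        result.append((x, f, 1.0 / (1.0 - f)))
    return result


# Synthetic 25-year record (values are placeholders, not measured flows).
series = [float(q) for q in range(1, 26)]

weibull = plotting_positions(series, "weibull")
gringorten = plotting_positions(series, "gringorten")
print(f"largest value, Weibull:    T = {weibull[-1][2]:.1f} years")
print(f"largest value, Gringorten: T = {gringorten[-1][2]:.1f} years")
```

The two formulae agree closely in the middle of the record; the largest differences appear for the highest and lowest ranks.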
Example: Annual max. series for the river Wye (1971-97)
(not Normal-distributed!)




                                                  Reliability
                                                  is good!




                                                  (Davie, 2002)
 Extrapolation beyond the data set
Weibull or Gringorten formulae are only good for flood frequency
    estimates for flows within the measured record, and
    even there they are unreliable near either end of the record
For extrapolation a probability distribution must be fitted.
    Estimate the parameters through:
   1. Method of moments (widely used)
   2. Method of L-moments (less widely used, quite complex)
   3. Method of maximum likelihood (not widely used)
An alternative is a graphical approach to fit the distribution
    (subjective approach)
Choice of the distribution function is often based on personal
    preference, although regional guidelines sometimes exist;
    always take the distribution that fits your data best in a
    particular region
Extreme values are usually not normally distributed; however,
    mean annual flows in humid areas are often normally
    distributed
  Method of moments – Example:
  Gumbel distribution
Product moments are statistical descriptors of a data set (they characterize
   the probability distribution):
    – 1st moment: mean or expected value, µx (“central tendency”)
    – 2nd moment: variance, σx²; standard deviation σx is the square root of the
      variance (“spread around the central value”)
    – 3rd moment: skewness, γx
      (“measure of symmetry”)
    – 4th moment: kurtosis, κx
      (“peakedness of central portion of distribution”)
    – Coefficient of variation: measure of spread
                 CV = σx/µx

      [Figure: sketches of skewed distributions with γx > 0 and γx < 0]

      (L-moments are used for small sample sizes;
      see Dingman (2002), Appendix C)
Method of moments – Gumbel distribution
                    $F(X) = \exp(-\exp(-b(X-a)))$
                    $a = \text{mean}(Q) - 0.5772/b$
                    $b = \pi / (\sigma_Q \sqrt{6})$

  F(X) leads to P(X) and T(X) for a certain size of flow X
  Re-arranging the formulae leads to the size of flow for a
    given recurrence interval:

                    $X = a - \frac{1}{b} \ln\left[\ln\left(\frac{T(X)}{T(X)-1}\right)\right]$

  Example: for the 50-year flood, you need to compute the natural logarithm of
     50/49 and then the natural logarithm of this result. The parameters a and b
     are estimated from the sample.
  Rule of thumb: Do not extrapolate recurrence intervals
    beyond twice the length of your stream flow record
Example: Annual max. series for the river Wye
(1971-97); values required for the Gumbel formula

   Mean (Q)      Standard deviation (σQ)         a value        b value
    21.21            6.91                        18.11            0.19

       Applying the method of moments and Gumbel formula to the
    data gives some interesting results. The values used in the formula
    are shown in the table above and can be easily computed. When
    the formula is applied to find the flow values for an average
    recurrence interval of 50 years it is calculated as 39.1 m3/s. This is
    less than the largest flow during the record which under the Weibull
    formula has an average recurrence interval of 29 years. This
    discrepancy is due to the method of moments formula treating the
    highest flow as an extreme outlier. If we invert the formula we can
    calculate that a flood with a flow of 48.87 m3/s (the largest on
    record) has an average recurrence interval of around three
    hundred years.

(Davie, 2002)
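
The river Wye numbers can be retraced with the method-of-moments formulae for the Gumbel distribution. A minimal Python sketch, using the mean and standard deviation from the table above (the function names are illustrative):

```python
import math

EULER_GAMMA = 0.5772  # constant used in the formula for a


def gumbel_parameters(mean, std):
    """Method-of-moments estimates of a (location) and b (inverse scale)."""
    b = math.pi / (std * math.sqrt(6.0))
    a = mean - EULER_GAMMA / b
    return a, b


def gumbel_cdf(x, a, b):
    """F(X) = exp(-exp(-b (X - a)))."""
    return math.exp(-math.exp(-b * (x - a)))


def gumbel_quantile(T, a, b):
    """Flow with an average recurrence interval of T years."""
    return a - math.log(math.log(T / (T - 1.0))) / b


def return_period(x, a, b):
    """T(X) = 1 / (1 - F(X))."""
    return 1.0 / (1.0 - gumbel_cdf(x, a, b))


a, b = gumbel_parameters(mean=21.21, std=6.91)
q50 = gumbel_quantile(50, a, b)
t_max = return_period(48.87, a, b)
print(f"a = {a:.2f}, b = {b:.3f}")
print(f"50-year flood: {q50:.1f} m3/s")
print(f"recurrence interval of 48.87 m3/s: about {t_max:.0f} years")
```

The result is sensitive to rounding of b, so small deviations from the tabulated a value are to be expected.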
Distributions often used in hydrology (1/2)




                                   (Dingman, 2002)
Distributions often used in hydrology (2/2)




                                   (Dingman, 2002)
  Use of common distributions in hydrology
Flood frequency analysis
Most commonly applied are the Exponential (EXP), Log-Normal (LN), Log-
Pearson 3 (LP3) and Generalised Extreme Value (GEV) distributions.

In practice
The choice of probability distribution may be dictated by mathematical
convenience or by familiarity with a certain distribution (“personal bias”).
Sample estimators must be adopted in order to obtain estimates of the statistics
for determining the distribution parameters. In some cases, more than one
distribution may fit the available data equally well.

General three-step procedure
1) A suitable form of standard frequency distribution is chosen to represent the
   observations;
2) the chosen distribution is fitted to the data by determining values for its
   parameters; and
3) the required quantiles are computed from the fitted cumulative distribution
   function (CDF).
   Distributions often used in hydrology
       1. Normal distribution (ND)
          Probably the most important distribution, but often not
          useful for hydrological extremes
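         PDF:   $f(x) = \frac{1}{a\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-c}{a}\right)^2\right]$

             a: scale parameter = standard deviation, σ
             c: location parameter = mean, μ

      Standard Normal Distribution (S-ND), with $y = \frac{x-\mu}{\sigma}$:

         PDF:   $f(y) = \frac{e^{-y^2/2}}{\sqrt{2\pi}}$

         CDF:   $G(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} \exp\left(-\frac{u^2}{2}\right) du$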
(compare lecture of Dr. Zhou)
Location parameters
A probability distribution is characterized by location and scale parameters.
Location parameter equal to zero and scale parameter equal to one (Standard-
ND) vs. ND with a location parameter of 10 and a scale parameter of 1.




Scale Parameter
The next plot has a scale parameter of 3 (location parameter is zero). The
effect is that the graph is ‘stretched out’ .
 Probability Distribution Functions
2.       The Lognormal distribution
                 y = ln(x)
     If y follows the Normal distribution, x follows the
Log-normal distribution.


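Lognormal probability density function:

     $f(x) = \frac{1}{x \sigma_y \sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln(x) - \mu_y}{\sigma_y}\right)^2\right]$ when x > 0

     $f(x) = 0$ elsewhere

     (μy and σy are the mean and standard deviation of y = ln(x))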
Lognormal distribution function
  [Figure: lognormal density (frequency against X variable, 0 to 5) for standard deviation = 1 and standard deviation = 2]

  Lognormal distribution is skewed to the right!
Effects of shape parameters for the lognormal distribution
[Figure: PDF and CDF panels]
Processing Log-normal Distribution Function
 – Histogram indicates a positively skewed
   distribution
 – Plot of cumulative frequency on lognormal
   probability paper shows a straight line
 – Take the logarithmic transformation of the data,
   y = ln(x)
 – Calculate the sample mean and standard
   deviation of the logarithmic values
 – Carry out the analysis on the logarithmic values
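
The processing steps above can be sketched in Python as follows. This is an illustration under stated assumptions: the function names and sample values are hypothetical, and the standard deviation uses the 1/n definition from the descriptor slides (`pstdev`).

```python
import math
from statistics import NormalDist, fmean, pstdev


def fit_lognormal(xs):
    """Mean and standard deviation of y = ln(x); all x must be positive."""
    logs = [math.log(x) for x in xs]
    return fmean(logs), pstdev(logs)


def lognormal_quantile(f, mu_y, sigma_y):
    """Value of x with relative frequency F(x) = f under the fitted model."""
    y = NormalDist(mu_y, sigma_y).inv_cdf(f)
    return math.exp(y)


# Hypothetical positively skewed sample (placeholder values).
sample = [1.0, math.e, math.e ** 2]

mu_y, sigma_y = fit_lognormal(sample)
print(f"mean of ln(x) = {mu_y:.3f}, std of ln(x) = {sigma_y:.3f}")
print(f"median of x   = {lognormal_quantile(0.5, mu_y, sigma_y):.3f}")
print(f"99% value     = {lognormal_quantile(0.99, mu_y, sigma_y):.3f}")
```

The median of x equals exp(μy), which is the geometric mean of the sample, in line with the relation between the log-mean and the geometric mean given earlier.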
3. Pearson type III distribution
   (Often used for flood frequencies)
 PDF:   $f(x) = \frac{1}{a^b \Gamma(b)} (x-c)^{b-1} \exp\left(-\frac{x-c}{a}\right)$

a - scale parameter              b - shape parameter
c - location parameter
Γ( ) – Gamma function: $\Gamma(b) = \int_0^{\infty} t^{b-1} e^{-t}\,dt$

Commonly fitted to the logarithms of floods (so-called
log-Pearson type III distribution)



4. Gamma Distribution (when c=0)
Effects of shape parameter, gamma
[Figure: PDF and CDF panels]
5. Exponential distribution
Particularly useful when applying partial duration
series

PDF:        $f(x) = \lambda \exp(-\lambda x)$
With a mean of 1/λ, a variance of 1/λ², and a skewness of 2.

Standard Exponential Distribution

CDF:        $G(y) = 1 - \exp(-y)$
Plots of exponential distribution
[Figure: PDF and CDF panels]
Example: Exponential distribution applied to
 storm interval times (from Bedient & Huber 2002)
6. General extreme value (GEV) distribution

   CDF:   $F(x_1) = \exp\left\{-\left[1 - \frac{k(x_1 - c)}{a}\right]^{1/k}\right\}$ for $k \ne 0$

   c - location parameter         a - scale parameter
   k - shape parameter

  k=0 Extreme value type I (EV1) (Gumbel)
  k<0 Extreme value type II (EV2)
  k>0 Extreme value type III (EV3),
      closely related to the Weibull distribution
       Comparison of the three types of
          GEV distributions (PDF)
    [Figure: x plotted against y1 for Type II (k<0), Type I (k=0) and Type III (k>0); axis marks at x = c and y1 = 0]
If the sample for which frequency distribution is required exhibits
skewness, a three-parameter distribution is useful (e.g. GEV).
General Extreme Value distribution
  Type I (= Gumbel distribution)
-> widely used for annual maximum series!
(Note: the parameters are written slightly differently than
   in the example of the river Wye from Davie (2002): c here corresponds
   to a there, and a here to 1/b)

 CDF:        $F(x_1) = \exp\left[-\exp\left(-\frac{x_1 - c}{a}\right)\right]$

 PDF:        $f(x_1) = \frac{1}{a} \exp\left[-\frac{x_1 - c}{a} - \exp\left(-\frac{x_1 - c}{a}\right)\right]$

 CDF (standardized):   $G(y_1) = \exp\left[-\exp(-y_1)\right]$, with $y_1 = \frac{x_1 - c}{a}$
Plots of Gumbel distribution
[Figure: PDF and CDF panels]
How good is the fit of the distribution
function?
• Graphic check: visual check of the plotted graph
  (How well are the observations reproduced by the fitted
  PDF/CDF?)
• Mathematical check: statistical test to determine the
  goodness of fit
   - chi-square (χ2) test, based on the PDF:
      – needs much data, depends on the classified intervals
   - Kolmogorov-Smirnov (K-S) test, based on the CDF:
      – Not necessary to divide the data into intervals; thus the error associated
        with the number and size of intervals is avoided.
      – Good if n>35 and even better if n>50.
      – Quick and easy, but only one value (the largest deviation) is considered
   - Unfortunately, often several distributions provide acceptable fits to
     the available data (no identification of the “true” or “best”
     distribution); confidence limits are too large
  Graphical method
  (Example of the river Wye)


                Is this fit suitable for the
                whole data set?




   Frequency of flows less than a value X. The F(X) values on the
x-axis have undergone a transformation to fit the Gumbel distribution;
called ‘reduced variate’ (cf. Workshop in Hydrology).
Comparison of different PDFs
(Significant differences in particular for the extremes!)
Example: Kolmogorov-Smirnov (K-S) tests
              (according to Schoenwiese 2000)

      $D_n = \max_{1 \le i \le n} \left| F_X(x_i) - S_n(x_i) \right|$        (see sketch on
                                                           black board)
      $P(D_n \le D_n^{\alpha}) = 1 - \alpha$

   FX(xi) - CDF of the assumed distribution
   Sn(xi) - CDF of the observed ordered sample

If Dn ≤ the tabulated value Dnα (see below), the assumed
distribution is acceptable at the significance level α
(n: sample size).

α        0.20         0.10         0.05         0.01         0.001
Dnα      1.073/√n     1.224/√n     1.358/√n     1.628/√n     1.949/√n
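
A K-S check can be sketched in a few lines of Python. Two assumptions are made here that go beyond the slide: the statistic is evaluated on both sides of each step of the empirical CDF (the usual practical form of the maximum deviation), and the critical value uses the large-sample approximation sqrt(-0.5 ln(α/2))/√n, which reproduces the tabulated coefficients. Function names and the test sample are illustrative.

```python
import math


def ks_statistic(sample, cdf):
    """Largest deviation between the empirical CDF and an assumed CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_critical(n, alpha):
    """Approximate critical value D_n^alpha for large samples."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1], used as the assumed model."""
    return min(max(x, 0.0), 1.0)


sample = [0.25, 0.5, 0.75]  # placeholder data
d_n = ks_statistic(sample, uniform_cdf)
d_crit = ks_critical(len(sample), alpha=0.05)
print(f"Dn = {d_n:.3f}, critical value = {d_crit:.3f}")
print("acceptable" if d_n <= d_crit else "rejected")
```

With only three values the approximation for the critical value is not reliable; the slides recommend n > 35.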
      Variability of quantile estimates –
               confidence limits
Sources of errors
• assumption of a particular distribution (cannot be quantified)
• sampling errors in estimation of the parameters of the
  distribution (quantifiable through standard errors)

Confidence limits (CL): 100(1-α)% confidence intervals

   $CL(Q_T) = Q_T \pm t_{1-\alpha/2}\, SE_q$

   Standard error: $SE_q = \frac{C \sigma}{\sqrt{n}}$

   QT : quantile
   C : constant (see Workshop page 13; course notes from Hall page 31f)
   σ : standard deviation from a sample of size n
   t1-α/2 : value of Student’s t-distribution (e.g. the 97.5% point
            for α = 5%, two-tailed test) with
            (n-1) degrees of freedom; tabulated
Example: Lognormal with 90%-confidence limits




                                 (Bedient & Huber 2002)
  Example: Estimation of confidence limits
                (according to Schoenwiese 2000)
The mean annual temperature at the Hohenpeissenberg, Germany, for
the period 1954-1970 (n=17; Normal-distributed, C=1) is 6.24 °C with a
standard deviation of 0.73 °C.
                   $CL(Q_T) = Q_T \pm t_{1-\alpha/2}\, SE_q$
The confidence limits (α = 5%) can be calculated as:

CL (mean annual temperature in °C) = 6.24 ± t97.5% (0.73/√17) °C
                                   = 6.24 ± 0.38 °C

With 95% confidence (significance level α = 5%), the mean annual temperature
lies in the interval 6.24 ± 0.38 °C, thus between 5.86 and 6.62 °C.

Please note: if the record length had been 120 years
(= n) with the same standard deviation, the interval would be
6.24 ± 0.13 °C.
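
The Hohenpeissenberg example can be retraced in Python. In this minimal sketch the t-value is passed in as a tabulated number (2.120 for 16 degrees of freedom at the 97.5% point), so no statistics library is needed; the function name is illustrative.

```python
import math


def confidence_limits(q, sigma, n, t_value, c=1.0):
    """Lower and upper confidence limits: q -/+ t * C * sigma / sqrt(n)."""
    standard_error = c * sigma / math.sqrt(n)
    half_width = t_value * standard_error
    return q - half_width, q + half_width


# Mean annual temperature 1954-1970: n = 17, so 16 degrees of freedom.
lower, upper = confidence_limits(q=6.24, sigma=0.73, n=17, t_value=2.120)
print(f"95% confidence limits: {lower:.2f} to {upper:.2f} degrees C")
```

The half-width shrinks with the square root of n, which is why the interval for a 120-year record is only about a third as wide.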
 A few remarks on:
 Low flow frequency analysis
Data required: annual minimum series
Problem of independent events; do not split the year in the
  middle of a low flow period (low flow periods can be long)
Often zero values (e.g. in arid climates or cold climates)
There is a finite limit on how low a low flow can be (no
  negative flows!)
   – Different statistical treatment of the data
   – Fit an exponential distribution rather than, for instance, a
     log-normal distribution
   – Other often used distributions are the Weibull, Gumbel, Pearson
     Type III, and log-normal distributions
  A few remarks on:
  Low flow frequency analysis




Figure 7.15 Two probability density functions. The usual log-normal
distribution (solid line) is contrasted with the truncated log-normal
distribution (dashed line) that is possible with low flows (where the
minimum flow can equal zero).

Figure 7.16 Probability values (calculated from the Weibull sorting
formula) plotted on a log scale against values of annual minimum flow
(hypothetical values).
   Application in the Rur river (7-1)
   Station Stah (Germany)
                             area: 2245 km2
                             record: 1953 to 2001

   [Figure: daily discharge (m3/s) at Stah, 25-Mar-60 to 19-Apr-01]

Prepared by Tu Min (2004)
Application in the Rur river (7-2)
Annual flood peaks (1954 - 2001), by water year (Nov - Oct)
  [Figure: discharge (m3/s) at Stah on the Roer, 1950 to 2000]

Homogeneity: statistical tests (change point)
  [Figure: discharge (m3/s) at Stah on the Roer, 1950 to 2000, annotated with 66, 1980 and 88]
Application in the Rur river (7-3)
Flood frequency analysis
         Example – Normal distribution

N = 48, μ = 76.1, σ = 28.5, t97.5% = 2.011

water year  peak Q (m3/s)  rank (m)  Q in order  p=(m-0.375)/(N+0.25)       t     Xcal     UCL     LCL
   1954          65           1          28             0.013            -2.228   12.70   28.24   -2.83
   1955          76           2          31             0.034            -1.829   24.05   37.65   10.44
   1956          66           3          34             0.054            -1.604   30.47   43.04   17.90
   1957          59           4          34             0.075            -1.439   35.17   47.03   23.32
   1958          71           5          36             0.096            -1.306   38.96   50.26   27.66
   1959          65           6          44             0.117            -1.192   42.19   53.04   31.33

[Figure: Normal fit. Peak (m3/s, 0 to 150) against standard variate (-3 to 3); legend: Observed, Fitted, 95% CL]
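The p, t and Xcal columns of the Normal example can be sketched as follows. The plotting position is the one in the table header (often called Blom's formula), and Xcal is assumed to be μ + tσ, which matches the tabulated values to within the rounding of μ and σ. The function names are illustrative, and the confidence limits are not reproduced because the slide does not state their formula.

```python
from statistics import NormalDist


def blom_position(m, n):
    """Plotting position from the table header: p = (m - 0.375) / (N + 0.25)."""
    return (m - 0.375) / (n + 0.25)


def normal_fit_row(m, n, mu, sigma):
    """Return (p, t, x_cal) for rank m, with t the standard normal variate at p."""
    p = blom_position(m, n)
    t = NormalDist().inv_cdf(p)
    return p, t, mu + t * sigma


N, MU, SIGMA = 48, 76.1, 28.5  # values quoted on the slide
for m in range(1, 7):
    p, t, x_cal = normal_fit_row(m, N, MU, SIGMA)
    print(f"{m:2d}  {p:.3f}  {t:7.3f}  {x_cal:6.2f}")
```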
Application in the Rur river (7-4)
Flood frequency analysis
         Example – LN distribution

  Y = ln(X)

N = 48, μy = 4.3, σy = 0.4, t97.5% = 2.011

[Figure: LN fit. Ln(Q) (3.0 to 5.5) against standard variate (-3 to 3); legend: Observed, Fitted, 95% CL]
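A minimal sketch of how a quantile follows from the LN fit: the Normal quantile of Y = ln(X) is back-transformed with exp. It uses the one-decimal values of μy and σy quoted on the slide, so the results are only indicative. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_quantile(p, mu_y, sigma_y):
    """Back-transform the Normal quantile of Y = ln(X) to the original scale."""
    t = NormalDist().inv_cdf(p)
    return math.exp(mu_y + t * sigma_y)


MU_Y, SIGMA_Y = 4.3, 0.4  # rounded values from the slide
median = lognormal_quantile(0.5, MU_Y, SIGMA_Y)
q100 = lognormal_quantile(1 - 1 / 100, MU_Y, SIGMA_Y)
print(f"median: {median:.1f} m3/s, 100-year peak: {q100:.1f} m3/s")
```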
Application in the Rur river (7-5)
Flood frequency analysis
         Example – LN3 distribution

  Y = ln(X - ξ),   ξ = (x(min) x(max) - x(med)^2) / (x(min) + x(max) - 2 x(med))

N = 48, Xmin = 27.6, Xmax = 139.3, Xmed = 73.1
  (these rounded values give ξ ≈ -72.4, which corresponds to the Ln(Q+73) axis of the plot)
μy = 5.0, σy = 0.2, t97.5% = 2.011

[Figure: LN3 fit. Ln(Q+73) (4.5 to 5.5) against standard variate (-3 to 3); legend: Observed, Fitted, 95% CL]
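The shift in the LN3 example can be checked numerically. The sketch below assumes the usual quantile-based lower-bound estimate, (xmin·xmax − xmed²)/(xmin + xmax − 2·xmed), with the transform Y = ln(X − shift). With the values on the slide this gives a shift of about −72, consistent with the Ln(Q+73) axis. The symbol and function names are hypothetical.

```python
import math


def ln3_shift(x_min, x_max, x_med):
    """Lower-bound estimate (x_min*x_max - x_med^2) / (x_min + x_max - 2*x_med)."""
    denominator = x_min + x_max - 2.0 * x_med
    if denominator <= 0:
        # Commonly stated condition for this estimator to be usable.
        raise ValueError("requires x_min + x_max > 2 * x_med")
    return (x_min * x_max - x_med ** 2) / denominator


xi = ln3_shift(27.6, 139.3, 73.1)  # values quoted on the slide
print(f"shift = {xi:.1f}")
# Transformed value of the median flood, Y = ln(X - shift)
print(f"Y(median) = {math.log(73.1 - xi):.2f}")
```

The transformed median comes out close to the μy = 5.0 quoted on the slide, which supports this reading of the formula.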
                   Application in the Rur river (7-6)
                   Flood frequency analysis
                             Example – Gumbel distribution

N = 48, μ = 76.1, σ² = 810.9, α = 22.2, ζ = 63.3

Distribution fitting

water year  peak Q (m3/s)  rank (m)  Q in order     Fi       Yi    Xest   biased var(Xest)    LCL    UCL
   1954          65           1          28       0.012   -1.494   30.2        34.2          18.4   41.9
   1955          76           2          31       0.032   -1.232   36.0        26.7          25.6   46.3
   1956          66           3          34       0.053   -1.076   39.4        22.9          29.8   49.1
   1957          59           4          34       0.074   -0.957   42.1        20.4          33.0   51.2
   1958          71           5          36       0.095   -0.857   44.3        18.6          35.6   53.0

[Figure: Gumbel (1954-2001). Peak (m3/s, 0 to 200) against reduced Gumbel variate (-2 to 5); legend: Observed, Fitted, 95% CL]
[Figure: panels 1980-2001 and 1974-1979. Discharge (m3/s, 0 to 200) against reduced Gumbel variate (-2 to 5)]
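A sketch of the first columns of the Gumbel fitting table. The slide does not say how α, ζ and Fi were obtained. The code assumes a method-of-moments fit (α = √6·σ/π, ζ = μ − 0.5772·α) and the Gringorten plotting position Fi = (m − 0.44)/(N + 0.12), both of which reproduce the tabulated numbers to within rounding. Function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649


def gumbel_moments_fit(mean, variance):
    """Method-of-moments Gumbel parameters: scale alpha and location zeta."""
    alpha = math.sqrt(6.0 * variance) / math.pi
    zeta = mean - EULER_GAMMA * alpha
    return alpha, zeta


def gringorten_position(m, n):
    """Assumed plotting position: F = (m - 0.44) / (N + 0.12)."""
    return (m - 0.44) / (n + 0.12)


def reduced_variate(f):
    """Reduced Gumbel variate y = -ln(-ln(F))."""
    return -math.log(-math.log(f))


alpha, zeta = gumbel_moments_fit(76.1, 810.9)  # mean and variance from the slide
print(f"alpha = {alpha:.1f}, zeta = {zeta:.1f}")
for m in range(1, 6):
    f = gringorten_position(m, 48)
    y = reduced_variate(f)
    print(f"{m}  {f:.3f}  {y:7.3f}  {zeta + alpha * y:6.1f}")
```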
    Application in the Rur river (7-7)
    Flood frequency analysis
              Example – Gumbel distribution
Magnitude of T-year flood

  QT = a y + c,   y = -ln(-ln(1 - 1/T))     (with a = α and c = ζ from the table below)

                              1954-2001  1958-1979  1980-2001
                 α =             22.2       22.7       20.4       N-1 = 47
                 ζ =             63.3       52.9       76.2       t97.5% = 2.01

Return       Reduced
period (yr)  variate (yi)     1954-2001  1958-1979  1980-2001  Var (Est)    LCL     UCL
    5           1.50             96.6       86.9      106.9       40.4      83.8   109.4
   10           2.25            113.3      104.0      122.2       73.7      96.0   130.5
   25           3.20            134.3      125.5      141.6      133.9     111.1   157.6
   50           3.90            150.0      141.4      156.0      191.7     122.1   177.8
   75           4.31            159.0      150.7      164.4      230.4     128.5   189.5
  100           4.60            165.5      157.3      170.3      260.1     133.0   197.9

[Figure: Magnitude of T-year flood. Discharge (m3/s, 80 to 200) against reduced Gumbel variate (1 to 5); series: 58-79, 80-01, 54-01, LCL (54-01), UCL (54-01)]
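The T-year flood table can be reproduced with a short calculation. The quantile is ζ + α·y with y = −ln(−ln(1 − 1/T)). The variance formula is not stated on the slide, so the code assumes the common method-of-moments approximation (σ²/N)(1 + 1.1396·K + 1.1·K²), with K the Gumbel frequency factor. This matches the Var (Est), LCL and UCL columns for 1954-2001 to within rounding. Function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649


def reduced_variate(T):
    """Reduced Gumbel variate for return period T (years)."""
    return -math.log(-math.log(1.0 - 1.0 / T))


def t_year_flood(T, alpha, zeta):
    """Gumbel quantile for return period T."""
    return zeta + alpha * reduced_variate(T)


def quantile_variance(T, variance, n):
    """Assumed approximation: (s^2 / N) * (1 + 1.1396 K + 1.1 K^2)."""
    k = (reduced_variate(T) - EULER_GAMMA) * math.sqrt(6.0) / math.pi
    return variance / n * (1.0 + 1.1396 * k + 1.1 * k * k)


# Values quoted on the slides for 1954-2001.
ALPHA, ZETA, VARIANCE, N, T_975 = 22.2, 63.3, 810.9, 48, 2.01
for T in (5, 10, 25, 50, 75, 100):
    q = t_year_flood(T, ALPHA, ZETA)
    half_width = T_975 * math.sqrt(quantile_variance(T, VARIANCE, N))
    print(f"{T:4d}  {q:6.1f}  {q - half_width:6.1f}  {q + half_width:6.1f}")
```

The interval widens quickly with T: at T = 100 it spans roughly 65 m3/s, which is why extrapolation far beyond the 48-year record should be read with care.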
Take home messages
Frequency analysis, in particular of hydrological extremes,
  is a prerequisite for sustainable water resources
  management
The choice between annual maximum series and partial duration
  series depends on the length of the record
Understanding of probability, recurrence interval and risk
Knowledge of often used statistical distributions
Calculation of confidence intervals
Test of the goodness of fit of a PDF or CDF
Special features of low flow values
Closure
•   probability and event
•   stochastic variables, cont. and discrete
•   Transformations
•   Joint distributions
•   Linear regression
•   Parameter estimation (with Bestfit)
Closure
What is the role of statistics in hydrology and water
  resources?
You now have knowledge of
   Frequency tables,
   Histograms,
   Distribution functions,
   Statistical descriptors, and
   Standard Normal distribution and Log-Normal
    distribution.

  Are you now well prepared for the two
  assignments?
   Assignment 1: Basic Statistics
Use the data set from Van Gelder’s website
   1. Calculate the range and estimate a reasonable number of intervals as
      well as class limits.
   2. Calculate the relative and absolute frequencies.
   3. Make a histogram and cumulative frequency distribution, hand drawn
      on linear paper (figures). Interpret this briefly (one sentence!)
   4. Calculate the median, mode, and arithmetic mean.
   5. Calculate the variance, standard deviation and coefficient of variation.
   6. Calculate the skewness and kurtosis. Interpret the results briefly (one
      sentence!)
   7. Plot the data as a graph on normal probability paper (figure). Is the
      Normal distribution suitable for that data set? Compare the mean value
      and standard deviation from your graph with your results in questions
      4 and 5.

Deliver your printed report to Pieter van Gelder on November 5th
    at 15.45h at the start of the lectures
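The calculations in questions 1 to 6 can be sketched as follows. The sample is hypothetical. Sturges' rule for the number of intervals and the moment-based (uncorrected) skewness and kurtosis are assumptions, since the assignment does not prescribe particular formulas.

```python
import math
import statistics
from collections import Counter

# Hypothetical sample (discharges in m3/s), for illustration only.
data = [65, 76, 66, 59, 71, 65, 28, 31, 34, 34, 36, 44]

n = len(data)
data_range = max(data) - min(data)
# Sturges' rule is one common (assumed) choice for the number of classes.
k = math.ceil(1 + math.log2(n))
width = data_range / k

# Absolute and relative frequencies per class; the maximum goes in the last class.
counts = Counter(min(int((x - min(data)) / width), k - 1) for x in data)
absolute = [counts.get(i, 0) for i in range(k)]
relative = [c / n for c in absolute]

mean = statistics.mean(data)
variance = statistics.variance(data)    # sample variance (divides by n - 1)
std = statistics.stdev(data)
cv = std / mean                         # coefficient of variation

# Moment-based skewness and kurtosis (no bias correction).
m2 = sum((x - mean) ** 2 for x in data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
m4 = sum((x - mean) ** 4 for x in data) / n
skewness = m3 / m2 ** 1.5
kurtosis = m4 / m2 ** 2

print("classes:", k, "width:", width, "absolute:", absolute, "relative:", relative)
print("median:", statistics.median(data), "mode(s):", statistics.multimode(data), "mean:", mean)
print("variance:", variance, "std:", std, "cv:", cv)
print("skewness:", skewness, "kurtosis:", kurtosis)
```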
Assignment 2: Frequency Analysis
Download your dataset from Van Gelder’s
  website
Determine the PDF with the lowest Chi-Square
  value in Bestfit
Include in your report a plot of the observations
  and optimal fit
Extrapolate the fitted Exponential distribution to
  a 10^-3 /yr quantile
Calculate the 95% Confidence Bounds around
  the 10^-3 /yr quantile for the Exponential fit
   Deliver your printed report to Pieter van Gelder on
    November 6th at 8.45h at the start of the lectures
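A minimal sketch of the last two steps, under stated assumptions: a one-parameter exponential distribution with exceedance probability exp(-x/scale), the scale estimated by the sample mean, and confidence bounds from a large-sample Normal approximation (standard error of the quantile = quantile/√n). Bestfit may fit and report things differently, and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean


def exponential_quantile_with_bounds(sample, exceedance=1e-3, level=0.95):
    """Exponential fit (scale = sample mean) and approximate bounds for a quantile."""
    n = len(sample)
    scale = mean(sample)                      # maximum likelihood estimate
    quantile = -scale * math.log(exceedance)  # solves exp(-x / scale) = exceedance
    std_error = quantile / math.sqrt(n)       # since Var(scale) is about scale^2 / n
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return quantile, quantile - z * std_error, quantile + z * std_error


# Hypothetical observations, for illustration only.
sample = [12.0, 3.5, 7.1, 20.4, 1.2, 9.8, 5.5, 15.3, 2.7, 6.6]
q, lower, upper = exponential_quantile_with_bounds(sample)
print(f"10^-3 quantile: {q:.1f}  (95% bounds: {lower:.1f} to {upper:.1f})")
```

With only ten observations the bounds are very wide, which illustrates how uncertain an extrapolation to 10^-3 per year is for a short record.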
 Assignment 3: Transformations of
 distributions
Download your parameters a and b from Van Gelder’s website
Generate 28 random numbers from the Uniform distribution with
   lower bound a and upper bound b
Plot your data in a histogram and draw the PDF of the uniform
   distribution in the same plot
Generate 100 sets of 28 random numbers from the above Uniform
   distribution and take from each set the maximum number
Plot these 100 maxima in a histogram, derive the theoretical
   distribution function for these 100 maxima and draw the PDF in
   the same plot
Assume that the above 100 numbers are monthly maximum wind
   speeds. Transform your wind speeds to wind pressures and plot
   your wind pressures in a cumulative distribution plot.

    Deliver your printed report to Pieter van Gelder on
 November 13th at 17.00h at the reception of UNESCO-IHE
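The simulation and the theoretical distribution of the maxima can be sketched as below. For n independent Uniform(a, b) numbers the maximum has CDF ((x − a)/(b − a))^n, so its PDF is n(x − a)^(n−1)/(b − a)^n. The bounds, the seed, the air density of 1.25 kg/m3 and the pressure formula q = ½ρv² are assumptions for illustration; use the transformation given in the lectures if it differs.

```python
import random


def sample_maxima(a, b, set_size=28, n_sets=100, seed=1):
    """Maximum of each of n_sets samples of set_size Uniform(a, b) numbers."""
    rng = random.Random(seed)
    return [max(rng.uniform(a, b) for _ in range(set_size)) for _ in range(n_sets)]


def maximum_cdf(x, a, b, set_size=28):
    """CDF of the maximum of set_size independent Uniform(a, b) variables."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return ((x - a) / (b - a)) ** set_size


def maximum_pdf(x, a, b, set_size=28):
    """Derivative of maximum_cdf with respect to x."""
    if x < a or x > b:
        return 0.0
    return set_size * (x - a) ** (set_size - 1) / (b - a) ** set_size


def wind_pressure(speed, air_density=1.25):
    """Assumed transformation q = 0.5 * rho * v^2 (N/m2 for v in m/s)."""
    return 0.5 * air_density * speed ** 2


A, B = 10.0, 30.0  # hypothetical bounds
maxima = sample_maxima(A, B)
pressures = sorted(wind_pressure(v) for v in maxima)
# Empirical cumulative distribution of the pressures: (value, i / n)
ecdf = [(q, (i + 1) / len(pressures)) for i, q in enumerate(pressures)]
print(min(maxima), max(maxima), ecdf[0], ecdf[-1])
```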
 Final mark for Review of statistics and
 frequency analysis (module 1)

Weight factor of all computer exercises is
 0.5 in your final mark
Weight factor of written test is 0.5 in your
 final mark

				