Introduction to Probability


                              Frank Betz

                                    EUI


                           September 2007




Frank Betz (EUI)         Introduction to Probability   September 2007   1 / 124
                                  Outline

1   Introduction

2   The Probability Space

3   Conditioning, independence, Bayes’ Rule and combinatorics

4   Random variables and probability distributions

5   Moments of random variables

6   Some common univariate distributions


     Frank Betz (EUI)       Introduction to Probability   September 2007   2 / 124
           Probability theory: why do we care?



The descriptive analysis of data does not allow for generalisations
beyond the data under consideration.
However, as economists our concern is to draw inference on the
underlying population.
Probability theory enables us to do just that: it develops
mathematical models for probability that provide the logical
foundations for statistical inference.




Frank Betz (EUI)         Introduction to Probability   September 2007   3 / 124
    Probability theory - statistics - econometrics


Probability Theory analyses characteristics of probability
mechanisms on the basis of a limited number of definitions and
axioms.
On the basis of data on trials and some maintained assumptions
about a probability mechanism, Statistics "estimates" its parameters
or assesses "hypotheses" about them.
Econometrics applies statistics to test or quantify economic models
and theories.




Frank Betz (EUI)      Introduction to Probability   September 2007   4 / 124
                         References




Spanos (1986), "The statistical foundations of econometric modelling"
Appendix to Hansen's Lecture Notes
Benoît Champagne's class notes "Probability and random signals I"
Capinski and Kopp (2004), "Measure, integral and probability"
Ivan Wilde's script "Measure, integration and probability"




Frank Betz (EUI)      Introduction to Probability   September 2007   5 / 124
                                  Outline



2   The Probability Space
      Approaches to Probability
      Random experiment, sample space, and event
      Sigma-field
      Probability




     Frank Betz (EUI)       Introduction to Probability   September 2007   6 / 124
                        Classical probability


Definition
If a random experiment can result in N mutually exclusive and equally
likely outcomes and if N_A of these outcomes result in the occurrence of
the event A, then the probability of A is defined by P(A) = N_A/N.


Problems:

    Applies only to situations with a finite number of outcomes.
    Due to the ’equally likely’ condition the definition is circular.
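For the finite case the classical definition can be applied mechanically. The sketch below is an illustrative Python fragment (not part of the original slides) that counts outcomes of the two-coin-toss experiment used later in the course; the event 'at least one head' is chosen only as an example.

```python
from itertools import product

# Sample space of tossing a fair coin twice; all four outcomes equally likely
S = list(product("HT", repeat=2))      # ('H','H'), ('H','T'), ('T','H'), ('T','T')
A = [s for s in S if "H" in s]         # event 'at least one head'
print(len(A) / len(S))                 # N_A / N = 3/4
```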




    Frank Betz (EUI)        Introduction to Probability    September 2007   7 / 124
             The frequency approach to probability



Definition
An experiment is repeated many times under similar conditions. For an
event of interest, we postulate a number P_A, called the probability of the
event, and approximate P_A by the relative frequency with which the
repeated observations satisfy the event.

To overcome the circularity of the classical definition the frequency
approach views probability as the limit of empirical frequencies.
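As a quick illustration of the frequency view (an added sketch, not from the slides; the seed and sample size are arbitrary), one can simulate repeated coin tosses and watch the relative frequency of heads settle near 1/2:

```python
import random

random.seed(0)
n = 100_000
heads = sum(random.random() < 0.5 for _ in range(n))
print(heads / n)   # close to 0.5 for large n: the relative frequency approximates P_A
```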




    Frank Betz (EUI)        Introduction to Probability    September 2007   8 / 124
                       Axiomatic probability

Definition
A probability space is a triple (S, F, P(·)), where
     S is the sample space, a collection of possible outcomes of an
     experiment.
     F is the event space or set of events: a collection of subsets of S.
     P(·) is the probability function defined on F: to any event, P(·)
     assigns the probability that the event is observed once the experiment
     is completed.

Modern probability theory and also the remainder of this course is
concerned with axiomatic probability. We now have a closer look at its
ingredients.

    Frank Betz (EUI)        Introduction to Probability   September 2007    9 / 124
                                  Outline



2   The Probability Space
      Approaches to Probability
      Random experiment, sample space, and event
      Sigma-field
      Probability




     Frank Betz (EUI)       Introduction to Probability   September 2007   10 / 124
                       Random experiment


Definition
A random experiment E is an experiment which satisfies the following
conditions:
    all possible distinct outcomes are known a priori;
    in any particular trial the outcome is not known a priori;
    it can be repeated under identical conditions.

    The axiomatic approach to probability can be viewed as a
    formalisation of the concept of a random experiment.




    Frank Betz (EUI)       Introduction to Probability   September 2007   11 / 124
                           Sample space




Definition
The sample space S is defined to be the set of all possible outcomes of
the experiment E. The elements of S are called elementary events.




    Frank Betz (EUI)      Introduction to Probability   September 2007   12 / 124
                       Tossing a fair coin twice




Example
Consider the random experiment E of tossing a fair coin twice and
observing the faces turning up. The sample space of E is
S = {(HT ), (TH), (HH), (TT )} with (HT ), (TH), (HH), (TT ) being the
outcomes or elementary events of S.




    Frank Betz (EUI)        Introduction to Probability   September 2007   13 / 124
                                  Event
    An event is a subset of the sample space S, formed by set theoretic
    operations on the elementary events.
    An event occurs when any of the elementary events it comprises
    occurs.
    When a trial is made, only one elementary event is observed, but a
    large set of events may have occurred.
    Special events are S, the sure event, and Ø, the impossible event.

Example
A1 = {(HT ), (TH), (HH)} = {(HT )} ∪ {(TH)} ∪ {(HH)}, that is ’two
tails’ does not occur
A2 = {(TT ), (HH)} = {(TT )} ∪ {(HH)}, that is either ’two heads’ or ’two tails’
occurs
    Frank Betz (EUI)      Introduction to Probability   September 2007   14 / 124
                                   Event



We observe that for two events A1 and A2 the following are also events:
    S\A1 = A1ᶜ, the complement of A1, and likewise for A2
    A1 ∪ A2, the union of A1 and A2
    A1 ∩ A2 = (A1ᶜ ∪ A2ᶜ)ᶜ, the intersection of A1 and A2
    A1\A2 = A1 ∩ A2ᶜ = (A1ᶜ ∪ A2)ᶜ, ...
To construct the events we only need the set theoretic operations
complementation and union.




    Frank Betz (EUI)       Introduction to Probability   September 2007   15 / 124
                                 Example

Example


        A1ᶜ = {(HT ), (TH), (HH)}ᶜ = {(TT )}
  A1 ∪ A2 = {(HT ), (TH), (HH)} ∪ {(TT ), (HH)} = S
  A1 ∩ A2 = {(HT ), (TH), (HH)} ∩ {(TT ), (HH)} = {(HH)}
    A1\A2 = A1 ∩ A2ᶜ = {(HT ), (TH), (HH)} ∩ {(HT ), (TH)} = {(HT ), (TH)}

The occurrence or non-occurrence of A1 or A2 implies the occurrence or
non-occurrence of these events. A mathematical structure that aims to
assign probabilities to events has to take this into account.


    Frank Betz (EUI)        Introduction to Probability   September 2007   16 / 124
                                  Outline



2   The Probability Space
      Approaches to Probability
      Random experiment, sample space, and event
      Sigma-field
      Probability




     Frank Betz (EUI)       Introduction to Probability   September 2007   17 / 124
                         Set of events


We now understand the meaning and the properties of events. Our
goal, however, is to assign probabilities to these events.
The probability function P(·) : F −→ [0; 1] does just that. As we use
sets to represent events, its domain is a collection of sets, denoted by
F . To understand the probability function we first have to
understand its domain.
The key property of F is that if A ∈ F , then sets obtained via the
various set theoretic operations performed on A must also be
elements of F .



Frank Betz (EUI)       Introduction to Probability    September 2007   18 / 124
                                   Power set
Example
When tossing a fair coin twice the set of all subsets, that is the power set
of S is an obvious candidate for F . The power set is given by:

          P = {∅, S, {(HT )}, {(HH)}, {(TT )}, {(TH)},
                       {(HT ), (HH)}, {(TT ), (HH)}, {(TH), (HH)},
                       {(HT ), (TH)}, {(TT ), (TH)}, {(HT ), (TT )},
                       {(HT ), (TH), (TT )}, {(HT ), (TH), (HH)},
                       {(HH), (TT ), (TH)}, {(HH), (TT ), (HT )}}

However, one can only define F to be the power set of S, if S is finite or
countably infinite. To assign probabilities when S is uncountable we
require that F be a σ-field.
    Frank Betz (EUI)           Introduction to Probability   September 2007   19 / 124
                                Sigma-field

Definition
Let F be a set of subsets of S. F is called a sigma-field if:
    if A ∈ F, then Aᶜ ∈ F - closure under complementation; and
    if Ai ∈ F, i = 1, 2, ..., then (∪ᵢ Ai) ∈ F - closure under countable
    union

We observe that the definition implies:
    S ∈ F, because A ∪ Aᶜ = S
    ∅ ∈ F, as ∅ = Sᶜ
    if Ai ∈ F, i = 1, 2, ..., then (∩ᵢ Ai) ∈ F



    Frank Betz (EUI)         Introduction to Probability      September 2007   20 / 124
                                Example



Example
Check whether the collections F1 and F2 are σ-fields where F1 , F2 are
given by

               F1 = {{(HT )}, {(HH), (TH), (TT )}, ∅, S},
               F2 = {{(HT ), (TH)}, {(HH)}, ∅, S}
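For small finite collections the closure conditions can be checked mechanically. The following Python sketch is added only for illustration; the helper is_sigma_field is a hypothetical function, and checking pairwise unions suffices here because the collections are finite.

```python
S = frozenset(["HT", "TH", "HH", "TT"])

def is_sigma_field(collection, S):
    """Check closure under complementation and (pairwise) union for a finite collection."""
    sets = {frozenset(A) for A in collection}
    closed_complement = all(S - A in sets for A in sets)
    closed_union = all(A | B in sets for A in sets for B in sets)
    return closed_complement and closed_union

F1 = [set(), S, {"HT"}, {"HH", "TH", "TT"}]
F2 = [set(), S, {"HT", "TH"}, {"HH"}]
print(is_sigma_field(F1, S))   # True: complements and unions stay inside F1
print(is_sigma_field(F2, S))   # False: e.g. the complement of {"HT","TH"} is missing
```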




    Frank Betz (EUI)       Introduction to Probability   September 2007   21 / 124
                                Example




Example
Suppose you are interested in the event ’first toss head’. How do you
obtain a σ-field?




    Frank Betz (EUI)       Introduction to Probability   September 2007   22 / 124
           Sigma-field generated by a family of sets


Definition
G is the σ-field generated by a family of sets A if

               G = ∩ {F : F is a σ-field such that F ⊃ A}


    G is the smallest σ-field containing A.
    Starting from some events of interest and forming the σ-field they
    generate is how σ-fields are constructed when S is infinite or
    uncountable.




    Frank Betz (EUI)        Introduction to Probability   September 2007   23 / 124
                              Borel field




Example
Let S be the real line R and suppose we are interested in the events
J = {Bx : x ∈ R}, where Bx = {z : z ≤ x} = (−∞, x]. Construct the
σ-field generated by these events by taking complements and countable unions.




    Frank Betz (EUI)       Introduction to Probability   September 2007   24 / 124
                                    Borel field
Definition
Put B = ∩ {F : F is a σ-field containing all intervals}. We say that B is
the σ-field generated by all intervals and we call the elements of B Borel
sets.

        B is the smallest σ-field containing all intervals.
        We started with all intervals of the type (−∞, x] to generate B.
        However, we could have started with all intervals of any other type
        (all open intervals, all closed intervals,...) and still arrived at B.
        Not only does B contain all open intervals, it also contains all open
        sets as any open set is a countable union of open intervals.
        Countable sets are Borel sets, since each is a countable union of
        closed intervals of the form [a, a]; in particular N and Q are Borel sets.
        Frank Betz (EUI)        Introduction to Probability     September 2007   25 / 124
                                  Outline



2   The Probability Space
      Approaches to Probability
      Random experiment, sample space, and event
      Sigma-field
      Probability




     Frank Betz (EUI)       Introduction to Probability   September 2007   26 / 124
                                     Probability

Definition
Probability is defined as a set function on F satisfying the following
axioms:
    P(A) ≥ 0 for every A ∈ F
    P(S) = 1
    P(∪ᵢ Ai) = Σᵢ P(Ai) if {Ai} is a sequence of mutually exclusive events
    in F - called countable additivity

    Hence, probability is a countably additive set function with domain F
    and range [0,1].



    Frank Betz (EUI)              Introduction to Probability   September 2007   27 / 124
           Properties of the probability set function



Proposition
Let P(·) be a probability measure on F . Then the following hold.

                            P(Aᶜ) = 1 − P(A),               A ∈ F                   (1)
                             P(∅) = 0                                               (2)
        If A1 ⊂ A2 then P(A1 ) ≤ P(A2 ),                A1 , A2 ∈ F                 (3)
                       P(A1 ∪ A2 ) = P(A1 ) + P(A2 ) − P(A1 ∩ A2 )                  (4)




    Frank Betz (EUI)          Introduction to Probability         September 2007   28 / 124
           Properties of the probability set function


Proposition
If A1 ⊆ A2 ⊆ . . . with An ∈ F, n = 1, 2, . . . , then we have

                            lim_{n→∞} P(An) = P(∪ᵢ Ai)

If A1 ⊇ A2 ⊇ . . . with An ∈ F, n = 1, 2, . . . , then we have

                            lim_{n→∞} P(An) = P(∩ᵢ Ai)




    Frank Betz (EUI)         Introduction to Probability       September 2007   29 / 124
                       Probability set function




Example
Is it possible to have two events A and B with P(A) = 0.7, P(B) = 0.9 and
P(A ∩ B) = 0.5?
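A worked check: by property (4), P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.7 + 0.9 − 0.5 = 1.1 > 1, which contradicts P(A ∪ B) ≤ 1. Hence no such events can exist.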




    Frank Betz (EUI)       Introduction to Probability   September 2007   30 / 124
                                     Measure



Definition
A finite measure on a measurable space (S, F ) is a map
µ : F −→ [0, ∞), such that if A1 , A2 , . . . is any sequence of pairwise
disjoint members of F , then

                            µ(∪ᵢ Ai) = Σᵢ µ(Ai)

This requirement is referred to as countable additivity.




    Frank Betz (EUI)         Introduction to Probability   September 2007   31 / 124
                   Probability and measure




A measure space is a triple (S, F , µ), where µ is a measure on the
σ-field F of subsets of S.
If µ(S) = 1, then µ is called a probability measure and (S, F , µ) is
called a probability space.
Probability is therefore a special case of a finite measure.




Frank Betz (EUI)        Introduction to Probability   September 2007   32 / 124
                                  Outline



3   Conditioning, independence, Bayes’ Rule and combinatorics
      Conditional Probability
      Independence
      Bayes’ Theorem
      Combinatorics




     Frank Betz (EUI)       Introduction to Probability   September 2007   33 / 124
                       Conditional probability




Example
Suppose that in the case of tossing a fair coin twice we know that in the
first trial it was heads. What does it imply for the sample space, the
associated σ-field and the corresponding probabilities?




    Frank Betz (EUI)       Introduction to Probability   September 2007   34 / 124
                       Conditional probability




Definition
Suppose that P(B) > 0. Then the number P(A|B) = P(A ∩ B)/P(B) is called the
conditional probability of A given B.




    Frank Betz (EUI)       Introduction to Probability    September 2007   35 / 124
                               Example




Example
An urn contains 3 white and 2 red balls. Suppose you draw two balls
without replacement and consider the events A ’second ball red’ and B
’first ball red’. Compute P(A), P(B), P(A ∩ B), and P(A|B).
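A worked sketch: P(B) = 2/5; P(A ∩ B) = (2/5)(1/4) = 1/10, so P(A|B) = (1/10)/(2/5) = 1/4; and P(A) = P(A ∩ B) + P(A ∩ Bᶜ) = 1/10 + (3/5)(2/4) = 2/5.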




    Frank Betz (EUI)      Introduction to Probability   September 2007   36 / 124
                                  Outline



3   Conditioning, independence, Bayes’ Rule and combinatorics
      Conditional Probability
      Independence
      Bayes’ Theorem
      Combinatorics




     Frank Betz (EUI)       Introduction to Probability   September 2007   37 / 124
                            Independence



Definition
Two events A and B are independent if and only if

                       P(A ∩ B) = P(A)P(B), or                             (5)
                        P(A|B) = P(A), or                                  (6)
                        P(B|A) = P(B)                                      (7)

where (6) and (7) are defined only if P(B) > 0 and P(A) > 0, respectively.




    Frank Betz (EUI)       Introduction to Probability   September 2007   38 / 124
                              Independence



Example
A student takes two courses - calculus and statistics. It is known that the
probability that he passes both courses is 0.5, that he passes only calculus
equals 0.2, that he passes only statistics is 0.1, and that he fails both
courses is 0.2. Is the performance in statistics independent of the
performance in calculus?
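A worked sketch: P(passes calculus) = 0.5 + 0.2 = 0.7 and P(passes statistics) = 0.5 + 0.1 = 0.6, while P(passes both) = 0.5 ≠ 0.7 · 0.6 = 0.42, so the performances are not independent.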




    Frank Betz (EUI)         Introduction to Probability   September 2007   39 / 124
                            Independence of n events

Definition
Events A1 , ..., An are independent if and only if

               P(Ai ∩ Aj) = P(Ai)P(Aj), 1 ≤ i < j ≤ n
        P(Ai ∩ Aj ∩ Ak) = P(Ai)P(Aj)P(Ak), 1 ≤ i < j < k ≤ n
                            ...
                 P(∩ₖ Ak) = Πₖ P(Ak)

Put differently, the events A1 , ..., An are independent if for all k ≤ n for
each choice of k events, the probability of their intersection is the product
of the probabilities.


     Frank Betz (EUI)             Introduction to Probability   September 2007   40 / 124
                       Independence of n events



There are 2n − n − 1 equations.
Independence of n events implies independence of any subset
A1 , ..., Ak .
If only P(Ai ∩ Aj ) = P(Ai )P(Aj ) holds, the events are called pairwise
independent.
However, P(∩ₖ Ak) = Πₖ P(Ak) does not imply pairwise
independence.




Frank Betz (EUI)                Introduction to Probability    September 2007   41 / 124
                                  Outline



3   Conditioning, independence, Bayes’ Rule and combinatorics
      Conditional Probability
      Independence
      Bayes’ Theorem
      Combinatorics




     Frank Betz (EUI)       Introduction to Probability   September 2007   42 / 124
                          The law of total probability




Theorem
Let A1, ..., An be pairwise disjoint events such that P(Ai) > 0 and
∪ᵢ Ai = S. Let B ⊂ S. Then

                             P(B) = Σᵢ P(B|Ai)P(Ai)




       Frank Betz (EUI)         Introduction to Probability   September 2007   43 / 124
                       The law of total probability




Example
Prove the law of total probability.
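A possible proof sketch: since the Ai are pairwise disjoint and ∪ᵢ Ai = S, we can write B = B ∩ S = ∪ᵢ (B ∩ Ai) as a disjoint union. Countable additivity gives P(B) = Σᵢ P(B ∩ Ai) = Σᵢ P(B|Ai)P(Ai), using the definition of conditional probability in the last step.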




    Frank Betz (EUI)         Introduction to Probability   September 2007   44 / 124
                                Bayes’ Rule



Theorem
Let A1, ..., An be pairwise disjoint events such that P(Ai) > 0 and
∪ᵢ Ai = S. Let B ⊂ S where P(B) > 0. Then

                          P(Ai|B) = P(B|Ai)P(Ai) / Σⱼ P(B|Aj)P(Aj)


In Bayes’ formula A1 , ..., An are often called hypotheses, P(Ai ) a priori
probability of Ai , and P(Ai |B) a posteriori probability of Ai .




       Frank Betz (EUI)       Introduction to Probability    September 2007   45 / 124
                         Bayes’ Rule




Example
Prove Bayes’ Rule.
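A possible proof sketch: by the definition of conditional probability, P(Ai|B) = P(Ai ∩ B)/P(B) = P(B|Ai)P(Ai)/P(B); replacing P(B) by Σⱼ P(B|Aj)P(Aj) via the law of total probability yields the formula.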




    Frank Betz (EUI)   Introduction to Probability   September 2007   46 / 124
                                Example




Example
A product is produced by two machines 1 and 2. Machine 1 produces 40%
of the output, but 5% of its output is substandard. Machine 2 produces
60% of the output, but 10% of the products are deficient. Compute the
probability that a deficient product is from machine 1.
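A worked sketch: writing D for 'deficient', Bayes' Rule gives P(M1|D) = P(D|M1)P(M1) / (P(D|M1)P(M1) + P(D|M2)P(M2)) = (0.05 · 0.4)/(0.05 · 0.4 + 0.10 · 0.6) = 0.02/0.08 = 0.25.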




    Frank Betz (EUI)       Introduction to Probability   September 2007   47 / 124
                                  Outline



3   Conditioning, independence, Bayes’ Rule and combinatorics
      Conditional Probability
      Independence
      Bayes’ Theorem
      Combinatorics




     Frank Betz (EUI)       Introduction to Probability   September 2007   48 / 124
                            Classical probability

Classical probability rests on the assumption that the sample space is
finite.
  1   Suppose S = {s1, ..., sn}, that is E results in n elementary events.
  2   Find n positive real numbers p1, ..., pn such that Σᵢ pi = 1. These
      numbers represent the probabilities of the elementary events.
  3   Obtain the field of events F as the power set of S. Then any event
      A ∈ F can be represented as A = {si1, ..., sik} with
      P(A) = pi1 + ... + pik.

Suppose all elementary events are equally likely, that is p1 = ... = pn = 1/n.
Then, for any event A ∈ F define P(A) = |A|/|S|, called the classical
probability of A.

      Frank Betz (EUI)             Introduction to Probability                September 2007   49 / 124
                        Combinatorics



Combinatorics is located in the realm of classical probability. It
evolved from the analysis of chance games where the elementary
events are typically equally likely.
Many of these problems can be formulated as drawing k objects from
a set of n elements with or without replacement while being
concerned or not concerned with the ordering of the draw.




Frank Betz (EUI)         Introduction to Probability   September 2007   50 / 124
                Ordered sample, with replacement



Proposition
Suppose k objects are drawn with replacement out of a set of n elements.
Then the total number of ordered k-tuples is nk .

Example
Suppose an urn contains n distinctly numbered balls. You draw k balls
with replacement and write down the sequence of numbers.




    Frank Betz (EUI)       Introduction to Probability   September 2007   51 / 124
              Ordered sample, without replacement


Proposition
Suppose k objects are drawn without replacement out of a set of n
elements. Then the total number of ordered k-tuples is
n(n − 1)(n − 2)...(n − k + 1) = n!/(n − k)!, denoted by P(n, k).

Example
Suppose an urn contains n distinctly numbered balls. You draw k balls
without replacement and write down the sequence of numbers.




    Frank Betz (EUI)       Introduction to Probability       September 2007   52 / 124
             Unordered sample, without replacement



Proposition
Suppose k objects are drawn without replacement out of a set of n
elements. Then the total number of unordered k-tuples is n!/((n − k)! k!),
denoted by (n choose k) or C(n, k).

Example
Take k balls in a single draw from the urn containing n balls.




    Frank Betz (EUI)             Introduction to Probability   September 2007   53 / 124
                Unordered sample, with replacement


Proposition
Suppose k objects are drawn with replacement out of a set of n
elements. Then the total number of unordered k-tuples is
(n + k − 1)!/((n − 1)! k!) = C(n + k − 1, k).

Example
Suppose an urn contains n distinctly numbered balls. You draw k balls
with replacement and note the sequence of numbers. You then count how
many times the 1st,...,nth ball appeared.
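The four counting formulas above can be evaluated directly with Python's standard library. The snippet below is an illustrative sketch (not part of the original slides); the values n = 5 and k = 3 are arbitrary.

```python
from math import comb, perm

n, k = 5, 3  # e.g. 5 distinctly numbered balls, 3 draws

ordered_with_repl = n ** k                  # n^k = 125
ordered_without_repl = perm(n, k)           # n!/(n-k)! = P(n, k) = 60
unordered_without_repl = comb(n, k)         # n!/((n-k)! k!) = C(n, k) = 10
unordered_with_repl = comb(n + k - 1, k)    # C(n+k-1, k) = 35

print(ordered_with_repl, ordered_without_repl,
      unordered_without_repl, unordered_with_repl)
```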




    Frank Betz (EUI)           Introduction to Probability   September 2007   54 / 124
                                Example




Example
In Poker, a player is dealt five cards from a 52 card deck. Compute the
number of possibilities.




    Frank Betz (EUI)       Introduction to Probability   September 2007   55 / 124
                                 Example




Example
In a lottery six balls are drawn from an urn containing 49 distinctly
numbered balls. Compute the probability that a player correctly guesses
the numbers of four balls out of the six balls drawn.
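A worked sketch (reading 'four balls' as exactly four correct guesses): there are C(6, 4) · C(43, 2) favourable combinations out of C(49, 6) possible ones, so the probability is 15 · 903 / 13,983,816 ≈ 0.00097.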




    Frank Betz (EUI)        Introduction to Probability   September 2007   56 / 124
       Useful formulas for unordered sample without
                             replacement

Proposition
Binomial Expansion:

                       (x + y)^n = Σ_{i=0}^{n} C(n, i) x^i y^(n−i)


Proposition
Pascal's triangle:

                       C(n − 1, k − 1) + C(n − 1, k) = C(n, k)



    Frank Betz (EUI)       Introduction to Probability       September 2007   57 / 124
                                  Outline




4   Random variables and probability distributions
      Random variable
      Distribution and density functions




     Frank Betz (EUI)       Introduction to Probability   September 2007   58 / 124
                      Why do we care?

The concept of the probability space (S, F , P(·)) was developed to
consistently assign probabilities to events, especially when the sample
space S is infinite.
However, the mathematical manipulation of P(·) is difficult, as its
domain is a σ-field F of arbitrary sets. To go further in our
development of a mathematical probability model we need a more
flexible framework.
At the same time, random experiments E often yield quantifiable outcomes,
i.e. outcomes that can be represented by ’numbers’. The concept of a
random variable exploits this opportunity by assigning numbers to
outcomes without altering the probabilistic structure of (S, F , P(·)).


Frank Betz (EUI)       Introduction to Probability   September 2007   59 / 124
                         Coin toss revisited


Example
Suppose that in our familiar example we are interested in the number of
heads. Consider the function X (·) : S −→ Rx that maps all elementary
events si onto the set Rx = {0, 1, 2}. Hence, X (·) is a function that
assigns numbers to the outcome of the random experiment.
However, for X (·) to be a random variable we require more. A random
variable has to preserve the event structure of (S, F , P(·)). This means,
that for every value of X , there exists a corresponding subset in F and
that the mapping preserves unions and complements. . .




    Frank Betz (EUI)        Introduction to Probability   September 2007   60 / 124
                          Coin toss revisited


Example
. . . We obtain:

                        X −1 (0) = {(TT )}
                        X −1 (1) = {(TH), (HT )}
                        X −1 (2) = {(HH)}

If X −1 (0) ∈ F , X −1 (1) ∈ F , X −1 (2) ∈ F , X −1 (0 ∪ 1) ∈ F ,
X −1 (1 ∪ 2) ∈ F , and X −1 (0 ∪ 2) ∈ F , then X (·) is a random variable
with respect to F .



     Frank Betz (EUI)        Introduction to Probability    September 2007   61 / 124
                            Random variable



Definition
A random variable X is a real valued function from S to R, which
satisfies the condition that for each Borel set B ∈ B on R, the set

                       X −1 (B) = {s : X (s) ∈ B, s ∈ S}

is an event in F .





    Frank Betz (EUI)         Introduction to Probability   September 2007   62 / 124
                   Features of a random variable



A random variable is a real valued function. It is neither ’random’ nor
a ’variable’.
A random variable is always defined relative to some specific σ-field.
In deciding whether some function Y (·) : S −→ R is a random
variable we proceed from the elements of the Borel field B to those
of the σ-field F and not the other way round.




Frank Betz (EUI)          Introduction to Probability   September 2007   63 / 124
                       Features of a random variable


There is no need to consider all Borel sets B ∈ B. B is the σ-field
generated by all intervals of the type (−∞, x]. Hence, if X (·) is such that

             X −1 ((−∞, x]) = {s : X (s) ∈ (−∞, x], s ∈ S} ∈ F

for all (−∞, x] ∈ B, then

                       X −1 (B) = {s : X (s) ∈ B, s ∈ S} ∈ F

for all B ∈ B since all Borel sets can be expressed in terms of the
semi-closed intervals (−∞, x].



    Frank Betz (EUI)            Introduction to Probability    September 2007   64 / 124
                       Tossing the coin, once more
Example
Consider X - number of heads and let

       F    = {∅, S, {(HH)}, {(TT )}, {(HT ), (HH), (TH)},
                 {(TH), (HT ), (TT )}, {(HH), (TT )}, {(HT ), (TH)}}

Then

    X⁻¹((−∞, x]) = ∅                       for x < 0
                 = {(TT)}                  for 0 ≤ x < 1
                 = {(TT), (TH), (HT)}      for 1 ≤ x < 2
                 = S                       for 2 ≤ x

and X⁻¹((−∞, x]) ∈ F for all x ∈ R; thus X(·) is a random variable
with respect to F.
    Frank Betz (EUI)         Introduction to Probability          September 2007   65 / 124
                         Example, continued
Example
Let Y equal 1 if the first toss is a head. F given as before by

       F    = {∅, S, {(HH)}, {(TT )}, {(HT ), (HH), (TH)},
                 {(TH), (HT ), (TT )}, {(HH), (TT )}, {(HT ), (TH)}}

Then

    Y⁻¹((−∞, y]) = ∅                 for y < 0
                 = {(TH), (TT)}      for 0 ≤ y < 1
                 = S                 for 1 ≤ y

Thus, Y⁻¹((−∞, y]) ∉ F for 0 ≤ y < 1. Therefore, Y(·) is not a
random variable with respect to F. How can we turn Y(·) into a random
variable?
    Frank Betz (EUI)        Introduction to Probability   September 2007   66 / 124
                    The set function PX (·)


To assign probabilities to the Borel set B ∈ B we define the set
function PX (·) : B −→ [0, 1] such that

              PX (B) = P(X −1 (B)) = P(s : X (s) ∈ B, s ∈ S)

for all B ∈ B.
Again, there is no need to consider all Borel sets B ∈ B when defining
PX (·). It is sufficient to consider semi-closed intervals of the type
(−∞, x] as all Borel sets can be expressed in terms of these intervals.




Frank Betz (EUI)         Introduction to Probability   September 2007   67 / 124
                         Tossing a coin twice


Example
X (·), the number of heads, is defined with respect to the same σ-field F
as before. Then,

    PX((−∞, x]) = 0        for x < 0
                = 1/4      for 0 ≤ x < 1
                = 3/4      for 1 ≤ x < 2
                = 1        for 2 ≤ x




    Frank Betz (EUI)        Introduction to Probability    September 2007   68 / 124
                       Probability space transformed

Using the concept of a random variable, we have transformed the
probability space (S, F, P(·)) into the equivalent probability space
(R, B, PX(·)). In particular:
     S, a set of arbitrary elements, has been replaced by the real line R.
     Correspondingly, we traded F, the σ-field of subsets of S, for B,
     the Borel field on the real line.
     P(·), a set function defined on arbitrary sets, has been replaced by
     PX(·), a set function defined on semi-closed intervals on the real line.
Hence, we have obtained a more flexible framework to develop a
probability model.


    Frank Betz (EUI)          Introduction to Probability   September 2007   69 / 124
                                  Outline




4   Random variables and probability distributions
      Random variable
      Distribution and density functions




     Frank Betz (EUI)       Introduction to Probability   September 2007   70 / 124
           Motivation for distribution function

Though we defined PX (·) on semi-closed intervals rather than
arbitrary sets, it is still a set function.
We would like to simplify it further by transforming it into a point
function, which can be easily represented by an algebraic formula.
This seems feasible as the intervals (−∞, x] differ only in their ’end
point’ x.
Heuristically, one defines F (·) as a point function by

                   PX ((−∞, x]) = F (x) − F (−∞), ∀x ∈ R,

and assigning the value zero to F (−∞).


Frank Betz (EUI)          Introduction to Probability   September 2007   71 / 124
                          Distribution function

Definition
Let X be a random variable defined on (S, F, P(·)). The point function
F (·) : R −→ [0, 1] defined by

                  F (x) = PX ((−∞, x]) = Pr (X ≤ x),        ∀x ∈ R

is called the distribution function of X and satisfies the following
properties:
  1   F (x) is non-decreasing;
  2   F (−∞) = limx−→−∞ F (x) = 0, F (∞) = limx−→∞ F (x) = 1
  3   F (x) is continuous from the right
      (limh+ −→0 F (x + h+ ) = F (x), ∀x ∈ R).

      Frank Betz (EUI)        Introduction to Probability      September 2007   72 / 124
                       Discrete random variable



Definition
A random variable is called discrete if its range RX is a countable set.

Example
We roll three dice one time and define X to be the sum of the numbers
that occur. The set RX = {3, 4, . . . , 18} is finite and thus X is discrete.




    Frank Betz (EUI)         Introduction to Probability   September 2007   73 / 124
                       Continuous random variable


Definition
A random variable is called (absolutely) continuous if its distribution
function F (x) is continuous for all x ∈ R and there exists a non-negative
function f(x) on the real line such that

                         F(x) = ∫_{−∞}^{x} f(u) du,   ∀x ∈ R


Note that continuity of X requires more than a continuous distribution
function F(x): F(x) must be obtainable by integrating some non-negative
function f(x).



    Frank Betz (EUI)         Introduction to Probability   September 2007   74 / 124
                                  Density function


Definition
Let F(x) be the distribution function of the r.v. X. The non-negative
function f(x) defined by

                 F(x) = ∫_{−∞}^{x} f(u) du,   ∀x ∈ R   (X continuous)

or
                 F(x) = Σ_{u≤x} f(u),   ∀x ∈ R   (X discrete)

is said to be the (probability) density function (pdf) of X .




     Frank Betz (EUI)              Introduction to Probability     September 2007   75 / 124
                       Uniform distribution


Example
Suppose X takes values in the interval [a, b] and all values of X are
attributed the same probability. In this case, we say that X is uniformly
distributed with distribution function

                 F(x) = 0                  for x < a
                      = (x − a)/(b − a)    for a ≤ x < b
                      = 1                  for x ≥ b

What does the corresponding density function look like?
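A short answer sketch: differentiating F on (a, b) gives f(x) = 1/(b − a) for a ≤ x ≤ b and f(x) = 0 otherwise, i.e. the density is constant on [a, b].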




    Frank Betz (EUI)        Introduction to Probability   September 2007   76 / 124
             Properties of the density function




f(x) ≥ 0, ∀x ∈ R
∫_{−∞}^{∞} f(x) dx = 1
Pr(a < X < b) = ∫_{a}^{b} f(x) dx
f(x) = (d/dx) F(x) at every point where F(x) is differentiable




Frank Betz (EUI)              Introduction to Probability   September 2007   77 / 124
                                     Example



Example
Let fX(x) be given by

                 fX(x) = ax + b    for x ∈ [−2b, 2b]
                       = 0         otherwise

For which a, b is fX (x) a probability density function?
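A possible solution sketch (taking b > 0 so that the interval is non-degenerate): the integral condition gives ∫_{−2b}^{2b} (ax + b) dx = 4b² = 1, so b = 1/2; non-negativity on [−2b, 2b] = [−1, 1] additionally requires b − 2b|a| ≥ 0, i.e. |a| ≤ 1/2.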




    Frank Betz (EUI)            Introduction to Probability          September 2007   78 / 124
                                Outline




5   Moments of random variables
      Mean
      Variance
      Moment generating function




     Frank Betz (EUI)     Introduction to Probability   September 2007   79 / 124
                                     Mean


Definition
The mean or expected value of a random variable X is given by

                           E(X) = ∫_{−∞}^{∞} x fX(x) dx

if X is continuous, and

                           E(X) = Σᵢ xi pi

if X is discrete.




     Frank Betz (EUI)       Introduction to Probability   September 2007   80 / 124
                                Example



Example
Suppose you are in a casino and offered to play the following game. An
urn contains 100 balls, numbered from 0 to 99. Twelve balls will be drawn
randomly from the urn. Before they are selected you are asked to choose
any number N from 0 to 99. If the ball numbered N is among the twelve
balls drawn from the urn you will be paid $N. Would you be willing to play
if you have to pay $15 to enter the game? What is the optimal strategy?
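A worked sketch: whatever number N is chosen, it is among the twelve balls drawn with probability 12/100, so the expected payoff is N · 0.12. The best choice is N = 99, giving an expected payoff of 11.88 < 15, so at a $15 entry fee the game is not worth playing in expectation.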




    Frank Betz (EUI)       Introduction to Probability   September 2007   81 / 124
                                Example




Example
A fair coin is tossed until two successive heads occur. Let X be the
number of tosses required and compute E (X ).
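A possible solution sketch: conditioning on the first tosses, E(X) = (1/2)(1 + E(X)) + (1/4)(2 + E(X)) + (1/4) · 2, since a first tail restarts the count after one toss, head-then-tail restarts it after two, and HH ends it after two. Solving gives E(X) = 6.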




    Frank Betz (EUI)       Introduction to Probability   September 2007   82 / 124
                         Properties of the expectation



  1   E (c) = c if c is a constant
  2   E (aX1 + bX2 ) = aE (X1 ) + bE (X2 ) for any two r.v.’s X1 and X2
      whose means exist and a, b are real constants
  3   P(X ≥ λE(X)) ≤ 1/λ, for a positive r.v. X and λ > 0. This result is
      known as Markov's inequality.

Properties (1) and (2) define E (·) as a linear operator. Hence
E (aX + b) = aE (X ) + b.




      Frank Betz (EUI)          Introduction to Probability   September 2007   83 / 124
                       Existence of the expectation


For the expectation to exist we require that

                              ∫_{−∞}^{∞} |x| fX(x) dx < ∞

if the random variable is continuous or

                              Σᵢ |xi| f(xi) < ∞

if the random variable is discrete.




    Frank Betz (EUI)         Introduction to Probability   September 2007   84 / 124
                                  Example




Example
Show that in case of the Cauchy distribution given by

                        fX(x) = 1/(π(1 + x²)),     ∀x ∈ R

the mean does not exist.
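A possible argument: ∫_{−∞}^{∞} |x| fX(x) dx = (2/π) ∫_{0}^{∞} x/(1 + x²) dx = (1/π) lim_{c→∞} ln(1 + c²) = ∞, so the integral defining E(X) diverges and the mean does not exist.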




    Frank Betz (EUI)        Introduction to Probability      September 2007   85 / 124
      Expectation of a function of a random variable


Definition
If X is a random variable and there exists a function g(·) : R −→ R, then

                        E(g(X)) = ∫_{−∞}^{∞} g(x) fX(x) dx

if X is continuous, and

                        E(g(X)) = Σᵢ g(xi) pi

if X is discrete.




     Frank Betz (EUI)        Introduction to Probability      September 2007   86 / 124
                                Outline




5   Moments of random variables
      Mean
      Variance
      Moment generating function




     Frank Betz (EUI)     Introduction to Probability   September 2007   87 / 124
                                    Variance
Definition
The variance of a random variable X is defined by

                 Var(X) = E(X − E(X))²
                        = ∫_{−∞}^{∞} (x − E(X))² f(x) dx
                        = Σᵢ (xi − E(X))² pi

when X is continuous and discrete respectively.

    Often, it is more convenient to work with the equality
    Var(X) = E(X²) − [E(X)]².
    The square root of the variance is called the standard deviation.

    Frank Betz (EUI)           Introduction to Probability             September 2007   88 / 124
                       Uniform Distribution




Example
Let X be uniformly distributed on the interval [0, b]. Compute its mean
and variance.
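A worked sketch: with f(x) = 1/b on [0, b], E(X) = ∫_{0}^{b} x/b dx = b/2 and E(X²) = ∫_{0}^{b} x²/b dx = b²/3, so Var(X) = b²/3 − b²/4 = b²/12.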




    Frank Betz (EUI)       Introduction to Probability   September 2007   89 / 124
                       Properties of the variance




1   Var(c) = 0 for any constant c.
2   Var(aX) = a²Var(X) for any constant a.
3   P(|X − E(X)| ≥ k) ≤ Var(X)/k², k > 0, which is known as Chebyshev's
    inequality. It provides an upper bound on the probability of deviations
    |X − E(X)| ≥ k.




    Frank Betz (EUI)        Introduction to Probability     September 2007   90 / 124
                               Example




Example
A random variable X has mean E(X) = 8 and variance Var(X) = 4.
Provide an upper bound for the probability P({X ≤ 4} ∪ {X ≥ 20}).
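One possible bound: both {X ≤ 4} and {X ≥ 20} imply |X − 8| ≥ 4, so P({X ≤ 4} ∪ {X ≥ 20}) ≤ P(|X − E(X)| ≥ 4) ≤ Var(X)/4² = 4/16 = 0.25.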




    Frank Betz (EUI)      Introduction to Probability    September 2007   91 / 124
                                Outline




5   Moments of random variables
      Mean
      Variance
      Moment generating function




     Frank Betz (EUI)     Introduction to Probability   September 2007   92 / 124
                    Higher order moments


If it exists, the r-th raw moment is defined by

              µ′r = E(X^r) = ∫_{−∞}^{∞} x^r fX(x) dx,    r = 0, 1, 2, ...

The mean is the first raw moment µ′1 = E(X) = µ.
If it exists, the r-th central moment is defined by

        µr = E(X − µ)^r = ∫_{−∞}^{∞} (x − µ)^r fX(x) dx,    r = 0, 1, 2, ...

The variance is the second central moment µ2 = E(X − µ)² = σ².



Frank Betz (EUI)          Introduction to Probability          September 2007    93 / 124
                           Skewness



The coefficient of skewness is given by S = µ3 / µ2^(3/2), where µi is the
i-th central moment.
If S < 0 we say the distribution is negatively skewed or skewed to the
left. If S > 0 the distribution is positively skewed or skewed to the
right.
If the distribution is symmetric, then all odd central moments are
equal to zero. Hence S = 0.




Frank Betz (EUI)       Introduction to Probability            September 2007   94 / 124
                              Kurtosis



The coefficient of kurtosis is given by K = µ4/µ2² − 3, where µi is the
i-th central moment.
Distributions with zero kurtosis are called mesokurtic. The normal
distribution is mesokurtic.
If K > 0 the distribution is called leptokurtic. Leptokurtic
distributions have a more pronounced peak. If K < 0 the distribution
is called platykurtic and has a relatively flat peak.




Frank Betz (EUI)       Introduction to Probability           September 2007   95 / 124
                       Moment generating function


Definition
Let X be a random variable with pdf fX(·). If there exists a real constant
h > 0 such that E(e^{tX}) exists for all |t| < h, then mX(t) = E(e^{tX}) is
called the moment generating function.

     If X is discrete the mgf is given by mX(t) = Σᵢ e^{t xi} pi.
     In case of continuous X, the mgf is given by mX(t) = ∫_{−∞}^{∞} e^{tx} fX(x) dx.

     The mgf need not exist!




    Frank Betz (EUI)         Introduction to Probability                    September 2007   96 / 124
                         Properties of the mgf

If the moment generating function exists, then
    all raw moments of a random variable X can be derived as

                          µ′r = (d^r mX(t)/dt^r)|_{t=0} = E(X^r)


                  mX(t) = E(1 + Xt + (Xt)²/2! + ...) = Σᵢ (1/i!) µ′ᵢ t^i

    Let X , Y be random variables with associated densities fX (x), fY (y ).
    If the moment generating functions mX (t), mY (t) exist and
    mX (t) = mY (t), ∀t ∈ (−h, h), h > 0, then FX (·) = FY (·).



    Frank Betz (EUI)         Introduction to Probability            September 2007   97 / 124
                                 Example




Example
The probability density function of an exponentially distributed random
variable is given by
                         fX (x) = θe −θx ,        x ≥0

Find Var (X ), using the mgf.
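A possible route: mX(t) = ∫_{0}^{∞} e^{tx} θe^{−θx} dx = θ/(θ − t) for t < θ, so E(X) = m′X(0) = 1/θ and E(X²) = m″X(0) = 2/θ², giving Var(X) = 2/θ² − 1/θ² = 1/θ².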




    Frank Betz (EUI)        Introduction to Probability   September 2007   98 / 124
                       Characteristic function


The characteristic function is an alternative to the moment
generating function. It is given by

                         ψX(t) = E(e^{itX}),      i = √−1

The advantage of the characteristic function is that it always exists,
since |e^{itX}| = |cos tX + i sin tX| ≤ 1, such that convergence is
guaranteed.
To obtain moments from the characteristic function compute
i^r µ′r = ψX^(r)(0).



Frank Betz (EUI)           Introduction to Probability            September 2007   99 / 124
                                     Outline




6   Some common univariate distributions
      Discrete distributions
      Continuous distributions




     Frank Betz (EUI)          Introduction to Probability   September 2007   100 / 124
                   Parametric families


We can now define our probability model in the form of a parametric
family of density functions, denoted by Φ = {f (x; θ), θ ∈ Θ}. Every
member of Φ is indexed by a parameter θ, which belongs to the
parameter space Θ.
Choosing a parametric family to model a real world phenomenon
assumes that the data have been generated by one of the densities in
Φ. The uncertainty relating to the outcome of a particular trial of the
random experiment E has been transformed into the uncertainty
regarding the value of the parameter θ.




Frank Betz (EUI)       Introduction to Probability   September 2007   101 / 124
                       Bernoulli distribution

    A random experiment with outcomes ”success” (x = 1) and ”failure”
    (x = 0) and associated probabilities p and q = 1 − p is called
    Bernoulli experiment.
    A Bernoulli distributed rv has probability function

        fX(x; p) = p^x (1 − p)^(1−x)    for x = 0, 1
                 = 0                    otherwise

    The distribution function follows immediately. A Bernoulli random
    variable has mean E(X) = p and variance Var(X) = p(1 − p).

Example
Tossing a coin once. X - ’head occurs’

    Frank Betz (EUI)        Introduction to Probability   September 2007   102 / 124
                   Binomial distribution


A binomial r.v. counts the number of successes in a sequence of n
independent Bernoulli experiments.
The probability function of a binomial r.v. is given by

        bX(x; n, p) = C(n, x) p^x (1 − p)^(n−x)    for x = 0, 1, 2, . . . , n
                    = 0                            otherwise

A binomially distributed rv has mean E (X ) = np and variance
Var (X ) = np(1 − p).




Frank Betz (EUI)        Introduction to Probability   September 2007   103 / 124
                                 Example




Example
A fair die is rolled three times. Compute the probability of at least two 6s.
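A worked sketch: with X ∼ B(3, 1/6) counting sixes, P(X ≥ 2) = C(3, 2)(1/6)²(5/6) + (1/6)³ = 15/216 + 1/216 = 16/216 ≈ 0.074.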




    Frank Betz (EUI)        Introduction to Probability   September 2007   104 / 124
                                Example




Example
Obtain the expectation of a binomial random variable in three different
ways.




    Frank Betz (EUI)       Introduction to Probability   September 2007   105 / 124
                                 Example




Example
How many times do we have to throw a die for the probability of obtaining
no 6 to be lower than 5%?
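A worked sketch: the probability of no 6 in n throws is (5/6)^n, so we need (5/6)^n < 0.05, i.e. n > ln 0.05 / ln(5/6) ≈ 16.4; hence 17 throws suffice.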




    Frank Betz (EUI)        Introduction to Probability   September 2007   106 / 124
                    Poisson distribution

The Poisson distribution is often used to model the number of events
occurring within a given time interval.
A Poisson rv has probability function

        pX(x; λ) = e^{−λ} λ^x / x!    for x = 0, 1, 2, . . .
                 = 0                  otherwise

Mean and variance of the Poisson distribution are given by
E (X ) = Var (X ) = λ. Hence, λ can be understood as the average
number of events, that is the arrival rate of events, over the unit
interval.


Frank Betz (EUI)        Introduction to Probability   September 2007   107 / 124
                                Example




Example
At a manufacturing plant accidents have occurred once every two months.
Assuming accidents occur independently, what is the expected number of
accidents per year? What is the probability that in a given month there
will be no accidents?
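A worked sketch: one accident per two months corresponds to an arrival rate of λ = 0.5 per month, i.e. an expected 6 accidents per year; the probability of no accident in a given month is e^{−0.5} ≈ 0.61.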




    Frank Betz (EUI)       Introduction to Probability   September 2007   108 / 124
                                 Example




Example
Suppose X ∼ B(n, p) and let n −→ ∞, p −→ 0, such that np = λ. Show
that lim_{n→∞} b(k; n, λ/n) = e^{−λ} λ^k / k!




    Frank Betz (EUI)        Introduction to Probability   September 2007   109 / 124
                      Geometric distribution

In a sequence of independent Bernoulli experiments, the number of
trials before achieving ’success’ for the first time is geometrically
distributed.
The geometric distribution has probability function

        gX(x; p) = (1 − p)^x p    for x ∈ N0
                 = 0              otherwise

The geometric distribution has mean E(X) = (1 − p)/p and variance
Var(X) = (1 − p)/p².
The geometric distribution is also called a ”discrete waiting time”
distribution.

Frank Betz (EUI)         Introduction to Probability         September 2007   110 / 124
                                Example




Example
Show that the geometric distribution is ”memoryless”, that is
P(X > a + b|X > a) = P(X > b).




    Frank Betz (EUI)       Introduction to Probability   September 2007   111 / 124
                   Hypergeometric distribution

Suppose that without replacement we draw a sample of size n out of
a population of size N, K of which share a certain property. Then, a
rv X counting the number of elements sharing this property in the
sample is hypergeometrically distributed.
The hypergeometric distribution has probability function

        hX(x; n, N, K) = C(K, x) C(N − K, n − x) / C(N, n)    for x = 0, 1, . . . , n
                       = 0                                    otherwise

The hypergeometric distribution has mean E(X) = n K/N and variance
Var(X) = n (K/N) ((N − K)/N) ((N − n)/(N − 1)).



Frank Betz (EUI)         Introduction to Probability     September 2007   112 / 124
                                Example




Example
Suppose you are dealt five cards from a standard 52 card Poker deck.
Compute the probability of receiving at least two aces.
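One way to evaluate this is with the hypergeometric counting argument; the Python fragment below is an illustrative sketch using only the standard library.

```python
from math import comb

# P(at least two aces) = 1 - P(0 aces) - P(1 ace) when 5 cards are dealt from 52
total = comb(52, 5)
p_at_most_one = (comb(4, 0) * comb(48, 5) + comb(4, 1) * comb(48, 4)) / total
print(1 - p_at_most_one)   # roughly 0.042
```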




    Frank Betz (EUI)       Introduction to Probability    September 2007   113 / 124
                                  Example




Example
Suppose X ∼ H(x; n, N, K) and let N −→ ∞, K −→ ∞ such that K/N = p.
Show that lim hX(x; n, N, K) = bX(x; n, p).




    Frank Betz (EUI)         Introduction to Probability   September 2007       114 / 124
                                     Outline




6   Some common univariate distributions
      Discrete distributions
      Continuous distributions




     Frank Betz (EUI)          Introduction to Probability   September 2007   115 / 124
                      Uniform distribution


Suppose X takes values in the interval [a, b] and all values of X are
attributed the same probability.
In this case, we say that X is uniformly distributed with density
function

        fX(x) = 1/(b − a)    for a ≤ x ≤ b
              = 0            otherwise

A uniformly distributed rv has mean E(X) = (a + b)/2 and variance
Var(X) = (b − a)²/12.




Frank Betz (EUI)         Introduction to Probability          September 2007   116 / 124
                      Exponential distribution


In a Poisson process with parameter λ the waiting time until the
event occurs for the first time follows an exponential distribution.
The density function of the exponential distribution is given by

        fX(x; λ) = λe^{−λx}    for x ≥ 0
                 = 0           otherwise

The exponential distribution has mean E(X) = 1/λ and variance
Var(X) = 1/λ². The cdf is given by FX(x) = 1 − e^{−λx} for x ≥ 0.




Frank Betz (EUI)            Introduction to Probability        September 2007   117 / 124
      Properties of the exponential distribution



Suppose X ∼ P(λ). Then

                   P(no event occurs over (0, t)) = e^{−λt}

On the other hand, let Y ∼ Exp(λ). Then

               P(the first event occurs after t) = 1 − FY(t) = e^{−λt}

The exponential distribution also exhibits the memoryless property.




Frank Betz (EUI)         Introduction to Probability   September 2007   118 / 124
                                 Example

Example
A study on the lifespan of lightbulbs measures the total number of kilowatt
hours consumed until failure. For a 100 Watt lightbulb that has already been
lit for 5 tu (time units), Y represents total kilowatt hours consumed. One
tu equals 1,000 hours. A second random variable X records the remaining
lifespan in tu and is exponentially distributed with parameter λ = 0.1.
a. Compute the distribution of Y .
b. Compute the probability that the lightbulb consumes more than 2,500
kilowatt hours.
c. Suppose that after 20 tu the lightbulb still works. Compute the
probability that it works another 20 tu.


    Frank Betz (EUI)        Introduction to Probability   September 2007   119 / 124
                    Normal distribution


The normal distribution is probably the single most important
distribution in econometrics. This is due mainly to results in
asymptotic theory: Under fairly general conditions many other
distributions converge to the normal distribution.
The normal distribution has pdf

                fX(x; µ, σ²) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}
The normal distribution has mean E (X ) = µ and variance
Var (X ) = σ 2 . There is no closed form solution for the cdf.



Frank Betz (EUI)        Introduction to Probability        September 2007   120 / 124
                   Standard normal distribution


A normal distribution with µ = 0 and variance σ² = 1 is called the
standard normal distribution. The corresponding pdf reads

                       φ(x) = (1/√(2π)) e^{−x²/2}

while the cdf is given by

                       Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e^{−t²/2} dt

Let X ∼ N(µ, σ²) with corresponding pdf and cdf. Then ∀x ∈ R we
have fX(x) = (1/σ) φ((x − µ)/σ) and FX(x) = Φ((x − µ)/σ).




Frank Betz (EUI)          Introduction to Probability     September 2007   121 / 124
                             σ-rules


Let X ∼ N(µ, σ²). Then, the "σ-rules" are given by

                      P(|X − µ| ≤ σ) ≈ 0.683
                     P(|X − µ| ≤ 2σ) ≈ 0.954
                     P(|X − µ| ≤ 3σ) ≈ 0.997

Hence, for practical purposes a normal rv does not deviate more than
three standard deviations from its mean.
If a rv deviates by more than three standard deviations from the
mean, it is unlikely to be normal.



Frank Betz (EUI)       Introduction to Probability   September 2007   122 / 124
                                Examples




Example
Let X ∼ N(0, σ 2 ). Find σ 2 given that P(|X | < 0.5) = 0.3.
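A numerical way to back out σ² is to invert the standard normal cdf; the sketch below assumes SciPy is available and uses its quantile function norm.ppf.

```python
from scipy.stats import norm

# P(|X| < 0.5) = 2*Phi(0.5/sigma) - 1 = 0.3  =>  0.5/sigma = Phi^{-1}(0.65)
z = norm.ppf(0.65)        # roughly 0.385
sigma = 0.5 / z
print(sigma ** 2)         # roughly 1.68
```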




    Frank Betz (EUI)        Introduction to Probability   September 2007   123 / 124
                                 Example




Example
Suppose the velocity of cars on a certain road is normally distributed.
However, we only know that 21.2% of cars travel faster than 90 km/h, and
that the share of cars driving slower than 60 km/h is ten times
smaller. Compute µ and σ.




    Frank Betz (EUI)        Introduction to Probability   September 2007   124 / 124

				