# Introduction to Probability


Frank Betz

EUI

September 2007

Frank Betz (EUI)         Introduction to Probability   September 2007   1 / 124
Outline

1   Introduction

2   The Probability Space

3   Conditioning, independence, Bayes’ Rule and combinatorics

4   Random variables and probability distributions

5   Moments of random variables

6   Some common univariate distributions

Probability theory: why do we care?

The descriptive analysis of data does not allow for generalisations
beyond the data under consideration.
However, as economists our concern is to draw inference on the
underlying population.
Probability theory enables us to do just that: it develops
mathematical models for probability that provide the logical
foundations for statistical inference.

Probability theory - statistics - econometrics

Probability Theory analyses characteristics of probability
mechanisms on the basis of a limited number of deﬁnitions and
axioms.
On the basis of data on trials and some maintained assumptions
about a probability mechanism, Statistics "estimates" its parameters.
Econometrics applies statistics to test or quantify economic models
and theories.

References

Spanos (1986), "The statistical foundations of econometric modelling"
Appendix to Hansen's lecture notes
Benoît Champagne's class notes, "Probability and random signals I"
Capinski and Kopp (2004), "Measure, integral and probability"
Ivan Wilde's script, "Measure, integration and probability"

Outline

2   The Probability Space
Approaches to Probability
Random experiment, sample space, and event
Sigma-ﬁeld
Probability

Classical probability

Deﬁnition
If a random experiment can result in N mutually exclusive and equally
likely outcomes and if N_A of these outcomes result in the occurrence of
the event A, then the probability of A is defined by P(A) = N_A / N.

Problems:

Applies only to situations with a ﬁnite number of outcomes.
Due to the ’equally likely’ condition the deﬁnition is circular.
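For a finite sample space the classical rule P(A) = N_A / N is just counting. The slides contain no code; as an illustration, here is a short Python sketch with two dice (the dice example is ours, not the slides'):

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: 36 equally likely outcomes.
S = list(product(range(1, 7), repeat=2))

# Event A: the faces sum to 7.
A = [s for s in S if sum(s) == 7]

# Classical probability: P(A) = N_A / N.
p_A = Fraction(len(A), len(S))
print(p_A)  # 1/6
```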

The frequency approach to probability

Deﬁnition
An experiment is repeated many times under similar conditions. For an
event of interest, we postulate a number PA , called the probability of the
event, and approximate PA by the relative frequency with which the
repeated observations satisfy the event.

To overcome the circularity of the classical deﬁnition the frequency
approach views probability as the limit of empirical frequencies.
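The limiting-frequency idea can be illustrated by simulation; the fair coin below is our own example, not part of the slides:

```python
import random

random.seed(0)

# Estimate P(heads) for a fair coin by the relative frequency of heads
# in an increasing number of independent trials.
def relative_frequency(n_trials):
    heads = sum(random.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials

for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

As n grows, the relative frequency settles near 1/2, which is what the frequency approach takes as the definition of the probability.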

Axiomatic probability

Deﬁnition
A probability space is a triple (S, F, P(·)), where
S is the sample space, a collection of possible outcomes of an
experiment.
F is the event space or set of events: a collection of subsets of S.
P(·) is the probability function defined on F: to any event, P(·)
assigns the probability that the event is observed once the experiment
is completed.

Modern probability theory, and also the remainder of this course, is
concerned with axiomatic probability. We now take a closer look at its
ingredients.

Random experiment

Deﬁnition
A random experiment E is an experiment which satisﬁes the following
conditions:
all possible distinct outcomes are known a priori;
in any particular trial the outcome is not known a priori;
it can be repeated under identical conditions.

The axiomatic approach to probability can be viewed as a
formalisation of the concept of a random experiment.

Sample space

Deﬁnition
The sample space S is defined to be the set of all possible outcomes of
the experiment E. The elements of S are called elementary events.

Tossing a fair coin twice

Example
Consider the random experiment E of tossing a fair coin twice and
observing the faces turning up. The sample space of E is
S = {(HT ), (TH), (HH), (TT )} with (HT ), (TH), (HH), (TT ) being the
outcomes or elementary events of S.

Event
An event is a subset of the sample space S, formed by set theoretic
operations on the elementary events.
An event occurs when any of the elementary events it comprises
occurs.
When a trial is made only one elementary event is observed, but a
large set of events may have occurred.
Special events are S, the sure event, and ∅, the impossible event.

Example
A1 = {(HT ), (TH), (HH)} = {(HT )} ∪ {(TH)} ∪ {(HH)}, that is ’two
tails’ does not occur
A2 = {(TT ), (HH)} = {(TT )} ∪ {(HH)}, either ’two heads’ or ’two tails’
occur
Event

We observe that for two events A1 and A2 the following are also events:
S\A1 = A1ᶜ, the complement of A1, and likewise for A2
A1 ∪ A2, the union of A1 and A2
A1 ∩ A2 = (A1ᶜ ∪ A2ᶜ)ᶜ, the intersection of A1 and A2
A1\A2 = A1 ∩ A2ᶜ = (A1ᶜ ∪ A2)ᶜ, ...
To construct the events we only need the set-theoretic operations
complementation and union.

Example

Example

A1ᶜ = {(HT), (TH), (HH)}ᶜ = {(TT)}
A1 ∪ A2 = {(HT), (TH), (HH)} ∪ {(TT), (HH)} = S
A1 ∩ A2 = {(HT), (TH), (HH)} ∩ {(TT), (HH)} = {(HH)}
A1\A2 = A1 ∩ A2ᶜ = {(HT), (TH), (HH)} ∩ {(HT), (TH)} = {(HT), (TH)}

The occurrence or non-occurrence of A1 or A2 implies the occurrence or
non-occurrence of these events. A mathematical structure that aims to
assign probabilities to events has to take this into account.

Set of events

We now understand the meaning and the properties of events. Our
goal, however, is to assign probabilities to these events.
The probability function P(·) : F −→ [0, 1] does just that. As we use
sets to represent events, its domain is a collection of sets, denoted by
F. To understand the probability function we first have to
understand its domain.
The key property of F is that if A ∈ F , then sets obtained via the
various set theoretic operations performed on A must also be
elements of F .

Power set
Example
When tossing a fair coin twice the set of all subsets, that is the power set
of S is an obvious candidate for F . The power set is given by:

P = {∅, S, {(HT )}, {(HH)}, {(TT )}, {(TH)},
{(HT ), (HH)}, {(TT ), (HH)}, {(TH), (HH)},
{(HT ), (TH)}, {(TT ), (TH)}, {(HT ), (TT )},
{(HT ), (TH), (TT )}, {(HT ), (TH), (HH)},
{(HH), (TT ), (TH)}, {(HH), (TT ), (HT )}}

However, one can only deﬁne F to be the power set of S, if S is ﬁnite or
countably inﬁnite. To assign probabilities when S is uncountable we
require that F be a σ-ﬁeld.
Sigma-ﬁeld

Deﬁnition
Let F be a set of subsets of S. F is called a sigma-field if:
A ∈ F, then Aᶜ ∈ F - closure under complementation; and
Ai ∈ F, i = 1, 2, ..., then (∪ᵢ Ai) ∈ F - closure under countable
union.

We observe that the definition implies:
S ∈ F, because A ∪ Aᶜ = S
∅ ∈ F, as Sᶜ = ∅
Ai ∈ F, i = 1, 2, ..., then (∩ᵢ Ai) ∈ F

Example

Example
Check whether the collections F1 and F2 are σ-ﬁelds where F1 , F2 are
given by

F1 = {{(HT )}, {(HH), (TH), (TT )}, ∅, S},
F2 = {{(HT ), (TH)}, {(HH)}, ∅, S}
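For a finite sample space the two closure conditions can be checked mechanically. The slides give this as a pencil-and-paper exercise; the following Python sketch (our own, with outcomes encoded as strings) performs the check:

```python
# Check closure under complementation and (finite) union for a finite
# collection of events; for a finite S this suffices for a sigma-field.
S = frozenset(['HT', 'TH', 'HH', 'TT'])

def is_sigma_field(F, S):
    F = {frozenset(A) for A in F}
    if S not in F:
        return False
    for A in F:
        if S - A not in F:          # closure under complementation
            return False
        for B in F:
            if A | B not in F:      # closure under union
                return False
    return True

F1 = [set(), S, {'HT'}, {'HH', 'TH', 'TT'}]
F2 = [set(), S, {'HT', 'TH'}, {'HH'}]

print(is_sigma_field(F1, S))  # True
print(is_sigma_field(F2, S))  # False: the complement of {HH} is missing
```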

Example

Example
Suppose you are interested in the event ’ﬁrst toss head’. How do you
obtain a σ-ﬁeld?

Sigma-ﬁeld generated by a family of sets

Deﬁnition
G is the σ-field generated by a family of sets A if

G = ∩{F : F is a σ-field such that F ⊃ A}

G is the smallest σ-field containing A.
Constructing the σ-field generated by some events of interest is
necessary to obtain σ-fields when S is infinite or
uncountable.

Borel ﬁeld

Example
Let S be the real line R and suppose we are interested in the events
J = {Bx : x ∈ R}, where Bx = {z : z ≤ x} = (−∞, x]. Construct the
σ-field generated by these sets by taking complements and countable unions.

Borel ﬁeld
Deﬁnition
Put B = ∩{F : F is a σ-field containing all intervals}. We say that B is
the σ-field generated by all intervals and we call the elements of B Borel
sets.

B is the smallest σ-ﬁeld containing all intervals.
We started with all intervals of the type (−∞, x] to generate B.
However, we could have started with all intervals of any other type
(all open intervals, all closed intervals,...) and still arrived at B.
Not only does B contain all open intervals, it also contains all open
sets as any open set is a countable union of open intervals.
Countable sets are Borel sets, since each is a countable union of
closed intervals of the form [a, a]; in particular N and Q are Borel sets.
Probability

Deﬁnition
Probability is defined as a set function on F satisfying the following
axioms:
P(A) ≥ 0 for every A ∈ F
P(S) = 1
P(∪ᵢ Ai) = Σᵢ P(Ai) if Ai is a sequence of mutually exclusive events
in F - called countable additivity

Hence, probability is a countably additive set function with domain F
and range [0,1].

Properties of the probability set function

Proposition
Let P(·) be a probability measure on F . Then the following hold.

P(Aᶜ) = 1 − P(A),   A ∈ F                                  (1)
P(∅) = 0                                                   (2)
If A1 ⊂ A2 then P(A1) ≤ P(A2),   A1, A2 ∈ F                (3)
P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2),   A1, A2 ∈ F      (4)

Properties of the probability set function

Proposition
If A1 ⊆ A2 ⊆ . . . with An ∈ F, n = 1, 2, . . . , then we have

lim_{n→∞} P(An) = P(∪ᵢ Ai)

If A1 ⊇ A2 ⊇ . . . with An ∈ F, n = 1, 2, . . . , then we have

lim_{n→∞} P(An) = P(∩ᵢ Ai)

Frank Betz (EUI)         Introduction to Probability       September 2007   29 / 124
Probability set function

Example
Can there be two events A and B with P(A) = 0.7, P(B) = 0.9 and
P(A ∩ B) = 0.5?
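By inclusion-exclusion, P(A ∪ B) = P(A) + P(B) − P(A ∩ B); a one-line check with exact fractions (our own sketch of the exercise) settles the question:

```python
from fractions import Fraction as F

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
p_A, p_B, p_AB = F(7, 10), F(9, 10), F(1, 2)
p_union = p_A + p_B - p_AB
print(p_union)  # 11/10, which exceeds 1, so no such events can exist
```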

Measure

Deﬁnition
A ﬁnite measure on a measurable space (S, F ) is a map
µ : F −→ [0, ∞), such that if A1, A2, . . . is any sequence of pairwise
disjoint members of F, then

µ(∪ᵢ Ai) = Σᵢ µ(Ai)

This requirement is referred to as countable additivity.

Probability and measure

A measure space is a triple (S, F , µ), where µ is a measure on the
σ-ﬁeld F of subsets of S.
If µ(S) = 1, then µ is called a probability measure and (S, F , µ) is
called a probability space.
Probability is therefore a special case of a ﬁnite measure.

Outline

3   Conditioning, independence, Bayes’ Rule and combinatorics
Conditional Probability
Independence
Bayes’ Theorem
Combinatorics

Conditional probability

Example
Suppose that in the case of tossing a fair coin twice we know that in the
ﬁrst trial it was heads. What does it imply for the sample space, the
associated σ-ﬁeld and the corresponding probabilities?

Conditional probability

Deﬁnition
Suppose that P(B) > 0. Then the number P(A|B) = P(A ∩ B)/P(B) is called
the conditional probability of A given B.

Example

Example
An urn contains 3 white and 2 red balls. Suppose you draw two balls
without replacement and consider the events A ’second ball red’ and B
’ﬁrst ball red’. Compute P(A), P(B), P(A ∩ B), and P(A|B).
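The urn probabilities can be verified by enumerating all equally likely ordered draws; the following Python sketch (the ball labels are ours) does exactly that:

```python
from fractions import Fraction
from itertools import permutations

# All ordered draws of two balls from 3 white (W) and 2 red (R) balls,
# without replacement: 20 equally likely ordered pairs.
balls = ['W1', 'W2', 'W3', 'R1', 'R2']
draws = list(permutations(balls, 2))

def prob(event):
    return Fraction(sum(event(d) for d in draws), len(draws))

A = lambda d: d[1].startswith('R')   # second ball red
B = lambda d: d[0].startswith('R')   # first ball red
AB = lambda d: A(d) and B(d)

print(prob(A), prob(B), prob(AB), prob(AB) / prob(B))
# 2/5 2/5 1/10 1/4
```

Note that P(A) = P(B) = 2/5 by symmetry, while P(A|B) = 1/4 reflects that one red ball has already been removed.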

Independence

Deﬁnition
Two events A and B are independent if and only if

P(A ∩ B) = P(A)P(B), or                             (5)
P(A|B) = P(A), or                                  (6)
P(B|A) = P(B)                                      (7)

where (6) and (7) hold only if P(B) > 0 and P(A) > 0 respectively.

Independence

Example
A student takes two courses - calculus and statistics. It is known that the
probability that he passes both courses is 0.5, that he passes only calculus
equals 0.2, that he passes only statistics is 0.1, and that he fails both
courses is 0.2. Is the performance in statistics independent of the
performance in calculus?
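A short check of the definition P(A ∩ B) = P(A)P(B) with exact fractions; the marginals are recovered from the joint outcomes given in the exercise (the sketch and its answer are ours):

```python
from fractions import Fraction as F

# Joint outcomes: pass both 0.5, only calculus 0.2, only statistics 0.1,
# fail both 0.2.
p_both, p_only_calc, p_only_stat, p_neither = F(1,2), F(1,5), F(1,10), F(1,5)

p_calc = p_both + p_only_calc    # marginal P(pass calculus)   = 7/10
p_stat = p_both + p_only_stat    # marginal P(pass statistics) = 3/5

print(p_both == p_calc * p_stat)  # False: 1/2 != 21/50, not independent
```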

Independence of n events

Deﬁnition
Events A1 , ..., An are independent if and only if

P(Ai ∩ Aj ) = P(Ai )P(Aj ), 1 ≤ i < j ≤ n
P(Ai ∩ Aj ∩ Ak ) = P(Ai )P(Aj )P(Ak ), 1 ≤ i < j < k ≤ n
...
P(∩ₖ Ak) = Πₖ P(Ak)

Put differently, the events A1, ..., An are independent if for every k ≤ n and
each choice of k events, the probability of their intersection is the product
of the probabilities.

Independence of n events

There are 2ⁿ − n − 1 equations.
Independence of n events implies independence of any subset
A1, ..., Ak.
If only P(Ai ∩ Aj) = P(Ai)P(Aj) holds, the events are called pairwise
independent.
However, P(∩ₖ Ak) = Πₖ P(Ak) does not imply pairwise
independence.

The law of total probability

Theorem
Let A1, ..., An be pairwise disjoint events such that P(Ai) > 0 and
∪ᵢ Ai = S. Let B ⊂ S. Then

P(B) = Σᵢ P(B|Ai)P(Ai)

The law of total probability

Example
Prove the law of total probability.

Bayes’ Rule

Theorem
Let A1, ..., An be pairwise disjoint events such that P(Ai) > 0 and
∪ᵢ Ai = S. Let B ⊂ S where P(B) > 0. Then

P(Ai|B) = P(B|Ai)P(Ai) / Σⱼ P(B|Aj)P(Aj)

In Bayes’ formula A1 , ..., An are often called hypotheses, P(Ai ) a priori
probability of Ai , and P(Ai |B) a posteriori probability of Ai .

Bayes’ Rule

Example
Prove Bayes’ Rule.

Example

Example
A product is produced by two machines 1 and 2. Machine 1 produces 40%
of the output, but 5% of its output is substandard. Machine 2 produces
60% of the output, but 10% of the products are deﬁcient. Compute the
probability that a deﬁcient product is from machine 1.
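A sketch of the computation via Bayes' Rule with exact fractions (our own worked answer to the exercise):

```python
from fractions import Fraction as F

# Prior: which machine produced the item.
p_m1, p_m2 = F(2, 5), F(3, 5)            # 40% and 60% of output
# Likelihood: P(defective | machine).
p_def_m1, p_def_m2 = F(1, 20), F(1, 10)  # 5% and 10%

# Law of total probability, then Bayes' Rule for P(machine 1 | defective).
p_def = p_def_m1 * p_m1 + p_def_m2 * p_m2
posterior = p_def_m1 * p_m1 / p_def
print(posterior)  # 1/4
```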

Classical probability

Classical probability rests on the assumption that the sample space is
finite.
1   Suppose S = {s1, ..., sn}, that is, E results in n elementary events.
2   Find n positive real numbers p1, ..., pn such that Σᵢ pi = 1. These
numbers represent the probabilities of the elementary events.
3   Obtain the field of events F as the power set of S. Then any event
A ∈ F can be represented as A = {si1, ..., sik} with
P(A) = pi1 + ... + pik.

Suppose all elementary events are equally likely, that is p1 = ... = pn = 1/n.
Then, for any event A ∈ F define P(A) = |A|/|S|, called the classical
probability of A.

Combinatorics

Combinatorics is located in the realm of classical probability. It
evolved from the analysis of chance games where the elementary
events are typically equally likely.
Many of these problems can be formulated as drawing k objects from
a set of n elements with or without replacement while being
concerned or not concerned with the ordering of the draw.

Ordered sample, with replacement

Proposition
Suppose k objects are drawn with replacement out of a set of n elements.
Then the total number of ordered k-tuples is nk .

Example
Suppose an urn contains n distinctly numbered balls. You draw k balls
with replacement and write down the sequence of numbers.

Ordered sample, without replacement

Proposition
Suppose k objects are drawn without replacement out of a set of n
elements. Then the total number of ordered k-tuples is
n(n − 1)(n − 2)...(n − k + 1) = n!/(n − k)!, denoted by P(n, k).

Example
Suppose an urn contains n distinctly numbered balls. You draw k balls
without replacement and write down the sequence of numbers.

Unordered sample, without replacement

Proposition
Suppose k objects are drawn without replacement out of a set of n
elements. Then the total number of unordered k-tuples is
n!/((n − k)! k!), denoted by the binomial coefficient C(n, k).

Example
Take k balls in a single draw from the urn containing n balls.

Unordered sample, with replacement

Proposition
Suppose k objects are drawn with replacement out of a set of n
elements. Then the total number of unordered k-tuples is
(n + k − 1)!/((n − 1)! k!) = C(n + k − 1, k).

Example
Suppose an urn contains n distinctly numbered balls. You draw k balls
with replacement and note the sequence of numbers. You then count how
many times the 1st,...,nth ball appeared.
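The four counting rules can be checked with Python's math module, whose comb and perm functions implement C(n, k) and P(n, k); the values n = 5, k = 3 below are just an illustration:

```python
from math import comb, perm, factorial

n, k = 5, 3

# Ordered, with replacement: n**k
print(n ** k)                 # 125
# Ordered, without replacement: n!/(n-k)! = P(n, k)
print(perm(n, k))             # 60
# Unordered, without replacement: n!/((n-k)! k!) = C(n, k)
print(comb(n, k))             # 10
# Unordered, with replacement: C(n + k - 1, k)
print(comb(n + k - 1, k))     # 35
```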

Example

Example
In Poker, a player is dealt ﬁve cards from a 52 card deck. Compute the
number of possibilities.

Example

Example
In a lottery six balls are drawn from an urn containing 49 distinctly
numbered balls. Compute the probability that a player correctly guesses
the numbers of four balls out of the six balls drawn.
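A quick numerical sketch of both exercises; for the lottery we assume the standard hypergeometric count (choose 4 of the 6 winning numbers and 2 of the 43 others):

```python
from fractions import Fraction
from math import comb

# Poker hands: unordered 5-card draws from 52 cards.
hands = comb(52, 5)
print(hands)  # 2598960

# Lottery: 6 balls drawn from 49; probability of matching exactly 4.
p_four = Fraction(comb(6, 4) * comb(43, 2), comb(49, 6))
print(p_four)
```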

Useful formulas for unordered sample without
replacement

Proposition
Binomial Expansion:

(x + y)ⁿ = Σᵢ C(n, i) xⁱ yⁿ⁻ⁱ

Proposition
Pascal's triangle:

C(n − 1, k − 1) + C(n − 1, k) = C(n, k)

Outline

4   Random variables and probability distributions
Random variable
Distribution and density functions

Why do we care?

The concept of the probability space (S, F , P(·)) was developed to
consistently assign probabilities to events, especially when the sample
space S is inﬁnite.
However, the mathematical manipulation of P(·) is diﬃcult, as its
domain is a σ-ﬁeld F of arbitrary sets. To go further in our
development of a mathematical probability model we need a more
ﬂexible framework.
Random experiments E often yield quantifiable outcomes,
i.e. outcomes that can be represented by numbers. The concept of a
random variable exploits this opportunity by assigning numbers to
outcomes without altering the probabilistic structure of (S, F, P(·)).

Coin toss revisited

Example
Suppose that in our familiar example we are interested in the number of
heads. Consider the function X(·) : S −→ RX that maps the elementary
events si onto the set RX = {0, 1, 2}. Hence, X(·) is a function that
assigns numbers to the outcome of the random experiment.
However, for X (·) to be a random variable we require more. A random
variable has to preserve the event structure of (S, F , P(·)). This means,
that for every value of X , there exists a corresponding subset in F and
that the mapping preserves unions and complements. . .

Coin toss revisited

Example
. . . We obtain:

X⁻¹(0) = {(TT)}
X⁻¹(1) = {(TH), (HT)}
X⁻¹(2) = {(HH)}

If X⁻¹(0) ∈ F, X⁻¹(1) ∈ F, X⁻¹(2) ∈ F, X⁻¹({0, 1}) ∈ F,
X⁻¹({1, 2}) ∈ F, and X⁻¹({0, 2}) ∈ F, then X(·) is a random variable
with respect to F.

Random variable

Deﬁnition
A random variable X is a real valued function from S to R, which
satisfies the condition that for each Borel set B ∈ B on R, the set

X⁻¹(B) = {s : X(s) ∈ B, s ∈ S}

is an event in F.

Features of a random variable

A random variable is a real valued function. It is neither ’random’ nor
a ’variable’.
A random variable is always deﬁned relative to some speciﬁc σ-ﬁeld.
In deciding whether some function Y (·) : S −→ R is a random
variable we proceed from the elements of the Borel ﬁeld B to those
of the σ-ﬁeld F and not the other way round.

Features of a random variable

There is no need to consider all Borel sets B ∈ B. B is the σ-field
generated by all intervals of the type (−∞, x]. Hence, if X(·) is such that

X⁻¹((−∞, x]) = {s : X(s) ∈ (−∞, x], s ∈ S} ∈ F

for all (−∞, x] ∈ B, then

X⁻¹(B) = {s : X(s) ∈ B, s ∈ S} ∈ F

for all B ∈ B since all Borel sets can be expressed in terms of the
semi-closed intervals (−∞, x].

Tossing the coin, once more
Example
Consider X - the number of heads - and let

F = {∅, S, {(HH)}, {(TT)}, {(HT), (HH), (TH)},
{(TH), (HT), (TT)}, {(HH), (TT)}, {(HT), (TH)}}

Then

X⁻¹((−∞, x]) =
  ∅,                    x < 0
  {(TT)},               0 ≤ x < 1
  {(TT), (TH), (HT)},   1 ≤ x < 2
  S,                    2 ≤ x

and X⁻¹((−∞, x]) ∈ F for all x ∈ R, and thus X(·) is a random variable
with respect to F.
Example, continued
Example
Let Y equal 1 if the first toss is a head, and 0 otherwise. F is given as
before by

F = {∅, S, {(HH)}, {(TT)}, {(HT), (HH), (TH)},
{(TH), (HT), (TT)}, {(HH), (TT)}, {(HT), (TH)}}

Then

Y⁻¹((−∞, y]) =
  ∅,              y < 0
  {(TH), (TT)},   0 ≤ y < 1
  S,              1 ≤ y

Thus, Y⁻¹((−∞, y]) ∉ F for 0 ≤ y < 1, since {(TH), (TT)} ∉ F.
Therefore, Y(·) is not a random variable with respect to F. How can we
turn Y(·) into a random variable?
The set function PX (·)

To assign probabilities to the Borel sets B ∈ B we define the set
function PX(·) : B −→ [0, 1] such that

PX(B) = P(X⁻¹(B)) = P(s : X(s) ∈ B, s ∈ S)

for all B ∈ B.
Again, there is no need to consider all Borel sets B ∈ B when defining
PX(·). It is sufficient to consider semi-closed intervals of the type
(−∞, x] as all Borel sets can be expressed in terms of these intervals.

Tossing a coin twice

Example
X(·), the number of heads, is defined with respect to the same σ-field F
as before. Then,

PX((−∞, x]) =
  0,     x < 0
  1/4,   0 ≤ x < 1
  3/4,   1 ≤ x < 2
  1,     2 ≤ x

Probability space transformed

Using the concept of a random variable, we have transformed the
probability space (S, F, P(·)) into the equivalent probability space
(R, B, PX(·)). In particular:
S, a set of arbitrary elements, has been replaced by the real line R.
Correspondingly, we traded F, the σ-field of subsets of S, for B,
the Borel field on the real line.
P(·), a set function defined on arbitrary sets, has been replaced by
PX(·), a set function defined on semi-closed intervals on the real line.
Hence, we have obtained a more flexible framework to develop a
probability model.

Motivation for distribution function

Though we defined PX(·) on semi-closed intervals rather than
arbitrary sets, it is still a set function.
We would like to simplify it further by transforming it into a point
function, which can be easily represented by an algebraic formula.
This seems feasible as the intervals (−∞, x] differ only in their 'end
point' x.
Heuristically, one defines F(·) as a point function by setting

PX((−∞, x]) = F(x) − F(−∞), ∀x ∈ R,

and assigning the value zero to F(−∞).

Distribution function

Deﬁnition
Let X be a random variable defined on (S, F, P(·)). The point function
F(·) : R −→ [0, 1] defined by

F(x) = PX((−∞, x]) = Pr(X ≤ x),    ∀x ∈ R

is called the distribution function of X and satisﬁes the following
properties:
1   F (x) is non-decreasing;
2   F (−∞) = limx−→−∞ F (x) = 0, F (∞) = limx−→∞ F (x) = 1
3   F (x) is continuous from the right
(limh+ −→0 F (x + h+ ) = F (x), ∀x ∈ R).

Discrete random variable

Deﬁnition
A random variable is called discrete if its range RX is a countable set.

Example
We roll three dice one time and deﬁne X to be the sum of the numbers
that occur. The set RX = {3, 4, . . . , 18} is ﬁnite and thus X is discrete.
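The probability mass function of this X can be obtained by enumerating all 6³ equally likely rolls; the following Python sketch (our own) does the counting:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# X = sum of three fair dice: enumerate all 6**3 equally likely outcomes.
counts = Counter(sum(roll) for roll in product(range(1, 7), repeat=3))
pmf = {x: Fraction(c, 6 ** 3) for x, c in sorted(counts.items())}

print(min(pmf), max(pmf))   # 3 18
print(pmf[10])              # 1/8  (27 of 216 outcomes sum to 10)
```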

Continuous random variable

Deﬁnition
A random variable is called (absolutely) continuous if its distribution
function F(x) is continuous for all x ∈ R and there exists a non-negative
function f(x) on the real line such that

F(x) = ∫_{−∞}^{x} f(u) du,  ∀x ∈ R

Note that continuity of X requires more than a continuous distribution
function F(x). F(x) must be obtainable by integrating some non-negative
function f(x).

Density function

Deﬁnition
Let F(x) be the distribution function of the r.v. X. The non-negative
function f(x) defined by

F(x) = ∫_{−∞}^{x} f(u) du,  ∀x ∈ R  (continuous)

or

F(x) = Σ_{u ≤ x} f(u),  ∀x ∈ R  (discrete)

is said to be the (probability) density function (pdf) of X.

Uniform distribution

Example
Suppose X takes values in the interval [a, b] and all values of X are
attributed the same probability. In this case, we say that X is uniformly
distributed with distribution function

F(x) =
  0,                 x < a
  (x − a)/(b − a),   a ≤ x < b
  1,                 x ≥ b

What does the corresponding density function look like?

Properties of the density function

f(x) ≥ 0, ∀x ∈ R
∫_{−∞}^{∞} f(x) dx = 1
Pr(a < X < b) = ∫_{a}^{b} f(x) dx
f(x) = d/dx F(x) at every point where F(x) is differentiable

Example

Example
Let fX(x) be given by

fX(x) =
  ax + b,   x ∈ [−2b, 2b]
  0,        otherwise

For which a, b is fX(x) a probability density function?
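The slide leaves the answer open. The sketch below only checks the normalisation condition numerically; the closed form ∫ fX = 4b² (which forces b = 1/2, with non-negativity then requiring |a| ≤ 1/2) is our own algebra, not the slides':

```python
# Numeric check of the normalisation condition for the candidate density
# f(x) = a*x + b on [-2b, 2b], via the midpoint rule (exact for linear f
# up to float rounding).
def integral(a, b, n=100_000):
    lo, hi = -2 * b, 2 * b
    h = (hi - lo) / n
    return sum((a * (lo + (i + 0.5) * h) + b) * h for i in range(n))

print(round(integral(a=0.3, b=0.5), 6))   # 1.0, so b = 1/2 works
print(round(integral(a=0.3, b=1.0), 6))   # 4.0, so b = 1 fails
```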

Outline

5   Moments of random variables
Mean
Variance
Moment generating function

Mean

Deﬁnition
The mean or expected value of a random variable X is given by

E(X) = ∫_{−∞}^{∞} x fX(x) dx

if X is continuous, and

E(X) = Σᵢ xᵢ pᵢ

if X is discrete.

Example

Example
Suppose you are in a casino and oﬀered to play the following game. An
urn contains 100 balls, numbered from 0 to 99. Twelve balls will be drawn
randomly from the urn. Before they are selected you are asked to choose
any number N from 0 to 99. If the ball numbered N is among the twelve
balls drawn from the urn you will be paid $N. Would you be willing to play
if you have to pay $15 to enter the game? What is the optimal strategy?
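A sketch of the expected-value comparison, assuming that each chosen number is in the draw with probability 12/100 (the calculation and its answer are ours):

```python
from fractions import Fraction

# If you pick number N, you win $N with probability 12/100 (the chance
# that ball N is among the 12 balls drawn), so the expected payoff is
# (12/100) * N, maximised by picking N = 99.
p_win = Fraction(12, 100)

def expected_payoff(N):
    return p_win * N

best = max(range(100), key=expected_payoff)
print(best, expected_payoff(best))   # 99 297/25, i.e. $11.88 < $15
```

Since even the best choice yields an expected payoff below the $15 entry fee, a risk-neutral player should decline.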

Example

Example
A fair coin is tossed until two successive heads occur. Let X be the
number of tosses required and compute E (X ).

Properties of the expectation

1   E(c) = c if c is a constant
2   E(aX1 + bX2) = aE(X1) + bE(X2) for any two r.v.'s X1 and X2
whose means exist and a, b are real constants
3   P(X ≥ λE(X)) ≤ 1/λ, for a positive r.v. X and λ > 0. This result is
known as Markov's inequality.

Properties (1) and (2) define E(·) as a linear operator. Hence
E(aX + b) = aE(X) + b.

Existence of the expectation

For the expectation to exist we require that

∫_{−∞}^{∞} |x| fX (x) dx < ∞

if the random variable is continuous, or

Σi |xi | f (xi ) < ∞

if the random variable is discrete.

Example

Example
Show that in the case of the Cauchy distribution, given by

fX (x) = 1/(π(1 + x 2 )),      ∀x ∈ R

the mean does not exist.
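A numerical sketch of why the mean fails (our illustration): the truncated absolute first moment of the Cauchy density has the closed form ∫_{−T}^{T} |x| fX (x) dx = ln(1 + T 2 )/π, which grows without bound as T increases, so the existence condition above is violated.

```python
import math

# Truncated absolute first moment of the Cauchy distribution:
# integral of |x| / (pi * (1 + x^2)) over [-T, T] equals ln(1 + T^2) / pi.
# It diverges as T grows, so E(X) does not exist.
def truncated_abs_moment(T):
    return math.log(1 + T * T) / math.pi

for T in [10, 1000, 10**6]:
    print(T, round(truncated_abs_moment(T), 3))  # keeps growing with T
```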

Expectation of a function of a random variable

Deﬁnition
If X is a random variable and g (·) : R −→ R a function, then

E (g (X )) = ∫_{−∞}^{∞} g (x) fX (x) dx

if X is continuous, and

E (g (X )) = Σi g (xi ) pi

if X is discrete.

Outline

5   Moments of random variables
Mean
Variance
Moment generating function

Variance
Deﬁnition
The variance of a random variable X is deﬁned by

Var (X ) = E (X − E (X ))2

that is,

Var (X ) = ∫_{−∞}^{∞} (x − E (X ))2 f (x) dx

if X is continuous, and

Var (X ) = Σi (xi − E (X ))2 pi

if X is discrete.

Often, it is more convenient to work with the equality
Var (X ) = E (X 2 ) − (E (X ))2 .
The square root of the variance is called the standard deviation.
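As a quick check of the shortcut formula (our sketch, not in the original slides), both expressions agree for a fair six-sided die:

```python
# Variance of a fair die, by the definition and by Var(X) = E(X^2) - (E(X))^2.
faces = range(1, 7)
mean = sum(x / 6 for x in faces)                       # E(X) = 3.5
var_def = sum((x - mean) ** 2 / 6 for x in faces)      # definition
var_short = sum(x * x / 6 for x in faces) - mean ** 2  # shortcut
print(round(var_def, 6), round(var_short, 6))          # both 35/12 ≈ 2.916667
```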

Uniform Distribution

Example
Let X be uniformly distributed on the interval [0, b]. Compute its mean
and variance.

Properties of the variance

1   Var (c) = 0 for any constant c.
2   Var (aX ) = a2 Var (X ) for any constant a.
3   P(|X − E (X )| ≥ k) ≤ Var (X )/k 2 , k > 0, which is known as
    Chebyshev’s inequality. It provides an upper bound on the probability
    of deviations |X − E (X )| ≥ k.
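A numerical sketch (ours): for a fair die with E (X ) = 3.5 and Var (X ) = 35/12, the exact tail probabilities never exceed Chebyshev's bound.

```python
# Chebyshev's inequality checked exactly for a fair die:
# P(|X - 3.5| >= k) <= (35/12) / k^2 for every k > 0.
mu, var = 3.5, 35 / 12
checks = []
for k in [1.0, 1.5, 2.0, 2.5]:
    exact = sum(1 / 6 for x in range(1, 7) if abs(x - mu) >= k)
    checks.append(exact <= var / k ** 2)
print(all(checks))  # True
```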

Example

Example
A random variable X has mean E (X ) = 8 and variance Var (X ) = 4.
Provide an upper bound for the probability P(X ≤ 4 or X ≥ 20).

Outline

5   Moments of random variables
Mean
Variance
Moment generating function

Higher order moments

If it exists, the r th raw moment is deﬁned by

µ′r = E (X r ) = ∫_{−∞}^{∞} x r fX (x) dx,    r = 0, 1, 2, . . .

The mean is the ﬁrst raw moment: µ′1 = E (X ) = µ.
If it exists, the r th central moment is deﬁned by

µr = E (X − µ)r = ∫_{−∞}^{∞} (x − µ)r fX (x) dx,    r = 0, 1, 2, . . .

The variance is the second central moment: µ2 = E (X − µ)2 = σ 2 .

Skewness

The coeﬃcient of skewness is given by S = µ3 /(µ2 )^(3/2) , where µi is
the i th central moment.
If S < 0 we say the distribution is negatively skewed, or skewed to the
left. If S > 0 the distribution is positively skewed, or skewed to the
right.
If the distribution is symmetric, then all odd central moments are
equal to zero. Hence S = 0.

Kurtosis

The coeﬃcient of kurtosis is given by K = µ4 /(µ2 )^2 − 3, where µi is
the i th central moment.
Distributions with zero kurtosis are called mesokurtic. The normal
distribution is mesokurtic.
If K > 0 the distribution is called leptokurtic. Leptokurtic
distributions have a more pronounced peak. If K < 0 the distribution
is called platykurtic and has a relatively ﬂat peak.
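As an illustration (our sketch): the exponential distribution with λ = 1 has raw moments E (X r ) = r!, a standard result; expanding these into central moments gives S = 2 and K = 6, i.e. a positively skewed, leptokurtic distribution.

```python
import math

# Skewness and kurtosis of Exp(1) from its raw moments E(X^r) = r!.
m = [math.factorial(r) for r in range(5)]  # raw moments, r = 0..4
mu = m[1]
c2 = m[2] - mu ** 2                                        # 2nd central moment
c3 = m[3] - 3 * mu * m[2] + 2 * mu ** 3                    # 3rd central moment
c4 = m[4] - 4 * mu * m[3] + 6 * mu ** 2 * m[2] - 3 * mu ** 4
S = c3 / c2 ** 1.5
K = c4 / c2 ** 2 - 3
print(S, K)  # 2.0 6.0
```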

Moment generating function

Deﬁnition
Let X be a random variable with pdf fX (·). If there exists a real constant
h > 0 such that E (e tX ) exists for all |t| < h, then mX (t) = E (e tX ) is
called the moment generating function of X .

If X is discrete, the mgf is given by mX (t) = Σi e txi pi .
In case of continuous X , the mgf is given by mX (t) = ∫_{−∞}^{∞} e tx fX (x) dx.

The mgf need not exist!

Properties of the mgf

If the moment generating function exists, then all raw moments of a
random variable X can be derived as

µ′r = (d r mX (t)/dt r )t=0 = E (X r )

since

mX (t) = E (1 + Xt + (Xt)2 /2! + . . .) = Σi (µ′i /i!) t i

Let X , Y be random variables with associated densities fX (x), fY (y ).
If the moment generating functions mX (t), mY (t) exist and
mX (t) = mY (t), ∀t ∈ (−h, h), h > 0, then FX (·) = FY (·).
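A numerical sketch (ours): the mgf of Exp(1) is m(t) = 1/(1 − t) for t < 1, a standard result; finite differences at t = 0 recover the first two raw moments E (X ) = 1 and E (X 2 ) = 2.

```python
# Recover raw moments from the mgf of Exp(1), m(t) = 1/(1 - t),
# via central finite differences at t = 0.
def m(t):
    return 1.0 / (1.0 - t)

h = 1e-4
EX = (m(h) - m(-h)) / (2 * h)             # approximates m'(0)  = E(X)   = 1
EX2 = (m(h) - 2 * m(0) + m(-h)) / h ** 2  # approximates m''(0) = E(X^2) = 2
print(round(EX, 6), round(EX2, 6))        # 1.0 2.0
```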

Example

Example
The probability density function of an exponentially distributed random
variable is given by
fX (x) = θe −θx ,        x ≥0

Find Var (X ), using the mgf.

Characteristic function

The characteristic function is an alternative to the moment
generating function. It is given by

ψX (t) = E (e itX ),      i = √−1

The advantage of the characteristic function is that it always exists,
since |e itX | = | cos tX + i sin tX | = 1, so that the deﬁning
expectation always converges.
To obtain raw moments from the characteristic function, compute
i r µ′r = ψ (r ) (0).

Outline

6   Some common univariate distributions
Discrete distributions
Continuous distributions

Parametric families

We can now deﬁne our probability model in the form of a parametric
family of density functions, denoted by Φ = {f (x; θ), θ ∈ Θ}. Every
member of Φ is indexed by a parameter θ, which belongs to the
parameter space Θ.
Choosing a parametric family to model a real world phenomenon
assumes that the data have been generated by one of the densities in
Φ. The uncertainty relating to the outcome of a particular trial of the
random experiment E has been transformed into the uncertainty
regarding the value of the parameter θ.

Bernoulli distribution

A random experiment with outcomes ”success” (x = 1) and ”failure”
(x = 0) and associated probabilities p and q = 1 − p is called a
Bernoulli experiment.
A Bernoulli distributed rv has probability function

             p x (1 − p)1−x ,   x = 0, 1
fX (x; p) =
             0,                 otherwise

The distribution function follows immediately. A Bernoulli random
variable has mean E (X ) = p and variance Var (X ) = p(1 − p).

Example
Tossing a coin once, with X = 1 if ’head’ occurs.

Binomial distribution

A binomial r.v. counts the number of successes in a sequence of n
independent Bernoulli experiments.
The probability function of a binomial r.v. is given by

                 C (n, x) p x (1 − p)n−x ,   x = 0, 1, 2, . . . , n
bX (x; n, p) =
                 0,                          otherwise

where C (n, x) = n!/(x!(n − x)!) is the binomial coeﬃcient.
A binomially distributed rv has mean E (X ) = np and variance
Var (X ) = np(1 − p).
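A quick sanity check of the pmf (our sketch): the probabilities sum to one and the mean equals np.

```python
from math import comb

# Binomial pmf b(x; n, p) = C(n, x) * p^x * (1 - p)^(n - x).
def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3
total = sum(binom_pmf(x, n, p) for x in range(n + 1))
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
print(round(total, 9), round(mean, 9))  # 1.0 3.0
```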

Example

Example
A fair die is rolled three times. Compute the probability of at least two sixes.

Example

Example
Obtain the expectation of a binomial random variable in three diﬀerent
ways.

Example

Example
How many times do we have to roll a die for the probability of obtaining
no six to be lower than 5%?

Poisson distribution

The Poisson distribution is often used to model the number of events
occurring within a given time interval.
A Poisson rv has probability function

              e −λ λx /x!,   x = 0, 1, 2, . . .
pX (x; λ) =
              0,             otherwise

Mean and variance of the Poisson distribution are given by
E (X ) = Var (X ) = λ. Hence, λ can be understood as the average
number of events, that is the arrival rate of events, over the unit
interval.
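A numerical sketch (ours) confirming E (X ) = Var (X ) = λ directly from the probability function; the partial sum over x = 0, . . . , 99 is accurate here because the tail is negligible for λ = 4.

```python
from math import exp, factorial

# Poisson pmf p(x) = exp(-lam) * lam^x / x!; check E(X) = Var(X) = lam.
lam = 4.0
def pois_pmf(x):
    return exp(-lam) * lam ** x / factorial(x)

xs = range(100)
mean = sum(x * pois_pmf(x) for x in xs)
var = sum(x * x * pois_pmf(x) for x in xs) - mean ** 2
print(round(mean, 6), round(var, 6))  # 4.0 4.0
```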

Example

Example
At a manufacturing plant, accidents have occurred once every two months.
Assuming accidents occur independently, what is the expected number of
accidents per year? What is the probability that in a given month there
will be no accidents?

Example

Example
Suppose X ∼ B(n, p) and let n −→ ∞, p −→ 0, such that np = λ. Show
that limn−→∞ b(k; n, λ/n) = e −λ λk /k!
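The convergence in this exercise can be observed numerically (our sketch), e.g. with λ = 2 and k = 3:

```python
from math import comb, exp, factorial

# b(k; n, lam/n) approaches the Poisson(lam) probability as n grows.
lam, k = 2.0, 3
poisson = exp(-lam) * lam ** k / factorial(k)
errors = []
for n in [10, 100, 10000]:
    p = lam / n
    binom = comb(n, k) * p ** k * (1 - p) ** (n - k)
    errors.append(abs(binom - poisson))
print([round(e, 6) for e in errors])  # strictly shrinking gaps
```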

Geometric distribution

In a sequence of independent Bernoulli experiments, the number of
trials before achieving ’success’ for the ﬁrst time is geometrically
distributed.
The geometric distribution has probability function

             (1 − p)x p,   x ∈ N0
gX (x; p) =
             0,            otherwise

The geometric distribution has mean E (X ) = (1 − p)/p and variance
Var (X ) = (1 − p)/p 2 .
The geometric distribution is also called a ”discrete waiting time”
distribution.
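A quick check of the mean (our sketch): a long partial sum of x · g (x; p) matches (1 − p)/p, since the truncated tail is vanishingly small.

```python
# Geometric pmf g(x; p) = (1 - p)^x * p on x = 0, 1, 2, ...
p = 0.3
mean = sum(x * (1 - p) ** x * p for x in range(2000))
print(round(mean, 6))  # (1 - 0.3) / 0.3 ≈ 2.333333
```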

Example

Example
Show that the geometric distribution is ”memoryless”, that is
P(X > a + b|X > a) = P(X > b).

Hypergeometric distribution

Suppose that without replacement we draw a sample of size n out of
a population of size N, K of which share a certain property. Then, a
rv X counting the number of elements sharing this property in the
sample is hypergeometrically distributed.
The hypergeometric distribution has probability function

                     C (K , x) C (N − K , n − x)/C (N, n),   x = 0, 1, . . . , n
hX (x; n, N, K ) =
                     0,                                      otherwise

where C (a, b) denotes the binomial coeﬃcient ”a choose b”.
The hypergeometric distribution has mean E (X ) = nK /N and variance
Var (X ) = n (K /N)((N − K )/N)((N − n)/(N − 1)).
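A sanity check of the pmf and mean (our sketch), using aces in a five-card poker hand (N = 52, K = 4, n = 5) as the setting:

```python
from math import comb

# Hypergeometric pmf h(x; n, N, K) = C(K, x) * C(N-K, n-x) / C(N, n).
n, N, K = 5, 52, 4  # aces in a five-card poker hand
def hyper_pmf(x):
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

support = range(0, min(n, K) + 1)
total = sum(hyper_pmf(x) for x in support)
mean = sum(x * hyper_pmf(x) for x in support)
print(round(total, 9), round(mean, 9))  # 1.0 and n*K/N ≈ 0.384615385
```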

Example

Example
Suppose you are dealt ﬁve cards from a standard 52 card Poker deck.
Compute the probability of receiving at least two aces.

Example

Example
Suppose X ∼ H(x; n, N, K ) and let N −→ ∞, K −→ ∞ such that K /N = p.
Show that lim hX (x; n, N, K ) = bX (x; n, p)

Outline

6   Some common univariate distributions
Discrete distributions
Continuous distributions

Uniform distribution

Suppose X takes values in the interval [a, b] and all values of X are
attributed the same probability.
In this case, we say that X is uniformly distributed with density
function

           1/(b − a),   a≤x ≤b
fX (x) =
           0,           otherwise

A uniformly distributed rv has mean E (X ) = (a + b)/2 and variance
Var (X ) = (b − a)2 /12.

Exponential distribution

In a Poisson process with parameter λ, the waiting time until the
event occurs for the ﬁrst time follows an exponential distribution.
The density function of the exponential distribution is given by

              λe −λx ,   x ≥0
fX (x; λ) =
              0,         otherwise

The exponential distribution has mean E (X ) = 1/λ and variance
Var (X ) = 1/λ2 . The cdf is given by FX (x) = 1 − e −λx .

Properties of the exponential distribution

Suppose the number of events over the unit interval is X ∼ P(λ). Then

P(no event occurs over (0, t)) = e −λt

On the other hand, let the waiting time be Y ∼ Exp(λ). Then

P(Y > t) = 1 − FY (t) = e −λt

The exponential distribution also exhibits the memoryless property.
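The memoryless property can be verified directly from the survival function P(Y > t) = e −λt (our sketch):

```python
import math

# Memorylessness of Exp(lam): P(Y > a + b | Y > a) = P(Y > b).
lam, a, b = 0.5, 2.0, 3.0
def surv(t):
    return math.exp(-lam * t)  # P(Y > t)

conditional = surv(a + b) / surv(a)  # P(Y > a + b | Y > a)
print(abs(conditional - surv(b)) < 1e-12)  # True
```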

Example

Example
A study on the lifespan of lightbulbs measures the total number of kilowatt
hours consumed until failure. For a 100 Watt lightbulb that has already
burned for 5 tu (time units), Y represents total kilowatt hours consumed.
One tu equals 1,000 hours. A second random variable X records the
remaining lifespan in tu and is exponentially distributed with parameter
λ = 0.1.
a. Compute the distribution of Y .
b. Compute the probability that the lightbulb consumes more than 2,500
kilowatt hours.
c. Suppose that after 20 tu the lightbulb still works. Compute the
probability that it works another 20 tu.

Normal distribution

The normal distribution is probably the single most important
distribution in econometrics. This is due mainly to results in
asymptotic theory: Under fairly general conditions many other
distributions converge to the normal distribution.
The normal distribution has pdf

fX (x; µ, σ 2 ) = (1/(σ √(2π))) e −(x−µ)2 /(2σ 2 )
The normal distribution has mean E (X ) = µ and variance
Var (X ) = σ 2 . There is no closed form solution for the cdf.

Standard normal distribution

A normal distribution with µ = 0 and variance σ 2 = 1 is called the
standard normal distribution. The corresponding pdf reads

φ(x) = (1/√(2π)) e −x 2 /2

while the cdf is given by

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e −t 2 /2 dt

Let X ∼ N(µ, σ 2 ) with corresponding pdf and cdf. Then ∀x ∈ R we
have fX (x) = (1/σ) φ((x − µ)/σ) and FX (x) = Φ((x − µ)/σ).

σ-rules

Let X ∼ N(µ, σ 2 ). Then, the ”σ-rules” are given by

P(|X − µ| ≤ σ) ≈ 0.683
P(|X − µ| ≤ 2σ) ≈ 0.954
P(|X − µ| ≤ 3σ) ≈ 0.997

Hence, for practical purposes a normal rv does not deviate more than
three standard deviations from its mean.
Conversely, if a rv deviates by more than three standard deviations
from its mean, it is unlikely to be normal.
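The σ-rules follow from the standard normal cdf: P(|X − µ| ≤ kσ) = 2Φ(k) − 1 = erf(k/√2). A quick check (our sketch):

```python
import math

# P(|X - mu| <= k * sigma) for a normal rv equals erf(k / sqrt(2)).
probs = {k: math.erf(k / math.sqrt(2)) for k in (1, 2, 3)}
for k, p in probs.items():
    print(k, round(p, 4))  # 0.6827, 0.9545, 0.9973
```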

Examples

Example
Let X ∼ N(0, σ 2 ). Find σ 2 given that P(|X | < 0.5) = 0.3.

Example

Example
Suppose the velocity of cars on a certain road is normally distributed.
However, we only know that 21.2% of cars travel faster than 90 km/h, and
that the number of cars driving slower than 60 km/h is ten times
smaller. Compute µ and σ.


```