Econ 526/ Fall 2012
Manopimoke/ Chapter 2

Probability Theory & Distributions
I.   Probability theory originated with gambling. Gamblers were attempting to

understand how likely it would be to win a game of chance.

A.     Even though chance is common in the vernacular, it’s difficult to define.

B.     Keeping the idea of gambling in mind, let's define chance and build up to a

definition of probability.

1.     The chance of something happening gives the percentage of time

it’s expected to happen, when the process is done over and over

again, independently under the same conditions.

        The chance of an event lies between 0% and 100%, or 0 and 1.

C.     A Random Experiment is the process of observing the outcome of a

chance event.

D.     Elementary Outcomes: all possible results of a random experiment. In

the book elementary outcomes are known as events and simple events.

E.     The Sample Space: the set of all the elementary outcomes. Now an

example.

1.     The Random experiment: The coin toss.

2.     The Elementary outcomes: _________________

3.     The Sample Space: ______

4.     Dice:

        elementary outcomes: _____________

        sample space:_______________
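These sample spaces can be enumerated directly; a quick sketch in Python (the variable names are ours, not the text's):

```python
from itertools import product

# Random experiment 1: a coin toss.
coin_outcomes = ["H", "T"]              # elementary outcomes
coin_sample_space = set(coin_outcomes)  # the sample space

# Random experiment 2: rolling one die.
die_outcomes = [1, 2, 3, 4, 5, 6]
die_sample_space = set(die_outcomes)

# Rolling two dice (black and white): each elementary outcome is an ordered pair.
two_dice_space = set(product(die_outcomes, repeat=2))

print(len(coin_sample_space), len(die_sample_space), len(two_dice_space))  # 2 6 36
```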


F.     Random Variable: numerical summary of a random outcome

   Discrete random variable – a random variable that takes on a

discrete set of values: 0, 1, 2, …

   Continuous random variable – a random variable that takes on a

continuum of possible values or an interval of values

G.     Probability: a numerical weight that measures the likelihood of each

elementary outcome. 0 ≤ P(X) ≤ 1

H.     Examples of Classical Probability:

1.       For a Coin:

2.       For a Die:

       P(X)>=0. Probabilities are non-negative.

       Probability of the entire sample space = 1.

I.     Events: Sets of elementary outcomes. The probability of an event is the

sum of the probabilities of elementary outcomes in the set. Consider the

roll of 2 dice (black and white):

Event Description             Event’s Elementary Outcomes                   Probability
White Die shows 1


II.   Working with Probabilities

1.     To find the chance that at least one of two things will happen,

check to see if they are mutually exclusive. If they are, add the

chances.

      Example: A card is dealt from a shuffled deck. What is the

chance that the card is either a heart or a spade?

2.     Throw a pair of dice. What is the chance that at least one die shows a 1? (These events are not mutually exclusive, so the chances cannot simply be added.)
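Both answers can be checked with exact arithmetic; a sketch using `fractions.Fraction` (the outcome encoding is ours):

```python
from fractions import Fraction
from itertools import product

# Heart or spade: the two events are mutually exclusive, so the chances add.
p_heart = Fraction(13, 52)
p_spade = Fraction(13, 52)
p_heart_or_spade = p_heart + p_spade          # 26/52 = 1/2

# At least one die shows 1: NOT mutually exclusive, so count outcomes instead.
outcomes = list(product(range(1, 7), repeat=2))      # all 36 pairs
favorable = [o for o in outcomes if 1 in o]          # 11 of them
p_at_least_one_1 = Fraction(len(favorable), len(outcomes))   # 11/36
```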

B.     Conditional Probability: The probability that one event occurs given the

condition that another event has already occurred. P(A|C)

1.     A Card Game: 52-card deck (no jokers), 4 suits of 13 cards each.

Shuffle the deck of cards and put the two top cards face down on the

table. You win $1 if the second card is the queen of hearts.


      What is the probability of winning the dollar?

      You turn over the first card and it's the seven of clubs. Now

what is your chance of winning the $1?

2.     The second example is called a conditional chance. Given that the

first card is not what we want, what’s the probability of getting

what we want, the queen of hearts?
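A minimal check of both answers, before and after the first card is revealed (the deck encoding is ours):

```python
from fractions import Fraction

# Build a 52-card deck as (rank, suit) pairs.
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(r, s) for s in suits for r in ranks]

# Before any card is seen, the queen of hearts is equally likely to occupy
# any of the 52 positions, including the second one.
p_win = Fraction(1, len(deck))                       # 1/52

# Given the first card is the seven of clubs, 51 cards remain for slot 2.
remaining = [c for c in deck if c != ("7", "clubs")]
p_win_given_7c = Fraction(1, len(remaining))         # 1/51
```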

3.     A New experiment: we roll two dice and the white die shows 1. Now what's the

probability that the 2 dice sum to 3?

4.     A Formal definition of Conditional Probability:

P(E|F) = P(E and F) / P(F).

      Asides: P(E|E)=1; P(E|F)=0 when E and F are mutually

exclusive.

C.   The Multiplication Rule: P(E and F) = P(E|F)*P(F)

1.     Another experiment: using the same card game above, what is the

chance that the first card is the seven of clubs and the second card

is the queen of hearts?

2.     Another experiment: a deck of cards is shuffled and two cards are

dealt. What is the chance that they are both aces?
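The multiplication rule makes both of these a short exact computation; a sketch:

```python
from fractions import Fraction

# P(first = seven of clubs AND second = queen of hearts)
#   = P(first = 7C) * P(second = QH | first = 7C)
p_7c_then_qh = Fraction(1, 52) * Fraction(1, 51)     # 1/2652

# P(both cards are aces) = P(1st is an ace) * P(2nd is an ace | 1st was an ace)
p_both_aces = Fraction(4, 52) * Fraction(3, 51)      # 12/2652 = 1/221
```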


      In both examples above, the chance of the second event was

affected by the outcome of the first event. In other words,

the conditional probability figured into the probability of both

events occurring.

      The chance that two things will both happen equals the

chance that the first will happen, multiplied by the chance

that the second will happen given that the first has

happened (the conditional probability of the second

multiplied by the probability of the first).

3.   An example of Independence: A coin is tossed twice. What is the

chance of a head followed by a tail?

4.   Independence: Two events are independent if the occurrence of

one has no influence on the probability of the other.

      in terms of conditional probability:

      when 2 events are independent, they have a special

multiplication rule:

      An example: Event C=white die=1; Event D=black die=1.

P(C|D)=P(C and D) / P(D);


        But the probability of the white die showing 1 affects the

probability of the dice summing to 3

5.      More intuition on independence: Say we have 3 tickets in a box:

red, white and blue. 2 tickets are drawn with replacement.

Replacement means you put the first ticket drawn back into the

box. What is the chance of drawing the red ticket and then the

white?

6.      Suppose instead you draw 2 tickets without replacement (there’s a

conditional probability here). What is the chance of drawing the

red and then the white?
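Both draws can be enumerated to confirm the answers; a sketch (exact fractions, our encoding of the tickets):

```python
from fractions import Fraction
from itertools import product, permutations

tickets = ["red", "white", "blue"]

# With replacement: the two draws are independent.
with_repl = list(product(tickets, repeat=2))        # 9 equally likely pairs
p_red_then_white_repl = Fraction(
    sum(1 for d in with_repl if d == ("red", "white")), len(with_repl))   # 1/9

# Without replacement: the second draw is conditional on the first.
without_repl = list(permutations(tickets, 2))       # 6 equally likely pairs
p_red_then_white_norepl = Fraction(
    sum(1 for d in without_repl if d == ("red", "white")), len(without_repl))  # 1/6
```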

I.   Preliminaries on Probability Distributions:

A.     Counting: suppose we can get one of two outcomes from a random

experiment (coin toss) or one of 6 outcomes if we roll a die. How many

possible combinations of coin flips and die rolls can we get?

1.      In general 2 possible outcomes for coin toss = m

2.      6 possible outcomes for die = n

3.      Total combinations= m*n

B.     Factorial Rule--given n different items, they can be arranged in n!

different ways. 5! = 5*4*3*2*1=120


C.    Permutations: the number of sequences of k items selected from n total,

without replacement, when order is important, is given by: n!/(n − k)!

D.    Combinations: the number of combinations of k items selected from n

total, without replacement, when order is unimportant, is given by:

n!/((n − k)! k!)   (also known as the binomial coefficient)

II.   Examples:

1.     A PIN code at your bank is made up of 4 digits, how many

combinations are possible?

2.     There are 10 entries in a contest. Only three will win, 1st, 2nd, or

3rd prize. What are the possible results?

3.     A softball league has 7 teams, what are the possible ways of

ranking the teams?


4.     From a group of 4 people, 3 are selected to form a

committee. How many combinations are there?

5.     How many different combinations of 3 girls and 4 boys can you

get?
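Examples 1–4 can be checked with the counting rules just defined; a sketch using Python's built-in counting functions (example 5 depends on the size of the underlying group, so it is left out):

```python
import math

# 1. 4-digit PIN, digits may repeat: multiplication rule, 10*10*10*10.
pins = 10 ** 4                          # 10000

# 2. 1st/2nd/3rd prizes from 10 entries: order matters -> permutations.
prize_orders = math.perm(10, 3)         # 10!/(10-3)! = 720

# 3. Rankings of 7 teams: factorial rule.
rankings = math.factorial(7)            # 5040

# 4. Committees of 3 chosen from 4 people: order irrelevant -> combinations.
committees = math.comb(4, 3)            # 4
```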

III.   Quick Mathematics Review

A.    Some of the Rules (let k be a constant and X and Y be variables with

multiple observations going from i = 1, 2, …, N)

       Rule 1:   Σ_{i=1}^{N} kX_i = k Σ_{i=1}^{N} X_i

       Rule 2:   Σ_{i=1}^{N} (X_i + Y_i) = Σ_{i=1}^{N} X_i + Σ_{i=1}^{N} Y_i

       Rule 3:   Σ_{i=1}^{N} k = kN

       Rule 4:   Σ_{i=1}^{N} (X_i − X̄) = 0

       Rule 5:   (1/N) Σ_{i=1}^{N} (X_i − X̄)(Y_i − Ȳ) = (1/N) Σ_{i=1}^{N} X_iY_i − X̄Ȳ

       Rule 6:   (1/N) Σ_{i=1}^{N} (X_i − X̄)² = (1/N) Σ_{i=1}^{N} X_i² − X̄²

       Rule 7:   Σ_{i=1}^{N} Σ_{j=1}^{N} X_iY_j = (Σ_{i=1}^{N} X_i)(Σ_{j=1}^{N} Y_j)

       Rule 8:   Σ_{i=1}^{N} Σ_{j=1}^{N} (X_ij + Y_ij) = Σ_{i=1}^{N} Σ_{j=1}^{N} X_ij + Σ_{i=1}^{N} Σ_{j=1}^{N} Y_ij

These are important for doing the math involved in econometrics.
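The summation rules can be spot-checked numerically; a sketch on made-up data (any numbers work):

```python
# Numeric spot-check of Rules 4, 5, and 7 on small data.
X = [2.0, 4.0, 6.0, 8.0]
Y = [1.0, 3.0, 2.0, 5.0]
N = len(X)
xbar = sum(X) / N
ybar = sum(Y) / N

# Rule 4: deviations from the mean sum to zero.
rule4 = sum(x - xbar for x in X)

# Rule 5: (1/N) sum (Xi - Xbar)(Yi - Ybar) = (1/N) sum XiYi - Xbar*Ybar
lhs5 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / N
rhs5 = sum(x * y for x, y in zip(X, Y)) / N - xbar * ybar

# Rule 7: sum_i sum_j Xi*Yj = (sum_i Xi) * (sum_j Yj)
lhs7 = sum(x * y for x in X for y in Y)
rhs7 = sum(X) * sum(Y)
```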

IV.   Probability distributions

A.     Definitions

1.      Random Variable-- a variable that has a single numerical value for

each outcome of an experiment

       Discrete--countable and finite number of outcomes

       Continuous--infinite number of outcomes

2.      Probability Distribution--the probability of each value of the

random variable (similar to a relative frequency)

        Σ P(x) = 1

       0 ≤ P(x) ≤ 1

3.      Each probability distribution has a mean and standard deviation

denoted by the following formulas:

        μ = Σ x P(x) = the mean

        σ² = Σ (x − μ)² P(x) = the variance

4.      The expected value of a random variable is, in general, the long-run

average value of the random variable over many repeated trials. It

is computed as a weighted average of the possible outcomes of that


random variable where the weights are the probability of that

outcome.

           E(x) = Σ x P(x), where μ_x = E(x) is the mean

5.        The variance of a random variable x denoted Var(x) is the

expected value of the square of the deviation of X from its mean

    Var(x) =
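As a concrete case, the mean and variance of a fair die follow directly from these formulas; a sketch with exact fractions:

```python
from fractions import Fraction

# Distribution of a fair die: P(x) = 1/6 for x = 1..6.
dist = {x: Fraction(1, 6) for x in range(1, 7)}

# mu = sum of x * P(x); this is also the expected value E(x).
mean = sum(x * p for x, p in dist.items())                     # 7/2 = 3.5

# sigma^2 = sum of (x - mu)^2 * P(x)
variance = sum((x - mean) ** 2 * p for x, p in dist.items())   # 35/12
```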

I.   Probability distributions may be summarized by measures of central tendency and dispersion.
A. Mean/Median:

B. Variance:

C. Algebra of Expectations:

1.     Let x, y be random variables

2.     Let a, b be constants

3.     The following are properties of expectation operators

       E(a) = a

       E(ax) = aE(x)

       E(ax + b) = aE(x) + b

       E(x + y) = E(x) + E(y)

D. Expectation Operators for the Variance:

1.    Var(x) =

2.    Standard deviation of x: σ_x = √(σ_x²)

3.    Var(a) = 0: a constant, a, has no variance

Proof:

E. Properties of the Variance using Expectation Operators

1.    Var(x + a) = E[((x + a) − E(x + a))²] = E[(x − E(x))²] = Var(x)

2.    Var(ax) = E[(ax − E(ax))²] = E[a²(x − E(x))²] = a²Var(x)
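These expectation and variance properties can be verified exactly on a small made-up distribution; a sketch:

```python
from fractions import Fraction

# A small discrete distribution to check the rules on (our own example).
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def E(g):
    """Expected value of g(x) under dist."""
    return sum(g(x) * p for x, p in dist.items())

a, b = Fraction(3), Fraction(5)
Ex = E(lambda x: x)
Varx = E(lambda x: (x - Ex) ** 2)

# E(ax + b) = a E(x) + b
check1 = E(lambda x: a * x + b) == a * Ex + b

# Var(x + a) = Var(x): shifting by a constant does not change the spread.
Ex_a = E(lambda x: x + a)
check2 = E(lambda x: (x + a - Ex_a) ** 2) == Varx

# Var(ax) = a^2 Var(x)
Eax = E(lambda x: a * x)
check3 = E(lambda x: (a * x - Eax) ** 2) == a ** 2 * Varx
```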


F.   Covariance measures linear dependence between x and y, i.e. the extent to which

two random variables move together.

1.     Covariance: σ_xy

2.     Cov(x, y) = E[(x − E(x))(y − E(y))], or equivalently E(xy) − E(x)E(y)

3.     The variance of the sum of two random variables x and y is the sum of

their variances plus two times their covariance.

Var(x + y) = Var(x) + Var(y) + 2Cov(x, y)
           = E[((x + y) − E(x + y))²]
           = E[((x − E(x)) + (y − E(y)))²]
           = E[(x − E(x))²] + E[(y − E(y))²] + 2E[(x − E(x))(y − E(y))]

Note that if x and y are independent, Var(x+y) = Var(x) +Var(y)

     E(X*Y) ≠ E(X)*E(Y) in most cases.

    Random variables are said to be independent if the probability of

one occurring has no effect on the probability of the second

occurring.

(1) When two variables are independent:

(2)   x is orthogonal to y

(3) E(X*Y)=E(X)*E(Y)
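The covariance formula and the variance-of-a-sum identity can be checked on an explicit joint distribution (the table below is our own example; the probabilities sum to 1):

```python
from fractions import Fraction

# Joint distribution of (x, y): a dependent pair of 0/1 variables.
joint = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
         (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8)}

Ex = sum(x * p for (x, y), p in joint.items())
Ey = sum(y * p for (x, y), p in joint.items())
Exy = sum(x * y * p for (x, y), p in joint.items())

# Cov(x, y) = E(xy) - E(x)E(y); nonzero here, so x and y are dependent.
cov = Exy - Ex * Ey

Varx = sum((x - Ex) ** 2 * p for (x, y), p in joint.items())
Vary = sum((y - Ey) ** 2 * p for (x, y), p in joint.items())
Es = Ex + Ey
Var_sum = sum((x + y - Es) ** 2 * p for (x, y), p in joint.items())

# Var(x + y) = Var(x) + Var(y) + 2 Cov(x, y)
identity_holds = (Var_sum == Varx + Vary + 2 * cov)
```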


4.      The correlation coefficient is a function of the covariance: ρ_xy = σ_xy/(σ_x σ_y)

      σ_x σ_y = √Var(x) · √Var(y)

      −1 ≤ ρ ≤ 1: the correlation coefficient lies between −1 and 1.

     ρ = 1: perfect positive linear association

      ρ = −1: perfect negative linear association

     ρ = 0: no linear association

5.      The Conditional Expectation of Y given X (also called the conditional

mean)

     E(Y | X = x) is the mean value of Y when X takes on the value x.

6.      Law of Iterated Expectations

     E(Y )= E [ E(Y | X)]--the mean of Y is equal to the weighted average

of the conditional expectation of Y given all values of X.

      E(Y) = E[E(Y | X)] = Σ_{i=1}^{n} E(Y | X = x_i) Pr(X = x_i)
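The law of iterated expectations can be checked numerically on a small made-up joint distribution for (X, Y); a sketch:

```python
from fractions import Fraction

# A small joint distribution for (X, Y); the numbers are our own example.
joint = {(0, 10): Fraction(1, 4), (0, 20): Fraction(1, 4),
         (1, 10): Fraction(1, 10), (1, 20): Fraction(2, 5)}

xs = {x for x, _ in joint}
Pr_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in xs}

def E_y_given(x):
    """Conditional mean E(Y | X = x)."""
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / Pr_x[x]

# E(Y) computed directly ...
Ey = sum(y * p for (_, y), p in joint.items())

# ... equals the weighted average of the conditional means.
Ey_iterated = sum(E_y_given(x) * Pr_x[x] for x in xs)
```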

I.   Moments of a Random Variable – measures of the shape of a distribution

A.   E[(x − μ)^r] = the r-th central moment, r > 0

B. Mean = E(x), the first (raw) moment.

C. Variance = E[(x − E(x))²], the 2nd central moment.

D. Skewness = E[(x − E(x))³]/σ_x³, based on the 3rd moment.

E. Kurtosis = E[(x − E(x))⁴]/σ_x⁴, based on the 4th moment.

Skewness and kurtosis are the 3rd and 4th moments of the standardized variable (mean 0, variance 1), which makes them unit-free measures of shape.
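These moment formulas can be applied to a fair die (a symmetric, flat distribution); a sketch:

```python
from fractions import Fraction
import math

# Distribution of a fair die (our example): P(x) = 1/6 for x = 1..6.
dist = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in dist.items())                 # mean = 7/2

def central_moment(r):
    """E[(x - mu)^r], the r-th central moment."""
    return sum((x - mu) ** r * p for x, p in dist.items())

var = central_moment(2)                                  # 35/12
sigma = math.sqrt(var)

# Symmetric distribution -> skewness is exactly 0.
skewness = float(central_moment(3)) / sigma ** 3

# ~1.73, flatter than the normal distribution (whose kurtosis is 3).
kurtosis = float(central_moment(4)) / sigma ** 4
```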

II. The Normal Distribution: a very useful probability distribution. According to G.

Lippmann, a French physicist: "Everybody believes in the [normal approximation],

the experimenters because they think it is a mathematical theorem, the

mathematicians because they think it's an experimental fact."


A. Properties: bell-shaped; symmetric; a continuous distribution based on a

formula (that you won't be required to know or understand): f(z) = (1/√(2π)) e^(−z²/2);

as z gets quite large, f(z) approaches zero; takes its maximum at z = 0.

1.     Where E(x) = μ_x and Var(x) = σ_x².

B. The Standard Normal Distribution

1.     A random variable with a N(0,1) distribution is denoted Z and called the

standard normal distribution: μ_z = 0, σ_z² = 1.

2.     We define Z as follows: Z = (x − μ_x)/σ_x, equivalently X = μ_x + σ_x Z.

X is standardized by subtracting the mean and dividing by the standard

deviation. E.g. What is Pr(X <= 2) when X ~ N(1, 4)?
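This worked example can be checked with the standard library's normal distribution; note that N(1, 4) specifies the variance, so the standard deviation is 2 (a common trip-up):

```python
from statistics import NormalDist

# X ~ N(1, 4): mean 1, VARIANCE 4, so the standard deviation is 2.
X = NormalDist(mu=1, sigma=2)

# Standardize: Pr(X <= 2) = Pr(Z <= (2 - 1)/2) = Phi(0.5)
z = (2 - 1) / 2
p = NormalDist().cdf(z)          # Phi(0.5), roughly 0.6915
p_direct = X.cdf(2)              # same probability without standardizing
```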

3.     The sum of two independent normal random variables is normally distributed

4.     Density curves: the graph of a continuous probability distribution

satisfying the following properties

    Total area underneath the graph sums to 1

    Every point has a height >=0.

    Probability density function (pdf): Area under the pdf between two

points is the probability that a random variable falls between those

two points


    Cumulative distribution function (cdf or Ф): the probability that a

random variable is less than or equal to a particular value i.e.

Pr(Z<=c) = Ф(c)

5.     A continuous random variable has a normal distribution if that

distribution is symmetric and bell-shaped.

C. Why do we focus on the Normal Distribution? Data influenced by many

small and unrelated random effects are approximately normally distributed.

This fact is the key to the rest of the course: assuming random errors and large

samples, everything is normally distributed. Thus we can use the normal

distribution (a theoretical concept) to examine the statistical relationship

between data, which is what we'll be doing for the remainder of the course,

because in the long run everything approaches the normal distribution.

D. Facts about the normal distribution:

1.     It's a bell-shaped curve; symmetric; the area underneath the curve sums

to 1; the expected value (average) pins the center of the probability

histogram to the horizontal axis and the standard error fixes its spread.

E. We will be working with the Standard Normal distribution: a very special

normal distribution with mean = 0 and SD = 1. Any normal random variable

can be converted into a standard normal random variable, where we use the

standard normal distribution:

1.     The z transformation: z = (x − μ)/σ; subtract the mean from the random

variable and divide by the standard deviation. This converts every


normal distribution into a standard normal distribution. (You'll be using

this method for the remainder of the semester).

2.   Suppose we have a normal distribution, then we can convert it to the

standard normal as long as we know the mean and the standard

deviation. Once we’ve converted it to the standard normal, we can

determine probabilities by looking them up in a standard normal

probability table.

   The probability table gives us the z-score and F(z), the probability of

being at or below that location in the standard normal distribution (see handout)

   The formula for this: F(a) = Pr(z <= a)

   What about the area to the right z = a?

   We can find the probability of z being in any interval a<=z<=b.

   Rule of thumb: when in doubt, draw a picture of what you’re trying

to find!

3.   Examples:

4.   Let’s assume the weights of students are normally distributed with mean

= 150 and standard deviation =20. What’s the probability of weighing

more than 170 lbs?
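A check of the weight example using `statistics.NormalDist` (here z = (170 − 150)/20 = 1):

```python
from statistics import NormalDist

# Weights ~ N(150, 20^2). Probability of weighing more than 170 lbs.
z = (170 - 150) / 20                        # z = 1
p_more_than_170 = 1 - NormalDist().cdf(z)   # 1 - Phi(1), roughly 0.1587
```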


5.     The general rule for computing normal probabilities:

         Pr( a <= x <= b)

F.   Confidence Intervals from the normal distribution are given by:

I.   Chi-Squared Distribution

A. Given the following sequence of random variables, {x_i, i = 1, …, n}, each distributed N(0,1),

B. Then Σ_{i=1}^{n} x_i² ~ χ²(n), with n degrees of freedom.

1.     The sum of squares of n independently distributed standard normal

random variables is distributed χ² with n degrees of freedom.

2.     Useful for testing hypotheses that deal with variances.

3.     We can show that the sample variance, scaled by (n − 1)/σ², is distributed

chi-squared with n − 1 degrees of freedom:

S² = (1/(n − 1)) Σ_{i=1}^{n} (x_i − x̄)²,   with each x_i ~ N(μ, σ²)

(n − 1) S²/σ² = Σ_{i=1}^{n} ((x_i − x̄)/σ)² ~ χ²(n − 1)

where the second equation is (essentially) a standard normal squared and

summed. So the chi-square is the sum of squared normal distributions.
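The "sum of squared standard normals" characterization can be illustrated by simulation (a sketch; the seed and sample sizes are arbitrary):

```python
import random

random.seed(0)

# Sum of n squared independent N(0,1) draws ~ chi-squared(n),
# which has mean n and variance 2n.
n, reps = 5, 20000
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(n)) for _ in range(reps)]

sample_mean = sum(draws) / reps                                  # ~ n = 5
sample_var = sum((d - sample_mean) ** 2 for d in draws) / reps   # ~ 2n = 10
```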


II. T-distribution

A. Let Z ~ N(0,1) and W ~ χ²(N), with Z and W independent random variables. Then

Z/√(W/N) ~ t(N): the ratio has a t-distribution with N degrees of freedom.

1.           The t-distribution has a slightly larger variance than the normal distribution

but the same bell shape.

III. F-Distribution:

A. Let X, Y be two independent random variables distributed χ²(n1) and χ²(n2). Then

the ratio (X/n1)/(Y/n2) is distributed F with n1, n2 degrees of freedom:

(X/n1)/(Y/n2) ~ F(n1, n2).

1.           The relationship between the F and t distributions is given as follows:

[t(N)]² ~ F(1, N).

Random Sampling and Distribution of Mean

III. Importance of Random Samples

1.           N independent draws from the same population

2.           iid = independent and identically distributed.


   Most of the theoretical results in econometrics and statistics rely on

the iid assumption.

   A sequence/collection of random variables is i.i.d. if the variables are

independent and each has the same probability distribution.

3.      In simple random sampling, n objects are drawn at random from a

population with an equal probability of being selected.

IV. Characteristics of a Sampling Distribution

A. The difference between an estimator and an estimate

1.      Estimator: the general rule for getting an "estimate" (some number)

from a sample drawn from a population

2.      Estimate: the number the estimator produces when applied to a particular sample

3.      Example: The Sample Average is an Estimator

    X̄ = (1/N) Σ_{i=1}^{N} X_i

   Note: X̄ is a function of the random variables X_i, and is also a

random variable, since the value of X̄ differs from one sample to the

next.

   The sample mean is an estimator of the population mean. Since it is a

random variable, it has a probability distribution, which

is called the sampling distribution.


The mean and variance of the sampling distribution of X̄

   {x_1, x_2, x_3, …, x_n} are i.i.d. draws from the same population, i.e. each

x_i has the same marginal distribution.

   The sample mean X̄ is denoted

X̄ = (1/n)(x_1 + x_2 + … + x_n) = (1/n) Σ_{i=1}^{n} x_i

   Since X̄ is a random variable, it has a sampling distribution. What

is it?
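One way to answer "what is it?" empirically: simulate many samples of die rolls and look at the mean and variance of the resulting sample means (a sketch; the population here is a fair die, with μ = 3.5 and σ² = 35/12):

```python
import random

random.seed(1)

mu, sigma2 = 3.5, 35 / 12
n, reps = 10, 20000

# Draw many samples of size n and record each sample mean.
means = [sum(random.randint(1, 6) for _ in range(n)) / n for _ in range(reps)]

# The sampling distribution has mean mu and variance sigma^2 / n.
mean_of_means = sum(means) / reps
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / reps
```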


V. Large Sample Approximations of Sampling Distributions.

For small sample sizes, the distribution of Ȳ is complicated, but if n is large, the
sampling distribution is simple!

1.     As n increases, the distribution of Ȳ becomes more tightly centered around μ_Y
(the Law of Large Numbers)

2.    Moreover, the distribution of Ȳ − μ_Y becomes normal (the Central Limit
Theorem)

These two theorems provide enormous simplifications in empirical analysis.


The Law of Large Numbers:

An estimator is consistent if the probability that it falls within an interval of the true
population value tends to one as the sample size increases.

If (Y_1, …, Y_n) are i.i.d. and σ_Y² < ∞, then Ȳ is a consistent estimator of μ_Y, that is,

Pr[|Ȳ − μ_Y| < ε] → 1 as n → ∞

which can be written Ȳ →p μ_Y

("Ȳ →p μ_Y" means "Ȳ converges in probability to μ_Y").

(The math: as n → ∞, var(Ȳ) = σ_Y²/n → 0, which implies that Pr[|Ȳ − μ_Y| < ε] → 1.)
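The LLN can be illustrated by simulation: the fraction of sample means falling within ε of μ rises toward 1 as n grows (a sketch with die rolls; ε and the sample sizes are arbitrary choices):

```python
import random

random.seed(2)

# Pr(|Ybar - mu| < eps) for die rolls, where mu = 3.5.
def frac_within(n, eps=0.25, reps=2000):
    hits = 0
    for _ in range(reps):
        ybar = sum(random.randint(1, 6) for _ in range(n)) / n
        hits += abs(ybar - 3.5) < eps
    return hits / reps

frac_n10 = frac_within(10)      # the sample mean is still noisy at n = 10
frac_n1000 = frac_within(1000)  # almost always within eps at n = 1000
```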


The Central Limit Theorem (CLT):

If (Y_1, …, Y_n) are i.i.d. and 0 < σ_Y² < ∞, then when n is large the distribution of Ȳ is well
approximated by a normal distribution.

      Ȳ is approximately distributed N(μ_Y, σ_Y²/n) ("normal distribution with mean μ_Y

and variance σ_Y²/n")

       √n(Ȳ − μ_Y)/σ_Y is approximately distributed N(0,1) (standard normal)

      The "standardized" Ȳ, (Ȳ − E(Ȳ))/√var(Ȳ) = (Ȳ − μ_Y)/(σ_Y/√n), is approximately distributed N(0,1)

      The larger is n, the better is the approximation. How large does n have to be?
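A simulation sketch of the CLT: standardized sample means of die rolls should look approximately N(0, 1), with about 95% of them within ±1.96 (the sample sizes and seed are arbitrary):

```python
import math
import random

random.seed(3)

# Population: a fair die, mu = 3.5, sigma = sqrt(35/12).
mu, sigma = 3.5, math.sqrt(35 / 12)
n, reps = 30, 20000

std_means = []
for _ in range(reps):
    ybar = sum(random.randint(1, 6) for _ in range(n)) / n
    std_means.append(math.sqrt(n) * (ybar - mu) / sigma)   # sqrt(n)(Ybar - mu)/sigma

avg = sum(std_means) / reps                                # ~ 0
var = sum(s ** 2 for s in std_means) / reps                # ~ 1
within_196 = sum(abs(s) <= 1.96 for s in std_means) / reps # ~ 0.95
```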


Summary: The Sampling Distribution of Ȳ

For Y_1, …, Y_n i.i.d. with 0 < σ_Y² < ∞,

      The exact (finite sample) sampling distribution of Ȳ has mean μ_Y ("Ȳ is an
unbiased estimator of μ_Y") and variance σ_Y²/n

      Other than its mean and variance, the exact distribution of Ȳ is complicated and
depends on the distribution of Y (the population distribution)

      When n is large, the sampling distribution simplifies:

o              Ȳ →p μ_Y (Law of large numbers)

o              (Ȳ − E(Ȳ))/√var(Ȳ) is approximately N(0,1) (CLT)
