# Economics 105 Statistics by gabyion

```
Economics 105: Statistics
• Any questions?
• Go over GH1
• Any student information sheets?
Measures of Variation

Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

• Measures of variation give information on the variability of the data values.
[Figure: two distributions with the same center but different variation]
Coefficient of Variation
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Can be used to compare two or more sets of
data measured in different units

CV = \frac{S}{\bar{X}} \cdot 100\%
Comparing Coefficient of Variation
• Stock A:
– Average price last year = \$50
– Standard deviation = \$5
CV_A = \frac{S}{\bar{X}} \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%
• Stock B:
– Average price last year = \$100
– Standard deviation = \$5
CV_B = \frac{S}{\bar{X}} \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%
• Both stocks have the same standard deviation, but stock B is less variable relative to its price
Z Scores
• A measure of RELATIVE distance from the mean
(for example, a Z-score of 2.0 means that a value
is 2.0 standard deviations from the mean)
• The difference between a value and the mean,
divided by the standard deviation
• A Z score above 3.0 or below -3.0 is considered
an outlier (BLK p. 96 says +/- 2 std dev is outlier)

Z = \frac{X - \bar{X}}{S}
Z Scores (continued)
Example:

• If the mean is 14.0 and the standard deviation is 3.0, what is the Z score for the value 18.5?

Z = \frac{X - \bar{X}}{S} = \frac{18.5 - 14.0}{3.0} = 1.5
• The value 18.5 is 1.5 standard deviations above the
mean
• (A negative Z-score would mean that a value is less
than the mean)
Shape of a Distribution
• Describes how data are distributed
• Measures of shape
– Symmetric or skewed
Left-Skewed      Symmetric       Right-Skewed
Mean < Median    Mean = Median     Median < Mean
Using Microsoft Excel
• Descriptive statistics can be obtained from Microsoft® Excel
– Select Tools / Data Analysis / Descriptive Statistics
– Enter details in the dialog box
– Check the box for summary statistics
– Click OK
Excel output
Microsoft Excel
descriptive statistics output,
using the house price data:
House Prices:

\$2,000,000
500,000
300,000
100,000
100,000
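Population Parameters vs. Sample Statistics
• Mean
– Population parameter: \mu_X = \frac{\sum_{i=1}^{N} X_i}{N}
– Sample statistic: \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}
• Variance
– Population parameter: \sigma_X^2 = \frac{\sum_{i=1}^{N} (X_i - \mu_X)^2}{N}
– Sample statistic: s_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}
• Correlation
– Population parameter: \rho_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}
– Sample statistic: r_{xy} = \frac{Cov(x, y)}{s_x s_y}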
The Empirical Rule of Thumb

• If the data distribution is approximately bell-shaped, then the interval:
• μ ± 1σ contains about 68% of the values in the population or the sample
[Figure: bell curve with 68% of the values inside μ ± 1σ]
The Empirical Rule of Thumb (continued)
• μ ± 2σ contains about 95% of the values in the population or the sample
• μ ± 3σ contains about 99.7% of the values in the population or the sample
[Figure: bell curves with 95% of the values inside μ ± 2σ and 99.7% inside μ ± 3σ]
Chebyshev’s Inequality
• Only mean & variance given
• No distributional assumption!
• For any population, given the mean and variance, at least 100(1 - 1/k^2)% of the population members lie within k standard deviations of the mean, k > 1.

P(\mu_X - k\sigma_X < X < \mu_X + k\sigma_X) \ge 1 - \frac{1}{k^2}

[Figure: data points along an axis marked μ - 2σ, μ, and μ + 2σ]
Chebyshev example
• A particular brand of car has a mean lifetime of 100,000 miles with a standard deviation of 7,000 miles. Find a range in which it can be guaranteed that at least 75% of the lifetimes of this brand of car lie.
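• A possible worked answer (a sketch, not from the original slides), reading 100,000 miles as the mean lifetime:
– Set 1 - \frac{1}{k^2} = 0.75, so k^2 = 4 and k = 2
– \mu \pm k\sigma = 100,000 \pm 2(7,000), i.e. 86,000 to 114,000 miles
– By Chebyshev's inequality, at least 75% of lifetimes lie in that range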
Covariance
• Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y

• Association
• Linear
• Units
– magnitude depends on units, so use caution in assessing strength of association
• Sample covariance (BLK p. 107)
Cov(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{\left( \sum_{i=1}^{n} x_i y_i \right) - n \bar{x} \bar{y}}{n - 1}
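Correlation
• \rho_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}

• Association – not causality!
• Linear
• Units?
• -1 \le \rho_{XY} \le 1
• Sample correlation coefficient, with -1 \le r \le 1
r_{xy} = \frac{Cov(x, y)}{s_x s_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{\left( \sum_{i=1}^{n} x_i y_i \right) - n \bar{x} \bar{y}}{\sqrt{\left( \sum_{i=1}^{n} x_i^2 - n \bar{x}^2 \right) \left( \sum_{i=1}^{n} y_i^2 - n \bar{y}^2 \right)}}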
Scatter Plots of Data with Various Correlation Coefficients
[Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0 (top row) and r = +1, r = +.3, r = 0 (bottom row)]
Using Excel to Find the Correlation Coefficient
• Select Tools / Data Analysis
• Choose Correlation
• Click OK
• Input data range and select appropriate options
• Click OK to get output
Sample Correlation “by hand”
Student      GPA        Study hours   GPA*Study   GPA^2      Study^2
1            2.3        19            43.7        5.29       361
2            3.4        26            88.4        11.56      676
3            3.75       34            127.5       14.0625    1156
4            3.8        25            95          14.44      625
5            1.7        10            17          2.89       100
6            2.4        20            48          5.76       400
7            2.6        22            57.2        6.76       484
8            3          30            90          9          900
9            3.2        40            128         10.24      1600
10           2.7        33            89.1        7.29       1089
11           2.9        25            72.5        8.41       625
Sum          31.75      284           856.4       95.7025    8016
Mean         2.89       25.82         77.85       8.70       728.73

Covariance   3.667273
Variance     0.406045   68.3636364
Std dev      0.637217   8.26823055
Correlation  0.696055   0.69605459

[Figure: scatter plot of Study hours (vertical axis, 0 to 45) against GPA (horizontal axis, 1.5 to 4)]
Source: P:\Economics\Eco 105 (Statistics)\cov_corr.xls
Intro to Probability:
Basic Definitions
• Random trials
– multiple outcomes & uncertainty
• Basic outcome
• Sample space
• Event
• Examples: coin toss, die roll, dice roll, deck
of cards, etc.
• A deck of cards is defined as 52 cards: 13 of each suit (♠♣♥♦), ranked 2, 3, ..., 10, J, Q, K, A
Set Theory
• Venn diagrams
• Union
A ∪ B

• Intersection
A ∩ B
Set Theory
• A′ is the complement of A

• A and B are mutually exclusive
• A ∩ B = ∅
Set Theory
• A set of events A1, A2, A3 … AN partitions the sample space
Rules for Set Operations
• Commutative
• A ∪ B = B ∪ A
• A ∩ B = B ∩ A
• Idempotency
• A ∪ A = A
• A ∩ A = A
• Complementation
• A ∪ A′ = S, A ∩ A′ = ∅
• (A ∪ B)′ = A′ ∩ B′
• (A ∩ B)′ = A′ ∪ B′
Rules for Set Operations
• Associative
• A ∪ (B ∪ C) = (A ∪ B) ∪ C
• A ∩ (B ∩ C) = (A ∩ B) ∩ C
• Distributive
• A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
• A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
Fundamental Postulates

• P(A) should satisfy certain postulates

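• 1: P(A) ≥ 0 [Impossible events cannot occur]
• 2: P(S) = 1 [Some outcome must occur]
• 3: If A1, A2, A3 … AN are N mutually exclusive events, then
P(A1 ∪ A2 ∪ … ∪ AN) = P(A1) + P(A2) + … + P(AN)
or
P(\bigcup_{i=1}^{N} A_i) = \sum_{i=1}^{N} P(A_i)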
Useful Results
• P(A′) = 1 – P(A)
• P(∅) = 0
• If A ⊂ B, then P(A) ≤ P(B)
• 0 ≤ P(A) ≤ 1
• P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
– avoid double counting
• P(A ∪ B) = 1 – P((A ∪ B)′) = 1 – P(A′ ∩ B′)
Useful Results
• If A1, A2, A3 … AN partition S, then
Example
• Suppose you apply to 3 schools: A, B, and C
• P(accepted @ A) = .20
• P(accepted @ B) = .40
• P(accepted @ C) = .60
• What is the probability of being rejected at all
3?
• What is the probability of being accepted
somewhere?
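• A possible worked answer (a sketch, not from the original slides). It ASSUMES the three admission decisions are independent, which the slide does not state:
– P(rejected at all 3) = (1 - .20)(1 - .40)(1 - .60) = (.80)(.60)(.40) = .192
– P(accepted somewhere) = 1 - P(rejected at all 3) = 1 - .192 = .808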
Conditional Probability
• The conditional probability that A occurs given that B is known to have occurred is

P(A | B) = \frac{P(A ∩ B)}{P(B)}, provided P(B) > 0

• Problem 2.5, p. 25
Conditional Probability
• Probability a beginning golfer makes a good shot if she selects
the correct club is 1/3. The probability of a good shot with the
wrong club is 1/5. There are 4 clubs in her golf bag, one of which
is the correct club for the next shot. Club selection is random.

• What is the probability of a good shot?
• Given that she hit a good shot, what is the probability that she chose the wrong club?
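• A possible worked answer (a sketch, not from the original slides), writing G for a good shot, C for the correct club, and W for a wrong club (labels introduced here for illustration):
– P(C) = 1/4 and P(W) = 3/4, since one of the 4 clubs is correct and selection is random
– P(G) = P(G | C) P(C) + P(G | W) P(W) = (1/3)(1/4) + (1/5)(3/4) = 1/12 + 3/20 = 7/30
– P(W | G) = P(G | W) P(W) / P(G) = (3/20) / (7/30) = 9/14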

```
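The formulas on the slides can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names are hypothetical, chosen here for illustration, and the data are the GPA and study-hours columns from the "Sample Correlation by hand" table. It uses the n - 1 divisor for the sample covariance and variance, as the slides do.

```python
from math import sqrt

def mean(xs):
    """Arithmetic mean of a sequence of numbers."""
    return sum(xs) / len(xs)

def sample_cov(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def sample_std(xs):
    """Sample standard deviation: square root of the sample variance."""
    return sqrt(sample_cov(xs, xs))

def sample_corr(xs, ys):
    """Sample correlation: covariance over the product of standard deviations."""
    return sample_cov(xs, ys) / (sample_std(xs) * sample_std(ys))

def coefficient_of_variation(mean_value, std_dev):
    """Coefficient of variation in percent: (S / mean) * 100."""
    return std_dev / mean_value * 100.0

def z_score(x, mean_value, std_dev):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mean_value) / std_dev

def chebyshev_interval(mu, sigma, coverage):
    """Interval mu +/- k*sigma, with k chosen so that 1 - 1/k^2 = coverage."""
    k = 1.0 / sqrt(1.0 - coverage)
    return mu - k * sigma, mu + k * sigma

# Columns copied from the "Sample Correlation by hand" table.
gpa = [2.3, 3.4, 3.75, 3.8, 1.7, 2.4, 2.6, 3.0, 3.2, 2.7, 2.9]
hours = [19, 26, 34, 25, 10, 20, 22, 30, 40, 33, 25]

print("covariance :", round(sample_cov(gpa, hours), 6))
print("correlation:", round(sample_corr(gpa, hours), 6))
print("CV, stock A:", round(coefficient_of_variation(50, 5), 2), "%")
print("CV, stock B:", round(coefficient_of_variation(100, 5), 2), "%")
print("Z for 18.5 :", z_score(18.5, 14.0, 3.0))
print("Chebyshev  :", chebyshev_interval(100_000, 7_000, 0.75))
```

The covariance and correlation should agree with the spreadsheet values in the table (3.667273 and 0.696055) up to rounding. The Chebyshev helper assumes the stated 100,000 miles is the mean lifetime.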