Economics 105: Statistics
• Any questions?
• Go over GH1
• Any student information sheets?
Measures of Variation
• Range
• Interquartile range
• Variance
• Standard deviation
• Coefficient of variation
• Measures of variation give information on the spread or variability of the data values.
[Figure: two distributions with the same center but different variation]
Coefficient of Variation
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Can be used to compare two or more sets of
data measured in different units
$CV = \frac{S}{\bar{X}} \cdot 100\%$
Comparing Coefficient of Variation
• Stock A:
– Average price last year = $50
– Standard deviation = $5
$CV_A = \frac{S}{\bar{X}} \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$
• Stock B:
– Average price last year = $100
– Standard deviation = $5
$CV_B = \frac{S}{\bar{X}} \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$
• Both stocks have the same standard deviation, but stock B is less variable relative to its price
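The stock comparison can be reproduced with a few lines of Python. This is a minimal sketch; the function name coefficient_of_variation is an illustrative choice, not something from the course materials.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation as a percentage: (S / X-bar) * 100."""
    return 100 * std_dev / mean

# Values taken from the two-stock example
cv_a = coefficient_of_variation(std_dev=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)   # Stock B

print(f"CV_A = {cv_a:.0f}%")
print(f"CV_B = {cv_b:.0f}%")
```

Because the CV is unit-free, the same function could be applied to data sets measured in different units.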
Z Scores
• A measure of RELATIVE distance from the mean
(for example, a Z-score of 2.0 means that a value
is 2.0 standard deviations from the mean)
• The difference between a value and the mean,
divided by the standard deviation
• A Z score above 3.0 or below -3.0 is considered
an outlier (BLK p. 96 says +/- 2 std dev is outlier)
$Z = \frac{X - \bar{X}}{S}$
Z Scores
Example: (continued)
• If the mean is 14.0 and the standard deviation is 3.0,
what is the Z score for the value 18.5?
$Z = \frac{X - \bar{X}}{S} = \frac{18.5 - 14.0}{3.0} = 1.5$
• The value 18.5 is 1.5 standard deviations above the
mean
• (A negative Z-score would mean that a value is less
than the mean)
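The same calculation as a short Python sketch. The helper name z_score is hypothetical, and the outlier check uses the 3.0 cutoff mentioned earlier (BLK uses 2.0).

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

z = z_score(18.5, mean=14.0, std_dev=3.0)

# Flag outliers with the 3.0 cutoff; swap in 2.0 to follow BLK instead
is_outlier = abs(z) > 3.0

print(z)           # 1.5
print(is_outlier)  # False
```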
Shape of a Distribution
• Describes how data are distributed
• Measures of shape
– Symmetric or skewed
• Left-skewed: Mean < Median
• Symmetric: Mean = Median
• Right-skewed: Median < Mean
Using Microsoft Excel
• Descriptive statistics can be obtained from Microsoft® Excel
– Use menu choice: Tools / Data Analysis / Descriptive Statistics
– Enter details in dialog box
– Check box for summary statistics
– Click OK
Excel output
Microsoft Excel
descriptive statistics output,
using the house price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
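For readers without Excel, roughly the same summary can be computed with Python's standard statistics module. This is a sketch of a few of the summary statistics only, not a reproduction of the Excel output.

```python
import statistics

# House price data from the slide
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)
median_price = statistics.median(house_prices)
mode_price = statistics.mode(house_prices)
std_dev = statistics.stdev(house_prices)        # sample standard deviation (n - 1)
price_range = max(house_prices) - min(house_prices)

print("Mean:              ", mean_price)
print("Median:            ", median_price)
print("Mode:              ", mode_price)
print("Standard deviation:", std_dev)
print("Range:             ", price_range)

# Median < Mean suggests a right-skewed distribution
print("Right-skewed?      ", median_price < mean_price)
```

The single $2,000,000 house pulls the mean well above the median, which is the right-skew pattern described under Shape of a Distribution.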
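Population Parameters vs. Sample Statistics
• Mean
– Population parameter: $\mu_X = \frac{\sum_{i=1}^{N} X_i}{N}$
– Sample statistic: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
• Variance
– Population parameter: $\sigma_X^2 = \frac{\sum_{i=1}^{N} (X_i - \mu_X)^2}{N}$
– Sample statistic: $s_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
• Correlation
– Population parameter: $\rho_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$
– Sample statistic: $r_{xy} = \frac{Cov(x, y)}{s_x s_y}$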
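The only computational difference between the population and sample variance is the divisor (N versus n − 1). The sketch below illustrates this with the house price data, treating the five prices first as a whole population and then as a sample; that dual treatment is an assumption made purely for illustration.

```python
import statistics

data = [2_000_000, 500_000, 300_000, 100_000, 100_000]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean, shared by both formulas
sum_sq_dev = sum((x - mean) ** 2 for x in data)

population_variance = sum_sq_dev / n        # divide by N
sample_variance = sum_sq_dev / (n - 1)      # divide by n - 1

print(population_variance)
print(sample_variance)

# The standard library implements the same two formulas
print(statistics.pvariance(data), statistics.variance(data))
```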
The Empirical Rule of Thumb
• If the data distribution is approximately bell-shaped, then the interval:
• μ ± 1σ contains about 68% of the values in the population or the sample
• μ ± 2σ contains about 95% of the values in the population or the sample
• μ ± 3σ contains about 99.7% of the values in the population or the sample
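The 68 / 95 / 99.7 figures come from the normal distribution, and they can be checked with Python's statistics.NormalDist. This sketch assumes an exactly normal distribution; real bell-shaped data will only match approximately.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} standard deviation(s): {p:.1%}")
```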
Chebyshev’s Inequality
• Only mean & variance given
• No distributional assumption!
• For any population, given the mean and variance, at least 100(1 − 1/k²)% of the population members lie within k standard deviations of the mean, k > 1.
$P(\mu_X - k\sigma_X < X < \mu_X + k\sigma_X) \geq 1 - \frac{1}{k^2}$
[Figure: data points on a number line marked at μ − 2σ, μ, and μ + 2σ]
Chebyshev example
• A particular brand of car has a mean lifetime of 100,000 miles with a standard deviation of 7,000 miles. Find a range in which it can be guaranteed that at least 75% of the lifetimes of this brand of car lie.
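One way to work this example: set 1 − 1/k² equal to the required fraction, solve for k, and step k standard deviations either side of the mean. The sketch below does that, reading 100,000 miles as the mean lifetime; the function name chebyshev_interval is hypothetical.

```python
import math

def chebyshev_interval(mean, std_dev, min_fraction):
    """Interval guaranteed by Chebyshev to hold at least min_fraction of values."""
    # Solve 1 - 1/k^2 = min_fraction for k
    k = 1 / math.sqrt(1 - min_fraction)
    return k, mean - k * std_dev, mean + k * std_dev

k, low, high = chebyshev_interval(mean=100_000, std_dev=7_000, min_fraction=0.75)

print(k)     # 2.0
print(low)   # 86000.0
print(high)  # 114000.0
```

So at least 75% of lifetimes lie between 86,000 and 114,000 miles, whatever the shape of the distribution.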
Covariance
• $Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y$
• Association
• Linear
• Units
– magnitude depends on units, so use caution in
assessing strength of association
• Sample covariance (BLK p. 107)
$Cov(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{n - 1}$
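Correlation
• $\rho_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$
• Association – not causality!
• Linear
• Units?
• $-1 \leq \rho \leq 1$
• Sample correlation coefficient, $-1 \leq r \leq 1$
$r_{xy} = \frac{Cov(x, y)}{s_x s_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)}}$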
Scatter Plots of Data with Various Correlation Coefficients
[Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +1, r = +.3, and r = 0]
Using Excel to Find the Correlation Coefficient
• Select Tools/Data Analysis
• Choose Correlation from the selection menu
• Click OK . . .
• Input data range and select appropriate options
• Click OK to get output
Sample Correlation “by hand”
Student      GPA       Study hours  GPA*Study  GPA^2    Study^2
1            2.3       19           43.7       5.29     361
2            3.4       26           88.4       11.56    676
3            3.75      34           127.5      14.0625  1156
4            3.8       25           95         14.44    625
5            1.7       10           17         2.89     100
6            2.4       20           48         5.76     400
7            2.6       22           57.2       6.76     484
8            3         30           90         9        900
9            3.2       40           128        10.24    1600
10           2.7       33           89.1       7.29     1089
11           2.9       25           72.5       8.41     625
Sum          31.75     284          856.4      95.7025  8016
Mean         2.89      25.82        77.85      8.70     728.73
Covariance   3.667273
Variance     0.406045  68.3636364
Std dev      0.637217  8.26823055
Correlation  0.696055  0.69605459
[Figure: scatter plot of study hours (axis 0 to 45) against GPA (axis 1.5 to 4)]
Source: P:\Economics\Eco 105 (Statistics)\cov_corr.xls
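The "by hand" calculation can be checked in Python using the computational formulas for the sample covariance and variance. This is a sketch using the eleven GPA and study-hour pairs from the table; the variable names are illustrative.

```python
import math

gpa = [2.3, 3.4, 3.75, 3.8, 1.7, 2.4, 2.6, 3.0, 3.2, 2.7, 2.9]
study_hours = [19, 26, 34, 25, 10, 20, 22, 30, 40, 33, 25]

n = len(gpa)
mean_x = sum(gpa) / n
mean_y = sum(study_hours) / n

# Column sums from the table: GPA*Study, GPA^2, Study^2
sum_xy = sum(x * y for x, y in zip(gpa, study_hours))
sum_xx = sum(x * x for x in gpa)
sum_yy = sum(y * y for y in study_hours)

# Computational formulas with the n - 1 divisor
cov_xy = (sum_xy - n * mean_x * mean_y) / (n - 1)
var_x = (sum_xx - n * mean_x ** 2) / (n - 1)
var_y = (sum_yy - n * mean_y ** 2) / (n - 1)

r_xy = cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))

print(f"covariance  = {cov_xy:.6f}")
print(f"variances   = {var_x:.6f}, {var_y:.6f}")
print(f"correlation = {r_xy:.6f}")
```

The results should agree with the Covariance, Variance, and Correlation rows of the table up to rounding.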
Intro to Probability:
Basic Definitions
• Random trials
– multiple outcomes & uncertainty
• Basic outcome
• Sample space
• Event
• Examples: coin toss, die roll, dice roll, deck
of cards, etc.
• Deck of cards will be defined as 52 cards,
13 of each suit (♠♣♥♦): 2, 3, ..., 10, J, Q, K, A
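To make the definitions concrete, the sketch below builds the sample space for drawing one card and treats "the card is a heart" as an event. It assumes every card is equally likely to be drawn, which the slide does not state explicitly.

```python
from fractions import Fraction
from itertools import product

suits = ["spades", "clubs", "hearts", "diamonds"]
ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]

# Sample space: every basic outcome of drawing one card
sample_space = set(product(ranks, suits))

# An event is a subset of the sample space
hearts = {card for card in sample_space if card[1] == "hearts"}

# With equally likely outcomes, P(event) = |event| / |sample space|
p_heart = Fraction(len(hearts), len(sample_space))

print(len(sample_space))  # 52
print(p_heart)            # 1/4
```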
Set Theory
• Venn diagrams
• Union
A ∪ B
• Intersection
A ∩ B
Set Theory
• A′ is the
complement of A
• A and B are mutually exclusive
• A ∩ B = ∅
Set Theory
• The set of events A1, A2, A3 … AN partitions the sample space
Rules for Set Operations
• Commutative
• A ∪ B = B ∪ A
• A ∩ B = B ∩ A
• Idempotency
• A ∪ A = A
• A ∩ A = A
• Complementation
• A ∪ A′ = S, A ∩ A′ = ∅
• (A ∪ B)′ = A′ ∩ B′
• (A ∩ B)′ = A′ ∪ B′
• Associative
• A ∪ (B ∪ C) = (A ∪ B) ∪ C
• A ∩ (B ∩ C) = (A ∩ B) ∩ C
• Distributive
• A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
• A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
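These rules can be spot-checked with Python's built-in set type, where | is union, & is intersection, and the complement is taken relative to the sample space S. The particular sets below are made up for illustration; a check on one example is not a proof.

```python
# Hypothetical sample space and events, chosen only for illustration
S = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 6, 8}

def complement(event):
    """Complement of an event relative to the sample space S."""
    return S - event

checks = {
    "commutative":     (A | B == B | A) and (A & B == B & A),
    "idempotency":     (A | A == A) and (A & A == A),
    "complementation": (A | complement(A) == S) and (A & complement(A) == set()),
    "de morgan":       (complement(A | B) == complement(A) & complement(B))
                       and (complement(A & B) == complement(A) | complement(B)),
    "associative":     (A | (B | C) == (A | B) | C) and (A & (B & C) == (A & B) & C),
    "distributive":    (A & (B | C) == (A & B) | (A & C))
                       and (A | (B & C) == (A | B) & (A | C)),
}

for rule, holds in checks.items():
    print(f"{rule}: {holds}")
```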
Fundamental Postulates
• P(A) should satisfy certain postulates
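• 1: P(A) ≥ 0 [Impossible events cannot occur]
• 2: P(S) = 1 [Some outcome must occur]
• 3: If A1, A2, A3 … AN are N mutually exclusive events then
P(A1 ∪ A2 ∪ … ∪ AN) = P(A1) + P(A2) + … + P(AN)
or
$P\left(\bigcup_{i=1}^{N} A_i\right) = \sum_{i=1}^{N} P(A_i)$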
Useful Results
• P(A′) = 1 – P(A)
• P(∅) = 0
• If A ⊂ B, then P(A) ≤ P(B)
• 0 ≤ P(A) ≤ 1
• P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
– avoid double counting
• P(A ∪ B) = 1 – P((A ∪ B)′) = 1 – P(A′ ∩ B′)
Useful Results
• If A1, A2, A3 … AN partition S, then
Example
• Suppose you apply to 3 schools: A, B, and C
• P(accepted @ A) = .20
• P(accepted @ B) = .40
• P(accepted @ C) = .60
• What is the probability of being rejected at all
3?
• What is the probability of being accepted
somewhere?
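A possible way to work this example, under the assumption (not stated on the slide) that the three admission decisions are independent: multiply the rejection probabilities, then use the complement rule.

```python
# Acceptance probabilities from the example
p_accept = {"A": 0.20, "B": 0.40, "C": 0.60}

# ASSUMPTION: decisions at the three schools are independent,
# so P(rejected at all 3) is the product of the rejection probabilities
p_rejected_everywhere = 1.0
for p in p_accept.values():
    p_rejected_everywhere *= (1 - p)

# Complement rule: P(accepted somewhere) = 1 - P(rejected everywhere)
p_accepted_somewhere = 1 - p_rejected_everywhere

print(round(p_rejected_everywhere, 3))  # 0.192
print(round(p_accepted_somewhere, 3))   # 0.808
```

Without independence, the three marginal probabilities alone do not determine either answer.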
Conditional Probability
• The conditional probability that A occurs given that B
is known to have occurred is
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided P(B) > 0
• Problem 2.5, p. 25
Conditional Probability
• Probability a beginning golfer makes a good shot if she selects
the correct club is 1/3. The probability of a good shot with the
wrong club is 1/5. There are 4 clubs in her golf bag, one of which
is the correct club for the next shot. Club selection is random.
• What is the probability of a good shot?
• Given that she hit a good shot, what is the
probability that she chose the wrong club?
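One way to work the golf problem is to split "good shot" over the two club choices, then apply the definition of conditional probability. The sketch below uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Club selection is random among 4 clubs, one of which is correct
p_correct_club = Fraction(1, 4)
p_wrong_club = 1 - p_correct_club

p_good_given_correct = Fraction(1, 3)
p_good_given_wrong = Fraction(1, 5)

# P(good) = P(good and correct club) + P(good and wrong club)
p_good_and_correct = p_correct_club * p_good_given_correct
p_good_and_wrong = p_wrong_club * p_good_given_wrong
p_good = p_good_and_correct + p_good_and_wrong

# P(wrong club | good) = P(good and wrong club) / P(good)
p_wrong_given_good = p_good_and_wrong / p_good

print(p_good)              # 7/30
print(p_wrong_given_good)  # 9/14
```

Even after a good shot, the wrong club remains the more likely choice, because three of the four clubs are wrong to begin with.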