Oregon Property Tax Estimator by yvk25305

Business Statistics I
MGT 515
Intro Statistics
Simple examples
• Should male drivers be charged higher auto insurance premiums?
– Types of cars driven
• What is the probability of mortgage default?
– How does it depend on the macroeconomic
environment, such as unemployment and GDP?
• Do lower taxes attract businesses and create
employment growth?
Population versus Sample
• Observing the entire population is too costly
– Drawing a small sample from the population usually
reduces the cost significantly
• Population
– All observations
• All US drivers constitute the population of US drivers
• Sample
– A small selection drawn from the population
• You, the students in this class, constitute a small sample of US
drivers.
• Observation
– Each driver represents an observation
– Each observation contains information on one or more
variables
• Driver information: gender, age, type of car driven, …
– Variables: GENDER, AGE, TYPE OF CAR
Population versus Sample
• Parameter
– Population characteristic
– Population mean is a parameter
• Average age of US drivers
• Statistic
– Sample characteristic
– Sample mean is a statistic
• Average age of the drivers in our class
Types of variables
• Continuous
– Age of the driver
– Miles driven per year
• Discrete Variables (integer count)
– Number of cars in the household
• Categorical (Qualitative)
– Driver Gender
– Type of vehicle (sedan versus SUV)
• Treated as binary variables
– Quantifying qualitative variables
• Ranked (ordered) data
– Censored data
• Income (reported in brackets rather than as exact values)
US States
• Population: 50 States + DC
• Each State is an Observation
• Sample would be a selection of several states
Observation   State   High Income Tax Rate (%)   High Income Tax Rate Bracket   Sales Tax Rate (%)   Private Employment 1999 (000)   Private Employment 2008 (000)   Private Employment Growth %   No Income Tax (1 = yes)
1            ALABAMA               5             3,000             4           1568.7        1610.9       2.6901256         0
2             ALASKA               0                               0            204.1         239.5       17.344439         1
3            ARIZONA             4.54          150,000            5.6           1809         2182.8        20.66335         0
4           ARKANSAS               7            31,000             6            954.3         989.9       3.7304831         0
5          CALIFORNIA            9.3            44,815           7.25          11752.4       12474.8      6.1468296         0
6           COLORADO             4.63                             2.9          1804.3        1965.2       8.9175858         0
7         CONNECTICUT              5            10,000             6            1434         1447.1       0.9135286         0
8           DELAWARE             5.95           60,000             0            357.8         370.5        3.549469         0
9            FLORIDA               0                               6           5850.4        6635.7       13.423014         1
10            GEORGIA               6             7,000             4            3265          3409        4.4104135         0
11              HAWAII            8.25           48,000             4            422.2         494.2       17.053529         0
12              IDAHO             7.8            24,736             6            433.7         529.2       22.019829         0
13             ILLINOIS             3                             6.25          5132.9        5093.3      -0.7714937         0
14             INDIANA            3.4                               6           2572.4        2518.3      -2.1030944         0
15               IOWA             8.98           62,055             5           1229.1        1270.3       3.3520462         0
16             KANSAS             6.45           30,000            5.3          1088.8        1130.9       3.8666422         0
17           KENTUCKY               6            75,000             6           1494.3        1531.6        2.496152         0
18           LOUISIANA              6            25,000             4           1523.7        1576.7        3.478375         0
19              MAINE             8.5            19,450             5            489.6         511.8       4.5343137         0
20           MARYLAND             5.5           500,000             6           1947.5        2111.1       8.4005135         0
21        MASSACHUSETTS           5.3                               5           2814.5        2847.9       1.1867117         0
22            MICHIGAN            4.35                              6           3917.7        3511.3      -10.373433         0
23          MINNESOTA             7.85           71,591            6.5          2225.6        2340.7       5.1716391         0
24          MISSISSIPPI             5            10,000             7            926.1         898.8      -2.9478458         0
25            MISSOURI              6             9,000          4.225          2305.5        2346.3       1.7696812         0
26            MONTANA             6.9            14,900             0            301.4         358.5       18.944924         0
27           NEBRASKA             6.84           27,001            5.5           742.5         800.8       7.8518519         0
28             NEVADA               0                              6.5           865.6        1104.7       27.622458         1
29        NEW HAMPSHIRE             0                               0            524.2         550.9       5.0934758         1
30          NEW JERSEY            8.97          500,000             7           3323.5        3407.1       2.5154205         0
31          NEW MEXICO            5.3            16,000             5            549.4         649.2       18.165271         0
32           NEW YORK             6.85           20,000             4           7013.6        7282.7       3.8368313         0
33        NORTH CAROLINA          7.75           60,000           4.25          3252.1        3422.5       5.2396913         0
34         NORTH DAKOTA           5.54          349,701             5            252.7         290.9       15.116739         0
35               OHIO             6.24          200,000            5.5          4791.4        4571.5      -4.5894728         0
36           OKLAHOMA             5.5             8,701            4.5          1170.3        1270.1       8.5277279         0
37            OREGON                9             7,300             0           1313.6         1422        8.2521315         0
38         PENNSYLVANIA           3.07                              6           4870.5        5051.6       3.7183041         0
39          RHODE ISLAND      25% of federal                        7           402.1         418.3       4.0288485         0
40         SOUTH CAROLINA           7            13,350             6            1515         1583.3      4.5082508         0
41          SOUTH DAKOTA            0                               4             300         335.4           11.8          1
42           TENNESSEE              0                               7           2295.1         2350       2.3920526         1
43               TEXAS              0                             6.25          7625.5        8839.8      15.924202         1
44               UTAH               5                             4.65            869         1043.2       20.04603         0
45            VERMONT             9.5           357,700             6           243.9         252.1       3.3620336         0
46             VIRGINIA           5.75           17,000             5           2801.1        3062.9      9.3463282         0
47           WASHINGTON             0                              6.5          2174.4         2414       11.019132         1
48          WEST VIRGINIA         6.5            60,000             6           585.1         614.4        5.007691         0
49            WISCONSIN           6.75          145,460             5           2385.1        2449.7      2.7084818         0
50            WYOMING               0                               4           173.6           229       31.912442         1
51        DIST. OF COLUMBIA       8.5            40,000           5.75          404.9         470.2       16.127439         0
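To make the parameter/statistic distinction concrete, the following minimal Python sketch treats the 51 growth values in the table (rounded to two decimals) as the population and draws a random sample of states. The sample size of 8 and the random seed are arbitrary choices for illustration, not part of the original example.

```python
import random
import statistics

# Private employment growth %, 1999-2008, for all 51 observations in the
# table above (observation order, rounded to two decimals).
growth = [
    2.69, 17.34, 20.66, 3.73, 6.15, 8.92, 0.91, 3.55, 13.42, 4.41,
    17.05, 22.02, -0.77, -2.10, 3.35, 3.87, 2.50, 3.48, 4.53, 8.40,
    1.19, -10.37, 5.17, -2.95, 1.77, 18.94, 7.85, 27.62, 5.09, 2.52,
    18.17, 3.84, 5.24, 15.12, -4.59, 8.53, 8.25, 3.72, 4.03, 4.51,
    11.80, 2.39, 15.92, 20.05, 3.36, 9.35, 11.02, 5.01, 2.71, 31.91,
    16.13,
]

# Parameter: computed from every observation in the population.
population_mean = statistics.mean(growth)

# Statistic: computed from a sample of states (size and seed are arbitrary).
rng = random.Random(515)
sample = rng.sample(growth, k=8)
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

Rerunning with a different seed gives a different sample mean, while the population mean stays fixed; that variability is what separates a statistic from a parameter.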
Visually Presenting The Data
[Chart: Private Employment Growth by State, 1999 – 2008, as % of each state's starting employment; x-axis: observation (state) 1–51; y-axis: employment growth %, from -15 to 35]

States by Growth Category
[Frequency chart, large categories: growth < 0, 0 < growth < 10, growth > 10; y-axis: number of states]
Frequency Distribution

Category     Count   Proportion (P.D.F.)   Cumulative (C.D.F.)
-10..-5         1    0.019608              0.019608
-5..0           4    0.078431              0.098039
0..5           20    0.392157              0.490196
5..10          11    0.215686              0.705882
10..15          3    0.058824              0.764706
15..20          7    0.137255              0.901961
20..25          3    0.058824              0.960784
25..30          1    0.019608              0.980392
30..50          1    0.019608              1
Total          51    1

[Charts: proportion (P.D.F.) and count by category; cumulative proportion (C.D.F.) by category]

P.D.F.: (Point) Probability Density Function
C.D.F.: Cumulative Distribution Function
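The frequency table can be rebuilt from the growth column with a short Python sketch. The binning rule is an assumption: each value goes to the first category whose upper edge is at least the value, so the lowest category also absorbs Michigan's -10.37, which is how the table's count of 1 for -10..-5 comes out.

```python
from bisect import bisect_left
from itertools import accumulate

# Private employment growth %, 1999-2008, in observation order (two decimals).
growth = [
    2.69, 17.34, 20.66, 3.73, 6.15, 8.92, 0.91, 3.55, 13.42, 4.41,
    17.05, 22.02, -0.77, -2.10, 3.35, 3.87, 2.50, 3.48, 4.53, 8.40,
    1.19, -10.37, 5.17, -2.95, 1.77, 18.94, 7.85, 27.62, 5.09, 2.52,
    18.17, 3.84, 5.24, 15.12, -4.59, 8.53, 8.25, 3.72, 4.03, 4.51,
    11.80, 2.39, 15.92, 20.05, 3.36, 9.35, 11.02, 5.01, 2.71, 31.91,
    16.13,
]

labels = ["-10..-5", "-5..0", "0..5", "5..10", "10..15",
          "15..20", "20..25", "25..30", "30..50"]
# Upper edge of each category (assumed upper-bound binning, see above).
upper_edges = [-5, 0, 5, 10, 15, 20, 25, 30, 50]

counts = [0] * len(upper_edges)
for value in growth:
    # Index of the first upper edge >= value.
    counts[bisect_left(upper_edges, value)] += 1

n = len(growth)
pdf = [c / n for c in counts]                           # proportion per category
cdf = [running / n for running in accumulate(counts)]   # cumulative proportion

for label, c, p, cum in zip(labels, counts, pdf, cdf):
    print(f"{label:>8} {c:3d} {p:9.6f} {cum:9.6f}")
```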
P.D.F. and C.D.F. Uniform Distribution

Raw Data                    Sorted Data                 P.D.F.                  C.D.F.
Observation ID    X         Observation ID    X         Count   Proportion      Cumulative Proportion
1               4         2                 3       1      0.142857     0.142857
2               3         1                 4       1      0.142857     0.285714
3               5         3                 5       1      0.142857     0.428571
4               7         5                 6       1      0.142857     0.571429
5               6         4                 7       1      0.142857     0.714286
6               9         7                 8       1      0.142857     0.857143
7               8         6                 9       1      0.142857        1

Range of the distribution: Maximum – Minimum = 9 – 3 = 6
Number of Observations: 7
P.D.F. = 1/7 = 1/N, as each observation has the same count of one
C.D.F. = (X – min + 1)/(max – min + 1) (discrete variable case)
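A small Python check, using exact fractions, that the closed-form C.D.F. above reproduces the cumulative proportions in the table. It is only a sketch for this example; the formula assumes the values are consecutive integers, as they are here (3 through 9).

```python
from fractions import Fraction

# Raw data from the table above: observation ID -> X.
raw = {1: 4, 2: 3, 3: 5, 4: 7, 5: 6, 6: 9, 7: 8}

values = sorted(raw.values())
n = len(values)
x_min, x_max = min(values), max(values)

# Empirical P.D.F. and C.D.F.: every observation carries the same weight, 1/N.
pdf = {x: Fraction(1, n) for x in values}
empirical_cdf = {x: Fraction(rank, n) for rank, x in enumerate(values, start=1)}


def uniform_cdf(x: int) -> Fraction:
    """Closed-form C.D.F. (X - min + 1) / (max - min + 1) for consecutive integers."""
    return Fraction(x - x_min + 1, x_max - x_min + 1)


for x in values:
    print(x, round(float(pdf[x]), 6), round(float(empirical_cdf[x]), 6),
          uniform_cdf(x) == empirical_cdf[x])
```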
P.D.F. of the Normal Distribution

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where μ is the mean and σ is the standard deviation.
Descriptive Measures
• The Mean
– Arithmetic Mean
– Weighted Mean
• Consumer Price Index
• The Median
– Center of the sorted (ranked) distribution
– If the number of observations is odd, the middle ranked
value is the median
– If the number of observations is even, the median is the
average of the two middle observations
• Mode
– Most frequent occurrence
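The descriptive measures listed above map onto Python's `statistics` module. In this sketch the odd-count data reuse the seven values from the uniform-distribution example; the prices and weights for the weighted mean are hypothetical, chosen only to show the mechanics (the actual Consumer Price Index uses a more elaborate weighting scheme).

```python
import statistics

data = [4, 3, 5, 7, 6, 9, 8]                 # odd count: the uniform example
mean_value = statistics.mean(data)           # arithmetic mean
median_odd = statistics.median(data)         # middle ranked value
median_even = statistics.median(data[:-1])   # even count: average of the two middle values

# Weighted mean; prices and weights are hypothetical, in the spirit of a price index.
prices = [2.0, 5.0, 10.0]
weights = [0.5, 0.3, 0.2]
weighted_mean = sum(w * p for w, p in zip(weights, prices)) / sum(weights)

# Mode: most frequent value(s); multimode returns every tie (Python 3.8+).
modes = statistics.multimode([1, 2, 2, 3, 3, 3])

print(mean_value, median_odd, median_even, weighted_mean, modes)
```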
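Geometric Mean

$$\bar{x}_G = (x_1 \times x_2 \times \cdots \times x_n)^{1/n}$$

One application is in the computation of the average rate of return:

$$1 + \bar{i} = \left[(1 + i_1)(1 + i_2)\cdots(1 + i_n)\right]^{1/n}$$

interest    1 + interest
0.05        1.05
0.06        1.06
0.05        1.05
0.06        1.06
0.055       1.055

Product of (1 + interest):    1.306901
Geometric Mean:               1.054991 (an average rate of return of about 5.499% per period)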
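The geometric-mean calculation in the table can be reproduced in a few lines of Python (3.8+ for `math.prod` and `statistics.geometric_mean`). The sketch also shows that the arithmetic mean of the rates (0.055) slightly overstates the compound average.

```python
import math
import statistics

rates = [0.05, 0.06, 0.05, 0.06, 0.055]    # per-period interest rates from the table
growth_factors = [1 + r for r in rates]

product = math.prod(growth_factors)                 # about 1.306901
geo_mean = product ** (1 / len(growth_factors))     # about 1.054991
average_rate = geo_mean - 1                         # about 0.054991

# The standard library gives the same geometric mean.
assert math.isclose(geo_mean, statistics.geometric_mean(growth_factors))

arithmetic_rate = statistics.mean(rates)            # simple average of the rates
print(round(product, 6), round(geo_mean, 6), round(average_rate, 6), arithmetic_rate)
```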
Splitting the data into QUARTILES

$$\text{Position of } Q_i = \frac{i(n + 1)}{4}$$

where i represents the ith quartile and n represents the number of RANKED observations.

Interquartile Range: Q3 – Q1: the middle 50% of the ranked observations

Observation    Employment growth %
First Quartile
22             -10.37
35              -4.59
24              -2.95
14              -2.10
13              -0.77
7                0.91
21               1.19
25               1.77
42               2.39
17               2.50
30               2.52
1                2.69
49               2.71
Second Quartile
15               3.35
45               3.36
18               3.48
8                3.55
38               3.72
4                3.73
32               3.84
16               3.87
39               4.03
10               4.41
40               4.51
19               4.53
48               5.01
Third Quartile
29               5.09
23               5.17
33               5.24
5                6.15
27               7.85
37               8.25
20               8.40
36               8.53
6                8.92
46               9.35
47              11.02
41              11.80
9               13.42
Fourth Quartile
34              15.12
43              15.92
51              16.13
11              17.05
2               17.34
31              18.17
26              18.94
44              20.05
3               20.66
12              22.02
28              27.62
50              31.91
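The quartile-position rule can be applied directly to the ranked growth data. This Python sketch interpolates linearly when i(n + 1)/4 is not a whole number (an assumption, since the slide does not say how fractional positions are handled); with n = 51 every position is a whole number, so no interpolation is needed. As far as we can tell, `statistics.quantiles(..., method="exclusive")` follows the same (n + 1) convention, and the sketch compares the two.

```python
import statistics

# Private employment growth %, 1999-2008, in observation order (two decimals).
growth = [
    2.69, 17.34, 20.66, 3.73, 6.15, 8.92, 0.91, 3.55, 13.42, 4.41,
    17.05, 22.02, -0.77, -2.10, 3.35, 3.87, 2.50, 3.48, 4.53, 8.40,
    1.19, -10.37, 5.17, -2.95, 1.77, 18.94, 7.85, 27.62, 5.09, 2.52,
    18.17, 3.84, 5.24, 15.12, -4.59, 8.53, 8.25, 3.72, 4.03, 4.51,
    11.80, 2.39, 15.92, 20.05, 3.36, 9.35, 11.02, 5.01, 2.71, 31.91,
    16.13,
]


def quartile(sorted_data, i):
    """Value at 1-based rank position i*(n+1)/4, interpolating between neighbours."""
    n = len(sorted_data)
    position = i * (n + 1) / 4
    k = int(position)              # rank of the lower neighbour
    fraction = position - k
    if k < 1:
        return sorted_data[0]
    if k >= n:
        return sorted_data[-1]
    lower, upper = sorted_data[k - 1], sorted_data[k]
    return lower + fraction * (upper - lower)


ranked = sorted(growth)
q1, q2, q3 = (quartile(ranked, i) for i in (1, 2, 3))
iqr = q3 - q1                      # spread of the middle 50% of ranked observations

print(q1, q2, q3, round(iqr, 2))
print(statistics.quantiles(ranked, n=4, method="exclusive"))
```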
Shape of the Distribution
• Mean
• Median
Frequency (number of months)                      PDF (proportion of months)
Category   INTC   HPQ   T      VZ   CAT   SO     INTC     HPQ        T         VZ    CAT      SO

-80        1     0     0    0      0    0     0.008333     0        0        0        0       0
-75        0     0     0    0      0    0         0        0        0        0        0       0
-70        0     0     0    0      0    0         0        0        0        0        0       0
-65        0     0     0    0      0    0         0        0        0        0        0       0
-60        0     0     0    0      0    0         0        0        0        0        0       0
-55        0     0     0    0      0    0         0        0        0        0        0       0
-50        1     0     0    0      1    0     0.008333     0        0        0    0.008333    0
-45        0     1     0    0      0    0         0    0.008333     0        0        0       0
-40        0     1     0    0      1    0         0    0.008333     0        0    0.008333    0
-35        1     0     0    0      0    0     0.008333     0        0        0        0       0
-30        1     0     0    0      0    0     0.008333     0        0        0        0       0
-25        3     1     0    1      1    0       0.025 0.008333      0    0.008333 0.008333    0
-20        2     4     2    1      1    0     0.016667 0.033333 0.016667 0.008333 0.008333    0
-15        5     5     5    0      4    0     0.041667 0.041667 0.041667     0    0.033333    0
-10        7     7     8    10     6    7     0.058333 0.058333 0.066667 0.083333 0.05 0.058333
-5       15    13    16    18    13    7       0.125 0.108333 0.133333 0.15 0.108333 0.058333
0       22    24    31    30    28    31    0.183333    0.2   0.258333 0.25 0.233333 0.258333
5       25    30    29    34    24    54    0.208333 0.25 0.241667 0.283333        0.2    0.45
10       14    18    21    20    28    17    0.116667 0.15       0.175 0.166667 0.233333 0.141667
15       13     8     5    3      7    3     0.108333 0.066667 0.041667 0.025 0.058333 0.025
20        9     4     2    2      4    1       0.075 0.033333 0.016667 0.016667 0.033333 0.008333
25        0     3     1    0      1    0         0      0.025 0.008333      0    0.008333    0
30        1     1     0    1      1    0     0.008333 0.008333     0    0.008333 0.008333    0

total    120    120   120   120   120   120      1        1        1         1      1        1
Shape of the Distribution
[Histograms of monthly stock price change, one panel each for INTC, HPQ, T, VZ, CAT, SO; x-axis: category from -80 to 30; y-axis: frequency]
Data source: Yahoo.com.
Data: monthly stock price changes from July 1999 – July 2009
[Chart: p.d.f. of monthly stock price change for INTC, HPQ, T, VZ, CAT, SO; x-axis: category from -80 to 30; y-axis: proportion from 0 to 0.5]
• In the case of stocks, spread is a measure of
risk and should be reflected in option
pricing
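Difference: $$x_i - \bar{x}$$

Sum of Squares: $$\sum_i (x_i - \bar{x})^2$$

Sample Variance: $$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Population Variance: $$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Sample Standard Deviation: $$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Population Standard Deviation: $$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$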
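The variance formulas above can be computed step by step. This Python sketch reuses the seven values from the uniform-distribution example (an arbitrary choice) and checks the hand-rolled results against `statistics.variance` (divides by n - 1) and `statistics.pvariance` (divides by N).

```python
import math
import statistics

data = [4, 3, 5, 7, 6, 9, 8]    # the small example used for the uniform distribution

n = len(data)
mean = sum(data) / n
sum_of_squares = sum((x - mean) ** 2 for x in data)

sample_variance = sum_of_squares / (n - 1)     # S^2: divide by n - 1
population_variance = sum_of_squares / n       # sigma^2: data treated as the whole population
sample_sd = math.sqrt(sample_variance)
population_sd = math.sqrt(population_variance)

# Cross-check with the standard library.
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(population_variance, statistics.pvariance(data))
print(sum_of_squares, sample_variance, population_variance, sample_sd, population_sd)
```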
INTCchange                      HPQchange                        Tchange

Mean                   -1.27715 Mean                   -0.51072 Mean                  -0.62194
Standard Error         1.308779 Standard Error          1.07527 Standard Error        0.744948
Median                 0.542984 Median                 0.592682 Median                -0.24774
Mode                    #N/A    Mode                    #N/A    Mode                   #N/A
Standard Deviation     14.33695 Standard Deviation     11.77899 Standard Deviation    8.160501
Sample Variance        205.5482 Sample Variance        138.7446 Sample Variance       66.59378
Kurtosis               7.822148 Kurtosis               2.740998 Kurtosis              0.612623
Skewness               -1.98331 Skewness               -0.91781 Skewness              -0.39071
Range                  105.4057 Range                  73.20115 Range                 45.68781
Minimum                -80.1469 Minimum                -47.0547 Minimum               -23.0226
Maximum                 25.2588 Maximum                26.14648 Maximum               22.66521
Sum                    -153.258 Sum                    -61.2867 Sum                   -74.6324
Count                       120 Count                       120 Count                     120

VZchange                         CATchange                       SOchange

Mean                    -0.48819 Mean                   0.010754 Mean                    0.831933
Standard Error          0.694313 Standard Error         0.992893 Standard Error          0.492301
Median                  0.097096 Median                  0.91357 Median                  1.030665
Mode                     #N/A    Mode                    #N/A     Mode                    #N/A
Standard Deviation      7.605816 Standard Deviation      10.8766 Standard Deviation      5.392892
Sample Variance         57.84844 Sample Variance        118.3003 Sample Variance         29.08328
Kurtosis                2.054396 Kurtosis               6.289378 Kurtosis                1.683112
Skewness                0.065181 Skewness               -1.56712 Skewness                -0.29751
Range                   54.74623 Range                  80.40594 Range                   33.39245
Minimum                 -26.5565 Minimum                  -54.464 Minimum                -13.9316
Maximum                 28.18969 Maximum                 25.9419 Maximum                 19.46083
Sum                     -58.5828 Sum                    1.290532 Sum                     99.83195
Count                        120 Count                        120 Count                       120
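Several entries in these summary tables are linked by simple identities: Standard Error = S/√n, Range = Maximum – Minimum, and Mean = Sum/Count. A short Python sketch recomputes them from a few of the reported values, which is a quick way to check that a summary table is internally consistent (the choice of INTC, T and SO is arbitrary).

```python
import math

# Values copied from the summary tables above: (mean, sd, minimum, maximum, sum, count).
summary = {
    "INTC": (-1.27715, 14.33695, -80.1469, 25.2588, -153.258, 120),
    "T": (-0.62194, 8.160501, -23.0226, 22.66521, -74.6324, 120),
    "SO": (0.831933, 5.392892, -13.9316, 19.46083, 99.83195, 120),
}

results = {}
for ticker, (mean, sd, lo, hi, total, n) in summary.items():
    standard_error = sd / math.sqrt(n)   # S / sqrt(n)
    value_range = hi - lo                # Maximum - Minimum
    mean_from_sum = total / n            # Sum / Count
    results[ticker] = (standard_error, value_range, mean_from_sum)
    print(f"{ticker}: SE={standard_error:.6f} range={value_range:.4f} "
          f"mean={mean_from_sum:.5f} (reported mean {mean})")
```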
Coefficient of Variation
$$CV = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\% = \frac{S}{\bar{x}} \times 100$$

INTC    HPQ      T       VZ     CAT      SO

CV      41.79   38.67   25.48   15.95   52.25   33.92
Skewness of the Distribution
INTCchange
Mean                 -1.27715
Standard Error        1.308779
Median                0.542984
Mode                  #N/A
Standard Deviation   14.33695
Sample Variance     205.5482
Kurtosis              7.822148
Skewness             -1.98331
Range               105.4057
Minimum             -80.1469
Maximum              25.2588
Sum                -153.258
Count               120

[Histogram: INTC monthly change, categories from -80 to 30]

Negative, or left skewed

Mean < Median (mean is pushed to the left by “outlying” observations on the left
end of the distribution)

Skewness < 0
Tchange
Mean                 -0.62194
Standard Error        0.744948
Median               -0.24774
Mode                  #N/A
Standard Deviation    8.160501
Sample Variance      66.59378
Kurtosis              0.612623
Skewness             -0.39071
Range                45.68781
Minimum             -23.0226
Maximum              22.66521
Sum                 -74.6324
Count               120

[Histogram: T monthly change, categories from -80 to 30]

This distribution is only slightly skewed to the left

Mean is slightly less than the median

Skewness measure is close to zero
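The link between a long left tail, mean < median, and negative skewness can be seen in a small Python sketch. It uses the adjusted Fisher-Pearson coefficient n/((n-1)(n-2)) · Σ((x - x̄)/S)³; our assumption is that this is the version spreadsheet tools such as Excel's SKEW report, so treat that correspondence as unverified. The "monthly changes" are hypothetical values with one large loss.

```python
import statistics


def sample_skewness(data):
    """Adjusted Fisher-Pearson skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)^3)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)          # sample standard deviation (divides by n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


# Hypothetical monthly changes with one large loss in the left tail.
changes = [-20.0, 1.0, 2.0, 3.0, 4.0, 5.0]

print(statistics.mean(changes), statistics.median(changes), sample_skewness(changes))
```

The single outlying loss pulls the mean below the median and makes the cubed deviations sum to a large negative number, the same pattern as in the INTC table.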
Chebyshev Rule

$$\left(1 - \frac{1}{k^2}\right) \times 100\%$$

The Rule:
No matter what the distribution is, the Rule defines the minimum percentage of observations that will be found
within k standard deviations of the mean (for k > 1)
Chebyshev Rule
INTCchange

Mean                 -1.27715
Standard Error        1.308779
Median                0.542984
Mode                  #N/A
Standard Deviation   14.33695
Sample Variance     205.5482
Kurtosis              7.822148
Skewness             -1.98331
Range               105.4057
Minimum             -80.1469
Maximum              25.2588
Sum                -153.258
Count               120

Based on the Rule, at least 75% of all observations should be found within 2 standard deviations of the mean (k = 2 gives 1 – 1/2² = 75%).

Defining the interval that contains at least 75% of the observations:

$$\bar{x} \pm 2S = -1.27715 \pm 2 \times 14.33695$$

At least 75% of the time, the stock of INTC is likely to exhibit a monthly change between -29.95% and +27.40%.

In reality, over the past decade the monthly change of INTC stock stayed within two standard deviations of the mean in 116 of the 120 months.
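The Chebyshev calculation for INTC can be written out in Python using only the mean, standard deviation and month counts quoted on the slide. It shows that the observed share (116 of 120) comfortably exceeds the guaranteed minimum.

```python
mean, sd = -1.27715, 14.33695                 # INTC monthly change, from the summary table
months_observed, months_inside = 120, 116     # counts reported in the text


def chebyshev_min_share(k: float) -> float:
    """Minimum share of observations within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k ** 2


k = 2
lower, upper = mean - k * sd, mean + k * sd
print(f"At least {chebyshev_min_share(k):.0%} within [{lower:.2f}, {upper:.2f}]")
print(f"Observed share inside: {months_inside / months_observed:.1%}")
```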
Covariance and Correlation
Stock Price Distribution

$$COV(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

Covariance matrix:
        INTC      HPQ       T         VZ        CAT       SO
INTC    103.47
HPQ      45.48    134.69
T        18.517    56.478    43.2
VZ       19.14     32.899    28.279    24.377
CAT     -79.73    108.82     38.508     7.0726   396.84
SO      -54        21.263    -0.342    -8.726    133.16    63.539

$$r = \frac{cov(x, y)}{S_x S_y}$$

Correlation matrix:
        INTC       HPQ        T          VZ         CAT        SO
INTC     1
HPQ      0.385253   1
T        0.276967   0.740395   1
VZ       0.381103   0.574143   0.871444   1
CAT     -0.39346    0.470676   0.294101   0.071908   1
SO      -0.66604    0.229844  -0.00653   -0.22172    0.838598   1

SO (Southern Company) is strongly correlated with CAT and negatively correlated with INTC.
VZ and T are strongly correlated.
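The correlation matrix follows from the covariance matrix by dividing each covariance by the product of the two standard deviations, taken here as square roots of the diagonal (variance) entries. The Python sketch below uses a few entries copied from the table; small differences from the printed correlations come from rounding in the copied covariances.

```python
import math

# Covariance matrix entries copied from the table above (lower triangle only).
cov = {
    ("T", "T"): 43.2,
    ("VZ", "VZ"): 24.377,
    ("VZ", "T"): 28.279,
    ("INTC", "INTC"): 103.47,
    ("HPQ", "HPQ"): 134.69,
    ("HPQ", "INTC"): 45.48,
}


def correlation(x: str, y: str) -> float:
    """r = cov(x, y) / (S_x * S_y), with S the square root of the variance entry."""
    c = cov.get((x, y), cov.get((y, x)))
    return c / math.sqrt(cov[(x, x)] * cov[(y, y)])


print(round(correlation("VZ", "T"), 4), round(correlation("HPQ", "INTC"), 4))
```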
? Regression ?
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.871444
R Square             0.759415
Adjusted R Square    0.757394
Standard Error       3.250819
Observations         121

ANOVA
              df     SS          MS          F           Significance F
Regression      1    3969.577    3969.577    375.6287    1.28E-38
Residual      119    1257.571    10.56782
Total         120    5227.148

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept    -10.1602       1.88385          -5.3933    3.57E-07   -13.8904    -6.42996    -13.8904      -6.42996
VZ             1.160089     0.059857         19.38114   1.28E-38     1.041567    1.27861     1.041567      1.27861
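Several numbers in this output can be tied back to the covariance and correlation tables. Our inference (not stated on the slide) is that this is a regression of T on VZ: the Multiple R equals the T/VZ correlation, and the VZ coefficient is close to cov(T, VZ)/var(VZ). The Python sketch below checks those links and the one-regressor identities R² = r² and F = t².

```python
cov_t_vz, var_vz = 28.279, 24.377    # from the covariance matrix
r_t_vz = 0.871444                    # Multiple R in the output above

slope = cov_t_vz / var_vz            # OLS slope: cov(x, y) / var(x)
r_squared = r_t_vz ** 2              # with one regressor, R^2 = r^2

coef, se = 1.160089, 0.059857        # VZ coefficient and its standard error
t_stat = coef / se                   # t = coefficient / standard error
f_stat = t_stat ** 2                 # with one regressor, F = t^2

print(round(slope, 4), round(r_squared, 6), round(t_stat, 3), round(f_stat, 2))
```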
Covariance (revisited)
$$\sigma_{XY} = \sum_{i=1}^{N} (X_i - E[X])\,(Y_i - E[Y])\,P(X_i, Y_i)$$

• Measure of the relationship between the X and Y variables
– Independent variables have zero covariance (the converse does not hold in general: zero covariance does not guarantee independence)

X      Y      X - E[X]    Y - E[Y]    (X - E[X])(Y - E[Y]) × P, with P = 1/4
2      4      -1.75       -0.75        0.328125
3      6      -0.75        1.25       -0.23438
6      5       2.25        0.25        0.140625
4      4       0.25       -0.75       -0.04688

E[.]   3.75   4.75                     Sum (covariance): 0.1875

Positive relationship: higher values of X tend to correspond to higher values of Y
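The covariance table can be reproduced exactly with Python's `fractions` module, treating the four (X, Y) pairs as equally likely outcomes (probability 1/4 each, as the table's last column implies).

```python
from fractions import Fraction

# Joint outcomes from the table above, each with probability 1/4.
pairs = [(2, 4), (3, 6), (6, 5), (4, 4)]
p = Fraction(1, len(pairs))

e_x = sum(p * x for x, _ in pairs)    # E[X]
e_y = sum(p * y for _, y in pairs)    # E[Y]
cov_xy = sum(p * (x - e_x) * (y - e_y) for x, y in pairs)

print(float(e_x), float(e_y), float(cov_xy))
```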
Properties of the sum of two random
variables
$$E[X + Y] = E[X] + E[Y]$$

$$Var(X + Y) = \sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y + 2\sigma_{XY}$$
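Both properties can be verified exactly on the four equally likely (X, Y) outcomes from the covariance table. In this Python sketch the helper `expect` is our own, introduced only for illustration.

```python
from fractions import Fraction

pairs = [(2, 4), (3, 6), (6, 5), (4, 4)]    # equally likely outcomes from the covariance table
p = Fraction(1, len(pairs))


def expect(f):
    """Expected value of f(X, Y) over the equally likely joint outcomes."""
    return sum(p * f(x, y) for x, y in pairs)


e_x, e_y = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - e_x) ** 2)
var_y = expect(lambda x, y: (y - e_y) ** 2)
cov_xy = expect(lambda x, y: (x - e_x) * (y - e_y))

e_sum = expect(lambda x, y: x + y)
var_sum = expect(lambda x, y: (x + y - e_sum) ** 2)

assert e_sum == e_x + e_y                       # E[X + Y] = E[X] + E[Y]
assert var_sum == var_x + var_y + 2 * cov_xy    # Var(X + Y) identity
print(float(e_sum), float(var_sum))
```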
Expected Return
• Lottery sells 20 tickets with one winning ticket.
The winning ticket pays 10 dollars. What is the
expected return from purchasing one ticket?
– Probability the ticket is winning: 1/20 = 0.05
– Expected Return: Payoff times its probability
• From holding only one ticket:
Probability (Winning) times Prize
0.05 × $10 = $0.50
Expected Return II
• What if the lottery has three winning tickets:
– One ticket paying 10 dollars
– Two tickets paying 5 dollars each
• What is the expected return now?
$$E[X] = \sum_{i=1}^{N} \Pr(X_i) \, X_i$$
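Applying E[X] = Σ Pr(X_i)·X_i to both lotteries takes a few lines of Python. The sketch assumes, as the slides do, 20 tickets with each listed prize attached to exactly one ticket; `expected_payoff` is a hypothetical helper name.

```python
from fractions import Fraction

TICKETS = 20


def expected_payoff(prizes):
    """E[X] = sum of Pr(X_i) * X_i, where each listed prize sits on one ticket."""
    return sum(Fraction(1, TICKETS) * prize for prize in prizes)


one_winner = expected_payoff([10])            # the first lottery
three_winners = expected_payoff([10, 5, 5])   # one $10 ticket and two $5 tickets

print(f"${float(one_winner):.2f}  ${float(three_winners):.2f}")
```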
Expected Return of an investment
portfolio
              Daily Stock Return ($)     Weighted Return           Portfolio
              INTC       T               INTC        T             Return
              -0.17     -0.11            -0.119      -0.033        -0.152
               0.45     -0.02             0.315      -0.006         0.309
               0.02      0.08             0.014       0.024         0.038
               0.73      0.08             0.511       0.024         0.535
              -0.36     -0.01            -0.252      -0.003        -0.255

E[.]           0.134     0.004            0.0938      0.0012        0.095
Weights        0.7       0.3
Weighted Average Return: 0.095

Standard Deviation of the Portfolio Return as the Measure of Risk of the Portfolio

$$\sigma_p = \left(w_x^2 \sigma_x^2 + w_y^2 \sigma_y^2 + 2 w_x w_y \sigma_{xy}\right)^{1/2}$$
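The portfolio formulas can be applied to the five daily returns in the table. This Python sketch treats the five days as equally likely outcomes, so it uses population (divide-by-N) variances and covariance; that choice is an assumption. It then checks that σ_p from the formula matches the standard deviation of the portfolio returns computed directly.

```python
import math
import statistics

intc = [-0.17, 0.45, 0.02, 0.73, -0.36]   # daily returns from the table
t = [-0.11, -0.02, 0.08, 0.08, -0.01]
w_x, w_y = 0.7, 0.3                       # portfolio weights

portfolio = [w_x * a + w_y * b for a, b in zip(intc, t)]
expected_return = statistics.mean(portfolio)

# Population variances and covariance (five equally likely days).
var_x = statistics.pvariance(intc)
var_y = statistics.pvariance(t)
mean_x, mean_y = statistics.mean(intc), statistics.mean(t)
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(intc, t)) / len(intc)

sigma_p = math.sqrt(w_x**2 * var_x + w_y**2 * var_y + 2 * w_x * w_y * cov_xy)

# The formula agrees with the standard deviation of the portfolio returns themselves.
assert math.isclose(sigma_p, statistics.pstdev(portfolio))
print(round(expected_return, 4), round(sigma_p, 4))
```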
PROBABILITY
• Likelihood
• A coin has two sides: reverse and obverse
– Number of all possible outcomes of a coin toss
• 2
– Probability that obverse shows:
• One particular outcome (obverse)
– Probability = favorable outcomes / all possible outcomes = ½ = 0.5 = 50%
• Mutually Exclusive events
– Obverse OR Reverse
• Cannot occur simultaneously
• Collectively Exhaustive
– A set of events of which one must occur: the probability that some event
from the set occurs is 100%
• Set of events: Obverse, Reverse. One of these must occur.
Between these two events all possibilities are covered.
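The coin-toss definitions can be expressed with Python sets: classical probability counts favorable outcomes, mutual exclusivity means an empty intersection, and collective exhaustiveness means the union covers every outcome. The `probability` helper is our own illustrative name.

```python
from fractions import Fraction

outcomes = {"obverse", "reverse"}       # all possible outcomes of one toss


def probability(event):
    """Classical probability: favorable outcomes / all possible outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))


obverse, reverse = {"obverse"}, {"reverse"}

mutually_exclusive = not (obverse & reverse)               # cannot occur together
collectively_exhaustive = (obverse | reverse) == outcomes  # one of them must occur

print(probability(obverse), mutually_exclusive, collectively_exhaustive,
      probability(obverse | reverse))
```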
Growth Category (%)   INTC   Probability (p.d.f.)
-80                     1    0.008333
-75                     0    0
-70                     0    0
-65                     0    0
-60                     0    0
-55                     0    0
-50                     1    0.008333
-45                     0    0
-40                     0    0
-35                     1    0.008333
-30                     1    0.008333
-25                     3    0.025
-20                     2    0.016667
-15                     5    0.041667
-10                     7    0.058333
-5                     15    0.125
0                      22    0.183333
5                      25    0.208333
10                     14    0.116667
15                     13    0.108333
20                      9    0.075
25                      0    0
30                      1    0.008333
total                 120

Probability that the stock of Intel will lose at least 20% of its value during a month would be: 7.5%
What is the probability that the stock of Intel will gain between 5 and 15% during a month?
What is the probability that the stock of Intel will gain value during a month?
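Questions like these reduce to summing category probabilities. The Python sketch assumes each category label is the upper bound of its range (for example, 5 means a change in (0, 5]). That reading reproduces the 7.5% quoted above, and the resulting count of gaining months (62) matches the number of increases on the next slide, but the convention is still our assumption.

```python
from fractions import Fraction

# INTC frequency table: category label -> number of months (120 in total).
counts = {-80: 1, -75: 0, -70: 0, -65: 0, -60: 0, -55: 0, -50: 1, -45: 0,
          -40: 0, -35: 1, -30: 1, -25: 3, -20: 2, -15: 5, -10: 7, -5: 15,
          0: 22, 5: 25, 10: 14, 15: 13, 20: 9, 25: 0, 30: 1}
total = sum(counts.values())


def prob(condition):
    """Share of months whose category label satisfies the condition."""
    return Fraction(sum(c for label, c in counts.items() if condition(label)), total)


# Assumption: each label is the upper bound of its category.
lose_20_or_more = prob(lambda label: label <= -20)
gain_5_to_15 = prob(lambda label: 5 < label <= 15)     # categories 10 and 15
gain_any = prob(lambda label: label > 0)

print(float(lose_20_or_more), float(gain_5_to_15), float(gain_any))
```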
Conditional Probability
What is the probability that the stock of INTC will increase after falling?

Possible States of the World (outcomes, total outcomes 120, 58 declines and 62 increases)
Joint Probabilities:
1)         Stock declines after an increase 34 outcomes; 28.33%
2)         Stock declines after a decline 24 outcomes; 20%
3)         Stock increases after an increase 27 outcomes; 22.5%
4)         Stock increases after a decline 35 outcomes; 29.17%

Conditional Probability (applied to the subset on which the conditioning is being made):
Pr(A|B) = Pr(A and B)/Pr (B)
A decline in the previous month occurs in 24 + 35 = 59 outcomes, so the probability that the stock of Intel will increase after falling in the previous
month is: Pr(Increase|Decline) = 35/59 = 59.3%

Pr(Decline|Decline) = 24/59 = 40.7%

Pr(Decline) = 58/120 = 48.3%

Pr(Increase) = 62/120 = 51.7%

Two Events are Independent if: Pr(A|B) = Pr(A)
Because Pr(Increase|Decline) = 59.3% differs from Pr(Increase) = 51.7%, we can argue that in the case of INTC, the behavior of the stock in a given month depends
on its behavior in the prior month.
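The conditional probabilities follow mechanically from the four joint counts. In this Python sketch each key is (previous month, current month); the conditioning set "after a decline" is every month whose previous month was a decline, which the counts give as 24 + 35 = 59. The helper names are our own.

```python
from fractions import Fraction

# Joint outcomes: (previous month, current month) -> number of months.
joint = {
    ("increase", "decline"): 34,
    ("decline", "decline"): 24,
    ("increase", "increase"): 27,
    ("decline", "increase"): 35,
}
total = sum(joint.values())   # 120


def pr_current(state):
    return Fraction(sum(c for (_, cur), c in joint.items() if cur == state), total)


def pr_previous(state):
    return Fraction(sum(c for (prev, _), c in joint.items() if prev == state), total)


def pr_given_previous(current, previous):
    """Pr(current | previous) = Pr(current and previous) / Pr(previous)."""
    return Fraction(joint[(previous, current)], total) / pr_previous(previous)


p_up_after_down = pr_given_previous("increase", "decline")
p_up = pr_current("increase")
print(f"{float(p_up_after_down):.1%} vs {float(p_up):.1%}; "
      f"independent: {p_up_after_down == p_up}")
```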
Multiplication Rule
$$\Pr(A \mid B) = \frac{\Pr(A \text{ and } B)}{\Pr(B)}$$

Solve for the joint probability:

$$\Pr(A \text{ and } B) = \Pr(A \mid B)\,\Pr(B)$$

Pr(Increase and Decline) = 29.17%
Pr(Decline) = 49.2%
Pr(Increase | Decline) = 59.3%
Multiplication Rule for INDEPENDENT
EVENTS

Events A and B are statistically independent iff Pr(A|B) = Pr(A)

Multiplication rule simplifies:

Pr(A and B) = Pr(A|B) Pr(B) = Pr(A) Pr(B)
$$\Pr(A) = \sum_{i=1}^{N} \Pr(A \mid B_i)\,\Pr(B_i)$$

where the B_i represent N mutually exclusive and collectively exhaustive events.

Pr(INTC increases) = Pr(increase|decline) Pr(decline) + Pr(increase|increase) Pr(increase), where the conditioning decline and increase refer to the previous month

62/120 = (35/59)(59/120) + (27/61)(61/120)
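The law of total probability can be checked exactly for INTC. The previous-month counts (59 declines, 61 increases) are derived from the joint outcomes on the Conditional Probability slide (24 + 35 and 34 + 27); the Python sketch recombines them and recovers Pr(increase) = 62/120.

```python
from fractions import Fraction

# Previous-month states partition the 120 months: 59 declines, 61 increases.
pr_prev = {"decline": Fraction(59, 120), "increase": Fraction(61, 120)}
pr_up_given_prev = {"decline": Fraction(35, 59), "increase": Fraction(27, 61)}

# Law of total probability: Pr(A) = sum over i of Pr(A | B_i) Pr(B_i).
pr_up = sum(pr_up_given_prev[s] * pr_prev[s] for s in pr_prev)
print(pr_up, float(pr_up))
```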
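Bayes' Theorem

$$\Pr(A \text{ and } B) = \Pr(A \mid B)\,\Pr(B)$$

Conditional Probability: $$\Pr(A \mid B) = \frac{\Pr(A \text{ and } B)}{\Pr(B)}$$

Similarly: $$\Pr(B \mid A) = \frac{\Pr(A \text{ and } B)}{\Pr(A)} = \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A)}$$

Note (B and C mutually exclusive and collectively exhaustive): $$\Pr(A) = \Pr(A \mid B)\,\Pr(B) + \Pr(A \mid C)\,\Pr(C)$$

Combining these: $$\Pr(B \mid A) = \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A \mid B)\,\Pr(B) + \Pr(A \mid C)\,\Pr(C)}$$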
Bayes’ Theorem

•    Consider the following scenario: on average, the stock of INTC posts monthly
declines 40% of the time. The stock is rated by a number of analysts. In the past,
in months when the stock posted an increase, the analysts' average rating had been
accumulate 70% of the time; in months when the stock declined, the rating had been
accumulate 30% of the time. Recently, the stock of INTC again received an average
rating of accumulate. What is the probability that the stock will increase given this
rating?
• Pr(Increase) = 60%
• Pr(Decline) = 40%
• Pr(Accumulate|Increase) = 70%
• Pr(Accumulate|Decline) = 30%
• The question is: what is Pr(Increase|Accumulate)?

$\Pr(\text{Increase} \mid \text{Accumulate}) = \frac{\Pr(\text{Accumulate} \mid \text{Increase})\Pr(\text{Increase})}{\Pr(\text{Accumulate} \mid \text{Increase})\Pr(\text{Increase}) + \Pr(\text{Accumulate} \mid \text{Decline})\Pr(\text{Decline})}$

$= \frac{0.7 \times 0.6}{0.7 \times 0.6 + 0.3 \times 0.4} = \frac{0.42}{0.54} \approx 0.778$
Practice example
• In the past, the student failed, on average,
one in every five exams he took.
Furthermore, 50% of the time when he failed
an exam he had studied hard for it. However,
in 90% of the exams that he passed he had also
studied hard. For the upcoming exam
the student has been studying hard. What is
the probability that he will pass this exam?
Solution
• Pr(F) = 0.2
• Pr(P) = 0.8
These two are mutually exclusive and exhaustive events

•     Pr(S|F)=0.5 Half of the time when the exam was failed, the student had been studying
•     Pr(S|P)=0.9, 90% of the time when passing occurred, the student had been studying
•     Pr(P|S)=?

$\Pr(P \mid S) = \frac{\Pr(S \mid P)\Pr(P)}{\Pr(S \mid P)\Pr(P) + \Pr(S \mid F)\Pr(F)} = \frac{0.9 \times 0.8}{0.9 \times 0.8 + 0.5 \times 0.2} = \frac{0.72}{0.82} \approx 0.878$
Discrete Random Variable
• Discrete Variable – variables generated from a
counting process
- Enrollment in class
- Enrollment in econ 101 depending on the time of day it
is offered
- Number of accidents on Buffalo highways on a
given day.
- Daily number of listings on eBay of a given item
- These variables are numerical but NOT
continuous
Characteristics of The Distribution
• Each observed value has its own probability: its count divided by the 20 observed days.
Raw data                   Sorted by listings     Frequency distribution
Day   Listings             Day   Listings          Listings   Count   Probability
 1      12                   4      8                  8         1       0.05
 2      15                  12      9                  9         1       0.05
 3      10                   3     10                 10         3       0.15
 4       8                   7     10                 11         2       0.10
 5      14                  17     10                 12         5       0.25
 6      17                  10     11                 13         2       0.10
 7      10                  18     11                 14         3       0.15
 8      14                   1     12                 15         1       0.05
 9      12                   9     12                 16         1       0.05
10      11                  11     12                 17         1       0.05
11      12                  15     12
12       9                  19     12
13      16                  16     13
14      14                  20     13
15      12                   5     14
16      13                   8     14
17      10                  14     14
18      11                   2     15
19      12                  13     16
20      13                   6     17

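Computing the average, mean (expected value):
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ (over the N observations), or equivalently
$E[X] = \sum_{i=1}^{N} x_i \Pr(x_i)$ (where the sum now runs over the N distinct values)

Variance of a Discrete Random Variable:
$\sigma^2 = \sum_{i=1}^{N} (x_i - E[X])^2 \Pr(x_i)$

Standard Deviation:
$\sigma = \sqrt{\sigma^2}$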
Binary (“dummy”) Variables

• Assume two values only, usually 0, 1
– Good weather =1 if good, =0 otherwise
– Winning/losing the lottery
Possible Combinations
•       Consider a simple setup: on average, we have good weather in summer time 80% of the time.
Given that, what is the probability that next weekend (Friday to Sunday) we will have
two good weather days and one bad weather day?
•       ASSUMPTION: weather is independent from day to day! That is, the probability of
good weather on any day is 0.8, independent of the weather on the previous day
•       One possibility: Friday – good; Saturday – good; Sunday – bad. What is the
probability of that?
– Pr(Friday=1)*Pr(Sat=1)*Pr(Sun=0) = 0.8*0.8*0.2 = 0.128, writing Pr(day=1) = p and Pr(day=0) = (1-p)
•     p – probability of success
– The above is just one particular draw.
•       Another possibility: Friday – bad; Saturday – good; Sunday – good, and so on.
Using Factorials to determine the number of possible combinations
Drawing X objects from a set of n objects:

$_{n}C_{X} = \frac{n!}{X!(n-X)!}$

$_{3}C_{2} = \frac{3!}{2!(3-2)!} = \frac{3 \cdot 2 \cdot 1}{2 \cdot 1 \cdot (1)} = 3$

Fri   Sat   Sun   Probability
G     G     B     $p \cdot p \cdot (1-p) = p^2(1-p)^1$
G     B     G     $p \cdot (1-p) \cdot p = p^2(1-p)^1$
B     G     G     $(1-p) \cdot p \cdot p = p^2(1-p)^1$

Each ordering has probability $p^X(1-p)^{n-X}$, so Pr(2G) = 3 × 0.128 = 0.384
Simple Practice
• What is the probability that we will have two
• What is the probability that we will have AT
LEAST two bad days in a weekend?
• What is the probability that two consecutive
days in a weekend will be good?
Example 2
• Suppose there are only about 20 people in the
US who are used as referees by journals that
publish papers in a certain area of research.
Each journal assigns 2 referees to a paper.
If you submit two papers, one to each journal,
what is the probability that at least one of
the referees will be the same?
Binomial Distribution
$\Pr(X) = \frac{n!}{X!(n-X)!} p^X (1-p)^{n-X}$

Pr(X) – probability of X successes drawn from the sample of n observations with
p representing the probability of success of each observation

Assumptions:
Probability p stays constant with each draw (effectively this assumes that the sample is
large)
Outcomes are independent
The variable is binary

$E[X] = np$
$\sigma^2 = np(1-p)$
Binomial Distribution Review Example
• Suppose that, historically, it rains on only two
of the first 10 days of September. What is the
probability that it will not rain during the 4-day
Labor Day weekend?
– Note how the probability simplifies to the simple
multiplication of individual probabilities
• What is the probability that it will not rain on
any three of the four days of the weekend?
Continuous Distributions
UNIFORM DISTRIBUTION
Unsorted X (42 observations):
2, 4, 3, 1, 5, 7, 8, 6, 9, 10, 0, 12, 11, 15, 13, 14, 20, 18, 19, 17, 16,
2, 4, 3, 1, 5, 7, 8, 6, 9, 10, 0, 12, 11, 15, 13, 14, 20, 18, 19, 17, 16
Sorted X: each integer from 0 to 20 appears twice (0, 0, 1, 1, 2, 2, ..., 20, 20)

X    Count   PDF        CDF
0    2       0.047619   0.047619
1    2       0.047619   0.095238
2    2       0.047619   0.142857
3    2       0.047619   0.190476
4    2       0.047619   0.238095
5    2       0.047619   0.285714
6    2       0.047619   0.333333
7    2       0.047619   0.380952
8    2       0.047619   0.428571
9    2       0.047619   0.476190
10   2       0.047619   0.523810
11   2       0.047619   0.571429
12   2       0.047619   0.619048
13   2       0.047619   0.666667
14   2       0.047619   0.714286
15   2       0.047619   0.761905
16   2       0.047619   0.809524
17   2       0.047619   0.857143
18   2       0.047619   0.904762
19   2       0.047619   0.952381
20   2       0.047619   1.000000

PDF: $f(x) = \frac{1}{\max - \min}, \quad x \in [\min, \max]$
Mean: $\mu = \frac{\min + \max}{2}$
Variance: $\sigma^2 = \frac{(\max - \min)^2}{12}$
Normal Distribution
• Sometimes called Gaussian Distribution
• The Bell Curve
– Symmetrical
– Greater mass at the center
• PDF diminishes as you move away from the center
• Measures of central tendency are equal
– Mean, median, mode
• Interquartile Range is approximately 4/3 standard deviations
– About 50% of all observations are contained between the mean plus
or minus 2/3 standard deviation
• Infinite Range
Why Study Normal Distribution

• Central Limit Theorem Property
• Extensive use of the Normal Distribution in
Statistics/Econometrics
Standardizing the Normal Distribution
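Normal Distribution
P.D.F. (X): $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
$X \sim N(\mu, \sigma)$

Standardized Normal Distribution
$Z = \frac{X - \mu}{\sigma}$
P.D.F. (Z): $f(Z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}Z^2}$
$Z \sim N(0, 1)$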
Computing Normal CDF, CDF as the probability
• In Excel use NORMDIST
– NORMDIST(X, μ, σ, cumulative) = CDF(X)
• TRUE if the CDF is computed (CDF up to the X value)
• FALSE if the PDF is computed instead (PDF at X)
– NORMINV(probability, μ, σ) = X
– NORMSDIST(Z) = CDF(Z)
• Computes the CDF of the standardized normal distribution
– NORMSINV(probability) = Z
• For the Standardized Normal Distribution
– Use the textbook Z tables
Example using the textbook Z-dist. tables
• Consider that X ~ N (10, 4)
– What is the probability that we will draw at
random a value of X that is less than 8?
• Do it using the tables first
• Do it using excel
– What is the probability that we will draw at
random a value of X that is greater than 11?
– What is the probability that we will draw at
random a value of X between 7 and 9?
Simple examination of Data for Normality
1 Comparison of mean, mode and median
2 Kurtosis and Skewness
3 Quantile-Quantile Plot

Period_Year  Period_Month  UF_RetailAs%Total
2009          8            13.15379
2009          7            13.21556
2009          6            13.20713
2009          5            13.09604
2009          4            13.29532
2009          3            13.48186
2009          2            13.61643
2009          1            13.9316
2008         12            14.12047
2008         11            13.9948
2008         10            13.57838
2008          9            13.40963
2008          8            13.45029
2008          7            13.42442
2008          6            13.44574
2008          5            13.3493
2008          4            13.33186
2008          3            13.47661
2008          2            13.49046
2008          1            13.9232
2007         12            14.25173
2007         11            13.955
Confidence Interval
Distribution of X: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Distribution of the means
• Samples of size n
• Distribution of the sample means
• The sample mean is an unbiased estimator of the population mean

Standard Error of the mean
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

Central Limit Theorem implies:
$\bar{X} \sim N(\mu, \sigma_{\bar{X}})$

Confidence Interval
$\bar{X} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z\frac{\sigma}{\sqrt{n}}$

Increased sample size concentrates the distribution of the sample means around the
population mean.
As the sample size approaches infinity (the population size), the standard error
approaches zero, because the sample becomes equal to the population.
Confidence Interval
• General Form
• If St. Dev. is known:
$\mu - z\sigma \le x \le \mu + z\sigma$, where $z = \frac{x - \mu}{\sigma}$

• If St. Dev. is unknown:
$\bar{x} - t_{n-1}S \le x \le \bar{x} + t_{n-1}S$, where $t = \frac{x - \bar{x}}{S}$

• Confidence Interval for the mean
• The standard error of the distribution of the means is used:

$\bar{x} - t_{n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar{x} + t_{n-1}\frac{S}{\sqrt{n}}$
• As n increases, the z and t distributions converge
• Increasing the sample size narrows the confidence interval
Sampling
• Probability versus non-probability sampling
– For instance, if Alaska represents 0.5% of the US
population, a US population sample should allocate a 0.5%
weight to Alaska. In this case, each state can be treated
as a stratum, and a number of random draws proportional to
its representation in the population can be taken from it
– Random sample
– Stratified sample
• Dividing data into strata
– Common characteristics, gender….
– Common characteristics, gender….
– Clustering Data
• Clusters represent the population
– Geographical sub samples
Sampling Problems
• Selection Bias
– Not all of population is represented in the frame
– Non-response in survey (omitted) bias
• Measurement Error
– Function of the sample size
• The standard error diminishes as the sample size increases

```
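The INTC slides derive conditional, joint and marginal probabilities from four joint counts. Below is a minimal Python sketch of those steps. It assumes the counts are keyed by (previous month's move, current month's move). The names `COUNTS`, `joint`, `pr_prev`, `pr_curr`, `conditional` and `total_probability` are illustrative only. Exact `Fraction` arithmetic keeps results such as 35/59 and 62/120 exact, so the multiplication rule and the law of total probability can be checked with equality rather than rounding.

```python
from fractions import Fraction

# Joint counts from the INTC slide, keyed by (previous month's move, current month's move).
COUNTS = {
    ("increase", "decline"): 34,
    ("decline", "decline"): 24,
    ("increase", "increase"): 27,
    ("decline", "increase"): 35,
}
TOTAL = sum(COUNTS.values())  # 120 monthly outcomes
MOVES = ("increase", "decline")


def joint(prev: str, curr: str) -> Fraction:
    """Pr(prev and curr) as an exact fraction of all outcomes."""
    return Fraction(COUNTS[(prev, curr)], TOTAL)


def pr_prev(prev: str) -> Fraction:
    """Pr(previous month = prev): sum the joint probabilities over the current month."""
    return sum(joint(prev, c) for c in MOVES)


def pr_curr(curr: str) -> Fraction:
    """Pr(current month = curr): sum the joint probabilities over the previous month."""
    return sum(joint(p, curr) for p in MOVES)


def conditional(curr: str, prev: str) -> Fraction:
    """Pr(curr | prev) = Pr(prev and curr) / Pr(prev)."""
    return joint(prev, curr) / pr_prev(prev)


def total_probability(curr: str) -> Fraction:
    """Law of total probability: sum over prev of Pr(curr | prev) * Pr(prev)."""
    return sum(conditional(curr, p) * pr_prev(p) for p in MOVES)


p_inc_dec = conditional("increase", "decline")
print(f"Pr(Increase | Decline) = {p_inc_dec} = {float(p_inc_dec):.1%}")
print(f"Pr(Increase) = {pr_curr('increase')} = {float(pr_curr('increase')):.1%}")
# Independence would require these two probabilities to be equal.
print("independent:", p_inc_dec == pr_curr("increase"))
# Multiplication rule: Pr(Increase and Decline) = Pr(Increase | Decline) * Pr(Decline).
print("joint via multiplication rule:", p_inc_dec * pr_prev("decline"))
```

Conditioning on the previous month's move is the key choice here. That is why the denominators are 59 and 61 rather than the 58 and 62 current-month totals.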
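Both Bayes' theorem exercises (the INTC accumulate rating and the student exam) reduce to the same two-state formula. The sketch below assumes exactly two mutually exclusive and exhaustive states, B and C. `bayes_two_events` is a hypothetical helper name, not from the slides.

```python
def bayes_two_events(p_e_given_b: float, p_b: float, p_e_given_c: float) -> float:
    """Return Pr(B | E) when B and C (= not B) are mutually exclusive and exhaustive.

    Numerator:   Pr(E | B) Pr(B)
    Denominator: Pr(E) = Pr(E | B) Pr(B) + Pr(E | C) Pr(C), with Pr(C) = 1 - Pr(B)
    """
    p_c = 1.0 - p_b
    numerator = p_e_given_b * p_b
    evidence = numerator + p_e_given_c * p_c
    return numerator / evidence


# INTC: B = Increase, C = Decline, E = Accumulate rating.
p_increase_given_accumulate = bayes_two_events(0.7, 0.6, 0.3)
# Student: B = Pass, C = Fail, E = Studied hard.
p_pass_given_study = bayes_two_events(0.9, 0.8, 0.5)

print(f"Pr(Increase | Accumulate) = {p_increase_given_accumulate:.3f}")
print(f"Pr(Pass | Studied)        = {p_pass_given_study:.3f}")
```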
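The frequency table on the "Characteristics of The Distribution" slide can be rebuilt from its 20 daily listing values. Below is a sketch using `collections.Counter`. `frequency_table` is an illustrative name, and probabilities are stored as exact fractions of the 20 days.

```python
from collections import Counter
from fractions import Fraction

# Daily number of listings for days 1-20, copied from the slide.
listings = [12, 15, 10, 8, 14, 17, 10, 14, 12, 11,
            12, 9, 16, 14, 12, 13, 10, 11, 12, 13]


def frequency_table(values):
    """Return {value: (count, probability)}, where probability = count / number of observations."""
    n = len(values)
    counts = Counter(values)
    return {v: (c, Fraction(c, n)) for v, c in sorted(counts.items())}


table = frequency_table(listings)
for value, (count, prob) in table.items():
    print(f"{value:>3} {count:>3} {float(prob):.2f}")
```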
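The expected value, variance and standard deviation formulas for a discrete random variable can be applied directly to the listings frequency distribution. This is a minimal sketch that takes the probabilities from the slide's Probability column. The helper names are illustrative.

```python
import math

# Frequency distribution from the slide: number of listings -> probability.
pmf = {8: 0.05, 9: 0.05, 10: 0.15, 11: 0.10, 12: 0.25,
       13: 0.10, 14: 0.15, 15: 0.05, 16: 0.05, 17: 0.05}


def expected_value(pmf):
    """E[X] = sum of x * Pr(x) over the distinct values."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """sigma^2 = sum of (x - E[X])^2 * Pr(x) over the distinct values."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


mu = expected_value(pmf)
var = variance(pmf)
sd = math.sqrt(var)
print(f"E[X] = {mu:.2f}, variance = {var:.4f}, sd = {sd:.4f}")
```

For these data the probability-weighted mean equals the simple average of the 20 daily values (245 / 20 = 12.25).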
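The weekend weather slides count the orderings with two good days by hand. The sketch below enumerates all eight Friday to Sunday outcomes with `itertools.product`, using the slides' independence assumption and p = 0.8. It then checks the count against `math.comb(3, 2)`. `sequence_probability` is a hypothetical helper.

```python
import math
from itertools import product

P_GOOD = 0.8  # probability of good weather on any day (from the slide)
DAYS = ("Fri", "Sat", "Sun")


def sequence_probability(seq, p=P_GOOD):
    """Probability of one specific G/B sequence, assuming independent days."""
    prob = 1.0
    for day in seq:
        prob *= p if day == "G" else (1 - p)
    return prob


# Enumerate all 2**3 = 8 weekend outcomes and keep those with exactly two good days.
two_good = [seq for seq in product("GB", repeat=len(DAYS)) if seq.count("G") == 2]
for seq in two_good:
    print(dict(zip(DAYS, seq)), round(sequence_probability(seq), 3))

total = sum(sequence_probability(seq) for seq in two_good)
print("number of combinations:", len(two_good), "=", math.comb(3, 2))
print("Pr(2G) =", round(total, 3))
```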
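This is a direct translation of the binomial formula and its mean and variance, using `math.comb` for n!/(X!(n − X)!). The helper names are illustrative. The check reuses the weekend weather numbers (n = 3, p = 0.8), so `binomial_pmf(2, 3, 0.8)` reproduces Pr(2G).

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Pr(X = x) = n! / (x!(n-x)!) * p^x * (1-p)^(n-x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_mean_var(n: int, p: float):
    """E[X] = np and sigma^2 = np(1-p)."""
    return n * p, n * p * (1 - p)


# Sanity checks: probabilities over x = 0..n sum to 1, and the pmf-based mean matches np.
n, p = 3, 0.8
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean_from_pmf = sum(x * q for x, q in enumerate(probs))
print(probs, sum(probs), mean_from_pmf, binomial_mean_var(n, p))
```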
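For the Binomial Distribution Review Example (Labor Day weekend), the sketch below assumes p(rain) = 2/10 = 0.2 on each day, independent across days. The second question ("not rain on any three of the four days") is read here as exactly three dry days out of four. That reading is an assumption; "at least three" would add the four-dry-day probability. `pr_dry_days` is an illustrative name.

```python
import math

# From the slide: rain on 2 of the first 10 days of September, so p(rain) = 0.2 per day.
# Assumes independent days and a constant p (the binomial assumptions).
p_rain = 2 / 10
n_days = 4


def pr_dry_days(k, n=n_days, p_dry=1 - p_rain):
    """Binomial probability of exactly k dry days out of n."""
    return math.comb(n, k) * p_dry**k * (1 - p_dry) ** (n - k)


no_rain = pr_dry_days(4)            # reduces to 0.8 ** 4, a simple product of daily probabilities
exactly_three_dry = pr_dry_days(3)  # assumed reading of "any three of the four days"
print(round(no_rain, 4), round(exactly_three_dry, 4))
```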
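The uniform distribution table can be regenerated from its 42 observations. The table's PDF column is the relative frequency of each of the 21 integer values (1/21 ≈ 0.047619). The continuous density formula instead gives 1/(max − min) = 1/20 on [0, 20]. The sketch computes both. `pdf_cdf_table` is an illustrative name.

```python
from collections import Counter

# The 42 observations from the slide: each integer 0..20 appears twice.
data = [2, 4, 3, 1, 5, 7, 8, 6, 9, 10, 0, 12, 11, 15, 13, 14, 20, 18, 19, 17, 16] * 2


def pdf_cdf_table(values):
    """Rows of (x, count, relative frequency, cumulative relative frequency)."""
    n = len(values)
    counts = Counter(values)
    rows, cumulative = [], 0
    for x in sorted(counts):
        cumulative += counts[x]
        rows.append((x, counts[x], counts[x] / n, cumulative / n))
    return rows


rows = pdf_cdf_table(data)
for x, count, pdf, cdf in rows:
    print(f"{x:>2} {count} {pdf:.6f} {cdf:.6f}")

# Continuous uniform on [min, max]: density, mean and variance formulas from the slide.
lo, hi = min(data), max(data)
density = 1 / (hi - lo)
mean = (lo + hi) / 2
variance = (hi - lo) ** 2 / 12
print(density, mean, variance)
```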
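Below is a sketch of the normal density and the standardization Z = (X − μ)/σ. It is cross-checked against `statistics.NormalDist.pdf` from the Python standard library (3.8+). The values μ = 10, σ = 4 and x = 8 are illustrative.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def standardize(x: float, mu: float, sigma: float) -> float:
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma


mu, sigma, x = 10.0, 4.0, 8.0  # illustrative values
z = standardize(x, mu, sigma)
# The density of X at x equals the standard normal density at z divided by sigma.
print(normal_pdf(x, mu, sigma), normal_pdf(z, 0.0, 1.0) / sigma, NormalDist(mu, sigma).pdf(x))
```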
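Python's `statistics.NormalDist` can stand in for the Excel functions on the NORMDIST slide. The mapping in the comments is an informal equivalence, not an Excel API. The example X ~ N(10, 4) is treated as μ = 10, σ = 4, following the N(mean, standard deviation) notation of the Central Limit Theorem slide. That reading is an assumption; if 4 is the variance, set `SIGMA = 2.0`.

```python
from statistics import NormalDist

# Assumed reading of X ~ N(10, 4): mean 10, standard deviation 4.
MU, SIGMA = 10.0, 4.0
X = NormalDist(MU, SIGMA)
Z = NormalDist()  # standard normal N(0, 1)

# Informal Excel equivalents:
#   NORMDIST(x, mu, sigma, TRUE)  -> X.cdf(x)
#   NORMDIST(x, mu, sigma, FALSE) -> X.pdf(x)
#   NORMINV(prob, mu, sigma)      -> X.inv_cdf(prob)
#   NORMSDIST(z)                  -> Z.cdf(z)
#   NORMSINV(prob)                -> Z.inv_cdf(prob)

p_below_8 = X.cdf(8)
p_above_11 = 1 - X.cdf(11)
p_7_to_9 = X.cdf(9) - X.cdf(7)

# Same first answer via standardization, as with the textbook Z table: Pr(X < x) = Pr(Z < (x - mu) / sigma).
p_below_8_z = Z.cdf((8 - MU) / SIGMA)
print(round(p_below_8, 4), round(p_above_11, 4), round(p_7_to_9, 4), round(p_below_8_z, 4))
```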
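The first two checks on the normality slide (mean versus median, then skewness and kurtosis) can be computed from the 22 monthly UF_RetailAs%Total values. This sketch uses population moment formulas. Excel's SKEW and KURT apply small-sample corrections, so their results will differ somewhat. The mode is omitted because every value in this series is distinct. Function names are illustrative.

```python
import statistics

# UF_RetailAs%Total, 2009 month 8 back to 2007 month 11 (22 monthly values from the slide).
values = [13.15379, 13.21556, 13.20713, 13.09604, 13.29532, 13.48186, 13.61643,
          13.9316, 14.12047, 13.9948, 13.57838, 13.40963, 13.45029, 13.42442,
          13.44574, 13.3493, 13.33186, 13.47661, 13.49046, 13.9232, 14.25173, 13.955]


def moment_skewness(xs):
    """Population (moment-based) skewness m3 / m2**1.5; near 0 for symmetric data."""
    mu = statistics.fmean(xs)
    m2 = statistics.fmean([(x - mu) ** 2 for x in xs])
    m3 = statistics.fmean([(x - mu) ** 3 for x in xs])
    return m3 / m2**1.5


def excess_kurtosis(xs):
    """Population excess kurtosis m4 / m2**2 - 3; near 0 for normal data."""
    mu = statistics.fmean(xs)
    m2 = statistics.fmean([(x - mu) ** 2 for x in xs])
    m4 = statistics.fmean([(x - mu) ** 4 for x in xs])
    return m4 / m2**2 - 3


print("mean  ", round(statistics.fmean(values), 4))
print("median", round(statistics.median(values), 4))
print("skew  ", round(moment_skewness(values), 3))
print("kurt  ", round(excess_kurtosis(values), 3))
```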
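Below is a sketch of the standard error and the known-σ confidence interval from the Confidence Interval slides. The inputs (x̄ = 50, σ = 10) are hypothetical. They only illustrate that quadrupling n halves the interval width. The critical z comes from `NormalDist().inv_cdf`, and the helper names are illustrative.

```python
import math
from statistics import NormalDist


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def z_interval(xbar: float, sigma: float, n: int, confidence: float = 0.95):
    """Confidence interval xbar -/+ z * sigma / sqrt(n) when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    half_width = z * standard_error(sigma, n)
    return xbar - half_width, xbar + half_width


# Hypothetical inputs to show the interval narrowing as n grows.
for n in (25, 100, 400):
    lo, hi = z_interval(xbar=50.0, sigma=10.0, n=n)
    print(f"n = {n:>3}: [{lo:.2f}, {hi:.2f}]")
```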
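When σ is unknown, the slides replace z with t(n−1) and σ with S. Python's standard library has no t distribution, so this sketch takes the critical value as an argument (from a t table or Excel's TINV). The sample is the first ten daily listing counts from the earlier table, used purely as hypothetical input. The value 2.262 is the tabulated two-sided 95% critical value for 9 degrees of freedom. `t_interval` is an illustrative name.

```python
import math
import statistics


def t_interval(sample, t_crit: float):
    """Confidence interval for the mean: xbar -/+ t_crit * S / sqrt(n).

    t_crit must be the t critical value with n - 1 degrees of freedom for the chosen
    confidence level; it is passed in because the standard library has no t distribution.
    """
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical input: the first ten daily listing counts from the earlier table.
listings = [12, 15, 10, 8, 14, 17, 10, 14, 12, 11]
low, high = t_interval(listings, t_crit=2.262)  # 2.262: two-sided 95%, 9 degrees of freedom
print(f"95% CI for the mean: [{low:.2f}, {high:.2f}]")
```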
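The Alaska example on the Sampling slide describes proportional stratified sampling. Below is a minimal sketch with a hypothetical two-stratum frame (Alaska versus the rest of the US) and made-up unit IDs. `proportional_allocation` and `stratified_sample` are illustrative names. `random.Random.sample` draws without replacement within each stratum.

```python
import random


def proportional_allocation(shares: dict, sample_size: int) -> dict:
    """Draws per stratum proportional to its population share (rounded, so totals may be off slightly)."""
    return {stratum: round(share * sample_size) for stratum, share in shares.items()}


def stratified_sample(frame: dict, allocation: dict, seed: int = 0) -> dict:
    """Simple random sample without replacement of the allocated size from each stratum."""
    rng = random.Random(seed)
    return {stratum: rng.sample(frame[stratum], k) for stratum, k in allocation.items()}


# Hypothetical frame in which Alaska holds 0.5% of the population, as in the slide's example.
frame = {
    "Alaska": [f"AK-{i}" for i in range(500)],
    "Rest of US": [f"US-{i}" for i in range(99_500)],
}
population = sum(len(units) for units in frame.values())
shares = {name: len(units) / population for name, units in frame.items()}
allocation = proportional_allocation(shares, sample_size=1000)
sample = stratified_sample(frame, allocation)
print(allocation, {name: len(draws) for name, draws in sample.items()})
```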