# Basic tools
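
The confidence interval, sample size, and equal-variance t-test formulas used in the worksheets of this document can be checked with a few lines of Python. This is only a minimal sketch, not the spreadsheet's own logic: the function names are illustrative assumptions, and the t-values are passed in as read from the worksheets (.05 level, two-tail) rather than computed, because the Python standard library has no t-distribution quantile function.

```python
import math


def ci_mean(mean, s, n, t):
    """Confidence interval for a mean: mean +/- t * s / sqrt(n)."""
    half_width = t * s / math.sqrt(n)
    return mean - half_width, mean + half_width


def sample_size(s, c, t):
    """Sample size for a +/- margin c: n = (t * s / c) ** 2 (left unrounded)."""
    return (t * s / c) ** 2


def ci_proportion(p, n, t):
    """Confidence interval for a proportion: p +/- t * sqrt(p * (1 - p) / n)."""
    half_width = t * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """t statistic for a difference of means, assuming equal variances."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error


if __name__ == "__main__":
    # Inputs copied from the worksheets; t-values are taken as given there.
    print(ci_mean(42000, 5000, 384, 1.966))
    print(sample_size(5000, 500, 1.96))
    print(ci_proportion(0.5, 100, 1.984))
    print(pooled_t(57250, 136931818.2, 12, 59916.66667, 154446969.7, 12))
```

Passing t in explicitly keeps the sketch dependency-free; with a statistics library available, the critical t could be looked up from the degrees of freedom instead.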

```					Basic Quantitative Tools (Prof. Campbell)                               Data needed
last updated:                                                Friday, March 17, 2008
(INFERENTIAL STATISTICS -- inferring from a sample to a population using the laws of probability)

How confident can one be that the sample mean (or proportion) represents the population as a whole?
    confidence interval (mean)                      one interval variable
    inverse: given a specific confidence interval,
        what is the needed sample size?
    confidence interval (proportion)                one nominal variable

Do differences found in a sample (a subset of the population) reflect differences in the
population as a whole? (commonly used to generalize from survey results)
    chi-square                                      two categorical variables
    difference of means                             an interval variable divided into two categories
    difference of proportions                       a nominal variable divided into two categories
    ANOVA (Analysis of Variance)                    an interval variable divided into three or more categories

What is the relationship between two variables?
    correlation analysis (including an example      two interval variables
        of ecological fallacy)

How many total jobs are dependent on basic (export-based) jobs?
    Multiplier                                      number of basic (export) jobs, number of total jobs
                                                    (export + locally serving)

What is the relative concentration of local employment by sector?
    Location Quotients                              employment (total and by sector) for both the locality
                                                    and the nation

How can we estimate interaction (e.g., trade, traffic) between two cities?
    Gravity Model                                   population of two cities, distance, constant

How do we measure growth over time?
    Growth Rates (3 types)                          population levels over time

How do we compare costs and benefits (e.g., of a project) over time?
    Cost-benefit analysis                           quantified costs and benefits for each year, discount rate

calculate a confidence interval (with interval data)
that is, how confident are you that your sample estimate comes close to the population mean?

Data needed: one interval variable (enter data in yellow cells)
    sample mean (X-bar)
    std dev of sample (s)
    sample size (n)
    value of t-score for .025 (two-tail test) -- from t-table or let Excel calculate

Formula:
    X-bar +/- t.025 * s / sqrt(n)

Case    Hhd Income
1           24,000
2           36,000
3           12,000
4           74,000
5           46,000
6           27,000
7           23,000
8           69,000
9          107,000
10          53,000
11          29,000
12          34,000
13          43,000
14          28,000
15          24,000
16          43,000

MEAN        42,000
STDEV       24,105
n               16
t            2.131
set the confidence level (2-tail): 0.05

SO:
u = 42,000 +/- 12,845
lower end of confidence interval    29,155
upper end of confidence interval    54,845
range                               25,690

[Chart: Confidence Interval; x-axis 0 to 100,000]
calculate a confidence interval
that is, how confident are you that your sample estimate comes close to the population mean?
one interval variable
Here we will skip using the raw data and instead calculate with the summary data (mean, std dev., n)

Data needed (enter data in yellow cells):
    sample mean (X-bar)
    std dev of sample (s)
    sample size (n)
    value of t-score for .025 (two-tail test) -- from t-table or let Excel calculate

Formula:
    X-bar +/- t.025 * s / sqrt(n)

MEAN        42,000
STDEV        5,000
n              384
t            1.966
set the confidence level (2-tail): 0.05

SO:
u = 42,000 +/- 502
lower end of confidence interval    41,498
upper end of confidence interval    42,502
range                                1,003

[Chart: Confidence Interval; x-axis 0 to 100,000]
calculate a minimum sample size needed to achieve a specific confidence interval range
that is, how confident are you that your sample estimate comes close to the population mean?
one interval variable
Here we will skip using the raw data and instead calculate with the summary data (mean, std dev., c)

Data needed (enter data in yellow cells):
    sample mean (X-bar)
    std dev of sample (s)
    c (the +/- margin of the confidence interval)
    value of t-score for .025 (two-tail test) -- from t-table or let Excel calculate

here is the formula to calculate a confidence interval:
    X-bar +/- t.025 * s / sqrt(n)

...solving for n (sample size), where c is the +/- margin:
    c = t.025 * s / sqrt(n),  so  sqrt(n) = t.025 * s / c

...leads to this equation (so, to estimate sample size, you need to know Stdev, the confidence interval,
and the value of t):
    n = (t.025 * s / c)^2

MEAN        42,000
STDEV        5,000
c              500
t            1.960
set the confidence level (2-tail): 0.05

given values of stdev and c and confidence level, we calculate "n":
sample size needed    384

SO:
u = 42,000 +/- 500
lower end of confidence interval    41,500
upper end of confidence interval    42,500
range                                1,000

NOTES:
1. For the value of "t", we simply assumed a large sample size (t --> Z),
   e.g., for a 95% confidence interval (2-tailed), t = 1.96.
2. We are also assuming a large population size (M), so that n/M --> 0.

[Chart: Confidence Interval; x-axis 0 to 100,000]
calculate a confidence interval using proportions (nominal data)
for large n
one nominal variable (proportions)

Data needed (enter data in yellow cells):
    sample proportion (P)
    sample size (n)

Formula (the population proportion is p):
    P +/- 1.96 * sqrt( P * (1 - P) / n )

set the confidence level (2-tail): 0.05
P       50%
n       100
t       1.984

SO:                                             in percent
p = 0.500 +/-                       0.099           9.9%
lower end of confidence interval    0.401          40.1%
upper end of confidence interval    0.599          59.9%
range                               0.198          19.8%

[Chart: Confidence Interval; x-axis 0% to 100%]
Chi-Square

CHI-SQUARE TEST (EXCEL: FUNCTION)
does the distribution of outcomes (observed) significantly differ from a random distribution (expected)?

ACTUAL (OBSERVED)                                   (enter data in yellow cells)
            city    suburb  rural   total
strong      2       1       1       4
medium      1       2       1       4
weak        1       1       2       4
total       4       4       4       12

PREDICTED/EXPECTED (based on multiplying row and column totals)
            city    suburb  rural   total
strong      1.3333  1.3333  1.3333  4
medium      1.3333  1.3333  1.3333  4
weak        1.3333  1.3333  1.3333  4
total       4       4       4       12

Chi-square test (Calculated by Excel): "CHITEST"
###      (probability of this sample outcome if no difference in population)
range: 0 to 1

Difference between predicted and actual
            city    suburb  rural   total
strong      -0.6667 0.3333  0.3333  0
medium      0.3333  -0.6667 0.3333  0
weak        0.3333  0.3333  -0.6667 0
total       0       0       0       0

enter data in yellow cells:
            20      8       20
            16      12      12
            12      16      4

Formula:
    chi-square = SUM[ (fo - fe)^2 / fe ]
    fo = observed frequencies
    fe = expected frequencies
The t distribution is used for hypothesis testing with small samples (e.g., smaller than about 100 cases).
The t distribution is similar to the z distribution, but is "flatter" because of the smaller sample size.
When the sample size gets large (e.g., over 50-100), the t distribution approaches that of the Z distribution (a normal curve).

Probabilities and t-scores for various degrees of freedom (two-tail test)
(table values: probability of this outcome if no difference in population)

            d.f.    d.f.    d.f.    d.f.
t-score     5       10      50      1000
0           1.000   1.000   1.000   1.000
0.1         0.924   0.922   0.921   0.920
0.2         0.849   0.845   0.842   0.842
0.3         0.776   0.770   0.765   0.764
0.4         0.706   0.698   0.691   0.689
0.5         0.638   0.628   0.619   0.617
0.6         0.575   0.562   0.551   0.549
0.7         0.515   0.500   0.487   0.484
0.8         0.460   0.442   0.427   0.424
0.9         0.409   0.389   0.372   0.368
1           0.363   0.341   0.322   0.318
1.1         0.321   0.297   0.277   0.272
1.2         0.284   0.258   0.236   0.230
1.3         0.250   0.223   0.200   0.194
1.4         0.220   0.192   0.168   0.162
1.5         0.194   0.165   0.140   0.134
1.6         0.170   0.141   0.116   0.110
1.7         0.150   0.120   0.095   0.089
1.8         0.132   0.102   0.078   0.072
1.9         0.116   0.087   0.063   0.058
2           0.102   0.073   0.051   0.046
2.1         0.090   0.062   0.041   0.036
2.2         0.079   0.052   0.032   0.028
2.3         0.070   0.044   0.026   0.022
2.4         0.062   0.037   0.020   0.017
2.5         0.054   0.031   0.016   0.013
2.6         0.048   0.026   0.012   0.009
2.7         0.043   0.022   0.009   0.007
2.8         0.038   0.019   0.007   0.005
2.9         0.034   0.016   0.006   0.004
3           0.030   0.013   0.004   0.003

The 0.05 level is by convention used as the threshold of statistical significance
(though sometimes we use an even more strict level, such as 0.01 or even 0.001).

The larger the sample size, the lower the value of the critical t ...
... when the sample size gets large (e.g., over 50-100), then the critical t level (.05, 2 tail) approaches 1.96.

[Chart: Probabilities and t-scores for various degrees of freedom (two-tail test);
 x-axis: standardized sample differences (t-scores), 0 to 3;
 y-axis: probability of this outcome if no difference in population, 0 to 1;
 one curve per degrees of freedom: 5, 10, 50, 1000]
difference of means

Small Standard Deviation
Factor      50              45
Case        Male Income     Female Income
1           69,000          49,000
2           77,000          67,000
3           46,000          69,000
4           59,000          64,000
5           55,000          30,000
6           50,000          68,000
7           38,000          73,000
8           63,000          61,000
9           50,000          61,000
10          56,000          48,000
11          74,000          72,000
12          50,000          57,000
Mean        57,250          59,917
Std Dev.    11,702          12,428

[Chart: male mean vs. female mean; x-axis 0 to 100,000]

t-Test: Two-Sample Assuming Equal Variances
                                Male Income     Female Income
Mean                            57250           59916.66667
Variance                        136931818.2     154446969.7
Observations                    12              12
Pooled Variance                 145689393.9
Hypothesized Mean Difference    0
df                              22
t Stat                          -0.541
P(T<=t) one-tail                0.297           fail to reject H0
t Critical one-tail             1.717
P(T<=t) two-tail                0.594           fail to reject H0
t Critical two-tail             2.074

Larger Standard Deviation
Factor      80              75
Case        Male Income     Female Income
1           40,000          72,000
2           42,000          34,000
3           83,000          65,000
4           100,000         34,000
5           100,000         86,000
6           86,000          86,000
7           104,000         67,000
8           70,000          64,000
9           37,000          79,000
10          62,000          78,000
11          88,000          85,000
12          72,000          83,000
Mean        73,667          69,417
Std Dev.    24,092          18,372

[Chart: male mean vs. female mean; x-axis 0 to 100,000]

t-Test: Two-Sample Assuming Equal Variances
                                Male Income     Female Income
Mean                            73666.6667      69416.6667
Variance                        580424242       337537879
Observations                    12              12
Pooled Variance                 458981061
Hypothesized Mean Difference    0
df                              22
t Stat                          0.486
P(T<=t) one-tail                0.316           fail to reject H0
t Critical one-tail             1.717
P(T<=t) two-tail                0.632           fail to reject H0
t Critical two-tail             2.074
Data Needed:
    number of cases for each of the two groups
    sample means for the two groups
    standard deviation for each group

Hypothesis (no difference between the two population means):
    u1 = u2,  i.e.,  u1 - u2 = 0

How to calculate t (note: EXCEL will do this all for you, so you don't really need to use this formula):
    t = [ (X-bar1 - X-bar2) - (u1 - u2) ] / SE
    where SE is the estimated standard error of the difference (X-bar1 - X-bar2)

SINCE WE HYPOTHESIZE u1 = u2, OR u1 - u2 = 0, the (u1 - u2) term drops out of the numerator of the equation for t:
    t = (X-bar1 - X-bar2) / SE

the formula for the standard error (the denominator of the equation for t):
    SE = sqrt( sigma1^2 / N1 + sigma2^2 / N2 )

if we can assume the same standard deviation of the populations ("equal variance"):
    SE = sigma * sqrt( (N1 + N2) / (N1 * N2) )
difference of means

Bigger DOM (difference of means)
Factor      80              70
Case        Male Income     Female Income
1           37,000          55,000
2           105,000         40,000
3           52,000          82,000
4           58,000          97,000
5           107,000         56,000
6           105,000         44,000
7           96,000          39,000
8           86,000          70,000
9           82,000          77,000
10          104,000         79,000
11          71,000          44,000
12          100,000         47,000
Mean        83,583          60,833
Std Dev.    23,922          19,441

[Chart: male mean vs. female mean; x-axis 0 to 120,000]

t-Test: Two-Sample Assuming Equal Variances
                                Male Income     Female Income
Mean                            83583.3333      60833.3333
Variance                        572265152       377969697
Observations                    12              12
Pooled Variance                 475117424
Hypothesized Mean Difference    0
df                              22
t Stat                          2.557
P(T<=t) one-tail                0.009           reject H0
t Critical one-tail             1.717
P(T<=t) two-tail                0.018           reject H0
t Critical two-tail             2.074
Diff of Proportions

A special case of the difference of means test

Do you Own a Car? 1 = yes, 0 = no
Case        City Residents      Suburban Residents
1           0                   0
2           0                   0
3           0                   1
4           0                   1
5           0                   1
6           0                   1
7           0                   1
8           1                   1
9           1                   1
10          1                   1
11          1                   1
12          1                   1
13          1                   1
14          1                   1
15          1                   1
16          1                   1
17          1                   1
18          1                   1
19          1                   1
20          1                   1
21          1                   1
22          1                   1
Mean        68.2%               90.9%
n of cases  22                  22
degrees of freedom (n1 + n2 - 2)    42

[Chart: Percent of Residents Who Own a Car; City Residents vs. Suburban Residents; y-axis 0% to 100%]

The central question: does the difference found in the sample reflect an actual difference among
the entire population? [the research hypothesis]
Alternative: the difference is due merely to random sample variation, and there is no difference
in the population as a whole. [the null hypothesis]

for simplicity and conservatism, we could have also assumed that the population proportions are 50% and 50%

t-score
Numerator:      -22.7%
pu              0.79545     see Blalock, p. 234
sqrt(pu,qu)     0.40337
denominator=    0.12162
t-score         -1.86871
Prob-t          0.068649

Remember: generally if |t| > 2 (i.e., if t < -2 or t > 2), then it is "statistically significant" at the .05 level.
That is, there is less than a 5% chance that one could get this difference in the sample drawn from a population
where there is no difference between city and suburban residents.
Diff of Proportions (2)

Here, if given just the mean, n of cases

Mean            10.0%       20.0%
n of cases      150         120
degrees of freedom (n1 + n2 - 2)    268

t-score
Numerator:      -10.0%
pu              0.14444     see Blalock, p. 234
sqrt(pu,qu)     0.35154
denominator=    0.04305
t-score         -2.3226
Prob-t          0.0209

Note that as the mean values deviate from 50%, we can be more accurate:
e.g., compare 10% to 20%, vs. 40% to 50%, or 80% to 90%
ANOVA

AUTO MILES DRIVEN PER WEEK

Case    City Residents  Suburban Residents  Rural Residents
1       20              50                  40
2       0               80                  50
3       50              90                  60
4       100             350                 70
5       70              240                 80
6       35              120                 90
7       12              90                  100
8       150             80                  100
9       120             70                  20
10      0               60                  30
11      18              90                  40
12      35              111                 50
13      42              122                 60
14      67              133                 250
15      95              144                 170
16      66              155                 120
17      77              96                  150
18      123             23                  170
19      0               65                  180
20      18              24                  111
21      24              17                  130
22      75              85                  75
Mean    54.4            104.3               97.5

[Chart: AUTO MILES DRIVEN PER WEEK; sorted values and group means for urban, suburban, and rural residents; x-axis 0 to 400]
Anova: Single Factor

SUMMARY
Groups                  Count   Sum     Average         Variance
City Residents          22      1397    63.5            2672.64286
Suburban Residents      22      2295    104.318182      5471.65584
Rural Residents         22      2146    97.5454545      3391.11688

ANOVA
Source of Variation     SS              df      MS              F               P-value         F crit
Between Groups          21054.6364      2       10527.3182      2.73782547      0.07241657      3.14280868
Within Groups           242243.727      63      3845.13853

Total                   263298.364      65
Why use ANOVA?
Use it in situations where you are comparing the means from more than two groups.
In a difference of means test, you compare a single pair of means (x2 - x1).
With more than two groups there is no single difference to compare (x3 - x2 - x1 is not meaningful),
so you look at the variation (sum of squares) within vs. between groups.
Intuitively, sample groups with low internal variation, but high variation across groups, will
likely represent real differences in the population as a whole,
while sample groups with high internal variation and low variation across groups have
a greater chance of representing populations with no real differences.

Anova: Single Factor

SUMMARY
Groups                  Count   Sum     Average         Variance
City Residents          22      1197    54.4090909      1890.82468
Suburban Residents      22      2295    104.318182      5471.65584
Rural Residents         22      2146    97.5454545      3391.11688

ANOVA
Source of Variation     SS              df      MS              F               P-value         F crit
Between Groups          32248.5758      2       16124.2879      4.49829595      0.01492455      3.14280868
Within Groups           225825.545      63      3584.53247

Total                   258074.121      65

Formula:
    F = (SSbetween / d.f. between) / (SSwithin / d.f. within)
    SSbetween = sum of squares between the groups
    SSwithin  = sum of squares within the groups
    d.f.      = degrees of freedom

correlation (range: -1 < r < +1)

Four example data sets (each shown on the sheet as a scatter plot of y against x, both axes 0 to 1):

            r = +0.99       r = -0.09       r = -0.99       r = +0.07
Case        x       y       x       y       x       y       x       y
1           0       0       0.2     0.5     0       1       0       0
2           0.1     0.1     0.3     0.4     0.1     0.9     0.1     0.2
3           0.2     0.2     0.4     0.6     0.2     0.8     0.2     0.4
4           0.3     0.3     0.4     0.4     0.3     0.7     0.3     0.6
5           0.4     0.3     0.5     0.8     0.4     0.6     0.4     0.8
6           0.5     0.4     0.5     0.3     0.5     0.5     0.5     1
7           0.6     0.5     0.5     0.5     0.6     0.3     0.6     1
8           0.7     0.6     0.5     0.6     0.7     0.4     0.7     0.8
9           0.8     0.7     0.6     0.3     0.8     0.2     0.8     0.6
10          0.8     0.8     0.6     0.8     0.8     0.2     0.8     0.4
11          0.9     0.9     0.7     0.2     0.9     0.1     0.9     0.2
12          1       0.9     0.8     0.5     1       0       1       0

correlation +0.99           -0.09           -0.99           +0.07
F           ###             0.1             ###             0.0
p value     0.00            0.79            0.00            0.84
correlation (range: -1 < r < +1)

using a random number generator
Case    x       y
1       0.8406  0.372
2       0.4639  0.2545
3       0.069   0.8191
4       0.327   0.9153
5       0.9635  0.7406
6       0.1542  0.4803
7       0.0266  0.627
8       0.3561  0.4515
9       0.7929  0.3996
10      0.3435  0.5974
11      0.8348  0.1966
12      0.297   0.3313

correlation -0.4
F           1.4
p-value     0.264

P-Value: this is the probability of finding an x-y relationship at least this strong in the
sample cases -- expressed as an r-value -- through random variation alone, if one
looked at the population as a whole and there were no relationship. If p<.05, we
generally conclude that the ...

Hit "recalculate now" to see a new set of numbers
(note: on a MAC this is "COMMAND =")

[Chart: scatter plot of y against x; both axes 0 to 1]
r           -0.95
F=        92.5641
sign F    2.3E-06
n              12

Comparing values of correlation coefficient r (with a range of -1 to +1) from a sample of size n and the corresponding probability of its outcome if there is no relationship in the population as a whole

[Chart: x-axis "Value of r" (-1 to +1); y-axis "Probability of this outcome (based on the F-test)" (0 to 1); red line: critical value: .05]
ABOVE the .05 line: relationship is NOT statistically significant at the 0.05 level
BELOW the .05 line: relationship is statistically significant at the 0.05 level

Note: as r gets farther away from zero, both the strength of the relationship and the statistical significance increase.
Also: as the sample size (n) increases, the statistical significance increases.

As a result: If you want to demonstrate a statistically significant bivariate relationship, you will need an r value that is far from zero and/or a large sample.
"The ecological fallacy consists in thinking that relationships observed for groups necessarily hold for individuals..."
source: "Ecological Inference and the Ecological Fallacy"
David A. Freedman (Department of Statistics, University of California, Berkeley)
Prepared for the International Encyclopedia of the Social & Behavioral Sciences,
Technical Report No. 549, 15 October 1999.
pdf file accessed Jan. 13, 2002, http://www.stanford.edu/class/ed260/freedman549.pdf

This example:
Is there a relationship between use of public transit and hhd income?
Aggregate data (unit of analysis: city): positive relationship
Individual data (unit of analysis: hhd): negative relationship
DANGER: making an ecological fallacy -- using aggregate data to draw conclusions about individuals

constant1     0   range 0 to 1
constant2     0   range 0 to 1
constant3     3   range high to 1

                 percent of hhd trips                 factor        factor
case   city      using public transit   hhd income    (invisible)   (invisible)
1      A         63%                    \$57,873       0.5           80000
2      A         64%                    \$57,635       0.5           80000
3      A         61%                    \$58,529       0.5           80000
4      A         59%                    \$59,498       0.5           80000
5      A         75%                    \$53,915       0.5           80000
6      A         69%                    \$55,713       0.5           80000
7      A         51%                    \$61,994       0.5           80000
8      A         69%                    \$55,980       0.5           80000
9      A         68%                    \$56,268       0.5           80000
10     A         60%                    \$59,015       0.5           80000
11     B         44%                    \$59,656       0.4           75000
12     B         72%                    \$49,910       0.4           75000
13     B         59%                    \$54,211       0.4           75000
14     B         66%                    \$51,803       0.4           75000
15     B         71%                    \$50,100       0.4           75000
16     B         68%                    \$51,179       0.4           75000
17     B         47%                    \$58,553       0.4           75000
18     B         43%                    \$59,930       0.4           75000
19     B         49%                    \$57,751       0.4           75000
20     B         68%                    \$51,265       0.4           75000
21     C         62%                    \$48,269       0.3           70000
22     C         61%                    \$48,597       0.3           70000
23     C         49%                    \$53,006       0.3           70000
24     C         53%                    \$51,338       0.3           70000
25     C         51%                    \$52,322       0.3           70000
26     C         59%                    \$49,426       0.3           70000
27     C         55%                    \$50,579       0.3           70000
28     C         56%                    \$50,352       0.3           70000
29     C         52%                    \$51,688       0.3           70000
30     C         35%                    \$57,726       0.3           70000
31     D         39%                    \$54,312       0.2           68000
32     D         53%                    \$49,461       0.2           68000
33     D         51%                    \$50,152       0.2           68000
34     D         51%                    \$50,264       0.2           68000
35     D         52%                    \$49,771       0.2           68000
36     D         31%                    \$57,222       0.2           68000
37     D         50%                    \$50,403       0.2           68000
38     D         20%                    \$60,972       0.2           68000
39     D         22%                    \$60,169       0.2           68000
40     D         25%                    \$59,079       0.2           68000
41     E         26%                    \$53,852       0.1           63000
42     E         11%                    \$59,151       0.1           63000
43     E         39%                    \$49,508       0.1           63000
44     E         14%                    \$58,168       0.1           63000
45     E         21%                    \$55,609       0.1           63000
46     E         25%                    \$54,186       0.1           63000
47     E         22%                    \$55,433       0.1           63000
48     E         37%                    \$50,148       0.1           63000
49     E         27%                    \$53,540       0.1           63000
50     E         30%                    \$52,562       0.1           63000
51     F         29%                    \$49,933       0             60000
52     F         24%                    \$51,559       0             60000
53     F         29%                    \$49,724       0             60000
54     F         21%                    \$52,783       0             60000
55     F          8%                    \$57,283       0             60000
56     F          9%                    \$56,859       0             60000
57     F          3%                    \$58,854       0             60000
58     F         27%                    \$50,609       0             60000
59     F         31%                    \$49,231       0             60000
60     F          5%                    \$58,390       0             60000

correlation (individual households)   -0.30

[Scatterplot; unit of analysis: individual household. x-axis: Percent of hhd trips by public transit (0% to 80%); y-axis: annual hhd income (\$40,000 to \$65,000). One sees patterns in the individual data by cities.]

AGGREGATED DATA

           percent of hhd trips
CITY       using public transit   hhd income
A          64%                    \$57,642
B          59%                    \$54,436
C          53%                    \$51,330
D          39%                    \$54,181
E          25%                    \$54,216
F          19%                    \$53,523

correlation (cities)   +0.33

[Scatterplot; unit of analysis: cities. x-axis: Percent of hhd trips by public transit (0% to 70%); y-axis: mean annual hhd income (\$51,000 to \$58,000). Yes: cities with more transit have higher income, but...]
Multiplier

Multiplier: the relationship between local and export employment

[Diagram: Twin Peaks and the R.O.W. (rest of world); labels: Timber, Revenues from Timber, Local Services]

Export Jobs (Basic) + Non-Export Jobs (NonBasic) = TOTAL JOBS

Imagine a simple economy of Twin Peaks, an isolated timber economy:

Service Jobs                          2,000
Timber Jobs (export)                  1,000
Total Jobs                            3,000

Multiplier                              3.0     (Total Jobs / Export Jobs = 3,000 / 1,000)

So, one can use a multiplier to estimate the impact of a change in basic employment
(up or down) on total employment.
[assumes a simple, linear relationship]

Change in Basic Employment    Change in Total Employment
                       100                           300
                       500                          1500
                      1200                          3600
                      -100                          -300
                      -500                         -1500
Location Quotients

Location Quotient (LQ) - a measure of relative local employment concentration in a specific sector
Also used to estimate local vs. export (i.e., non-basic vs. basic) employment
(Can also be used to help understand the level of industrial diversification in a local economy)

LQ = (ei / e) / (Ei / E)

ei = local employment in sector i
e  = total local employment
Ei = national employment in sector i
E  = total national employment

EXAMPLE: You are given data for the town of Icarus in the far-away country of Daedalus.

                                      Icarus     Daedalus
Population                            20,000    2,500,000
Annual Gross Per Capita Income      \$17,000     \$25,000
Total Employment                      10,000    1,000,000
Agricultural Emp.                      1,000       50,000
Govt Employment                          300      100,000
Private Service Emp.                   4,000      500,000
Airplane Manufacturing Emp.              700       10,000
Non-airplane Manufacturing Emp.        1,000      200,000
All Other Employment                   3,000      140,000

Based upon this data, which sectors of the Icarus economy likely are exporting goods or services outside the community?
Estimate the share of each sector's employment that could be due to exports
(and explain how you did these estimates and the name of the technique(s) you used).
Finally, explain why these estimates may not be accurate.

LOCATION QUOTIENT: Ratio of Local to National Percentages

                                                            Percent of Total Employment
                                      Icarus     Daedalus     Icarus     Daedalus        LQ
Total Employment                      10,000    1,000,000       100%         100%
Agricultural Emp.                      1,000       50,000        10%           5%      2.00   export industry
Govt Employment                          300      100,000         3%          10%      0.30
Private Service Emp.                   4,000      500,000        40%          50%      0.80
Airplane Manufacturing Emp.              700       10,000         7%           1%      7.00   export industry
Non-airplane Manufacturing Emp.        1,000      200,000        10%          20%      0.50
All Other Employment                   3,000      140,000        30%          14%      2.14   export industry

Use the location quotients to estimate the amount of export jobs (if any):
TOTAL JOBS = LOCAL JOBS + EXPORT JOBS

                                  Total Jobs   Local Jobs   Export Jobs
Agricultural Emp.                      1,000          500           500
Govt Employment                          300          300             -
Private Service Emp.                   4,000        4,000             -
Airplane Manufacturing Emp.              700          100           600
Non-airplane Manufacturing Emp.        1,000        1,000             -
All Other Employment                   3,000        1,400         1,600
TOTAL                                 10,000        7,300         2,700

[Bar chart: Local Jobs and Export Jobs by sector; vertical axis from 0 to 5,000 jobs]
Gravity Model

Using Newton's Universal Law of Gravitation for social processes

F = G * (m1 * m2) / r^2

where F = force of gravity between m1 and m2
G is the universal constant
r is the distance between m1 and m2

[Diagram: m1 and m2 separated by distance r]

To convert to society:
F becomes the interaction between m1 and m2 (e.g., traffic, trade, etc.)
m1 and m2 become population (or employment, or GDP, etc.)
r is distance
G is a constant

Example:
              Population              Distance   Interaction (e.g., car trips/day)
G             m1          m2          r          F
0.001         10,000      20,000      20            500
0.001         20,000      20,000      20          1,000
0.001         20,000      40,000      20          2,000
0.001         10,000      20,000      10          2,000
0.001         10,000      20,000       5          8,000
3 Growth Rates

Three Growth Rates

Simple, Linear Growth (e.g., average annual growth)
Pn = P0 * (1 + n*r)

Discrete Compounded Growth (e.g., annual)
Pn = P0 * (1 + r)^n

Compounded continuously (with exponent)
[almost the same results as discrete]
Pn = P0 * e^(r*n)

where e = 2.7183....
remember that ln(e) = 1

A Comparison of these Three Growth Patterns
                                                    Discrete      Continuously
                                      Linear        Compounded    Compounded
P0          n           r             Pn            Pn            Pn
100          0        0.05            100            100.0          100.0
100          1        0.05            105            105.0          105.1
100          2        0.05            110            110.3          110.5
100          3        0.05            115            115.8          116.2
100          4        0.05            120            121.6          122.1
100          5        0.05            125            127.6          128.4
100          6        0.05            130            134.0          135.0
100          7        0.05            135            140.7          141.9
100          8        0.05            140            147.7          149.2
100          9        0.05            145            155.1          156.8
100         10        0.05            150            162.9          164.9
100         15        0.05            175            207.9          211.7
100         20        0.05            200            265.3          271.8
100         25        0.05            225            338.6          349.0
100         30        0.05            250            432.2          448.2
100         40        0.05            300            704.0          738.9
100         50        0.05            350           1146.7         1218.2
100         75        0.05            475           3883.3         4252.1
100        100        0.05            600          13150.1        14841.3

[Line chart: Continuously Compounded, Discrete Compounded, and Linear; horizontal axis from 0 to 120, vertical axis from 0.0 to 16000.0]
Page 49
Cost-benefit

Cost-Benefit Thinking

TWO CHALLENGES:
1. how to sum up all the costs and benefits.
2. How to deal with time: discounting. --->>> time preferences.

Present value (PV) = B(t) / (1+r)t
where B(t) is the benefit in year t, r is the discount rate.

Net Present Value (NPV) = ∑ (B(t) - C(t)) / (1+r)t
where B is benefits and C is costs.

why is money worth less in the future?
1 people are impatient (and mortal)
2 opportunity cost of investing the capital elsewhere.

The argument for discounting is referred to as the 'marginal productivity of capital'

AND THE TRICK IS TO INCLUDE ENVIRONMENTAL COSTS AND BENEFITS. [99]
if ∑ (B(t) - C(t)±E(t)) * (1+r)t > 0 , then the project is a net good project.

The Problems with Discounting for the Environment
a way to shift heavy costs to future generations.

note: it is hard to shift capital costs to future generations, since lenders want payba

1 actual damage may be far larger than the discounted value.
2 long-term benefits are also not strongly valued (even though today's action
3 will lead to greater exhaustion of exhaustible resources, esp. with a high d

However: "There is, in fact, no unique relationship between high discount rates an

How to select a discount rate: simply the rate of economic growth for a nation?         t

Taking sustainability into account:

Page 50
Cost-benefit

EX: "require that any environmental damage be compensated by projects specifica

note how the r can really change the outcome, especially if costs and benefits patte
EXAMPLE

discoun discoun
Benefit Cost         Net Benefit        t rate    t rate
t         B(t)    C(t)         B(t) - C(t)        r         (1+r)^t
0          0 1,000,000 -1,000,000                0.02      1.00
1   100,000      100,000             0           0.02      1.02
2   110,000      100,000       10,000            0.02      1.04
3   120,000      100,000       20,000            0.02      1.06
4   130,000      100,000       30,000            0.02      1.08
5   140,000      100,000       40,000            0.02      1.10
6   150,000      100,000       50,000            0.02      1.13
7   160,000      100,000       60,000            0.02      1.15
8   170,000      100,000       70,000            0.02      1.17
9   180,000      100,000       80,000            0.02      1.20
10   190,000      100,000       90,000            0.02      1.22
11   200,000      100,000     100,000             0.02      1.24
12   210,000      100,000     110,000             0.02      1.27
13   220,000      100,000     120,000             0.02      1.29
14   230,000      100,000     130,000             0.02      1.32
15   240,000      100,000     140,000             0.02      1.35
16   250,000      100,000     150,000             0.02      1.37
17   260,000      100,000     160,000             0.02      1.40
18   270,000      100,000     170,000             0.02      1.43
19   280,000      100,000     180,000             0.02      1.46
20   290,000      100,000     190,000             0.02      1.49

and changing discount rates

1,500,000

Page 51
Cost-benefit

Benefit
1,000,000                                           Cost
Cumulative Net Present Value (NPV)

500,000
Net Benefit

0
1   2   3   4   5    6     7       8   9      10   11     12   13

-500,000

the year when the green line crosses over
axis (where y=0) is the year when the cumu
-1,000,000
impact shifts from a net cost to a net benef

-1,500,000
Year

Page 52
Cost-benefit

(Bt  Ct )  n
preferences.
NPV             t
t  0 (1 r)
Bt  benefits in year t
Ct  costs in year t
t  year
NPV  net present value (benefits adjusted for cost)
r  discount rate (e.g.,6% per year or 0.06)
lsewhere.

marginal productivity of capital' argument, the use of the word 'marginal' indicating that it is

COSTS AND BENEFITS. [99]
is a net good project.

rations, since lenders want paybacks. e.g., 30 year loans. but it is easier to shift non-mone

discounted value.
alued (even though today's actions are required for those 50 years from now to enjoy them).
ble resources, esp. with a high discount rate.

p between high discount rates and environmental deterioration." [103]

conomic growth for a nation?      the interest rate?       [104]

Page 53
Cost-benefit

ompensated by projects specifically designed to improve the environment." [106]

ecially if costs and benefits patterns vary over time.     (see graph).

Net Benefit
discounted for
present value         Cumulative Net Present Value (NPV)
(B(t) - C(t)) / (1+r)t ∑ (B(t) - C(t)) / (1+r)t
-1,000,000         -1,000,000
0       -1,000,000
9,612         -990,388
18,846          -971,542
27,715          -943,827
36,229          -907,597
44,399          -863,199
52,234          -810,965
59,744          -751,221
66,940          -684,280
73,831          -610,449
80,426          -530,023
86,734          -443,288
92,764          -350,525
98,524          -252,001
104,022           -147,979
109,267             -38,712
114,266              75,554
119,027            194,581
123,558            318,139
127,865            446,003

Page 54
Cost-benefit

Net Present Value (NPV)

13    14      15   16   17   18   19   20   21

the year when the green line crosses over the x
axis (where y=0) is the year when the cumulative
impact shifts from a net cost to a net benefit.

Page 55
Cost-benefit

nal' indicating that it is the productivity of additional units of capital that is relevant. [99]

sier to shift non-monetary costs to the future, since the lenders are around to complain! the

m now to enjoy them). ie., they should not be discounted like capital.

Page 56
Cost-benefit

ent." [106]

Page 57
Cost-benefit

Page 58
Cost-benefit

is relevant. [99]

und to complain! they don't have a contractual agr

Page 59
gini

MEASURES OF INEQUALITY/DISPARITY: how to calculate a Gini Coefficient

gini      0.386
RANGE: 0 (PERFECT EQUALITY); 1 (PERFECT INEQUALITY)

n         20

Person     Income     calculated    calculated         calculated
"i"        X(i)       x(i)          CUMULATIVE X(i)    x(i)*i
1           1,000     0.003         0.003              0.00
2           3,000     0.009         0.013              0.02
3           4,000     0.013         0.025              0.04
4           5,000     0.016         0.041              0.06
5           6,000     0.019         0.059              0.09
6           8,000     0.025         0.084              0.15
7           8,000     0.025         0.109              0.18
8           9,000     0.028         0.138              0.23
9          11,000     0.034         0.172              0.31
10         12,000     0.038         0.209              0.38
11         14,000     0.044         0.253              0.48
12         17,000     0.053         0.306              0.64
13         19,000     0.059         0.366              0.77
14         21,000     0.066         0.431              0.92
15         23,000     0.072         0.503              1.08
16         27,000     0.084         0.588              1.35
17         29,000     0.091         0.678              1.54
18         32,000     0.100         0.778              1.80
19         33,000     0.103         0.881              1.96
20         38,000     0.119         1.000              2.38
SUM        320000     1                                14.36
mean                  0.05

Insert income amounts for each of the 20 people here -- be sure to arrange from LOW to HIGH.
Do NOT enter data in any of the other columns -- those are calculated.
Try entering both a fairly equal income distribution -- and then try a ...
... the LORENZ CURVE -- see how the curve deviates from the line of equality as the gini coefficient ...

[Chart: GINI COEFFICIENT; series: CUMULATIVE X and LINE OF EQUALITY; x-axis: persons 1 to 20; y-axis from 0.000 to 1.000]

source of formula and text: U.S. Census Bureau. The Changing Shape of the Nation's Income Distribution, 1947-1998, Current Population Report, By Arthur F. Jones Jr. and Daniel H. Weinberg, (Issued June 2000)
http://www.census.gov/prod/2000pubs/p60-204.pdf
```
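The cost-benefit EXAMPLE in the sheet above can be recomputed outside the spreadsheet. The following is only a sketch in Python, not the workbook's own method: the function names are hypothetical, and the cash flows are copied from the EXAMPLE table (a 1,000,000 cost in year 0, then a constant 100,000 cost and a benefit that starts at 100,000 and rises by 10,000 a year). It applies the sheet's formula, (B(t) - C(t)) / (1+r)^t, year by year and reports the first year in which the cumulative NPV is no longer negative.

```python
# Hypothetical helper names; the cash flows are taken from the EXAMPLE table.

def discounted_net_benefits(benefits, costs, r):
    """Return (B(t) - C(t)) / (1 + r)**t for t = 0, 1, 2, ..."""
    return [(b - c) / (1 + r) ** t
            for t, (b, c) in enumerate(zip(benefits, costs))]


def cumulative_npv(benefits, costs, r):
    """Running sum of the discounted net benefits, one entry per year."""
    running, out = 0.0, []
    for value in discounted_net_benefits(benefits, costs, r):
        running += value
        out.append(running)
    return out


def break_even_year(cumulative):
    """First year whose cumulative NPV is non-negative, or None if none is."""
    for t, value in enumerate(cumulative):
        if value >= 0:
            return t
    return None


# Year 0: no benefit, 1,000,000 cost.
# Years 1-20: benefit 100,000 rising by 10,000 a year; cost fixed at 100,000.
benefits = [0] + [100_000 + 10_000 * (t - 1) for t in range(1, 21)]
costs = [1_000_000] + [100_000] * 20

cumulative = cumulative_npv(benefits, costs, r=0.02)
npv_at_2_percent = cumulative[-1]
first_positive_year = break_even_year(cumulative)

print(round(npv_at_2_percent), first_positive_year)
```

Calling `cumulative_npv` again with a different `r` is one way to see the point made in the sheet, that the discount rate can change the outcome when costs come early and benefits come late.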
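The gini sheet above shows the calculated columns but not the formula behind them. As a hedged sketch, the Python below assumes the rank-weighted form G = (2/n) * ∑ i*x(i) - (n+1)/n, where x(i) is person i's share of total income and incomes are sorted from low to high. That form is an assumption, chosen because it reproduces the sheet's ∑ x(i)*i of 14.36 and its gini of 0.386. The function name is hypothetical.

```python
def gini(incomes):
    """Gini coefficient via the rank-weighted formula (assumed, see lead-in).

    x(i) is each person's share of total income, with persons ranked
    from lowest (i = 1) to highest (i = n) income.
    """
    ordered = sorted(incomes)          # the sheet asks for LOW to HIGH
    n = len(ordered)
    total = sum(ordered)
    shares = [x / total for x in ordered]
    weighted = sum(i * s for i, s in enumerate(shares, start=1))  # sum of x(i)*i
    return 2 * weighted / n - (n + 1) / n


# Incomes of the 20 people in the sheet.
incomes = [1_000, 3_000, 4_000, 5_000, 6_000, 8_000, 8_000, 9_000, 11_000,
           12_000, 14_000, 17_000, 19_000, 21_000, 23_000, 27_000, 29_000,
           32_000, 33_000, 38_000]

sheet_gini = gini(incomes)
equal_gini = gini([16_000] * 20)       # everyone earns the same amount

print(round(sheet_gini, 3), round(equal_gini, 3))
```

With identical incomes the function returns a value of zero up to floating-point error, matching the sheet's description of 0 as perfect equality.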