Basic tools
Basic Quantitative Tools (Prof. Campbell)

last updated: Friday, March 17, 2008

(INFERENTIAL STATISTICS -- inferring from a sample to a population using the laws of probability)

QUESTION
  tool -- data needed

How confident can one be that the sample mean (or proportion) represents the population as a whole?
  confidence interval (mean) -- one interval variable
  inverse: given a specific confidence interval, what is the needed sample size?
  confidence interval (proportion) -- one nominal variable

Do differences found in a sample (a subset of the population) reflect differences in the population as a whole? (commonly used to generalize from survey results)
  chi-square -- two categorical variables
  difference of means -- an interval variable divided into two categories
  difference of proportions -- a nominal variable divided into two categories
  ANOVA (Analysis of Variance) -- an interval variable divided into three or more categories

What is the relationship between two variables?
  correlation analysis (including an example of ecological fallacy) -- two interval variables

How many total jobs are dependent on basic (export-based) jobs?
  multiplier -- number of basic (export) jobs, number of total jobs (export + locally serving)

What is the relative concentration of local employment by sector?
  location quotients -- employment (total and by sector) for both the locality and the nation

How can we estimate interaction (e.g., trade, traffic) between two cities?
  gravity model -- population of the two cities, distance, constant

How do we measure growth over time?
  growth rates (3 types) -- population levels over time

How do we compare costs and benefits (e.g., of a project) over time?
  cost-benefit analysis -- quantified costs and benefits for each year, discount rate









c672515f-24ea-49e6-9a63-dfa6309fbbec.xls Overview 9/2/2009 10:22 PM

calculate a confidence interval (with interval data)
that is, how confident are you that your sample estimate comes close to the population mean?
one interval variable

Data needed (enter data in yellow cells):
  sample mean (X̄)
  std dev of sample (s)
  sample size (n)
  value of t-score for .025 (two-tail test) -- from t-table or let Excel calculate

$$\mu = \bar{X} \pm t_{.025}\,\frac{s}{\sqrt{n}}$$

Case   Hhd Income
 1     24,000
 2     36,000
 3     12,000
 4     74,000
 5     46,000
 6     27,000
 7     23,000
 8     69,000
 9     107,000
10     53,000
11     29,000
12     34,000
13     43,000
14     28,000
15     24,000
16     43,000

MEAN    42,000
STDEV   24,105
n       16
t       2.131
set the confidence level (2-tail): 0.05

SO:
μ = 42,000 +/- 12,845
lower end of confidence interval   29,155
upper end of confidence interval   54,845
range                              25,690

[Chart: Confidence Interval, on a household income axis from 0 to 100,000]
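A minimal Python sketch of the same calculation, assuming the t value is looked up in a t-table (2.131 for 15 degrees of freedom) rather than computed. The helper name `mean_confidence_interval` is illustrative only, not part of the workbook.

```python
import math
import statistics


def mean_confidence_interval(data, t_crit):
    """Return (mean, margin, lower, upper) for mu = mean +/- t * s / sqrt(n)."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample std dev (n - 1 denominator), like Excel's STDEV
    margin = t_crit * s / math.sqrt(n)
    return xbar, margin, xbar - margin, xbar + margin


# Household incomes from the table above
incomes = [24000, 36000, 12000, 74000, 46000, 27000, 23000, 69000,
           107000, 53000, 29000, 34000, 43000, 28000, 24000, 43000]

# t for 15 degrees of freedom, .025 in each tail, taken from a t-table
xbar, margin, lower, upper = mean_confidence_interval(incomes, 2.131)
print(f"mean {xbar:,.0f} +/- {margin:,.0f}: {lower:,.0f} to {upper:,.0f}")
```

With the rounded table value of 2.131 the margin comes out near 12,842, a few dollars below the sheet's 12,845, which reflects Excel's unrounded t (about 2.1314).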

calculate a confidence interval
that is, how confident are you that your sample estimate comes close to the population mean?
one interval variable
Here we will skip using the raw data and instead calculate with the summary data (mean, std dev., n)

Data needed (enter data in yellow cells):
  sample mean (X̄)
  std dev of sample (s)
  sample size (n)
  value of t-score for .025 (two-tail test) -- from t-table or let Excel calculate

$$\mu = \bar{X} \pm t_{.025}\,\frac{s}{\sqrt{n}}$$

MEAN    42,000
STDEV   5,000
n       384
t       1.966
set the confidence level (2-tail): 0.05

SO:
μ = 42,000 +/- 502
lower end of confidence interval   41,498
upper end of confidence interval   42,502
range                              1,003

[Chart: Confidence Interval, on an axis from 0 to 100,000]
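The summary-data version needs only the mean, std dev and n. This sketch (hypothetical helper `ci_from_summary`) also illustrates the large-sample point made in the notes on sample size: with n = 384, the standard-library `statistics.NormalDist().inv_cdf(0.975)` gives Z of about 1.960, producing almost the same interval as the sheet's t of 1.966.

```python
import math
from statistics import NormalDist


def ci_from_summary(mean, sd, n, crit):
    """Confidence interval from summary statistics: mean +/- crit * sd / sqrt(n)."""
    margin = crit * sd / math.sqrt(n)
    return mean - margin, mean + margin, margin


z = NormalDist().inv_cdf(0.975)  # about 1.95996, the large-sample limit of t
for label, crit in [("t = 1.966 (from the sheet)", 1.966), ("z = 1.960", z)]:
    lower, upper, margin = ci_from_summary(42000, 5000, 384, crit)
    print(f"{label}: {lower:,.0f} to {upper:,.0f} (+/- {margin:,.1f})")
```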

calculate a minimum sample size needed to achieve a specific confidence interval range
that is, how confident are you that your sample estimate comes close to the population mean?
one interval variable
Here we will skip using the raw data and instead calculate with the summary data.

Data needed (enter data in yellow cells):
  sample mean (X̄)
  std dev of sample (s)
  c, the desired half-width of the confidence interval (the "+/-" amount)
  value of t-score for .025 (two-tail test) -- from t-table or let Excel calculate

MEAN    42,000
STDEV   5,000
c       500
t       1.960
set the confidence level (2-tail): 0.05

given values of stdev, c and the confidence level, we calculate n:
sample size needed   384

SO:
μ = 42,000 +/- 500
lower end of confidence interval   41,500
upper end of confidence interval   42,500
range                              1,000

[Chart: Confidence Interval, on an axis from 0 to 100,000]

Here is the formula to calculate a confidence interval:
$$\mu = \bar{X} \pm t_{.025}\,\frac{s}{\sqrt{n}}$$
…solving for n (sample size), set the +/- term equal to c:
$$c = t_{.025}\,\frac{s}{\sqrt{n}}$$
…which leads to this equation (so, to estimate sample size, you need to know the std dev, the half-width c and the value of t):
$$n = \left(\frac{t_{.025}\,s}{c}\right)^2$$
e.g., (1.96 × 5,000 / 500)² = 384.16, shown as 384.

NOTES:
1. For the value of t, we simply assumed a large sample size (t → Z), e.g., for a 95% confidence interval (2-tailed), t = 1.96.
2. We are also assuming a large population size (M), so that n/M → 0.
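A short Python sketch of the inverted formula, with a hypothetical helper `required_sample_size`. One design choice differs from the sheet: it rounds n up with `math.ceil`, because rounding 384.16 down to 384 leaves a half-width slightly above c = 500.

```python
import math


def required_sample_size(sd, c, crit=1.96):
    """Smallest n with crit * sd / sqrt(n) <= c, i.e. (crit * sd / c) ** 2 rounded up."""
    return math.ceil((crit * sd / c) ** 2)


n = required_sample_size(5000, 500)
half_width = 1.96 * 5000 / math.sqrt(n)  # check the achieved +/- amount
print(n, round(half_width, 1))
```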

calculate a confidence interval using proportions (nominal data), for large n
one nominal variable (proportions); the population proportion is p

Data needed (enter data in yellow cells):
  sample proportion (P)
  sample size (n)
set the confidence level (2-tail): 0.05

$$p = P \pm 1.96\,\sqrt{\frac{P(1-P)}{n}}$$

P   50%
n   100
t   1.984 (the results below use this t in place of 1.96: 1.984 × √(0.5 × 0.5 / 100) = 0.099)

SO:
                                   proportion   in percent
p = 0.500 +/-                      0.099         9.9%
lower end of confidence interval   0.401        40.1%
upper end of confidence interval   0.599        59.9%
range                              0.198        19.8%

[Chart: Confidence Interval, on an axis from 0% to 100%]
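A minimal Python sketch of the proportion interval (hypothetical helper `proportion_ci`), run with both 1.96 from the formula and the 1.984 used on the sheet so the small difference in the +/- amount is visible.

```python
import math


def proportion_ci(p_hat, n, crit=1.96):
    """Large-sample interval p_hat +/- crit * sqrt(p_hat * (1 - p_hat) / n)."""
    margin = crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


for crit in (1.96, 1.984):  # 1.96 from the formula, 1.984 as used on the sheet
    lower, upper, margin = proportion_ci(0.50, 100, crit)
    print(f"crit {crit}: {lower:.1%} to {upper:.1%} (+/- {margin:.1%})")
```

Because P(1 − P) is largest at P = 50%, this example gives the widest interval possible for n = 100.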

Chi-Square

CHI-SQUARE TEST (EXCEL FUNCTION): does the distribution of outcomes (observed) significantly differ from a random distribution (expected)?

ACTUAL (OBSERVED)                  (enter data in yellow cells)
          city   suburb   rural   total
strong     2       1        1       4
medium     1       2        1       4
weak       1       1        2       4
total      4       4        4      12

(other values appearing next to the data-entry note: 20 8 20 / 16 12 12 / 12 16 4)

PREDICTED/EXPECTED (row total × column total ÷ grand total)
          city     suburb   rural    total
strong    1.3333   1.3333   1.3333    4
medium    1.3333   1.3333   1.3333    4
weak      1.3333   1.3333   1.3333    4
total     4        4        4        12

Chi-square test (calculated by Excel): "CHITEST"
p = [value not shown] (probability of this sample outcome if no difference in population; range: 0 to 1)

Difference between predicted and actual (predicted - actual)
          city      suburb    rural     total
strong   -0.6667    0.3333    0.3333    0
medium    0.3333   -0.6667    0.3333    0
weak      0.3333    0.3333   -0.6667    0
total     0         0         0         0

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
f_o = observed frequencies
f_e = expected frequencies
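A standard-library Python sketch that computes the chi-square statistic, its degrees of freedom and the p-value (the probability that Excel's CHITEST reports) for the observed table above. The p-value uses a series expansion of the regularized incomplete gamma function; that method, and all function names, are our choices, not Excel's.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom >= x), using the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = 1 / a
    total = term
    k = 1
    while term > total * 1e-15:
        term *= z / (a + k)
        total += term
        k += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1 - lower)


def chi_square_test(observed):
    """Return (chi-square statistic, degrees of freedom, p-value) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / grand  # expected frequency
            stat += (fo - fe) ** 2 / fe
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, chi2_sf(stat, df)


observed = [[2, 1, 1],   # strong: city, suburb, rural
            [1, 2, 1],   # medium
            [1, 1, 2]]   # weak
stat, df, p = chi_square_test(observed)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.3f}")
```

For this table the p-value is large (above 0.8), so the sample gives no evidence against a random distribution. With only 12 cases the expected counts (1.33) are well below the commonly cited minimum of about 5, so the chi-square approximation is rough here.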

The t distribution is used for hypothesis testing with small samples (e.g., smaller than about 100 cases).
The t distribution is similar to the z distribution, but is "flatter" because of the smaller sample size.
When the sample size gets large (e.g., over 50-100), the t distribution approaches that of the Z distribution (a normal curve).

Probabilities and t-scores for various degrees of freedom (two-tail test)
(probability of this outcome if no difference in population)

t-score   d.f. = 5   d.f. = 10   d.f. = 50   d.f. = 1000
0.0       1.000      1.000       1.000       1.000
0.1       0.924      0.922       0.921       0.920
0.2       0.849      0.845       0.842       0.842
0.3       0.776      0.770       0.765       0.764
0.4       0.706      0.698       0.691       0.689
0.5       0.638      0.628       0.619       0.617
0.6       0.575      0.562       0.551       0.549
0.7       0.515      0.500       0.487       0.484
0.8       0.460      0.442       0.427       0.424
0.9       0.409      0.389       0.372       0.368
1.0       0.363      0.341       0.322       0.318
1.1       0.321      0.297       0.277       0.272
1.2       0.284      0.258       0.236       0.230
1.3       0.250      0.223       0.200       0.194
1.4       0.220      0.192       0.168       0.162
1.5       0.194      0.165       0.140       0.134
1.6       0.170      0.141       0.116       0.110
1.7       0.150      0.120       0.095       0.089
1.8       0.132      0.102       0.078       0.072
1.9       0.116      0.087       0.063       0.058
2.0       0.102      0.073       0.051       0.046
2.1       0.090      0.062       0.041       0.036
2.2       0.079      0.052       0.032       0.028
2.3       0.070      0.044       0.026       0.022
2.4       0.062      0.037       0.020       0.017
2.5       0.054      0.031       0.016       0.013
2.6       0.048      0.026       0.012       0.009
2.7       0.043      0.022       0.009       0.007
2.8       0.038      0.019       0.007       0.005
2.9       0.034      0.016       0.006       0.004
3.0       0.030      0.013       0.004       0.003

The 0.05 level is by convention used as the threshold of statistical significance (though sometimes we use an even more strict level, such as 0.01 or even 0.001).

[Chart: probability of this outcome if no difference in population (y-axis, 0 to 1) vs. standardized sample differences (t-scores) (x-axis, 0 to 3), one line per degrees of freedom: 5, 10, 50, 1000]

The larger the sample size, the lower the value of the critical t ... when the sample size gets large (e.g., over 50-100), then the critical t level (.05, 2 tail) approaches 1.96.
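The table and the "approaches 1.96" note can be checked with a standard-library Python sketch. It integrates the Student t density numerically with Simpson's rule (our choice of method; Excel uses its own routines) and finds the two-tail critical t by bisection. Function names are hypothetical.

```python
import math


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tail_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * (integral of the density from 0 to |t|), by Simpson's rule."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 1 - 2 * total * h / 3


def critical_t(df, alpha=0.05):
    """Two-tail critical t, found by bisection: the t whose two-tail p equals alpha."""
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if two_tail_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in (5, 10, 50, 1000):
    print(df, round(two_tail_p(2.0, df), 3), round(critical_t(df), 3))
```

The second column should reproduce the table's t = 2.0 row, and the critical values in the third column fall toward 1.96 as the degrees of freedom grow.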

difference of means

Small Standard Deviation (Factor 50 45)
Case   Male Income   Female Income
 1     69,000        49,000
 2     77,000        67,000
 3     46,000        69,000
 4     59,000        64,000
 5     55,000        30,000
 6     50,000        68,000
 7     38,000        73,000
 8     63,000        61,000
 9     50,000        61,000
10     56,000        48,000
11     74,000        72,000
12     50,000        57,000
Mean       57,250    59,917
Std Dev.   11,702    12,428

Larger Standard Deviation (Factor 80 75)
Case   Male Income   Female Income
 1     40,000        72,000
 2     42,000        34,000
 3     83,000        65,000
 4     100,000       34,000
 5     100,000       86,000
 6     86,000        86,000
 7     104,000       67,000
 8     70,000        64,000
 9     37,000        79,000
10     62,000        78,000
11     88,000        85,000
12     72,000        83,000
Mean       73,667    69,417
Std Dev.   24,092    18,372

[Charts: male and female incomes for each panel, with the male and female means marked; axis 0 to 100,000]

t-Test: Two-Sample Assuming Equal Variances
                               Small Standard Deviation        Larger Standard Deviation
                               Male Income   Female Income     Male Income   Female Income
Mean                           57250         59916.66667       73666.6667    69416.6667
Variance                       136931818.2   154446969.7       580424242     337537879
Observations                   12            12                12            12
Pooled Variance                145689393.9                     458981061
Hypothesized Mean Difference   0                               0
df                             22                              22
t Stat                         -0.541                          0.486

Rule of thumb: if t < -2 or t > 2, the difference is "statistically significant" at the .05 level. That is, there is less than a 5% chance that one could get this difference in a sample drawn from a population where there is no difference between city and suburban …
By this rule, neither income comparison above is significant (t = -0.541 and t = 0.486).

t-score   -1.86871
Prob-t    0.068649
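A Python sketch (hypothetical helper `pooled_t_test`) of the pooled-variance t statistic behind Excel's "t-Test: Two-Sample Assuming Equal Variances", applied to both panels. It stops at the t statistic; to judge significance, compare |t| with the two-tail critical t for 22 degrees of freedom, about 2.074.

```python
import math
import statistics


def pooled_t_test(a, b):
    """Two-sample t statistic assuming equal variances.
    Returns (t, degrees of freedom, pooled variance)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, n1 + n2 - 2, pooled


male_small = [69000, 77000, 46000, 59000, 55000, 50000, 38000, 63000, 50000, 56000, 74000, 50000]
female_small = [49000, 67000, 69000, 64000, 30000, 68000, 73000, 61000, 61000, 48000, 72000, 57000]
male_large = [40000, 42000, 83000, 100000, 100000, 86000, 104000, 70000, 37000, 62000, 88000, 72000]
female_large = [72000, 34000, 65000, 34000, 86000, 86000, 67000, 64000, 79000, 78000, 85000, 83000]

for label, a, b in [("small SD", male_small, female_small),
                    ("larger SD", male_large, female_large)]:
    t, df, pooled = pooled_t_test(a, b)
    print(f"{label}: t = {t:.3f}, df = {df}, pooled variance = {pooled:,.1f}")
```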

Diff of Proportions

[Chart: Percent of Residents Who Own a Car, … Suburban Residents]

…on: does the difference found in the sample reflect a difference among the entire population? [the research …
… difference is due merely to random sample …; there is no difference in the population as a whole.

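Diff of Proportions (2)

Here, if given just the mean, n of cases …

Mean (proportion)                   10.0%    20.0%
n of cases                          150      120
degrees of freedom (n1 + n2 - 2)    268

t-score (see Blalock, p. 234)
numerator (p1 - p2)                 -10.0%
pu (pooled proportion)              0.14444
sqrt(pu·qu), where qu = 1 - pu      0.35154
denominator                         0.04305
t-score                             -2.3226
Prob-t                              0.0209

$$t = \frac{p_1 - p_2}{\sqrt{p_u q_u}\,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad p_u = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}, \qquad q_u = 1 - p_u$$

Note that as the proportions move away from 50%, the standard error shrinks, so we can be more accurate: e.g., compare 10% vs. 20% with 40% vs. 50%, or with 80% vs. 90%.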
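A Python sketch (hypothetical helper `diff_of_proportions_t`) of the pooled-proportion t-score, reproducing the sheet's pu, denominator and t, then repeating the same 10-point gap nearer 50% to show the smaller |t| the note describes.

```python
import math


def diff_of_proportions_t(p1, n1, p2, n2):
    """t-score for p1 - p2 using the pooled proportion pu, with qu = 1 - pu."""
    pu = (n1 * p1 + n2 * p2) / (n1 + n2)
    qu = 1 - pu
    se = math.sqrt(pu * qu) * math.sqrt(1 / n1 + 1 / n2)  # the denominator
    return (p1 - p2) / se, pu, se


t, pu, se = diff_of_proportions_t(0.10, 150, 0.20, 120)
print(f"pu = {pu:.5f}, denominator = {se:.5f}, t = {t:.4f}")

# Same 10-point gap and sample sizes, but proportions nearer 50%
t_mid, _, _ = diff_of_proportions_t(0.40, 150, 0.50, 120)
print(f"40% vs 50%: t = {t_mid:.4f}")
```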

ANOVA

AUTO MILES DRIVEN PER WEEK

Case   City Residents   Suburban Residents   Rural Residents
 1      20               50                   40
 2       0               80                   50
 3      50               90                   60
 4     100              350                   70
 5      70              240                   80
 6      35              120                   90
 7      12               90                  100
 8     150               80                  100
 9     120               70                   20
10       0               60                   30
11      18               90                   40
12      35              111                   50
13      42              122                   60
14      67              133                  250
15      95              144                  170
16      66              155                  120
17      77               96                  150
18     123               23                  170
19       0               65                  180
20      18               24                  111
21      24               17                  130
22      75               85                   75
Mean    54.4            104.3                 97.5

[Chart: AUTO MILES DRIVEN PER WEEK, individual values and group means for city (urban), suburban and rural residents; axis 0 to 400]

Anova: Single Factor (first run, with city data summing to 1,397)

SUMMARY
Groups               Count   Sum    Average      Variance
City Residents       22      1397   63.5         2672.64286
Suburban Residents   22      2295   104.318182   5471.65584
Rural Residents      22      2146   97.5454545   3391.11688

ANOVA
Source of Variation   SS           df   MS           F            P-value      F crit
Between Groups        21054.6364   2    10527.3182   2.73782547   0.07241657   3.14280868
Within Groups         242243.727   63   3845.13853
Total                 263298.364   65

Why use ANOVA?
In situations where you are comparing the means from more than two groups. In a difference of means test, you compare x2 - x1; for more than two groups you cannot simply compare x3 - x2 - x1, so instead you look at the variation (sum of squares) within vs. between groups.
Intuitively, sample groups with low internal variation but high variation across groups will likely represent real differences in the population as a whole, while sample groups with high internal variation and low variation across groups have a greater chance of representing populations with no real differences.

Anova: Single Factor (second run, with the city data shown in the table above)

SUMMARY
Groups               Count   Sum    Average      Variance
City Residents       22      1197   54.4090909   1890.82468
Suburban Residents   22      2295   104.318182   5471.65584
Rural Residents      22      2146   97.5454545   3391.11688

ANOVA
Source of Variation   SS           df   MS           F            P-value      F crit
Between Groups        32248.5758   2    16124.2879   4.49829595   0.01492455   3.14280868
Within Groups         225825.545   63   3584.53247
Total                 258074.121   65

In the first run F (2.74) is below F crit (3.14) and P = 0.072 > 0.05, so the group differences are not significant at the .05 level; in the second run F (4.50) exceeds F crit and P = 0.015 < 0.05, so they are.

$$F = \frac{SS_{\text{between}} / \text{d.f.}_{\text{between}}}{SS_{\text{within}} / \text{d.f.}_{\text{within}}}$$
SS_between = sum of squares between the groups
SS_within = sum of squares within the groups
d.f. = degrees of freedom (between: number of groups - 1 = 2; within: number of cases - number of groups = 63)
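Excel's "Anova: Single Factor" output can be checked with a short Python sketch (hypothetical helper `one_way_anova`) that applies the sum-of-squares definitions directly to the city data in the table (the second run). It computes F only; for significance, compare F with the sheet's F crit of 3.14 for (2, 63) degrees of freedom.

```python
def one_way_anova(*groups):
    """Single-factor ANOVA: returns SS between, SS within, their d.f., and F."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f


city = [20, 0, 50, 100, 70, 35, 12, 150, 120, 0, 18, 35, 42, 67, 95, 66, 77, 123, 0, 18, 24, 75]
suburban = [50, 80, 90, 350, 240, 120, 90, 80, 70, 60, 90, 111, 122, 133, 144, 155, 96, 23, 65, 24, 17, 85]
rural = [40, 50, 60, 70, 80, 90, 100, 100, 20, 30, 40, 50, 60, 250, 170, 120, 150, 170, 180, 111, 130, 75]

ssb, ssw, dfb, dfw, f = one_way_anova(city, suburban, rural)
print(f"SS between {ssb:.4f} (df {dfb}), SS within {ssw:.3f} (df {dfw}), F = {f:.4f}")
```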

Example 1
Case   x     y
 1     0     0
 2     0.1   0.1
 3     0.2   0.2
 4     0.3   0.3
 5     0.4   0.3
 6     0.5   0.4
 7     0.6   0.5
 8     0.7   0.6
 9     0.8   0.7
10     0.8   0.8
11     0.9   0.9
12     1     0.9
correlation   +0.99
F             [not shown]
p value       0.00

Example 2 (only the x values are shown; its correlation, F and p value are not)
Case   x
 1     0.2
 2     0.3
 3     0.4
 4     0.4
 5     0.5
 6     0.5
 7     0.5
 8     0.5
 9     0.6
10     0.6
11     0.7
12     0.8

[Chart: scatter plot of y against x, both axes 0 to 1]
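A Python sketch of the Pearson correlation for Example 1 (hypothetical helper `pearson_r`). The F line assumes the sheet's F is the usual simple-regression test, F = r²(n − 2)/(1 − r²); that formula is our assumption, not taken from the sheet.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.8, 0.9, 1]
y = [0, 0.1, 0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.9]
r = pearson_r(x, y)
n = len(x)
f = r ** 2 * (n - 2) / (1 - r ** 2)  # assumed: F for a simple regression of y on x, d.f. (1, n - 2)
print(f"r = {r:.3f}, F = {f:.1f}")
```

An F this large explains why a spreadsheet cell might overflow, and it is consistent with the sheet's p value of 0.00.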

correlation (range: -1 to +1)

… time preferences.

Present value:
$$PV = \frac{B(t)}{(1+r)^t}$$
where B(t) is the benefit in year t and r is the discount rate.

Net present value:
$$NPV = \sum_{t=0}^{n} \frac{B(t) - C(t)}{(1+r)^t}$$
where
B(t) = benefits in year t
C(t) = costs in year t
t = year
NPV = net present value (benefits adjusted for cost)
r = discount rate (e.g., 6% per year or 0.06)

Why is money worth less in the future?
1 people are impatient (and mortal)
2 opportunity cost of investing the capital elsewhere.

The argument for discounting is referred to as the 'marginal productivity of capital' argument, the use of the word 'marginal' indicating that it is the productivity of additional units of capital that is relevant. [99]

AND THE TRICK IS TO INCLUDE ENVIRONMENTAL COSTS AND BENEFITS. [99]
If
$$\sum_{t=0}^{n} \frac{B(t) - C(t) \pm E(t)}{(1+r)^t} > 0,$$
where E(t) is the environmental cost or benefit in year t, then the project is a net good project.

The Problems with Discounting for the Environment
Discounting is a way to shift heavy costs to future generations.

note: it is hard to shift capital costs to future generations, since lenders want paybacks (e.g., 30 year loans), but it is easier to shift non-monetary costs to the future, since the lenders are around to complain! they don't have a contractual agr…

1 actual damage may be far larger than the discounted value.
2 long-term benefits are also not strongly valued (even though today's actions are required for those 50 years from now to enjoy them), i.e., they should not be discounted like capital.
3 discounting will lead to greater exhaustion of exhaustible resources, esp. with a high discount rate.

However: "There is, in fact, no unique relationship between high discount rates and environmental deterioration." [103]

How to select a discount rate: simply the rate of economic growth for a nation? the interest rate? [104]

Taking sustainability into account:
EX: "require that any environmental damage be compensated by projects specifically designed to improve the environment." [106]

Cost-benefit

Note how r can really change the outcome, especially if cost and benefit patterns vary over time (see graph).

EXAMPLE
t    Benefit B(t)   Cost C(t)   Net Benefit     discount   (1+r)^t   net benefit discounted     Cumulative Net Present
                                B(t) - C(t)     rate r               to present value           Value (NPV)
0            0      1,000,000   -1,000,000      0.02       1.00      -1,000,000                 -1,000,000
1      100,000        100,000            0      0.02       1.02               0                 -1,000,000
2      110,000        100,000       10,000      0.02       1.04           9,612                   -990,388
3      120,000        100,000       20,000      0.02       1.06          18,846                   -971,542
4      130,000        100,000       30,000      0.02       1.08          27,715                   -943,827
5      140,000        100,000       40,000      0.02       1.10          36,229                   -907,597
6      150,000        100,000       50,000      0.02       1.13          44,399                   -863,199
7      160,000        100,000       60,000      0.02       1.15          52,234                   -810,965
8      170,000        100,000       70,000      0.02       1.17          59,744                   -751,221
9      180,000        100,000       80,000      0.02       1.20          66,940                   -684,280
10     190,000        100,000       90,000      0.02       1.22          73,831                   -610,449
11     200,000        100,000      100,000      0.02       1.24          80,426                   -530,023
12     210,000        100,000      110,000      0.02       1.27          86,734                   -443,288
13     220,000        100,000      120,000      0.02       1.29          92,764                   -350,525
14     230,000        100,000      130,000      0.02       1.32          98,524                   -252,001
15     240,000        100,000      140,000      0.02       1.35         104,022                   -147,979
16     250,000        100,000      150,000      0.02       1.37         109,267                    -38,712
17     260,000        100,000      160,000      0.02       1.40         114,266                     75,554
18     270,000        100,000      170,000      0.02       1.43         119,027                    194,581
19     280,000        100,000      180,000      0.02       1.46         123,558                    318,139
20     290,000        100,000      190,000      0.02       1.49         127,865                    446,003

Compare front-loading and back-loading costs, and changing discount rates.

[Chart: Benefit, Cost, Net Benefit and Cumulative Net Present Value (NPV) by year; y-axis -1,500,000 to 1,500,000]
The year when the green line (cumulative NPV) crosses over the x axis (where y = 0) is the year when the cumulative impact shifts from a net cost to a net benefit (here, year 17).
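The EXAMPLE table can be reproduced with a short Python sketch (hypothetical helper `discounted_net_benefits`) that applies NPV = Σ (B(t) − C(t))/(1 + r)^t year by year. Rerunning it at a higher discount rate illustrates the note that r can change the outcome: with the same cash flows, at 6% the cumulative NPV stays negative through year 20.

```python
def discounted_net_benefits(benefits, costs, r):
    """Yield (year, discounted net benefit, cumulative NPV), discounting by (1 + r) ** t."""
    cumulative = 0.0
    for t, (b, c) in enumerate(zip(benefits, costs)):
        pv = (b - c) / (1 + r) ** t
        cumulative += pv
        yield t, pv, cumulative


# The EXAMPLE: a 1,000,000 cost in year 0, then 100,000 per year;
# benefits start at 100,000 in year 1 and rise by 10,000 per year.
benefits = [0] + [100_000 + 10_000 * (t - 1) for t in range(1, 21)]
costs = [1_000_000] + [100_000] * 20

for r in (0.02, 0.06):
    rows = list(discounted_net_benefits(benefits, costs, r))
    npv = rows[-1][2]
    breakeven = next((t for t, _, cum in rows if cum > 0), None)  # None if never positive
    print(f"r = {r:.0%}: NPV after 20 years = {npv:,.0f}, "
          f"first year cumulative NPV > 0: {breakeven}")
```

A generator keeps each year's discounted value and running total together, which matches the two right-hand columns of the table.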

MEASURES OF INEQUALITY/DISPARITY: how to calculate a Gini Coefficient

gini   0.386
RANGE: 0 (PERFECT EQUALITY) to 1 (PERFECT INEQUALITY)
n      20

Person   Income    x(i) = X(i)/SUM   CUMULATIVE x(i)   x(i)*i
"i"      X(i)      (calculated)      (calculated)      (calculated)
 1        1,000    0.003             0.003             0.00
 2        3,000    0.009             0.013             0.02
 3        4,000    0.013             0.025             0.04
 4        5,000    0.016             0.041             0.06
 5        6,000    0.019             0.059             0.09
 6        8,000    0.025             0.084             0.15
 7        8,000    0.025             0.109             0.18
 8        9,000    0.028             0.138             0.23
 9       11,000    0.034             0.172             0.31
10       12,000    0.038             0.209             0.38
11       14,000    0.044             0.253             0.48
12       17,000    0.053             0.306             0.64
13       19,000    0.059             0.366             0.77
14       21,000    0.066             0.431             0.92
15       23,000    0.072             0.503             1.08
16       27,000    0.084             0.588             1.35
17       29,000    0.091             0.678             1.54
18       32,000    0.100             0.778             1.80
19       33,000    0.103             0.881             1.96
20       38,000    0.119             1.000             2.38
SUM     320,000    1                                   14.36
mean               0.05

Insert income amounts for each of the 20 people here -- be sure to arrange from LOW to HIGH.
Do NOT enter data in any of the other columns -- those are calculated.
Try entering both a fairly equal income distribution -- and then try a broadly unequal one.

[Chart: GINI COEFFICIENT, cumulative x (the Lorenz curve) plotted against the LINE OF EQUALITY; y-axis 0 to 1, x-axis persons 0 to 20]
the LORENZ CURVE -- see how the curve deviates from the line of equality as the gini coefficient …

source of formula and text: U.S. Census Bureau. The Changing Shape of the Nation's Income Distribution, 1947-1998, Current Population Report, by Arthur F. Jones Jr. and Daniel H. Weinberg (Issued June 2000). http://www.census.gov/prod/2000pubs/p60-204.pdf
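The sheet's columns are consistent with G = (2/n) Σ i·x(i) − (n + 1)/n, where incomes are sorted from low to high and x(i) is person i's share of total income: with Σ i·x(i) = 14.3625 and n = 20 this gives 0.386. Whether this is the exact form printed in the Census report is not confirmed here. A Python sketch, with a hypothetical helper `gini`:

```python
def gini(incomes):
    """Gini coefficient G = (2 / n) * sum(i * x_i) - (n + 1) / n,
    where incomes are sorted low to high and x_i is person i's share of the total."""
    xs = sorted(incomes)  # the sheet asks for incomes arranged from LOW to HIGH
    n = len(xs)
    total = sum(xs)
    weighted = sum(i * x / total for i, x in enumerate(xs, start=1))
    return 2 * weighted / n - (n + 1) / n


incomes = [1000, 3000, 4000, 5000, 6000, 8000, 8000, 9000, 11000, 12000,
           14000, 17000, 19000, 21000, 23000, 27000, 29000, 32000, 33000, 38000]
print(round(gini(incomes), 3))                  # the sheet's example
print(round(gini([16000] * 20), 3))             # perfectly equal incomes
print(round(gini([0] * 19 + [320000]), 3))      # one person holds all income
```

A perfectly equal distribution gives 0 (up to floating-point rounding), and one person holding all income gives (n − 1)/n = 0.95 for 20 people, which approaches 1 as n grows.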

