
Basic Quantitative Tools (Prof. Campbell)
last updated: Friday, March 17, 2008

Overview

(INFERENTIAL STATISTICS -- inferring from a sample to a population using the laws of probability)

How confident can one be that the sample mean (or proportion) represents the population as a whole?
  Tool                                  Data needed
  confidence interval (mean)            one interval variable
  inverse: given a specific confidence interval, what is the needed sample size?
  confidence interval (proportion)      one nominal variable

Do differences found in a sample (a subset of the population) reflect differences in the population as a whole? (commonly used to generalize from survey results)
  Tool                                  Data needed
  chi-square                            two categorical variables
  difference of means                   an interval variable divided into two categories
  difference of proportions             a nominal variable divided into two categories
  ANOVA (Analysis of Variance)          an interval variable divided into three or more categories

What is the relationship between two variables?
  correlation analysis (including an example of ecological fallacy)    two interval variables

How many total jobs are dependent on basic (export-based) jobs?
  Multiplier                            number of basic (export) jobs, number of total jobs (export + locally serving)

What is the relative concentration of local employment by sector?
  Location Quotients                    employment (total and by sector) for both the locality and the nation

How can we estimate interaction (e.g., trade, traffic) between two cities?
  Gravity Model                         population of two cities, distance, constant

How do we measure growth over time?
  Growth Rates (3 types)                population levels over time

How do we compare costs and benefits (e.g., of a project) over time?
  Cost-benefit analysis                 quantified costs and benefits for each year, discount rate


Calculate a confidence interval (with interval data)

That is, how confident are you that your sample estimate comes close to the population mean?
Variable: one interval variable. Enter data in the yellow cells.

Data needed:
  sample mean (X-bar)
  std dev of sample (s)
  sample size (n)
  value of t-score for .025 (two-tail test) -- from a t-table, or let Excel calculate

$$\bar{X} \pm t_{.025} \frac{s}{\sqrt{n}}$$

Data:
  Case   Hhd Income
  1        24,000
  2        36,000
  3        12,000
  4        74,000
  5        46,000
  6        27,000
  7        23,000
  8        69,000
  9       107,000
  10       53,000
  11       29,000
  12       34,000
  13       43,000
  14       28,000
  15       24,000
  16       43,000

  MEAN     42,000
  STDEV    24,105
  n        16
  t        2.131    (confidence level, 2-tail, set to 0.05)

SO: mu = 42,000 +/- 12,845
  lower end of confidence interval    29,155
  upper end of confidence interval    54,845
  range                               25,690

[Chart: Confidence Interval, plotted on an income axis from 0 to 100,000]


Calculate a confidence interval (from summary data)

Same question as above, with one interval variable. Here we skip the raw data and calculate with the summary data (mean, std dev., n).

  MEAN     42,000
  STDEV    5,000
  n        384
  t        1.966    (confidence level, 2-tail, set to 0.05)

SO: mu = 42,000 +/- 502
  lower end of confidence interval    41,498
  upper end of confidence interval    42,502
  range                               1,003

[Chart: Confidence Interval, plotted on an income axis from 0 to 100,000]


Calculate the minimum sample size needed to achieve a specific confidence interval range

Given values of the standard deviation, c and the confidence level, we calculate "n".

  MEAN     42,000
  STDEV    5,000
  c        500      (the +/- value, i.e. half of the confidence interval range)
  t        1.960    (confidence level, 2-tail, set to 0.05)

  sample size needed (n)              384

SO: mu = 42,000 +/- 500
  lower end of confidence interval    41,500
  upper end of confidence interval    42,500
  range                               1,000

Here is the formula to calculate a confidence interval:

$$\bar{X} \pm t_{.025} \frac{s}{\sqrt{n}}$$

Setting the +/- term equal to c and solving for n (sample size):

$$\sqrt{n} = \frac{t_{.025}\, s}{c}$$

...leads to this equation (so, to estimate sample size, you need to know the standard deviation, the confidence interval value c, and the value of t):

$$n = \left( \frac{t_{.025}\, s}{c} \right)^2$$

NOTES:
1. For the value of "t", we simply assumed a large sample size (t --> Z), e.g., for a 95% confidence interval (2-tailed), t = 1.96.
2. We are also assuming a large population size (M), so that n/M --> 0.
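The three calculations above can be sketched in a few lines of Python. This is a minimal illustration, not the workbook's own implementation: the function names are hypothetical, and the t values are typed in from the sheet (as if read from a t-table) rather than computed.

```python
import math
import statistics


def confidence_interval(mean, s, n, t):
    """Return (lower, upper) for mean +/- t * s / sqrt(n)."""
    half_width = t * s / math.sqrt(n)
    return mean - half_width, mean + half_width


def sample_size(s, c, t):
    """Return n = (t * s / c) ** 2, the sample size that gives a +/- value of c."""
    return (t * s / c) ** 2


# Raw-data example: the 16 household incomes listed in the sheet.
incomes = [24000, 36000, 12000, 74000, 46000, 27000, 23000, 69000,
           107000, 53000, 29000, 34000, 43000, 28000, 24000, 43000]
raw_mean = statistics.mean(incomes)
raw_sd = statistics.stdev(incomes)  # sample standard deviation (divides by n - 1)
raw_ci = confidence_interval(raw_mean, raw_sd, len(incomes), t=2.131)

# Summary-data example: mean 42,000, s 5,000, n 384, t 1.966.
summary_ci = confidence_interval(42000, 5000, 384, t=1.966)

# Inverse problem: s 5,000, c 500, t 1.96 (large-sample value).
needed_n = sample_size(5000, 500, t=1.96)

print(raw_mean, round(raw_sd), raw_ci)
print(summary_ci)
print(needed_n)
```

With the rounded t of 2.131 the raw-data interval differs from the sheet's 12,845 by a few units, since Excel carries more digits of t. The summary-data interval and the sample size round to the sheet's 41,498 to 42,502 and 384.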
Calculate a confidence interval using proportions (nominal data), for large n

Variable: one nominal variable (proportions). The population proportion is p. Enter data in the yellow cells.

Data needed:
  sample proportion (P)
  sample size (n)

$$P \pm 1.96 \sqrt{\frac{P(1-P)}{n}}$$

  P        50%
  n        100
  t        1.984    (confidence level, 2-tail, set to 0.05)

SO: p = 0.500 +/- 0.099                          (in percent: 9.9%)
  lower end of confidence interval    0.401      (40.1%)
  upper end of confidence interval    0.599      (59.9%)
  range                               0.198      (19.8%)

Note: the formula is shown with the large-sample value 1.96, while the computed +/- 0.099 corresponds to the sheet's t of 1.984.

[Chart: Confidence Interval, plotted on an axis from 0% to 100%]


CHI-SQUARE TEST (EXCEL: FUNCTION)

Does the distribution of outcomes (observed) significantly differ from a random distribution (expected)? Enter data in the yellow cells.

ACTUAL (OBSERVED)
            city   suburb   rural   total
  strong      2       1       1       4
  medium      1       2       1       4
  weak        1       1       2       4
  total       4       4       4      12

PREDICTED/EXPECTED (based on multiplying row and column totals, divided by the grand total)
            city     suburb    rural    total
  strong    1.3333   1.3333    1.3333     4
  medium    1.3333   1.3333    1.3333     4
  weak      1.3333   1.3333    1.3333     4
  total     4        4         4         12

Chi-square test (calculated by Excel, "CHITEST"): the probability of this sample outcome if there is no difference in the population (range: 0 to 1). The computed value is displayed as ### in the source and is not legible.

Difference between predicted and actual
            city      suburb    rural     total
  strong    -0.6667    0.3333    0.3333     0
  medium     0.3333   -0.6667    0.3333     0
  weak       0.3333    0.3333   -0.6667     0
  total      0         0         0          0

A second, unlabeled 3 x 3 block of yellow-cell counts also appears on the sheet:
  20    8   20
  16   12   12
  12   16    4

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

  f_o = observed frequencies
  f_e = expected frequencies


The t distribution

The t distribution is used for hypothesis testing with small samples (e.g., smaller than about 100 cases). The t distribution is similar to the Z distribution, but is "flatter" because of the smaller sample size.
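The "flatter" shape can be checked numerically. The sketch below is an illustration only: it integrates the standard t density with Simpson's rule (a method chosen here for being stdlib-only, not something the sheet specifies), and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log(1 + t * t / df))


def two_tail_p(t, df, steps=2000):
    """Two-tail probability P(|T| >= |t|), via Simpson's rule on [0, |t|]."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t


for df in (5, 10, 50, 1000):
    print(df, round(two_tail_p(2.0, df), 3))

# Normal (Z) curve for comparison: the limit the t distribution approaches.
print("Z", round(2 * (1 - NormalDist().cdf(2.0)), 3))
```

For a t-score of 2, the two-tail probability shrinks toward the normal-curve value as the degrees of freedom grow, which is the sense in which small-sample t distributions have heavier tails.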
When the sample size gets large (e.g., over 50-100), the t distribution approaches that of the Z distribution (a normal curve).

Probabilities and t-scores for various degrees of freedom (two-tail test)

  t-score   d.f. 5   d.f. 10   d.f. 50   d.f. 1000
  0.0       1.000    1.000     1.000     1.000
  0.1       0.924    0.922     0.921     0.920
  0.2       0.849    0.845     0.842     0.842
  0.3       0.776    0.770     0.765     0.764
  0.4       0.706    0.698     0.691     0.689
  0.5       0.638    0.628     0.619     0.617
  0.6       0.575    0.562     0.551     0.549
  0.7       0.515    0.500     0.487     0.484
  0.8       0.460    0.442     0.427     0.424
  0.9       0.409    0.389     0.372     0.368
  1.0       0.363    0.341     0.322     0.318
  1.1       0.321    0.297     0.277     0.272
  1.2       0.284    0.258     0.236     0.230
  1.3       0.250    0.223     0.200     0.194
  1.4       0.220    0.192     0.168     0.162
  1.5       0.194    0.165     0.140     0.134
  1.6       0.170    0.141     0.116     0.110
  1.7       0.150    0.120     0.095     0.089
  1.8       0.132    0.102     0.078     0.072
  1.9       0.116    0.087     0.063     0.058
  2.0       0.102    0.073     0.051     0.046
  2.1       0.090    0.062     0.041     0.036
  2.2       0.079    0.052     0.032     0.028
  2.3       0.070    0.044     0.026     0.022
  2.4       0.062    0.037     0.020     0.017
  2.5       0.054    0.031     0.016     0.013
  2.6       0.048    0.026     0.012     0.009
  2.7       0.043    0.022     0.009     0.007
  2.8       0.038    0.019     0.007     0.005
  2.9       0.034    0.016     0.006     0.004
  3.0       0.030    0.013     0.004     0.003

The 0.05 level is by convention used as the threshold of statistical significance (though sometimes we use an even more strict level, such as 0.01 or even 0.001).

[Chart: Probabilities and t-scores for various degrees of freedom (two-tail test). x axis: standardized sample differences (t-scores), 0 to 3. y axis: probability of this outcome if no difference in population, 0 to 1. One curve per degrees of freedom: 5, 10, 50, 1000.]

The larger the sample size, the lower the value of the critical t. When the sample size gets large (e.g., over 50-100), the critical t level (.05, 2 tail) approaches 1.96.


Difference of means

Example 1: Small Standard Deviation (Factor: 50 male, 45 female)

  Case   Male Income   Female Income
  1        69,000        49,000
  2        77,000        67,000
  3        46,000        69,000
  4        59,000        64,000
  5        55,000        30,000
  6        50,000        68,000
  7        38,000        73,000
  8        63,000        61,000
  9        50,000        61,000
  10       56,000        48,000
  11       74,000        72,000
  12       50,000        57,000
  Mean     57,250        59,917
  Std Dev. 11,702        12,428

  t-Test: Two-Sample Assuming Equal Variances
                                  Male Income    Female Income
  Mean                            57250          59916.66667
  Variance                        136931818.2    154446969.7
  Observations                    12             12
  Pooled Variance                 145689393.9
  Hypothesized Mean Difference    0
  df                              22
  t Stat                          -0.541
  P(T<=t) one-tail                0.297          fail to reject H0
  t Critical one-tail             1.717
  P(T<=t) two-tail                0.594          fail to reject H0
  t Critical two-tail             2.074

Example 2: Larger Standard Deviation (Factor: 80 male, 75 female)

  Case   Male Income   Female Income
  1        40,000        72,000
  2        42,000        34,000
  3        83,000        65,000
  4       100,000        34,000
  5       100,000        86,000
  6        86,000        86,000
  7       104,000        67,000
  8        70,000        64,000
  9        37,000        79,000
  10       62,000        78,000
  11       88,000        85,000
  12       72,000        83,000
  Mean     73,667        69,417
  Std Dev. 24,092        18,372

  t-Test: Two-Sample Assuming Equal Variances
                                  Male Income    Female Income
  Mean                            73666.6667     69416.6667
  Variance                        580424242      337537879
  Observations                    12             12
  Pooled Variance                 458981061
  Hypothesized Mean Difference    0
  df                              22
  t Stat                          0.486
  P(T<=t) one-tail                0.316          fail to reject H0
  t Critical one-tail             1.717
  P(T<=t) two-tail                0.632          fail to reject H0
  t Critical two-tail             2.074

Example 3: Bigger difference of means (Factor: 80 male, 70 female)

  Case   Male Income   Female Income
  1        37,000        55,000
  2       105,000        40,000
  3        52,000        82,000
  4        58,000        97,000
  5       107,000        56,000
  6       105,000        44,000
  7        96,000        39,000
  8        86,000        70,000
  9        82,000        77,000
  10      104,000        79,000
  11       71,000        44,000
  12      100,000        47,000
  Mean     83,583        60,833
  Std Dev. 23,922        19,441

  t-Test: Two-Sample Assuming Equal Variances
                                  Male Income    Female Income
  Mean                            83583.3333     60833.3333
  Variance                        572265152      377969697
  Observations                    12             12
  Pooled Variance                 475117424
  Hypothesized Mean Difference    0
  df                              22
  t Stat                          2.557
  P(T<=t) one-tail                0.009          reject H0
  t Critical one-tail             1.717
  P(T<=t) two-tail                0.018          reject H0
  t Critical two-tail             2.074

[Charts, one per example: male mean and female mean marked on an income axis from 0 to 100,000 (0 to 120,000 for Example 3)]

Data needed:
  number of cases for each of the two groups
  sample means for the two groups
  standard deviation for each group

Hypothesis (no difference between the two population means):

$$\mu_1 = \mu_2, \quad \text{i.e.,} \quad \mu_1 - \mu_2 = 0$$

How to calculate t (note: EXCEL will do this all for you, so you don't really need to use this formula):

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\hat{\sigma}_{\bar{X}_1 - \bar{X}_2}}$$

Since we hypothesize that mu1 = mu2, or mu1 - mu2 = 0, the (mu1 - mu2) term drops out of the numerator of the equation for t:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\hat{\sigma}_{\bar{X}_1 - \bar{X}_2}}$$

The formula for the standard error (the denominator of the equation for t):

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}$$

If we can assume the same standard deviation of the populations ("equal variance"):

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sigma \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$


Difference of proportions

A special case of the difference of means test.

Do you own a car? 1 = yes, 0 = no

  Cases        City Residents   Suburban Residents
  1-2               0                  0
  3-7               0                  1
  8-22              1                  1
  Mean             68.2%              90.9%
  n of cases       22                 22

[Chart: Percent of Residents Who Own a Car, bars for City Residents and Suburban Residents, axis 0% to 100%]

The central question: does the difference found in the sample reflect an actual difference among the entire population? [the research hypothesis]
Alternative: the difference is due merely to random sample variation, and there is no difference in the population as a whole. [the null hypothesis]

  degrees of freedom (n1 + n2 - 2)    42
  t-score numerator                   -22.7%
  pu                                  0.79545    (see Blalock, p. 234)
  sqrt(pu * qu)                       0.40337
  denominator                         0.12162
  t-score                             -1.86871
  Prob-t                              0.068649

Here pu is the pooled proportion across both groups and qu = 1 - pu. The denominator follows the equal-variance standard error above:

$$\text{denominator} = \sqrt{p_u q_u} \, \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$

For simplicity and conservatism, we could have also assumed that the population proportions are 50% and 50%.

Remember: generally if |t| > 2 (i.e., if t < -2 or t > 2), then it is "statistically significant" at the .05 level. That is, there is less than a 5% chance that one could get this difference in a sample drawn from a population where there is no difference between city and suburban residents. In this example |t| is 1.87 and Prob-t is 0.069, so the difference falls just short of significance at the .05 level.

Difference of proportions (2)

Here, given just the means and the n of cases:

                                      Group 1    Group 2
  Mean                                10.0%      20.0%
  n of cases                          150        120

  degrees of freedom (n1 + n2 - 2)    268
  t-score numerator                   -10.0%
  pu                                  0.14444    (see Blalock, p. 234)
  sqrt(pu * qu)                       0.35154
  denominator                         0.04305
  t-score                             -2.3226
  Prob-t                              0.0209

Note that as the mean values deviate from 50%, we can be more accurate: e.g., compare 10% to 20%, vs. 40% to 50% or 80% to 90%.
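Both t-scores in this section can be reproduced with a short Python sketch. It assumes the equal-variance (pooled) standard error, the variant the Excel output is labeled with; the function names are hypothetical.

```python
import math
import statistics


def pooled_t(sample1, sample2):
    """t-score for a difference of means, assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    pooled_var = ((n1 - 1) * statistics.variance(sample1)
                  + (n2 - 1) * statistics.variance(sample2)) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (n1 + n2) / (n1 * n2))
    return (statistics.mean(sample1) - statistics.mean(sample2)) / std_error


def proportions_t(p1, n1, p2, n2):
    """t-score for a difference of proportions, using the pooled proportion pu."""
    pu = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_error = math.sqrt(pu * (1 - pu)) * math.sqrt((n1 + n2) / (n1 * n2))
    return (p1 - p2) / std_error


# Difference of means: the "small standard deviation" income example.
male = [69000, 77000, 46000, 59000, 55000, 50000,
        38000, 63000, 50000, 56000, 74000, 50000]
female = [49000, 67000, 69000, 64000, 30000, 68000,
          73000, 61000, 61000, 48000, 72000, 57000]
print(round(pooled_t(male, female), 3))

# Difference of proportions: car ownership (15 of 22 city, 20 of 22 suburban),
# then the summary-only example (10% of 150 vs. 20% of 120).
print(round(proportions_t(15 / 22, 22, 20 / 22, 22), 3))
print(round(proportions_t(0.10, 150, 0.20, 120), 3))
```

Treating 0/1 answers as numbers is what makes the proportions test a special case: the mean of a 0/1 variable is the proportion, and its spread is determined by pu and qu.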
ANOVA

AUTO MILES DRIVEN PER WEEK

The first three data columns are the values as entered. The last two columns are the sorted series that the sheet lists alongside the chart.

  Case   City        Suburban    Rural       City       Suburban
         Residents   Residents   Residents   (sorted)   (sorted)
  1        20          50          40           0          17
  2         0          80          50           0          23
  3        50          90          60          12          24
  4       100         350          70          18          50
  5        70         240          80          18          60
  6        35         120          90          20          65
  7        12          90         100          24          70
  8       150          80         100          35          80
  9       120          70          20          35          80
  10        0          60          30          42          85
  11       18          90          40          50          90
  12       35         111          50          66          90
  13       42         122          60          67          90
  14       67         133         250          70          96
  15       95         144         170          75         111
  16       66         155         120          77         120
  17       77          96         150          95         122
  18      123          23         170         100         133
  19        0          65         180         120         144
  20       18          24         111         123         155
  21       24          17         130         150         240
  22       75          85          75         200         350
  Mean     54.4       104.3        97.5        54.4       104.3

[Chart: AUTO MILES DRIVEN PER WEEK, showing the urban, suburban and rural values with each group's mean marked]

Anova: Single Factor

SUMMARY
  Groups                Count   Sum     Average       Variance
  City Residents        22      1397    63.5          2672.64286
  Suburban Residents    22      2295    104.318182    5471.65584
  Rural Residents       22      2146    97.5454545    3391.11688

ANOVA
  Source of Variation   SS            df   MS            F             P-value       F crit
  Between Groups        21054.6364    2    10527.3182    2.73782547    0.07241657    3.14280868
  Within Groups         242243.727    63   3845.13853
  Total                 263298.364    65

Why use ANOVA?
Use it in situations where you are comparing the means from more than two groups. In a difference of means test, you compare x2 - x1. For more than two groups, you can't compare x3 - x2 - x1, so you look at the variation (sum of squares) within vs. between groups.

Intuitively, sample groups with low internal variation, but high variation across groups, will likely represent real differences in the population as a whole. Sample groups with high internal variation and low variation across groups have a greater chance of representing populations with no real differences.
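The within-versus-between comparison can be made concrete with a small Python sketch of the single-factor F ratio. It is an illustration under stated assumptions: the function name is hypothetical, and the pairing of data columns with the Excel outputs is inferred from the column sums (the sorted city series totals 1,397 and the city values as entered total 1,197).

```python
def anova_f(groups):
    """Single-factor ANOVA F ratio: (SS_between / df) / (SS_within / df)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


suburban = [50, 80, 90, 350, 240, 120, 90, 80, 70, 60, 90,
            111, 122, 133, 144, 155, 96, 23, 65, 24, 17, 85]
rural = [40, 50, 60, 70, 80, 90, 100, 100, 20, 30, 40,
         50, 60, 250, 170, 120, 150, 170, 180, 111, 130, 75]
# City values as entered in the first data column (sum 1,197).
city_entered = [20, 0, 50, 100, 70, 35, 12, 150, 120, 0, 18,
                35, 42, 67, 95, 66, 77, 123, 0, 18, 24, 75]
# City series listed beside the chart (sum 1,397).
city_sorted = [0, 0, 12, 18, 18, 20, 24, 35, 35, 42, 50,
               66, 67, 70, 75, 77, 95, 100, 120, 123, 150, 200]

print(round(anova_f([city_sorted, suburban, rural]), 3))
print(round(anova_f([city_entered, suburban, rural]), 3))
```

The only difference between the two runs is the city group. Lowering its mean and tightening its spread raises the between-group variation relative to the within-group variation, so F increases.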
Anova: Single Factor (second run, with a lower city mean)

SUMMARY
  Groups                Count   Sum     Average       Variance
  City Residents        22      1197    54.4090909    1890.82468
  Suburban Residents    22      2295    104.318182    5471.65584
  Rural Residents       22      2146    97.5454545    3391.11688

ANOVA
  Source of Variation   SS            df   MS            F             P-value       F crit
  Between Groups        32248.5758    2    16124.2879    4.49829595    0.01492455    3.14280868
  Within Groups         225825.545    63   3584.53247
  Total                 258074.121    65

[Chart: AUTO MILES DRIVEN PER WEEK, x axis 0 to 400, with the sorted rural values and the rural mean of 97.5 plotted alongside the other groups]

$$F = \frac{SS_{between} / d.f._{between}}{SS_{within} / d.f._{within}}$$

  SS_between = sum of squares between the groups
  SS_within = sum of squares within the groups
  d.f. = degrees of freedom


Correlation (range: -1 < r < +1)

Four constructed examples, 12 cases each:

         Example 1       Example 2       Example 3       Example 4
  Case   x      y        x      y        x      y        x      y
  1      0      0        0.2    0.5      0      1        0      0
  2      0.1    0.1      0.3    0.4      0.1    0.9      0.1    0.2
  3      0.2    0.2      0.4    0.6      0.2    0.8      0.2    0.4
  4      0.3    0.3      0.4    0.4      0.3    0.7      0.3    0.6
  5      0.4    0.3      0.5    0.8      0.4    0.6      0.4    0.8
  6      0.5    0.4      0.5    0.3      0.5    0.5      0.5    1
  7      0.6    0.5      0.5    0.5      0.6    0.3      0.6    1
  8      0.7    0.6      0.5    0.6      0.7    0.4      0.7    0.8
  9      0.8    0.7      0.6    0.3      0.8    0.2      0.8    0.6
  10     0.8    0.8      0.6    0.8      0.8    0.2      0.8    0.4
  11     0.9    0.9      0.7    0.2      0.9    0.1      0.9    0.2
  12     1      0.9      0.8    0.5      1      0        1      0

  correlation   +0.99    -0.09           -0.99           +0.07
  F             ###                      ###             0.0
  p value       0.00     0.79            0.00            0.84

(F is displayed as ### in the source for Examples 1 and 3. No F value is legible for Example 2.)

[Charts: one x-y scatterplot per example, both axes 0 to 1]

A fifth example, using a random number generator:

  Case   x        y
  1      0.8406   0.372
  2      0.4639   0.2545
  3      0.069    0.8191
  4      0.327    0.9153
  5      0.9635   0.7406
  6      0.1542   0.4803
  7      0.0266   0.627
  8      0.3561   0.4515
  9      0.7929   0.3996
  10     0.3435   0.5974
  11     0.8348   0.1966
  12     0.297    0.3313

  correlation   -0.4
  F             1.4
  p-value       0.264

P-Value: this is the probability of finding an x-y relationship this strong in the sample cases -- expressed as an r-value -- simply through random variation, when there would be no relationship if one looked at the population as a whole.
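A short Python sketch shows how r is computed and how it relates to the F statistic reported next to it. The Pearson formula is standard. The identity F = r^2 (n - 2) / (1 - r^2) for a two-variable relationship is standard regression theory rather than something stated on the sheet, and the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def f_from_r(r, n):
    """F statistic (1 and n - 2 degrees of freedom) implied by r and sample size n."""
    return r * r * (n - 2) / (1 - r * r)


x = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.8, 0.9, 1]
y_rising = [0, 0.1, 0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.9]
y_peak = [0, 0.2, 0.4, 0.6, 0.8, 1, 1, 0.8, 0.6, 0.4, 0.2, 0]

r_rising = pearson_r(x, y_rising)
r_peak = pearson_r(x, y_peak)
print(round(r_rising, 2), round(r_peak, 2))
print(round(f_from_r(r_peak, len(x)), 2))
```

The rise-then-fall example is a useful caution: y depends strongly on x, but not linearly, so r is close to zero and the implied F is tiny.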
If p < .05, we generally conclude that the relationship is statistically significant.

Hit "recalculate now" to see a new set of random numbers (note: on a Mac this is "COMMAND =").


Comparing values of the correlation coefficient r (with a range of -1 to +1) from a sample of size n, and the corresponding probability of its outcome if there is no relationship in the population as a whole

  Example row from the chart's helper table: r = -0.95, n = 12, F = 92.5641, significance of F = 2.3E-06

[Chart: x axis: value of r, -1 to +1. y axis: probability of this outcome (based on the F-test), 0 to 1. Red line: critical value .05. ABOVE the .05 line: relationship NOT statistically significant at the 0.05 level. BELOW the .05 line: relationship is statistically significant at the 0.05 level.]

Note: as r gets farther away from zero, both the strength of the relationship and the statistical significance increase.
Also: as the sample size (n) increases, the statistical significance increases.
As a result: if you want to demonstrate a statistically significant bivariate relationship, you will need an r value that is far from zero and/or a large sample.


Ecological fallacy

"The ecological fallacy consists in thinking that relationships observed for groups necessarily hold for individuals..."
source: "Ecological Inference and the Ecological Fallacy," David A. Freedman (Department of Statistics, University of California, Berkeley). Prepared for the International Encyclopedia of the Social & Behavioral Sciences. Technical Report No. 549, 15 October 1999. pdf file accessed Jan. 13, 2002, http://www.stanford.edu/class/ed260/freedman549.pdf

This example: is there a relationship between use of public transit and hhd income?
  Aggregate data (unit of analysis: city): positive relationship
  Individual data (unit of analysis: hhd): negative relationship
DANGER: making an ecological fallacy -- using aggregate data to ...

Generator settings shown on the sheet:
  constant1   0   (range 0 to 1)
  constant2   0   (range 0 to 1)
  constant3   3   (range high to 1)

Invisible factors, by city:
  City   invisible factor   invisible factor
  A      0.5                80000
  B      0.4                75000
  C      0.3                70000
  D      0.2                68000
  E      0.1                63000
  F      0                  60000

INDIVIDUAL DATA
  observed          percent of hhd trips
  case      city    using public transit    hhd income
  1         A       63%                     $57,873
  2         A       64%                     $57,635
  3         A       61%                     $58,529
  4         A       59%                     $59,498
  5         A       75%                     $53,915
  6         A       69%                     $55,713
  7         A       51%                     $61,994
  8         A       69%                     $55,980
  9         A       68%                     $56,268
  10        A       60%                     $59,015
  11        B       44%                     $59,656
  12        B       72%                     $49,910
  13        B       59%                     $54,211
  14        B       66%                     $51,803
  15        B       71%                     $50,100
  16        B       68%                     $51,179
  17        B       47%                     $58,553
  18        B       43%                     $59,930
  19        B       49%                     $57,751
  20        B       68%                     $51,265
  21        C       62%                     $48,269
  22        C       61%                     $48,597
  23        C       49%                     $53,006
  24        C       53%                     $51,338
  25        C       51%                     $52,322
  26        C       59%                     $49,426
  27        C       55%                     $50,579
  28        C       56%                     $50,352
  29        C       52%                     $51,688
  30        C       35%                     $57,726
  31        D       39%                     $54,312
  32        D       53%                     $49,461
  33        D       51%                     $50,152
  34        D       51%                     $50,264
  35        D       52%                     $49,771
  36        D       31%                     $57,222
  37        D       50%                     $50,403
  38        D       20%                     $60,972
  39        D       22%                     $60,169
  40        D       25%                     $59,079
  41        E       26%                     $53,852
  42        E       11%                     $59,151
  43        E       39%                     $49,508
  44        E       14%                     $58,168
  45        E       21%                     $55,609
  46        E       25%                     $54,186
  47        E       22%                     $55,433
  48        E       37%                     $50,148
  49        E       27%                     $53,540
  50        E       30%                     $52,562
  51        F       29%                     $49,933
  52        F       24%                     $51,559
  53        F       29%                     $49,724
  54        F       21%                     $52,783
  55        F       8%                      $57,283
  56        F       9%                      $56,859
  57        F       3%                      $58,854
  58        F       27%                     $50,609
  59        F       31%                     $49,231
  60        F       5%                      $58,390

  correlation (individual households): -0.30

AGGREGATED DATA
          percent of hhd trips
  CITY    using public transit    hhd income
  A       64%                     $57,642
  B       59%                     $54,436
  C       53%                     $51,330
  D       39%                     $54,181
  E       25%                     $54,216
  F       19%                     $53,523

  correlation (cities): +0.33

[Chart: Scatterplot; unit of analysis: individual household. x axis: percent of hhd trips by public transit, 0% to 80%. y axis: annual hhd income, $40,000 to $65,000.]
[Chart: Scatterplot; unit of analysis: individual household, points grouped by city. One sees patterns in the individual data by cities. Same axes.]
[Chart: Scatterplot; unit of analysis: cities. x axis: percent of hhd trips by public transit, 0% to 70%. y axis: mean annual hhd income, $51,000 to $58,000. Annotation: yes, cities with more transit have higher income, but...]


Multiplier

Multiplier: the relationship between local and export employment

[Diagram: Twin Peaks and the R.O.W. (rest of world). Timber is exported, revenues from timber flow back in, and local services circulate within Twin Peaks.]

Export Jobs (Basic) + Non-Export Jobs (NonBasic) = TOTAL JOBS

Imagine a simple economy of Twin Peaks, an isolated timber economy:
  Service Jobs             2,000
  Timber Jobs (export)     1,000
  Total Jobs               3,000
  Multiplier               3.0

So, we can use a multiplier to estimate the impact of a change in basic employment (up or down) on total employment.
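The Twin Peaks arithmetic amounts to one division and one multiplication. A minimal Python sketch, with hypothetical function names:

```python
def employment_multiplier(total_jobs, basic_jobs):
    """Total jobs supported per basic (export) job."""
    return total_jobs / basic_jobs


def change_in_total_jobs(change_in_basic_jobs, multiplier):
    """Estimated change in total employment for a change in basic employment."""
    return change_in_basic_jobs * multiplier


multiplier = employment_multiplier(total_jobs=3000, basic_jobs=1000)
for delta in (100, 500, 1200, -100, -500):
    print(delta, change_in_total_jobs(delta, multiplier))
```

Because the multiplier is a fixed ratio, gains and losses scale symmetrically: each timber job gained or lost is assumed to carry two service jobs with it.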
[This assumes a simple, linear relationship.]

  Change in Basic Employment    Change in Total Employment
     100                           300
     500                         1,500
   1,200                         3,600
    -100                          -300
    -500                        -1,500


Location Quotients

Location Quotient (LQ) - a measure of relative local employment concentration in a specific sector.
Also used to estimate local vs. export (i.e., non-basic vs. basic) employment.
(Can also be used to help understand the level of industrial diversification in a local economy.)

$$LQ = \frac{e_i / e}{E_i / E}$$

  e_i = local employment in sector i
  e = total local employment
  E_i = national employment in sector i
  E = total national employment

EXAMPLE: You are given data for the town of Icarus in the far-away country of Daedalus.

Based upon this data, which sectors of the Icarus economy likely are exporting goods or services outside the community? Estimate the share of each sector's employment that could be due to exports (and explain how you did these estimates and the name of the technique(s) you used). Finally, explain why these estimates may not be accurate.

                                                                Percent of Total Employment
                                     Icarus      Daedalus       Icarus    Daedalus
  Population                         20,000      2,500,000
  Annual Gross Per Capita Income     $17,000     $25,000
  Total Employment                   10,000      1,000,000      100%      100%
  Agricultural Emp.                   1,000         50,000       10%        5%
  Govt Employment                       300        100,000        3%       10%
  Private Service Emp.                4,000        500,000       40%       50%
  Airplane Manufacturing Emp.           700         10,000        7%        1%
  Non-airplane Manufacturing Emp.     1,000        200,000       10%       20%
  All Other Employment                3,000        140,000       30%       14%

LOCATION QUOTIENT: Ratio of Local to National Percentages
  Agricultural Emp.                  2.00    export industry
  Govt Employment                    0.30
  Private Service Emp.               0.80
  Airplane Manufacturing Emp.        7.00    export industry
  Non-airplane Manufacturing Emp.    0.50
  All Other Employment               2.14    export industry

Use the location quotients to estimate the amount of export jobs (if any):
TOTAL JOBS = LOCAL JOBS + EXPORT JOBS

                                     Total Jobs   Local Jobs   Export Jobs
  Agricultural Emp.                   1,000          500          500
  Govt Employment                       300          300            -
  Private Service Emp.                4,000        4,000            -
  Airplane Manufacturing Emp.           700          100          600
  Non-airplane Manufacturing Emp.     1,000        1,000            -
  All Other Employment                3,000        1,400        1,600
  TOTAL                              10,000        7,300        2,700

[Chart: stacked bars of Local Jobs and Export Jobs by sector, axis 0 to 5,000]


Gravity Model

Using Newton's Universal Law of Gravitation for social processes

$$F = G \left( \frac{m_1 m_2}{r^2} \right)$$

where
  F = force of gravity between m1 and m2
  G is the universal constant
  r is the distance between m1 and m2

[Diagram: m1 and m2 separated by distance r]

To convert to society:
  F becomes the interaction between m1 and m2 (e.g., traffic, trade, etc.)
  m1 and m2 become population (or employment, or GDP, etc.)
  r is distance
  G is a constant

Example:
            Population   Population   Distance   Interaction (e.g., car trips/day)
  G         m1           m2           r          F
  0.001     10,000       20,000       20           500
  0.001     20,000       20,000       20         1,000
  0.001     20,000       40,000       20         2,000
  0.001     10,000       20,000       10         2,000
  0.001     10,000       20,000        5         8,000


Three Growth Rates

Simple, Linear Growth (e.g., average annual growth):

$$P_n = P_0 (1 + nr)$$

Discrete Compounded Growth (e.g., annual):

$$P_n = P_0 (1 + r)^n$$

Compounded continuously (with exponent) [almost the same results as discrete]:

$$P_n = P_0 e^{rn}$$

where e = 2.7183....
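The three formulas translate directly into code. A minimal Python sketch, with hypothetical function names and the starting value of 100 and rate of 0.05 used in the sheet's comparison:

```python
import math


def linear_growth(p0, r, n):
    """Simple linear growth: P0 * (1 + n*r)."""
    return p0 * (1 + n * r)


def discrete_growth(p0, r, n):
    """Growth compounded once per period: P0 * (1 + r)**n."""
    return p0 * (1 + r) ** n


def continuous_growth(p0, r, n):
    """Continuously compounded growth: P0 * e**(r*n)."""
    return p0 * math.exp(r * n)


for n in (0, 1, 10, 50, 100):
    print(n,
          round(linear_growth(100, 0.05, n), 1),
          round(discrete_growth(100, 0.05, n), 1),
          round(continuous_growth(100, 0.05, n), 1))
```

Over a few periods the three patterns are nearly indistinguishable. The gap between linear and compounded growth only becomes dramatic over long horizons, while discrete and continuous compounding stay close to each other throughout.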
Remember that ln(e) = 1.

A Comparison of these Three Growth Patterns

                              Linear    Discrete      Continuously
                                        Compounded    Compounded
  Po     n      r             Pn        Pn            Pn
  100    0      0.05          100         100.0         100.0
  100    1      0.05          105         105.0         105.1
  100    2      0.05          110         110.3         110.5
  100    3      0.05          115         115.8         116.2
  100    4      0.05          120         121.6         122.1
  100    5      0.05          125         127.6         128.4
  100    6      0.05          130         134.0         135.0
  100    7      0.05          135         140.7         141.9
  100    8      0.05          140         147.7         149.2
  100    9      0.05          145         155.1         156.8
  100    10     0.05          150         162.9         164.9
  100    15     0.05          175         207.9         211.7
  100    20     0.05          200         265.3         271.8
  100    25     0.05          225         338.6         349.0
  100    30     0.05          250         432.2         448.2
  100    40     0.05          300         704.0         738.9
  100    50     0.05          350        1146.7        1218.2
  100    75     0.05          475        3883.3        4252.1
  100    100    0.05          600       13150.1       14841.3

[Chart: the three growth patterns (Continuously Compounded, Discrete Compounded, Linear). x axis: n, 0 to 120. y axis: Pn, 0 to 16,000.]


Cost-Benefit Thinking

TWO CHALLENGES:
1. How to sum up all the costs and benefits.
2. How to deal with time: discounting. --->>> time preferences.

Present value (PV) = B(t) / (1+r)^t, where B(t) is the benefit in year t and r is the discount rate.
Net Present Value (NPV) = ∑ (B(t) - C(t)) / (1+r)^t, where B is benefits and C is costs.

$$NPV = \sum_{t=0}^{n} \frac{B_t - C_t}{(1+r)^t}$$

  B_t = benefits in year t
  C_t = costs in year t
  t = year
  NPV = net present value (benefits adjusted for cost)
  r = discount rate (e.g., 6% per year or 0.06)

Why is money worth less in the future?
1. people are impatient (and mortal)
2. opportunity cost of investing the capital elsewhere.

The argument for discounting is referred to as the 'marginal productivity of capital' argument, the use of the word 'marginal' indicating that it is the productivity of additional units of capital that is relevant. [99]

AND THE TRICK IS TO INCLUDE ENVIRONMENTAL COSTS AND BENEFITS. [99]
If ∑ (B(t) - C(t) ± E(t)) / (1+r)^t > 0, then the project is a net good project.

The Problems with Discounting for the Environment
A way to shift heavy costs to future generations.
note: it is hard to shift capital costs to future generations, since lenders want paybacks, e.g., 30 year loans. But it is easier to shift non-monetary costs to the future, since the lenders are around to complain! they don't have a contractual agr...
1. actual damage may be far larger than the discounted value.
2. long-term benefits are also not strongly valued (even though today's actions are required for those 50 years from now to enjoy them). i.e., they should not be discounted like capital.
3. will lead to greater exhaustion of exhaustible resources, esp. with a high discount rate.

However: "There is, in fact, no unique relationship between high discount rates and environmental deterioration." [103]

How to select a discount rate: simply the rate of economic growth for a nation? the interest rate? [104]

Taking sustainability into account:
EX: "require that any environmental damage be compensated by projects specifically designed to improve the environment." [106]

Note how r can really change the outcome, especially if cost and benefit patterns vary over time (see graph).

EXAMPLE
                                                  discount   discount    Net Benefit      Cumulative Net
        Benefit    Cost         Net Benefit       rate       factor      discounted for   Present Value
                                                                         present value    (NPV)
  t     B(t)       C(t)         B(t) - C(t)       r          (1+r)^t     (B(t) - C(t))    ∑ (B(t) - C(t))
                                                                         / (1+r)^t        / (1+r)^t
  0           0    1,000,000    -1,000,000        0.02       1.00        -1,000,000       -1,000,000
  1     100,000      100,000             0        0.02       1.02                 0       -1,000,000
  2     110,000      100,000        10,000        0.02       1.04             9,612         -990,388
  3     120,000      100,000        20,000        0.02       1.06            18,846         -971,542
  4     130,000      100,000        30,000        0.02       1.08            27,715         -943,827
  5     140,000      100,000        40,000        0.02       1.10            36,229         -907,597
  6     150,000      100,000        50,000        0.02       1.13            44,399         -863,199
  7     160,000      100,000        60,000        0.02       1.15            52,234         -810,965
  8     170,000      100,000        70,000        0.02       1.17            59,744         -751,221
  9     180,000      100,000        80,000        0.02       1.20            66,940         -684,280
  10    190,000      100,000        90,000        0.02       1.22            73,831         -610,449
  11    200,000      100,000       100,000        0.02       1.24            80,426         -530,023
  12    210,000      100,000       110,000        0.02       1.27            86,734         -443,288
  13    220,000      100,000       120,000        0.02       1.29            92,764         -350,525
  14    230,000      100,000       130,000        0.02       1.32            98,524         -252,001
  15    240,000      100,000       140,000        0.02       1.35           104,022         -147,979
  16    250,000      100,000       150,000        0.02       1.37           109,267          -38,712
  17    260,000      100,000       160,000        0.02       1.40           114,266           75,554
  18    270,000      100,000       170,000        0.02       1.43           119,027          194,581
  19    280,000      100,000       180,000        0.02       1.46           123,558          318,139
  20    290,000      100,000       190,000        0.02       1.49           127,865          446,003

[Chart: Compare front-loading and backloading costs and changing discount rates. Series: Benefit, Cost, Net Benefit, Cumulative Net Present Value (NPV). x axis: Year, 1 to 21. y axis: -1,500,000 to 1,500,000.]

The year when the green line crosses over the x axis (where y = 0) is the year when the cumulative impact shifts from a net cost to a net benefit.
MEASURES OF INEQUALITY/DISPARITY: how to calculate a Gini Coefficient

  gini    0.386    RANGE: 0 (PERFECT EQUALITY); 1 (PERFECT INEQUALITY)
  n       20

Insert income amounts for each of the 20 people here -- be sure to arrange from LOW to HIGH.
Do NOT enter data in any of the other columns -- those are calculated.
Try entering both a fairly equal income distribution -- and then try a broadly unequal one.

                          calculated   calculated          calculated
  Person "i"   Income     x(i)         CUMULATIVE X(i)     x(i)*i
               X(i)
  1             1,000     0.003        0.003               0.00
  2             3,000     0.009        0.013               0.02
  3             4,000     0.013        0.025               0.04
  4             5,000     0.016        0.041               0.06
  5             6,000     0.019        0.059               0.09
  6             8,000     0.025        0.084               0.15
  7             8,000     0.025        0.109               0.18
  8             9,000     0.028        0.138               0.23
  9            11,000     0.034        0.172               0.31
  10           12,000     0.038        0.209               0.38
  11           14,000     0.044        0.253               0.48
  12           17,000     0.053        0.306               0.64
  13           19,000     0.059        0.366               0.77
  14           21,000     0.066        0.431               0.92
  15           23,000     0.072        0.503               1.08
  16           27,000     0.084        0.588               1.35
  17           29,000     0.091        0.678               1.54
  18           32,000     0.100        0.778               1.80
  19           33,000     0.103        0.881               1.96
  20           38,000     0.119        1.000               2.38
  SUM         320000      1                                14.36
  mean                    0.05

[Chart: GINI COEFFICIENT. Series: CUMULATIVE X and LINE OF EQUALITY. x axis: person 1 to 20. y axis: 0.000 to 1.000. The LORENZ CURVE -- see how the curve deviates from the line of equality as the gini coefficient ...]

source of formula and text: U.S. Census Bureau, The Changing Shape of the Nation's Income Distribution, 1947-1998, Current Population Reports, by Arthur F. Jones Jr. and Daniel H. Weinberg (Issued June 2000). http://www.census.gov/prod/2000pubs/p60-204.pdf
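The calculated columns can be reproduced in Python. The sheet's formula itself did not survive extraction, so the expression below is an assumption: G = (2/n) * sum(i * x(i)) - (n + 1)/n, with incomes sorted low to high and x(i) the income share of person i. It is consistent with the columns shown (sum of x(i)*i = 14.36, n = 20) and the reported coefficient of 0.386. The function names are hypothetical.

```python
def income_shares(incomes):
    """Each person's share of total income, with incomes sorted low to high."""
    ordered = sorted(incomes)
    total = sum(ordered)
    return [x / total for x in ordered]


def lorenz_curve(incomes):
    """Cumulative income shares (the CUMULATIVE X(i) column)."""
    cumulative, running = [], 0.0
    for share in income_shares(incomes):
        running += share
        cumulative.append(running)
    return cumulative


def gini(incomes):
    """Gini coefficient, assuming G = (2/n) * sum(i * x_i) - (n + 1)/n."""
    shares = income_shares(incomes)
    n = len(shares)
    weighted = sum(i * x for i, x in enumerate(shares, start=1))
    return 2 * weighted / n - (n + 1) / n


incomes = [1000, 3000, 4000, 5000, 6000, 8000, 8000, 9000, 11000, 12000,
           14000, 17000, 19000, 21000, 23000, 27000, 29000, 32000, 33000, 38000]
print(round(gini(incomes), 3))
print([round(v, 3) for v in lorenz_curve(incomes)][:5])

# A perfectly equal distribution gives 0 under the same formula.
print(round(gini([10000] * 20), 3))
```

Trying a fairly equal and then a broadly unequal list, as the sheet suggests, moves the result between 0 and values approaching 1.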
