VIEWS: 21 PAGES: 50 POSTED ON: 2/4/2010 Public Domain
Economics 105: Statistics
• Any questions?
• Go over GH1
• Any student information sheets?

Measures of Variation
[Diagram: Variation branches into Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation]
• Measures of variation give information on the spread or variability of the data values.
[Figure: Same center, different variation]

Coefficient of Variation
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Can be used to compare two or more sets of data measured in different units
\[ CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\% \]

Comparing Coefficient of Variation
• Stock A:
– Average price last year = $50
– Standard deviation = $5
\[ CV_A = \left( \frac{S}{\bar{X}} \right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\% \]
• Stock B:
– Average price last year = $100
– Standard deviation = $5
\[ CV_B = \left( \frac{S}{\bar{X}} \right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\% \]
• Both stocks have the same standard deviation, but stock B is less variable relative to its price

Z Scores
• A measure of RELATIVE distance from the mean (for example, a Z-score of 2.0 means that a value is 2.0 standard deviations from the mean)
• The difference between a value and the mean, divided by the standard deviation
• A Z score above 3.0 or below -3.0 is considered an outlier (BLK p. 96 says +/- 2 std dev is outlier)
\[ Z = \frac{X - \bar{X}}{S} \]

Z Scores (continued)
Example:
• If the mean is 14.0 and the standard deviation is 3.0, what is the Z score for the value 18.5?
\[ Z = \frac{X - \bar{X}}{S} = \frac{18.5 - 14.0}{3.0} = 1.5 \]
• The value 18.5 is 1.5 standard deviations above the mean
• (A negative Z-score would mean that a value is less than the mean)

Shape of a Distribution
• Describes how data are distributed
• Measures of shape
– Symmetric or skewed
– Left-Skewed: Mean < Median
– Symmetric: Mean = Median
– Right-Skewed: Median < Mean

Using Microsoft Excel
• Descriptive Statistics can be obtained from Microsoft® Excel
– Use menu choice: tools / data analysis / descriptive statistics
– Enter details in dialog box

Using Excel
• Use menu choice: tools / data analysis / descriptive statistics

Using Excel (continued)
• Enter dialog box details
• Check box for summary statistics
• Click OK

Excel output
Microsoft Excel descriptive statistics output, using the house price data:
House Prices: $2,000,000 500,000 300,000 100,000 100,000

Population Parameters vs. Sample Statistics
• Population parameter (left) and sample statistic (right)
• Mean: \( \mu_X = \frac{1}{N} \sum_{i=1}^{N} X_i \) and \( \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \)
• Variance: \( \sigma_X^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu_X)^2 \) and \( s_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \)
• Correlation: \( \rho = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} \) and \( r_{xy} = \frac{Cov(x, y)}{s_x s_y} \)

The Empirical Rule of Thumb
• If the data distribution is approximately bell-shaped, then the interval:
• μ ± 1σ contains about 68% of the values in the population or the sample
[Figure: 68% of the values lie within μ ± 1σ]

The Empirical Rule of Thumb
• μ ± 2σ contains about 95% of the values in the population or the sample
• μ ± 3σ contains about 99.7% of the values in the population or the sample
[Figure: 95% of the values lie within μ ± 2σ and 99.7% within μ ± 3σ]

Chebyshev’s Inequality
• Only mean & variance given
• No distributional assumption!
\[ P(\mu_X - k\sigma_X < X < \mu_X + k\sigma_X) \ge 1 - \frac{1}{k^2} \]

Chebyshev’s Inequality
• For any population, given the mean and variance, at least 100(1 − 1/k²)% of the population members lie within k standard deviations of the mean, k > 1.
[Figure: dot plot of a population with μ − 2σ, μ, and μ + 2σ marked]
\[ P(\mu_X - k\sigma_X < X < \mu_X + k\sigma_X) \ge 1 - \frac{1}{k^2} \]

Chebyshev example
• A particular brand of car has a mean lifetime of 100,000 miles with a standard deviation of 7,000 miles. Find a range in which it can be guaranteed that at least 75% of the lifetimes of this brand of car lie.

Covariance
• \( Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y \)
• Association
• Linear
• Units – magnitude depends on units, so use caution in assessing strength of association
• Sample covariance (BLK p. 107)
\[ Cov(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{n - 1} \]

Correlation
• \( \rho = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} \)
• Association – not causality!
• Linear
• Units?
• −1 ≤ ρ ≤ 1
• Sample correlation coefficient, −1 ≤ r ≤ 1
\[ r_{xy} = \frac{Cov(x, y)}{s_x s_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left( \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right) \left( \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 \right)}} \]

Scatter Plots of Data with Various Correlation Coefficients
[Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +1, r = +.3, r = 0]

Using Excel to Find the Correlation Coefficient
• Select Tools/Data Analysis
• Choose Correlation from the selection menu
• Click OK . . .
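For readers working outside Excel, the sample covariance and sample correlation formulas from the Covariance and Correlation slides can be sketched in a few lines of Python. This is a minimal sketch rather than part of the original slides: the function names sample_cov and sample_corr are hypothetical, and the data are the GPA and study-hours values from the cov_corr.xls table in this deck.

```python
import math

# GPA and study hours for the 11 students in the cov_corr.xls table.
gpa = [2.3, 3.4, 3.75, 3.8, 1.7, 2.4, 2.6, 3.0, 3.2, 2.7, 2.9]
hours = [19, 26, 34, 25, 10, 20, 22, 30, 40, 33, 25]


def sample_cov(x, y):
    """Sample covariance: sum((x_i - xbar) * (y_i - ybar)) / (n - 1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)


def sample_corr(x, y):
    """Sample correlation: Cov(x, y) / (s_x * s_y)."""
    s_x = math.sqrt(sample_cov(x, x))  # the covariance of x with itself is its variance
    s_y = math.sqrt(sample_cov(y, y))
    return sample_cov(x, y) / (s_x * s_y)


print(round(sample_cov(gpa, hours), 6))
print(round(sample_corr(gpa, hours), 6))
```

The two printed values should agree with the covariance (3.667273) and correlation (0.696055) reported in the spreadsheet table.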
Using Excel to Find the Correlation Coefficient (continued)
• Input data range and select appropriate options
• Click OK to get output

Sample Correlation “by hand”
Student   GPA     Study hours   GPA*Study   GPA^2     Study^2
1         2.3     19            43.7        5.29      361
2         3.4     26            88.4        11.56     676
3         3.75    34            127.5       14.0625   1156
4         3.8     25            95          14.44     625
5         1.7     10            17          2.89      100
6         2.4     20            48          5.76      400
7         2.6     22            57.2        6.76      484
8         3       30            90          9         900
9         3.2     40            128         10.24     1600
10        2.7     33            89.1        7.29      1089
11        2.9     25            72.5        8.41      625
Sum       31.75   284           856.4       95.7025   8016
Mean      2.89    25.82         77.85       8.70      728.73

Covariance: 3.667273
Variance: 0.406045 (GPA), 68.3636364 (Study hours)
Std dev: 0.637217 (GPA), 8.26823055 (Study hours)
Correlation: 0.696055, 0.69605459
[Figure: scatter plot of Study hours (vertical axis, 0 to 45) against GPA (horizontal axis, 1.5 to 4)]
Source: P:\Economics\Eco 105 (Statistics)\cov_corr.xls

Intro to Probability: Basic Definitions
• Random trials – multiple outcomes & uncertainty
• Basic outcome
• Sample space
• Event
• Examples: coin toss, die roll, dice roll, deck of cards, etc.
• Deck of cards will be defined as 52 cards, 13 of each suit (♠♣♥♦): 2, 3, ..., 10, J, Q, K, A

Set Theory
• Venn diagrams
• Union A ∪ B
• Intersection A ∩ B

Set Theory
• A′ is the complement of A
• A and B are mutually exclusive
• A ∩ B = ∅

Set Theory
• A set of events A1, A2, A3 … AN partitions the sample space

Rules for Set Operations
• Commutative
– A ∪ B = B ∪ A
– A ∩ B = B ∩ A
• Idempotency
– A ∪ A = A
– A ∩ A = A
• Complementation
– A ∪ A′ = S, A ∩ A′ = ∅
• (A ∪ B)′ = A′ ∩ B′
• (A ∩ B)′ = A′ ∪ B′

Rules for Set Operations
• Associative
– A ∪ (B ∪ C) = (A ∪ B) ∪ C
– A ∩ (B ∩ C) = (A ∩ B) ∩ C
• Distributive
– A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
– A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

Fundamental Postulates
• P(A) should satisfy certain postulates
• 1: P(A) ≥ 0 [Impossible events cannot occur]
• 2: P(S) = 1 [Some outcome must occur]
• 3: If A1, A2, A3 … AN are N mutually exclusive events then
\[ P(A_1 \cup A_2 \cup \dots \cup A_N) = P(A_1) + P(A_2) + \dots + P(A_N) \]
or
\[ P\left( \bigcup_{i=1}^{N} A_i \right) = \sum_{i=1}^{N} P(A_i) \]

Useful Results
• P(A′) = 1 – P(A)
• P(∅) = 0
• If A ⊂ B, then P(A) ≤ P(B)
• 0 ≤ P(A) ≤ 1
• P(A ∪ B) = P(A) + P(B) – P(A ∩ B) – avoid double counting
• P(A ∪ B) = 1 – P((A ∪ B)′) = 1 – P(A′ ∩ B′)

Useful Results
• If A1, A2, A3 … AN partition S, then [equation missing in source]

Example
Suppose you apply to 3 schools: A, B, and C
• P(accepted @ A) = .20
• P(accepted @ B) = .40
• P(accepted @ C) = .60
• What is the probability of being rejected at all 3?
• What is the probability of being accepted somewhere?

Conditional Probability
• The conditional probability that A occurs given that B is known to have occurred is
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0 \]
• Problem 2.5, p. 25

Conditional Probability
• The probability that a beginning golfer makes a good shot if she selects the correct club is 1/3. The probability of a good shot with the wrong club is 1/5. There are 4 clubs in her golf bag, one of which is the correct club for the next shot. Club selection is random.
• What is the probability of a good shot?
• Given that she hit a good shot, what is the probability that she chose the wrong club?
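
The two golfer questions can be worked with the law of total probability and the definition of conditional probability. The sketch below is one way to set up the arithmetic, not the course's official solution; the variable names are hypothetical, and Python's fractions module is used only to keep the results exact.

```python
from fractions import Fraction

# Club selection is random among 4 clubs, exactly one of which is correct.
p_correct = Fraction(1, 4)
p_wrong = 1 - p_correct

# Conditional probabilities of a good shot, as given in the problem.
p_good_given_correct = Fraction(1, 3)
p_good_given_wrong = Fraction(1, 5)

# Law of total probability:
# P(good) = P(good | correct) P(correct) + P(good | wrong) P(wrong)
p_good = p_good_given_correct * p_correct + p_good_given_wrong * p_wrong

# Definition of conditional probability:
# P(wrong | good) = P(wrong and good) / P(good)
p_wrong_given_good = (p_good_given_wrong * p_wrong) / p_good

print(p_good)              # 7/30
print(p_wrong_given_good)  # 9/14
```

Even though a wrong club makes a good shot less likely, most good shots (9 in 14) still come from a wrong club, because a wrong club is chosen three times as often as the correct one.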