Economics 105 Statistics by gabyion

									      Economics 105: Statistics
• Any questions?
• Go over GH1
• Any student information sheets?
            Measures of Variation
• Measures of variation covered: range, interquartile range, variance,
  standard deviation, and coefficient of variation
• Measures of variation give information on the spread or
  variability of the data values.
• [Figure: two distributions with the same center but different variation]
      Coefficient of Variation
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Can be used to compare two or more sets of
  data measured in different units


          \[ CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\% \]
      Comparing Coefficient of Variation
• Stock A:
    – Average price last year = $50
    – Standard deviation = $5
    – \( CV_A = \left( \frac{S}{\bar{X}} \right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\% \)
• Stock B:
    – Average price last year = $100
    – Standard deviation = $5
    – \( CV_B = \left( \frac{S}{\bar{X}} \right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\% \)
• Both stocks have the same standard deviation, but stock B is less
  variable relative to its price
                   Z Scores
• A measure of RELATIVE distance from the mean
  (for example, a Z-score of 2.0 means that a value
  is 2.0 standard deviations from the mean)
• The difference between a value and the mean,
  divided by the standard deviation
• A Z score above 3.0 or below -3.0 is considered
  an outlier (BLK p. 96 instead uses +/- 2 standard deviations as the cutoff)

                  \[ Z = \frac{X - \bar{X}}{S} \]
                    Z Scores (continued)
Example:

• If the mean is 14.0 and the standard deviation is 3.0,
  what is the Z score for the value 18.5?

  \[ Z = \frac{X - \bar{X}}{S} = \frac{18.5 - 14.0}{3.0} = 1.5 \]
• The value 18.5 is 1.5 standard deviations above the
  mean
• (A negative Z-score would mean that a value is less
  than the mean)
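The Z score calculation translates directly into code. The sketch below is illustrative only; the helper name `z_score` is an assumption, and the second call uses a made-up value of 9.5 to show the negative case.

```python
def z_score(value: float, mean: float, std_dev: float) -> float:
    """Return how many standard deviations `value` lies from `mean`."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev


# Slide example: mean 14.0, standard deviation 3.0, value 18.5.
z_above = z_score(18.5, mean=14.0, std_dev=3.0)

# Illustrative value below the mean (not from the slides): the sign flips.
z_below = z_score(9.5, mean=14.0, std_dev=3.0)

print(z_above, z_below)
```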
        Shape of a Distribution
• Describes how data are distributed
• Measures of shape
  – Symmetric or skewed
Left-Skewed      Symmetric       Right-Skewed
Mean < Median    Mean = Median     Median < Mean
     Using Microsoft Excel
• Descriptive Statistics can be obtained
  from Microsoft® Excel
  – Use menu choice:
    tools / data analysis / descriptive statistics

  – Enter details in dialog box
                     Using Excel (continued)
• Enter dialog box details
• Check box for summary statistics
• Click OK
                        Excel output
• Microsoft Excel descriptive statistics output, using the house price data
• House Prices:

           $2,000,000
              500,000
              300,000
              100,000
              100,000
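For readers without Excel, similar summary statistics can be computed with Python's standard `statistics` module. This is a sketch rather than a reproduction of the Excel output: it covers only a few of the measures, and the variable names are illustrative.

```python
import statistics

# House price data from the slide.
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)
median_price = statistics.median(house_prices)
mode_price = statistics.mode(house_prices)
sample_std = statistics.stdev(house_prices)  # sample formula, divides by n - 1
price_range = max(house_prices) - min(house_prices)

print(f"mean:    {mean_price:,.0f}")
print(f"median:  {median_price:,.0f}")
print(f"mode:    {mode_price:,.0f}")
print(f"std dev: {sample_std:,.0f}")
print(f"range:   {price_range:,.0f}")
```

The mean (600,000) sits well above the median (300,000) because the single $2,000,000 house pulls it upward, which is the right-skewed pattern.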
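                 Population Parameters vs.
                    Sample Statistics
• Mean
    – Population parameter: \( \mu_X = \frac{\sum_{i=1}^{N} X_i}{N} \)
    – Sample statistic: \( \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \)
• Variance
    – Population parameter: \( \sigma_X^2 = \frac{\sum_{i=1}^{N} (X_i - \mu_X)^2}{N} \)
    – Sample statistic: \( s_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \)
• Correlation
    – Population parameter: \( \rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} \)
    – Sample statistic: \( r_{xy} = \frac{\mathrm{Cov}(x, y)}{s_x s_y} \)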
  The Empirical Rule of Thumb

• If the data distribution is approximately
  bell-shaped, then the interval:
    – μ ± 1σ contains about 68% of the values in
      the population or the sample
    – μ ± 2σ contains about 95% of the values in
      the population or the sample
    – μ ± 3σ contains about 99.7% of the values
      in the population or the sample
• [Figure: bell curves with the intervals μ ± 1σ (68%), μ ± 2σ (95%), and μ ± 3σ (99.7%) shaded]
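The 68%, 95%, and 99.7% figures can be checked against an exactly normal distribution, which is a stronger assumption than "approximately bell-shaped". The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_within` is illustrative.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Probability that a normal variable falls within k standard deviations of its mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

For real data that is only roughly bell-shaped, the observed shares will differ somewhat from these values.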
        Chebyshev’s Inequality
• Only mean & variance given
• No distributional assumption!




    \[ P(\mu_X - k\sigma_X < X < \mu_X + k\sigma_X) \ge 1 - \frac{1}{k^2} \]
         Chebyshev’s Inequality
• For any population, given the mean and
variance, at least 100(1 − 1/k²)% of the population
members lie within k standard deviations of the
mean, for k > 1.
• [Figure: population values plotted along an axis marked μ − 2σ, μ, and μ + 2σ, with a "?" over the interval]

   \[ P(\mu_X - k\sigma_X < X < \mu_X + k\sigma_X) \ge 1 - \frac{1}{k^2} \]
           Chebyshev example
• A particular brand of car has a mean lifetime of
100,000 miles with a standard deviation of 7,000
miles. Find a range in which it can be guaranteed
that at least 75% of the lifetimes of this brand of car lie.
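One way to work this example is to solve 1 − 1/k² = 0.75 for k, which gives k = 2, and then take the mean plus or minus k standard deviations. The sketch below does that numerically; the function names `chebyshev_k` and `chebyshev_interval` are illustrative, not from the course materials.

```python
import math


def chebyshev_k(min_share: float) -> float:
    """Return the k for which Chebyshev's bound 1 - 1/k**2 equals min_share."""
    if not 0.0 < min_share < 1.0:
        raise ValueError("min_share must be strictly between 0 and 1")
    return 1.0 / math.sqrt(1.0 - min_share)


def chebyshev_interval(mean: float, std_dev: float, min_share: float) -> tuple[float, float]:
    """Interval around the mean guaranteed to hold at least min_share of the population."""
    k = chebyshev_k(min_share)
    return mean - k * std_dev, mean + k * std_dev


# Car lifetimes from the slide: mean 100,000 miles, standard deviation 7,000 miles.
low, high = chebyshev_interval(mean=100_000, std_dev=7_000, min_share=0.75)
print(f"At least 75% of lifetimes lie between {low:,.0f} and {high:,.0f} miles")
```

The resulting range, 86,000 to 114,000 miles, holds for any distribution with that mean and standard deviation.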
                                 Covariance
• \( \mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y \)

• Association
• Linear
• Units
    – magnitude depends on units, so use caution in
    assessing strength of association
• Sample covariance (BLK p. 107)
  \[ \mathrm{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{\left( \sum_{i=1}^{n} x_i y_i \right) - n \bar{x} \bar{y}}{n - 1} \]
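The two forms of the sample covariance (the definition and the computational shortcut) give the same number. The sketch below checks this on a small made-up data set; the numbers and function names are purely illustrative and do not come from the course.

```python
def sample_covariance_definition(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_covariance_shortcut(xs, ys):
    """(Sum of x*y minus n * x_bar * y_bar), divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / (n - 1)


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]

cov_definition = sample_covariance_definition(xs, ys)
cov_shortcut = sample_covariance_shortcut(xs, ys)
print(cov_definition, cov_shortcut)
```

The shortcut form is convenient for hand calculation because it needs only running sums, while the definition form makes the meaning clearer.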
                                                Correlation
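 • \( \rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} \)
 • Association, not causality!
 • Linear
 • Units?
 • \( -1 \le \rho_{XY} \le 1 \)
 • Sample correlation coefficient, with \( -1 \le r \le 1 \)

 \[ r_{xy} = \frac{\mathrm{Cov}(x, y)}{s_x s_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{\left( \sum_{i=1}^{n} x_i y_i \right) - n \bar{x} \bar{y}}{\sqrt{\left( \sum_{i=1}^{n} x_i^2 - n \bar{x}^2 \right) \left( \sum_{i=1}^{n} y_i^2 - n \bar{y}^2 \right)}} \]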
        Scatter Plots of Data with
     Various Correlation Coefficients
• [Figure: six scatter plots of Y against X, with r = -1, r = -.6, and r = 0 in the top row and r = +1, r = +.3, and r = 0 in the bottom row]
    Using Excel to Find
the Correlation Coefficient
• Select Tools / Data Analysis
• Choose Correlation from the selection menu
• Click OK . . .
• Input data range and select appropriate options
• Click OK to get output
               Sample Correlation “by hand”
Student       GPA   Study hours   GPA*Study     GPA^2   Study^2
1             2.3            19        43.7      5.29       361
2             3.4            26        88.4     11.56       676
3            3.75            34       127.5   14.0625      1156
4             3.8            25          95     14.44       625
5             1.7            10          17      2.89       100
6             2.4            20          48      5.76       400
7             2.6            22        57.2      6.76       484
8               3            30          90         9       900
9             3.2            40         128     10.24      1600
10            2.7            33        89.1      7.29      1089
11            2.9            25        72.5      8.41       625
Sum         31.75           284       856.4   95.7025      8016
Mean         2.89         25.82       77.85      8.70    728.73

Covariance (GPA, Study hours)   3.667273
Variance                        0.406045 (GPA)    68.3636364 (Study hours)
Std dev                         0.637217 (GPA)    8.26823055 (Study hours)
Correlation                     0.696055          0.69605459

[Figure: scatter plot of Study hours (vertical axis, 0 to 45) against GPA (horizontal axis, 1.5 to 4)]

    Source: P:\Economics\Eco 105 (Statistics)\cov_corr.xls
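The "by hand" calculation can be retraced in Python from the column sums, using the shortcut formulas for the sample covariance, variances, and correlation. This is a sketch with illustrative variable names; the data are the eleven students from the table.

```python
import math

# GPA and study hours for the 11 students in the table.
gpa = [2.3, 3.4, 3.75, 3.8, 1.7, 2.4, 2.6, 3.0, 3.2, 2.7, 2.9]
study_hours = [19, 26, 34, 25, 10, 20, 22, 30, 40, 33, 25]

n = len(gpa)
x_bar = sum(gpa) / n
y_bar = sum(study_hours) / n

# Column sums, matching the "Sum" row of the table.
sum_xy = sum(x * y for x, y in zip(gpa, study_hours))
sum_x2 = sum(x * x for x in gpa)
sum_y2 = sum(y * y for y in study_hours)

# Shortcut formulas, each dividing by n - 1 because this is a sample.
covariance = (sum_xy - n * x_bar * y_bar) / (n - 1)
var_gpa = (sum_x2 - n * x_bar**2) / (n - 1)
var_study = (sum_y2 - n * y_bar**2) / (n - 1)

# Correlation is the covariance scaled by both standard deviations.
r = covariance / math.sqrt(var_gpa * var_study)

print(f"covariance  = {covariance:.6f}")
print(f"variances   = {var_gpa:.6f}, {var_study:.6f}")
print(f"correlation = {r:.6f}")
```

The results agree with the spreadsheet values in the table (covariance about 3.667, correlation about 0.696).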
         Intro to Probability:
          Basic Definitions
• Random trials
   – multiple outcomes & uncertainty
• Basic outcome
• Sample space
• Event
• Examples: coin toss, die roll, dice roll, deck of cards, etc.
• A deck of cards is defined as 52 cards: 13 of each suit (♠♣♥♦),
ranked 2, 3, ..., 10, J, Q, K, A
Set Theory
      • Venn diagrams
      • Union: A ∪ B
      • Intersection: A ∩ B
      • A′ is the complement of A
      • A and B are mutually exclusive if A ∩ B = ∅
      • A set of events A1, A2, A3 … AN partitions the sample
        space when the events are mutually exclusive and
        together cover all of S
    Rules for Set Operations
• A ∪ B = B ∪ A                    Commutative
• A ∩ B = B ∩ A
• A ∪ A = A                        Idempotency
• A ∩ A = A
• A ∪ A′ = S, A ∩ A′ = ∅           Complementation
• (A ∪ B)′ = A′ ∩ B′
• (A ∩ B)′ = A′ ∪ B′
• Associative
• A ∪ (B ∪ C) = (A ∪ B) ∪ C
• A ∩ (B ∩ C) = (A ∩ B) ∩ C
• Distributive
• A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
• A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
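These rules can be spot-checked with Python's built-in set type, where `|` is union, `&` is intersection, and the complement is the set difference from the sample space. The sketch below uses one roll of a die as the sample space; the particular events A, B, and C are arbitrary illustrative choices, and a check on one example is not a proof.

```python
# Sample space for one roll of a die, plus three illustrative events.
S = frozenset(range(1, 7))
A = frozenset({2, 4, 6})   # even number
B = frozenset({4, 5, 6})   # greater than 3
C = frozenset({1, 2})      # less than 3


def complement(event: frozenset) -> frozenset:
    """Outcomes in the sample space S that are not in the event."""
    return S - event


checks = {
    "commutative (union)": A | B == B | A,
    "commutative (intersection)": A & B == B & A,
    "idempotency": (A | A == A) and (A & A == A),
    "complementation": (A | complement(A) == S) and (A & complement(A) == frozenset()),
    "complement of a union": complement(A | B) == complement(A) & complement(B),
    "complement of an intersection": complement(A & B) == complement(A) | complement(B),
    "associative (union)": A | (B | C) == (A | B) | C,
    "associative (intersection)": A & (B & C) == (A & B) & C,
    "distributive (intersection over union)": A & (B | C) == (A & B) | (A & C),
    "distributive (union over intersection)": A | (B & C) == (A | B) & (A | C),
}

for rule, holds in checks.items():
    print(f"{rule}: {holds}")
```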
        Fundamental Postulates

• P(A) should satisfy certain postulates

•1: P(A) ≥ 0 [Impossible events cannot occur]
•2: P(S) = 1 [Some outcome must occur]
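•3: If A1, A2, A3 … AN are N mutually exclusive
events then
  P(A1 ∪ A2 ∪ … ∪ AN) = P(A1) + P(A2) + … + P(AN)
or
  \( P\left( \bigcup_{i=1}^{N} A_i \right) = \sum_{i=1}^{N} P(A_i) \)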
                Useful Results
• P(A′) = 1 – P(A)
• P(∅) = 0
• If A ⊂ B, then P(A) ≤ P(B)
• 0 ≤ P(A) ≤ 1
• P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
  – avoid double counting
• P(A ∪ B) = 1 – P((A ∪ B)′) = 1 – P(A′ ∩ B′)
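The addition rule and its complement form can be verified on a small example. The sketch below assumes a fair die, so every outcome is equally likely and a probability is just a count ratio; the events A and B and the helper `prob` are illustrative assumptions.

```python
from fractions import Fraction

# Sample space for a fair die; equally likely outcomes are assumed.
S = frozenset(range(1, 7))
A = frozenset({2, 4, 6})   # even number
B = frozenset({4, 5, 6})   # greater than 3


def prob(event: frozenset) -> Fraction:
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(S))


# Three routes to P(A or B).
p_union_direct = prob(A | B)
p_union_addition = prob(A) + prob(B) - prob(A & B)     # subtract the double-counted overlap
p_union_complement = 1 - prob((S - A) & (S - B))       # 1 - P(A' and B')

print(p_union_direct, p_union_addition, p_union_complement)
```

Using `Fraction` keeps the arithmetic exact, so the three results can be compared with `==` rather than a tolerance.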
                Useful Results
• If A1, A2, A3 … AN partition S, then
                   Example
• Suppose you apply to 3 schools: A, B, and C
• P(accepted @ A) = .20
• P(accepted @ B) = .40
• P(accepted @ C) = .60
• What is the probability of being rejected at all 3?
• What is the probability of being accepted somewhere?
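The slide does not say how the three admission decisions relate to each other. If they are assumed to be independent (an assumption added here, not stated in the source), the rejection probabilities multiply and the second answer follows from the complement rule. A minimal sketch under that assumption:

```python
import math

# Acceptance probabilities from the slide.
p_accept = {"A": 0.20, "B": 0.40, "C": 0.60}

# ASSUMPTION: the three decisions are independent, so P(rejected everywhere)
# is the product of the individual rejection probabilities.
p_rejected_everywhere = math.prod(1 - p for p in p_accept.values())

# Complement rule: accepted somewhere = not rejected everywhere.
p_accepted_somewhere = 1 - p_rejected_everywhere

print(f"P(rejected at all 3)   = {p_rejected_everywhere:.3f}")
print(f"P(accepted somewhere)  = {p_accepted_somewhere:.3f}")
```

Under independence the answers are 0.8 × 0.6 × 0.4 = 0.192 and 1 − 0.192 = 0.808; if the decisions are correlated, these numbers would change.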
         Conditional Probability
• The conditional probability that A occurs given that B
is known to have occurred is

  \[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0 \]

• Problem 2.5, p. 25
           Conditional Probability
• The probability that a beginning golfer makes a good shot if she selects
the correct club is 1/3. The probability of a good shot with the
wrong club is 1/5. There are 4 clubs in her golf bag, one of which
is the correct club for the next shot. Club selection is random.

• What is the probability of a good shot?
• Given that she hit a good shot, what is the probability that she
chose the wrong club?
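Both questions can be worked by splitting on which club was chosen and then applying the definition of conditional probability. The sketch below uses exact fractions; the variable names are illustrative, and "random" club selection is taken to mean each of the 4 clubs is equally likely.

```python
from fractions import Fraction

# Club selection: 1 correct club out of 4, each equally likely to be picked.
p_correct_club = Fraction(1, 4)
p_wrong_club = 1 - p_correct_club

# Chance of a good shot given the club choice, from the problem statement.
p_good_given_correct = Fraction(1, 3)
p_good_given_wrong = Fraction(1, 5)

# Question 1: add up the two mutually exclusive ways to hit a good shot.
p_good = p_good_given_correct * p_correct_club + p_good_given_wrong * p_wrong_club

# Question 2: P(wrong club | good shot) = P(wrong club and good shot) / P(good shot).
p_wrong_given_good = (p_good_given_wrong * p_wrong_club) / p_good

print(f"P(good shot)              = {p_good}")
print(f"P(wrong club | good shot) = {p_wrong_given_good}")
```

The answers are 7/30 for a good shot and 9/14 for the wrong club given a good shot. A good shot still most likely came from a wrong club because wrong clubs are chosen three times as often.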

								