
Chapter 7. Statistical Estimation and Sampling Distributions

   7.1   Point Estimates
   7.2   Properties of Point Estimates
   7.3   Sampling Distributions
   7.4   Constructing Parameter Estimates
   7.5   Supplementary Problems
7.1 Point Estimates

7.1.1 Parameters

• Parameters
   – In statistical inference, the term parameter is used to denote a
     quantity $\theta$, say, that is a property of an unknown
     probability distribution.
   – For example, the mean, variance, or a particular quantile of the
     probability distribution.
   – Parameters are unknown, and one of the goals of statistical
     inference is to estimate them.

[Figure 7.1 The relationship between a point estimate and an unknown parameter $\theta$]

[Figure 7.2 Estimation of the population mean by the sample mean]
7.1.2 Statistics

• Statistics
   – In statistical inference, the term statistic is used to denote a
     quantity that is a property of a sample.
   – Statistics are functions of a random sample: for example, the
     sample mean, sample variance, or a particular sample quantile.
   – Statistics are random variables whose observed values can be
     calculated from a set of observed data.

  Examples:

  sample mean
  $$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

  sample variance
  $$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
7.1.3 Estimation

• Estimation
   – A procedure of “guessing” properties of the population from
     which data are collected.
   – A point estimate of an unknown parameter $\theta$ is a statistic
     $\hat{\theta}$ that represents a “guess” at the value of $\theta$.

• Example 1 (Machine breakdowns)
   – How can we estimate
     P(machine breakdown due to operator misuse)?

• Example 2 (Rolling mill scrap)
   – How can we estimate the mean and variance of the probability
     distribution of % scrap?
7.2 Properties of Point Estimates

7.2.1 Unbiased Estimates (1/5)

• Definitions
   – A point estimate $\hat{\theta}$ for a parameter $\theta$ is said
     to be unbiased if
     $$E(\hat{\theta}) = \theta$$
   – If a point estimate is not unbiased, then its bias is defined to be
     $$\text{bias} = E(\hat{\theta}) - \theta$$
7.2.1 Unbiased Estimates (2/5)

• Point estimate of a success probability

  If $X \sim B(n, p)$, then $\hat{p} = \dfrac{X}{n}$.

   – Since $X \sim B(n, p) \Rightarrow E(X) = np$,
     $$E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{1}{n} E(X) = \frac{1}{n}\, np = p,$$
     which means that $\hat{p}$ is an unbiased estimate.
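
As a quick numerical check, here is a minimal Python sketch (assuming
numpy is available; the values n = 40 and p = 0.3 are arbitrary
illustrative choices) that averages many simulated estimates
$\hat{p} = X/n$:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 40, 0.3

    # Draw many binomial counts X ~ B(n, p) and form the estimate p_hat = X / n.
    p_hat = rng.binomial(n, p, size=100_000) / n
    print(p_hat.mean())  # close to 0.3, consistent with E(p_hat) = p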
7.2.1 Unbiased Estimates (3/5)

• Point estimate of a population mean

  If $X_1, X_2, \ldots, X_n$ are sample observations from a probability
  distribution with mean $\mu$, then the sample mean $\hat{\mu} = \bar{X}$
  is an unbiased point estimate of the population mean $\mu$.

   – Since $E(X_i) = \mu$ for $1 \le i \le n$,
     $$E(\hat{\mu}) = E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\, n\mu = \mu$$
7.2.1 Unbiased Estimates (4/5)

• Point estimate of a population variance

  If $X_1, X_2, \ldots, X_n$ are sample observations from a probability
  distribution with variance $\sigma^2$, then the sample variance
  $$\hat{\sigma}^2 = S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
  is an unbiased point estimate of the population variance $\sigma^2$.
7.2.1 Unbiased Estimates (5/5)

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} \left((X_i - \mu) - (\bar{X} - \mu)\right)^2$$
$$= \sum_{i=1}^{n} (X_i - \mu)^2 - 2(\bar{X} - \mu)\sum_{i=1}^{n} (X_i - \mu) + n(\bar{X} - \mu)^2$$
$$= \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2 \qquad \text{since } \sum_{i=1}^{n} (X_i - \mu) = n(\bar{X} - \mu)$$

Note that $E(X_i) = \mu$, $E\left[(X_i - \mu)^2\right] = \text{Var}(X_i) = \sigma^2$,
and $E(\bar{X}) = \mu$, $E\left[(\bar{X} - \mu)^2\right] = \text{Var}(\bar{X}) = \dfrac{\sigma^2}{n}$.

$$E(S^2) = \frac{1}{n-1} E\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = \frac{1}{n-1} E\left[\sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2\right]$$
$$= \frac{1}{n-1}\left(n\sigma^2 - n\,\frac{\sigma^2}{n}\right) = \sigma^2$$
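
The role of the $n-1$ divisor can be checked by simulation. A minimal
Python sketch (assuming numpy; the sample size and normal distribution
are arbitrary illustrative choices):

    import numpy as np

    rng = np.random.default_rng(1)
    n, mu, sigma2 = 10, 5.0, 4.0

    # Draw many samples of size n and average the two variance estimates.
    x = rng.normal(mu, np.sqrt(sigma2), size=(100_000, n))
    print(x.var(axis=1, ddof=1).mean())  # n-1 divisor: close to sigma2 = 4.0
    print(x.var(axis=1, ddof=0).mean())  # n divisor: close to (n-1)/n * 4.0 = 3.6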
7.2.2 Minimum Variance Estimates (1/4)

• Which is the better of two unbiased point estimates?

[Figure: probability density functions of $\hat{\theta}_1$ (larger variance) and $\hat{\theta}_2$ (smaller variance), both centered at $\theta$]
7.2.2 Minimum Variance Estimates (2/4)

Since $\text{Var}(\hat{\theta}_1) > \text{Var}(\hat{\theta}_2)$,
$\hat{\theta}_2$ is a better point estimate than $\hat{\theta}_1$.
This can be written
$$P(|\hat{\theta}_1 - \theta| > \epsilon) > P(|\hat{\theta}_2 - \theta| > \epsilon) \quad \text{for any value of } \epsilon > 0.$$

[Figure: the probability density functions of $\hat{\theta}_1$ and $\hat{\theta}_2$, both centered at $\theta$]
7.2.2 Minimum Variance Estimates (3/4)

• An unbiased point estimate whose variance is smaller than that of any
  other unbiased point estimate is called the minimum variance unbiased
  estimate (MVUE).

• Relative efficiency

  The relative efficiency of an unbiased point estimate $\hat{\theta}_1$
  to an unbiased point estimate $\hat{\theta}_2$ is
  $$\frac{\text{Var}(\hat{\theta}_2)}{\text{Var}(\hat{\theta}_1)}$$

• Mean squared error (MSE)

  $$\text{MSE}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right]$$

   – How is it decomposed?
   – Why is it useful?
7.2.2 Minimum Variance Estimates (4/4)

[Figure: probability density functions of two biased estimates, centered at $E(\hat{\theta}_1)$ and $E(\hat{\theta}_2)$, with the bias of $\hat{\theta}_1$ and the bias of $\hat{\theta}_2$ shown as their distances from $\theta$]
Example: two independent measurements

$$X_A \sim N(C,\, 2.97) \quad \text{and} \quad X_B \sim N(C,\, 1.62)$$

Point estimates of the unknown $C$:
$$\hat{C}_A = X_A \quad \text{and} \quad \hat{C}_B = X_B$$

They are both unbiased estimates since
$$E[\hat{C}_A] = C \quad \text{and} \quad E[\hat{C}_B] = C.$$

The relative efficiency of $\hat{C}_A$ to $\hat{C}_B$ is
$$\text{Var}(\hat{C}_B) / \text{Var}(\hat{C}_A) = 1.62 / 2.97 = 0.55.$$

Let us consider a new estimate
$$\hat{C} = p\hat{C}_A + (1-p)\hat{C}_B.$$

Then this estimate is unbiased since
$$E[\hat{C}] = pE[\hat{C}_A] + (1-p)E[\hat{C}_B] = C.$$

What is the optimal value of $p$ that results in $\hat{C}$ having the
smallest possible mean squared error (MSE)?
The variance of $\hat{C}$ is given by
$$\text{Var}(\hat{C}) = p^2 \text{Var}(\hat{C}_A) + (1-p)^2 \text{Var}(\hat{C}_B) = p^2 \sigma_1^2 + (1-p)^2 \sigma_2^2.$$

Differentiating with respect to $p$ yields
$$\frac{d}{dp} \text{Var}(\hat{C}) = 2p\sigma_1^2 - 2(1-p)\sigma_2^2.$$

The value of $p$ that minimizes $\text{Var}(\hat{C})$:
$$\frac{d}{dp} \text{Var}(\hat{C}) \Big|_{p = p^*} = 0 \;\Rightarrow\; p^* = \frac{1/\sigma_1^2}{1/\sigma_1^2 + 1/\sigma_2^2}$$

Therefore, in this example,
$$p^* = \frac{1/2.97}{1/2.97 + 1/1.62} = 0.35.$$

The variance of the combined estimate is
$$\text{Var}(\hat{C}) = \frac{1}{1/2.97 + 1/1.62} = 1.05.$$
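
The calculus above can be confirmed numerically. A minimal Python
sketch (assuming numpy) that evaluates $\text{Var}(\hat{C})$ over a
grid of $p$ values:

    import numpy as np

    var_a, var_b = 2.97, 1.62

    p = np.linspace(0.0, 1.0, 10_001)
    var_c = p**2 * var_a + (1 - p)**2 * var_b

    print(p[var_c.argmin()])                 # about 0.35
    print(var_c.min())                       # about 1.05
    print((1/var_a) / (1/var_a + 1/var_b))   # closed form p*: 0.3529...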
The relative efficiency of $\hat{C}_B$ to $\hat{C}$ is
$$\frac{\text{Var}(\hat{C})}{\text{Var}(\hat{C}_B)} = \frac{1.05}{1.62} = 0.65.$$

In general, assuming that we have $n$ independent and unbiased
estimates $\hat{\theta}_i$, $i = 1, \ldots, n$, having variances
$\sigma_i^2$, $i = 1, \ldots, n$, respectively, for a parameter
$\theta$, we can form the unbiased estimator
$$\hat{\theta} = \frac{\sum_{i=1}^{n} \hat{\theta}_i / \sigma_i^2}{\sum_{i=1}^{n} 1/\sigma_i^2}.$$

The variance of this estimator is
$$\text{Var}(\hat{\theta}) = \frac{1}{\sum_{i=1}^{n} 1/\sigma_i^2}.$$
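
This inverse-variance weighting is easy to package as a small helper.
A minimal Python sketch (assuming numpy; the three estimates and
variances in the demo are hypothetical):

    import numpy as np

    def pooled_estimate(estimates, variances):
        # Inverse-variance weighted combination of independent unbiased estimates.
        w = 1.0 / np.asarray(variances)
        return np.sum(w * np.asarray(estimates)) / np.sum(w), 1.0 / np.sum(w)

    theta_hat, var = pooled_estimate([10.1, 9.8, 10.4], [2.97, 1.62, 2.10])
    print(theta_hat, var)  # pooled estimate and its variance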
Mean squared error (MSE):

Let us consider a point estimate $\hat{\theta}$. Then the mean squared
error is defined by
$$\text{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2].$$

Moreover, notice that
$$\text{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$$
$$= E[(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta)^2]$$
$$= E[(\hat{\theta} - E[\hat{\theta}])^2] + 2E[\hat{\theta} - E[\hat{\theta}]](E[\hat{\theta}] - \theta) + (E[\hat{\theta}] - \theta)^2$$
$$= E[(\hat{\theta} - E[\hat{\theta}])^2] + (E[\hat{\theta}] - \theta)^2 \qquad \text{(the cross term vanishes since } E[\hat{\theta} - E[\hat{\theta}]] = 0\text{)}$$
$$= \text{Var}(\hat{\theta}) + \text{bias}^2$$
7.3 Sampling Distributions

7.3.1 Sample Proportion (1/2)

• If $X \sim B(n, p)$, then the sample proportion $\hat{p} = \dfrac{X}{n}$
  has the approximate distribution
  $$\hat{p} \sim N\left(p,\, \frac{p(1-p)}{n}\right)$$

  since $E(\hat{p}) = p$ and
  $$\text{Var}(X) = np(1-p) \;\Rightarrow\; \text{Var}(\hat{p}) = \text{Var}\left(\frac{X}{n}\right) = \frac{p(1-p)}{n}$$
7.3.1 Sample Proportion (2/2)

• Standard error of the sample proportion

  The standard error of the sample proportion is defined as
  $$\text{s.e.}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}},$$
  but since $p$ is usually unknown, we replace $p$ by the observed
  value $\hat{p} = x/n$ to obtain
  $$\text{s.e.}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{1}{n}\sqrt{\frac{x(n-x)}{n}}$$
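
Both forms of the standard error give the same number, as a minimal
Python sketch shows (assuming numpy; x = 12 successes out of n = 50 is
a hypothetical example):

    import numpy as np

    x, n = 12, 50
    p_hat = x / n

    print(np.sqrt(p_hat * (1 - p_hat) / n))  # sqrt(p_hat(1-p_hat)/n)
    print(np.sqrt(x * (n - x) / n) / n)      # (1/n) sqrt(x(n-x)/n): identical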
7.3.2 Sample Mean (1/3)

• Distribution of the Sample Mean

  If $X_1, \ldots, X_n$ are observations from a population with mean
  $\mu$ and variance $\sigma^2$, then by the Central Limit Theorem the
  sample mean has the approximate distribution
  $$\hat{\mu} = \bar{X} \sim N\left(\mu,\, \frac{\sigma^2}{n}\right)$$
7.3.2 Sample Mean (2/3)

• Standard error of the sample mean

  The standard error of the sample mean is defined as
  $$\text{s.e.}(\bar{X}) = \frac{\sigma}{\sqrt{n}},$$
  but since $\sigma$ is usually unknown, we may safely replace it, when
  $n$ is large, by the observed value $s$:
  $$\text{s.e.}(\bar{X}) = \frac{s}{\sqrt{n}}$$
7.3.2 Sample Mean (3/3)

If $n = 20$, the probability that $\hat{\mu} = \bar{X}$ lies within
$\sigma/4$ of $\mu$ is
$$P\left(\mu - \frac{\sigma}{4} \le \bar{X} \le \mu + \frac{\sigma}{4}\right) = P\left(\mu - \frac{\sigma}{4} \le N\left(\mu, \frac{\sigma^2}{20}\right) \le \mu + \frac{\sigma}{4}\right)$$
$$= P\left(-\frac{\sqrt{20}}{4} \le N(0,1) \le \frac{\sqrt{20}}{4}\right) = \Phi(1.12) - \Phi(-1.12) = 0.7372$$

So the probability is about 74%.

[Figure: probability density function of $\hat{\mu} = \bar{X}$ when $n = 20$, with the interval $\mu \pm \sigma/4$ marked]
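
The probability can be reproduced with the normal c.d.f. A minimal
Python sketch (assuming scipy is available):

    import numpy as np
    from scipy.stats import norm

    n = 20
    z = np.sqrt(n) / 4  # (sigma/4) / (sigma/sqrt(n)) = sqrt(n)/4 = 1.118...

    # About 0.736; the slide rounds z to 1.12, giving 0.7372.
    print(norm.cdf(z) - norm.cdf(-z))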
7.3.3 Sample Variance (1/2)

• Distribution of the Sample Variance

  If $X_1, \ldots, X_n$ are normally distributed with mean $\mu$ and
  variance $\sigma^2$, then the sample variance $S^2$ has the distribution
  $$S^2 \sim \sigma^2\, \frac{\chi^2_{n-1}}{n-1}$$
Theorem: If $X_i$, $i = 1, \ldots, n$, is a sample from a normal
population having mean $\mu$ and variance $\sigma^2$, then $\bar{X}$
and $S^2$ are independent random variables, with $\bar{X}$ being normal
with mean $\mu$ and variance $\sigma^2/n$, and $(n-1)S^2/\sigma^2$
being chi-square with $n-1$ degrees of freedom.

(proof)
Let $Y_i = X_i - \mu$. Then
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2,$$
or equivalently,
$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2.$$

Dividing this equation by $\sigma^2$, we get
$$\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 = \sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{\sigma}\right)^2 + \left(\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma}\right)^2.$$

Cf. Let $X$ and $Y$ be independent chi-square random variables with $m$
and $n$ degrees of freedom respectively. Then $Z = X + Y$ is a
chi-square random variable with $m + n$ degrees of freedom.
In the previous equation,
$$\left(\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma}\right)^2 \sim \chi^2_1 \quad \text{and} \quad \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_n.$$

Therefore,
$$\sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{\sigma}\right)^2 = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.$$
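
The chi-square distribution of $(n-1)S^2/\sigma^2$ can also be checked
by simulation. A minimal Python sketch (assuming numpy and scipy; the
sample size n = 8 is an arbitrary illustrative choice):

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(2)
    n = 8

    # Standard normal samples, so sigma^2 = 1 and the statistic is (n-1)S^2.
    x = rng.normal(size=(100_000, n))
    stat = (n - 1) * x.var(axis=1, ddof=1)

    # Chi-square with n-1 df has mean n-1 = 7 and variance 2(n-1) = 14.
    print(stat.mean(), stat.var())
    print(np.quantile(stat, 0.95), chi2.ppf(0.95, n - 1))  # quantiles agree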
7.3.3 Sample Variance (2/2)

• t-statistic

  $$\bar{X} \sim N\left(\mu,\, \frac{\sigma^2}{n}\right) \;\Rightarrow\; \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1)$$

  and also
  $$\frac{S^2}{\sigma^2} \sim \frac{\chi^2_{n-1}}{n-1},$$
  so that
  $$\frac{\sqrt{n}(\bar{X} - \mu)}{S} = \frac{\sqrt{n}(\bar{X} - \mu)/\sigma}{S/\sigma} \sim \frac{N(0,1)}{\sqrt{\chi^2_{n-1}/(n-1)}} \sim t_{n-1}$$
7.4 Constructing Parameter Estimates

7.4.1 The Method of Moments (1/3)

• Method of moments point estimate for one parameter

  If a data set consists of observations $x_1, \ldots, x_n$ from a
  probability distribution that depends upon one unknown parameter
  $\theta$, the method of moments point estimate $\hat{\theta}$ of the
  parameter is found by solving the equation
  $$\bar{x} = E(X)$$
7.4.1 The Method of Moments (2/3)

• Method of moments point estimates for two parameters

  For two unknown parameters $\theta_1$ and $\theta_2$, the method of
  moments point estimates are found by solving the equations
  $$\bar{x} = E(X) \quad \text{and} \quad s^2 = \text{Var}(X)$$
7.4.1 The Method of Moments (3/3)

• Examples
   – For normally distributed data, since $E(X) = \mu$ and
     $\text{Var}(X) = \sigma^2$ for $N(\mu, \sigma^2)$, the method of
     moments gives
     $$\hat{\mu} = \bar{x} \quad \text{and} \quad \hat{\sigma}^2 = s^2$$
   – Suppose that the data observations 2.0, 2.4, 3.1, 3.9, 4.5, 4.8,
     5.7, 9.9 are obtained from a $U(0, \theta)$ distribution. Then
     $\bar{x} = 4.5375$ and $E(X) = \theta/2$, so
     $$\hat{\theta} = 2 \times 4.5375 = 9.075$$
   – What if the distribution is exponential with parameter $\lambda$?
     (See the sketch below.)
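
A minimal Python sketch of these method of moments calculations
(assuming numpy; for the exponential case the standard rate
parameterization $E(X) = 1/\lambda$ is assumed, which gives
$\hat{\lambda} = 1/\bar{x}$):

    import numpy as np

    x = np.array([2.0, 2.4, 3.1, 3.9, 4.5, 4.8, 5.7, 9.9])

    # U(0, theta): E(X) = theta/2, so theta_hat = 2 * xbar.
    print(2 * x.mean())  # 9.075

    # Exponential with rate lambda: E(X) = 1/lambda, so lambda_hat = 1/xbar.
    print(1 / x.mean())  # about 0.220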
7.4.2 Maximum Likelihood Estimates (1/4)

• Maximum Likelihood Estimate for One Parameter

  If a data set consists of observations $x_1, \ldots, x_n$ from a
  probability distribution $f(x; \theta)$ depending upon one unknown
  parameter $\theta$, the maximum likelihood estimate $\hat{\theta}$ of
  the parameter is found by maximizing the likelihood function
  $$L(\theta; x_1, \ldots, x_n) = f(x_1; \theta) \cdots f(x_n; \theta)$$
7.4.2 Maximum Likelihood Estimates (2/4)

• Example
   – If $x_1, \ldots, x_n$ are a set of Bernoulli observations, with
     $f(1; p) = p$ and $f(0; p) = 1 - p$, i.e.
     $f(x_i; p) = p^{x_i}(1-p)^{1-x_i}$.
   – The likelihood function is
     $$L(p; x_1, \ldots, x_n) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = p^x (1-p)^{n-x},$$
     where $x = x_1 + \cdots + x_n$, and the m.l.e. $\hat{p}$ is the
     value that maximizes this.
   – The log-likelihood is $\ln(L) = x \ln(p) + (n-x)\ln(1-p)$, and
     $$\frac{d \ln(L)}{dp} = \frac{x}{p} - \frac{n-x}{1-p} = 0 \;\Rightarrow\; \hat{p} = \frac{x}{n}$$
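
The closed form $\hat{p} = x/n$ can be checked against a numerical
maximization of the log-likelihood. A minimal Python sketch (assuming
numpy and scipy; the ten Bernoulli observations are hypothetical):

    import numpy as np
    from scipy.optimize import minimize_scalar

    x = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])  # hypothetical data

    def neg_log_lik(p):
        s = x.sum()
        return -(s * np.log(p) + (len(x) - s) * np.log(1 - p))

    res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
    print(res.x, x.mean())  # numerical maximizer agrees with x/n = 0.6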
7.4.2 Maximum Likelihood Estimates (3/4)

• Maximum Likelihood Estimates for Two Parameters

  For two unknown parameters $\theta_1$ and $\theta_2$, the maximum
  likelihood estimates $\hat{\theta}_1$ and $\hat{\theta}_2$ are found
  by maximizing the likelihood function
  $$L(\theta_1, \theta_2; x_1, \ldots, x_n) = f(x_1; \theta_1, \theta_2) \cdots f(x_n; \theta_1, \theta_2)$$
7.4.2 Maximum Likelihood Estimates (4/4)

• Example

  The normal distribution has
  $$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2 / 2\sigma^2}$$

  The likelihood is
  $$L(\mu, \sigma^2; x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i; \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\sum_{i=1}^{n} (x_i - \mu)^2 / 2\sigma^2\right)$$

  so that the log-likelihood is
  $$\ln(L) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2}$$

  $$\frac{\partial \ln(L)}{\partial \mu} = \frac{\sum_{i=1}^{n} (x_i - \mu)}{\sigma^2}, \qquad \frac{\partial \ln(L)}{\partial (\sigma^2)} = -\frac{n}{2\sigma^2} + \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^4}$$

  Setting the two equations to zero,
  $$\hat{\mu} = \bar{x}, \qquad \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$
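
These closed forms are exactly what numpy's population variance
computes. A minimal Python sketch (assuming numpy; the simulated
sample is hypothetical):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(10.0, 2.0, size=500)  # hypothetical sample

    mu_hat = x.mean()
    sigma2_hat = ((x - mu_hat) ** 2).mean()  # MLE uses the n divisor
    print(mu_hat, sigma2_hat)
    print(x.var(ddof=0))  # identical to sigma2_hat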
7.4.3 Examples (1/6)

• Glass Sheet Flaws

  At a glass manufacturing company, 30 randomly selected sheets of
  glass are inspected. If the distribution of the number of flaws per
  sheet is taken to be a Poisson distribution, how should the
  parameter $\lambda$ be estimated?

   – The method of moments:
     $$E(X) = \lambda \;\Rightarrow\; \hat{\lambda} = \bar{x}$$
7.4.3 Examples (2/6)

• The maximum likelihood estimate:
  $$f(x_i; \lambda) = \frac{e^{-\lambda} \lambda^{x_i}}{x_i!},$$
  so that the likelihood is
  $$L(\lambda; x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i; \lambda) = \frac{e^{-n\lambda} \lambda^{(x_1 + \cdots + x_n)}}{x_1! \cdots x_n!}$$

  The log-likelihood is therefore
  $$\ln(L) = -n\lambda + (x_1 + \cdots + x_n)\ln(\lambda) - \ln(x_1! \cdots x_n!),$$
  so that
  $$\frac{d \ln(L)}{d\lambda} = -n + \frac{x_1 + \cdots + x_n}{\lambda} = 0 \;\Rightarrow\; \hat{\lambda} = \bar{x}$$
7.4.3 Examples (3/6)

• Example 26: Fish Tagging and Recapture

  Suppose a fisherman wants to estimate the fish stock $N$ of a lake,
  and that 34 fish have been tagged and released back into the lake.
  If, over a period of time, the fisherman catches 50 fish and 9 of
  them are tagged, then an intuitive estimate of the total number of
  fish is
  $$\hat{N} = \frac{34 \times 50}{9} \approx 189.$$
  This assumes that the proportion of tagged fish in the lake is
  roughly equal to the proportion of the fisherman's catch that is
  tagged.
7.4.3 Examples (4/6)

Under the assumption that all the fish are equally likely to be caught,
the distribution of the number of tagged fish $X$ in the fisherman's
catch of 50 fish is a hypergeometric distribution with $r = 34$,
$n = 50$, and $N$ unknown:
$$P(X = x \mid N, n, r) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}, \qquad E(X) = \frac{nr}{N} = \frac{50 \times 34}{N}$$

Since there is only one observation $x$, we have $\bar{x} = x$, and
equating $E(X) = nr/N = x$ gives
$$\hat{N} = \frac{50 \times 34}{9} \approx 188.89$$
7.4.3 Examples (5/6)

Under a binomial approximation, the success probability $p = r/N$ is
estimated to be
$$\hat{p} = \frac{x}{n} = \frac{9}{50}, \quad \text{hence} \quad \hat{N} = \frac{r}{\hat{p}} = \frac{50 \times 34}{9}$$

• Example 36: Bee Colonies

  The data below are the proportions of worker bees that leave a
  colony with a queen bee:
  0.28 0.32 0.09 0.35 0.45 0.41 0.06 0.16 0.16 0.46 0.35 0.29 0.31
  If an entomologist wishes to model this proportion with a beta
  distribution, how should the parameters be estimated?
7.4.3 Examples (6/6)

$$\text{beta:} \quad f(x \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, x^{a-1}(1-x)^{b-1}, \quad 0 \le x \le 1, \; a > 0, \; b > 0$$
$$E(X) = \frac{a}{a+b}, \qquad \text{Var}(X) = \frac{ab}{(a+b)^2(a+b+1)}$$

With $\bar{x} = 0.3007$ and $s^2 = 0.01966$, the point estimates
$\hat{a}$ and $\hat{b}$ are the solution to the equations
$$\frac{a}{a+b} = 0.3007 \quad \text{and} \quad \frac{ab}{(a+b)^2(a+b+1)} = 0.01966,$$
which are $\hat{a} = 2.92$ and $\hat{b} = 6.78$.
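
The two moment equations have a closed-form solution, so the estimates
can be computed directly. A minimal Python sketch:

    # Closed-form method of moments for the beta distribution:
    # from E(X) = a/(a+b) and Var(X) = ab/((a+b)^2 (a+b+1)),
    # a + b = xbar(1 - xbar)/s2 - 1 and then a = xbar * (a + b).
    xbar, s2 = 0.3007, 0.01966

    ab = xbar * (1 - xbar) / s2 - 1
    a = xbar * ab
    b = ab - a
    print(a, b)  # about 2.92 and 6.78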
MLE for U(0, θ)

• For some distributions, the MLE cannot be found by differentiation;
  you have to look at the curve of the likelihood function itself. For
  $U(0, \theta)$ the likelihood is $L(\theta) = \theta^{-n}$ for
  $\theta \ge \max\{x_1, \ldots, x_n\}$ and $0$ otherwise, which is
  decreasing in $\theta$ on that range.

• The MLE is therefore $\hat{\theta} = \max\{X_1, \ldots, X_n\}$.
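
A minimal Python sketch of this likelihood (assuming numpy; it reuses
the eight observations from the uniform example above):

    import numpy as np

    x = np.array([2.0, 2.4, 3.1, 3.9, 4.5, 4.8, 5.7, 9.9])
    n = len(x)

    theta = np.linspace(8.0, 14.0, 7)
    # L(theta) = theta^(-n) for theta >= max(x), and 0 otherwise.
    L = np.where(theta >= x.max(), theta ** (-n), 0.0)
    print(dict(zip(theta, L)))
    # L is zero below max(x) = 9.9 and strictly decreasing above it,
    # so the maximum is attained at theta_hat = max(x).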