# Chapter 7. Statistical Estimation and Sampling Distributions

Chapter 7. Statistical Estimation
and Sampling Distributions
7.1   Point Estimates
7.2   Properties of Point Estimates
7.3   Sampling Distributions
7.4   Constructing Parameter Estimates
7.5   Supplementary Problems

NIPRL
7.1 Point Estimates

7.1.1 Parameters

• Parameters
– In statistical inference, the term parameter is used to denote a quantity $\theta$, say, that is a property of an unknown probability distribution.

– For example, the mean, variance, or a particular quantile
of the probability distribution

– Parameters are unknown, and one of the goals of
statistical inference is to estimate them.

Figure 7.1 The relationship between a point estimate and an unknown parameter θ

Figure 7.2 Estimation of the population mean by the sample mean
7.1.2 Statistics
• Statistics
– In statistical inference, the term statistic is used to denote a quantity that is a property of a sample.

– Statistics are functions of a random sample. For example,
the sample mean, sample variance, or a particular
sample quantile.

– Statistics are random variables whose observed values
can be calculated from a set of observed data.

Examples:

sample mean $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$

sample variance $S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$
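The two statistics above can be checked numerically; a minimal sketch (the data values are made up for illustration):

```python
import numpy as np

# Made-up sample, for illustration only
x = np.array([2.0, 4.0, 6.0, 8.0])
n = len(x)

sample_mean = x.sum() / n                              # (x1 + ... + xn) / n
sample_var = ((x - sample_mean) ** 2).sum() / (n - 1)  # note the n - 1 divisor

# numpy's ddof=1 option applies the same n - 1 divisor
print(sample_mean, sample_var, np.isclose(sample_var, x.var(ddof=1)))
```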
7.1.3 Estimation

• Estimation
– A procedure of “guessing” properties of the population from which data are collected.
– A point estimate of an unknown parameter $\theta$ is a statistic $\hat{\theta}$ that represents a “guess” at the value of $\theta$.

• Example 1 (Machine breakdowns)
– How to estimate
P(machine breakdown due to operator misuse) ?

• Example 2 (Rolling mill scrap)
– How to estimate the mean and variance of the probability distribution of % scrap?
7.2 Properties of Point Estimates

7.2.1. Unbiased Estimates (1/5)

• Definitions
– A point estimate $\hat{\theta}$ for a parameter $\theta$ is said to be unbiased if
$E(\hat{\theta}) = \theta.$
– If a point estimate is not unbiased, then its bias is defined to be
$\text{bias} = E(\hat{\theta}) - \theta.$
7.2.1. Unbiased Estimates (2/5)

• Point estimate of a success probability

If $X \sim B(n, p)$, then $\hat{p} = \dfrac{X}{n}$.

$X \sim B(n, p) \Rightarrow E(X) = np$, hence
$E(\hat{p}) = E\!\left(\dfrac{X}{n}\right) = \dfrac{1}{n}E(X) = \dfrac{1}{n}\,np = p,$
which means that $\hat{p}$ is an unbiased estimate.
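A quick simulation (not in the original notes) illustrating the unbiasedness of $\hat{p}$; the values of n, p, and the replication count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 0.3

# 200,000 replications of X ~ B(n, p), each giving p_hat = X / n
x = rng.binomial(n, p, size=200_000)
p_hat_mean = (x / n).mean()

print(p_hat_mean)  # close to p = 0.3, as E(p_hat) = p predicts
```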
7.2.1. Unbiased Estimates (3/5)

• Point estimate of a population mean

If $X_1, X_2, \ldots, X_n$ are sample observations from a probability distribution with mean $\mu$, then the sample mean $\hat{\mu} = \bar{X}$ is an unbiased point estimate of the population mean $\mu$, since $E(X_i) = \mu$ for $1 \le i \le n$ and
$E(\hat{\mu}) = E(\bar{X}) = E\!\left(\dfrac{1}{n}\sum_{i=1}^{n} X_i\right) = \dfrac{1}{n}\sum_{i=1}^{n} E(X_i) = \dfrac{1}{n}\,n\mu = \mu.$
7.2.1. Unbiased Estimates (4/5)

• Point estimate of a population variance

If $X_1, X_2, \ldots, X_n$ are sample observations from a probability distribution with variance $\sigma^2$, then the sample variance
$\hat{\sigma}^2 = S^2 = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$
is an unbiased point estimate of the population variance $\sigma^2$.
7.2.1. Unbiased Estimates (5/5)

 i1 ( X i  X )2   i1 (( X i   )  ( X   ))2
n                   n

  i 1 ( X i   ) 2  2( X   ) i 1 ( X i   )  n( X   ) 2
n                                     n

  i 1 ( X i   ) 2  n( X   ) 2
n

note E ( X i )   , E  ( X i   ) 2   Var ( X i )   2
2
E( X )  , E ( X   )    2
  Var ( X )      n

E (S ) 
2   1
n 1

E  i 1 ( X i  X ) 
n            2
 1
n 1
E             i1 ( X i   ) 2  n( X   ) 2
n

1  n 2              2 
        i 1  n     
2

n 1                 n 

NIPRL
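A simulation sketch (not in the original slides) contrasting the $n-1$ divisor with the $n$ divisor; the normal population and sample size are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 5, 4.0

# 100,000 samples of size n from N(0, sigma^2)
samples = rng.normal(0.0, np.sqrt(sigma2), size=(100_000, n))

s2_unbiased = samples.var(axis=1, ddof=1).mean()  # divisor n - 1
s2_biased = samples.var(axis=1, ddof=0).mean()    # divisor n

print(s2_unbiased)  # close to sigma^2 = 4.0
print(s2_biased)    # close to (n - 1)/n * sigma^2 = 3.2
```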
7.2.2. Minimum Variance Estimates (1/4)

• Which is the better of two unbiased point estimates?

[Figure: probability density functions of $\hat{\theta}_1$ and $\hat{\theta}_2$]
7.2.2. Minimum Variance Estimates (2/4)
Since $\mathrm{Var}(\hat{\theta}_1) > \mathrm{Var}(\hat{\theta}_2)$, $\hat{\theta}_2$ is a better point estimate than $\hat{\theta}_1$. This can be written

$P(|\hat{\theta}_1 - \theta| > \delta) \ge P(|\hat{\theta}_2 - \theta| > \delta)$ for any value of $\delta > 0.$

[Figure: probability density functions of $\hat{\theta}_1$ and $\hat{\theta}_2$ centered at $\theta$]
7.2.2. Minimum Variance Estimates (3/4)
•   An unbiased point estimate whose variance is smaller than that of any other unbiased point estimate is called a minimum variance unbiased estimate (MVUE).

•   Relative efficiency

The relative efficiency of an unbiased point estimate $\hat{\theta}_1$ to an unbiased point estimate $\hat{\theta}_2$ is $\dfrac{\mathrm{Var}(\hat{\theta}_2)}{\mathrm{Var}(\hat{\theta}_1)}.$

•   Mean squared error (MSE)

$\mathrm{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big]$
– How is it decomposed?
– Why is it useful?
7.2.2. Minimum Variance Estimates (4/4)

[Figure: probability density functions of $\hat{\theta}_1$ and $\hat{\theta}_2$, showing the bias of each relative to $\theta$]
Example: two independent measurements

$X_A \sim N(C, 2.97)$ and $X_B \sim N(C, 1.62)$

Point estimates of the unknown $C$: $\hat{C}_A = X_A$ and $\hat{C}_B = X_B$.
They are both unbiased estimates since $E[\hat{C}_A] = C$ and $E[\hat{C}_B] = C$.

The relative efficiency of $\hat{C}_A$ to $\hat{C}_B$ is
$\mathrm{Var}(\hat{C}_B)/\mathrm{Var}(\hat{C}_A) = 1.62/2.97 = 0.55.$

Let us consider a new estimate
$\hat{C} = p\hat{C}_A + (1-p)\hat{C}_B.$
Then this estimate is unbiased since
$E[\hat{C}] = pE[\hat{C}_A] + (1-p)E[\hat{C}_B] = C.$
What is the optimal value of $p$ that results in $\hat{C}$ having the smallest possible mean squared error (MSE)?
Let the variance of $\hat{C}$ be given by
$\mathrm{Var}(\hat{C}) = p^2\,\mathrm{Var}(\hat{C}_A) + (1-p)^2\,\mathrm{Var}(\hat{C}_B) = p^2\sigma_1^2 + (1-p)^2\sigma_2^2.$
Differentiating with respect to $p$ yields
$\dfrac{d}{dp}\mathrm{Var}(\hat{C}) = 2p\sigma_1^2 - 2(1-p)\sigma_2^2.$
The value of $p$ that minimizes $\mathrm{Var}(\hat{C})$:
$\dfrac{d}{dp}\mathrm{Var}(\hat{C})\Big|_{p=p^*} = 0 \;\Rightarrow\; p^* = \dfrac{1/\sigma_1^2}{1/\sigma_1^2 + 1/\sigma_2^2}.$

Therefore, in this example,
$p^* = \dfrac{1/2.97}{1/2.97 + 1/1.62} = 0.35.$
The variance of $\hat{C}$ is then
$\mathrm{Var}(\hat{C}) = \dfrac{1}{1/2.97 + 1/1.62} = 1.05.$
The relative efficiency of $\hat{C}_B$ to $\hat{C}$ is
$\dfrac{\mathrm{Var}(\hat{C})}{\mathrm{Var}(\hat{C}_B)} = \dfrac{1.05}{1.62} = 0.65.$

In general, assuming that we have $n$ independent and unbiased estimates $\hat{\theta}_i$, $i = 1, \ldots, n$, having variances $\sigma_i^2$, $i = 1, \ldots, n$, respectively, for a parameter $\theta$, we can set the unbiased estimator
$\hat{\theta} = \dfrac{\sum_{i=1}^{n} \hat{\theta}_i/\sigma_i^2}{\sum_{i=1}^{n} 1/\sigma_i^2}.$
The variance of this estimator is
$\mathrm{Var}(\hat{\theta}) = \dfrac{1}{\sum_{i=1}^{n} 1/\sigma_i^2}.$
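The two-measurement arithmetic can be reproduced in a few lines (a sketch using the variances 2.97 and 1.62 from the example):

```python
var_a, var_b = 2.97, 1.62  # Var(C_A) and Var(C_B) from the example

# Optimal weight on C_A, and the variance of the combined estimate
p_star = (1 / var_a) / (1 / var_a + 1 / var_b)
var_c = 1 / (1 / var_a + 1 / var_b)

print(round(p_star, 2))         # 0.35
print(round(var_c, 2))          # 1.05
print(round(var_c / var_b, 2))  # 0.65, the relative efficiency of C_B to C
```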
Mean squared error (MSE):
Let us consider a point estimate $\hat{\theta}$. Then the mean squared error is defined by
$\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2].$
Moreover, notice that
$\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$
$= E[(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta)^2]$
$= E[(\hat{\theta} - E[\hat{\theta}])^2] + 2E[\hat{\theta} - E[\hat{\theta}]](E[\hat{\theta}] - \theta) + (E[\hat{\theta}] - \theta)^2$
$= E[(\hat{\theta} - E[\hat{\theta}])^2] + (E[\hat{\theta}] - \theta)^2$
$= \mathrm{Var}(\hat{\theta}) + \text{bias}^2,$
since $E[\hat{\theta} - E[\hat{\theta}]] = 0.$
7.3 Sampling Distributions

7.3.1 Sample Proportion (1/2)

• If $X \sim B(n, p)$, then the sample proportion $\hat{p} = \dfrac{X}{n}$ has the approximate distribution
$\hat{p} \sim N\!\left(p, \dfrac{p(1-p)}{n}\right),$
since $E(\hat{p}) = p$ and
$\mathrm{Var}(X) = np(1-p) \;\Rightarrow\; \mathrm{Var}(\hat{p}) = \mathrm{Var}\!\left(\dfrac{X}{n}\right) = \dfrac{p(1-p)}{n}.$
7.3.1 Sample Proportion (2/2)

• Standard error of the sample proportion

The standard error of the sample proportion is defined as
$\text{s.e.}(\hat{p}) = \sqrt{\dfrac{p(1-p)}{n}},$
but since $p$ is usually unknown, we replace $p$ by the observed value $\hat{p} = x/n$ to get
$\text{s.e.}(\hat{p}) = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} = \dfrac{1}{n}\sqrt{\dfrac{x(n-x)}{n}}.$
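A small sketch of the two equivalent forms of this standard error (the counts n = 50, x = 12 are hypothetical):

```python
import math

n, x = 50, 12  # hypothetical: 12 successes in 50 trials
p_hat = x / n

se = math.sqrt(p_hat * (1 - p_hat) / n)
se_alt = math.sqrt(x * (n - x) / n) / n  # the (1/n) * sqrt(x(n-x)/n) form

print(p_hat, se)  # the two forms agree
```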
7.3.2 Sample Mean (1/3)

• Distribution of the Sample Mean

If $X_1, \ldots, X_n$ are observations from a population with mean $\mu$ and variance $\sigma^2$, then the Central Limit Theorem says
$\hat{\mu} = \bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$ (approximately).
7.3.2 Sample Mean (2/3)

• Standard error of the sample mean

The standard error of the sample mean is defined as
$\text{s.e.}(\bar{X}) = \dfrac{\sigma}{\sqrt{n}},$
but since $\sigma$ is usually unknown, we may safely replace it, when $n$ is large, by the observed value $s$:
$\text{s.e.}(\bar{X}) = \dfrac{s}{\sqrt{n}}.$
7.3.2 Sample Mean (3/3)

If $n = 20$, the probability that $\hat{\mu} = \bar{X}$ lies within $\sigma/4$ of $\mu$ is
$P\!\left(\mu - \dfrac{\sigma}{4} \le \bar{X} \le \mu + \dfrac{\sigma}{4}\right) = P\!\left(\mu - \dfrac{\sigma}{4} \le N\!\left(\mu, \dfrac{\sigma^2}{20}\right) \le \mu + \dfrac{\sigma}{4}\right)$
$= P\!\left(-\dfrac{\sqrt{20}}{4} \le N(0,1) \le \dfrac{\sqrt{20}}{4}\right) = \Phi(1.12) - \Phi(-1.12) = 0.7372,$
i.e. about 74%.

[Figure: probability density function of $\hat{\mu} = \bar{X}$ when $n = 20$, with the interval $\mu \pm \sigma/4$ marked]
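This probability can be recomputed without rounding z to 1.12, using only the standard library (the normal CDF written via the error function):

```python
import math

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n = 20
z = math.sqrt(n) / 4  # sqrt(20)/4, about 1.118
prob = phi(z) - phi(-z)

print(prob)  # about 0.736; rounding z to 1.12 gives the 0.7372 in the notes
```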
7.3.3 Sample Variance (1/2)

• Distribution of the Sample Variance

If $X_1, \ldots, X_n$ are normally distributed with mean $\mu$ and variance $\sigma^2$, then the sample variance $S^2$ has the distribution
$S^2 \sim \dfrac{\sigma^2\,\chi^2_{n-1}}{n-1}.$
Theorem: if $X_i$, $i = 1, \ldots, n$, is a sample from a normal population having mean $\mu$ and variance $\sigma^2$, then $\bar{X}$ and $S^2$ are independent random variables, with $\bar{X}$ being normal with mean $\mu$ and variance $\sigma^2/n$, and $(n-1)S^2/\sigma^2$ being chi-square with $n-1$ degrees of freedom.

(proof)
Let $Y_i = X_i - \mu$. Then
$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2,$
or equivalently,
$\sum_{i=1}^{n}(X_i - \bar{X})^2 = \sum_{i=1}^{n}(X_i - \mu)^2 - n(\bar{X} - \mu)^2.$
Dividing this equation by $\sigma^2$, we get
$\sum_{i=1}^{n}\left(\dfrac{X_i - \mu}{\sigma}\right)^2 = \sum_{i=1}^{n}\left(\dfrac{X_i - \bar{X}}{\sigma}\right)^2 + \left(\dfrac{\sqrt{n}(\bar{X} - \mu)}{\sigma}\right)^2.$

Cf. Let $X$ and $Y$ be independent chi-square random variables with $m$ and $n$ degrees of freedom respectively. Then $Z = X + Y$ is a chi-square random variable with $m + n$ degrees of freedom.
In the previous equation,
$\left(\dfrac{\sqrt{n}(\bar{X} - \mu)}{\sigma}\right)^2 \sim \chi^2_1 \quad \text{and} \quad \sum_{i=1}^{n}\left(\dfrac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_n.$
Therefore,
$\sum_{i=1}^{n}\left(\dfrac{X_i - \bar{X}}{\sigma}\right)^2 = \dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.$
7.3.3 Sample Variance (2/2)

• t-statistic

$\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right) \;\Rightarrow\; \dfrac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1),$
and also $\dfrac{S^2}{\sigma^2} \sim \dfrac{\chi^2_{n-1}}{n-1}$, so that
$\dfrac{\sqrt{n}(\bar{X} - \mu)}{S} = \dfrac{\sqrt{n}(\bar{X} - \mu)/\sigma}{\sqrt{\chi^2_{n-1}/(n-1)}} \sim t_{n-1}.$
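A simulation sketch (parameters arbitrary) showing that the studentized statistic has heavier tails than N(0, 1), consistent with a t distribution on n − 1 degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu, sigma = 5, 10.0, 2.0
samples = rng.normal(mu, sigma, size=(100_000, n))

xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)

t = np.sqrt(n) * (xbar - mu) / s      # follows t with n - 1 = 4 d.f.
z = np.sqrt(n) * (xbar - mu) / sigma  # exactly N(0, 1)

# Tail probabilities beyond +/- 2: the t statistic exceeds more often
t_tail = (np.abs(t) > 2).mean()
z_tail = (np.abs(z) > 2).mean()
print(t_tail, z_tail)
```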
7.4 Constructing Parameter Estimates

7.4.1 The Method of Moments (1/3)

• Method of moments point estimate for One
Parameter

If a data set of observations $x_1, \ldots, x_n$ comes from a probability distribution that depends upon one unknown parameter $\theta$, the method of moments point estimate $\hat{\theta}$ of the parameter is found by solving the equation $\bar{x} = E(X)$.
7.4.1 The Method of Moments (2/3)

• Method of moments point estimates for Two Parameters

For two unknown parameters $\theta_1$ and $\theta_2$, the method of moments point estimates are found by solving the equations $\bar{x} = E(X)$ and $s^2 = \mathrm{Var}(X)$.
7.4.1 The Method of Moments (3/3)

• Examples
– For normally distributed data, since $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$ for $N(\mu, \sigma^2)$, the method of moments gives
$\hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = s^2.$

– Suppose that the data observations 2.0, 2.4, 3.1, 3.9, 4.5, 4.8, 5.7, 9.9 are obtained from a $U(0, \theta)$ distribution. Then
$\bar{x} = 4.5375$ and $E(X) = \dfrac{\theta}{2} \;\Rightarrow\; \hat{\theta} = 2 \times 4.5375 = 9.075.$

– What if the distribution is exponential with parameter $\lambda$?
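The uniform example can be checked directly; note that the resulting estimate 9.075 is smaller than the largest observation 9.9, which illustrates a drawback of the method of moments for $U(0, \theta)$:

```python
import numpy as np

# Data from the U(0, theta) example
data = np.array([2.0, 2.4, 3.1, 3.9, 4.5, 4.8, 5.7, 9.9])

xbar = data.mean()    # 4.5375
theta_hat = 2 * xbar  # from E(X) = theta / 2

print(theta_hat)      # 9.075, which is below max(data) = 9.9
```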
7.4.2 Maximum Likelihood Estimates (1/4)

• Maximum Likelihood Estimate for One Parameter

If a data set consists of observations $x_1, \ldots, x_n$ from a probability distribution $f(x; \theta)$ depending upon one unknown parameter $\theta$, the maximum likelihood estimate $\hat{\theta}$ of the parameter is found by maximizing the likelihood function
$L(x_1, \ldots, x_n; \theta) = f(x_1; \theta) \cdots f(x_n; \theta).$
7.4.2 Maximum Likelihood Estimates (2/4)
• Example
– If $x_1, \ldots, x_n$ are a set of Bernoulli observations, with $f(1; p) = p$ and $f(0; p) = 1 - p$, i.e. $f(x_i; p) = p^{x_i}(1-p)^{1-x_i}$.

– The likelihood function is
$L(p; x_1, \ldots, x_n) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = p^{x}(1-p)^{n-x},$
where $x = x_1 + \cdots + x_n$, and the m.l.e. $\hat{p}$ is the value that maximizes this.

– The log-likelihood is $\ln(L) = x\ln(p) + (n-x)\ln(1-p)$, and
$\dfrac{d\ln(L)}{dp} = \dfrac{x}{p} - \dfrac{n-x}{1-p} = 0 \;\Rightarrow\; \hat{p} = \dfrac{x}{n}.$
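A grid-search sketch confirming that the Bernoulli log-likelihood peaks at x/n (the 0/1 data are made up):

```python
import numpy as np

obs = np.array([1, 0, 1, 1, 0, 1, 0, 1])  # hypothetical Bernoulli data
n, x = len(obs), obs.sum()

p_grid = np.linspace(0.001, 0.999, 999)
log_lik = x * np.log(p_grid) + (n - x) * np.log(1 - p_grid)

p_mle = p_grid[np.argmax(log_lik)]
print(p_mle, x / n)  # the grid maximizer sits at x / n = 0.625
```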
7.4.2 Maximum Likelihood Estimates (3/4)

•    Maximum Likelihood Estimate for Two Parameters

For two unknown parameters $\theta_1$ and $\theta_2$, the maximum likelihood estimates $\hat{\theta}_1$ and $\hat{\theta}_2$ are found by maximizing the likelihood function
$L(\theta_1, \theta_2; x_1, \ldots, x_n) = f(x_1; \theta_1, \theta_2) \cdots f(x_n; \theta_1, \theta_2).$
7.4.2 Maximum Likelihood Estimates (4/4)
• Example

The normal distribution has
$f(x; \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(x-\mu)^2/2\sigma^2},$
so the likelihood is
$L(x_1, \ldots, x_n; \mu, \sigma^2) = \prod_{i=1}^{n} f(x_i; \mu, \sigma^2) = \left(\dfrac{1}{2\pi\sigma^2}\right)^{n/2} \exp\!\left(-\sum_{i=1}^{n}(x_i - \mu)^2/2\sigma^2\right),$
so that the log-likelihood is
$\ln(L) = -\dfrac{n}{2}\ln(2\pi\sigma^2) - \dfrac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma^2}.$
The partial derivatives are
$\dfrac{\partial\ln(L)}{\partial\mu} = \dfrac{\sum_{i=1}^{n}(x_i - \mu)}{\sigma^2}, \qquad \dfrac{\partial\ln(L)}{\partial(\sigma^2)} = -\dfrac{n}{2\sigma^2} + \dfrac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma^4}.$
Setting the two equations to zero,
$\hat{\mu} = \bar{x}, \qquad \hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n}(x_i - \hat{\mu})^2}{n} = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}.$
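A numerical check of the closed-form normal MLEs (simulated data; the true values μ = 5, σ = 2 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(5.0, 2.0, size=1_000)

mu_hat = data.mean()                        # MLE of mu
sigma2_hat = ((data - mu_hat) ** 2).mean()  # MLE of sigma^2: divisor n, not n - 1

print(mu_hat, sigma2_hat)  # close to 5 and 4
```

Note that the MLE of σ² uses the divisor n, so it is slightly smaller than the unbiased sample variance with divisor n − 1.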
7.4.3 Examples (1/6)

• Glass Sheet Flaws

At a glass manufacturing company, 30 randomly selected sheets of glass are inspected. If the distribution of the number of flaws per sheet is taken to have a Poisson distribution, how should the parameter $\lambda$ be estimated?

– The method of moments:
$E(X) = \lambda \;\Rightarrow\; \hat{\lambda} = \bar{x}$
7.4.3 Examples (2/6)

• The maximum likelihood estimate:
$f(x_i; \lambda) = \dfrac{e^{-\lambda}\lambda^{x_i}}{x_i!},$ so that the likelihood is
$L(x_1, \ldots, x_n; \lambda) = \prod_{i=1}^{n} f(x_i; \lambda) = \dfrac{e^{-n\lambda}\,\lambda^{x_1 + \cdots + x_n}}{x_1!\cdots x_n!}.$
The log-likelihood is therefore
$\ln(L) = -n\lambda + (x_1 + \cdots + x_n)\ln(\lambda) - \ln(x_1!\cdots x_n!),$
so that
$\dfrac{d\ln(L)}{d\lambda} = -n + \dfrac{x_1 + \cdots + x_n}{\lambda} = 0 \;\Rightarrow\; \hat{\lambda} = \bar{x}.$
7.4.3 Examples (3/6)

• Example 26: Fish Tagging and Recapture

Suppose a fisherman wants to estimate the fish stock $N$ of a lake, and that 34 fish have been tagged and released back into the lake. If, over a period of time, the fisherman catches 50 fish and 9 of them are tagged, then an intuitive estimate of the total number of fish is
$\hat{N} = \dfrac{34 \times 50}{9} \approx 189.$
This assumes that the proportion of tagged fish in the lake is roughly equal to the proportion of the fisherman's catch that is tagged.
7.4.3 Examples (4/6)

Under the assumption that all the fish are equally likely to be caught, the distribution of the number of tagged fish $X$ in the fisherman's catch of 50 fish is a hypergeometric distribution with $r = 34$, $n = 50$, and $N$ unknown:
$P(X = x \mid N, n, r) = \dfrac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}, \qquad E(X) = \dfrac{nr}{N} = \dfrac{50 \times 34}{N}.$
Since there is only one observation $x$, we have $\bar{x} = x$, and equating $E(X) = \bar{x}$ gives
$\hat{N} = \dfrac{50 \times 34}{9} = 188.89.$
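The hypergeometric likelihood can also be maximized over N numerically (a sketch; the search cap of 1000 is an arbitrary choice), and the maximizer lands right next to the moment estimate 188.89:

```python
from math import comb

r, n, x = 34, 50, 9  # tagged fish, catch size, tagged fish caught

def likelihood(N):
    """P(X = x | N, n, r) for the hypergeometric distribution."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

# N must be at least r + (n - x) for the observed catch to be possible
N_mle = max(range(r + n - x, 1000), key=likelihood)
print(N_mle)  # 188
```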
7.4.3 Examples (5/6)
Under a binomial approximation, the success probability $p = \dfrac{r}{N}$ is estimated to be
$\hat{p} = \dfrac{x}{n} = \dfrac{9}{50}$, hence $\hat{N} = \dfrac{r}{\hat{p}} = \dfrac{50 \times 34}{9}.$

• Example 36: Bee Colonies
The data on the proportion of worker bees that leave a colony with a queen bee:
0.28 0.32 0.09 0.35 0.45 0.41 0.06 0.16 0.16 0.46 0.35 0.29 0.31
If the entomologist wishes to model this proportion with a beta distribution, how should the parameters be estimated?
7.4.3 Examples (6/6)
The beta distribution has
$f(x \mid a, b) = \dfrac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,x^{a-1}(1-x)^{b-1}, \quad 0 \le x \le 1,\; a > 0,\; b > 0,$
with
$E(X) = \dfrac{a}{a+b}, \qquad \mathrm{Var}(X) = \dfrac{ab}{(a+b)^2(a+b+1)}.$
Here $\bar{x} = 0.3007$ and $s^2 = 0.01966$, so the point estimates $\hat{a}$ and $\hat{b}$ are the solutions to the equations
$\dfrac{a}{a+b} = 0.3007 \quad \text{and} \quad \dfrac{ab}{(a+b)^2(a+b+1)} = 0.01966,$
which are $\hat{a} = 2.92$ and $\hat{b} = 6.78.$
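These two moment equations have a closed-form solution: writing $m = E(X)$, the variance simplifies to $m(1-m)/(a+b+1)$, which gives $a+b$ and then $a$ and $b$ (a sketch using the sample values from the notes):

```python
m, v = 0.3007, 0.01966  # sample mean and variance of the bee data

a_plus_b = m * (1 - m) / v - 1  # since Var(X) = m(1 - m)/(a + b + 1)
a_hat = m * a_plus_b
b_hat = (1 - m) * a_plus_b

print(round(a_hat, 2), round(b_hat, 2))  # 2.92 6.78
```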
MLE for $U(0, \theta)$

•    For some distributions, the MLE cannot be found by differentiation; you have to look at the curve of the likelihood function itself.

•    The MLE of $\theta$ is $\max\{X_1, \ldots, X_n\}.$