Point Estimation and Confidence Intervals

     Estimation of Population Parameters

• Statistical inference refers to making inferences about a
  population parameter through the use of sample information

• The sample statistics summarize sample information and can be
  used to make inferences about the population parameters

• Two approaches to estimate population parameters
   – Point estimation: Obtain a value estimate for the population
     parameter
   – Interval estimation: Construct an interval within which the
     population parameter will lie with a certain probability
                     Point Estimation

• In attempting to obtain point estimates of population parameters,
  the following questions arise
    – What is a point estimate of the population mean?
    – How good an estimate do we obtain through the methodology
      that we follow?


• Example: What is a point estimate of the average yield on ten-
  year Treasury bonds?

• To answer this question, we use a formula that takes sample
  information and produces a number

• A formula that uses sample information to produce an estimate
  of a population parameter is called an estimator

• A specific value of an estimator obtained from information of a
  specific sample is called an estimate

• Example: We said that the sample mean is a good estimator of
  the population mean
    – The sample mean is an estimator
    – A particular value of the sample mean is an estimate
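
The distinction between an estimator and an estimate can be sketched in a few lines of Python. This is only an illustration: the population (normal yields with mean 4 and standard deviation 1), the sample size and the function name `sample_mean` are assumptions made for the example, not values from the slides.

```python
import random


def sample_mean(sample):
    """Estimator: a rule that turns any sample into a single number."""
    return sum(sample) / len(sample)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of bond yields (in percent): normal, mean 4, sd 1.
POPULATION_MEAN, POPULATION_SD, SAMPLE_SIZE = 4.0, 1.0, 30

estimates = []
for _ in range(5):
    sample = [rng.gauss(POPULATION_MEAN, POPULATION_SD) for _ in range(SAMPLE_SIZE)]
    estimates.append(sample_mean(sample))  # estimate: the value for this one sample

for i, estimate in enumerate(estimates, start=1):
    print(f"sample {i}: estimate of the mean = {estimate:.3f}")
```

The same estimator produces a different estimate for each sample, which is why the estimator itself is treated as a random variable.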

• Note: An estimator is a random variable that takes many
  possible values (estimates)

• Question: Is there a unique estimator for a population
  parameter? For example, is there only one estimator for the
  population mean?

• The answer is that there may be many possible estimators

• Those estimators must be ranked in terms of some desirable
  properties that they should exhibit
           Properties of Point Estimators

• The choice of point estimator is based on the following criteria

    – Unbiasedness
    – Efficiency
    – Consistency


• A point estimator $\hat{\theta}$ is said to be an unbiased estimator of the
  population parameter $\theta$ if its expected value (the mean of its
  sampling distribution) is equal to the population parameter it is
  trying to estimate

           $$E(\hat{\theta}) = \theta$$

• Interesting Results on Unbiased Estimators

    – The sample mean, the sample variance (computed with divisor n – 1)
      and the sample proportion are unbiased estimators of the
      corresponding population parameters
    – Generally speaking, the sample standard deviation is not an
      unbiased estimator of the population standard deviation


• We can also define the bias of an estimator as follows

           $$\operatorname{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$$
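
A small simulation can make the bias definition concrete. The sketch below assumes a normal population with a hypothetical standard deviation of 2 and samples of size 5, and approximates expected values by averaging over many simulated samples. It compares the sample variance (divisor n – 1) with the sample standard deviation.

```python
import math
import random
import statistics

rng = random.Random(1)
SIGMA = 2.0            # hypothetical population standard deviation
N, TRIALS = 5, 20000   # small samples make the bias of s easy to see

variances, std_devs = [], []
for _ in range(TRIALS):
    sample = [rng.gauss(0.0, SIGMA) for _ in range(N)]
    s2 = statistics.variance(sample)   # sample variance, divisor n - 1
    variances.append(s2)
    std_devs.append(math.sqrt(s2))     # sample standard deviation

# Bias = (simulated) expected value of the estimator minus the true parameter
bias_variance = statistics.fmean(variances) - SIGMA ** 2
bias_std_dev = statistics.fmean(std_devs) - SIGMA

print(f"bias of sample variance:           {bias_variance:+.4f}")
print(f"bias of sample standard deviation: {bias_std_dev:+.4f}")
```

Under these assumptions the simulated bias of the sample variance should be close to zero, while the sample standard deviation tends to underestimate the population standard deviation.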

• It is usually the case that, if there is an unbiased estimator of a
  population parameter, there exist several others, as well

• To select the “best unbiased” estimator, we use the criterion of
  efficiency

• An unbiased estimator is efficient if no other unbiased estimator
  of the particular population parameter has a lower sampling
  distribution variance

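• If $\hat{\theta}_1$ and $\hat{\theta}_2$ are two unbiased estimators of the population
  parameter $\theta$, then $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if

           $$\operatorname{Var}(\hat{\theta}_1) < \operatorname{Var}(\hat{\theta}_2)$$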

• The unbiased estimator of a population parameter with the
  lowest variance out of all unbiased estimators is called the most
  efficient or minimum variance unbiased estimator
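
As an illustration of efficiency, the sketch below compares two estimators of the mean of a normal population, the sample mean and the sample median. Both are unbiased in this setting because the normal distribution is symmetric. The population (standard normal), the sample size of 25 and the number of simulated samples are assumptions chosen for the example.

```python
import random
import statistics

rng = random.Random(2)
N, TRIALS = 25, 5000   # hypothetical sample size and number of simulated samples

means, medians = [], []
for _ in range(TRIALS):
    sample = [rng.gauss(0.0, 1.0) for _ in range(N)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Sampling-distribution variance of each estimator, approximated by simulation
var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)

print(f"variance of sample mean:   {var_mean:.4f}")
print(f"variance of sample median: {var_median:.4f}")
print("more efficient:", "mean" if var_mean < var_median else "median")
```

The theoretical variance of the sample mean here is 1/25 = 0.04, and the simulated variance of the median should come out larger, so the sample mean is the more efficient of the two.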

• In some cases, we are interested in properties that an estimator
  exhibits in large samples, even though it may lack them in small
  samples

• We say that an estimator is consistent if the probability of
  obtaining estimates close to the population parameter increases
  as the sample size increases
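
Consistency can be illustrated by estimating, for several sample sizes, the probability that the sample mean falls within a fixed distance of the population mean. The population parameters, the tolerance `EPSILON` and the sample sizes below are hypothetical choices for this sketch.

```python
import random
import statistics

rng = random.Random(3)
MU, SIGMA = 10.0, 4.0   # hypothetical population mean and standard deviation
EPSILON = 0.5           # "close" means within 0.5 of the population mean
TRIALS = 1000


def prob_close(n):
    """Simulated probability that the sample mean lies within EPSILON of MU."""
    hits = 0
    for _ in range(TRIALS):
        xbar = statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
        if abs(xbar - MU) < EPSILON:
            hits += 1
    return hits / TRIALS


probs = {n: prob_close(n) for n in (10, 40, 160, 640)}
for n, p in probs.items():
    print(f"n = {n:4d}: P(|sample mean - mu| < {EPSILON}) is about {p:.3f}")
```

The simulated probability rises toward 1 as the sample size grows, which is the behavior the definition of consistency describes.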

• The problem of selecting the most appropriate estimator for a
  population parameter is quite complicated

• On some occasions, we may prefer to accept some bias in the
  estimator in exchange for increased efficiency

• One measure of the expected closeness of an estimator to the
  population parameter is its mean squared error

           $$\operatorname{MSE}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right]$$
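
The mean squared error combines variance and bias: MSE equals the variance of the estimator plus the square of its bias. The sketch below checks this numerically and compares two estimators of a population variance, one dividing by n – 1 (unbiased) and one dividing by n (biased). The normal population with variance 1 and the sample size of 10 are assumptions made for the illustration.

```python
import random
import statistics

rng = random.Random(4)
SIGMA2 = 1.0            # hypothetical true population variance
N, TRIALS = 10, 20000


def summarize(estimates, theta):
    """Return (bias, variance, MSE) of simulated estimates of theta."""
    bias = statistics.fmean(estimates) - theta
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(e - theta) ** 2 for e in estimates])
    return bias, variance, mse


unbiased, biased = [], []
for _ in range(TRIALS):
    sample = [rng.gauss(0.0, SIGMA2 ** 0.5) for _ in range(N)]
    xbar = sum(sample) / N
    sum_sq = sum((x - xbar) ** 2 for x in sample)
    unbiased.append(sum_sq / (N - 1))   # divisor n - 1
    biased.append(sum_sq / N)           # divisor n

bias_u, var_u, mse_u = summarize(unbiased, SIGMA2)
bias_b, var_b, mse_b = summarize(biased, SIGMA2)

print(f"divisor n-1: bias {bias_u:+.4f}, variance {var_u:.4f}, MSE {mse_u:.4f}")
print(f"divisor n:   bias {bias_b:+.4f}, variance {var_b:.4f}, MSE {mse_b:.4f}")
```

In this setting the biased estimator typically shows the smaller mean squared error, an example of trading some bias for lower variance.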
                    Interval Estimation

• Point estimates of population parameters are prone to sampling
  error and are not likely to equal the population parameter in any
  given sample

• Moreover, it is often the case that we are interested not in a
  point estimate, but in a range within which the population
  parameter will lie

• An interval estimator for a population parameter is a formula that
  determines, based on sample information, a range within which
  the population parameter lies with certain probability

• The range computed from a particular sample is called an interval
  estimate, or confidence interval

• The probability that the population parameter will lie within a
  confidence interval is called the level of confidence and is given
  by $1 - \alpha$

• Two interpretations of confidence intervals

    – Probabilistic interpretation
    – Practical interpretation

• In the probabilistic interpretation, we say that

    – A 95% confidence interval for a population parameter means that,
      in repeated sampling, 95% of such confidence intervals will include
      the population parameter


• In the practical interpretation, we say that

    – We are 95% confident that the 95% confidence interval will include
      the population parameter
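
The repeated-sampling interpretation can be checked by simulation. The sketch below assumes a normal population with known standard deviation and builds, for each simulated sample, the interval "sample mean ± 1.96 standard errors". The population values and the sample size are hypothetical.

```python
import random
from statistics import NormalDist, fmean

rng = random.Random(5)
MU, SIGMA, N = 0.04, 0.06, 100    # hypothetical population and sample size
TRIALS = 5000

z = NormalDist().inv_cdf(0.975)   # reliability factor for 95% confidence
half_width = z * SIGMA / N ** 0.5

covered = 0
for _ in range(TRIALS):
    xbar = fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
    # Does this sample's interval contain the true population mean?
    if xbar - half_width < MU < xbar + half_width:
        covered += 1

coverage = covered / TRIALS
print(f"share of intervals containing the population mean: {coverage:.3f}")
```

The share of intervals that contain the population mean should be close to 0.95, although any single interval either contains it or does not.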
        Constructing Confidence Intervals

• Confidence intervals have similar structures

        Point Estimate ± Reliability Factor × Standard Error

    – The reliability factor is a number based on the assumed distribution of
      the point estimate and the level of confidence
    – The standard error is the standard deviation (known or estimated) of
      the sample statistic that provides the point estimate
  Confidence Interval for Mean of a Normal
     Distribution with Known Variance

• Suppose we take a random sample from a normal distribution
  with unknown mean, but known variance

• We are interested in obtaining a confidence interval that will
  contain the population mean 90% of the time

• The sample mean will follow a normal distribution and the
  corresponding standardized variable will follow a standard
  normal distribution

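• If $\bar{X}$ is the sample mean, then we are interested in the
  confidence interval such that the following probability is 0.90

  $$\begin{aligned}
  0.90 &= P(-1.645 < Z < 1.645) \\
       &= P\left(-1.645 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.645\right) \\
       &= P\left(-\frac{1.645\,\sigma}{\sqrt{n}} < \bar{X} - \mu < \frac{1.645\,\sigma}{\sqrt{n}}\right) \\
       &= P\left(\bar{X} - \frac{1.645\,\sigma}{\sqrt{n}} < \mu < \bar{X} + \frac{1.645\,\sigma}{\sqrt{n}}\right)
  \end{aligned}$$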

• Following the above expression for the structure of a confidence
  interval, we rewrite the confidence interval as follows

           $$\bar{X} \pm 1.645\,\frac{\sigma}{\sqrt{n}}$$

• Note that, from the standard normal cumulative distribution function
  (with 1.645 rounded to 1.65),

           $$P(Z < 1.65) = F_Z(1.65) \approx 0.95$$
           $$P(Z < -1.65) = F_Z(-1.65) \approx 0.05$$

• Given this result and that the level of confidence for this interval
  ($1 - \alpha$) is 0.90, we conclude that

    – The area under the standard normal to the left of –1.65 is 0.05
    – The area under the standard normal to the right of 1.65 is 0.05


• Thus, the two reliability factors represent the cutoffs $-z_{\alpha/2}$ and
  $z_{\alpha/2}$ for the standard normal

• In general, a $100(1-\alpha)\%$ confidence interval for the population
  mean $\mu$ when we draw samples from a normal distribution with
  known variance $\sigma^2$ is given by

           $$\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

  where $z_{\alpha/2}$ is the number for which

           $$P(Z > z_{\alpha/2}) = \frac{\alpha}{2}$$

• Note: We typically use the following reliability factors when
  constructing confidence intervals based on the standard normal
  distribution

    – 90% interval: $z_{0.05}$ = 1.645 (often rounded to 1.65)
    – 95% interval: $z_{0.025}$ = 1.96
    – 99% interval: $z_{0.005}$ = 2.58


• Implication: As the degree of confidence increases the interval
  becomes wider
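
The reliability factors listed above can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library. The helper name `z_reliability_factor` is introduced only for this example.

```python
from statistics import NormalDist


def z_reliability_factor(confidence):
    """Cutoff z with upper-tail area alpha/2, where alpha = 1 - confidence."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


factors = {c: z_reliability_factor(c) for c in (0.90, 0.95, 0.99)}
for confidence, z in factors.items():
    print(f"{confidence:.0%} interval: z = {z:.3f}")
```

Because the reliability factor grows with the level of confidence while the standard error stays fixed, the interval widens as confidence increases.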

• Example: Suppose we draw a sample of 100 observations of
  returns on the Nikkei index, assumed to be normally distributed,
  with sample mean 4% and known population standard deviation 6%

• What is the 95% confidence interval for the population mean?

• The standard error is $0.06/\sqrt{100} = 0.006$

• The confidence interval is $0.04 \pm 1.96(0.006)$, or approximately
  2.82% to 5.18%
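
The Nikkei example can be reproduced with a short function. This is a minimal sketch that treats the 6% standard deviation as known. The function name `z_interval` is an assumption made for the illustration.

```python
from statistics import NormalDist


def z_interval(mean, sd, n, confidence):
    """Confidence interval for a mean: point estimate ± z × standard error."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    standard_error = sd / n ** 0.5
    return mean - z * standard_error, mean + z * standard_error


lower, upper = z_interval(mean=0.04, sd=0.06, n=100, confidence=0.95)
print(f"95% confidence interval: {lower:.4f} to {upper:.4f}")
```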
  Confidence Interval for Mean of a Normal
    Distribution with Unknown Variance

• In a more typical scenario, the population variance is unknown

• Note that, if the sample size is large, the previous results can be
  modified as follows
    – The population distribution need not be normal
    – The population variance need not be known
    – The sample standard deviation will be a sufficiently good estimator
      of the population standard deviation


• Thus, the confidence interval for the population mean derived
  above can be used by substituting s for $\sigma$

• However, if the sample size is small and the population variance
  is unknown, we cannot use the standard normal distribution

• If we replace the unknown $\sigma$ with the sample standard deviation s, the
  following quantity

           $$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

  follows Student’s t distribution with (n – 1) degrees of freedom

• The t-distribution has mean 0 and (n – 1) degrees of freedom

• As degrees of freedom increase, the t-distribution approaches
  the standard normal distribution

• Also, t-distributions have fatter tails than the standard normal, but
  as degrees of freedom increase (df = 8 or more) the tails become less
  fat and resemble those of a normal distribution
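
The fatter tails of the t-distribution can be seen by simulation. The sketch below draws small samples from a standard normal population, which is an assumption for the example. It computes the t statistic for each sample and records how often its absolute value exceeds the normal cutoff of 1.96. Under the standard normal this would happen about 5% of the time.

```python
import math
import random

rng = random.Random(6)
TRIALS = 5000


def tail_fraction(n, cutoff=1.96):
    """Share of simulated t statistics with |t| > cutoff for sample size n."""
    exceed = 0
    for _ in range(TRIALS):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        t_stat = mean / (s / math.sqrt(n))   # the population mean is 0 here
        if abs(t_stat) > cutoff:
            exceed += 1
    return exceed / TRIALS


fractions = {n: tail_fraction(n) for n in (3, 9, 30, 100)}
for n, share in fractions.items():
    print(f"n = {n:3d} (df = {n - 1:2d}): P(|t| > 1.96) is about {share:.3f}")
```

With very few degrees of freedom the tail share is well above 5%, and it moves toward the normal value as the degrees of freedom increase.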

• In general, a $100(1-\alpha)\%$ confidence interval for the population
  mean $\mu$ when we draw small samples from a normal distribution
  with an unknown variance $\sigma^2$ is given by

           $$\bar{X} \pm t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}}$$

  where $t_{n-1,\alpha/2}$ is the number for which

           $$P(t_{n-1} > t_{n-1,\alpha/2}) = \frac{\alpha}{2}$$

• Example: Suppose we want to estimate a 95% confidence
  interval for the average quarterly returns of all fixed-income
  funds in the US

• We assume that those returns are normally distributed with an
  unknown variance

• We draw a sample of 150 observations and calculate the
  sample mean to be 0.05 and the sample standard deviation to be 0.03

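• To find the confidence interval, we need $t_{n-1,\alpha/2} = t_{149,0.025}$

• For 149 degrees of freedom this is approximately 1.976; t-tables that
  stop short of 149 degrees of freedom often fall back on the limiting
  normal value 1.96, which gives nearly the same interval

• The confidence interval is

           $$0.05 \pm 1.976\,\frac{0.03}{\sqrt{150}}$$

  or approximately 0.0452 to 0.0548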
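
The fixed-income fund example can be reproduced without printed tables. The sketch below is a plain numerical approximation: it integrates the Student's t density with Simpson's rule and finds the cutoff by bisection. The helper names `t_pdf`, `t_cdf`, `t_quantile` and `t_interval` are assumptions made for this illustration; in practice a statistics library would normally supply the quantile.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """P(T <= x) using symmetry and Simpson's rule on [0, x] (steps is even)."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Cutoff t with P(T <= t) = p, for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(mean, s, n, confidence):
    """Return (lower, upper, reliability factor) for the t-based interval."""
    t_crit = t_quantile(1 - (1 - confidence) / 2, n - 1)
    standard_error = s / math.sqrt(n)
    return mean - t_crit * standard_error, mean + t_crit * standard_error, t_crit


lower, upper, t_crit = t_interval(mean=0.05, s=0.03, n=150, confidence=0.95)
print(f"reliability factor: {t_crit:.3f}")
print(f"95% confidence interval: {lower:.4f} to {upper:.4f}")
```

With 149 degrees of freedom the reliability factor is only slightly above the normal value of 1.96, so the two approaches give almost the same interval for a sample this large.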
    Confidence Interval for the Population
      Variance of a Normal Population

• Suppose we have obtained a random sample of n observations
  from a normal population with variance $\sigma^2$ and that the sample
  variance is $s^2$. A $100(1 - \alpha)\%$ confidence interval for the
  population variance is

           $$\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}$$

• The cutoff values $\chi^2_{n-1,\alpha/2}$ and $\chi^2_{n-1,1-\alpha/2}$ of the
  chi-squared distribution are determined as follows

           $$P(\chi^2_{n-1} > \chi^2_{n-1,\alpha/2}) = \frac{\alpha}{2}$$
           $$P(\chi^2_{n-1} < \chi^2_{n-1,1-\alpha/2}) = \frac{\alpha}{2}$$

  where $\chi^2_{n-1}$ follows the chi-squared distribution with (n-1)
  degrees of freedom
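
A confidence interval for the variance can be sketched numerically in the same spirit. The code below approximates chi-squared cutoffs by integrating the density with Simpson's rule and bisecting. It assumes at least 2 degrees of freedom, and all helper names are introduced only for this example. Purely for illustration, it reuses the sample size (150) and sample standard deviation (0.03) from the fixed-income fund example.

```python
import math


def chi2_pdf(x, df):
    """Chi-squared density; this sketch assumes df >= 2."""
    if x <= 0.0:
        return 0.5 if (x == 0.0 and df == 2) else 0.0
    log_pdf = ((df / 2 - 1) * math.log(x) - x / 2
               - (df / 2) * math.log(2) - math.lgamma(df / 2))
    return math.exp(log_pdf)


def chi2_cdf(x, df, steps=1000):
    """P(chi2 <= x) by Simpson's rule on [0, x] (steps is even)."""
    if x <= 0.0:
        return 0.0
    h = x / steps
    total = chi2_pdf(0.0, df) + chi2_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return total * h / 3


def chi2_quantile(p, df):
    """Value q with P(chi2 <= q) = p, found by bisection."""
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < p:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def variance_interval(s2, n, confidence):
    """Confidence interval for the variance of a normal population."""
    alpha = 1 - confidence
    upper_cutoff = chi2_quantile(1 - alpha / 2, n - 1)  # upper-tail area alpha/2
    lower_cutoff = chi2_quantile(alpha / 2, n - 1)      # lower-tail area alpha/2
    return (n - 1) * s2 / upper_cutoff, (n - 1) * s2 / lower_cutoff


lower, upper = variance_interval(s2=0.03 ** 2, n=150, confidence=0.95)
print(f"95% confidence interval for the variance: {lower:.6f} to {upper:.6f}")
```

Note that the interval is not symmetric around the sample variance, because the chi-squared distribution is skewed.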