Ratio and Regression Estimation
Basic ideas:

• The estimated ratio of the variable of interest to a
  related variable (an auxiliary variable) can be used
  to obtain a more precise estimate (i.e., one with smaller
  sampling variance) of the population mean or total
  than can be obtained from the variable of interest alone,
  when the population value of the auxiliary
  variable is known or can be measured easily.
• The ratio estimate is slightly biased, as
  studied before, but the bias is negligible
  for large n.



• The ratio estimation strategy works well
  when the auxiliary variable is highly
  correlated with the variable under
  investigation.
• In ratio estimation, one auxiliary variable is
  used.


• In regression estimation, one or more
  auxiliary variables can be used.
Examples of ratio estimation:
1. In 1786, Laplace proposed a method of population
   estimation (Stigler, History of Statistics, pp.
   163-164). He suggested conducting a census in a
   few carefully selected communities to determine
   the ratio of population to births, and multiplying this
   ratio by the total number of births in France to
   estimate the population total. Vital statistics were
   readily available for most local areas at that time.
   This method was actually used in 1802, instead of
   taking a population census of the entire country.
   (Cf. the first US census, taken in 1790.)
2. To estimate the amount of juice that can be
  produced from a truckload of oranges, a
  sample of oranges can be squeezed to
  determine the ratio of juice to weight. Then
  the estimated ratio is multiplied by the total
  weight of oranges to estimate the total amount
  of juice. It is easier to weigh the oranges in
  the truck than to count them.
3. To estimate the current population of a local
  area, the population can be counted in randomly
  selected census tracts to determine the ratio of
  the current population to the population at the
  previous census. Then the current population of
  the entire area can be estimated by multiplying
  the estimated ratio by the previous census total.
Variance of sample ratio:

• Population ratio is R  X  X and sample ratio is
      x x            Y    Y
   r   . Since both the numerator and
      y y
  denominator of the ratio are random variables, an
  expression for its sampling variance cannot be
  derived easily.
• Using a Taylor series approximation, the
  following expression can be obtained:
  $$\operatorname{Var}(r) \approx \frac{1}{n\bar{Y}^2}\cdot\frac{\sum_{i=1}^{N}(X_i - R Y_i)^2}{N}\cdot\frac{N-n}{N-1}$$

  This is an alternative expression of formula
  (7.1) on page 193 (ignoring the square root).
  This can be verified using the data in the
  illustrative example on that page.
• This can be estimated by
  $$\widehat{\operatorname{Var}}(r) = \frac{1}{n\bar{Y}^2}\cdot\frac{\sum_{i=1}^{n}(x_i - r y_i)^2}{n-1}\cdot\frac{N-n}{N},$$

  where $\bar{Y}$ can be replaced by $\bar{y}$ when it is not
  known. This is an alternative expression of formula
  (7.3) on page 196 (ignoring the square root).
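To make the two formulas concrete, here is a minimal Python sketch assuming simple random sampling without replacement. As an illustration only, it treats stratum 1 of the artificial population used later in this section (x: 1, 2, 3; y: 4, 5, 6) as a tiny population with N = 3 and n = 2, so the Taylor approximation can be compared with the exact variance of r over all three possible samples. The function names `taylor_var_r` and `est_var_r` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def taylor_var_r(X, Y, n):
    """Taylor-series approximation of Var(r), computed from the whole population."""
    N = len(X)
    R = sum(X) / sum(Y)                                   # population ratio
    ss = sum((xi - R * yi) ** 2 for xi, yi in zip(X, Y))
    return (1 / (n * mean(Y) ** 2)) * (ss / N) * ((N - n) / (N - 1))


def est_var_r(xs, ys, N, Ybar=None):
    """Sample estimate of Var(r); falls back to ybar when Ybar is not known."""
    n = len(xs)
    r = sum(xs) / sum(ys)                                 # sample ratio
    if Ybar is None:
        Ybar = mean(ys)
    ss = sum((xi - r * yi) ** 2 for xi, yi in zip(xs, ys))
    return (1 / (n * Ybar ** 2)) * (ss / (n - 1)) * ((N - n) / N)


X, Y = [1, 2, 3], [4, 5, 6]    # stratum 1 of the example, used as a tiny population
n = 2

# Exact variance of r over all C(3, 2) = 3 equally likely samples.
rs = [sum(X[i] for i in s) / sum(Y[i] for i in s) for s in combinations(range(3), n)]
r_mean = mean(rs)
exact = mean((r - r_mean) ** 2 for r in rs)

approx = taylor_var_r(X, Y, n)
estimate = est_var_r([1, 3], [4, 6], N=3, Ybar=mean(Y))   # one sample: x = 1, 3; y = 4, 6
print(f"Taylor {approx:.5f}  exact {exact:.5f}  estimate from one sample {estimate:.5f}")
```

Even with N = 3 the approximation (0.00240) is close to the exact variance (about 0.00246); the single-sample estimate is, of course, itself subject to sampling variation.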
Estimation of the population mean and total:
• The population mean is estimated by $r\bar{Y}$ and
  the population total by $rY$.
• The sampling variance of the estimate of the
  population mean can be estimated by multiplying
  the variance of r (see the above formulas) by
  $\bar{Y}^2$, and that of the estimate of the population
  total by multiplying the variance of r by $Y^2$.
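Continuing the same illustrative setting (stratum 1 treated as a population with N = 3, $\bar{Y}$ = 5, Y = 15, and the sample with x = 1, 3 and y = 4, 6), the sketch below scales r and its estimated variance to obtain the estimates of the mean and total. The function name `ratio_mean_total` is hypothetical, and $\bar{Y}$ is assumed known.

```python
def ratio_mean_total(xs, ys, N, Y_total):
    """Ratio estimates of the mean and total of x, plus their estimated variances.

    Assumes SRS without replacement and a known auxiliary total Y_total.
    """
    n = len(xs)
    Ybar = Y_total / N
    r = sum(xs) / sum(ys)                                  # sample ratio
    ss = sum((xi - r * yi) ** 2 for xi, yi in zip(xs, ys))
    var_r = (1 / (n * Ybar ** 2)) * (ss / (n - 1)) * ((N - n) / N)
    return {
        "mean": r * Ybar, "var_mean": Ybar ** 2 * var_r,           # scale by Ybar^2
        "total": r * Y_total, "var_total": Y_total ** 2 * var_r,   # scale by Y^2
    }


est = ratio_mean_total([1, 3], [4, 6], N=3, Y_total=15)
print(est)
```

When $\bar{Y}$ (rather than $\bar{y}$) is used, $\bar{Y}^2$ cancels in the variance of the mean, which then reduces to $(1-f)\,s_d^2/n$ with $d_i = x_i - r y_i$.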
When to use the ratio estimation:

The ratio estimate will have a smaller sampling variance
than the simple inflation estimate (approximately, for
large n) whenever

  $$\frac{V_y}{V_x} < 2\rho_{xy}, \quad\text{or}\quad \rho_{xy} > \frac{V_y}{2V_x},$$

where $V_x$ and $V_y$ are the coefficients of variation (CV)
of x and y. That is, the correlation between x and y
should be larger than one-half of the ratio of the CV
of y to the CV of x. When the two variables have the
same CV, the correlation must be greater than 0.5.
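A small Python check of this condition, offered as a sketch: for illustration it pools the six units of the stratified example later in this section into one unstratified population (x = number with an MPH degree, y = number of employees). The helper `cv_and_corr` is hypothetical; the CVs use divisor N - 1, which cancels in their ratio anyway.

```python
from math import sqrt
from statistics import mean


def cv_and_corr(x, y):
    """Coefficients of variation of x and y and their correlation (divisor N - 1)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    k = len(x) - 1
    return sqrt(sxx / k) / mx, sqrt(syy / k) / my, sxy / sqrt(sxx * syy)


x = [1, 2, 3, 3, 4, 5]      # variable of interest
y = [4, 5, 6, 14, 16, 18]   # auxiliary variable
Vx, Vy, rho = cv_and_corr(x, y)
threshold = Vy / (2 * Vx)
print(f"rho = {rho:.3f}, threshold = {threshold:.3f}, ratio estimate better: {rho > threshold}")
```

Here the correlation (about 0.89) comfortably exceeds the threshold (about 0.63), so by this criterion the ratio estimate would be preferred for these units.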
Ratio estimation in stratified random sampling:
1. Combined ratio estimate:
      $$x'_{Rc} = \frac{x_{st}}{y_{st}}\,Y$$
2. Separate ratio estimate:
      $$x'_{Rs} = \sum_{h=1}^{L} Y_h r_h = \sum_{h=1}^{L} Y_h \frac{x_h}{y_h}$$
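  Here $x_{st}$ and $y_{st}$ are the stratified estimates of the
  x and y totals, and $r_h = x_h/y_h$ is the sample ratio in
  stratum h. Note that the separate ratio estimate
  requires more information: the auxiliary total $Y_h$
  must be known for every stratum, whereas the
  combined estimate needs only the overall total $Y$.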
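Both estimators can be sketched in a few lines of Python, assuming a simple random sample within each stratum. The function `stratified_ratio_estimates` and its argument layout are hypothetical; the example call uses sample 1 of the illustrative example that follows (stratum 1 units 1(4), 2(5); stratum 2 units 3(14), 4(16)), for which the table reports 16.15 (combined) and 16.20 (separate).

```python
def stratified_ratio_estimates(samples, N_h, Y_h):
    """Combined and separate ratio estimates of the population total X.

    samples: one list of (x, y) pairs per stratum (a simple random sample in each)
    N_h:     stratum sizes
    Y_h:     known stratum totals of the auxiliary variable y
    """
    # Stratified expansion estimates of the x and y totals.
    x_st = sum(N * sum(x for x, _ in s) / len(s) for s, N in zip(samples, N_h))
    y_st = sum(N * sum(y for _, y in s) / len(s) for s, N in zip(samples, N_h))
    combined = x_st / y_st * sum(Y_h)      # uses only the overall total Y
    # One sample ratio r_h per stratum, each applied to its own Y_h.
    separate = sum(Yh * sum(x for x, _ in s) / sum(y for _, y in s)
                   for s, Yh in zip(samples, Y_h))
    return combined, separate


combined, separate = stratified_ratio_estimates(
    [[(1, 4), (2, 5)], [(3, 14), (4, 16)]], N_h=[3, 3], Y_h=[15, 48])
print(round(combined, 2), round(separate, 2))
```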
Illustrative example of ratio estimation in a
stratified sample:

Consider the following artificial population of N = 6
units in two strata of equal size (x = number with an
MPH degree; y = total number of employees):
Stratum 1   x:   1    2    3    Total = 6
            y:   4    5    6    Total = 15

Stratum 2   x:   3    4    5    Total = 12
            y:  14   16   18    Total = 48

Totals:     X = 18, Y = 63
For a stratified random sample of 2 from each stratum,
there are 9 possible samples, and the combined and
separate ratio estimates of X from each sample are as
follows (y values are in parentheses):

Sample   Stratum 1     Stratum 2      Combined   Separate
  1.     1(4) 2(5)     3(14) 4(16)     16.15      16.20
  2.     1(4) 2(5)     3(14) 5(18)     16.90      17.00
  3.     1(4) 2(5)     4(16) 5(18)     17.58      17.71
  4.     1(4) 3(6)     3(14) 4(16)     17.33      17.20
  5.     1(4) 3(6)     3(14) 5(18)     18.00      18.00
  6.     1(4) 3(6)     4(16) 5(18)     18.61      18.71
  7.     2(5) 3(6)     3(14) 4(16)     18.44      18.02
  8.     2(5) 3(6)     3(14) 5(18)     19.05      18.82
  9.     2(5) 3(6)     4(16) 5(18)     19.60      19.52


                     Expected value:     17.96      17.91
                     Bias                -0.038    -0.091
                     Variance:            1.0526    0.9307
                     MSE:                 1.0542    0.9390
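The summary rows can be reproduced by enumerating all nine stratified samples. The Python sketch below is illustrative only (the helper `estimates` is hypothetical) and computes expectation, bias, variance and MSE over the equally likely samples. Its results agree with the table to about two decimal places; the remaining differences (for example, a variance of about 1.0506 rather than 1.0526 for the combined estimate) appear to arise because the table's summary rows were computed from the rounded two-decimal estimates.

```python
from itertools import combinations, product
from statistics import mean

strata = [[(1, 4), (2, 5), (3, 6)],          # stratum 1: (x, y) pairs
          [(3, 14), (4, 16), (5, 18)]]       # stratum 2
n_h = 2
N_h = [len(s) for s in strata]
Y_h = [sum(y for _, y in s) for s in strata]        # known auxiliary totals: 15, 48
X_total = sum(x for s in strata for x, _ in s)      # true total X = 18


def estimates(sample):
    """Combined and separate ratio estimates of X for one stratified sample."""
    x_st = sum(N * mean(x for x, _ in s) for s, N in zip(sample, N_h))
    y_st = sum(N * mean(y for _, y in s) for s, N in zip(sample, N_h))
    combined = x_st / y_st * sum(Y_h)
    separate = sum(Yh * sum(x for x, _ in s) / sum(y for _, y in s)
                   for s, Yh in zip(sample, Y_h))
    return combined, separate


# Every stratified sample: one 2-unit subset from each stratum (3 x 3 = 9 samples).
samples = list(product(*(combinations(s, n_h) for s in strata)))
results = [estimates(smp) for smp in samples]

summary = {}
for name, values in zip(("combined", "separate"), zip(*results)):
    ev = mean(values)
    var = mean((v - ev) ** 2 for v in values)       # variance over all 9 samples
    bias = ev - X_total
    summary[name] = {"E": ev, "bias": bias, "var": var, "MSE": var + bias ** 2}
    print(f"{name:9s} E={ev:.4f} bias={bias:+.4f} var={var:.4f} MSE={var + bias ** 2:.4f}")
```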
• Note that the bias is smaller for the combined
     estimate, while its variance and MSE are
     larger.

• For a small sample in each stratum, the
     combined ratio estimate is recommended
     (bias consideration). For a large sample in
     each stratum, the separate ratio estimate is
     recommended (variance/MSE consideration;
     the bias is negligible for a large sample).
Regression estimation:

• The ratio estimate is a special case of a
  regression estimate. The regression estimate
  of the population mean has the following linear form:
      $$\hat{\bar{X}}_{lr} = \bar{x} + b(\bar{Y} - \bar{y}),$$
  where b is a regression coefficient of x on y,
  either estimated from the sample or borrowed
  from external sources.
• If b is replaced by the sample ratio $r = x/y$, then we
  have the ratio estimator $\hat{\bar{X}}_R = r\bar{Y}$, since
  $\bar{x} + r(\bar{Y} - \bar{y}) = r\bar{Y}$. This corresponds to a
  regression line passing through the origin.

• If b = 1, then we have the difference estimator,
      $$\hat{\bar{X}}_d = \bar{Y} + (\bar{x} - \bar{y}).$$

• If b = 0, then we have the simple expansion
  estimator $\bar{x}$ used in simple random sampling.
• Like the ratio estimate, the regression estimate is
  generally biased but consistent:
      $$E(\hat{\bar{X}}_{lr}) = E(\bar{x}) + \bar{Y}E(b) - E(b\bar{y}) = \bar{X} - \operatorname{Cov}(b, \bar{y}).$$
• When the covariance is zero (e.g., when the joint
  distribution of x and y is bivariate normal), the
  regression estimate is unbiased. If a sample plot
  of x against y appears approximately linear, there
  should be little risk of major bias in the regression
  estimate.
• The sampling variance of a regression estimate can be
  estimated by
      $$\widehat{\operatorname{Var}}(\bar{x}_{lr}) = \frac{1-f}{n}\cdot\frac{\sum_{i=1}^{n}\left[(x_i - \bar{x}) - b(y_i - \bar{y})\right]^2}{n-2}$$
  (the divisor n - 2 is used when b is estimated from
  the sample; n - 1 is used when b is pre-assigned).
  For a large sample the following approximation is
  valid:
      $$\widehat{\operatorname{Var}}(\bar{x}_{lr}) \approx \frac{1-f}{n}\,s_x^2\left(1 - \hat{\rho}_{xy}^2\right)$$
  (see formula 7.17 on page 215).
• When the correlation between x and y is zero, the
  variance of the regression estimate is the same as
  that of the simple expansion estimate under simple
  random sampling. As long as there is a measurable
  correlation between x and y, the regression estimate
  is more precise (for large samples) than the simple
  expansion estimate. Since the variance is multiplied
  by about $1 - \rho_{xy}^2$, the reduction in variance is
  large when the correlation is high:
Correlation   Variance factor (1 - ρ²)
   0.95         about 1/10
   0.5          3/4
   0            1 (no reduction)
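The regression estimator, its bias identity and its variance estimator can be checked numerically with a short Python sketch. As an illustrative assumption, it treats the six units of the stratified example above as a single population (N = 6), draws simple random samples of n = 3, and takes b as the least-squares slope of x on y. It verifies that the exact bias over all 20 samples equals $-\operatorname{Cov}(b, \bar{y})$, and that for one sample the n - 2 formula and the large-sample approximation differ only by the factor (n - 1)/(n - 2), which tends to 1 as n grows. The helpers `ls_slope`, `lr_mean` and `lr_var` are hypothetical names.

```python
from itertools import combinations
from statistics import mean


def ls_slope(xs, ys):
    """Least-squares slope b of x on y, estimated from the sample."""
    xbar, ybar = mean(xs), mean(ys)
    return (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
            / sum((y - ybar) ** 2 for y in ys))


def lr_mean(xs, ys, Ybar, b):
    """Regression estimate of the population mean of x: xbar + b * (Ybar - ybar)."""
    return mean(xs) + b * (Ybar - mean(ys))


def lr_var(xs, ys, N, b, b_estimated=True):
    """Estimated variance: divisor n - 2 if b came from the sample, else n - 1."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    ss = sum(((x - xbar) - b * (y - ybar)) ** 2 for x, y in zip(xs, ys))
    return (1 - n / N) / n * ss / (n - 2 if b_estimated else n - 1)


X = [1, 2, 3, 3, 4, 5]
Y = [4, 5, 6, 14, 16, 18]
N, n = len(X), 3
Ybar = mean(Y)

# Exact check of E(estimate) = Xbar - Cov(b, ybar) over all C(6, 3) = 20 samples.
ests, slopes, ybars = [], [], []
for idx in combinations(range(N), n):
    xs, ys = [X[i] for i in idx], [Y[i] for i in idx]
    b = ls_slope(xs, ys)
    ests.append(lr_mean(xs, ys, Ybar, b))
    slopes.append(b)
    ybars.append(mean(ys))
bias = mean(ests) - mean(X)
cov_b_ybar = mean(bb * yb for bb, yb in zip(slopes, ybars)) - mean(slopes) * mean(ybars)
print(f"bias = {bias:.6f}, -Cov(b, ybar) = {-cov_b_ybar:.6f}")

# One sample: the n - 2 formula versus the large-sample approximation.
xs, ys = [1, 3, 5], [4, 6, 18]
b = ls_slope(xs, ys)
v_formula = lr_var(xs, ys, N, b)
xbar, ybar = mean(xs), mean(ys)
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
rho2 = sxy ** 2 / (sxx * syy)
v_approx = (1 - n / N) / n * (sxx / (n - 1)) * (1 - rho2)
print(v_formula, v_approx)   # ratio is (n - 1) / (n - 2) = 2 here
```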
• In general, the regression estimate is more precise
  than the ratio estimate (see the summary data on
  page 217, which contains a typo: the MSE of 82.97
  for x' should be 8297).

• The regression estimator can be extended to
  include more than one auxiliary (Y) variable; with
  two, for example:

      $$\hat{\bar{X}}_{lr} = \bar{x} + b_1(\bar{Y}_1 - \bar{y}_1) + b_2(\bar{Y}_2 - \bar{y}_2).$$
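• The sampling variance of a difference estimator
  can be estimated by

      $$\widehat{\operatorname{Var}}(\bar{x}_d) = \frac{1-f}{n}\cdot\frac{\sum_{i=1}^{n}\left[(x_i - \bar{x}) - (y_i - \bar{y})\right]^2}{n-1},$$

  which is the variance estimator for a regression
  estimate with b = 1 and, because b is pre-assigned,
  the divisor n - 1.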
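A minimal Python sketch of the difference estimator and its variance estimator, under simple random sampling. For illustration only it reuses the six units of the stratified example as one population ($\bar{Y}$ = 10.5, N = 6) and the sample of units with x = 1, 3, 5 and y = 4, 6, 18; the function names are hypothetical. It also computes the population slope of x on y, because $V(\bar{x} - \bar{y}) < V(\bar{x})$ exactly when $S_y^2 < 2S_{xy}$, i.e. when that slope exceeds 1/2.

```python
from statistics import mean


def diff_estimate(xs, ys, Ybar):
    """Difference estimator of the mean of x: Ybar + (xbar - ybar), i.e. b = 1."""
    return Ybar + (mean(xs) - mean(ys))


def diff_var_estimate(xs, ys, N):
    """Estimated variance of the difference estimator (divisor n - 1, b = 1 is fixed)."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    ss = sum(((x - xbar) - (y - ybar)) ** 2 for x, y in zip(xs, ys))
    return (1 - n / N) / n * ss / (n - 1)


X = [1, 2, 3, 3, 4, 5]
Y = [4, 5, 6, 14, 16, 18]
Xbar, Ybar = mean(X), mean(Y)
# Population slope of x on y: the difference estimator beats xbar only if it exceeds 1/2.
beta = (sum((x - Xbar) * (y - Ybar) for x, y in zip(X, Y))
        / sum((y - Ybar) ** 2 for y in Y))

xs, ys = [1, 3, 5], [4, 6, 18]
est = diff_estimate(xs, ys, Ybar)
v = diff_var_estimate(xs, ys, N=len(X))
print(f"estimate {est:.4f}, estimated variance {v:.4f}, slope {beta:.3f}")
```

For these units the slope is about 0.20, well below 1/2, so the difference estimator would be less precise than $\bar{x}$ itself; it is most useful when x and y are measured on the same scale (for example, current and previous values of the same quantity), so that the slope is close to 1.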