# ratio


```
Ratio and Regression Estimation

Basic ideas:

• The estimated ratio of one variable to another, related
variable (the auxiliary variable) can be used to obtain a
more precise estimate (i.e., one with smaller sampling
variance) of the population mean or total than can be
obtained from the first variable alone, provided that the
population total (or mean) of the auxiliary variable is
known or can be measured easily.
• As studied before, the ratio estimate is slightly
biased, but the bias is negligible for large n.

• The ratio estimation strategy works well
when the auxiliary variable is highly
correlated with the variable under
investigation.
• In ratio estimation, one auxiliary variable is
used.

• In regression estimation, one or more
auxiliary variables can be used.
Examples of ratio estimation:
1. In 1786, Laplace proposed a method of population
estimation (Stigler, History of Statistics, pp.
163-164). He suggested conducting a census in a few
carefully selected communities to determine the ratio
of population to births, and multiplying this ratio by
the total number of births in France to estimate the
total population. Vital statistics were readily
available for most local areas at that time. The
method was actually used in 1802, instead of taking a
population census of the entire country. (Cf. the
first US census was taken in 1790.)
2. To estimate the amount of juice that can be
produced from a truckload of oranges, a sample of
oranges can be squeezed to determine the ratio of
juice to weight. The estimated ratio is then
multiplied by the total weight of the oranges to
estimate the total amount of juice. Weighing the
oranges in the truck is easier than counting them.
3. To estimate the current population of a local
area, the population can be counted in randomly
selected census tracts to determine the ratio of the
current population to the previous census count. The
current population of the entire area can then be
estimated by multiplying the estimated ratio by the
previous census total.
Variance of sample ratio:

• The population ratio is $R = X/Y = \bar{X}/\bar{Y}$ and the
sample ratio is $r = x/y = \bar{x}/\bar{y}$. Since both the
numerator and the denominator of the sample ratio are
random variables, an exact expression for its sampling
variance cannot be derived easily.
• Using a Taylor series approximation, the
following expression can be obtained:

    $$\mathrm{Var}(r) \approx \frac{1}{n\bar{Y}^2} \cdot \frac{\sum_{i=1}^{N} (X_i - R Y_i)^2}{N} \cdot \frac{N-n}{N-1}$$

This is an alternative expression of formula
(7.1) on page 193 (ignoring the square root).
This can be verified using the data in the
illustrative example on that page.
• This can be estimated by

    $$\widehat{\mathrm{Var}}(r) = \frac{1}{n\bar{Y}^2} \cdot \frac{\sum_{i=1}^{n} (x_i - r y_i)^2}{n-1} \cdot \frac{N-n}{N},$$

where $\bar{Y}$ can be replaced by $\bar{y}$. This is an
alternative expression of formula (7.3) on
page 196 (ignoring the square root).
Estimation of the population mean and total:
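• The population mean $\bar{X}$ is estimated by $r\bar{Y}$ and
the population total $X$ by $rY$.
• The sampling variance of the estimate of the population
mean can be estimated by multiplying the variance of r
(see the formulas above) by $\bar{Y}^2$, and the variance of
the estimate of the population total by multiplying the
variance of r by $Y^2$:

    $$\widehat{\mathrm{Var}}(r\bar{Y}) = \bar{Y}^2 \, \widehat{\mathrm{Var}}(r), \qquad \widehat{\mathrm{Var}}(rY) = Y^2 \, \widehat{\mathrm{Var}}(r)$$
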
When to use ratio estimation:

The ratio estimate will have a smaller sampling variance
than the simple inflation estimate whenever

    $$\frac{V_y}{V_x} < 2\rho_{xy} \quad \text{or} \quad \rho_{xy} > \frac{V_y}{2V_x},$$

where $V_x$ and $V_y$ are the coefficients of variation
(cv) of x and y. That is, the correlation between x and y
should be larger than one-half of the ratio of the cv of
y to the cv of x. When the two variables have the same
cv, the correlation should be greater than 0.5.
Ratio estimation in stratified random sampling:
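1. Combined ratio estimate:

    $$x'_{Rc} = \frac{\bar{x}_{st}}{\bar{y}_{st}} \, Y$$

2. Separate ratio estimate:

    $$x'_{Rs} = \sum_{h=1}^{L} Y_h r_h = \sum_{h=1}^{L} Y_h \frac{\bar{x}_h}{\bar{y}_h}$$

where $r_h$ is the sample ratio in stratum h and $Y_h$ is
the stratum total of y.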
Illustrative example of ratio estimation in a
stratified sample:

Consider the following artificial population of N = 6 in
two strata of equal size (x = number with an MPH degree;
y = total number of employees):

Stratum 1   x:    1    2    3    Total = 6
            y:    4    5    6    Total = 15

Stratum 2   x:    3    4    5    Total = 12
            y:   14   16   18    Total = 48

Totals:                          X = 18
                                 Y = 63
For a stratified random sample of 2 from each stratum,
there are 9 possible samples. The combined and separate
ratio estimates of the population total X from each
sample are as follows (y values are in parentheses):
Sample   Stratum 1    Stratum 2      Combined   Separate
1.       1(4) 2(5)    3(14) 4(16)    16.15      16.20
2.       1(4) 2(5)    3(14) 5(18)    16.90      17.00
3.       1(4) 2(5)    4(16) 5(18)    17.58      17.71
4.       1(4) 3(6)    3(14) 4(16)    17.33      17.20
5.       1(4) 3(6)    3(14) 5(18)    18.00      18.00
6.       1(4) 3(6)    4(16) 5(18)    18.61      18.71
7.       2(5) 3(6)    3(14) 4(16)    18.44      18.02
8.       2(5) 3(6)    3(14) 5(18)    19.05      18.82
9.       2(5) 3(6)    4(16) 5(18)    19.60      19.52

Expected value:                      17.96      17.91
Bias:                                -0.038     -0.091
Variance:                            1.0526     0.9307
MSE:                                 1.0542     0.9390
• Note that the bias is smaller in magnitude for the
combined estimate, while its variance and MSE are
larger.

• For a small sample in each stratum, the combined
ratio estimate is recommended (bias consideration).
For a large sample in each stratum, the separate ratio
estimate is recommended (variance/MSE consideration;
the bias is negligible for a large sample).
Regression estimation:

• The ratio estimate is a special case of a
regression estimate. The regression estimate of the
population mean has the following linear form:

    $$\hat{X}_{lr} = \bar{x} + b(\bar{Y} - \bar{y}),$$

where b is a regression coefficient of x on y, either
estimated from the sample or borrowed from external
sources.
• If b is replaced by the sample ratio $r = \bar{x}/\bar{y}$,
then we have the ratio estimator $\hat{X}_R = r\bar{Y}$
(because $\bar{x} + r(\bar{Y} - \bar{y}) = r\bar{Y}$). Note that
this is the case of a regression line going through
the origin.

• If b = 1, then we have the difference estimator,
$\hat{X}_d = \bar{Y} + (\bar{x} - \bar{y})$.

• If b = 0, then we have $\bar{x}$, the estimator relevant
to simple random sampling.
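• Like the ratio estimate, the regression estimate is
generally a biased but consistent estimator. Since
$E(b\bar{y}) = \mathrm{Cov}(b, \bar{y}) + \bar{Y}E(b)$,

    $$E(\hat{X}_{lr}) = E(\bar{x}) + \bar{Y}E(b) - E(b\bar{y}) = \bar{X} - \mathrm{Cov}(b, \bar{y}).$$
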
• When the covariance is zero (for example, when the
joint distribution of x and y is bivariate normal), the
regression estimate is unbiased. If a sample plot of x
against y appears approximately linear, there should be
little risk of major bias in the regression estimate.
• The sampling variance of a regression estimate can be
estimated by

    $$\widehat{\mathrm{Var}}(\hat{X}_{lr}) = \frac{1-f}{n} \cdot \frac{\sum_{i=1}^{n} \left[(x_i - \bar{x}) - b(y_i - \bar{y})\right]^2}{n-2},$$

where f = n/N is the sampling fraction (the divisor
n - 2 is used when b is estimated from the sample;
n - 1 is used when b is pre-assigned). For a large
sample, the following approximation is valid:

    $$\widehat{\mathrm{Var}}(\hat{X}_{lr}) \approx \frac{1-f}{n} \, s_x^2 \left(1 - \hat{\rho}_{xy}^2\right)$$

(see formula 7.17 on page 215).
• When the correlation between x and y is zero, the
variance of the regression estimate is the same as the
variance under simple random sampling. As long as there
is a measurable correlation between x and y, the
regression estimate is more precise than the simple
expansion type of estimate. Because the variance is
multiplied by approximately $1 - \rho_{xy}^2$, the reduction
in variance is large when the correlation is high:

Correlation   Variance relative to SRS ($1 - \rho^2$)
0.95          about 1/10
0.5           3/4
0             1 (no reduction)
• In general, the regression estimate is more precise
than the ratio estimate (see the summary data on
page 217, where the MSE of 82.97 for x' is a typo for
8297).

• The regression estimator can be extended to include
more than one auxiliary (y) variable, as follows:

    $$\hat{X}_{lr} = \bar{x} + b_1(\bar{Y}_1 - \bar{y}_1) + b_2(\bar{Y}_2 - \bar{y}_2).$$
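• The sampling variance of a difference estimator
can be estimated by

    $$\widehat{\mathrm{Var}}(\hat{X}_d) = \frac{1-f}{n} \cdot \frac{\sum_{i=1}^{n} \left[(x_i - \bar{x}) - (y_i - \bar{y})\right]^2}{n-1},$$

which is the variance estimator for a regression
estimate shown above with b = 1 and the divisor
n - 1 used for a pre-assigned b.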

```
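The formulas under "Variance of sample ratio" and "Estimation of the population mean and total" can be combined into a short Python sketch. It assumes simple random sampling and uses the known population mean of y in the variance formula (the notes allow the sample mean instead). The function name `ratio_estimate` is hypothetical, and the three sample units are an arbitrary illustrative draw from the six-unit population of the stratified example (ignoring strata), not data from the source.

```python
from math import sqrt
from statistics import fmean

def ratio_estimate(x, y, Y_total, N):
    """Hypothetical helper: ratio estimate r = x_bar / y_bar and its estimated variance.

    x: sample values of the variable of interest
    y: sample values of the auxiliary variable
    Y_total: known population total of y
    N: population size
    """
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    r = x_bar / y_bar
    Y_bar = Y_total / N
    # Sum of squared residuals x_i - r*y_i drives the variance of r.
    ss = sum((xi - r * yi) ** 2 for xi, yi in zip(x, y))
    var_r = (1 / (n * Y_bar ** 2)) * (ss / (n - 1)) * ((N - n) / N)
    return {
        "r": r,
        "var_r": var_r,
        "mean": r * Y_bar,                    # estimate of the population mean of x
        "total": r * Y_total,                 # estimate of the population total of x
        "se_mean": sqrt(Y_bar ** 2 * var_r),  # Var(r * Y_bar) = Y_bar^2 * Var(r)
        "se_total": sqrt(Y_total ** 2 * var_r),
    }

# Illustrative (not from the source): 3 units drawn from the N = 6
# example population, whose y total is Y = 63.
x_sample = [2, 3, 5]
y_sample = [5, 14, 18]
est = ratio_estimate(x_sample, y_sample, Y_total=63, N=6)
print(f"r = {est['r']:.4f}, total = {est['total']:.2f} (SE {est['se_total']:.2f})")
```

For this sample, r = 10/37 (about 0.270) and the estimated total is about 17.03 with an estimated standard error of about 1.78; the true total is X = 18.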
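The condition under "When to use ratio estimation" can be checked directly from a set of values. This sketch uses a hypothetical function `ratio_beats_expansion` that returns the correlation, the threshold V_y / (2 V_x), and whether the correlation exceeds it. Here it is applied to the six units of the stratified example pooled into one population, purely for illustration. The condition comes from approximate (large-sample) variance formulas, so it is a guide rather than a guarantee for small samples.

```python
from math import sqrt
from statistics import fmean

def ratio_beats_expansion(x, y):
    """Hypothetical helper: return (rho, threshold, rho > threshold).

    x is the variable of interest, y the auxiliary variable, and
    threshold = V_y / (2 V_x) with V the coefficient of variation.
    The common divisor of the standard deviations cancels in V_y / V_x,
    so raw sums of squares are enough.
    """
    x_bar, y_bar = fmean(x), fmean(y)
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((c - y_bar) ** 2 for c in y)
    s_xy = sum((a - x_bar) * (c - y_bar) for a, c in zip(x, y))
    rho = s_xy / sqrt(s_xx * s_yy)
    v_ratio = (sqrt(s_yy) / y_bar) / (sqrt(s_xx) / x_bar)  # V_y / V_x
    threshold = v_ratio / 2
    return rho, threshold, rho > threshold

# Illustrative: the six units of the stratified example, pooled.
x_pop = [1, 2, 3, 3, 4, 5]
y_pop = [4, 5, 6, 14, 16, 18]
rho, threshold, better = ratio_beats_expansion(x_pop, y_pop)
print(f"rho = {rho:.3f}, threshold = {threshold:.3f}, ratio estimate preferred: {better}")
```

For these values the correlation is about 0.89, well above the threshold of about 0.625.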
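The illustrative stratified example can be reproduced by enumerating all 3 x 3 = 9 stratified samples and applying the combined and separate ratio estimates to each. In this sketch the helper names `combined` and `separate` are hypothetical; the enumeration order produced by `itertools.product` over `itertools.combinations` matches the numbered rows of the table, and every estimate is computed from unrounded values.

```python
from itertools import combinations, product
from statistics import fmean, pvariance

# Artificial population from the illustrative example: (x, y) units by stratum.
strata = [
    [(1, 4), (2, 5), (3, 6)],
    [(3, 14), (4, 16), (5, 18)],
]
N = sum(len(s) for s in strata)
X = sum(x for s in strata for x, _ in s)      # true total of x (18)
Y = sum(y for s in strata for _, y in s)      # known total of y (63)
Y_h = [sum(y for _, y in s) for s in strata]  # stratum totals of y
W_h = [len(s) / N for s in strata]            # stratum weights N_h / N
n_h = 2                                       # units sampled per stratum

def combined(sample):
    """Combined ratio estimate (x_st / y_st) * Y."""
    x_st = sum(w * fmean(x for x, _ in s) for w, s in zip(W_h, sample))
    y_st = sum(w * fmean(y for _, y in s) for w, s in zip(W_h, sample))
    return x_st / y_st * Y

def separate(sample):
    """Separate ratio estimate: sum over strata of Y_h * r_h."""
    return sum(Yh * sum(x for x, _ in s) / sum(y for _, y in s)
               for Yh, s in zip(Y_h, sample))

# All stratified samples: one n_h-unit combination per stratum (3 x 3 = 9).
samples = list(product(*(combinations(s, n_h) for s in strata)))

results = {}
for name, estimator in (("combined", combined), ("separate", separate)):
    values = [estimator(s) for s in samples]
    mean = fmean(values)
    var = pvariance(values)  # every sample is equally likely, so divide by 9
    bias = mean - X
    results[name] = {"mean": mean, "bias": bias, "var": var, "mse": var + bias ** 2}

for name, res in results.items():
    print(f"{name:9s} E = {res['mean']:.4f}  bias = {res['bias']:+.4f}  "
          f"var = {res['var']:.4f}  MSE = {res['mse']:.4f}")
```

Computed from unrounded estimates, this gives variances of about 1.0506 (combined) and 0.9312 (separate) and a separate-estimate bias of about -0.092. The summary rows in the notes (1.0526, 0.9307, -0.091) match what the two-decimal estimates in the table give, so only trailing digits differ and the comparison between the two estimators is unchanged.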
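The special cases listed under "Regression estimation" (b estimated by least squares, b = r, b = 1, b = 0) can be compared on a single sample. In this sketch, `ls_slope` and `regression_estimate` are hypothetical helpers, and the four sample units are an arbitrary illustrative draw from the example population, whose true mean of x is 3.

```python
from statistics import fmean

def ls_slope(x, y):
    """Hypothetical helper: least-squares slope of x on y, s_xy / s_yy."""
    x_bar, y_bar = fmean(x), fmean(y)
    s_xy = sum((a - x_bar) * (c - y_bar) for a, c in zip(x, y))
    s_yy = sum((c - y_bar) ** 2 for c in y)
    return s_xy / s_yy

def regression_estimate(x, y, Y_bar, b):
    """Hypothetical helper: x_bar + b * (Y_bar - y_bar), an estimate of the mean of x."""
    return fmean(x) + b * (Y_bar - fmean(y))

# Illustrative (not from the source): 4 units from the N = 6 example population.
x = [1, 3, 4, 5]
y = [4, 6, 16, 18]
Y_bar = 63 / 6           # known population mean of y
r = fmean(x) / fmean(y)  # sample ratio

estimates = {
    "regression (b = LS slope)": regression_estimate(x, y, Y_bar, ls_slope(x, y)),
    "ratio (b = r)": regression_estimate(x, y, Y_bar, r),
    "difference (b = 1)": regression_estimate(x, y, Y_bar, 1.0),
    "SRS mean (b = 0)": regression_estimate(x, y, Y_bar, 0.0),
}
for name, value in estimates.items():
    print(f"{name:26s} {value:.4f}")
```

For this sample the estimates are about 3.14 (least squares), 3.10 (ratio), 2.75 (difference) and 3.25 (SRS mean), and the b = r case equals r times the known mean of y, as the notes state.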
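The variance estimators for the regression and difference estimators, and the large-sample approximation, can be sketched as follows on the same illustrative four-unit sample. The helper names `regression_variance` and `large_sample_variance` are hypothetical, f = n/N, and the last lines reproduce the 1 - rho^2 factors from the correlation table.

```python
from statistics import fmean, variance

def regression_variance(x, y, b, N, b_from_sample):
    """Hypothetical helper: estimated variance of x_bar + b*(Y_bar - y_bar) under SRS.

    Uses divisor n - 2 when b was estimated from the sample and n - 1 when b
    was fixed in advance (b = 1 gives the difference estimator).
    """
    n = len(x)
    f = n / N
    x_bar, y_bar = fmean(x), fmean(y)
    ss = sum(((a - x_bar) - b * (c - y_bar)) ** 2 for a, c in zip(x, y))
    divisor = n - 2 if b_from_sample else n - 1
    return (1 - f) / n * ss / divisor

def large_sample_variance(x, y, N):
    """Hypothetical helper: (1 - f)/n * s_x^2 * (1 - rho_hat^2)."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((c - y_bar) ** 2 for c in y)
    s_xy = sum((a - x_bar) * (c - y_bar) for a, c in zip(x, y))
    rho2 = s_xy ** 2 / (s_xx * s_yy)
    return (1 - n / N) / n * variance(x) * (1 - rho2)  # variance() uses n - 1

# Illustrative (not from the source): 4 units from the N = 6 example population.
x = [1, 3, 4, 5]
y = [4, 6, 16, 18]
N = 6
x_bar, y_bar = fmean(x), fmean(y)
b = (sum((a - x_bar) * (c - y_bar) for a, c in zip(x, y))
     / sum((c - y_bar) ** 2 for c in y))  # least-squares slope of x on y

var_lr = regression_variance(x, y, b, N, b_from_sample=True)
var_d = regression_variance(x, y, 1.0, N, b_from_sample=False)
var_approx = large_sample_variance(x, y, N)
print(f"regression: {var_lr:.4f}  difference: {var_d:.4f}  approx: {var_approx:.4f}")

# Variance of the regression estimate relative to SRS, 1 - rho^2.
factors = {rho: 1 - rho ** 2 for rho in (0.95, 0.5, 0.0)}
for rho, factor in factors.items():
    print(f"rho = {rho:4.2f}: variance multiplied by {factor:.4f}")
```

On this sample the difference estimator has a much larger estimated variance (about 2.52) than the regression estimator (about 0.058), since the slope of x on y (about 0.22) is far from the value b = 1 that the difference estimator assumes. With n = 4 the large-sample approximation (about 0.039) is smaller than the n - 2 version because it effectively divides the residual sum of squares by n - 1.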
Posted: 5/13/2011; language: English; pages: 25