# Outline for Class Meeting 3 (Chapter 2, Lohr, 9/11/00) by PV96CoQt


```
Outline for Class Meeting 7 (Chapter 3 (3.2, 3.4), Lohr, 2/8/06)
Model-based sampling for auxiliary data; introduction to stratified sampling

I. Ratio estimation can be justified from a model-based point of view

A. Consider the following model. The population is a realization from a model of the form
    $$ Y_i = \beta x_i + \varepsilon_i, $$
where the $\varepsilon_i$ are independent with mean 0 and variance $\sigma^2 x_i$. Under this
model, $T_y = \sum_{i=1}^{N} Y_i$ is a random variable. The parameter of interest, $t_y$, is one
realization of it. Thus our original estimation problem is a prediction problem.

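B. A reasonable predictor of $t_y$ is
    $$ \hat{t}_{yr} = \sum_{i \in S} y_i + \sum_{i \notin S} \hat{y}_i
                    = \sum_{i \in S} y_i + \hat{\beta} \sum_{i \notin S} x_i, $$
where $\hat{\beta}$ is the (generalized) least squares estimator of $\beta$. Observe that, with
$V = \sigma^2 \,\mathrm{diag}(x_i : i \in S)$,
    $$ \hat{\beta} = (x^\top V^{-1} x)^{-1} (x^\top V^{-1} y)
                   = \frac{\sigma^{-2} \sum_{i \in S} y_i}{\sigma^{-2} \sum_{i \in S} x_i}
                   = \frac{\hat{t}_y}{\hat{t}_x}, $$
since $\hat{t}_y = (N/n) \sum_{i \in S} y_i$ and $\hat{t}_x = (N/n) \sum_{i \in S} x_i$. Thus
    $$ \hat{t}_{yr} = \sum_{i \in S} y_i + \frac{\hat{t}_y}{\hat{t}_x} \Big( t_x - \sum_{i \in S} x_i \Big)
                    = \frac{\hat{t}_y}{\hat{t}_x} \, t_x. $$

So the estimator is the same as the randomization-based ratio estimator.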

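C. $E_M[\hat{T}_{yr} - T_y] = E_M\big[\hat{\beta} \sum_{i \notin S} x_i - \sum_{i \notin S} Y_i\big] = 0$,
because $E_M[\hat{\beta}] = \beta$ and $E_M[Y_i] = \beta x_i$. The estimator is model-unbiased
(even though it is not randomization-unbiased).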

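D. The model-based variance can be shown (see p. 82) to be
    $$ V_M[\hat{T}_{yr} - T_y] = \sigma^2 \left( 1 - \frac{\sum_{i \in S} x_i}{t_x} \right)
                                 \frac{t_x^2}{\sum_{i \in S} x_i}. $$

This differs from the randomization-based variance. It is conditional on the sample actually
selected, which enters through $\sum_{i \in S} x_i$, rather than averaged over all possible samples.
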
II. Other ways to make use of auxiliary data

A. Regression estimator

Suppose that the best model for the data is not the one in I., but rather
    $$ Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, $$
where the $\varepsilon_i$ are independent with mean 0 and variance $\sigma^2$. Then prediction as
before, using the least squares estimators of $\beta_0$ and $\beta_1$, leads to the regression
estimator
    $$ \hat{t}_{yreg} = \hat{t}_y + \hat{\beta}_1 (t_x - \hat{t}_x), $$
where $\hat{\beta}_1 = r s_y / s_x$ and $r$ is the sample correlation between $x$ and $y$.

1. From a randomization-based point of view, this estimator is biased in small samples,
and an estimate of its approximate variance is
    $$ \hat{V}(\hat{t}_{yreg}) = \frac{N^2 (1 - f)\, s_d^2}{n}, $$
where $d_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$.

2. From a model-based point of view, this estimator is unbiased. Its variance resembles
the variance of an ordinary simple linear regression predictor (see p. 86).

B. Difference estimator

The difference estimator is often used in accounting populations. It is
    $$ \hat{t}_{ydiff} = \hat{t}_y + (t_x - \hat{t}_x). $$

This is an unbiased estimator of $t_y$, and its variance is
    $$ V(\hat{t}_{ydiff}) = \frac{N^2 (1 - f)\, S_d^2}{n}, $$
where $d_i = y_i - x_i$. Under what model is this estimator the best linear unbiased predictor?

III. Stratified sampling
When separate samples are selected from each of several subsets of the population
(defined ahead of time and called strata), the sample is said to be a stratified sample. If the
sample from each stratum is an SRS, the result is called a stratified random sample.

A. Estimators
1. Denote by $t_h$ the total for the hth stratum. Likewise, all other notation is
subscripted by h to indicate that it is for the hth of H strata. An unbiased
estimator of the population total from a stratified random sample is then
    $$ \hat{t}_{str} = \sum_{h=1}^{H} N_h \bar{y}_h, $$
and the corresponding estimator of the population mean is
    $$ \bar{y}_{str} = \sum_{h=1}^{H} \frac{N_h}{N} \bar{y}_h. $$

2. The variance of the stratified estimator is obtained as the sum of the variances
across the strata, since sampling is independent from one stratum to the next.
Likewise the estimate of the variance is obtained as the sum of the variance
estimates across the strata.
3. A confidence interval for the mean or total can be constructed based on the
normal approximation if either the sample sizes within each stratum are large or
there are many strata.

```
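The derivation in Section I.B can be checked numerically. Below is a minimal Python sketch, assuming an SRS and made-up numbers. It computes the slope by weighted least squares with weights 1/x_i, the inverse of the assumed variance sigma^2 x_i, so sigma^2 cancels. It then confirms that the model-based predictor equals the randomization-based ratio estimator. The function names are hypothetical illustrations, not taken from Lohr.

```python
import math


def gls_slope(x, y):
    """Slope for the no-intercept model Y_i = beta*x_i + eps_i, Var(eps_i) = sigma^2 * x_i.

    Weighted least squares with weights w_i = 1/x_i (sigma^2 cancels); this
    reduces algebraically to sum(y) / sum(x).
    """
    w = [1.0 / xi for xi in x]
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))  # x' V^{-1} y, up to sigma^-2
    den = sum(wi * xi * xi for wi, xi in zip(w, x))         # x' V^{-1} x, up to sigma^-2
    return num / den


def ratio_predictor(x_s, y_s, t_x):
    """Model-based predictor: observed y's plus beta_hat times the unsampled x total."""
    beta_hat = gls_slope(x_s, y_s)
    return sum(y_s) + beta_hat * (t_x - sum(x_s))


def ratio_estimator(x_s, y_s, t_x, N):
    """Randomization-based ratio estimator (t_hat_y / t_hat_x) * t_x under SRS."""
    n = len(x_s)
    t_hat_y = N / n * sum(y_s)
    t_hat_x = N / n * sum(x_s)
    return t_hat_y / t_hat_x * t_x


# Made-up sample of n = 4 units from a population of N = 20 with known t_x = 100.
x_s = [2.0, 4.0, 5.0, 9.0]
y_s = [3.0, 7.0, 9.0, 16.0]
pred = ratio_predictor(x_s, y_s, t_x=100.0)
est = ratio_estimator(x_s, y_s, t_x=100.0, N=20)
print(round(pred, 6), round(est, 6), math.isclose(pred, est))  # 175.0 175.0 True
```

The weighted form is kept, rather than writing sum(y)/sum(x) directly, so the code mirrors the (x'V^{-1}x)^{-1} x'V^{-1}y step in the outline.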
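Sections I.C and I.D make model-based claims that a small Monte Carlo experiment can illustrate. In this hedged Python sketch the sample S is held fixed and the y-values are regenerated from the model many times. The prediction error T_hat_yr - T_y is then compared with 0 (its mean) and with the variance formula. Several choices are assumptions made purely for illustration: normal errors, beta = 1.5, sigma^2 = 4, the x-values, the sample, and the helper names.

```python
import math
import random
import statistics


def model_variance(x_s, t_x, sigma2):
    """V_M[T_hat_yr - T_y] = sigma^2 * (1 - sum_S x / t_x) * t_x^2 / sum_S x."""
    sx = sum(x_s)
    return sigma2 * (1.0 - sx / t_x) * t_x ** 2 / sx


def simulate_errors(x_pop, sample_idx, beta, sigma2, reps, seed=1):
    """Generate populations from Y_i = beta*x_i + eps_i with Var(eps_i) = sigma2*x_i
    (normal errors assumed), keep the sample S fixed, and return T_hat_yr - T_y."""
    rng = random.Random(seed)
    t_x = sum(x_pop)
    x_s_total = sum(x_pop[i] for i in sample_idx)
    errors = []
    for _ in range(reps):
        y_pop = [beta * xi + rng.gauss(0.0, math.sqrt(sigma2 * xi)) for xi in x_pop]
        beta_hat = sum(y_pop[i] for i in sample_idx) / x_s_total
        errors.append(beta_hat * t_x - sum(y_pop))  # T_hat_yr = beta_hat * t_x
    return errors


# Hypothetical population x_i = 1, ..., 30 and a fixed sample of n = 8 units.
x_pop = [float(i) for i in range(1, 31)]
sample_idx = [0, 4, 9, 14, 19, 24, 27, 29]
errs = simulate_errors(x_pop, sample_idx, beta=1.5, sigma2=4.0, reps=20000)
v_formula = model_variance([x_pop[i] for i in sample_idx], sum(x_pop), 4.0)
print("mean of T_hat_yr - T_y:", statistics.fmean(errs))
print("simulated variance:", statistics.variance(errs))
print("formula variance:  ", v_formula)
```

The sample is never redrawn, so the simulation reflects the conditional, model-based view of Section I.D rather than the randomization distribution.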
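The regression estimator of Section II.A and its approximate variance estimate (II.A.1) can be sketched as follows for an SRS. This is a hedged Python sketch, not the textbook's code. The function name and data are hypothetical. It also assumes s_d^2 is the sample variance of the residuals with divisor n - 1; some texts use n - 2.

```python
import statistics


def regression_estimator(x_s, y_s, t_x, N):
    """Return (t_hat_yreg, v_hat) for an SRS of size n from a population of size N.

    t_hat_yreg = t_hat_y + b1 * (t_x - t_hat_x), with b1 = r * s_y / s_x, and
    v_hat = N^2 * (1 - f) * s_d^2 / n, where d_i are the least squares residuals.
    """
    n = len(x_s)
    x_bar, y_bar = statistics.fmean(x_s), statistics.fmean(y_s)
    s_x, s_y = statistics.stdev(x_s), statistics.stdev(y_s)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x_s, y_s)) / (n - 1)
    r = s_xy / (s_x * s_y)          # sample correlation of x and y
    b1 = r * s_y / s_x              # least squares slope (equals s_xy / s_x^2)
    b0 = y_bar - b1 * x_bar         # least squares intercept
    t_hat_yreg = N * y_bar + b1 * (t_x - N * x_bar)
    d = [yi - (b0 + b1 * xi) for xi, yi in zip(x_s, y_s)]  # residuals d_i
    f = n / N                       # sampling fraction
    v_hat = N ** 2 * (1 - f) * statistics.variance(d) / n  # s_d^2 with divisor n - 1
    return t_hat_yreg, v_hat


# Made-up SRS of n = 6 from N = 60, with known population x-total t_x = 330.
x_s = [4.0, 6.0, 5.0, 8.0, 3.0, 7.0]
y_s = [10.1, 13.8, 12.2, 18.9, 8.7, 16.0]
print(regression_estimator(x_s, y_s, t_x=330.0, N=60))
```

As a sanity check, suppose y is exactly linear in x throughout the population, say y = 2 + 3x. Then the residuals vanish, the variance estimate is 0, and the estimator returns the population total 2N + 3 t_x exactly.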
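The difference estimator of Section II.B is simple enough to compute directly. The variance given in the outline uses the population variance S_d^2. This sketch substitutes the sample variance s_d^2 of d_i = y_i - x_i, which is the usual plug-in step but is stated here as an assumption. The audit-style data and the function name are hypothetical.

```python
import statistics


def difference_estimator(x_s, y_s, t_x, N):
    """Return (t_hat_ydiff, v_hat) for an SRS of size n from a population of size N.

    t_hat_ydiff = t_hat_y + (t_x - t_hat_x); v_hat plugs the sample variance s_d^2
    of d_i = y_i - x_i into N^2 * (1 - f) * S_d^2 / n.
    """
    n = len(x_s)
    d = [yi - xi for xi, yi in zip(x_s, y_s)]
    t_hat_ydiff = N * statistics.fmean(y_s) + (t_x - N * statistics.fmean(x_s))
    v_hat = N ** 2 * (1 - n / N) * statistics.variance(d) / n
    return t_hat_ydiff, v_hat


# Made-up audit sample: x = recorded (book) values, y = audited values,
# n = 4 accounts from N = 100 with known book total t_x = 5000.
t_diff, v_diff = difference_estimator([40.0, 55.0, 60.0, 45.0], [38.0, 55.0, 57.0, 45.0], t_x=5000.0, N=100)
print(t_diff, v_diff)
```

Equivalently, t_hat_ydiff = t_x + N * mean(d). The estimator adjusts the known book total by the estimated total discrepancy.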
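Sections III.A.1 through III.A.3 fit together in one computation: the stratified total and mean, the variance as a sum over strata, and a normal-approximation interval. This Python sketch makes several assumptions. Each stratum sample is taken to be an SRS with at least two units, which is needed for s_h^2. Each stratum contributes the standard SRS variance estimate N_h^2 (1 - n_h/N_h) s_h^2 / n_h. The interval uses z = 1.96 (about 95%). The function name and data are illustrative only.

```python
import math
import statistics


def stratified_estimates(strata, z=1.96):
    """strata: list of (N_h, sample_values_h) pairs, each sample an SRS of >= 2 units.

    Returns (t_hat_str, y_bar_str, v_hat_t, ci) where
      t_hat_str = sum_h N_h * ybar_h
      y_bar_str = sum_h (N_h / N) * ybar_h
      v_hat_t   = sum_h N_h^2 (1 - n_h/N_h) s_h^2 / n_h  (strata sampled independently)
      ci        = normal-approximation interval for the population total
    """
    N = sum(N_h for N_h, _ in strata)
    t_hat, y_bar_str, v_hat = 0.0, 0.0, 0.0
    for N_h, y_h in strata:
        n_h = len(y_h)
        ybar_h = statistics.fmean(y_h)
        t_hat += N_h * ybar_h
        y_bar_str += N_h / N * ybar_h
        v_hat += N_h ** 2 * (1 - n_h / N_h) * statistics.variance(y_h) / n_h
    half_width = z * math.sqrt(v_hat)
    return t_hat, y_bar_str, v_hat, (t_hat - half_width, t_hat + half_width)


# Made-up example: two strata of sizes 100 and 50, with SRS sizes 3 and 2.
result = stratified_estimates([(100, [10.0, 12.0, 14.0]), (50, [20.0, 24.0])])
print(result)
```

Because the strata are sampled independently, the variance is accumulated stratum by stratum. The estimated variance of the mean is v_hat_t / N^2.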