Outline for Class Meeting 7 (Chapter 3 (3.2, 3.4), Lohr, 2/8/06)
Model Based sampling for auxiliary data, Intro to Stratified sampling

I. Ratio estimation can be justified from a model-based point of view

   A. Consider the following model. The population is a realization from a model of the form
      $$Y_i = \beta x_i + \varepsilon_i, \quad i = 1, \dots, N,$$
      where $\varepsilon_i \sim (0, \sigma^2 x_i)$ and independent. Under this model, $T_y = \sum_{i=1}^{N} Y_i$ is a r.v. and the parameter of interest is one realization of this r.v. Thus our original estimation problem is a prediction problem.

   B. A reasonable predictor of $t_y$ is
      $$\hat{t}_{yr} = \sum_{i \in S} y_i + \sum_{i \notin S} \hat{y}_i = \sum_{i \in S} y_i + \hat{\beta} \sum_{i \notin S} x_i,$$
      where $\hat{\beta}$ is the least squares estimator of $\beta$. Observe that
      $$\hat{\beta} = (x' V^{-1} x)^{-1} (x' V^{-1} y) = \frac{\sigma^{-2} \sum_{i \in S} y_i}{\sigma^{-2} \sum_{i \in S} x_i} = \frac{\hat{t}_y}{\hat{t}_x}.$$
      Thus
      $$\hat{t}_{yr} = \sum_{i \in S} y_i + \frac{\hat{t}_y}{\hat{t}_x} \Bigl( t_x - \sum_{i \in S} x_i \Bigr) = \frac{\hat{t}_y}{\hat{t}_x} \, t_x.$$
      So the estimator is the same as the randomization-based estimator.

   C. $E_M[\hat{T}_{yr} - T_y] = E_M\bigl[\hat{\beta} \sum_{i \notin S} x_i - \sum_{i \notin S} Y_i\bigr] = 0$. The estimator is model-unbiased (even though it is not randomization unbiased).

   D. The model-based variance can be shown (see p. 82) to be
      $$V_M[\hat{T}_{yr} - T_y] = \Bigl( 1 - \frac{\sum_{i \in S} x_i}{t_x} \Bigr) \frac{\sigma^2 t_x^2}{\sum_{i \in S} x_i}.$$
      This is different from the randomization-based variance.

II. Other ways to make use of auxiliary data

   A. Regression estimator
      Suppose that the best model for the data is not that shown in I., but rather
      $$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,$$
      where $\varepsilon_i \sim (0, \sigma^2)$ and are independent. Then prediction as before, using the least squares estimator for the parameters $\beta_0$ and $\beta_1$, leads to the regression estimator
      $$\hat{t}_{yreg} = \hat{t}_y + \hat{\beta}_1 (t_x - \hat{t}_x), \quad \text{where } \hat{\beta}_1 = \frac{r s_y}{s_x}.$$

      1. From a randomization-based point of view, this estimator is biased in small samples, and an estimate of its approximate variance is
         $$\hat{V}(\hat{t}_{yreg}) = N^2 \, \frac{(1 - f)}{n} \, s_d^2,$$
         where $d_i = y_i - [\hat{\beta}_0 + \hat{\beta}_1 x_i]$.

      2. From a model-based point of view, this estimator is unbiased with a variance that looks like the variance of a regular simple regression predictor. (See p. 86.)

   B. Difference estimator
      The difference estimator is often used in accounting populations. It is
      $$\hat{t}_{ydiff} = \hat{t}_y + (t_x - \hat{t}_x).$$
      This is an unbiased estimator of $t_y$ and its variance is
      $$V(\hat{t}_{ydiff}) = N^2 \, \frac{(1 - f)}{n} \, S_d^2,$$
      where $d_i = y_i - x_i$. Under what model is this estimator the best linear unbiased predictor?

III. Stratified sampling
   When separate samples are selected from each of several subsets of the population (defined ahead of time, called strata), the sample is said to be a stratified sample. If the sample from each stratum is an SRS, then the design is said to be a stratified random sample.

   A. Estimators

      1. Denote by $t_h$ the total for the $h$th stratum. Likewise, all other notation is subscripted by $h$ to indicate that it is for the $h$th of $H$ strata. Thus an unbiased estimator of the population total from a stratified random sample is
         $$\hat{t}_{str} = \sum_{h=1}^{H} N_h \bar{y}_h,$$
         and the corresponding estimator of the population mean is
         $$\bar{y}_{str} = \sum_{h=1}^{H} \frac{N_h}{N} \bar{y}_h.$$

      2. The variance of the stratified estimator is obtained as the sum of the variances across the strata, since sampling is independent from one stratum to the next. Likewise, the estimate of the variance is obtained as the sum of the variance estimates across the strata.

      3. A confidence interval for the mean or total can be constructed based on the normal approximation if either the sample sizes within each stratum are large or there are many strata.
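
The stratified estimators in III.A can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes the usual SRS variance estimate within each stratum, $(1 - n_h/N_h) N_h^2 s_h^2 / n_h$, which the outline refers to but does not write out. The function name `stratified_estimates`, the input layout, and the example data are all hypothetical.

```python
import math
from statistics import mean, variance


def stratified_estimates(strata, z=1.96):
    """Estimate a population total from a stratified random sample.

    strata: list of (N_h, sample_values) pairs, one per stratum, where N_h
            is the stratum population size (hypothetical input layout).
    z:      normal quantile for the confidence interval (1.96 for about 95%).

    Returns (t_hat, ybar_str, se_t, ci_t).
    """
    N = sum(N_h for N_h, _ in strata)
    t_hat = 0.0
    var_t = 0.0
    for N_h, y in strata:
        n_h = len(y)
        # Contribution of stratum h to the estimated total: N_h * ybar_h
        t_hat += N_h * mean(y)
        # Assumed SRS variance estimate for stratum h, with finite
        # population correction; the strata are sampled independently,
        # so the stratum variance estimates simply add up.
        var_t += (1 - n_h / N_h) * N_h**2 * variance(y) / n_h
    se_t = math.sqrt(var_t)
    ci_t = (t_hat - z * se_t, t_hat + z * se_t)
    return t_hat, t_hat / N, se_t, ci_t


# Made-up data: two strata with population sizes 100 and 50.
example_strata = [
    (100, [10, 12, 14, 16]),
    (50, [30, 34, 38]),
]
t_hat, ybar_str, se_t, ci_t = stratified_estimates(example_strata)
print(t_hat, ybar_str, se_t, ci_t)
```

The toy samples here are far too small for the normal approximation in item 3 to be trusted; the interval is computed only to show the mechanics.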