On Non-Response Adjustment via Calibration
Michail Sverchkov, Alan H. Dorfman, Lawrence R. Ernst, Thomas G. Moerhle, Steven P. Paben and Chester H. Ponikowski
Bureau of Labor Statistics
August 2005
Introduction
We consider estimation of finite population totals in the presence of non-response, assuming that non-response arises randomly within response classes. In Section 1 we compare two regression estimators: one based on probability weights adjusted for non-response, the other based on unadjusted weights. We show that when the auxiliary variables used for the non-response adjustment are included in the estimators, the two differ only very slightly. In this case the non-response adjustment step can be omitted from the estimation process with little effect on the estimates (from Result 5 of Deville and Särndal 1992 it follows that the same remains correct for a wide class of calibration estimators). At the end of Section 1 we suggest a general approach to testing whether regression estimators based on adjusted and unadjusted weights are significantly different. In Section 2 we consider a multivariate analog of a "regression through the origin" estimator, and show that the "adjusted" and "unadjusted" estimators coincide in this case. In Section 3 we consider the important practical case in which the auxiliary variables are stratum indicators, and show that in this case all the previous regression estimators coincide. In Section 4 we consider calibration estimators under restrictions on the weights. We show that if there exists even one set of weights satisfying both the calibration equations and the restrictions, then the regression through the origin estimator does not depend on the restrictions.

1. Linear Regression Estimator
Let $S^* = \cup_{k=1}^{K} S_k^*$, with $S_k^* \cap S_{k'}^* = \emptyset$ when $k \neq k'$, let $s^*$ be a subset of $S^*$, and put $s_k^* = s^* \cap S_k^*$. Let $(y_i, \tilde{x}_i = (x_{1i},\dots,x_{Ki})^T, d_i)_{i \in s^*}$ be such that $x_{ki} = 0$ if $i \notin s_k^*$ for any $k$. Let $c_1,\dots,c_K$ be some [...] $x_i = (1, \tilde{x}_i^T)^T = (1, x_{1i},\dots,x_{Ki})^T$ and [...] where $1_A = 1$ if $A$ holds and $1_A = 0$ otherwise.
Let $t_x$ and $\tilde{t}_x$ be respectively $(K+1)$-dimensional and $K$-dimensional vectors of constants, corresponding to $x_i$ and $\tilde{x}_i$ respectively. [...] $\sum_{i \in s^*} v_i x_i = t_x$, [...] for any $s_k^*$ then [...]
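The comparison described in the Introduction can be sketched numerically. The following is a minimal simulation, not the authors' computation. It assumes simple random sampling, response probabilities that are constant within each response class, adjusted weights of the form $d_i = c_k d_i^*$ with $c_k$ the inverse response rate of class $k$, and a hypothetical population model. It also assumes the usual linear regression form $\hat{t}_y(d) + [t_x - \hat{t}_x(d)]^T B(d)$ for the estimator with intercept, and residual variance proportional to $\sum_k x_{ki}$ for the regression through the origin estimator. The function names `regression_estimate` and `origin_estimate` and all numerical settings are illustrative.

```python
import numpy as np

def regression_estimate(d, X, y, t_x):
    """t_hat_y(d) + (t_x - t_hat_x(d))^T B(d), with B(d) the d-weighted least-squares fit."""
    W = d[:, None] * X
    B = np.linalg.solve(X.T @ W, W.T @ y)
    return d @ y + (t_x - X.T @ d) @ B

def origin_estimate(d, Xt, y, t_xt):
    """Same form without intercept; residual variance proportional to sum_k x_ki."""
    s2 = Xt.sum(axis=1)                      # sigma_i^2 up to the constant sigma^2
    W = (d / s2)[:, None] * Xt
    beta = np.linalg.solve(Xt.T @ W, W.T @ y)
    return d @ y + (t_xt - Xt.T @ d) @ beta

rng = np.random.default_rng(0)
K, N, n = 3, 3000, 600                       # assumed number of classes, population and sample sizes

# Hypothetical population: x_ki is a positive size measure in class k and 0 elsewhere.
cls_pop = rng.integers(0, K, size=N)
size_pop = rng.uniform(1.0, 5.0, size=N)
Xt_pop = np.zeros((N, K))
Xt_pop[np.arange(N), cls_pop] = size_pop
y_pop = 2.0 + 3.0 * size_pop + cls_pop + rng.normal(0.0, 1.0, size=N)

# Simple random sample, then random non-response within classes (assumed rates).
sample = rng.choice(N, size=n, replace=False)
p_resp = np.array([0.9, 0.7, 0.5])[cls_pop[sample]]
resp = sample[rng.random(n) < p_resp]
cls = cls_pop[resp]

d_star = np.full(resp.size, N / n)           # unadjusted weights d*
c = np.array([(cls_pop[sample] == k).sum() / (cls == k).sum() for k in range(K)])
d_adj = c[cls] * d_star                      # adjusted weights d = c_k d*

Xt, y = Xt_pop[resp], y_pop[resp]
X = np.column_stack([np.ones(resp.size), Xt])    # x_i = (1, x~_i^T)^T
t_xt = Xt_pop.sum(axis=0)                        # known totals of x~
t_x = np.concatenate([[float(N)], t_xt])         # known totals of x

est_reg_unadj = regression_estimate(d_star, X, y, t_x)
est_reg_adj = regression_estimate(d_adj, X, y, t_x)
est_org_unadj = origin_estimate(d_star, Xt, y, t_xt)
est_org_adj = origin_estimate(d_adj, Xt, y, t_xt)

print("true total:                ", y_pop.sum())
print("with intercept, unadjusted:", est_reg_unadj)
print("with intercept, adjusted:  ", est_reg_adj)
print("through origin, unadjusted:", est_org_unadj)
print("through origin, adjusted:  ", est_org_adj)
```

In this sketch the two through-the-origin values agree up to rounding, because multiplying all weights in a class by a common factor cancels in the ratio for that class. The two estimates with intercept need not be identical, and how close they are depends on the assumed population model.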
2. [Regression Through the Origin]
Consider the estimator
$$\hat{t}^{0}_{y,reg}(d) = \hat{t}_y(d) + [\tilde{t}_x - \hat{\tilde{t}}_x(d)]^T \tilde{\beta}(d),$$
where
$$\tilde{\beta}(d) = \arg\inf_{b} \sum_{i \in s^*} \frac{d_i}{\sigma_i^2} (y_i - \tilde{x}_i^T b)^2 = \Big[\sum_{i \in s^*} \frac{d_i}{\sigma_i^2} \tilde{x}_i \tilde{x}_i^T\Big]^{-1} \sum_{i \in s^*} \frac{d_i}{\sigma_i^2} \tilde{x}_i y_i,$$
and where $\sigma_i^2 = \sigma^2 \sum_{k=1}^{K} x_{ki}$. In particular, $\hat{t}^{0}_{y,reg}$ is a ratio estimator if $K = 1$.
Note that the matrix $\sum_{i \in s^*} d_i z_i \tilde{x}_i^T$, with $z_i = \tilde{x}_i / \sigma_i^2$, is diagonal, since each $\tilde{x}_i$ has at most one non-zero component, and thus $[\sum_{i \in s^*} d_i z_i \tilde{x}_i^T]^{-1}$ is also a diagonal matrix. The $k$-th component of $\tilde{\beta}(d)$ is therefore $\sum_{i \in s_k^*} d_i y_i / \sum_{i \in s_k^*} d_i x_{ki}$, and
$$\hat{t}^{0}_{y,reg}(d) = \hat{t}_y(d) + [\tilde{t}_x - \hat{\tilde{t}}_x(d)]^T \tilde{\beta}(d) = \sum_{k=1}^{K} t_{xk} \frac{\sum_{i \in s_k^*} d_i y_i}{\sum_{i \in s_k^*} d_i x_{ki}} = \sum_{k=1}^{K} \sum_{i \in s_k^*} y_i \Big[ d_i \frac{t_{xk}}{\sum_{j \in s_k^*} d_j x_{kj}} \Big]. \qquad (12)$$
The factor in brackets is unchanged if every $d_i$, $i \in s_k^*$, is multiplied by a common constant, so weights that differ only by a constant factor within each $s_k^*$ give the same estimate.

3. Calibration On Known Totals
Under the practically important case in which auxiliary variables are strata indicators, the Linear Regression estimators with and without intercept coincide and thus $\hat{t}_{y,reg}(d) = \hat{t}_{y,reg}(d^*)$. To see this, let $x_{ki} = 1_{i \in S_k^*}$. Then $\sum_{k=1}^{K} x_{ki} = 1$ for all $i$ and $t_x = (t_{x0},\dots,t_{xK})^T$, where $t_{xk} = \#\{\text{units in the set } S_k^*\}$ is the $k$-th total for $k = 1,\dots,K$ and $t_{x0} = t_{x1} + \dots + t_{xK}$ (equals the number of units in the population). Consider
$$\hat{t}_{y,reg}(d) = \hat{t}_y(d) + [t_x - \hat{t}_x(d)]^T B(d),$$
where $B(d)$ is any solution to
$$B(d) = \arg\inf_{A} \sum_{i \in s^*} d_i (y_i - x_i^T A)^2. \qquad (13)$$
It can be shown that $[t_x - \hat{t}_x(d)]^T B(d)$ is unique (compare Valliant et al. 2000, Chapter 7). Using $\sum_{k=1}^{K} x_{ki} = 1$ for all $i$ and denoting $\tilde{A} = (a_1 + a_0,\dots,a_K + a_0)^T$ we can write $x_i^T A = \tilde{x}_i^T \tilde{A}$, and thus (13) is equivalent to minimizing $\sum_{i \in s^*} d_i (y_i - \tilde{x}_i^T b)^2$ over $b$. Since $\sigma_i^2 = \sigma^2$ for strata indicators, this is the minimization that defines $\tilde{\beta}(d)$ in Section 2, so the estimators with and without intercept coincide.

4. Bounds For Adjusted Weights
Let us return to the general expression for the calibration estimator (see the beginning of Section 1), $\hat{t}_{y,reg}(d) = \sum_{i \in s^*} v_i y_i$ and $\hat{t}^{0}_{y,reg}(d) = \sum_{i \in s^*} v_i^0 y_i$. One of the requirements that often has to be met in practice is a bound on the ratio between the final weight, $v$ or $v^0$, and the initial (frame sample) weight, $d^*$ in our case, that is
$$L < v_i / d_i^* < U. \qquad (14)$$
Multiplying (14) by $d_i^* \tilde{x}_i$ one gets $L d_i^* \tilde{x}_i < v_i \tilde{x}_i < U d_i^* \tilde{x}_i$, which implies (by summing over $s^*$ and using the calibration equations) $L \hat{\tilde{t}}_x < \tilde{t}_x < U \hat{\tilde{t}}_x$ (component-wise). Thus
$$L < \frac{\sum_{i \in S_k^*} x_{ki}}{\sum_{i \in s_k^*} d_i^* x_{ki}} < U, \quad k = 1,\dots,K. \qquad (15)$$
On the other hand, suppose (15) holds. Comparing this to (12), one can note that the central part of (15) is the benchmark factor, that is, the multiplier of $d_i$ used to get the calibration weights $v_i$. Thus a set of weights satisfying (14) exists if and only if (15) holds. Therefore the following statement is correct.
Lemma 4. If there exists a set of weights satisfying
$$\sum_{i \in s^*} v_i \tilde{x}_i = \tilde{t}_x, \quad L < v_i / d_i^* < U,$$
then $\hat{t}^{0}_{y,reg}(d)$ does not depend on $L$ and $U$. In particular, if the auxiliary variables are strata indicators, i.e. $x_{ki} = 1_{i \in S_k^*}$, then the same remains true for $\hat{t}_{y,reg}(d^*)$.
Remark 4. From Lemma 4 it follows that in the case of calibration on known totals the only way to satisfy the restrictions (14) is to collapse cells $s_k^*$ so that (15) is satisfied.

The opinions expressed in this paper are those of the authors and do not necessarily represent the policies of the Bureau of Labor Statistics.

References
Deville, J. C. and Särndal, C. E. (1992), Calibration Estimators in Survey Sampling, JASA, v. 87, No. 418, pp. 376-382.
Valliant, R., Dorfman, A. H., and Royall, R. M. (2000), Finite Population Sampling and Inference: A Prediction Approach. Wiley, New York.