THE SURVEY OF INCOME AND PROGRAM PARTICIPATION
AN INVESTIGATION OF MODEL-BASED IMPUTATION PROCEDURES USING DATA FROM THE INCOME SURVEY DEVELOPMENT PROGRAM
No. 8603
by Vicki J. Huggins and Lynn Weidman Bureau of the Census
June 1986
Acknowledgements
This paper was prepared by Vicki J. Huggins and Lynn Weidman of the Statistical Research Division, Bureau of the Census.

Suggested Citation
Huggins, Vicki J. and Lynn Weidman. "An Investigation of Model-based Imputation Procedures Using Data from the Income Survey Development Program," SIPP Working Paper Series No. 8603. Washington, D.C.: U.S. Bureau of the Census, 1986.
TABLE OF CONTENTS

Introduction
Creation of Estimation Files
Model Estimation
    Overview
    Discussion of Estimation Results
        Receipt of Wages
        Medicaid Receipt
        Earnings Amounts
        Weeks with Pay
    Missing Wage and Salary Records
Imputation Results
Conclusions
Recommendations for Further Study
Appendixes
    A. Models Fit to the Data
    B. Variable Transformation Used in Fitting Models
    C. Earnings Amounts
INTRODUCTION

The purpose of this study is to investigate the feasibility of using model-based imputation methods for record nonresponse in a longitudinal survey. Record nonresponse means that the responses to an entire set of questions (record type) are missing for a wave. In this study we have selected four variables to model and impute:

(i) rcpt = receipt of earnings;
(ii) wpay = weeks worked with pay;
(iii) earn = earnings amount; and
(iv) maid = Medicaid coverage.

Maid is on the person (P) record and the others on the wage and salary (WS) record. For any wave, a person may respond to neither record type, to P, or to both. So the first three variables are reported or missing simultaneously and maid may or may not be missing at the same time as the others.
In order to reduce the amount of data manipulation required in this study, we want to select a subset of the available ISDP waves. The methods we envision will impute months in their order of occurrence, so that all previous months of data are available at the time a given month is imputed. Thus, we will get three waves of data: waves 1 and 2 will be complete data and the variables in the months of wave 2 will be modeled. Wave 3 will include missing record types so that we may model the relationship of missing variables to responses in wave 2. We will use only one rotation group in order to reduce the amount of data manipulation required and any complications which would be caused by waves overlapping for different rotation groups; i.e., all data will cover the same three waves and 9 months.

Previous study of the relationship between demographic and employment-related variables has shown that the race (white, nonwhite) and sex status of a person is an important factor. For this reason we will attempt to put the data into four race-sex cells and model each one separately. This, in effect, models the interaction of race-sex with all the other variables in the model. Because of the small number of records available for use after fulfilling the data requirements introduced in the previous paragraph, we may not be able to fit models for all four race-sex cells. Or we may have to reduce the number of variables in some of the models.
For the data in each cell, we must estimate models and evaluate imputations which use these models. The imputations are done by month, and within month a specified order of variables is used. When imputing a variable, the current month values of all previously imputed variables on the same and other record types are available, as are observed variables from other record types. All previous month variables are available, as are all following month variables that are observed.
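The ordering described above can be illustrated with a minimal Python sketch. It is not the program used in the study: the within-month variable order, the `Record` layout, and the `carry_forward` predictor are all assumptions introduced only to show how each impute can draw on values that are already present.

```python
from typing import Callable, Dict, Optional

# One person's data: record[month][variable] -> value, or None if missing.
Record = Dict[int, Dict[str, Optional[float]]]

# Assumed within-month order; the text says an order is specified
# but does not state what it is.
VARIABLE_ORDER = ["rcpt", "wpay", "earn", "maid"]


def impute_in_order(record: Record,
                    predict: Callable[[Record, int, str], float]) -> Record:
    """Fill missing values month by month, and variable by variable within a month."""
    for month in sorted(record):
        for var in VARIABLE_ORDER:
            if record[month].get(var) is None:
                # predict() may read anything already present in `record`:
                # earlier months, earlier variables in this month, and
                # observed values in later months.
                record[month][var] = predict(record, month, var)
    return record


def carry_forward(record: Record, month: int, var: str) -> float:
    """Hypothetical stand-in predictor: reuse the previous month's value."""
    previous = record.get(month - 1, {}).get(var)
    return 0.0 if previous is None else previous
```

In practice `predict` would evaluate the fitted model for the given month and variable; the carry-forward rule is only there to make the sketch runnable.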
Of the four variables we are modeling, two of them will be treated as continuous (weeks with pay and earnings) and two of them as categorical (receipt of earnings and Medicaid coverage). Each month for each variable will be modeled separately. The explanatory variables will include those shown in Table 1 and values of some demographic variables in wave 2. For the categorical variables we will fit logit models and for the continuous variables, linear regression models.
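As a reminder of how the two model types differ when they are used to produce a fitted value, the following Python sketch evaluates a logit model and a linear regression model from a coefficient vector. The coefficient layout (intercept first) and the function names are assumptions made for illustration; the study itself estimated its models in GLIM.

```python
import math
from typing import Sequence


def linear_predictor(coefs: Sequence[float], x: Sequence[float]) -> float:
    """Intercept (coefs[0]) plus the sum of slope * explanatory value."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], x))


def logit_probability(coefs: Sequence[float], x: Sequence[float]) -> float:
    """Fitted probability for a categorical (yes/no) variable such as rcpt or maid."""
    eta = linear_predictor(coefs, x)
    # Use the numerically stable form of 1 / (1 + exp(-eta)).
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def regression_value(coefs: Sequence[float], x: Sequence[float]) -> float:
    """Fitted value for a continuous variable such as wpay or earn."""
    return linear_predictor(coefs, x)
```

The logit model returns a probability that still has to be converted into a yes/no impute, while the regression model returns the imputed amount directly.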
Table 1: Months of Variables Used in Fitting Models

Columns: Month Modeled; Variable Modeled (rcpt, wpay, earn, maid); Variable in Model (rcpt, wpay, earn, maid).

[The entries of Table 1 are not legible in this copy. For each of the three months modeled and each variable modeled, the table gave the months of rcpt, wpay, earn, and maid used as explanatory variables.]

The numbers are the months for which the variable at the top of the column is used in modeling the variable at the left.
We will discuss three major stages in this study:

1. Creation of data files that include nonresponse to be used for estimating model parameters.
2. Estimating models and searching for those most applicable.
3. Imputing values onto a data file for comparison with originally reported values.

Following that, we will present conclusions and recommendations for further study.

CREATION OF ESTIMATION FILES
A file of records to be used for model estimation was created for each of white males, white females, and nonwhites. Because of the small number of records of nonwhites available in our selected data set, we were not able to separate them by sex. When estimating models for variables in wave 2, we must allow for record types WS, WS and P, or neither being missing in each of waves 2 and 3. The records of complete respondents for wave 1 were separated into two sets.

(i) Both record types reported in wave 2. The following response patterns occurred for wave 3. (R = reported, M = missing)

[Table of wave 3 response patterns by record type (P, WS) with the number of records for each pattern; entries not legible in this copy.]

(ii) One or both record types missing in wave 2. The following response patterns occurred for waves 2 and 3.

[Table of wave 2 and wave 3 response patterns by record type (P, WS) with the number of records for each pattern; entries not legible in this copy.]

We will not simulate records with the last three patterns because of their small frequencies of occurrence.

For each demographic group, each record in (i) is assigned one of the first three patterns from (ii) or not used, according to a set of probabilities. The records selected for use are written out to form the estimation file for that group. The following are the counts of these patterns for the three estimation files:

[Table of counts by wave 2 and wave 3 pattern (P, WS) for white male, white female, and nonwhite; entries not legible in this copy.]
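The assignment step described above (each complete record receives one of the retained missingness patterns, or is left unused, according to a set of probabilities) can be sketched in Python as follows. The pattern labels and probabilities are hypothetical placeholders; the actual patterns and probabilities used in the study are not reproduced here.

```python
import random
from typing import Dict, Iterable, Optional


def assign_patterns(record_ids: Iterable[int],
                    pattern_probs: Dict[str, float],
                    seed: int = 0) -> Dict[int, Optional[str]]:
    """Give each record a missingness pattern, or None if the record is not used.

    pattern_probs maps a pattern label to its selection probability; whatever
    probability is left over (1 - sum) is the chance the record is not used.
    """
    rng = random.Random(seed)
    labels = list(pattern_probs)
    weights = [pattern_probs[label] for label in labels]
    unused = 1.0 - sum(weights)
    if unused < -1e-9:
        raise ValueError("pattern probabilities sum to more than 1")
    choices = labels + [None]
    weights.append(max(unused, 0.0))
    return {rid: rng.choices(choices, weights=weights, k=1)[0]
            for rid in record_ids}
```

A call such as `assign_patterns(range(500), {"A": 0.2, "B": 0.1, "C": 0.1})` would mark roughly 40 percent of the records with a simulated pattern and leave the rest out of the estimation file; the values of the affected record types would then be set to missing according to the assigned pattern.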
MODEL ESTIMATION

Procedure

There are 36 cases in this study for which models can be estimated (3 sex/race groups x 4 variables x 3 months). Because of the previously determined prevalence of change in response to questions from wave to wave, more models were fit for month 1 of wave 2 than for months 2 and 3. We have not had time to examine in detail all the models estimated. These include:

month 1, wave 2: rcpt - white female, nonwhite
                 earn - white female
                 wpay - white female, nonwhite
                 maid - nonwhite
month 2, wave 2 and month 3, wave 2:
                 earn - white female
                 wpay - white female, nonwhite
also missingness for WS in wave 3 for all records combined. Table 1 lists the months of data for each of these variables used when estimating a model for one of these variables in a specific month. The actual terms in the models are given in appendix A and their definitions in appendix B.

The statistical package GLIM (Generalized Linear Interactive Modeling) was used for modeling. It will estimate both linear regression and logit models, as well as many others. There are two main reasons it was selected: 1) it tells the user when there are linear dependencies among the independent variables and leaves the linearly dependent variables out of the model; and 2) it is easy to add terms to or delete terms from an existing model interactively. It also performs transformations and calculations with variables and arrays.

For each case estimated, several models were fit by adding to and subtracting from independent variables used in a prior fit. This was done to find models that used fewer terms without significantly decreasing the closeness of the model fit. In the case of linear regression we can actually perform F-tests to determine the effect of an increase or decrease in the number of terms included. For the logit models there are only asymptotically approximate chi-square tests (see appendix A), so we use our judgement to decide on a model to use for imputation. The measure of fit given by GLIM is the scaled deviance, which is the residual sum of squares for linear regression models.

Appendix A includes tables of models fit that include terms in the model, scaled deviance, and degrees of freedom. Some of the cases were modeled extensively to get a good idea of how the different variables affected the fit, but only a few models were tried for most cases.

Discussion of Estimation Results

Receipt of Wages
Logit models were fit in order to estimate the probability that a person did or did not receive wages in a given month. A difficulty encountered was that only a small percentage of persons reported no receipt of wages. For wave 2, month 1, the counts are

i. 10 of 191 white females
ii. 2 of 206 white males
iii. 7 of 134 nonwhites.

Models for white females and nonwhites were estimated. It is difficult to determine if any individual variables significantly affect receipt. The variances of parameter estimates are fairly large for most cases, especially for nonwhites. The numbers of nonreceipt are really too small to base any conclusions on them, but there are indications that the models are somewhat useful. Seven of the 10 white female nonreceipt cases have probability of nonreceipt ranging from .3433 to .8927; .0866 is the smallest. Only 12 of the 181 receipt cases have a probability as large as .1. Five of these have probability greater than .3433, with .7060 the largest. An additional 40 cases have probability between .01 and .1.

For the seven nonwhite nonreceipt cases we estimated P(no receipt) as ._048 (first digit illegible in this copy), .2523, .2607, .8811, .9965, .9988, and 1.0. Of the 127 cases with receipt, only 11 have P(no receipt) ≥ .1 and 44 have probability essentially 0. These results suggest that there are sets of variables highly correlated with receipt of wages. Further examination with more data should be done.
Medicaid Receipt

Only the nonwhites had enough cases of Medicaid receipt to attempt modeling. If Medicaid receipt was reported in a wave for a person, it was reported in all months of the wave. No one reported receiving Medicaid after not receiving Medicaid in a previous wave. Thus, we were essentially modeling the probability of discontinuing Medicaid receipt for the first month in a wave. Of the eight cases that remained on Medicaid in wave 2, seven have P(Medicaid) = 1.0 and the other P(Medicaid) = .3333. Of the 6 cases that went off Medicaid, two have P(Medicaid) = .3333 and the others, less than .0002. All those not on Medicaid in wave 3 have very small P(Medicaid) in wave 2.

This indicates some success in modeling discontinuance of Medicaid, but more data are required for further investigation.
Earnings Amounts

There are some problems that become apparent from examination of the data.

1. Some people report amounts that fluctuate with the number of pay periods or weeks in a month; others don't. (See figures C.1 to C.4 in appendix C.)

2. Do "weeks with pay" correspond directly to "monthly amounts", or can "amounts" be from the previous month's work while "weeks" is for the current month?

3. There are lots of fluctuations in earnings for some people but not for others. We can't expect to get good models by grouping them together. We suggest breaking down records into four types that can be rather easily identified.

   a. constant earnings
   b. deterministic fluctuations (e.g., due to number of weeks)
   c. random fluctuations
   d. severe fluctuations

Types (a) and (b) are easily imputed. Type (c) can be modeled; (d) can be modeled but some imputes will have large errors. These cases can be modeled together with (c) after editing extreme values.

When using the residual sum of squares to measure model goodness of fit, a few very large residuals can distort this measure. For our longitudinal data large residuals will occur when a person has earnings for a single month that are much higher or lower than in other months. In fact, for month five one residual contributes a very large percentage of the total deviance for all cases. This problem can be tackled by the use of data editing. In appendix A models are included for two types of editing for month 4 earnings: (1) not using 0 earnings when modeling; (2) editing all months according to month-to-month ratios. It is apparent that these procedures improve the overall fit, especially (2).
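The four record types suggested above could be identified along the following lines. This Python sketch is only one possible reading: the tolerance `tol` and the cutoff `severe_ratio` are hypothetical thresholds that do not come from the study, and earnings are assumed to be nonnegative.

```python
from typing import Sequence


def classify_earnings(earnings: Sequence[float],
                      weeks: Sequence[int],
                      tol: float = 0.01,
                      severe_ratio: float = 2.0) -> str:
    """Classify one person's monthly earnings series into types (a) to (d)."""
    top, bottom = max(earnings), min(earnings)
    # (a) constant: all months agree within a small relative tolerance.
    if top - bottom <= tol * top:
        return "constant"
    # (b) deterministic: earnings per week with pay are constant, so the
    # monthly amount moves only with the number of weeks in the month.
    if all(w > 0 for w in weeks):
        weekly = [e / w for e, w in zip(earnings, weeks)]
        if max(weekly) - min(weekly) <= tol * max(weekly):
            return "deterministic"
    # (d) severe: a zero month, or the largest month is far above the smallest.
    if bottom <= 0 or top / bottom > severe_ratio:
        return "severe"
    # (c) random: everything else.
    return "random"
```

For example, a series of 1000, 800, 800 with 5, 4, 4 weeks with pay is classified as deterministic because the weekly amount is 200 in every month, whereas 800, 3000, 820 with 4 weeks each is classified as severe.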
Weeks with Pay

Weeks with pay were scaled by dividing by the maximum number of work weeks in the month before modeling. Imputes would be made by determining the appropriate fraction from the model, multiplying by the maximum weeks, and rounding to the nearest integer.
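The scaling and its inverse can be written out as a short Python sketch. The clipping of the fitted fraction to the interval [0, 1] is an added assumption (a linear regression can produce fitted values outside that range); the text above only specifies the multiplication and rounding.

```python
import math


def scale_wpay(weeks: int, max_weeks: int) -> float:
    """Scale weeks with pay to a fraction of the month's maximum work weeks."""
    return weeks / max_weeks


def impute_wpay(fraction: float, max_weeks: int) -> int:
    """Convert a fitted fraction back to whole weeks, rounding to the nearest integer."""
    clipped = min(max(fraction, 0.0), 1.0)  # assumption: keep the impute in range
    return int(math.floor(clipped * max_weeks + 0.5))
```

With a 5-week month, a report of 4 weeks is modeled as 0.8, and a fitted fraction of 0.8 is imputed back as 4 weeks.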
The results for both white females and nonwhites followed the same general pattern in going from month 4 to month 6. The fit for month 4 was not significant, but was for months 5 and 6. This can be seen by looking at the F-statistics in appendix A. An examination of residuals from these models gives the same story. In month 4 only one of the records with fewer than the maximum weeks reported was fitted correctly, while about 50 percent were fitted correctly for 3 of the 4 cases in months 5 and 6. The reason for this fit pattern is probably the increase in information available for use as successive months are modeled. A reason that it is difficult in general to model wpay is that there are not many cases of fewer than maximum weeks reported (less than 10 percent for white females). Separately estimating models for people whose wpay are "frequently" less than the maximum may improve this fit.

Missing Wage and Salary Records
We wanted to see if there was any information that would indicate when a person would not respond in wave 3. That is, does one's response to questions in wave 2 tell us anything about the propensity to respond in wave 3? New estimation data sets for white males and females were created by selecting subsets directly from records of type (ii). The fits from this modeling were very poor, especially for those missing in wave 3.

IMPUTATION RESULTS
The imputation of variables onto a data file is performed by a FORTRAN program that uses the model parameters estimated by GLIM. Each month that is imputed requires a different modification of the program because different months of the independent variables are used. A version for imputing month 4 was prepared and used to impute rcpt, wpay, and earn for white females. This imputation was done for all the appropriate records with complete wave 1 and wave 2 responses. The distributions of imputed and observed values are compared below.

rcpt        yes    no
observed    549    36
imputed     580     5
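wpay
[The column alignment of this table is not recoverable in this copy. The legible entries, in the order they appear, are: observed 2 8 0 15 0; imputed 0 10 0 36 3; 5 4 1 582.]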
Earnings were arbitrarily placed into categories for the purpose of this comparison.

earnings upper bound: 200, 400, 600, 800, 1000, 1200, 1500, 2000, 2500, 3000, 4000, +
[The observed and imputed counts for these categories cannot be aligned reliably in this copy. The legible entries, in the order they appear, are: 107 79 62 75 85 8 5 4 1 5 99 102 109 108 49 48 30 30 26 31 14 12 2 4 3.]
The results for rcpt and wpay are not very good. They follow the patterns expected from the model fits as discussed previously. The agreement for earnings is very close, especially for amounts above $400. From our examination of the earnings models and residuals, we expect that there are some reported amounts close to zero that will not be imputed accurately by this model. This definitely shows up on the lower tail of the above distributions. Additional comparisons for uncategorized earnings are shown in appendix C.
CONCLUSIONS

1. Not enough cases with no receipt of wages, Medicaid coverage, or weeks with pay less than the maximum occurred to be able to model them well.

2. We should try to improve the fit for wpay in the first month of a wave. Part of our difficulty might be that month 4 can have 5 weeks, but months 2, 3, 5, 6, 7 and 8 all have 4 weeks. Another type of scaling than the one we used might be needed.

3. Imputes for rcpt are based on Prob(rcpt). Most of the nonreceipt cases have Prob(rcpt) ≤ .6567, and a small percentage of the receipt cases have probabilities that are small. The distribution of imputed rcpt would better match that of observed rcpt if we adjusted the imputation probabilities to make use of this information. One reason for this result is the very small number of nonreceipt cases.

4. Before modeling earn, the records should be separated into groups according to variability of amount reported. For the most variable groups, data editing may also be needed to improve the model fit.

5. Our attempt to model probability of nonresponse in wave 3 failed completely. If this continues to be true with other data sets, it would tell us that there are no identifiable differences between respondents and nonrespondents for this record type. This would support the application of models fit to respondents to imputation of nonrespondents.

RECOMMENDATIONS FOR FURTHER STUDY

In the current study we have accumulated knowledge about the longitudinal behavior of the variables we attempted to model, including the frequency of different responses. Much of this came about from examining the data in order to see if there were reasons for the estimated models to look as they did. Much of this knowledge is summarized in the previous section. Based on what we have learned, we suggest our work continue along the following lines.
1. Use as our data set three consecutive waves from the Survey of Income and Program Participation.

2. Construct our imputation file more carefully so that it has more records with infrequently occurring responses. (See (1) under Conclusions.)

3. Look into ways for improving the estimated models. For example, including more response variables, different functions of previously used response variables, and interactions.

4. Determine ways of classifying longitudinal patterns of observed values for earn and wpay in order to fit more accurate models.

5. Investigate the feasibility of using prob(rcpt = yes) differently for the imputation of rcpt.

6. Look further into estimating the probability of WS nonresponse. This can give more information about the nonresponse mechanism or lack thereof.

7. Fit models for all months and investigate the longitudinal consistency of the imputations.
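Recommendation 5 concerns how the fitted Prob(rcpt) is converted into an imputed yes/no value. The paper does not spell out the rule that was used, so the following Python sketch simply contrasts two hypothetical rules: a fixed cutoff, and a random draw with the fitted probability. Both function names and the cutoff of 0.5 are assumptions made for illustration.

```python
import random
from typing import List, Sequence


def impute_by_threshold(p_receipt: Sequence[float],
                        cutoff: float = 0.5) -> List[bool]:
    """Impute receipt whenever the fitted probability reaches the cutoff."""
    return [p >= cutoff for p in p_receipt]


def impute_by_draw(p_receipt: Sequence[float], seed: int = 0) -> List[bool]:
    """Impute receipt with probability equal to the fitted Prob(rcpt)."""
    rng = random.Random(seed)
    return [rng.random() < p for p in p_receipt]
```

Under a fixed cutoff, every case with a fitted probability above the cutoff is imputed as receipt, which tends to understate the rarer outcome. Under a random draw, the expected number of imputed nonreceipt cases equals the sum of the fitted nonreceipt probabilities, which is one way the imputed distribution could be brought closer to the observed one.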
APPENDIX A

The models fit to the data are summarized here. Each model is fit for a particular dependent variable, month, and demographic group. The exception is the last table for missing record type in wave 3.

Each table has four columns containing information about the model being fit. Under variables are listed the explanatory variables in the model. For model 1, this is a list of the variables. For other models, a line beginning with a "+" gives variables added to the preceding model and a line beginning with a "-" gives variables removed from the preceding model. Occasionally there will be a listing of the form "(5) + ---;" (5) is the model which is being altered at this step, not the preceding model.

Column 2 gives the scaled deviance for each model. If Lf is the likelihood of the full model (using all the information in the observations) and Lc is the likelihood of the current model, then scaled deviance is defined by

    S(c, f) = -2 log(Lc/Lf).

For the linear regression models fitted, this is the same as the residual sum of squares.

Column 3 gives the degrees of freedom (number of observations minus number of parameters estimated) for each model. For wpay, column 4 has F-tests for the significance of the regression. For other models, this column has comments concerning the correlation matrix of the estimated parameters.

In order to determine whether adding terms to a model improves or deleting terms from a model degrades the fit, we can use an asymptotic test similar to those of analysis of variance. Let model 2 with r2 degrees of freedom be nested within model 1 with r1 degrees of freedom. If the full model f has n degrees of freedom, then

    [equation not legible in this copy]

where the distribution is exact for normal error models and approximate for others. For comparing models 1 and 2, we can then look at

    [equation not legible in this copy]
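The quantities used in these comparisons can be computed as in the following Python sketch. It is a generic illustration of standard nested-model comparisons under stated assumptions, not a reconstruction of the expressions missing above: degrees of freedom are taken to be residual degrees of freedom (observations minus parameters, as in column 3), the binary deviance assumes fitted probabilities strictly between 0 and 1, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def binary_scaled_deviance(y: Sequence[int], p: Sequence[float]) -> float:
    """S(c, f) = -2 log(Lc/Lf) for 0/1 outcomes.

    The full model reproduces each observation exactly, so log Lf = 0 and
    the deviance reduces to -2 log Lc.
    """
    log_lc = sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
                 for yi, pi in zip(y, p))
    return -2.0 * log_lc


def deviance_change(dev_small: float, df_small: int,
                    dev_large: float, df_large: int) -> Tuple[float, int]:
    """Drop in scaled deviance and in residual df when terms are added.

    For logit models the drop is referred, approximately, to a chi-square
    distribution with the returned number of degrees of freedom.
    """
    return dev_small - dev_large, df_small - df_large


def f_statistic(rss_small: float, df_small: int,
                rss_large: float, df_large: int) -> float:
    """F ratio for nested normal-error (linear regression) models."""
    numerator = (rss_small - rss_large) / (df_small - df_large)
    denominator = rss_large / df_large
    return numerator / denominator
```

For instance, if a smaller regression model has residual sum of squares 120 on 100 degrees of freedom and a larger one has 100 on 96, then `f_statistic(120.0, 100, 100.0, 96)` gives 4.8, to be compared with an F distribution on 4 and 96 degrees of freedom.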
RCPT - white females - month 4

    variables                                    deviance   df    comments
1.  rm3, rm2, rm1, rp3, mm1, m0, wpm1, em1r,     41.55      150   lots of aliasing;
    mp1, wpp3, ep3                                                2 high correlations
2.  + age, ed, mars, rel                         32.32      172   no high correlations;
                                                                  used for imputation
M E DIC AID
- nonwhites vari3bles
month 4
deviance 3.56 7
df 1 20
co rn- t s -m e n
1.
rrnl,rO,rp3,ma3, mm2, m m1, mp3, wpmi, wpO,
m w k 3 , em 1 , eO, me3, em l r , eOr
i o t s of a a s i n g high corr elatians
2.
- g m , m m 3 , n m 2 , mwk3, me3, em l r , eCv
3.567
122
o n e high carrelation
3.
b.
-wpm1
3.567 3.82 4
123
-wpO, -em1
-PO
125
; 26
-
3.
3.62a
used far i n p t a t i o n
EARNINGS
- white females
nonth 4
all 1.
cases
em 3, em 2, e n 1, ep3, me3 +age, ed,
s m sa
in
: 821
+
04
185
2.
ars, rel,
0 earnings omitted
1.
rn anth 4
. .
1268
+
em3, em3a, e3, em2,ern2a, e 2 , em 1, em l a , e l , ep3, ep3a, e3?, me3 -em 3a, em 2a, em l a , ep3a
04
168
2.
1372 1311 1316 1342
+
04
1 72
used for i m putation
+ 04
+
170 172 173 152
m se increased aver (4)
04
34
A
5. +ep3a, age, ed, mars,
r e l , smsa earnings edited
1.
1285
+
04
month 4 622 1435
+
+
em3, em2, e m l , ep3, me3 log(earn) dependent
all cases rn o n t h 5
04
170 170 muchworse
2.
35
1. 2.
em3, em2, e m l , ep3 +age, ed, m a m , rel, sm sa
a l cases l
rn onth 6
3348
3067
+
04
184
3ne very large r e s i d u a l
+
04
2.
+age, ed, m ars,rel, smsa
WPAY
- whitefemdles-
month 4 deviance
variables
1.
wprn3, w=,rn2,w p m l , W P P ep3, em 1, mep3, ~, age, ed, mar, eam l , r e l , cnt, smsa, region, em 1 , ern, em 2, em 3, m wp3
C annot reject hypothesis that
regression coefficients are 0.
1.41 9
1 69
m ode1 used for i m putation
zero cases out of 1 1 where H of w pay
.4
-9
u
U
Figure C.6
Histogram of imputed amounts for amounts reported as zero

Figure C.7
Histogram of percentage error of impute for nonzero reported amounts