
Working Paper 11 An Investigation of Model-Based Imputation P

THE SURVEY OF INCOME AND PROGRAM PARTICIPATION



AN INVESTIGATION OF MODEL-BASED IMPUTATION PROCEDURES USING DATA FROM THE INCOME SURVEY DEVELOPMENT PROGRAM

No. 8603



by Vicki J. Huggins and Lynn Weidman Bureau of the Census



June 1986



Acknowledgements

This paper was prepared by Vicki J. Huggins and Lynn Weidman of the Statistical Research Division, Bureau of the Census.

Suggested Citation

Huggins, Vicki J. and Lynn Weidman. "An Investigation of Model-Based Imputation Procedures Using Data from the Income Survey Development Program," SIPP Working Paper Series No. 8603. Washington, D.C.: U.S. Bureau of the Census, 1986.






TABLE OF CONTENTS

Introduction
Creation of Estimation Files
Model Estimation
    Overview
    Discussion of Estimation Results
        Receipt of Wages
        Medicaid Receipt
        Earnings Amounts
        Weeks with Pay
        Missing Wage and Salary Records
Imputation Results
Conclusions
Recommendations for Further Study
Appendixes
    A. Models Fit to the Data
    B. Variable Transformation Used in Fitting Models
    C. Earnings Amounts



INTRODUCTION

The purpose of this study is to investigate the feasibility of using model-based imputation methods for record nonresponse in a longitudinal survey. Record nonresponse means that the responses to an entire set of questions (record type) are missing for a wave. In this study we have selected four variables to model and impute:

(i) rcpt = receipt of earnings;
(ii) wpay = weeks worked with pay;
(iii) earn = earnings amount; and
(iv) maid = Medicaid coverage.

Maid is on the person (P) record and the others are on the wage and salary (WS) record. For any wave, a person may respond to neither record type, to P, or to both. So the first three variables are reported or missing simultaneously, and maid may or may not be missing at the same time as the others.

In order to reduce the amount of data manipulation required in this study, we want to select a subset of the available ISDP waves. The methods we envision will impute months in their order of occurrence, so that all previous months of data are available at the time a given month is imputed. Thus, we will select three waves of data: waves 1 and 2 will be complete data and the variables in the months of wave 2 will be modeled. Wave 3 will include missing record types so that we may model the relationship of missing variables to responses in wave 2. We will use only one rotation group in order to reduce the amount of data manipulation required and any complications which would be caused by waves overlapping for different rotation groups; i.e., all data will cover the same three waves and 9 months.

Previous study of the relationship between demographic and employment-related variables has shown that the race (white, nonwhite) and sex status of a person is an important factor. For this reason we will attempt to put the data into four race-sex cells and model each one separately. This, in effect, models the interaction of race-sex with all the other variables in the model. Because of the small number of records available for use after fulfilling the data requirements introduced in the previous paragraph, we may not be able to fit models for all four race-sex cells. Or we may have to reduce the number of variables in some of the models.

For the data in each cell, we must estimate models and evaluate imputations which use these models. The imputations are done by month, and within month a specified order of variables is used. When imputing a variable, the current month values of all previously imputed variables on the same and other record types are available, as are observed variables from other record types. All previous month variables are available, as are all following month variables that are observed.
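The ordering rule in this paragraph can be sketched in Python. This is only an illustration of the sequencing idea, not the authors' program: the within-month order in `VARIABLE_ORDER`, the function `impute_record`, and the `predict` callback are hypothetical names introduced here.

```python
# Hypothetical sketch: impute months in their order of occurrence, and
# variables within a month in a fixed order, so that each imputation can
# use every value that has already been observed or imputed.
VARIABLE_ORDER = ["rcpt", "wpay", "earn", "maid"]  # assumed order, for illustration only


def impute_record(record, months, predict):
    """Fill missing (None) values of record[(month, variable)] in place.

    predict(record, month, variable) is supplied by the caller and may read
    any value already present in `record`, including earlier imputations.
    Returns the list of (month, variable) keys that were imputed.
    """
    imputed = []
    for month in months:                # months in order of occurrence
        for variable in VARIABLE_ORDER:  # specified order within the month
            if record.get((month, variable)) is None:
                record[(month, variable)] = predict(record, month, variable)
                imputed.append((month, variable))
    return imputed
```

A toy `predict` that simply carries the previous month's value forward is enough to exercise the loop; a real one would evaluate the fitted model for that month and variable.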

Of the four variables we are modeling, two of them will be treated as continuous (weeks with pay and earnings) and two of them as categorical (receipt of earnings and Medicaid coverage). Each month for each variable will be modeled separately. The explanatory variables will include those shown in Table 1 and values of some demographic variables in wave 2. For the categorical variables we will fit logit models and for the continuous variables, linear regression models.

Table 1: Months of Variables Used in Fitting Models

Rows: month modeled and variable modeled (rcpt, wpay, earn, maid).
Columns: variable in model (rcpt, wpay, earn, maid).

[The numeric entries of this table are not legible in the source.]

The numbers are the months for which the variable at the top of the column is used in modeling the variable at the left.

We will discuss three major stages in this study:

1. Creation of data files that include nonresponse to be used for estimating model parameters.

2. Estimating models and searching for those most applicable.

3. Imputing values onto a data file for comparison with originally reported values.

Following that, we will present conclusions and recommendations for further study.

CREATION OF ESTIMATION FILES

A file of records to be used for model estimation was created for each of white males, white females, and nonwhites. Because of the small number of records of nonwhites available in our selected data set, we were not able to separate them by sex. When estimating models for variables in wave 2, we must allow for record types WS, WS and P, or neither being missing in each of waves 2 and 3. The records of complete respondents for wave 1 were separated into two sets.



(i) Both record types reported in wave 2. The following response patterns occurred for wave 3. (R = reported, M = missing)

    record type P    record type WS    number
    [entries not legible in the source]

(ii) One or both record types missing in wave 2. The following response patterns occurred for waves 2 and 3.

    wave 2 (P, WS)    wave 3 (P, WS)    number
    [entries not legible in the source]

We will not simulate records with the last three patterns because of their small frequencies of occurrence.

For each demographic group, each record in (i) is assigned one of the first three patterns from (ii) or not used, according to a set of probabilities. The records selected for use are written out to form the estimation file for that group.
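The assignment step can be sketched in Python as a draw from a discrete distribution whose leftover probability mass means "not used". The pattern labels and probabilities below are placeholders; the values actually used in the study are not given in the text, and `assign_patterns` is a hypothetical helper.

```python
import random


def assign_patterns(records, patterns, probs, rng):
    """Give each complete record one missingness pattern, or leave it out.

    patterns: pattern labels; probs: matching probabilities that sum to at
    most 1. The remaining mass, 1 - sum(probs), is the chance that a record
    is not used at all. Returns a list of (record, pattern) pairs.
    """
    if len(patterns) != len(probs):
        raise ValueError("patterns and probs must have the same length")
    if any(p < 0 for p in probs) or sum(probs) > 1.0 + 1e-9:
        raise ValueError("probs must be non-negative and sum to at most 1")
    selected = []
    for rec in records:
        u = rng.random()            # uniform draw on [0, 1)
        cumulative = 0.0
        for pattern, p in zip(patterns, probs):
            cumulative += p
            if u < cumulative:
                selected.append((rec, pattern))
                break
        # falling through the loop means the record is not used
    return selected
```

For example, `assign_patterns(records, ["A", "B", "C"], [0.2, 0.1, 0.1], random.Random(1))` would keep roughly 40 percent of the records, each tagged with one of three illustrative patterns.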



The following are the counts of these patterns for the three estimation files:

    wave 2 (P, WS)    wave 3 (P, WS)    white male    white female    nonwhite
    [entries not legible in the source]

MODEL ESTIMATION

Procedure

There are 36 cases in this study for which models can be estimated: 3 sex/race groups x 4 variables x 3 months. Because of the previously determined prevalence of change in response to questions from wave to wave, more models were fit for month 1 of wave 2 than for months 2 and 3. We have not had time to examine in detail all the models estimated. These include:

month 1, wave 2:
    rcpt - white female, nonwhite
    earn - white female
    wpay - white female, nonwhite
    maid - nonwhite
month 2, wave 2 and month 3, wave 2:
    earn - white female
    wpay - white female, nonwhite

also missingness for WS in wave 3 for all records combined. Table 1 lists the months of data for each of these variables used when estimating a model for one of these variables in a specific month. The actual terms in the models are given in appendix A and their definitions in appendix B. The statistical package GLIM (Generalized Linear Interactive Modeling) was used for modeling.

It will estimate both linear regression and logit models, as well as many others. There are two main reasons it was selected: 1) it tells the user when there are linear dependencies among the independent variables and leaves linearly dependent variables out of the model; and 2) it is easy to add terms to or delete terms from an existing model interactively. It also performs transformations and calculations with variables and arrays.

For each case estimated, several models were fit by adding to and subtracting from independent variables used in a prior fit. This was done to find models that used fewer terms without significantly decreasing the closeness of the model fit. In the case of linear regression we can actually perform F-tests to determine the effect of an increase or decrease in the number of terms included. For the logit models there are only asymptotically approximate chi-square tests (see appendix A), so we use our judgement to decide on a model to use for imputation. The measure of fit given by GLIM is the scaled deviance, which is the residual sum of squares for linear regression models.

Appendix A includes tables of models fit that include terms in the model, scaled deviance, and degrees of freedom. Some of the cases were modeled extensively to get a good idea of how the different variables affected the fit, but only a few models were tried for most cases.

Discussion of Estimation Results

Receipt of Wages



Logit models were fit in order to estimate the probability that a person did or did not receive wages in a given month. A difficulty encountered was that only a small percentage of persons reported no receipt of wages. For wave 2, month 1, the counts are

i. 10 of 191 white females
ii. 2 of 206 white males
iii. 7 of 134 nonwhites.

Models for white females and nonwhites were estimated. It is difficult to determine if any individual variables significantly affect receipt. The variances of parameter estimates are fairly large for most cases, especially for nonwhites. The numbers of nonreceipt are really too small to base any conclusions on them, but there are indications that the models are somewhat useful. Seven white females of the 10 nonreceipt cases have probability of nonreceipt ranging from .3433 to .8927; .0866 is the smallest. Only 12 of the 181 receipt cases have a probability as large as .1. Five of these have probability greater than .3433, with .7060 the largest. An additional 40 cases have probability between .01 and .1.

For the seven nonwhite nonreceipt cases we estimated P(no receipt) as .2523, .2607, .8811, .9965, .9988, 1.0, and one further value that is only partly legible in the source (ending in 048). Of the 127 cases with receipt, only 11 have P(no receipt) ≥ .1 and 44 have probability essentially 0. These results suggest that there are sets of variables highly correlated with receipt of wages. Further examination with more data should be done.
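To make the fitted probabilities concrete, the following Python sketch evaluates a logit model for one person and turns the result into an imputed receipt value. The text does not say exactly how a probability is converted into an impute; a random draw, assumed here, is one common choice. The function names and any coefficients passed in are illustrative, not taken from the paper.

```python
import math
import random


def logit_probability(intercept, coefs, x):
    """P(event) under a logit model with linear predictor b0 + sum(b_j * x_j)."""
    eta = intercept + sum(b * v for b, v in zip(coefs, x))
    # Two algebraically equal forms, chosen so that exp() never overflows.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def impute_receipt(p_no_receipt, rng):
    """Draw 'no' with probability p_no_receipt, otherwise 'yes' (assumed rule)."""
    return "no" if rng.random() < p_no_receipt else "yes"
```

With only a handful of nonreceipt cases, most fitted values of P(no receipt) are small, so a draw of this kind produces few imputed nonreceipts.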



Medicaid Receipt

Only the nonwhites had enough cases of Medicaid receipt to attempt modeling. If Medicaid receipt was reported in a wave for a person, it was reported in all months of the wave. No one reported receiving Medicaid after not receiving Medicaid in a previous wave. Thus, we were essentially modeling the probability of discontinuing Medicaid receipt for the first month in a wave. Of the eight cases that remained on Medicaid in wave 2, seven have P(Medicaid) = 1.0 and the other P(Medicaid) = .3333. Of the 6 cases that went off Medicaid, two have P(Medicaid) = .3333 and the others, less than .0002. All those not on Medicaid in wave 3 have very small P(Medicaid) in wave 2.

This indicates some success in modeling discontinuance of Medicaid, but more data is required for further investigation.

Earnings Amounts

There are some problems that become apparent from examination of the data.

1. Some people report amounts that fluctuate with the number of pay periods or weeks in a month; others don't. (See figures C.1 to C.4 in appendix C.)

2. Do "weeks with pay" correspond directly to "monthly amounts", or can "amounts" be from the previous month's work while "weeks" is for the current month?

3. There are lots of fluctuations in earnings for some people but not for others. We can't expect to get good models by grouping them together. We therefore suggest breaking down records into four types that can be easily identified.

    a. constant earnings
    b. deterministic fluctuations (e.g., due to number of weeks)
    c. random fluctuations
    d. severe fluctuations

Types (a) and (b) are easily imputed. Type (c) can be modeled; (d) can be modeled but some imputes will have large errors. These cases can be modeled together with (c) after editing extreme values.

When using the residual sum of squares to measure model goodness of fit, a few very large residuals can distort this measure. For our longitudinal data large residuals will occur when a person has earnings for a single month that are much higher or lower than in other months. In fact, for month five one residual contributes a very large percentage of the total deviance for all cases. This problem can be tackled by the use of data editing. In appendix A models are included for two types of editing for month 4 earnings: (1) not using 0 earnings when modeling; (2) editing all months according to month-to-month ratios. It is apparent that these procedures improve the overall fit, especially (2).
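How strongly one case can dominate the residual sum of squares is easy to check directly. The Python sketch below computes each case's share of the total; the function name `residual_shares` and the numbers in the usage note are invented for illustration and are not data from the study.

```python
def residual_shares(observed, fitted):
    """Return each case's share of the residual sum of squares.

    For a linear regression the residual sum of squares equals the scaled
    deviance, so a share near 1 marks a case that dominates the fit measure.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    squared = [(y - f) ** 2 for y, f in zip(observed, fitted)]
    total = sum(squared)
    if total == 0:
        return [0.0] * len(squared)   # perfect fit: no case contributes
    return [s / total for s in squared]
```

For instance, with made-up observed earnings of 100, 100, and 1000 and fitted values of 100, 110, and 100, the third case accounts for more than 99 percent of the residual sum of squares, which is the kind of case that data editing is meant to address.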



Weeks with Pay

Weeks with pay were scaled by dividing by the maximum number of work weeks in the month before modeling. Imputes would be made by determining the appropriate fraction from the model, multiplying by the maximum weeks, and rounding to the nearest integer.
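The scaling and its inverse can be written as a short Python sketch. Clipping the predicted fraction to the interval [0, 1] and rounding halves upward are assumptions made here; the text specifies only "rounding to the nearest integer". The function names are illustrative.

```python
def scale_weeks(weeks, max_weeks):
    """Scale reported weeks with pay to a fraction of the month's maximum."""
    return weeks / max_weeks


def impute_weeks(fraction, max_weeks):
    """Turn a model-predicted fraction back into a whole number of weeks.

    The fraction is clipped to [0, 1] (an assumption) because a linear
    regression can predict values outside that range.
    """
    clipped = min(max(fraction, 0.0), 1.0)
    # int(x + 0.5) rounds halves upward for non-negative x; Python's
    # built-in round() would instead round halves to the nearest even number.
    return int(clipped * max_weeks + 0.5)
```

A predicted fraction of 0.9 in a 4-week month gives 3.6, which rounds to 4 weeks; the same fraction in a 5-week month gives 4.5, which rounds to 5 under this rule.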



The results for both white females and nonwhites followed the same general pattern in going from month 4 to month 6. The fit for month 4 was not significant, but was for months 5 and 6. This can be seen by looking at the F-statistics in appendix A. An examination of residuals from these models gives the same story. In month 4 only one of the records with fewer than the maximum weeks reported was fitted correctly, while about 50 percent were fitted correctly for 3 of the 4 cases in months 5 and 6. The reason for this fit pattern is probably the increase in information available for use as successive months are modeled. A reason that it is difficult in general to model wpay is that there are not many cases of fewer than maximum weeks reported (less than 10 percent for white females). Separately estimating models for people whose wpay are "frequently" less than the maximum may improve this fit.

Missing Wage and Salary Records



We wanted to see if there was any information that would indicate when a person would not respond in wave 3. That is, does one's response to questions in wave 2 tell us anything about the propensity to respond in wave 3? New estimation data sets for white males and females were created by selecting subsets directly from records of type (i). The fits from this modeling were very poor, especially for those missing in wave 3.

IMPUTATION RESULTS



The imputation of variables onto a data file is performed by a FORTRAN program that uses the model parameters estimated by GLIM. Each month that is imputed requires a different modification of the program because different months of the independent variables are used. A version for imputing month 4 was prepared and used to impute rcpt, wpay, and earn for white females. This imputation was done for all the appropriate records with complete wave 1 and wave 2 responses. The distributions of imputed and observed values are compared below.



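rcpt
                yes     no
    observed    549     36
    imputed     580      5

wpay
    [Observed and imputed counts by number of weeks with pay. The cell alignment is not recoverable from the source. Values legible in the source, in the order they appear: 2; 8 0; 15 0; 0; 10 0; 36; 3; 5 4 1 582.]

Earnings were arbitrarily placed into categories for the purpose of this comparison.

earnings
    upper bounds: 200, 400, 600, 800, 1000, 1200, 1500, 2000, 2500, 3000, 4000, 4000+
    [Observed and imputed counts by category. The cell alignment is not recoverable from the source. Values legible in the source, in the order they appear: 107 79 62 75; 85 8 5; 4 1 5; 99 102 109 108; 49 48; 30 30; 26 31; 14 12; 2 4 3.]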



The results for rcpt and wpay are not very good. They follow the patterns expected from the model fits as discussed previously. The agreement for earnings is very close, especially for amounts above $400. From our examination of the earnings models and residuals, we expect that there are some reported amounts close to zero that will not be imputed accurately by this model. This definitely shows up on the lower tail of the above distributions. Additional comparisons for uncategorized earnings are shown in appendix C.



CONCLUSIONS

1. Not enough cases with no receipt of wages, Medicaid coverage, or weeks with pay less than the maximum occurred to be able to model them well.

2. We should try to improve the fit for wpay in the first month of a wave. Part of our difficulty might be that month 4 can have 5 weeks, but months 2, 3, 5, 6, 7 and 8 all have 4 weeks. Another type of scaling than the one we used might be needed.

3. Imputes for rcpt are based on Prob(rcpt). Most of the nonreceipt cases have Prob(rcpt) ≤ .6567, and a small percentage of the receipt cases have probabilities that are small. The distribution of imputed rcpt would better match that of observed rcpt if we adjusted the imputation probabilities to make use of this information. One reason for this result is the very small number of nonreceipt cases.

4. Before modeling earn, the records should be separated into groups according to variability of amount reported. For the most variable groups, data editing may also be needed to improve the model fit.

5. Our attempt to model probability of nonresponse in wave 3 failed completely. If this continues to be true with other data sets, it would tell us that there are no identifiable differences between respondents and nonrespondents for this record type. This would support the application of models fit to respondents to imputation of nonrespondents.

RECOMMENDATIONS FOR FURTHER STUDY

In the current study we have accumulated knowledge about the longitudinal behavior of the variables we attempted to model, including the frequency of different responses. Much of this came about from examining the data in order to see if there were reasons for the estimated models to look as they did. Much of this knowledge is summarized in the previous section. Based on what we have learned, we suggest our work continue along the following lines.

1. Use as our data set three consecutive waves from the Survey of Income and Program Participation.

2. Construct our imputation file more carefully so that it has more records with infrequently occurring responses. (See (1) under Conclusions.)

3. Look into ways for improving the estimated models. For example, including more response variables, different functions of previously used response variables, and interactions.

4. Determine ways of classifying longitudinal patterns of observed values for earn and wpay in order to fit more accurate models.

5. Investigate the feasibility of using prob(rcpt = yes) differently for the imputation of rcpt.

6. Look further into estimating the probability of WS nonresponse. This can give more information about the nonresponse mechanism or lack thereof.

7. Fit models for all months and investigate the longitudinal consistency of the imputations.



APPENDIX A

The models fit to the data are summarized here. Each model is fit for a particular dependent variable, month, and demographic group. The exception is the last table for missing record type in wave 3.

Each table has four columns containing information about the model being fit. Under variables are listed the explanatory variables in the model. For model 1, this is a list of the variables. For other models, a line beginning with a "+" gives variables added to the preceding model and a line beginning with a "-" gives variables removed from the preceding model. Occasionally there will be a listing of the form "(5) + ---;" (5) is the model which is being altered at this step, not the preceding model.

Column 2 gives the scaled deviance for each model. If L_f is the likelihood of the full model (using all the information in the observations) and L_c is the likelihood of the current model, then scaled deviance is defined by

    S(c, f) = -2 log (L_c / L_f).

For the linear regression models fitted, this is the same as the residual sum of squares.

Column 3 gives the degrees of freedom (number of observations minus number of parameters estimated) for each model.

For wpay, column 4 has F-tests for the significance of the regression. For other models, this column has comments concerning the correlation matrix of the estimated parameters.

In order to determine whether adding terms to a model improves or deleting terms from a model degrades the fit, we can use an asymptotic test similar to those of analysis of variance. Let model 2 with r2 degrees of freedom be nested within model 1 with r1 degrees of freedom. If the full model f has n degrees of freedom, then

    [equation not legible in the source]

where the distribution is exact for normal error models and approximate for others. For comparing models 1 and 2, we can then look at

    [equation not legible in the source]
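The test statistic itself is not legible in the source. For linear regression, a standard comparison of nested models, which may or may not match the authors' exact expression, is the F statistic built from the two residual sums of squares and their residual degrees of freedom. The Python sketch below computes it; `nested_f_statistic` is a name introduced here for illustration.

```python
def nested_f_statistic(rss_small, df_small, rss_large, df_large):
    """F statistic for comparing a smaller model nested in a larger one.

    rss_*: residual sum of squares (the scaled deviance for linear regression).
    df_*:  residual degrees of freedom (observations minus parameters).
    The smaller model has fewer terms and therefore more residual df.
    """
    if df_small <= df_large:
        raise ValueError("the smaller model must have more residual df")
    if rss_large <= 0 or df_large <= 0:
        raise ValueError("the larger model needs positive RSS and df")
    # Mean reduction in RSS per extra term, relative to the larger model's
    # residual mean square.
    numerator = (rss_small - rss_large) / (df_small - df_large)
    denominator = rss_large / df_large
    return numerator / denominator
```

Under normal errors the result would be referred to an F distribution with (df_small - df_large, df_large) degrees of freedom. For the logit models only an approximate chi-square comparison of the deviance difference is available, as noted in the main text.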



RCPT - white females - month 4

    variables                                        deviance   df    comments
1.  rm3, rm2, rm1, rp3, mm1, m0, wpm1, em1n, mp1,    41.55      150   lots of aliasing; 2 high correlations
    wpp3, ep3
2.  + age, ed, mars, rel                             32.32      172   no high correlations; used for imputation



MEDICAID - nonwhites - month 4

    variables                                        deviance   df    comments
1.  rm1, r0, rp3, mm3, mm2, mm1, mp3, wpm1, wp0,     3.567      120   lots of aliasing; high correlations
    mwk3, em1, e0, me3, em1r, e0r
2.  - [illegible], mm3, mm2, mwk3, me3, em1r, e0r    3.567      122   one high correlation
3.  - wpm1                                           3.567      123
4.  - wp0, - em1                                     3.824      125
5.  - [illegible]0                                   3.62[?]    126   used for imputation



EARNINGS - white females

[The deviance, df, and comments columns of this table are only partly legible in the source. The model terms that can be read are listed below.]

month 4, all cases
1.  em3, em2, em1, ep3, me3
2.  + age, ed, mars, rel, smsa

month 4, 0 earnings omitted
1.  em3, em3a, e3, em2, em2a, e2, em1, em1a, e1, ep3, ep3a, e3?, me3
2.  - em3a, em2a, em1a, ep3a
5.  + ep3a, age, ed, mars, rel, smsa

month 4, earnings edited
1.  em3, em2, em1, ep3, me3
2.  log(earn) dependent

month 5, all cases
1.  em3, em2, em1, ep3
2.  + age, ed, mars, rel, smsa

month 6, all cases
2.  + age, ed, mars, rel, smsa

Legible comments: "used for imputation"; "mse increased over (4)"; "much worse"; "one very large residual".



WPAY - white females - month 4

    variables                                        deviance   df    comments
1.  wpm3, wpm2, wpm1, wpp3, ep3, em1, mep3, age,     1.419      169   Cannot reject hypothesis that regression
    ed, mar, eam1, rel, cnt, smsa, region, em1,                       coefficients are 0. Model used for
    ern, em2, em3, mwp3                                               imputation.

[A further comment beginning "zero cases out of 11 where ..." is only partly legible in the source.]






Figure C.6 Histogram of imputed amounts for amounts reported as zero

Figure C.7 Histogram of percentage error of impute for nonzero reported amounts


