CIQLE Workshop Introduction to longitudinal data analysis with ...

CIQLE Workshop: Introduction to longitudinal data analysis with Stata
Panel models and event history analysis
Silke Aisenbrey, Yale University

Goals for the workshop
- Intro to Stata
- Modeling change over time: panel regression models (fixed, between and random effects)
- Modeling whether and/or when events occur: event history analysis (data management for event history data, Kaplan-Meier, Cox, piecewise constant)

Open Stata
- VARIABLES: variables of the open file
- RESULTS: results and syntax
- REVIEW: review of syntax
- COMMAND
Work with commands or with the menu.

Open data
Open the data with the menu (Stata data --> eventex.dta)
- to see the real data
- to make changes directly in the data: erase variables or cases, make single changes in cases

Relational and logical operators in Stata
    ==   is equal to
    ~=   is not equal to (also !=)
    >    greater than
    <    less than
    >=   greater than or equal
    <=   less than or equal
    &    and
    |    or
    ~    not (also !)

Basic descriptive commands
    sum var
    tab var1 var2
    tab var1 var2, col
combine with:
    ... if var1==2 & var3>0
    by var1: ...
    sort ...
exercise, e.g.:
    tab abitur sex, col
    tab abitur sex if cohort==1930, col
    sort cohort
    by cohort: tab abitur sex

Basic commands for data management
    help command
    gen var1 = var2
    recode var1 (0=.) (1/8=2) (9=3)
    rename var1 var100
Use the following variables:
- cohort (indicator of cohort membership)
- sex (1=male, 2=female)
- agemaryc (age @ first marriage)
exercise, e.g.:
    sum agemaryc
recode age @ marriage into groups:
- generate a new variable
- recode the new variable into groups
- recode if marcens==0

possible break

Intro to panel regression with Stata
- panel data
- fixed effects
- between effects
- random effects
- fixed or random?

Panel data (panelex1.dta)
Panel data, also called cross-sectional time series data, are data in which multiple cases (people, firms, countries etc.) are observed at two or more time periods.
- Cross-sectional data: only information about variance between subjects
- Panel data: two kinds of information, between and within subjects --> two sources of variance

Janet: Basics of panel regression models

Cross-sectional vs. panel analyses
open panelex1.dta
Ignore the fact that we have repeated measures:
    regress income childrn
conclusion: more children --> higher income

Fixed effects model
Answers the question: what is the effect of x when x changes within persons over time? E.g. Person A has two children at the first point in time and three children at the second; what effect does this change have on income?
- Information used: fixed effects estimates use the time-series information in the data
- Variance analyzed: within
- Problems: only time-variant variables can be included

Fixed effects exercise
Run a separate regression for each unit and then average the coefficients:
    regress income childrn if id==1
    regress income childrn if id==2
    (slope for id 1 + slope for id 2) / 2 = -2.5
conclusion: more children --> lower income
exercise: generate a dummy variable for each person and regress with the dummy variables
    tab id, g(iddum)
    reg income childrn iddum1 iddum2

Fixed effects
- define the data set as panel data
    tsset id t
- regression with the fixed effects command
    xtreg income childrn, fe

Between effects model
Answers the question: what is the effect of x when x differs between persons? Person A has "on average" three children and Person B has "on average" five children; what effect does this difference have on their income?
In the between effects model we model the mean response, where the means are calculated for each of the units.
- Information used: cross-sectional information (between subjects)
- Variance analyzed: between variance
- Time-variant and time-invariant variables

Between effects
average the values for each person ---> regress income childrn
conclusion: more children --> more income
define the data as panel data, then:
    xtreg dependent independent, be

Random effects model
Assumption: no difference between the answers to the two questions:
1) What is the effect of x when x changes within the person? Person A has two children at the first point in time and three children at the second; what effect does this change have on their income?
2) What is the effect of x when x differs between persons? Person A has two children and Person B has three children; what effect does this difference have on their income?
- Information used: panel and cross-sectional (between and within subjects)
- Variance analyzed: between variance and within variance
- Time-variant and time-invariant variables

Random effects model
- matrix-weighted average of the fixed and the between estimates
- assumes b1 has the same effect in the cross section as in the time series
- requires that individual error terms are treated as random variables and follow the normal distribution
use:
    xtreg dependent independent if var==x, re

possible break

open data: panelex2.dta
varlist:

Tell Stata the structure of the data:
    tsset X Y
    X = caseid
    Y = time/wave
summary statistics:
    xtdes
    xtsum

Use the effects
    xtreg dependent independent if sex==1, fe
    xtreg dependent independent if sex==1, be
    xtreg dependent independent if sex==1, re
exercise: compare/discuss models, e.g.:
    xtreg indvar1 indvar2 ... if sex==1, fe
- try to include time-invariant variables
- try to make a theoretical/empirical argument for why you use which model

Problems/Tests/Solutions: What's the right model, fixed or random effects?
Test: Hausman test
Null hypothesis: the coefficients estimated by the efficient random effects estimator are the same as those estimated by the consistent fixed effects estimator.
- If same (insignificant p-value, Prob>chi2 larger than .05) --> safe to use random effects.
- If significant p-value --> use fixed effects.
    xtreg y x1 x2 x3 ... , fe
    estimates store fixed
    xtreg y x1 x2 x3 ... , re
    estimates store random
    hausman fixed random

Problems/Tests/Solutions: Autocorrelation?
What is autocorrelation: last time period's values affect current values
test: xtserial
Install the user-written program; type
    findit xtserial
or
    net search xtserial
then run:
    xtserial depvar indepvars
A significant test statistic indicates the presence of serial correlation.
Solution: use a model correcting for autocorrelation, xtregar instead of xtreg

possible break

Different data structure
panel
- waves
- number of children @ wave 1 / 2 / 3 / 4
- employed @ wave 1 / 2 / 3 / 4
- income @ wave 1 / 2 / 3 / 4
- regression models: dependent variable continuous
event
- dates of events
- birth of first child @ 1963
- birth of second child @ 1966 ...
- start of first employment @ ...
- start of unemployment @ ...
- start of second employment @ ...
- time information in event data more precise
- dependent variable: event happens 0/1

Different faces of event history data
Time: continuous or discrete

Types of censoring
- Subject does not experience the event of interest
- Incomplete follow-up
  - Lost to follow-up
  - Withdraws from study
- Left or right censored

open data: eventex.dta

Tell Stata that our data is "survival data": stset
    stset X, failure(Y) id(Z)
    X = time at which the event happens or the case is right censored; this is always needed
    Y = 0 or missing means censored; all other values are interpreted as an event taking place / failure
    Z = id
three examples:
1) stset ageendsch
   event: end of school; time: age @ end of school
2) stset agemaryc, failure(marcens) id(caseid)
   event: marriage
3) stset agestjob, failure(stjob) id(caseid)
   event: first job

Data management: Hannah

Different models of event history
Time continuous
- non-parametric: Kaplan-Meier, Nelson-Aalen, log-rank test for comparison between groups
- semi-parametric: Cox, piecewise constant
- parametric: exponential, Weibull, log-logistic, lognormal, Gompertz, generalized gamma
Time discrete
- logistic
- log-log
Covariates
- non-parametric: only qualitative covariates, compare survival experiences between groups (sex, cohorts), univariate
- models: inclusion of covariates in models, multivariate
Extended from Jenkins 2005

Survivor function and hazard function
- Survivor function, S(t): defines the probability of surviving longer than time t
- Hazard (instantaneous hazard, force of mortality): the risk that an event will occur during a time interval (Δ(t)) at time t, given that the subject did not experience the event before that time
- Survivor and hazard functions can be converted into each other

Non-parametric: Kaplan-Meier
List the Kaplan-Meier survivor function
    sts list
    sts list, by(sex) compare
Graph the Kaplan-Meier survivor function
    sts graph
    sts graph, by(sex)
exercise: stset your data for marriage, end of school or first job, e.g.:
1) sts list
2) sts graph
3) sts list, by(...) compare
4) sts graph, by(...)

Non-parametric: Nelson-Aalen
List the Nelson-Aalen cumulative hazard function
    sts list, na
    sts list, na by(sex) compare
Graph the Nelson-Aalen cumulative hazard function
    sts graph, na
    sts graph, na by(sex)
exercise: stset your data for marriage, end of school or first job
1) sts list, na
2) sts graph, na
3) sts list, na by(...) compare
4) sts graph, na by(...)
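What sts list and sts list, na report can be sketched outside Stata. The following is a minimal plain-Python illustration of the standard Kaplan-Meier product-limit and Nelson-Aalen formulas, not Stata's implementation; the function name km_and_na and the six ages are hypothetical and are not taken from eventex.dta.

```python
from collections import Counter

def km_and_na(times, events):
    """Kaplan-Meier survivor and Nelson-Aalen cumulative hazard estimates.

    times  : observed time for each subject (event time or censoring time)
    events : 1 if the event happened at that time, 0 if right-censored
    Returns one row (time, at_risk, failures, S_km, H_na) per distinct failure time.
    """
    failures = Counter(t for t, e in zip(times, events) if e)
    rows, surv, cumhaz = [], 1.0, 0.0
    for t in sorted(failures):
        # Subjects still under observation just before t (censored at t count as at risk)
        at_risk = sum(1 for u in times if u >= t)
        d = failures[t]
        surv *= 1 - d / at_risk      # Kaplan-Meier: product of (1 - d/n)
        cumhaz += d / at_risk        # Nelson-Aalen: running sum of d/n
        rows.append((t, at_risk, d, surv, cumhaz))
    return rows

# Hypothetical ages at first marriage; 0 marks a person not yet married when last observed
ages = [20, 22, 22, 25, 27, 30]
married = [1, 1, 0, 1, 0, 1]

table = km_and_na(ages, married)
print("age  at_risk  fail   S(t)     H(t)")
for t, n, d, s, h in table:
    print(f"{t:>3} {n:>8} {d:>5} {s:8.4f} {h:8.4f}")
```

Censored cases never count as failures, but they stay in the risk set until their censoring time, which is why the person censored at 22 still contributes to the denominator at age 22.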

Non-parametric: Kaplan-Meier
Comparing Kaplan-Meier curves
- The log-rank test can be used to compare survival curves
- Hypothesis test (test of significance)
  H0: the curves are statistically the same
  H1: the curves are statistically different
- Compares observed to expected cell counts
Example shown: age @ marriage
exercise: test equality of survivor functions, e.g.:
    sts test abitur

Limit of Kaplan-Meier curves
- What happens when you have several covariates that you believe contribute to survival?
- Example: education, marital status, children, gender contribute to job change
- Can use K-M curves for 2 or maybe 3 covariates
- Need another approach for many covariates: the multivariate Cox proportional hazards model is most common

Semi-parametric models: Cox
Cox proportional hazards model
- Can handle both continuous and categorical predictor variables
- Without knowing the baseline hazard h0(t), can still calculate coefficients for each covariate, and therefore the hazard ratio
- Assumes multiplicative risk --> proportional hazards assumption

Semi-parametric models: Cox
example: age of first marriage
    stcox sex
Interpretation: because the Cox model does not estimate a baseline, there is no intercept in the output.
sex (1=male, 2=female): whatever the hazard rate at a particular time is for men, it is 1.5 times as high for women.
What does this mean in our case? Women get married younger than men do.
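Why a hazard ratio of 1.5 means earlier marriage can be made concrete. Under proportional hazards the two survivor functions are linked by S_women(t) = S_men(t) ** 1.5, so at every age a smaller share of women is still unmarried. The sketch below only illustrates that identity; the function name scaled_survivor and the survivor values for men are hypothetical, not estimates from the workshop data.

```python
def scaled_survivor(baseline_survivor, hazard_ratio):
    """Survivor values for a group whose hazard is hazard_ratio times the baseline hazard.

    Uses the proportional hazards identity S1(t) = S0(t) ** hazard_ratio.
    """
    return [s ** hazard_ratio for s in baseline_survivor]

ages = [20, 25, 30, 35]
s_men = [0.95, 0.70, 0.40, 0.25]        # hypothetical share of men still unmarried
s_women = scaled_survivor(s_men, 1.5)   # hazard ratio 1.5, as on the slide

for age, m, w in zip(ages, s_men, s_women):
    print(f"age {age}: still unmarried, men {m:.3f}, women {w:.3f}")
```

The hazard ratio acts on the rate of marrying, not directly on the share married, so the gap between the two survivor curves varies with age even though the ratio itself is fixed.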

Semi-parametric models: Cox
Interpretation of the regression coefficients
- An estimated hazard rate ratio greater than 1 indicates the covariate is associated with an increased hazard of experiencing the event of interest
- An estimated hazard rate ratio less than 1 indicates the covariate is associated with a decreased hazard of experiencing the event of interest
- An estimated hazard rate ratio of 1 indicates no association between covariate and hazard

Graphically: estimates for functions
    stcox sex, basehc(H0)
    stcurve, hazard at1(sex=1) at2(sex=2)
    stcox sex, basesurv(S0)
    stcurve, survival at1(sex=1) at2(sex=2)
exercise: make your own Cox model and estimate the hazard and survival

Assessing model adequacy
- Proportional hazards assumption: covariate effects are independent of time, so hazard ratios are constant over time
- Three general ways to examine model adequacy
  - Graphically: do survival curves intersect?
  - Mathematically: Schoenfeld test
  - Computationally: time-dependent variables (extended model)

Compare with Kaplan-Meier:
    stcoxkm, by(sex)
exercise: do this with one of your estimates

"log-log" plots
    stphplot, by(sex)
exercise: do this with one of your estimates; stphplot can be adjusted --> look in the stphplot help

Mathematically: Schoenfeld test
Tests whether the log hazard-ratio function is constant over time; a rejection of the null hypothesis therefore indicates a deviation from the proportional hazards assumption.
    stcox sex, schoenfeld(sch*) scaledsch(sca*)
    estat phtest
    (with more than one variable: estat phtest, detail)
exercise: do this with your model, try to find a model which fits

Summary
- Survival analysis quantifies time to a single, dichotomous event
- Handles censored data well
- Survival and hazard can be mathematically converted to each other
- Kaplan-Meier survival curves can be compared graphically
- Cox proportional hazards models help distinguish individual contributions of covariates to survival, provided certain assumptions are met

It can get a lot more complicated than this
- The proportional hazards model as shown only works when the time-to-event data is relatively simple
- Complications
  - non-proportional hazard rates
  - time-dependent covariates
  - competing risks
  - multiple failures
  - non-absorbing events
  - etc.
- Extensive literature exists for these situations and software is available to handle them.
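The first complication, non-proportional hazards, is what the log-log check from the model adequacy slides looks for. The sketch below illustrates the idea: on the -ln(-ln S(t)) scale, two groups with proportional hazards differ by a constant, ln(hazard ratio), at every time point, while a drifting ratio makes the gap change. All survivor values and ratios are hypothetical, and the helper loglog is an illustrative name, not a Stata function.

```python
import math

def loglog(s):
    """The -ln(-ln S(t)) transform used in log-log diagnostic plots."""
    return -math.log(-math.log(s))

s_base = [0.95, 0.70, 0.40, 0.25]     # hypothetical baseline survivor values over time

# Proportional case: cumulative hazard is 1.5 times the baseline at every time point
s_ph = [s ** 1.5 for s in s_base]

# Non-proportional case (assumed for illustration): the cumulative hazard ratio drifts upward
drifting_ratios = [1.0, 1.5, 2.0, 2.5]
s_nonph = [s ** r for s, r in zip(s_base, drifting_ratios)]

gap_ph = [loglog(a) - loglog(b) for a, b in zip(s_base, s_ph)]
gap_nonph = [loglog(a) - loglog(b) for a, b in zip(s_base, s_nonph)]

print("proportional gap:    ", [round(g, 3) for g in gap_ph])
print("non-proportional gap:", [round(g, 3) for g in gap_nonph])
```

Parallel curves in a log-log plot correspond to the constant gap in the first list; curves that converge, diverge or cross correspond to the second.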

Semi-parametric models: piecewise constant
- transition rate assumed not to be constant over the observed time
- splits the data into user-defined time pieces
- transition rates are constant within each "time piece"
- but: transition rates change between time pieces

Semi-parametric models: piecewise constant in Stata
A user-written command, an "ado file", by J. Sorensen: stpiece
    net search stpiece
install the file, then:
    stpiece abitur, tp(20 30 40) tv(sex)
tp: time pieces, intervals
tv: covariates whose influence might vary over time pieces

the end
