Posted on 8/31/2012. Public Domain.
Robust Estimation and Multilevel Modeling
• Robust
• Multilevel
• Full lecture slides on web page: http://www2.fiu.edu/~dwright/pdf/psychboot.pdf
(photo caption: if Dan had more hair)

House Prices
[Figure: house price distributions for two areas, SBFC and WLSA.]
• SBFC is non-significantly higher, but the SDs are so different that this comparison isn't good.
• The medians point in the opposite direction: higher for WLSA. But should we care about the medians?
• So SBFC is significantly higher!

Thinking about central tendency: "The Median Isn't the Message" by Stephen Jay Gould
"When I revived after surgery, I asked my first question of my doctor and chemotherapist: 'What is the best technical literature about mesothelioma?' She replied, with a touch of diplomacy ... that the medical literature contained nothing really worth reading. I realized with a gulp why my doctor had offered that humane advice. The literature couldn't have been more brutally clear: mesothelioma is incurable, with a median mortality of only eight months after discovery."

Options when traditional methods are suspect
1. Bootstrap (and hope a bit).
2. Transform: f1(yi) = f2(modeli) + ei, still minimizing Σei². The functions can be rankings.
3. GLMs (transforming alters the assumptions about ei). Finding something like least squares.
4. Apply whatever transformations of the response and predictor variables, but minimize Σ f(ei). If f is squaring, this is least squares (L2); if it is the absolute value (L1), it is more robust; it can be least squares with trimming (e.g., a 20% trim), or other functions.

Robust Estimation
• What is robust?
• Robust estimators: trimmed, winsorized, and others
• Mostly on central tendency, but briefly on other quantities

"it consists of making the sum of the squares of the errors a minimum. ... it prevents the extremes from dominating, [and] is appropriate for revealing the state of the system which most nearly approaches the truth"
Adrien Marie Legendre (1805, pp. 72-73)

"Everyone believes in the [Normal distribution], the experimenters because they think it is a mathematical theorem, the mathematicians because they think it is an experimental fact."
Henri Poincaré

Micceri (1989) sampled a large number of data sets and found that none were very Normal and that most were very un-Normal. But what if the histogram really looks Normal, or at least symmetric? The textbooks seem to say that would be okay.

Mixture (heavy-tailed) distributions
[Figure: a Normal curve and a mixture of two Normals; approximately 5.85% of the area lies in the tails of the mixture where approximately 2.5% lies under the Normal curve.]
John Tukey (1960)

Mean and Standard Deviation
t = x̄ / (sd / √n)
Both the mean and the sd are affected by outliers. As outliers become more extreme, the mean increases, but so does the standard deviation. A single datum in the predicted direction can make a test statistic LESS significant.

Example Data (H0: mean = 0)
0, 1, 2, 3, 4:  mean = 2,  sd = 1.6,  t(4) = 2.82, p = .05
0, 1, 2, 3, 14: mean = 4,  sd = 5.7,  t(4) = 1.57, p = .16
0, 1, 2, 3, 44: mean = 10, sd = 19.0, t(4) = 1.17, p = .31
• The 20% trimmed mean is 2 for all three.
• The least-absolute-value estimate also stays at 2 (minimizing the absolute value produces the median).

Fisher (1925): "if in an agricultural experiment a first trial shows an apparent advantage of 8 tons to the acre, and a duplicate experiment shows an advantage of 9 tons ... the results would justify some confidence that a real effect had been observed; but if the second experiment had shown an apparent advantage of 18 tons, although the mean is now higher, we should place not more but less confidence in the conclusion that the treatment was [p. 112] beneficial, for t has fallen."
Fisher (1925), continued: "The apparent paradox may be explained by pointing out that the difference of 10 tons between the experiments indicates the existence of uncontrolled circumstances so influential that in both cases the apparent benefit may be due to chance, whereas in the former case the relatively close agreement of the results suggests that the uncontrolled factors are not so very influential."

Mortality rates (per 1000 per year)
Pre:  5.0, 95% CI 3.7 to 6.3
Post: 7.9, 95% CI 5.6 to 10.2 (without Falluja)
Post: 12.3, 95% CI 1.4 to 23.2 (with Falluja)

Loss Functions (using E(x) to mean the estimate of the location of x)
Least squares (L2): min Σ (xi - E(x))²
Least absolute value (L1): min Σ |xi - E(x)|
An x% trimmed estimator is OLS on the middle (100 - 2x)%: outliers "don't count".
A winsorized estimator is OLS on the middle (100 - 2x)%, and then the loss levels off.
[Figure: weight functions (Hampel, Huber, Cauchy, Andrews and bisquare, least squares, fair) plotted against influence distance; and loss plotted against residuals for least squares, least absolute value, winsorized, and trimmed estimators.]

Example in R: robust measures of location
library(mrt)
data(chile)
attach(chile)
mean(LENGTH)                          # 8.94 cm
mean(LENGTH, tr=.2)                   # 7.96 cm (20% trimmed)
mean(LENGTH, tr=.5)                   # 7.62 cm
median(LENGTH)                        # 7.62 cm
library(MASS)
huber(LENGTH)                         # 8.30 cm
rlm(LENGTH~1, psi=psi.bisquare)$coef  # 8.16 cm
BUT BE CAREFUL: these M-estimators have tuning constants that differ between functions.

The rlm function
• The psi option allows many choices.
• Huber is the default; Hampel and Tukey's bisquare are also popular.

Big Questions
• If you are trying to measure the mean (or whatever), should you use traditional or robust measures? The robust measures are not estimating the mean (but are often more reliable at it!)
• Should you be trying to measure the mean?

Summary: Robust
• The statistical revolution at the beginning of the 20th century was based in part on the Normal distribution.
• Rank-based procedures began appearing around 1950 and have achieved some popularity (though they have problems too).
• But we also assume Normality!
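The outlier example above (0, 1, 2, 3, 4, with the largest value growing to 14 and then 44) can be checked directly; a minimal sketch using only base R:

```r
# Three small samples: only the largest value changes
xs <- list(c(0,1,2,3,4), c(0,1,2,3,14), c(0,1,2,3,44))

# One-sample t statistics against H0: mean = 0. They shrink as the
# outlier grows, because the sd increases faster than the mean.
sapply(xs, function(x) unname(t.test(x)$statistic))

# The 20% trimmed means and the medians are unaffected: all equal 2
sapply(xs, mean, trim = .2)
sapply(xs, median)
```

Note that base R spells the trimming argument `trim`; the `tr=` used with the chile data works through R's partial argument matching.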
Aren't all psychology datasets Normal?

Extra Selling Point
Robust methods tend to increase the likelihood that you will correctly reject a false hypothesis.

Robust Estimators
• Lessen the impact of outliers: this improves power, though sometimes you will also want to focus on the outliers themselves.
• Several are available, and newer methods are developed frequently.
• They are beginning to appear in many statistics programs.

HOW TO plus biography

Take-Home Messages
• Robust methods are often useful.
• They are not difficult to run if available in your software.
• Look at your data graphically.

Multilevel Modeling
Do these all mean the same thing?
• Bayesian hierarchical models
• contextual models
• hierarchical linear models
• hierarchical modeling
• mixed models
• random coefficient models
• random effects models
• subject specific models
• variance component models
• variance heterogeneity
No, but ...

My first encounter with multilevel modeling: a London line-up study from the early 1990s.
ML2, ML3, MLn: http://www.cmm.bristol.ac.uk/index.shtml

Multilevel Modelling
Used when the data structures are nested:
• Sampling pupils within schools
• Sampling people within households
• "Sampling" memories within memory
• Cross-classified structures
• When studies are sampled (i.e., meta-analyses)
• Repeated measures
• Longitudinal methods

Hierarchical Data Set
[Figure: population, some schools sampled within it, some pupils within schools, longitudinal measures within pupils.]

Suppose that you are interested in the relationship between mathematics attainment and bullying. The probability of being bullied will vary between schools.
Sample 10 schools and 10 pupils in each.

Creating the data
set.seed(2007)
child <- 1:100
school <- rep(1:10,10)
schef <- rep(runif(10,0,1),10)
bully <- rbinom(100,100,schef/2 + runif(100,0,.5))
maths <- rbinom(100,100,1 - schef/2 - runif(100,0,.5))
Have one line per lowest-level unit (so per trial if repeated measures; per student here). You can restructure data in all the main statistics packages (SPSS has a wizard).

Four Approaches: In words
• Ignore the multilevel structure.
• Aggregate the data, treating the school as the unit under investigation.
• Treat "school" as a fixed effect (similar to ANCOVA, partialing out school).
• Treat "school" as a random effect.

Four Approaches: In equations
Let i index pupils and j index schools.
1. math_i = β0 + β1·bully_i + e_i
2. math_j = β0 + β1·bully_j + u_j
3. math_ij = β0(j) + β1·bully_ij + e_ij
4. math_ij = β0j + β1·bully_ij + e_ij, with β0j = β0 + u_j, so math_ij = β0 + β1·bully_ij + u_j + e_ij

Approach 1: Ignore
• The assumption of independence is not valid.
• Standard errors will tend to be too small, so you are too likely to get significant effects.
• Pretend a couple of classes happen to have bullies and good maths teachers: you can get effects in opposite directions at the two levels.
• Easiest to run: use ordinary least squares.

olsreg1 <- lm(maths~bully)
summary(olsreg1)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 68.79749    4.63696  14.837  < 2e-16 ***
bully       -0.38376    0.08779  -4.371 3.08e-05 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
Residual standard error: 17.74 on 98 degrees of freedom
Multiple R-squared: 0.1632, Adjusted R-squared: 0.1546
F-statistic: 19.11 on 1 and 98 DF, p-value: 3.076e-05

plot(bully,maths)
abline(olsreg1)
[Figure: scatterplot of maths against bully with the OLS regression line.]

Approach 2: Level of Analysis is the School
• The unit of analysis is the school, not the individual.
• Hypotheses must be framed appropriately: schools with lots of bullying have low maths scores.
math_j = β0 + β1·bully_j + u_j

mbully <- tapply(bully,school,mean)
mmaths <- tapply(maths,school,mean)
cbind(mbully,mmaths)
      mbully mmaths
 [1,]   63.3   41.7
 [2,]   25.6   53.4
 [3,]   35.0   69.4
 [4,]   57.0   45.0
 [5,]   42.1   55.3
 [6,]   31.8   63.7
 [7,]   66.2   26.0
 [8,]   46.5   57.9
 [9,]   54.9   56.5
[10,]   65.6   31.8
The new sample size is only 10.

ols2 <- lm(mmaths~mbully)
summary(ols2)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  86.7337     9.9038   8.758 2.26e-05 ***
mbully       -0.7513     0.1950  -3.852  0.00486 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
Residual standard error: 8.652 on 8 degrees of freedom
Multiple R-squared: 0.6497, Adjusted R-squared: 0.6059
F-statistic: 14.84 on 1 and 8 DF, p-value: 0.004864

plot(bully,maths,cex.axis=1.6,cex.lab=1.6)
abline(olsreg1,lwd=2)
points(mbully,mmaths,pch=19,cex=1.5,col="DodgerBlue")
abline(ols2,col="DodgerBlue",lwd=2)
[Figure: the individual-level scatterplot and regression line with the school means and the school-level regression line overlaid.]

Lesson: Don't save a file called "draft" since searching for it is difficult.

Approach 3a: School Intercepts (Analysis of Covariance)
OLS regression, but include k - 1 dummy variables for the k schools:
math_ij = β0(j) + β1·bully_ij + e_ij
where β0(j) is one of β0(1), β0(2), ..., β0(k) and k is the number of schools. This estimates an intercept for each school, assuming the slopes are the same.

ols3 <- lm(maths~bully+as.factor(school))
summary(ols3)
Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)          44.37317    8.00870   5.541 3.03e-07 ***
bully                -0.04223    0.10234  -0.413 0.680849
as.factor(school)2   10.10792    7.69651   1.313 0.192454
as.factor(school)3   26.50488    7.26216   3.650 0.000442 ***
as.factor(school)4    3.03395    6.69081   0.453 0.651328
as.factor(school)5   12.70472    7.00416   1.814 0.073066 .
as.factor(school)6   20.66975    7.39885   2.794 0.006381 **
as.factor(school)7  -15.57753    6.66629  -2.337 0.021697 *
as.factor(school)8   15.49053    6.87802   2.252 0.026771 *
as.factor(school)9   14.44527    6.71493   2.151 0.034168 *
as.factor(school)10  -9.80287    6.66384  -1.471 0.144803
Residual standard error: 14.89 on 89 degrees of freedom
Multiple R-squared: 0.4647, Adjusted R-squared: 0.4045
F-statistic: 7.726 on 10 and 89 DF, p-value: 8.8e-09

anova(olsreg1,ols3)
Analysis of Variance Table
Model 1: maths ~ bully
Model 2: maths ~ bully + as.factor(school)
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)
1     98 30853
2     89 19736  9     11116 5.5697 4.502e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05

sbully <- split(bully,school)
plot(bully,maths)
spredict <- split(predict(ols3),school)
for (i in 1:10) lines(sbully[[i]],spredict[[i]],col=i)
[Figure: scatterplot with a separate fitted line, with a common slope, for each school.]

ols4 <- lm(maths~bully*as.factor(school))
plot(bully,maths)
spredict <- split(predict(ols4),school)
for (i in 1:10) lines(sbully[[i]],spredict[[i]],col=i)
[Figure: a separate fitted line, with its own slope, for each school. Avoid yellow.]

summary(ols4)
Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
(Intercept)                 90.9199    28.6239   3.176  0.00212 **
bully                       -0.7776     0.4470  -1.740  0.08576 .
as.factor(school)2         -39.4666    30.4781  -1.295  0.19907
as.factor(school)3         -38.9686    30.3789  -1.283  0.20328
as.factor(school)4         -47.6179    33.4518  -1.423  0.15849
as.factor(school)5         -50.1265    30.9291  -1.621  0.10902
as.factor(school)6           3.6652    30.9021   0.119  0.90589
as.factor(school)7         -88.3777    34.3854  -2.570  0.01202 *
as.factor(school)8         -32.4313    31.3624  -1.034  0.30421
as.factor(school)9         -27.3559    33.2073  -0.824  0.41251
as.factor(school)10         -9.7977    34.8759  -0.281  0.77949
bully:as.factor(school)2     0.8536     0.5816   1.468  0.14608
bully:as.factor(school)3     1.2761     0.5186   2.461  0.01601 *
bully:as.factor(school)4     0.8074     0.5350   1.509  0.13521
bully:as.factor(school)5     1.1221     0.5163   2.173  0.03271 *
bully:as.factor(school)6    -0.1937     0.5615  -0.345  0.73105
bully:as.factor(school)7     1.1319     0.5275   2.146  0.03494 *
bully:as.factor(school)8     0.7649     0.5167   1.480  0.14273
bully:as.factor(school)9     0.6489     0.5362   1.210  0.22980
bully:as.factor(school)10    0.0257     0.5363   0.048  0.96190

How about summarizing the school intercepts as B0j = B0 + uj and the slopes as B1j = B1 + vj, assuming both uj and vj are Normally distributed with means of zero? Then for each you only need to estimate a mean and a standard deviation (only two things), rather than a separate value for every school.

Two Libraries
nlme
• The first big one, and so traditionally what was used.
• Still needed for some non-linear models.
• Better for longitudinal models.
Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D., and the R Development Core Team (2011). nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-98.
lme4
• Uses S4 classes, which are all the rage.
• Builds on nlme, so the library for the future.
• Does generalized linear mixed models.
Bates, D., Maechler, M., and Bolker, B. (2011). lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-38.
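What "B0j = B0 + uj" buys you can be seen by simulating from the model; a minimal sketch in base R, where the parameter values (B0 = 60, B1 = -0.2, sd(u) = 8, sd(e) = 12) are made up for illustration, not taken from the slides:

```r
set.seed(1)
nschool <- 10; npupil <- 10; n <- nschool * npupil

u <- rnorm(nschool, mean = 0, sd = 8)   # one random intercept deviation per school
school <- rep(1:nschool, each = npupil) # school identifier for each pupil
bully  <- runif(n, 0, 100)              # pupil-level predictor

# math_ij = B0 + B1*bully_ij + u_j + e_ij
maths <- 60 - 0.2 * bully + u[school] + rnorm(n, 0, 12)

# Only four numbers (B0, B1, sd(u), sd(e)) generated the data,
# yet every school has its own intercept B0 + u[j]:
round(60 + u, 1)
```

Compare this with Approach 3a, which would spend k - 1 fixed-effect parameters on the same school differences.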
library(nlme)
lme1 <- lme(maths~bully,random=~1|school,method="ML")
summary(lme1)
Linear mixed-effects model fit by maximum likelihood
      AIC      BIC    logLik
 849.8016 860.2222 -420.9008
Random effects:
 Formula: ~1 | school
        (Intercept) Residual
StdDev:    10.58926 14.87886
Fixed effects: maths ~ bully
               Value Std.Error DF   t-value p-value
(Intercept) 56.72791  5.977633 89  9.490028  0.0000
bully       -0.13643  0.096183 89 -1.418471  0.1595
 Correlation:
      (Intr)
bully -0.785

intervals(lme1)
Approximate 95% confidence intervals
 Fixed effects:
                lower       est.       upper
(Intercept) 44.969852 56.7279078 68.48596403
bully       -0.325625 -0.1364325  0.05275994
attr(,"label")
[1] "Fixed effects:"
 Random Effects:
  Level: school
                   lower     est.    upper
sd((Intercept)) 5.980183 10.58926 18.75067
 Within-group standard error:
   lower     est.    upper
12.83830 14.87886 17.24375

lme0 <- lme(maths~1,random=~1|school,method="ML")
anova(lme0,lme1)
     Model df    AIC    BIC  logLik   Test L.Ratio p-value
lme0     1  3 849.54 857.35 -421.77
lme1     2  4 849.80 860.22 -420.90 1 vs 2    1.73    0.19

lme2 <- lme(maths~bully,random=~bully|school,method="ML")
anova(lme1,lme2)
     Model df    AIC    BIC  logLik   Test L.Ratio p-value
lme1     1  4 849.80 860.22 -420.90
lme2     2  6 849.26 864.89 -418.63 1 vs 2    4.54    0.10

library(lme4)
lme1 <- lmer(maths~bully + (1|school),REML=FALSE)
summary(lme1)
Linear mixed model fit by maximum likelihood
Formula: maths ~ bully + (1 | school)
   AIC   BIC logLik deviance REMLdev
 849.8 860.2 -420.9    841.8   840.2
Random effects:
 Groups   Name        Variance Std.Dev.
 school   (Intercept) 112.11   10.588
 Residual             221.39   14.879
Number of obs: 100, groups: school, 10
(The estimates are slightly different from nlme's.)
Fixed effects:
            Estimate Std. Error t value
(Intercept) 56.72876    5.91732   9.587
bully       -0.13645    0.09522  -1.433
Correlation of Fixed Effects:
      (Intr)
bully -0.785

Math and Bullying
Traditionally people have:
• ignored the nesting;
• used complex and inflexible fixes;
• done analyses at the aggregate level (risking the ecological fallacy);
• treated school as a fixed effect and covaried it out.
Now: multilevel modeling.

Multilevel ANOVA: "My randomization didn't work!"
Chloe found group differences in her baseline measures.

chl <- file.choose()   # exercise.dat
chloe <- read.table(chl,header=TRUE)
attach(chloe)
wcond <- as.factor(wcond)
summary(aov(sqw1^2~wcond))
             Df  Sum Sq Mean Sq F value  Pr(>F)
wcond         3   37.67   12.56  2.8527 0.03683 *
Residuals   499 2196.33    4.40
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05

library(lme4)
mod0 <- lmer(sqw1^2 ~ 1 + (1|class))
mod1 <- lmer(sqw1^2 ~ wcond + (1|class))
anova(mod0,mod1)
Models:
mod0: sqw1^2 ~ 1 + (1 | class)
mod1: sqw1^2 ~ wcond + (1 | class)
     Df    AIC    BIC  logLik  Chisq ChiDf Pr(>Chisq)
mod0  3 2166.5 2179.2 -1080.2
mod1  6 2169.7 2195.0 -1078.8 2.8267     3     0.4191
Chloe is happy with multilevel modeling now.

Hill et al.'s theory: a Baron and Kenny mediation analysis with lmer/nlme.
condition -> intention -> exercise
intention ~ condition
exercise ~ intention
exercise ~ condition + intention
exercise ~ condition
You can run the same sequence of regressions they suggest and use the resulting coefficients and standard errors in the Sobel test.

Journal
• In a paragraph, say why you would use robust methods and give an example.
• Very briefly, describe a study that you would do (related to your area of research) that would require multilevel modeling.
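The last step of the mediation slide, plugging coefficients and standard errors into the Sobel test, can be sketched as follows. The values a, sa, b, and sb are hypothetical placeholders, not output from the models above:

```r
# a:  condition -> intention path;  sa: its standard error
# b:  intention -> exercise path (controlling for condition); sb: its SE
# All four numbers are made up for illustration.
a  <- 0.40; sa <- 0.10
b  <- 0.55; sb <- 0.12

# Sobel test of the indirect effect a*b
z <- (a * b) / sqrt(b^2 * sa^2 + a^2 * sb^2)
p <- 2 * (1 - pnorm(abs(z)))
round(c(z = z, p = p), 3)   # z = 3.014, p = 0.003
```

With multilevel models the coefficients come from lmer or lme fits rather than lm, but the arithmetic is the same.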
"Simple" linear multilevel modeling
• Could have looked at the assumptions (Normal residuals for all the random variables).
• Plots of everything.
• If we were doing longitudinal methods with covariance structures, we would continue with nlme; see Singer and Willett's longitudinal book: http://www.ats.ucla.edu/stat/examples/alda/

Ignoring the Multilevel Structure: ANOVA
What would happen with t.test(Post_QoL ~ Surgery)?

Ignoring the structure: ANCOVA
Check the interaction.

Treating Clinic as a Fixed Effect
Now surgery is non-significant.

As a multilevel model (ANOVA, random intercept)
Post_ij = 59 - 1.68·Surgery_ij + u_j + e_ij

Multilevel ANCOVA
• Why no p values? (Some implementations do give them.)
• Baseline QoL is significant.
anova(update(multi2, .~. - Surgery),multi2)
• Testing whether the variance of the surgery effect is the same across clinics.
• By default this includes the variance and covariance terms.
• Other effects are added as with lm.

Generalized Linear Multilevel Models: juror decision making
set.seed(2008)
juror <- 1:100
jury <- rep(1:10,each=10)
juryef <- rep(runif(10,0,1),each=10)
Eyewit <- rep(rbinom(10,1,.5),each=10)
guilty <- rbinom(100,1,Eyewit/4 + juryef/2 + runif(100,0,.25))

With a dummy variable for each jury:
mod3 <- glm(guilty~Eyewit + as.factor(jury),binomial)
summary(mod3)
                    Estimate Std. Error   z value Pr(>|z|)
(Intercept)        2.229e-16  6.325e-01  3.52e-16    1.000
Eyewit             4.055e-01  9.037e-01     0.449    0.654
as.factor(jury)2  -4.055e-01  9.037e-01    -0.449    0.654
as.factor(jury)3  -8.901e-17  8.944e-01 -9.95e-17    1.000
as.factor(jury)4   9.808e-01  1.021e+00     0.961    0.337
as.factor(jury)5   7.737e-16  9.129e-01  8.48e-16    1.000
as.factor(jury)6  -8.473e-01  9.361e-01    -0.905    0.365
as.factor(jury)7  -4.055e-01  9.037e-01    -0.449    0.654
as.factor(jury)8   8.473e-01  9.361e-01     0.905    0.365
as.factor(jury)9   9.808e-01  1.021e+00     0.961    0.337
as.factor(jury)10         NA         NA        NA       NA
Null deviance: 136.66 on 99 degrees of freedom
Residual deviance: 126.42 on 90 degrees of freedom
AIC: 146.42
(as.factor(jury)10 is NA because Eyewit is constant within each jury, so the last dummy is aliased with the other predictors.)

mod4 <- lmer(guilty~Eyewit + (1|jury),family=binomial)
summary(mod4)
Generalized linear mixed model fit by the Laplace approximation
Formula: guilty ~ Eyewit + (1 | jury)
   AIC   BIC logLik deviance
 139.3 147.2 -66.67    133.3
Random effects:
 Groups Name        Variance   Std.Dev.
 jury   (Intercept) 1.8024e-19 4.2455e-10
Number of obs: 100, groups: jury, 10
Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.08004    0.28307 -0.2828   0.7774
Eyewit       0.74329    0.41140  1.8067   0.0708 .
Correlation of Fixed Effects:
       (Intr)
Eyewit -0.688

Getting from coefficients to probabilities
With glm, use predict; it is not available in the same way for lmer, so calculate from the logits:
no eyewitness: -0.08004
eyewitness:    -0.08004 + 0.74329 = 0.66325
Applying the inverse logit, 1/(1 + e^(-x)), gives 48% and 66%.

Summary: Learning Points
• What are multilevel models? Nature is hierarchical (or at least multilevel).
• Lots today!
• lm, glm, rlm, and lmer have similar structures.
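The logit-to-probability step in the juror example can be checked with a few lines of base R, using the fixed-effect estimates reported for mod4:

```r
inv_logit <- function(x) 1 / (1 + exp(-x))   # the same as base R's plogis()

b0 <- -0.08004   # intercept: logit of a guilty verdict with no eyewitness
b1 <-  0.74329   # shift in the logit when there is an eyewitness

round(inv_logit(b0), 2)        # 0.48: P(guilty) with no eyewitness
round(inv_logit(b0 + b1), 2)   # 0.66: P(guilty) with an eyewitness
```

The same conversion works for any logistic model, which is why glm, lmer, and friends can share this structure.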