Robust Estimation and Multilevel Modeling

• Robust
• Multilevel

• Full lecture slides on web page

House Prices
http://www2.fiu.edu/~dwright/pdf/psychboot.pdf

So the SBFC mean is non-significantly higher,
but the SDs are so different that this isn't good.

But in the opposite direction:
medians are higher for WLSA.

But should we care about the medians?
So SBFC is significantly higher!
The Median Isn't the Message
by Stephen Jay Gould

When I revived after surgery, I asked my first question of my doctor
and chemotherapist: “What is the best technical literature about
mesothelioma?” She replied, with a touch of diplomacy …that
the medical literature contained nothing really worth reading.

I realized with a gulp why my doctor had offered that humane advice.
The literature couldn't have been more brutally clear: mesothelioma is
incurable, with a median mortality of only eight months after discovery.

1. Bootstrap (and hope a bit)
2. Transform: f1(y_i) = f2(model_i) + e_i
Still minimize Σ e_i². The functions can be rankings.
3. GLMs (transform, and it alters the assumptions about e_i).
Finding something like least squares.
4. Do whatever transformations of the response and
predictor variables, but minimize Σ f(e_i). If f is squaring,
that is least squares (L2); if absolute value (L1), more robust; it can
be least squares with trimming (e.g., 20% trim), or other
functions.
Robust Estimation

• What is Robust?
• Robust Estimators
– Trimmed, winsorized, and the other estimators
• Mostly on central tendency, but briefly on others.
“it consists of making the sum of
the squares of the errors a
minimum. … it prevents the
extremes from dominating, is
appropriate for revealing the
state of the system which most
nearly approaches the truth”

Legendre (1805, pp. 72-73)
“Everyone believes in the [Normal distribution], the
experimenters because they think it is a mathematical theorem, the
mathematicians because they think it is an experimental fact.”
Henri Poincaré

Micceri (1989) sampled a large number of data sets and found that
none were very Normal and that most were very un-Normal.

But what if the histograms really do look Normal, or
at least symmetric?
The textbooks seem to say that would be okay.
Mixture (heavy tailed) Distributions

[Figure: a Normal curve overlaid with a mixture of two Normals.
Approx. 2.5% of the area lies beyond the cut-off under the Normal curve,
versus approx. 5.85% under the mixed curve.]

John Tukey (1960)
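Tukey's point can be checked numerically. A minimal Python sketch (the slides use R, but the arithmetic is language-agnostic), assuming the classic contaminated Normal of 90% N(0,1) plus 10% N(0,3²); the slide's exact mixture parameters may differ, so the tail percentages need not match exactly:

```python
import math

def upper_tail(z, sd=1.0):
    """P(X > z) for X ~ Normal(0, sd^2), via the complementary error function."""
    return 0.5 * math.erfc(z / (sd * math.sqrt(2)))

z = 1.96
tail_normal = upper_tail(z)                                # ~0.025 for the pure Normal
tail_mixture = 0.9 * upper_tail(z) + 0.1 * upper_tail(z, sd=3.0)
print(round(tail_normal, 4), round(tail_mixture, 4))       # mixture tail roughly doubles
```

A mixture that looks almost identical to the Normal in a histogram has markedly heavier tails, which is exactly why the mean and sd misbehave.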
Mean and Standard Deviation

t = x̄ / (sd / √n)

Both the mean and the sd are affected by outliers.

As outliers become more extreme, the mean increases,
but so does the standard deviation.

A single datum in the direction predicted
can make a test statistic less significant.
Example Data - H0: mean = 0

0, 1, 2, 3, 4
mean = 2, sd = 1.6, t(4) = 2.82, p = .05
0, 1, 2, 3, 14
mean = 4, sd = 5.7, t(4) = 1.57, p = .16
0, 1, 2, 3, 44
mean = 10, sd = 19.0, t(4) = 1.17, p = .31

• 20% trimmed mean is equal for all three (2)
• The least absolute value estimate also goes to 2
(the absolute value loss produces the median)
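The three datasets above can be checked directly. A small Python sketch of the one-sample t statistic and the 20% trimmed mean (with n = 5, trimming 20% drops one observation from each tail):

```python
import math
import statistics

def one_sample_t(xs):
    """t statistic for H0: mean = 0."""
    return statistics.mean(xs) / (statistics.stdev(xs) / math.sqrt(len(xs)))

def trimmed_mean(xs, prop=0.2):
    """Drop prop of the observations from each tail, then average the rest."""
    xs = sorted(xs)
    k = int(len(xs) * prop)
    return statistics.mean(xs[k:len(xs) - k])

for xs in ([0, 1, 2, 3, 4], [0, 1, 2, 3, 14], [0, 1, 2, 3, 44]):
    print(round(one_sample_t(xs), 2), trimmed_mean(xs))  # t shrinks, trimmed mean stays 2
```

The t values (2.83, 1.57, 1.17) reproduce the slide, while the trimmed mean ignores the growing outlier entirely.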
Fisher 1925

if in an agricultural experiment a first trial shows an apparent
advantage of 8 tons to the acre, and a duplicate experiment shows an
advantage of 9 tons … the results would justify some confidence that
a real effect had been observed; but if the second experiment had
shown an apparent advantage of 18 tons, although the mean is now
higher, we should place not more but less confidence in the
conclusion that the treatment was [p. 112] beneficial, for t has fallen.

The apparent paradox may be explained by pointing out that the
difference of 10 tons between the experiments indicates the existence
of uncontrolled circumstances so influential that in both cases the
apparent benefit may be due to chance, whereas in the former case
the relatively close agreement of the results suggests that the
uncontrolled factors are not so very influential.
Mortality rates
(per 1000 per year)

Pre: 5.0
95% CI 3.7 – 6.3

Post: 7.9
95% CI 5.6 – 10.2
(without Falluja)

Post: 12.3
95% CI 1.4 – 23.2
(with Falluja)
Loss Functions

(Note: using E(x) to mean the estimate of the location of x.)

Least squares (L2): min Σ (x_i − E(x))²
Least absolute value (L1): min Σ |x_i − E(x)|

x% trimmed is OLS for the middle (100 − 2x)%; outliers “don't count”.
Winsorized is OLS for the middle (100 − 2x)%, then levels off.
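The loss-function idea can be made concrete by minimizing Σ f(e_i) over candidate locations by brute force. A Python sketch (the grid search is purely illustrative; real software uses proper optimizers):

```python
def total_loss(data, est, f):
    """Sum of f applied to the residuals from a candidate location estimate."""
    return sum(f(x - est) for x in data)

data = [0, 1, 2, 3, 44]
grid = [i / 10 for i in range(441)]  # candidate locations 0.0 .. 44.0

l2_est = min(grid, key=lambda m: total_loss(data, m, lambda e: e * e))
l1_est = min(grid, key=lambda m: total_loss(data, m, abs))

print(l2_est, l1_est)  # L2 recovers the mean (10.0), L1 the median (2.0)
```

Changing only f changes the estimator: squaring gives the mean, absolute value gives the median, so the outlier at 44 drags the L2 answer but not the L1 one.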
[Figure 1: weight functions (0 to 1) plotted against distance for several
M-estimators: Huber, Hampel, Cauchy, Fair, and Andrews & Bisquare.]

[Figure 2: influence plotted against residuals (−10 to 20) for least squares,
least absolute value, trimmed, and winsorized estimators.]
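Of the weight functions in the figure, Huber's is the simplest to write down: full weight for small residuals, then decaying weight beyond a cut-off. A Python sketch, assuming the common tuning constant k = 1.345 (roughly 95% efficiency under Normal errors):

```python
def huber_weight(e, k=1.345):
    """Huber weight: 1 inside +-k, then k/|e| outside the cut-off."""
    return 1.0 if abs(e) <= k else k / abs(e)

print(huber_weight(0.5), round(huber_weight(2.69), 2))  # 1.0 0.5
```

A residual twice the cut-off gets half weight, so extreme observations are down-weighted rather than allowed to dominate the fit.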
Example in R: Robust measures of location

library(mrt)
data(chile)
attach(chile)
mean(LENGTH)                        8.94 cm
mean(LENGTH, tr=.2)                 7.96 cm
mean(LENGTH, tr=.5)                 7.62 cm
median(LENGTH)                      7.62 cm

library(MASS)
huber(LENGTH)                       8.30 cm
rlm(LENGTH~1,psi=psi.bisquare)$coef 8.16 cm

BUT BE CAREFUL, these M-estimators have tuning variables
that differ between functions.
rlm function
• psi option: allows many choices.
• Huber is the default; Hampel and Tukey's bisquare are also popular.
Big Questions

• If you are trying to measure the mean (or whatever), should you use robust measures?
– the robust measures are not estimating the mean
(but they are often more reliable!)
• Should you be trying to measure the mean?
Summary: Robust

• The statistical revolution at the beginning of the 20th century was
based in part on the Normal distribution.
• Rank-based procedures began appearing around 1950 and
have achieved some popularity (though they have problems too)
• But we also assume Normality! Aren’t all psychology datasets
Normal?
Extra Selling Point

Robust methods tend to increase
the likelihood that you will
correctly reject a false hypothesis.
Robust Estimators

• Lessen the impact of outliers
– improves power
– sometimes will want to also focus on outliers
• Several available. Newer methods are developed frequently.
• Beginning to appear in many statistics programs.

Take Home Messages

Robust methods are often useful. Not
difficult to run if available in your software.

Multilevel Modeling
Do these all mean the same thing?

•   Bayesian hierarchical models
•   contextual models
•   hierarchical linear models
•   hierarchical modeling
•   mixed models
no, but ...
•   random coefficient models
•   random effects models
•   subject specific models
•   variance component models
•   variance heterogeneity
My first encounter with multilevel modeling

London line-up study from early 1990s

ML2
ML3
MLn

http://www.cmm.bristol.ac.uk/index.shtml
Multilevel Modelling

Used when the data structures are nested.
– Sampling pupils within schools
– Sampling people within households
– “Sampling” memories within memory
– Cross-classified
– When studies are sampled (i.e., meta-analyses)
– Repeated measures
– Longitudinal methods
Hierarchical Data Set

Population
Some Schools
Some Pupils
Longitudinal measures

Suppose that you are interested in the relationship
between mathematics attainment and bullying.

Probability of being bullied will vary between schools.

Sample 10 schools and 10 pupils in each
Creating the data

set.seed(2007)
child <- 1:100
school <- rep(1:10,10)
schef <- rep(runif(10,0,1),10)
bully <- rbinom(100,100,schef/2 + runif(100,0,.5))
maths <- rbinom(100,100,1-schef/2 -runif(100,0,.5))

Have one line per lowest level (so per trial if repeated
measures, per student here)
Can restructure in all stats packages (SPSS wizard)
Four Approaches: In words

• Ignore multilevel structure
• Aggregate data, treating school as the unit under
investigation
• Treat “school” as fixed effect
– Similar to ANCOVA, partialing out school
• Treat “school” as random effect
Four Approaches: In equations

Let i index pupils and j index schools.

1. math_i = β0 + β1·bully_i + e_i            (ignore structure)
2. math_j = β0 + β1·bully_j + u_j            (aggregate to schools)
3. math_ij = β0(j) + β1·bully_ij + e_ij      (school as fixed effect)
4. math_ij = β0j + β1·bully_ij + e_ij        (school as random effect)
            = β0 + β1·bully_ij + u_j + e_ij
Approach 1: Ignore

• Assumption of independence not valid
• Standard errors will tend to be too small
– Too likely to get significant effects
• Pretend a couple of classes happen to have bullies
and good math teachers
– Can get effects in opposite directions
• Easiest to run
– Use ordinary least squares
olsreg1 <- lm(maths~bully)
summary(olsreg1)

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 68.79749    4.63696 14.837 < 2e-16 ***
bully       -0.38376    0.08779 -4.371 3.08e-05 ***

Signif. codes:   0 '***' 0.001 '**' 0.01 '*' 0.05 '.'

Residual standard error: 17.74 on 98 degrees of freedom
Multiple R-Squared: 0.1632,     Adjusted R-squared: 0.1546
F-statistic: 19.11 on 1 and 98 DF, p-value: 3.076e-05
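The too-small standard errors can be demonstrated by simulation. A Python sketch with assumed variance components (school and pupil SDs both 1, 10 schools of 10 pupils, mirroring the slide's design; the Normal critical value 1.96 is used as an approximation to t with 99 df):

```python
import math
import random
import statistics

random.seed(2007)

def reject_rate(n_schools=10, n_pupils=10, sd_school=1.0, sd_pupil=1.0, reps=500):
    """Type I error of a naive z-test of H0: mean = 0 that ignores clustering."""
    rejections = 0
    for _ in range(reps):
        data = []
        for _ in range(n_schools):
            u = random.gauss(0, sd_school)  # shared school effect
            data += [u + random.gauss(0, sd_pupil) for _ in range(n_pupils)]
        naive_se = statistics.stdev(data) / math.sqrt(len(data))  # ignores schools
        if abs(statistics.mean(data) / naive_se) > 1.96:
            rejections += 1
    return rejections / reps

rate = reject_rate()
print(rate)  # far above the nominal 0.05
```

Even though the true mean is 0, the naive test rejects far more than 5% of the time, because observations within a school share u_j and are not independent.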
plot(bully,maths)
abline(olsreg1)

[Scatterplot of maths against bully (both roughly 20–80) with the fitted OLS line.]
Approach 2: Level of Analyses is School

• Unit of analysis is school, not individual
• Hypotheses must be framed appropriately:
Schools with lots of bullying have low maths scores

math_j = β0 + β1·bully_j + u_j
mbully   mmaths
[1,]    63.3    41.7
[2,]    25.6    53.4
[3,]    35.0    69.4     New sample size only 10
[4,]    57.0    45.0
[5,]    42.1    55.3
[6,]    31.8    63.7     tapply(bully,school,mean)
[7,]    66.2    26.0
[8,]    46.5    57.9
[9,]    54.9    56.5
[10,]    65.6    31.8
ols2 <- lm(mmaths~ mbully)
summary(ols2)

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 86.7337      9.9038   8.758 2.26e-05 ***
mbully        -0.7513    0.1950 -3.852 0.00486 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’

Residual standard error: 8.652 on 8 degrees of freedom
Multiple R-squared: 0.6497,     Adjusted R-squared: 0.6059
F-statistic: 14.84 on 1 and 8 DF, p-value: 0.004864
plot(bully,maths,cex.axis=1.6,cex.lab=1.6)
abline(olsreg1,lwd=2)
points(mbully,mmaths,pch=19,cex=1.5,col="DodgerBlue")
abline(ols2,col="DodgerBlue",lwd=2)

[Scatterplot as before, with school means added as blue points and the school-level regression line overlaid.]
Lesson: Don't save a file called "draft" since searching for it is difficult.
Approach 3a: School Intercepts
Analysis of Covariance

OLS regression but include k dummy variables for the k schools

math_ij = β0(j) + β1·bully_ij + e_ij
math_ij = β0(1)·d1 + β0(2)·d2 + … + β0(k)·dk + β1·bully_ij + e_ij
where k is the number of classrooms and the d's are the dummy variables

Estimates intercept for each school
assuming the slopes are the same
ols3 <- lm(maths~bully+as.factor(school))
summary(ols3)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)         44.37317    8.00870   5.541 3.03e-07 ***
bully               -0.04223    0.10234 -0.413 0.680849
as.factor(school)2  10.10792    7.69651   1.313 0.192454
as.factor(school)3  26.50488    7.26216   3.650 0.000442 ***
as.factor(school)4   3.03395    6.69081   0.453 0.651328
as.factor(school)5  12.70472    7.00416   1.814 0.073066 .
as.factor(school)6  20.66975    7.39885   2.794 0.006381 **
as.factor(school)7 -15.57753    6.66629 -2.337 0.021697 *
as.factor(school)8  15.49053    6.87802   2.252 0.026771 *
as.factor(school)9  14.44527    6.71493   2.151 0.034168 *
as.factor(school)10 -9.80287    6.66384 -1.471 0.144803

Residual standard error: 14.89 on 89 degrees of freedom
Multiple R-Squared: 0.4647,     Adjusted R-squared: 0.4045
F-statistic: 7.726 on 10 and 89 DF, p-value: 8.8e-09
anova(olsreg1,ols3)

Analysis of Variance Table

Model 1: maths ~ bully
Model 2: maths ~ bully + as.factor(school)
Res.Df   RSS Df Sum of Sq      F    Pr(>F)
1     98 30853
2     89 19736 9      11116 5.5697 4.502e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
plot(bully,maths)
sbully <- split(bully,school)   # split bully by school for the lines below
spredict <- split(predict(ols3),school)
for (i in 1:10)
lines(sbully[[i]],spredict[[i]],col=i)

[Scatterplot with one fitted line per school, all with the same slope.]
ols4 <- lm(maths~bully*as.factor(school))
plot(bully,maths)
sbully <- split(bully,school)   # split bully by school for the lines below
spredict <- split(predict(ols4),school)
for (i in 1:10) lines(sbully[[i]],spredict[[i]],col=i)

[Scatterplot with a separate fitted line, and now a separate slope, for each school. Avoid yellow for the lines.]
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)                90.9199    28.6239   3.176 0.00212 **
bully                      -0.7776     0.4470 -1.740 0.08576 .
as.factor(school)2        -39.4666    30.4781 -1.295 0.19907
as.factor(school)3        -38.9686    30.3789 -1.283 0.20328
as.factor(school)4        -47.6179    33.4518 -1.423 0.15849
as.factor(school)5        -50.1265    30.9291 -1.621 0.10902
as.factor(school)6          3.6652    30.9021   0.119 0.90589
as.factor(school)7        -88.3777    34.3854 -2.570 0.01202 *
as.factor(school)8        -32.4313    31.3624 -1.034 0.30421
as.factor(school)9        -27.3559    33.2073 -0.824 0.41251
as.factor(school)10        -9.7977    34.8759 -0.281 0.77949
bully:as.factor(school)2    0.8536     0.5816   1.468 0.14608
bully:as.factor(school)3    1.2761     0.5186   2.461 0.01601 *
bully:as.factor(school)4    0.8074     0.5350   1.509 0.13521
bully:as.factor(school)5    1.1221     0.5163   2.173 0.03271 *
bully:as.factor(school)6   -0.1937     0.5615 -0.345 0.73105
bully:as.factor(school)7    1.1319     0.5275   2.146 0.03494 *
bully:as.factor(school)8    0.7649     0.5167   1.480 0.14273
bully:as.factor(school)9    0.6489     0.5362   1.210 0.22980
bully:as.factor(school)10   0.0257     0.5363   0.048 0.96190
How about summarizing the school intercepts as:

B0j = B0 + uj

and the slopes as:

B1j = B1 + vj

assuming both uj and vj are Normally distributed
with means of zero?

Then for each you only need to estimate a mean and a
standard deviation (only two things).
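A Python sketch of that summary: draw many school intercepts from B0j = B0 + uj (the values B0 = 50 and sd(u) = 5 are hypothetical) and note that two numbers, a mean and a standard deviation, describe the whole collection:

```python
import random
import statistics

random.seed(1)

# Hypothetical values: overall intercept B0 = 50, school effects u_j ~ N(0, 5^2)
B0, sd_u = 50.0, 5.0
intercepts = [B0 + random.gauss(0, sd_u) for _ in range(1000)]

# The random-effects summary: two numbers instead of 1000 separate intercepts
m = statistics.mean(intercepts)
s = statistics.stdev(intercepts)
print(round(m, 1), round(s, 1))
```

That is the pay-off of the random-effects view: estimating (B0, sd(u)) replaces estimating one intercept per school.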
Two Libraries
nlme
– the first big one and so traditionally what was used
– still needed for some non-linear models
– better for longitudinal
Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D., and
the R Development Core Team (2011). nlme: Linear
and Nonlinear Mixed Effects Models. R package
version 3.1-98.
lme4
– uses S4 classes, which are all the rage
– the successor to nlme, so the library for the future
– does generalized linear mixed models
Bates, D., Maechler, M., and Bolker, B. (2011).
lme4: Linear mixed-effects models using S4 classes.
R package version 0.999375-38.
library(nlme)
lme1 <- lme(maths~bully,random=~1|school,method="ML")
summary(lme1)
Linear mixed-effects model fit by maximum likelihood
AIC      BIC    logLik
849.8016 860.2222 -420.9008

Random effects:
Formula: ~1 | school
(Intercept) Residual
StdDev:    10.58926 14.87886

Fixed effects: maths ~ bully
Value Std.Error DF  t-value p-value
(Intercept) 56.72791 5.977633 89 9.490028 0.0000
bully        -0.13643 0.096183 89 -1.418471 0.1595
Correlation:
(Intr)
bully -0.785
intervals(lme1)
Approximate 95% confidence intervals

Fixed effects:
lower       est.       upper
(Intercept) 44.969852 56.7279078 68.48596403
bully       -0.325625 -0.1364325 0.05275994
attr(,"label")
[1] "Fixed effects:"

Random Effects:
Level: school
lower     est.    upper
sd((Intercept)) 5.980183 10.58926 18.75067

Within-group standard error:
lower     est.    upper
12.83830 14.87886 17.24375
lme0 <- lme(maths~1,random=~1|school,method="ML")
anova(lme0,lme1)

Model df        AIC    BIC    logLik   Test     L.Ratio p-value
lme0      1   3 849.54 857.35 -421.77
lme1      2   4 849.80 860.22 -420.90   1 vs 2    1.73    0.19

lme2 <- lme(maths~bully,random=~bully|school,method="ML")
anova(lme1,lme2)
Model df AIC     BIC    logLik   Test L.Ratio p-value
lme1     1 4 849.80 860.22 -420.90
lme2     2 6 849.26 864.89 -418.63 1 vs 2    4.54    0.10
library(lme4)
lme1 <- lmer(maths~bully + (1|school),REML=FALSE)
summary(lme1)
Linear mixed model fit by maximum likelihood
Formula: maths ~ bully + (1 | school)
AIC    BIC logLik deviance REMLdev
849.8 860.2 -420.9     841.8   840.2
Random effects:
Groups    Name        Variance Std.Dev.
school    (Intercept) 112.11   10.588
Residual              221.39   14.879
Number of obs: 100, groups: school, 10

(Estimates are slightly different than nlme.)
Fixed effects:
Estimate Std. Error t value
(Intercept) 56.72876    5.91732   9.587
bully       -0.13645    0.09522 -1.433

Correlation of Fixed Effects:
(Intr)
bully -0.785
Math and Bullying

People have:
– ignored nesting
– used complex and inflexible fixes
– done analyses at the aggregate level (ecological fallacy)
– treated school as a fixed effect and covaried it out

• Now multilevel modeling
Multilevel ANOVA:
"My randomization didn't work!"
Chloe found group differences in baseline measures

chl <- file.choose()
# exercise.dat
chloe <- read.table(chl, header=TRUE)   # assuming a header row in the file
attach(chloe)
wcond <- as.factor(wcond)
summary(aov(sqw1^2~wcond))

Df Sum Sq Mean Sq F value Pr(>F)
wcond         3   37.67  12.56 2.8527 0.03683 *
Residuals   499 2196.33   4.40
---
Signif. codes:   0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05
library(lme4)
mod0 <- lmer(sqw1^2 ~ 1 + (1|class))
mod1 <- lmer(sqw1^2 ~ wcond + (1|class))
anova(mod0,mod1)

Data:
Models:
mod0: sqw1^2 ~ 1 + (1 | class)
mod1: sqw1^2 ~ wcond + (1 | class)
Df    AIC     BIC logLik Chisq ChiDf Pr(>Chisq)
mod0 3 2166.5 2179.2 -1080.2
mod1 6 2169.7 2195.0 -1078.8 2.8267     3    0.4191

Chloe happy with multilevel modeling now
Hill et al.'s theory
Baron and Kenny mediation analysis
with lmer/nlme

condition -> intention -> exercise

intention ~ condition
exercise ~ intention
exercise ~ condition + intention
exercise ~ condition

Can do the same sequence of regressions they
suggest and use the same coefficients and standard
errors in the Sobel test.
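The Sobel test itself is a one-line calculation from those coefficients and their standard errors. A Python sketch with hypothetical values for a (condition → intention) and b (intention → exercise, controlling for condition):

```python
import math

def sobel_z(a, se_a, b, se_b):
    """Sobel test statistic for the indirect effect a*b."""
    return (a * b) / math.sqrt(a**2 * se_b**2 + b**2 * se_a**2)

# Hypothetical coefficients and standard errors, for illustration only
z = sobel_z(0.5, 0.1, 0.4, 0.2)
print(round(z, 2))  # 1.86
```

Plugging in the coefficients and standard errors from the lmer/nlme fits gives the multilevel version of the test.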
Journal
• In a paragraph, say why you would use
robust methods and give an example.
• Very briefly, describe a study that you
would do (related to your area of research)
that would require multilevel modeling.
“Simple” linear multilevel modeling

• Could have looked at assumptions (normal residuals for
all random variables)
• Plots of everything
• If we were doing longitudinal methods with covariance
structures, then continue with nlme and see Singer and
Willett's longitudinal book

http://www.ats.ucla.edu/stat/examples/alda/
Ignore Multilevel Structure: ANOVA

What would happen with t.test(Post_QoL ~ Surgery) ?
Ignore structure, ANCOVA
Check interaction
Treating Clinic as a Fixed Effect

Now surgery is non-significant
As a multilevel model
(ANOVA, random intercept)

Post_ij = 59 + 1.68·Surgery_ij + u_j + e_ij
Multilevel ANCOVA

Why no p values?
(Some implementations do provide them.)
Baseline QoL significant

anova(update(multi2, .~. - Surgery),multi2)
Testing if variance of surgery effect same
across clinics

Includes by default variance and covariance
Other effects added as with lm.
Generalized Linear Multilevel
Juror decision making

set.seed(2008)
juror <- 1:100
jury <- rep(1:10,each=10)
juryef <- rep(runif(10,0,1),each=10)
Eyewit <- rep(rbinom(10,1,.5),each=10)
guilty <- rbinom(100,1,Eyewit/4 + juryef/2 + runif(100,0,.25))
Fit a fixed effect for each jury:
mod3 <- glm(guilty~Eyewit + as.factor(jury),binomial)
summary(mod3)
Estimate Std. Error   z value Pr(>|z|)
(Intercept)        2.229e-16 6.325e-01 3.52e-16       1.000
Eyewit             4.055e-01 9.037e-01      0.449     0.654
as.factor(jury)2 -4.055e-01 9.037e-01      -0.449     0.654
as.factor(jury)3 -8.901e-17 8.944e-01 -9.95e-17       1.000
as.factor(jury)4   9.808e-01 1.021e+00      0.961     0.337
as.factor(jury)5   7.737e-16 9.129e-01 8.48e-16       1.000
as.factor(jury)6 -8.473e-01 9.361e-01      -0.905     0.365
as.factor(jury)7 -4.055e-01 9.037e-01      -0.449     0.654
as.factor(jury)8   8.473e-01 9.361e-01      0.905     0.365
as.factor(jury)9   9.808e-01 1.021e+00      0.961     0.337
as.factor(jury)10         NA         NA        NA        NA
(jury 10 is aliased: Eyewit is constant within each jury, so the last dummy is redundant)

Null deviance: 136.66   on 99   degrees of freedom
Residual deviance: 126.42   on 90   degrees of freedom
AIC: 146.42
mod4 <- lmer(guilty~Eyewit + (1|jury),family=binomial)
summary(mod4)
Generalized linear mixed model fit by the Laplace approximation
Formula: guilty ~ Eyewit + (1 | jury)
AIC   BIC logLik deviance
139.3 147.2 -66.67    133.3
Random effects:
Groups Name        Variance   Std.Dev.
jury   (Intercept) 1.8024e-19 4.2455e-10
Number of obs: 100, groups: jury, 10

Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.08004    0.28307 -0.2828   0.7774
Eyewit       0.74329    0.41140 1.8067    0.0708 .

Correlation of Fixed Effects:
(Intr)
Eyewit -0.688
Getting from coefficients to probabilities

With glm use predict. Not available for lmer (in this version).
Calculate values in logits:
-0.08004 and 0.74329 - 0.08004 = 0.66325

Inverse logit: 1/(1 + e^(-x))
48% and 66%
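The logit-to-probability step is easy to script. A Python sketch reproducing the 48% and 66% from the coefficients above:

```python
import math

def inv_logit(x):
    """Map a logit back to a probability: 1 / (1 + e^(-x))."""
    return 1 / (1 + math.exp(-x))

# Intercept alone (no eyewitness), then intercept + Eyewit coefficient
print(round(inv_logit(-0.08004), 2), round(inv_logit(-0.08004 + 0.74329), 2))  # 0.48 0.66
```

So the model predicts roughly a 48% guilty rate without an eyewitness and 66% with one.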
Summary: Learning Points

• What are multilevel models?
– Nature is hierarchical
(or at least multilevel)
• Lots today!
• lm, glm, rlm, and lmer have similar structures
