Robust Estimation and Multilevel Modeling

• Robust
• Multilevel




• Full lecture slides on web page:
  http://www2.fiu.edu/~dwright/pdf/psychboot.pdf

                House Prices

[Figure: house-price distributions for two areas, SBFC and WLSA.
Slide annotations: "So SBFC is non-significantly higher, but the SDs
so different this isn't good." "But in opposite direction / Medians
higher for WLSA." "But should we care about the medians?" "So SBFC
is significantly higher!"]
                Thinking about central tendency:
                 The Median Isn't the Message
                      by Stephen Jay Gould


When I revived after surgery, I asked my first question of my doctor
and chemotherapist: “What is the best technical literature about
mesothelioma?” She replied, with a touch of diplomacy …that
the medical literature contained nothing really worth reading.



I realized with a gulp why my doctor had offered that humane advice.
The literature couldn't have been more brutally clear: mesothelioma is
incurable, with a median mortality of only eight months after discovery.
       Options when traditional methods are suspect

1. Bootstrap (and hope a bit).
2. Transform: f1(yi) = f2(modeli) + ei.
   Still minimize Σ ei². The functions can be rankings.
3. GLMs (transform, but this alters the assumptions about ei).
   Fitting something like least squares.
4. Do whatever transformations of the response and predictor
   variables, but minimize Σ f(ei). If f is squaring, this is least
   squares (L2); if it is the absolute value (L1), it is more robust.
   It can also be least squares with trimming (e.g., a 20% trim), or
   other functions.
                       Robust Estimation



• What is Robust?
• Robust Estimators
   – Trimmed, winsorized, and other estimators
• Mostly on central tendency, but briefly on others.
“it consists of making the sum of
the squares of the errors a
minimum. … it prevents the
extremes from dominating, is
appropriate for revealing the
state of the system which most
nearly approaches the truth”




                                    Adrien Marie Legendre
                                      (1805, pp. 72-73)
“Everyone believes in the [Normal distribution], the
experimenters because they think it is a mathematical theorem, the
mathematicians because they think it is an experimental fact.”
                                                         Henri Poincaré

Micceri (1989) sampled a large number of data sets and found that
none were very Normal and that most were very un-Normal.




         But what if the histogram really looks Normal, or
         at least symmetric?
         The textbooks seem to say that would be okay.
         Mixture (heavy tailed) Distributions


[Figure: a Normal curve overlaid with a mixture of two Normals. In the
tail region, approx. 2.5% of the area falls under the Normal curve but
approx. 5.85% under the mixed curve. John Tukey (1960).]
                   Mean and Standard Deviation




 t = x̄ / (sd / √n)

 Both the mean and the sd are affected by outliers.

As outliers become more extreme, the mean increases,
but so does the standard deviation.

      A single datum in the direction predicted
      can make a test statistic less significant.
                   Example Data - H0: mean = 0


0, 1, 2, 3, 4
     mean = 2, sd = 1.6, t(4) = 2.82, p = .05
0, 1, 2, 3, 14
     mean = 4, sd = 5.7, t(4) = 1.57, p = .19
0, 1, 2, 3, 44
     mean = 10, sd = 19.0, t(4) = 1.17, p = .31

• 20% trimmed mean equal for all three (2)
• Least absolute value also goes to 2
  (absolute value produces the median)
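
A quick check in base R (a minimal sketch; the vectors are the three
example data sets above):

x1 <- c(0, 1, 2, 3, 4)
x2 <- c(0, 1, 2, 3, 14)
x3 <- c(0, 1, 2, 3, 44)
sapply(list(x1, x2, x3), function(x) t.test(x, mu = 0)$p.value)
                                            # .047  .19  .31
sapply(list(x1, x2, x3), mean, trim = .2)   # 2 2 2
sapply(list(x1, x2, x3), median)            # 2 2 2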
                            Fisher 1925


if in an agricultural experiment a first trial shows an apparent
advantage of 8 tons to the acre, and a duplicate experiment shows an
advantage of 9 tons … the results would justify some confidence that
a real effect had been observed; but if the second experiment had
shown an apparent advantage of 18 tons, although the mean is now
higher, we should place not more but less confidence in the
conclusion that the treatment was [p. 112] beneficial, for t has fallen.

The apparent paradox may be explained by pointing out that the
difference of 10 tons between the experiments indicates the existence
of uncontrolled circumstances so influential that in both cases the
apparent benefit may be due to chance, whereas in the former case
the relatively close agreement of the results suggests that the
uncontrolled factors are not so very influential.
Mortality rates (per 1000 per year)

Pre: 5.0, 95% CI 3.7-6.3
Post (without Falluja): 7.9, 95% CI 5.6-10.2
Post (with Falluja): 12.3, 95% CI 1.4-23.2
                         Loss Functions



(Note: using E(x) to mean the estimate of location of x.)

Least squares (L2): min Σ (xi − E(x))²
Least absolute value (L1): min Σ |xi − E(x)|

x% trimmed is OLS for the middle (100 − 2x)%; outliers "don't count".
Winsorized is OLS for the middle (100 − 2x)%, then the loss levels off.
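
A small numerical sketch (base R; x is an illustrative vector) of why
the choice of f matters: minimizing the L1 loss recovers the median,
while minimizing the L2 loss recovers the mean.

x <- c(0, 1, 2, 3, 44)
optimize(function(m) sum((x - m)^2), range(x))$minimum  # ~10, the mean
optimize(function(m) sum(abs(x - m)), range(x))$minimum # ~2, the median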
[Figure: weight (0 to 1) plotted against distance for several
M-estimator weight functions: Hampel, Huber, Cauchy, Fair, and
Andrews & bisquare.]

[Figure: influence plotted against residuals for least squares,
least absolute value, winsorized, and trimmed loss functions.]
          Example in R: Robust measures of location

library(mrt)
data(chile)
attach(chile)
mean(LENGTH)                        # 8.94 cm
mean(LENGTH, tr = .2)               # 7.96 cm  (20% trimmed mean)
mean(LENGTH, tr = .5)               # 7.62 cm  (maximum trim = median)
median(LENGTH)                      # 7.62 cm

library(MASS)
huber(LENGTH)                       # 8.30 cm  (Huber M-estimator)
rlm(LENGTH ~ 1, psi = psi.bisquare)$coef   # 8.16 cm  (bisquare)

  BUT BE CAREFUL: these M-estimators have tuning constants that
  differ between functions.
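
For instance (a sketch; k is the Huber tuning constant), MASS's
huber() defaults to k = 1.5 while rlm's psi.huber defaults to
k = 1.345, so set k explicitly to compare like with like:

huber(LENGTH, k = 1.345)                         # match rlm's default
rlm(LENGTH ~ 1, psi = psi.huber, k = 1.345)$coef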
                      rlm function
• The psi option allows many weight functions.
• Huber is the default; Hampel and Tukey's bisquare are also popular.
                         Big Questions



• If you are trying to measure the mean (or whatever), should you
  use traditional or robust measures?
    – the robust measures are not estimating the mean
        (but are often more reliable at it!)
• Should you be trying to measure the mean?
                          Summary: Robust


• The statistical revolution at the beginning of the 20th century was
  based in part on the Normal distribution.
• Rank-based procedures began appearing around 1950 and have
  achieved some popularity (though they have problems too).
• But we also assume Normality! Aren’t all psychology datasets
  Normal?
      Extra Selling Point


Robust methods tend to increase
   the likelihood that you will
correctly reject a false null hypothesis.
                        Robust Estimators


• Lessen the impact of outliers
   – improves power
   – sometimes will want to also focus on outliers
• Several available. Newer methods are developed frequently.
• Beginning to appear in many statistics programs.



                     Take-Home Messages

Robust methods are often useful, and they are not
difficult to run if available in your software.

Look at your data graphically.
Multilevel Modeling
        Do these all mean the same thing?

•   Bayesian hierarchical models
•   contextual models
•   hierarchical linear models
•   hierarchical modeling
•   mixed models
                                   no, but ...
•   random coefficient models
•   random effects models
•   subject specific models
•   variance component models
•   variance heterogeneity
  My first encounter with multilevel modeling


London line-up study from the early 1990s


                     ML2
                     ML3
                     MLn




      http://www.cmm.bristol.ac.uk/index.shtml
              Multilevel Modelling

Used when the data structures are nested.
   – Sampling pupils within schools
   – Sampling people within households
   – “Sampling” memories within memory
   – Cross-classified
   – When studies sampled (i.e., meta-analyses)
   – Repeated measures
   – Longitudinal methods
             Hierarchical Data Set


 Population
      Some Schools
            Some Pupils
                   Longitudinal measures

Suppose that you are interested in the relationship
between mathematics attainment and bullying.

Probability of being bullied will vary between schools.

Sample 10 schools and 10 pupils in each
                   Creating the data

set.seed(2007)
child <- 1:100                       # pupil id
school <- rep(1:10, 10)              # school id (10 pupils per school)
schef <- rep(runif(10, 0, 1), 10)    # school effect
bully <- rbinom(100, 100, schef/2 + runif(100, 0, .5))     # bullying score
maths <- rbinom(100, 100, 1 - schef/2 - runif(100, 0, .5)) # maths score



Have one line per lowest-level unit (so per trial if repeated
  measures; per pupil here).
All stats packages can restructure data this way (SPSS has a wizard).
            Four Approaches: In words

• Ignore multilevel structure
• Aggregate data, treating school as the unit under
  investigation
• Treat “school” as fixed effect
   – Similar to ANCOVA, partialing out school
• Treat “school” as random effect
           Four Approaches: In equations

Let i index the pupils and j the schools.

1. Ignore:         math_i  = B0 + B1 bully_i + e_i
2. Aggregate:      math_j  = B0 + B1 bully_j + u_j
3. Fixed effect:   math_ij = B0(j) + B1 bully_ij + e_ij
4. Random effect:  math_ij = B0j + B1 bully_ij + e_ij
                           = B0 + B1 bully_ij + u_j + e_ij
                 Approach 1: Ignore

• Assumption of independence not valid
• Standard errors will tend to be too small
   – Too likely to get significant effects
      • Pretend a couple of classes happen to have bullies
        and good math teachers
   – Can get effects in opposite directions
• Easiest to run
   – Use ordinary least squares
olsreg1 <- lm(maths~bully)
summary(olsreg1)


Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 68.79749    4.63696 14.837 < 2e-16 ***
bully       -0.38376    0.08779 -4.371 3.08e-05 ***

Signif. codes:   0 '***' 0.001 '**' 0.01 '*' 0.05 '.'

Residual standard error: 17.74 on 98 degrees of freedom
Multiple R-Squared: 0.1632,     Adjusted R-squared: 0.1546
F-statistic: 19.11 on 1 and 98 DF, p-value: 3.076e-05
plot(bully,maths)
abline(olsreg1)


[Figure: scatterplot of maths against bully with the OLS regression line.]
   Approach 2: Level of Analyses is School

• Unit of analysis is school, not individual
• Hypotheses must be framed appropriately:
   Schools with lots of bullying have low maths scores



          math_j = B0 + B1 bully_j + u_j
        mbully   mmaths
 [1,]    63.3     41.7
 [2,]    25.6     53.4
 [3,]    35.0     69.4
 [4,]    57.0     45.0
 [5,]    42.1     55.3
 [6,]    31.8     63.7
 [7,]    66.2     26.0
 [8,]    46.5     57.9
 [9,]    54.9     56.5
[10,]    65.6     31.8

New sample size is only 10.
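
The school-level means used below by ols2 can be created with tapply:

mbully <- tapply(bully, school, mean)   # mean bullying per school
mmaths <- tapply(maths, school, mean)   # mean maths score per school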
ols2 <- lm(mmaths~ mbully)
summary(ols2)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 86.7337      9.9038   8.758 2.26e-05 ***
mbully        -0.7513    0.1950 -3.852 0.00486 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’

Residual standard error: 8.652 on 8 degrees of freedom
Multiple R-squared: 0.6497,     Adjusted R-squared: 0.6059
F-statistic: 14.84 on 1 and 8 DF, p-value: 0.004864
plot(bully,maths,cex.axis=1.6,cex.lab=1.6)
abline(olsreg1,lwd=2)
points(mbully,mmaths,pch=19,cex=1.5,col="DodgerBlue")
abline(ols2,col="DodgerBlue",lwd=2)

[Figure: pupil-level scatterplot of maths against bully with the
pupil-level OLS line (black), the school means (blue points), and the
school-level OLS line (blue).]
Lesson: Don't save a file called "draft" since searching for it is difficult.
               Approach 3a: School Intercepts
                  Analysis of Covariance

OLS regression but include k dummy variables for the k schools

    math_ij = B0(j) + B1 bully_ij + e_ij

i.e., with one dummy variable per school,

    math_ij = B0(1)d1 + B0(2)d2 + ... + B0(k)dk + B1 bully_ij + e_ij

where k is the number of schools.

             This estimates an intercept for each school,
             assuming the slopes are the same.
 ols3 <- lm(maths~bully+as.factor(school))
 summary(ols3)
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         44.37317    8.00870   5.541 3.03e-07 ***
bully               -0.04223    0.10234 -0.413 0.680849
as.factor(school)2  10.10792    7.69651   1.313 0.192454
as.factor(school)3  26.50488    7.26216   3.650 0.000442 ***
as.factor(school)4   3.03395    6.69081   0.453 0.651328
as.factor(school)5  12.70472    7.00416   1.814 0.073066 .
as.factor(school)6  20.66975    7.39885   2.794 0.006381 **
as.factor(school)7 -15.57753    6.66629 -2.337 0.021697 *
as.factor(school)8  15.49053    6.87802   2.252 0.026771 *
as.factor(school)9  14.44527    6.71493   2.151 0.034168 *
as.factor(school)10 -9.80287    6.66384 -1.471 0.144803

Residual standard error: 14.89 on 89 degrees of freedom
Multiple R-Squared: 0.4647,     Adjusted R-squared: 0.4045
F-statistic: 7.726 on 10 and 89 DF, p-value: 8.8e-09
anova(olsreg1,ols3)

Analysis of Variance Table

Model 1: maths ~ bully
Model 2: maths ~ bully + as.factor(school)
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)
1     98 30853
2     89 19736 9      11116 5.5697 4.502e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
plot(bully, maths)
sbully <- split(bully, school)   # split bully scores by school
spredict <- split(predict(ols3), school)
for (i in 1:10)
  lines(sbully[[i]], spredict[[i]], col = i)

[Figure: scatterplot of maths against bully with a separate (parallel)
fitted line for each school.]
ols4 <- lm(maths~bully*as.factor(school))
plot(bully,maths)
spredict <- split(predict(ols4),school)
for (i in 1:10) lines(sbully[[i]],spredict[[i]],col=i)

[Figure: scatterplot of maths against bully with a separate intercept
and slope for each school. (Avoid yellow for the lines.)]
Coefficients:
                          Estimate Std. Error t value Pr(>|t|)
(Intercept)                90.9199    28.6239   3.176 0.00212 **
bully                      -0.7776     0.4470 -1.740 0.08576 .
as.factor(school)2        -39.4666    30.4781 -1.295 0.19907
as.factor(school)3        -38.9686    30.3789 -1.283 0.20328
as.factor(school)4        -47.6179    33.4518 -1.423 0.15849
as.factor(school)5        -50.1265    30.9291 -1.621 0.10902
as.factor(school)6          3.6652    30.9021   0.119 0.90589
as.factor(school)7        -88.3777    34.3854 -2.570 0.01202 *
as.factor(school)8        -32.4313    31.3624 -1.034 0.30421
as.factor(school)9        -27.3559    33.2073 -0.824 0.41251
as.factor(school)10        -9.7977    34.8759 -0.281 0.77949
bully:as.factor(school)2    0.8536     0.5816   1.468 0.14608
bully:as.factor(school)3    1.2761     0.5186   2.461 0.01601 *
bully:as.factor(school)4    0.8074     0.5350   1.509 0.13521
bully:as.factor(school)5    1.1221     0.5163   2.173 0.03271 *
bully:as.factor(school)6   -0.1937     0.5615 -0.345 0.73105
bully:as.factor(school)7    1.1319     0.5275   2.146 0.03494 *
bully:as.factor(school)8    0.7649     0.5167   1.480 0.14273
bully:as.factor(school)9    0.6489     0.5362   1.210 0.22980
bully:as.factor(school)10   0.0257     0.5363   0.048 0.96190
How about summarizing the school intercepts as:

    B0j = B0 + uj

and the slopes as:

    B1j = B1 + vj

assuming both uj and vj are Normally distributed with means of zero?

Then for each you only need to estimate a mean and a standard
deviation (only two things).
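
In model syntax this is the random intercept and slope model (a
sketch; it is fitted as lme2 later with ML estimation):

library(nlme)
lme(maths ~ bully, random = ~ bully | school, method = "ML")
# or equivalently in lme4:
# lmer(maths ~ bully + (bully | school), REML = FALSE)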
                      Two Libraries
nlme
    – the first big one and so traditionally what was used
    – still needed for some non-linear models
    – better for longitudinal
Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D., and
   the R Development Core Team (2011). nlme: Linear
   and Nonlinear Mixed Effects Models. R package
   version 3.1-98.
lme4
    – uses S4 classes, which are all the rage
    – builds on nlme, so it is the library for the future
    – does generalized linear models
Bates, D., Maechler, M., and Bolker, B. (2011).
  lme4: Linear mixed-effects models using S4 classes.
   R package version 0.999375-38.
library(nlme)
lme1 <- lme(maths~bully,random=~1|school,method="ML")
summary(lme1)
Linear mixed-effects model fit by maximum likelihood
     AIC      BIC    logLik
  849.8016 860.2222 -420.9008

Random effects:
 Formula: ~1 | school
        (Intercept) Residual
StdDev:    10.58926 14.87886

Fixed effects: maths ~ bully
                Value Std.Error DF  t-value p-value
(Intercept) 56.72791 5.977633 89 9.490028 0.0000
bully        -0.13643 0.096183 89 -1.418471 0.1595
 Correlation:
      (Intr)
bully -0.785
intervals(lme1)
Approximate 95% confidence intervals

Fixed effects:
                lower       est.       upper
(Intercept) 44.969852 56.7279078 68.48596403
bully       -0.325625 -0.1364325 0.05275994
attr(,"label")
[1] "Fixed effects:"

Random Effects:
 Level: school
                   lower     est.    upper
sd((Intercept)) 5.980183 10.58926 18.75067

 Within-group standard error:
   lower     est.    upper
12.83830 14.87886 17.24375
lme0 <- lme(maths~1,random=~1|school,method="ML")
anova(lme0,lme1)

 Model df        AIC    BIC    logLik   Test     L.Ratio p-value
lme0      1   3 849.54 857.35 -421.77
lme1      2   4 849.80 860.22 -420.90   1 vs 2    1.73    0.19


lme2 <- lme(maths~bully,random=~bully|school,method="ML")
anova(lme1,lme2)
     Model df AIC     BIC    logLik   Test L.Ratio p-value
lme1     1 4 849.80 860.22 -420.90
lme2     2 6 849.26 864.89 -418.63 1 vs 2    4.54    0.10
library(lme4)
lme1 <- lmer(maths~bully + (1|school),REML=FALSE)
summary(lme1)
Linear mixed model fit by maximum likelihood
Formula: maths ~ bully + (1 | school)
   AIC    BIC logLik deviance REMLdev
 849.8 860.2 -420.9     841.8   840.2
Random effects:
 Groups    Name        Variance Std.Dev.
 school    (Intercept) 112.11   10.588
 Residual              221.39   14.879
Number of obs: 100, groups: school, 10

(Estimates are slightly different from nlme.)
Fixed effects:
            Estimate Std. Error t value
(Intercept) 56.72876    5.91732   9.587
bully       -0.13645    0.09522 -1.433

Correlation of Fixed Effects:
      (Intr)
bully -0.785
                  Math and Bullying

• Traditionally people have
   – ignored nesting
   – used complex and inflexible fixes
   – done analyses at aggregate level (ecological fallacy)
   – treated as a fixed effect and covaried out

• Now multilevel modeling
                  Multilevel ANOVA:
            "My randomization didn't work!"
Chloe found group differences in baseline measures

chl <- file.choose()
# exercise.dat
chloe <- read.table(chl,header=TRUE)
attach(chloe)
wcond <- as.factor(wcond)
summary(aov(sqw1^2~wcond))

             Df Sum Sq Mean Sq F value Pr(>F)
wcond         3   37.67  12.56 2.8527 0.03683 *
Residuals   499 2196.33   4.40
---
Signif. codes:   0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05
library(lme4)
mod0 <- lmer(sqw1^2 ~ 1 + (1|class))
mod1 <- lmer(sqw1^2 ~ wcond + (1|class))
anova(mod0,mod1)

Data:
Models:
mod0: sqw1^2 ~ 1 + (1 | class)
mod1: sqw1^2 ~ wcond + (1 | class)
      Df    AIC     BIC logLik Chisq ChiDf Pr(>Chisq)
mod0 3 2166.5 2179.2 -1080.2
mod1 6 2169.7 2195.0 -1078.8 2.8267     3    0.4191


      Chloe happy with multilevel modeling now
Hill et al.'s theory
  Baron and Kenny mediation analysis
            with lmer/nlme

      condition -> intention -> exercise

intention ~ condition
exercise ~ intention
exercise ~ condition + intention
exercise ~ condition


Can do the same sequence of regressions they
suggest and use the same coefficients and standard
errors in the Sobel test.
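
A sketch of that sequence with lmer (the grouping factor class is an
assumption here, borrowed from the earlier exercise data):

library(lme4)
med1 <- lmer(intention ~ condition + (1 | class))
med2 <- lmer(exercise ~ intention + (1 | class))
med3 <- lmer(exercise ~ condition + intention + (1 | class))
med4 <- lmer(exercise ~ condition + (1 | class))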
                 Journal
• In a paragraph, say why you would use
  robust methods and give an example.
• Very briefly, describe a study that you
  would do (related to your area of research)
  that would require multilevel modeling.
        “Simple” linear multilevel modeling

• Could have looked at assumptions (normal residuals for
  all random variables)
• Plots of everything
• If we were doing longitudinal methods with covariance
  structures, then continue with nlme and see Singer and
  Willett's longitudinal book




                 http://www.ats.ucla.edu/stat/examples/alda/
       Ignore Multilevel Structure: ANOVA




What would happen with t.test(Post_QoL ~ Surgery) ?
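
A sketch of the comparison (Post_QoL and Surgery as named above): with
equal variances assumed, the two-group t-test gives the same p value
as the one-way ANOVA.

t.test(Post_QoL ~ Surgery, var.equal = TRUE)
summary(aov(Post_QoL ~ Surgery))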
Ignore structure, ANCOVA
Check interaction
Treating Clinic as a Fixed Effect




     Now surgery is non-significant
            As a multilevel model
         (ANOVA, random intercept)




Post_ij = 59 - 1.68 Surgery_ij + u_j + e_ij
Multilevel ANCOVA




           Why no p values?
           (Some implementations give them.)
           Baseline QoL is significant.

anova(update(multi2, .~. - Surgery),multi2)
Testing if variance of surgery effect same
               across clinics




      By default the random part includes the variances and the
      covariance of the random effects.
      Other fixed effects are added as with lm.
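
A sketch of that test (multi2, Surgery, and Clinic are the names used
in these slides; the random-slope model is an assumed extension):

multi3 <- update(multi2, random = ~ Surgery | Clinic)  # slope varies by clinic
anova(multi2, multi3)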
      Generalized Linear Multilevel
        Juror decision making

set.seed(2008)
juror <- 1:100                              # juror id
jury <- rep(1:10, each = 10)                # jury id (10 jurors per jury)
juryef <- rep(runif(10, 0, 1), each = 10)   # jury effect
Eyewit <- rep(rbinom(10, 1, .5), each = 10) # eyewitness present (per jury)
guilty <- rbinom(100, 1, Eyewit/4 + juryef/2 + runif(100, 0, .25))
                  A fixed effect for each jury
mod3 <- glm(guilty~Eyewit + as.factor(jury),binomial)
summary(mod3)
                    Estimate Std. Error   z value Pr(>|z|)
(Intercept)        2.229e-16 6.325e-01 3.52e-16       1.000
Eyewit             4.055e-01 9.037e-01      0.449     0.654
as.factor(jury)2 -4.055e-01 9.037e-01      -0.449     0.654
as.factor(jury)3 -8.901e-17 8.944e-01 -9.95e-17       1.000
as.factor(jury)4   9.808e-01 1.021e+00      0.961     0.337
as.factor(jury)5   7.737e-16 9.129e-01 8.48e-16       1.000
as.factor(jury)6 -8.473e-01 9.361e-01      -0.905     0.365
as.factor(jury)7 -4.055e-01 9.037e-01      -0.449     0.654
as.factor(jury)8   8.473e-01 9.361e-01      0.905     0.365
as.factor(jury)9   9.808e-01 1.021e+00      0.961     0.337
as.factor(jury)10         NA         NA        NA        NA
   (jury 10 is aliased: Eyewit is constant within each jury)

    Null deviance: 136.66   on 99   degrees of freedom
Residual deviance: 126.42   on 90   degrees of freedom
AIC: 146.42
mod4 <- lmer(guilty~Eyewit + (1|jury),family=binomial)
summary(mod4)
Generalized linear mixed model fit by the Laplace approximation
Formula: guilty ~ Eyewit + (1 | jury)
   AIC   BIC logLik deviance
 139.3 147.2 -66.67    133.3
Random effects:
 Groups Name        Variance   Std.Dev.
 jury   (Intercept) 1.8024e-19 4.2455e-10
Number of obs: 100, groups: jury, 10

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.08004    0.28307 -0.2828   0.7774
Eyewit       0.74329    0.41140 1.8067    0.0708 .

Correlation of Fixed Effects:
       (Intr)
Eyewit -0.688
    Getting from coefficients to probabilities

With glm you can use predict; it is not available for lmer (in this
version). Calculate the values in logits:
             -0.08004 and 0.74329 - 0.08004 = 0.66325

Inverse logit: 1/(1 + e^(-x))
   48% and 66%
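
In R the inverse logit is plogis(), so a quick check of those two
probabilities:

plogis(-0.08004)             # 0.48
plogis(-0.08004 + 0.74329)   # 0.66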
              Summary: Learning Points

• What are multilevel models?
   – Nature is hierarchical
      (or at least multilevel)
• Lots today!
• lm, glm, rlm, and lmer have similar structures

								