      One-way Between Subjects Analysis of Variance

      Multiple t-tests vs. ANOVA
▪ Example: Dr. Smith wants to compare the
perfectionism scores of grad students in
Psychology, Sociology, History and Math
▪ Why do we (usually) run a one-way ANOVA to
test for mean differences instead of running
C = k(k-1)/2 t-tests (k = number of groups)?
▪ The answer you probably received was that it
controls the overall Type I error rate ... but is
this correct?
      Multiple t-tests vs. ANOVA
▪ It is true that if you run k(k-1)/2 = 4(4-1)/2 = 6
tests to compare the programs (each at level α)
you have approximately a 1-(1-α)^C (≈ Cα)
chance of making a Type I error, but what if we
impose some type of multiplicity control?
    – For example, if we conduct each of the 6 t-tests at αPC = α/C, then
    the overall Type I error rate won't exceed α
 ▸ In fact, many pairwise multiple comparison
 procedures (that locate pairwise mean differences)
 are intended to be used without an omnibus test
 (more on that later)
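The arithmetic behind the familywise error claim is easy to verify. The short sketch below (plain Python; the variable names are mine) contrasts the exact rate 1-(1-α)^C with the Cα approximation, and shows the per-comparison level αPC = α/C:

```python
# Familywise Type I error for C independent tests, each run at level alpha.
alpha, k = 0.05, 4
C = k * (k - 1) // 2                  # 6 pairwise comparisons among k = 4 groups

familywise = 1 - (1 - alpha) ** C     # exact rate, ~0.265
approximation = C * alpha             # the Calpha approximation, ~0.30
per_comparison = alpha / C            # alpha_PC that keeps the familywise rate <= alpha

print(round(familywise, 3), round(approximation, 2), round(per_comparison, 4))
```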
     Multiple t-tests vs. ANOVA

▪ Another important distinction between the
omnibus ANOVA F test and multiple t tests is
that even when the ANOVA F is not
significant, there could still be significant
pairwise differences (even when controlling for
inflated Type I errors)
 ▸ The ANOVA F is pooling all of the mean differences
 and therefore it is not a precise test of whether there
 are any significant pairwise mean differences
             One-way ANOVA
▪ Even though the one-way ANOVA may not
be a necessary tool for analysis, it remains
popular because:
 ▸ It provides an easy way to summarize null findings
 ▸ It illustrates the use of a pooled error term
 ▸ It provides the basis for more complicated omnibus
 tests, such as tests of higher-order interactions
 ▸ Some multiple comparison procedures do require
 that an omnibus test be used as a preliminary test of
 the existence of mean differences
      One-way Between-Subjects
           ANOVA Model
▪ Yij = μ + τj + εij, where
    – Yij is the score of the ith subject (i=1,...,nj) in the jth group (j=1,...,k)
    – μ is the population grand mean
    – τj is the fixed treatment effect for the jth group (μj -μ)
    – εij is the random error component for the ith subject in the jth group
    (Yij - μj)

▪ Assumptions:
   – εij ~ NID(0, σ²) (more on that to come)
▪ Hypotheses:
   – Ho: μ1 = μ2 = ... = μk
   – H1: The population means are not all equal
   – Recall why H1: μ1 ≠ μ2 ≠ ... ≠ μk is not correct
        Partitioning the Variability
▪ Between Group variability: differences
between the mean scores in each group
 ▸ Why do mean differences exist?
   – Effect of the IV on the DV (or relationship between the IV
   and DV if we are using naturally occurring groups)
   – Error
▪ Within Group variability: variability of the
scores within the groups
 ▸ Why do scores within the groups differ?
   – Error
     Understanding the One-way
              ANOVA

▪ ANOVA F (after Fisher) = ratio of between
group variability to within group variability

    F = Between Group Variability / Within Group Variability

    F = (s²t + s²e) / s²e = MStreatment / MSerror
                   Calculations

    MStreat = SStreat / dftreat = [Σj nj (X̄j − X̄..)²] / (k − 1)

    MSerror = SSerror / dferror = [Σj Σi (Xij − X̄j)²] / (N − k)

    MStotal = SStotal / dftotal = [Σj Σi (Xij − X̄..)²] / (N − 1)
               ANOVA Summary Table

    Source      SS        df        MS                          F
    Treatment   SStreat   dftreat   MStreat = SStreat/dftreat   F = MStreat/MSerror
    Error       SSerror   dferror   MSerror = SSerror/dferror
    Total       SStotal   dftotal


▪ Ho: μ1 = μ2 = ... = μk is rejected if F ≥ Fα,dft,dfe
▪ Recall that the ANOVA F tests the global
hypothesis that there are no differences between
the groups; even when this global test is not
significant, pairwise differences between groups
may still exist!
                    Example
▪ A researcher is interested in determining if the
fatigue levels (0-15) of married women differ as
a function of how they classify their husbands'
involvement in housework (not involved,
somewhat involved, involved)
▪ Data:
 ▸ Not involved: 9, 12, 4, 8, 7
 ▸ Somewhat involved: 4, 6, 8, 2, 10
 ▸ Involved: 1, 3, 4, 5, 2
               Example, cont’d

▪ Ho: μNI = μSI = μI
 ▸ SStot = ΣX² - [(ΣX)²/N] = 629 - [(85)²/15] = 147.33
 ▸ SStreat = {Σj [(ΣXj)²/nj]} - [(ΣX)²/N] =
 [(40²/5)+(30²/5)+(15²/5)] - [(85)²/15] = 63.33
 ▸ SSerror = ΣX² - {Σj [(ΣXj)²/nj]} = 629 -
 [(40²/5)+(30²/5)+(15²/5)] = 84.00
                Example, cont’d
 ▸ dftot = N - 1 = 15 - 1 = 14
 ▸ dftreat = k - 1 = 3 - 1 = 2
 ▸ dferror = N - k = 15 - 3 = 12
 ▸ MStreat = SStreat / dftreat = 63.33 / 2 = 31.67
 ▸ MSerror = SSerror / dferror = 84 / 12 = 7.00
▪ F = MStreat / MSerror = 31.67 / 7.00 = 4.52
 ▸ F.05,2,12 = 3.88
 ▸ R/SPSS p-value = .034
▪ Therefore, we reject the null hypothesis that
the means are all equal
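These hand calculations are easy to check with a few lines of Python (a minimal sketch; the group labels are mine):

```python
# Hand-computation of the one-way ANOVA for the fatigue example.
groups = {
    "not_involved":      [9, 12, 4, 8, 7],
    "somewhat_involved": [4, 6, 8, 2, 10],
    "involved":          [1, 3, 4, 5, 2],
}

scores = [x for g in groups.values() for x in g]
N, k = len(scores), len(groups)
grand_mean = sum(scores) / N

# Between-group (treatment) and within-group (error) sums of squares
ss_treat = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
ss_error = sum((x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g)

ms_treat = ss_treat / (k - 1)     # 63.33 / 2  = 31.67
ms_error = ss_error / (N - k)     # 84.00 / 12 = 7.00
F = ms_treat / ms_error           # ~4.52, exceeding the critical F of 3.88
print(round(F, 2))
```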
     Strength of the Relationship
▪ Recall that a significant F test does not tell us
how 'strong' the relationship is
▪ η2 = proportion of variability in the DV that can
be explained by the IV
▪ η2 = SStreat / SStotal
▪ For our example, η2 = 63.33 / 147.33 = .43
▪ 43% of the variability in fatigue is explained by
the husband's involvement in housework (large
effect!)
  Strength of the Relationship - ω2
▪ η2 provides a slightly biased (upwards)
estimate of the strength of the relationship
between an IV and a DV
▪ Therefore, several authors have
recommended the use of ω2 as an alternative to
η2
    ω² = [SStreat − (k − 1)MSerror] / (SStotal + MSerror)
       = [63.33 − (2)(7)] / (147.33 + 7) = .32
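Both strength-of-relationship measures for the fatigue example can be computed directly from the summary-table values (a sketch; the numbers are taken from the example above):

```python
# eta-squared and omega-squared for the fatigue example,
# using the SS/MS values from the ANOVA summary.
ss_treat, ss_total, ms_error, k = 63.33, 147.33, 7.0, 3

eta_sq = ss_treat / ss_total                                        # ~.43
omega_sq = (ss_treat - (k - 1) * ms_error) / (ss_total + ms_error)  # ~.32

print(round(eta_sq, 2), round(omega_sq, 2))
```

Note that omega-squared is smaller, reflecting its correction for the upward bias in eta-squared.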
              Effect Size – f/Φ'
▪ Another useful measure of effect size is
Cohen's f (which Howell calls Φ')
▪ f is a standardized mean difference statistic
that is very similar to the d-family based
RMSSE effect size measure that Howell also
presents
 ▸ We use f/Φ' because this measure will also be used
 in power calculations
                Effect Size – f/Φ'

    f = Φ' = sqrt{ [Σj (μj − μ)² / k] / σ²e }

▸ In the absence of useful information for interpreting
f, Cohen recommended:
 – Small = .1 - .25, med = .25 - .4, large = .4+
          Power Calculations
▪ When planning any study, it is important that
we investigate power a priori
▪ As in the two independent samples design, we
need an estimated effect size in order to
calculate power (note that j = 1, ..., k)

    ϕ' = sqrt{ [Σj (μj − μ)² / k] / σ²e }
       Power Calculations, cont’d
▪ When calculating the effect size, it is important
to consider what differences among the groups
would be meaningful (relative to the variability of
the groups)
 ▸ If no information is available for calculating an
 effect size, Cohen suggests the following for ϕ':
  – .10 = small, .25 = medium, .40 = large
▪ Incorporating the sample size and effect size
gives:

    ϕ = ϕ' · sqrt(n)
      Power Calculation Example
▪ Dr. Jones wants to compare three different
cultural groups on levels of conservativeness
▪ How much power would Dr. Jones have with
10 subjects per group, meaningful differences in
the means of: European = 4, South Asian = 6,
Middle Eastern = 8, and average error variance of 3?

    ϕ' = sqrt{ [Σj (μj − μ)² / k] / σ²e }

       = sqrt{ ([(4 − 6)² + (6 − 6)² + (8 − 6)²] / 3) / 3 } = .942
Power Calculation Example, cont’d

▪ Therefore, ϕ = ϕ' · sqrt(n) = .942 × sqrt(10) = 2.98
▪ From Appendix “ncF”
 ▸ ϕ = 2.98, dft = 2, dfe = N-k = 30-3 = 27, and α = .05
 ▸ Power = 1 -.01 = .99 (or 99% power)
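The two steps above (ϕ' from the meaningful means, then ϕ = ϕ'·√n) can be reproduced in a short Python sketch (variable names are mine):

```python
import math

# Noncentrality calculation for Dr. Jones's power analysis.
means = [4, 6, 8]        # meaningful group means from the example
error_var = 3            # average within-group error variance
n = 10                   # subjects per group
k = len(means)

grand = sum(means) / k
phi_prime = math.sqrt((sum((m - grand) ** 2 for m in means) / k) / error_var)
phi = phi_prime * math.sqrt(n)

print(round(phi_prime, 3), round(phi, 2))   # phi is then looked up in the ncF table
```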
▪ Note that in order to calculate the n required
for a given power we still calculate the
meaningful effect size, but then we reorganize
the formula for ϕ:

    n = ϕ² / ϕ'²
  Sample Size Calculation Example
▪ Dr. Jones wants to compare three different
cultural groups on levels of conservativeness
 ▸ How many subjects would Dr. Jones need in order
 to have 90% power with meaningful differences in
 the means of: European = 4, South Asian = 6, Middle
 Eastern = 8, and average error variance of 3?

    ϕ' = sqrt{ [Σj (μj − μ)² / k] / σ²e }

       = sqrt{ ([(4 − 6)² + (6 − 6)² + (8 − 6)²] / 3) / 3 } = .942
 Sample Size Calculation Example
    2    2.2 2
  n= 2 =       2 = 5.45
       .942
▪ Note that ϕ (2.2) comes from Appendix ncF,
and is the value of ϕ that most closely
approximates 90% power for an appropriate
error df
▪ With n = 6 subjects per group, and ϕ' = .942,
G*Power gives us a power of .91
 ▸ Note in G*Power that f = ϕ'
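The sample-size rearrangement n = ϕ²/ϕ'² can be checked in the same way (ϕ = 2.2 is the tabled value from Appendix ncF cited above):

```python
import math

phi_target = 2.2      # phi from the ncF table for ~90% power
phi_prime = 0.942     # meaningful effect size

n_exact = phi_target ** 2 / phi_prime ** 2   # ~5.45
n_per_group = math.ceil(n_exact)             # round up to 6 whole subjects

print(round(n_exact, 2), n_per_group)
```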
                Assumptions
▪ The assumptions required for obtaining a valid
F test are:
 ▸ Samples are randomly and independently selected
 from their respective populations
 ▸ Scores in each population are normally distributed
 ▸ Variances in each population are equal
▪ Note:
 ▸ The independence assumption is extremely
 important and should be considered in the design of
 the experiment
             Assumptions, cont’d
▪ Consequences of violating assumptions:
  – If sample sizes and variances are unequal, Type I error
  rates can deviate considerably from α
    – Positively paired ns and σ²s produce a conservative F
    – Negatively paired ns and σ²s produce a liberal F
    – However, with more than two groups it is often more difficult to
    identify a pattern
  – If data are nonnormal, Type I error rates may not deviate
  much from α, however the power of other procedures may
  be much higher than the F test
  – If data are nonnormal and variances are unequal the F
  test becomes severely biased with respect to both Type I
  and Type II error rates
     Alternatives to the ANOVA F


▪ Unequal Variances
 ▸ When variances are determined to be unequal
 (Levene/variance ratio tests) the omnibus Welch test
 can be adopted
 ▸ Transformations may also be useful when the
 means and variances/standard deviations are
 proportional
                        Welch Test

    F' = [Σj wj (X̄j − X̄'.)² / (k − 1)] /
         {1 + [2(k − 2)/(k² − 1)] Σj [1/(nj − 1)] (1 − wj/Σj wj)²}

    wj = nj / s²j          X̄'. = (Σj wj X̄j) / (Σj wj)

    df' = (k² − 1) / {3 Σj [1/(nj − 1)] (1 − wj/Σj wj)²}
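A direct translation of these formulas into Python might look like the following (the function name and structure are mine, not from the notes):

```python
def welch_anova(groups):
    """Welch's heteroscedastic one-way test: returns (F', dft, df')."""
    k = len(groups)
    n = [len(g) for g in groups]
    mean = [sum(g) / len(g) for g in groups]
    var = [sum((x - m) ** 2 for x in g) / (len(g) - 1)       # unbiased s2_j
           for g, m in zip(groups, mean)]

    w = [nj / vj for nj, vj in zip(n, var)]                  # wj = nj / s2_j
    w_sum = sum(w)
    grand = sum(wj * mj for wj, mj in zip(w, mean)) / w_sum  # weighted grand mean

    numerator = sum(wj * (mj - grand) ** 2 for wj, mj in zip(w, mean)) / (k - 1)
    a = sum((1 / (nj - 1)) * (1 - wj / w_sum) ** 2 for nj, wj in zip(n, w))
    denominator = 1 + (2 * (k - 2) / (k ** 2 - 1)) * a

    return numerator / denominator, k - 1, (k ** 2 - 1) / (3 * a)
```

For k = 2 groups the correction term vanishes and F' reduces to the square of the Welch t statistic, which gives a quick sanity check on the implementation.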
     Alternatives to the ANOVA F

▪ Nonnormality
 ▸ When the distributions are nonnormal (but similar in
 shape) and the variances are equal a nonparametric
 test (e.g., Kruskal-Wallis) can provide much more
 power than the ANOVA F
 ▸ Transformations may also be useful to make the
 distribution shapes more normal
 ▸ Trimmed means also help to reduce the effects of
 extreme observations
           Kruskal-Wallis Test
▪ The Kruskal-Wallis H test is a nonparametric
procedure that can be used to compare k
independent populations
▪ All N = n1 + n2 + ... + nk observations are
jointly ranked (i.e., treated as one large sample
when applying the ranks)
 ▸ As with the Mann-Whitney test, tied observations
 are assigned the average of the ranks they occupy
▪ Calculate T1, T2, ... TK, where T is the sum of
the ranks for each group
               Kruskal-Wallis Test
▪ The null hypothesis is rejected if H is greater
than a critical χ2 value (df = k - 1)
 ▸ Ho: There are no differences between the groups
 ▸ H1: There are differences between the groups
   – Recall: Kruskal-Wallis is a test of mean differences only if
   we can assume that the distributions are the same shape
   and that the variances of the groups are equal

    H = [12 / (N(N + 1))] Σj (T²j / nj) − 3(N + 1)
         Kruskal-Wallis Example
▪ Four groups of students were randomly
assigned to be taught with one of four different
techniques, and their achievement scores were
recorded. Are the distributions of test scores
the same? (i.e., are all the groups the same?)
 ▸ Data (ranks are in parentheses)
   Method 1   Method 2   Method 3   Method 4
   65 (3)     75 (7)     59 (1)     94 (16)
   87 (13)    69 (5)     78 (8)     89 (15)
   73 (6)     83 (12)    67 (4)     80 (10)
   79 (9)     81 (11)    62 (2)     88 (14)
          Kruskal-Wallis Example
▪ Sum of the Ranks for Each Group
 ▸ T1 = 31, T2 = 35, T3 = 15, T4 = 55
 ▸ χ2 critical (α = .05, df = k - 1 = 4 - 1 = 3) is 7.81
 (Appendix B.8)

    H = [12 / (N(N + 1))] Σj (T²j / nj) − 3(N + 1)

      = [12 / (16 · 17)] [(31²/4) + (35²/4) + (15²/4) + (55²/4)] − 3(17)

      = 8.96
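The joint ranking and the H statistic for this example can be reproduced in a few lines (a sketch; this data set has no ties, so simple integer ranks suffice):

```python
# Kruskal-Wallis H for the four teaching methods.
groups = [
    [65, 87, 73, 79],   # Method 1
    [75, 69, 83, 81],   # Method 2
    [59, 78, 67, 62],   # Method 3
    [94, 89, 80, 88],   # Method 4
]

# Jointly rank all N observations (with ties you would average the
# occupied ranks; there are none here, so a simple lookup works).
ordered = sorted(x for g in groups for x in g)
rank = {x: i + 1 for i, x in enumerate(ordered)}

N = len(ordered)
T = [sum(rank[x] for x in g) for g in groups]   # rank sums: 31, 35, 15, 55

H = 12 / (N * (N + 1)) * sum(t ** 2 / len(g) for t, g in zip(T, groups)) - 3 * (N + 1)
print(T, round(H, 2))
```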
         Kruskal-Wallis Example
▪ Therefore, since H (8.96) > χ2 critical (7.81) we
would reject the null hypothesis and conclude
that the teaching methods differ
▪ As with the one-way ANOVA, we would most
likely follow-up this result in order to determine
exactly where differences between the
conditions exist
 ▸ These tests can be conducted with the
 nonparametric two independent samples
 Mann-Whitney test
     Alternatives to the ANOVA F
▪ Nonnormality/Unequal Variances
 ▸ When distributions are nonnormal and variances
 are unequal the Welch test on trimmed means or the
 Welch test on ranks can provide accurate Type I
 error rates for most conditions
  – As we have dealt in depth with these tests in the past, we
  won't deal with them in detail now. The same procedures
  that were applied for the two independent-samples design
  are also applied here (also, as before, an R function for
  computing the one-way F test with trimmed means is
  available on my website)
     Alternatives to the ANOVA F
▪ Nonnormality/Unequal Variances
 ▸ Transformations may also be effective at equating
 the variances and normalizing the data, especially
 when the means and vars/sds are proportional
  – Transformations become complicated with k>2 groups
  because often the groups have different shapes and thus
  one transformation will not normalize all of the groups
 ▸ Resampling procedures may also be extremely
 effective, although their current use is limited by their
 availability
  – e.g., Howell only presents resampling procedures up to
  the two-group case
    Notes Regarding Effect Sizes
▪ Currently standardized effect size statistics for
the Welch tests are not widely available, and
therefore at this time simple mean differences
will suffice
▪ For the Kruskal-Wallis, it is important to
realize that it is just the ANOVA on ranks
(though the K-W has adjustments for ties, etc.)
 ▸ Therefore, performing an ANOVA on ranks will
 allow you to produce eta-squared (or
 omega-squared)
   Extension: Equivalence Tests
▪ Recall that if the goal of a study is to test
whether all group/conditions are EQUIVALENT
on the outcome measure, then the one-way
tests just discussed are not appropriate
 ▸ In other words, you would want to accept the null,
 which is not appropriate with standard null
 hypothesis testing procedures
▪ One-way tests of equivalence should be used
when the goal is to demonstrate that multiple
groups are equivalent on an outcome

				