# One-way Between Subjects Analysis of Variance

Multiple t-tests vs. ANOVA
▪ Example: Dr. Smith wants to compare the
perfectionism scores of grad students in
Psychology, Sociology, History and Math
▪ Why do we (usually) run a one-way ANOVA to
test for mean differences instead of running
C = k(k−1)/2 t-tests (k = number of groups,
j = 1, ..., k)?
▸ The usual answer: to control the overall Type I
error rate ... but is this correct?
Multiple t-tests vs. ANOVA
▪ It is true that if you run k(k−1)/2 = 4(4−1)/2 = 6
tests to compare the programs (each at level α)
you have approximately a 1 − (1 − α)^C (≈ Cα)
chance of making a Type I error, but what if we
impose some type of multiplicity control?
– For example, if we conduct each of the 6 t-tests at αPC = α/C, then
the overall Type I error rate won't exceed α
▸ In fact, many pairwise multiple comparison
procedures (that locate pairwise mean differences)
are intended to be used without an omnibus test
(more on that later)
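The error-rate arithmetic above can be checked directly; a minimal sketch (assuming the C tests are independent, as the 1 − (1 − α)^C approximation does):

```python
# Familywise Type I error rate for C tests each run at level alpha,
# and the Bonferroni per-comparison level alpha_PC = alpha / C.
k = 4                      # number of groups in Dr. Smith's example
C = k * (k - 1) // 2       # number of pairwise comparisons: 6
alpha = 0.05

familywise = 1 - (1 - alpha) ** C   # approximately .265, far above .05
alpha_pc = alpha / C                # testing each pair at .0083 caps the overall rate

print(C, round(familywise, 3), round(alpha_pc, 4))
```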
Multiple t-tests vs. ANOVA

▪ Another important distinction between the
omnibus ANOVA F test and multiple t tests is
that even when the ANOVA F is not significant,
there could still be significant pairwise
differences (even when controlling for inflated
Type I error rates)
▸ The ANOVA F is pooling all of the mean differences
and therefore it is not a precise test of whether there
are any significant pairwise mean differences
One-way ANOVA
▪ Even though the one-way ANOVA may not be
a necessary tool for analysis, it remains
popular because:
▸ It provides an easy way to summarize null findings
▸ It illustrates the use of a pooled error term
▸ It provides the basis for more complicated omnibus
tests, such as tests of higher-order interactions
▸ Some multiple comparison procedures do require
that an omnibus test be used as a preliminary test of
the existence of mean differences
One-way Between-Subjects ANOVA Model
▪ Yij = μ + τj + εij, where
– Yij is the score of the ith subject (i=1,...,nj) in the jth group (j=1,...,k)
– μ is the population grand mean
– τj is the fixed treatment effect for the jth group (μj -μ)
– εij is the random error component for the ith subject in the jth group
(Yij - μj)

▪ Assumptions:
– εij ∼ NID(0, σ²) (more on that to come)
▪ Hypotheses:
– Ho: μ1 = μ2 = ... = μk
– H1: The population means are not all equal
– Recall why H1: μ1 ≠ μ2 ≠ ... ≠ μk is not correct
Partitioning the Variability
▪ Between Group variability: differences
between the mean scores in each group
▸ Why do mean differences exist?
– Effect of the IV on the DV (or relationship between the IV
and DV if we are using naturally occurring groups)
– Error
▪ Within Group variability: variability of the
scores within the groups
▸ Why do scores within the groups differ?
– Error
Understanding the One-way ANOVA

▪ ANOVA F (after Fisher) = ratio of between
group variability to within group variability

F = Between Group Variability / Within Group Variability

F = (s²t + s²E) / s²E = MStreatment / MSerror
Calculations

MStreat = SStreat / dftreat = Σj nj (X̄j − X̄..)² / (k − 1)

MSerror = SSerror / dferror = Σj Σi (Xij − X̄j)² / (N − k)

MStotal = SStotal / dftotal = Σj Σi (Xij − X̄..)² / (N − 1)
ANOVA Summary Table
Source      SS        df        MS                          F
Treatment   SStreat   dftreat   MStreat = SStreat/dftreat   F = MStreat/MSerror
Error       SSerror   dferror   MSerror = SSerror/dferror
Total       SStotal   dftotal

▪ Ho: μ1 = μ2 = ... = μk is rejected if F ≥ Fα, dft, dfe
▪ Recall that the ANOVA F tests the global
hypothesis that there are no differences
between the groups (although pairwise differences
between groups may still exist even when F is
not significant!)
Example
▪ A researcher is interested in determining if the
fatigue levels (0-15) of married women differ as
a function of how they classify their husbands'
involvement in housework (not involved,
somewhat involved, involved)
▪ Data:
▸ Not involved: 9, 12, 4, 8, 7
▸ Somewhat involved: 4, 6, 8, 2, 10
▸ Involved: 1, 3, 4, 5, 2
Example, cont’d

▪ Ho: μNI = μSI = μI
▸ SStot = ΣX² − [(ΣX)²/N] = 629 − [(85)²/15] = 147.33
▸ SStreat = {Σj [(ΣXj)²/nj]} − [(ΣX)²/N] =
[(40²/5) + (30²/5) + (15²/5)] − [(85)²/15] = 63.33
▸ SSerror = ΣX² − {Σj [(ΣXj)²/nj]} = 629 −
[(40²/5) + (30²/5) + (15²/5)] = 84.00
Example, cont’d
▸ dftot = N - 1 = 15 - 1 = 14
▸ dftreat = k - 1 = 3 - 1 = 2
▸ dferror = N - k = 15 - 3 = 12
▸ MStreat = SStreat / dftreat = 63.33 / 2 = 31.67
▸ MSerror = SSerror / dferror = 84 / 12 = 7.00
▪ F = MStreat / MSerror = 31.67 / 7.00 = 4.52
▸ F.05,2,12 = 3.88
▸ R/SPSS p-value = .034
▪ Therefore, we reject the null hypothesis that
the means are all equal
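The hand calculations above can be verified in a few lines; a sketch using SciPy's `f_oneway`:

```python
from scipy.stats import f_oneway

# Fatigue scores grouped by the husband's level of involvement (from the example)
not_involved = [9, 12, 4, 8, 7]
somewhat_involved = [4, 6, 8, 2, 10]
involved = [1, 3, 4, 5, 2]

F, p = f_oneway(not_involved, somewhat_involved, involved)
print(round(F, 2), round(p, 3))  # F ≈ 4.52, p < .05
```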
Strength of the Relationship
▪ Recall that a significant F test does not tell us
how ‘strong’ the relationship is
▪ η² = proportion of variability in the DV that can
be explained by the IV
▪ η² = SStreat / SStotal
▪ For our example, η² = 63.33 / 147.33 = .43
▪ 43% of the variability in fatigue is explained by
the husband’s involvement in housework (large
effect!)
Strength of the Relationship - ω²
▪ η² provides a slightly biased (upwards)
estimate of the strength of the relationship
between an IV and a DV
▪ Therefore, several authors have
recommended the use of ω² as an alternative to
η²

ω² = [SStreat − (k − 1)MSerror] / (SStotal + MSerror)
   = [63.33 − (2)(7)] / (147.33 + 7) = .32
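Both effect sizes follow directly from the summary-table quantities; a minimal sketch using the values computed for the fatigue example:

```python
# Effect sizes for the fatigue example, from the sums of squares above
ss_treat, ss_error = 63.33, 84.00
ss_total = ss_treat + ss_error      # 147.33
k, N = 3, 15
ms_error = ss_error / (N - k)       # 7.00

eta_sq = ss_treat / ss_total        # proportion of DV variance explained by the IV
omega_sq = (ss_treat - (k - 1) * ms_error) / (ss_total + ms_error)  # bias-adjusted
print(round(eta_sq, 2), round(omega_sq, 2))  # .43 and .32
```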
Effect Size – f/Φ'
▪ Another useful measure of effect size is
Cohen’s f (which Howell calls Φ')
▪ f is a standardized mean difference statistic
that is very similar to the d-family based
RMSSE effect size measure that Howell also
presents
▸ We use f/Φ' because this measure will also be used
in power calculations
Effect Size – f/Φ'

f = Φ' = √( [Σj (μj − μ)² / k] / s²e )

▸ In the absence of useful information for interpreting
f, Cohen recommended:
– Small = .1 - .25, med = .25 - .4, large = .4+
Power Calculations
▪ When planning any study, it is important that
we investigate power a priori
▪ As in the two independent samples design, we
need an estimated effect size in order to
calculate power (note that j = 1, ..., k):

Φ' = √( [Σj (μj − μ)² / k] / s²e )
Power Calculations, cont’d
▪ When calculating the effect size, it is important
to consider what differences among the groups
would be meaningful (relative to the variability of
the groups)
▸ If no information is available for calculating an
effect size, Cohen suggests the following for ϕ’:
– .10 = small, .25 = medium, .40 = large
▪ Incorporating the sample size and effect size
gives:

ϕ = Φ' √n
Power Calculation Example
▪ Dr. Jones wants to compare three different
cultural groups on levels of conservativeness
▪ How much power would Dr. Jones have with
10 subjects per group, meaningful differences in
the means of: European = 4, South Asian = 6,
Middle Eastern = 8, and average error variance of 3?

Φ' = √( { [(4 − 6)² + (6 − 6)² + (8 − 6)²] / 3 } / 3 ) = .942
Power Calculation Example, cont’d

▪ Therefore, ϕ = .942 × √10 = 2.98
▪ From Appendix “ncF”
▸ ϕ = 2.98, dft = 2, dfe = N-k = 30-3 = 27, and α = .05
▸ Power = 1 -.01 = .99 (or 99% power)
▪ Note that in order to calculate the n required
for a given power we still calculate the
meaningful effect size, but then we reorganize
the formula for ϕ:

n = ϕ² / Φ'²
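The Appendix ncF lookup can be reproduced with SciPy's noncentral F distribution; a sketch, assuming equal group sizes so that the noncentrality parameter is λ = k·n·Φ'²:

```python
from scipy.stats import f, ncf

k, n = 3, 10                        # groups, subjects per group
# Phi' for means 4, 6, 8 and error variance 3 (Dr. Jones's example)
phi_prime = (((4 - 6)**2 + (6 - 6)**2 + (8 - 6)**2) / k / 3) ** 0.5  # ≈ .943

df_t, df_e = k - 1, k * n - k       # 2 and 27
lam = k * n * phi_prime ** 2        # noncentrality parameter ≈ 26.67
f_crit = f.ppf(0.95, df_t, df_e)    # critical F at alpha = .05
power = 1 - ncf.cdf(f_crit, df_t, df_e, lam)
print(round(power, 2))              # ≈ .99, matching the Appendix ncF result
```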
Sample Size Calculation Example
▪ Dr. Jones wants to compare three different
cultural groups on levels of conservativeness
▸ How many subjects would Dr. Jones need in order
to have 90% power with meaningful differences in
the means of: European = 4, South Asian = 6,
Middle Eastern = 8, and average error variance of 3?

Φ' = √( { [(4 − 6)² + (6 − 6)² + (8 − 6)²] / 3 } / 3 ) = .942
Sample Size Calculation Example
2    2.2 2
n= 2 =       2 = 5.45
   .942
▪ Note that ϕ (2.2) comes from Appendix ncF,
and is the value of ϕ that most closely
approximates 90% power for an appropriate
error df
▪ With n = 6 subjects per group, and ϕ’=.942,
G*Power gives us a power of .91
▸ Note in G*Power that f = ϕ’
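The same noncentral F machinery can search for the smallest n directly, with no table lookup; a sketch under the same equal-n assumption:

```python
from scipy.stats import f, ncf

def power_oneway(n, k, f_eff, alpha=0.05):
    """Power of the one-way ANOVA F test with n subjects in each of k groups."""
    df_t, df_e = k - 1, k * n - k
    lam = k * n * f_eff ** 2                   # noncentrality parameter
    return 1 - ncf.cdf(f.ppf(1 - alpha, df_t, df_e), df_t, df_e, lam)

k, f_eff, target = 3, 0.9428, 0.90
n = 2
while power_oneway(n, k, f_eff) < target:      # smallest n per group meeting target
    n += 1
print(n, round(power_oneway(n, k, f_eff), 2))  # n = 6, power ≈ .91 (as G*Power)
```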
Assumptions
▪ The assumptions required for obtaining a valid
F test are:
▸ Samples are randomly and independently selected
from their respective populations
▸ Scores in each population are normally distributed
▸ Variances in each population are equal
▪ Note:
▸ The independence assumption is extremely
important and should be considered in the design of
the experiment
Assumptions, cont’d
▪ Consequences of violating assumptions:
– If sample sizes and variances are unequal, Type I error
rates can deviate considerably from α
– Positively paired ns and σ²s (larger groups have larger
variances) produce a conservative F
– Negatively paired ns and σ²s (larger groups have smaller
variances) produce a liberal F
– However, with more than two groups it is often more difficult to
identify a pattern
– If data are nonnormal, Type I error rates may not deviate
much from α, however the power of other procedures may
be much higher than the F test
– If data are nonnormal and variances are unequal the F
test becomes severely biased with respect to both Type I
and Type II error rates
Alternatives to the ANOVA F

▪ Unequal Variances
▸ When variances are determined to be unequal
(Levene/variance ratio tests), the omnibus Welch test
should be used in place of the ANOVA F
▸ Transformations may also be useful when the
means and variances/standard deviations are
proportional
Welch Test

F' = [ Σj wj (X̄j − X̄')² / (k − 1) ] /
     [ 1 + (2(k − 2) / (k² − 1)) Σj (1 / (nj − 1)) (1 − wj / Σj wj)² ]

wj = nj / s²j

X̄' = Σj wj X̄j / Σj wj

df' = (k² − 1) / [ 3 Σj (1 / (nj − 1)) (1 − wj / Σj wj)² ]
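The Welch formulas translate directly into code; a minimal sketch (applying it to the earlier fatigue data is my own illustration, not part of the notes):

```python
import numpy as np

def welch_anova(*groups):
    """Welch's heteroscedastic one-way test, following the formulas above."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])

    w = n / v                              # w_j = n_j / s_j^2
    grand = np.sum(w * m) / np.sum(w)      # variance-weighted grand mean X'
    num = np.sum(w * (m - grand) ** 2) / (k - 1)
    A = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    F = num / (1 + 2 * (k - 2) / (k ** 2 - 1) * A)
    df_error = (k ** 2 - 1) / (3 * A)
    return F, k - 1, df_error

F, df1, df2 = welch_anova([9, 12, 4, 8, 7], [4, 6, 8, 2, 10], [1, 3, 4, 5, 2])
print(round(F, 2), df1, round(df2, 1))     # F' ≈ 5.87 on (2, 7.2) df
```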
Alternatives to the ANOVA F

▪ Nonnormality
▸ When the distributions are nonnormal (but similar in
shape) and the variances are equal a nonparametric
test (e.g., Kruskal-Wallis) can provide much more
power than the ANOVA F
▸ Transformations may also be useful to make the
distribution shapes more normal
▸ Trimmed means also help to reduce the effects of
extreme observations
Kruskal-Wallis Test
▪ The Kruskal-Wallis H test is a nonparametric
procedure that can be used to compare k
independent populations
▸ It is identical to an ANOVA F test on the ranks
▪ All N = n1 + n2 + ... + nk observations are
jointly ranked (i.e., ignore group membership)
▸ As with the Mann-Whitney test, tied observations
are assigned the average of the ranks they occupy
▪ Calculate T1, T2, ..., Tk, where Tj is the sum of
the ranks for group j
Kruskal-Wallis Test
▪ The null hypothesis is rejected if H is greater
than a critical χ2 value (df = k - 1)
▸ Ho: There are no differences between the groups
▸ H1: There are differences between the groups
– Recall: Kruskal-Wallis is a test of mean differences only if
we can assume that the distributions are the same shape
and that the variances of the groups are equal

H = [12 / (N(N + 1))] Σj (T²j / nj) − 3(N + 1)
Kruskal-Wallis Example
▪ Four groups of students were randomly
assigned to be taught with one of four different
techniques, and their achievement scores were
recorded. Are the distributions of test scores
the same? (i.e., are all the groups the same?)
▸ Data (ranks are in parentheses)
Method 1   Method 2   Method 3   Method 4
65 (3)     75 (7)     59 (1)     94 (16)
87 (13)    69 (5)     78 (8)     89 (15)
73 (6)     83 (12)    67 (4)     80 (10)
79 (9)     81 (11)    62 (2)     88 (14)
Kruskal-Wallis Example
▪ Sum of the Ranks for Each Group
▸ T1 = 31, T2 = 35, T3 = 15, T4 = 55
▸ χ² critical (α = .05, df = k - 1 = 4 - 1 = 3) is 7.81
(Appendix B.8)

H = [12 / (16(17))] [31²/4 + 35²/4 + 15²/4 + 55²/4] − 3(17)
  = 8.96
Kruskal-Wallis Example
▪ Therefore, since H (8.96) > χ² critical (7.81) we
would reject the null hypothesis and conclude
that the teaching methods differ
▪ As with the one-way ANOVA, we would most
likely follow-up this result in order to determine
exactly where differences between the
conditions exist
▸ These tests can be conducted with the
nonparametric two independent samples
Mann-Whitney test
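The hand-computed H can be verified with SciPy's `kruskal` (which also applies a tie correction; there are no ties here):

```python
from scipy.stats import kruskal

# Achievement scores for the four teaching methods
m1 = [65, 87, 73, 79]
m2 = [75, 69, 83, 81]
m3 = [59, 78, 67, 62]
m4 = [94, 89, 80, 88]

H, p = kruskal(m1, m2, m3, m4)
print(round(H, 2), round(p, 3))  # H ≈ 8.96, p < .05
```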
Alternatives to the ANOVA F
▪ Nonnormality/Unequal Variances
▸ When distributions are nonnormal and variances
are unequal the Welch test on trimmed means or the
Welch test on ranks can provide accurate Type I
error rates for most conditions
– As we have dealt in depth with these tests in the past, we
won’t deal with them in detail now. The same procedures
that were applied for the two independent-samples design
are also applied here (also, as before, an R function for
computing the one-way F test with trimmed means is
available on my website)
Alternatives to the ANOVA F
▪ Nonnormality/Unequal Variances
▸ Transformations may also be effective at equating
the variances and normalizing the data, especially
when the means and vars/sds are proportional
– Transformations become complicated with k>2 groups
because often the groups have different shapes and thus
one transformation will not normalize all of the groups
▸ Resampling procedures may also be extremely
effective, although their current use is limited by their
availability
– e.g., Howell only presents resampling procedures up to
the two-group case
Notes Regarding Effect Sizes
▪ Currently standardized effect size statistics for
the Welch tests are not widely available, and
therefore at this time simple mean differences
will suffice
▪ For the Kruskal-Wallis, it is important to recall
that it is just the ANOVA on ranks (though the
K-W has adjustments for ties, etc.)
▸ Therefore, performing an ANOVA on ranks will
allow you to produce eta-squared (or omega-squared)
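That suggestion can be sketched directly: rank all observations jointly, then compute η² from the between- and total sums of squares of the ranks (using the teaching-methods data as an illustration of my own):

```python
import numpy as np
from scipy.stats import rankdata

# Teaching-methods achievement data from the Kruskal-Wallis example
groups = [[65, 87, 73, 79], [75, 69, 83, 81], [59, 78, 67, 62], [94, 89, 80, 88]]

scores = np.concatenate([np.asarray(g, float) for g in groups])
labels = np.repeat(np.arange(len(groups)), [len(g) for g in groups])
ranks = rankdata(scores)            # joint ranks; ties would be averaged

grand = ranks.mean()
ss_total = np.sum((ranks - grand) ** 2)
ss_treat = sum(ranks[labels == j].size * (ranks[labels == j].mean() - grand) ** 2
               for j in range(len(groups)))
eta_sq = ss_treat / ss_total        # eta-squared on the ranks
print(round(eta_sq, 2))             # ≈ .60 for this example
```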
Extension: Equivalence Tests
▪ Recall that if the goal of a study is to test
whether all groups/conditions are EQUIVALENT
on the outcome measure, then the one-way
tests just discussed are not appropriate
▸ In other words, you would want to accept the null,
which is not appropriate with standard null
hypothesis testing procedures
▪ One-way tests of equivalence should be used
when the goal is to demonstrate that multiple
groups are equivalent on an outcome
