# Analysis of Variance.ppt

T-TESTS AND
ANALYSIS OF VARIANCE
Jennifer Kensler
Laboratory for Interdisciplinary Statistical Analysis
Virginia Tech’s source for expert statistical analysis since 1948
www.lisa.stat.vt.edu

Collaboration:
Personalized statistical advice
Great advice right now:
Meet with LISA before
collecting your data

Short Courses:
Designed to help
graduate students apply
statistics in their research

Walk-In Consulting:
Monday—Friday* 12-2PM
for questions <30 minutes
* Mon—Thurs in summer
* We help with research—not
class projects or homework
ONE SAMPLE T-TEST
   Used to test whether the population mean is
different from a specified value.

   Example: Is the mean height of 12-year-old girls
greater than 60 inches?

STEP 1: FORMULATE THE HYPOTHESES
   The population mean is not equal to a specified
value.
H0: μ = μ0
Ha: μ ≠ μ0
   The population mean is greater than a specified
value.
H0: μ = μ0
Ha: μ > μ0
   The population mean is less than a specified value.
H0: μ = μ0
Ha: μ < μ0
STEP 2: CHECK THE ASSUMPTIONS
   The sample is random.

   Either the population from which the sample is
drawn is normal or the sample size is large.

STEPS 3-5
   Step 3: Calculate the test statistic:

$$t = \frac{\bar{y} - \mu_0}{s / \sqrt{n}}$$

where

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$$

   Step 4: Calculate the p-value based on the
appropriate alternative hypothesis.

   Step 5: Write a conclusion.
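
Steps 3 and 4 can also be worked by hand. The following is a minimal Python sketch, not the JMP procedure used in this course. It assumes only the standard library, so the p-value is approximated by integrating the t density with Simpson's rule (an illustrative choice). The function names and the sample values are hypothetical and are not the iris data.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for H0: mu = mu0, using the Step 3 formula."""
    n = len(sample)
    y_bar = statistics.mean(sample)
    s = statistics.stdev(sample)              # sample SD, n - 1 in the denominator
    t = (y_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule
    (steps must be even).
    """
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    const /= math.sqrt(df * math.pi)

    def density(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps
    total = density(0.0) + density(b)
    for i in range(1, steps):
        total += density(i * h) * (4 if i % 2 else 2)
    upper = 0.5 - total * h / 3               # P(T > |t|)
    return upper if t >= 0 else 1 - upper


def p_value(t, df, alternative="two-sided"):
    """Step 4: p-value matching the alternative hypothesis from Step 1."""
    if alternative == "greater":              # Ha: mu > mu0
        return t_upper_tail(t, df)
    if alternative == "less":                 # Ha: mu < mu0
        return 1 - t_upper_tail(t, df)
    return 2 * t_upper_tail(abs(t), df)       # Ha: mu != mu0


# Made-up sepal widths in cm (illustrative only, not the iris data set).
widths = [3.4, 3.6, 3.1, 3.5, 3.8, 3.2, 3.7, 3.3]
t_stat, df = one_sample_t(widths, mu0=3.5)
print(t_stat, df, p_value(t_stat, df))
```

In practice a statistics package reports the same t statistic and p-value directly; the sketch only makes the arithmetic behind Steps 3 and 4 visible.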
IRIS EXAMPLE
   A researcher would like to know whether the
mean sepal width of a variety of irises is different
from 3.5 cm.

   The researcher randomly measures the sepal
width of 50 irises.

   Step 1: Hypotheses
H0: μ = 3.5 cm
Ha: μ ≠ 3.5 cm
JMP
   Steps 2-4:
JMP Demonstration
Analyze → Distribution
Y, Columns: Sepal Width

Test Mean
Specify Hypothesized Mean: 3.5

JMP OUTPUT

 Step 5 Conclusion: The mean sepal width is not
significantly different from 3.5 cm.

TWO SAMPLE T-TEST
   Two sample t-tests are used to determine
whether the population mean of one group is
equal to, larger than or smaller than the
population mean of another group.

   Example: Is the mean cholesterol of people taking
drug A lower than the mean cholesterol of people
taking drug B?

STEP 1: FORMULATE THE HYPOTHESES
   The population means of the two groups are not
equal.
H0: μ1 = μ2
Ha: μ1 ≠ μ2
   The population mean of group 1 is greater than the
population mean of group 2.
H0: μ1 = μ2
Ha: μ1 > μ2
   The population mean of group 1 is less than the
population mean of group 2.
H0: μ1 = μ2
Ha: μ1 < μ2
STEP 2: CHECK THE ASSUMPTIONS
   The two samples are random and independent.

   Either the populations from which the samples
are drawn are normal or the sample sizes are
large.

   The populations have the same standard
deviation.

STEPS 3-5
   Step 3: Calculate the test statistic:

$$t = \frac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

where

$$s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$$

 Step 4: Calculate the appropriate p-value.
 Step 5: Write a Conclusion.
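
As a rough illustration of Step 3, the pooled two-sample statistic can be computed with Python's standard library. This is a sketch under the equal standard deviation assumption from Step 2. The function name and the measurements are hypothetical, and the p-value would still come from a t distribution with n1 + n2 - 2 degrees of freedom.

```python
import math
import statistics


def pooled_two_sample_t(group1, group2):
    """Return (t, df, s_p) for H0: mu1 = mu2, assuming equal population SDs."""
    n1, n2 = len(group1), len(group2)
    var1 = statistics.variance(group1)        # s1 squared
    var2 = statistics.variance(group2)        # s2 squared
    df = n1 + n2 - 2
    # Pooled standard deviation: weighted average of the two sample variances.
    s_p = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)
    diff = statistics.mean(group1) - statistics.mean(group2)
    t = diff / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t, df, s_p


# Made-up measurements for two groups (illustrative only).
group_a = [3.5, 3.0, 3.2, 3.6, 3.4]
group_b = [2.8, 3.0, 2.6, 2.9, 2.7]
t_stat, df, s_p = pooled_two_sample_t(group_a, group_b)
print(t_stat, df, s_p)
```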

TWO SAMPLE EXAMPLE
   A researcher would like to know whether the
mean sepal width of setosa irises is different from
the mean sepal width of versicolor irises.

   Step 1 Hypotheses:
H0: μsetosa = μversicolor
Ha: μsetosa ≠ μversicolor

JMP
   Steps 2-4:
JMP Demonstration:
Analyze → Fit Y By X
Y, Response: Sepal Width
X, Factor: Species

JMP OUTPUT

 Step 5 Conclusion: There is strong evidence
(p-value < 0.0001) that the mean sepal widths for
the two varieties are different.

PAIRED T-TEST
   The paired t-test is used to compare the means of
two dependent samples.

   Example:
A researcher would like to determine if
background noise causes people to take longer to
complete math problems. The researcher gives 20
subjects two math tests, one in complete silence
and one with background noise, and records the
time each subject takes to complete each test.
STEP 1: FORMULATE THE HYPOTHESES
   The population mean difference is not equal to zero.
H0: μdifference = 0
Ha: μdifference ≠ 0
   The population mean difference is greater than
zero.
H0: μdifference = 0
Ha: μdifference > 0
   The population mean difference is less than zero.
H0: μdifference = 0
Ha: μdifference < 0
STEP 2: CHECK THE ASSUMPTIONS
   The sample is random.

   The data are matched pairs.

   The differences have a normal distribution or the
sample size is large.

STEPS 3-5
   Step 3: Calculate the test statistic:

$$t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$$

where $\bar{d}$ is the mean of the differences and $s_d$
is the standard deviation of the differences.

   Step 4: Calculate the p-value.

   Step 5: Write a conclusion.
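
The paired statistic is a one-sample t statistic applied to the differences. A minimal Python sketch of Step 3 follows, using only the standard library. The function name and the before/after values are hypothetical and are not the flexibility data.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, df, diffs) for H0: mean difference = 0, differences = after - before."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)            # mean of the differences
    s_d = statistics.stdev(diffs)             # SD of the differences
    t = (d_bar - 0) / (s_d / math.sqrt(n))
    return t, n - 1, diffs


# Made-up paired measurements for six subjects (illustrative only).
before = [18.5, 21.0, 16.5, 20.0, 19.5, 17.0]
after = [19.0, 21.5, 16.0, 21.0, 20.5, 17.5]
t_stat, df, diffs = paired_t(before, after)
print(diffs, t_stat, df)
```

Taking differences first is what distinguishes this from the two-sample test: each subject serves as their own control, so only the variability of the differences enters the denominator.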
PAIRED T-TEST EXAMPLE
   A researcher would like to determine whether a
fitness program increases flexibility. The
researcher measures the flexibility (in inches) of
12 randomly selected participants before and
after the fitness program.

   Step 1: Formulate the Hypotheses
H0: μAfter - Before = 0
Ha: μAfter - Before > 0

   Steps 2-4:
JMP Analysis:
Create a new column of After – Before
Analyze → Distribution
Y, Columns: After – Before

Test Mean
Specify Hypothesized Mean: 0

JMP OUTPUT

Step 5 Conclusion: There is insufficient evidence
that the fitness program increases flexibility.

ONE-WAY ANALYSIS OF
VARIANCE
ONE-WAY ANOVA
   ANOVA is used to determine whether three or
more populations have different means.

[Figure: medical treatments A, B, and C]
ANOVA STRATEGY

   The first step is to use the ANOVA F test to
determine if there are any significant differences
among the means.

   If the ANOVA F test shows that the means are
not all the same, then follow-up tests can be
performed to see which pairs of means differ.

ONE-WAY ANOVA MODEL
$$y_{ij} = \mu_i + \varepsilon_{ij}$$
Where
$y_{ij}$ is the response of the jth trial on the ith factor level
$\mu_i$ is the mean of the ith group
$\varepsilon_{ij} \sim N(0, \sigma^2)$
$i = 1, \ldots, r$
$j = 1, \ldots, n_i$

In other words, for each group the observed
value is the group mean plus some random
variation.
ONE-WAY ANOVA HYPOTHESIS
   Step 1: We test whether there is a difference in
the means.

H0: μ1 = μ2 = ... = μr
Ha: The μi are not all equal.

STEP 2: CHECK ANOVA ASSUMPTIONS
 The samples are random and independent of each
other.
 The populations are normally distributed.

 The populations all have the same variance.

   The ANOVA F test is robust to moderate departures
from the assumptions of normality and equal variances.
STEP 3: ANOVA F TEST

[Figure: two panels, each showing medical treatments A, B, and C]

Compare the variation within the samples to the
variation between the samples.
ANOVA TEST STATISTIC

$$F = \frac{\text{Variation between groups}}{\text{Variation within groups}} = \frac{MSG}{MSE}$$

Variation within groups small compared with variation between groups → Large F
Variation within groups large compared with variation between groups → Small F
MSG
   The mean square for groups, MSG, measures the
variability of the sample averages.
   SSG stands for the sum of squares for groups.

$$MSG = \frac{SSG}{r - 1} = \frac{n_1 (\bar{y}_1 - \bar{y})^2 + n_2 (\bar{y}_2 - \bar{y})^2 + \cdots + n_r (\bar{y}_r - \bar{y})^2}{r - 1}$$
MSE
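 Mean square error, MSE, measures the variability
within the groups.
 SSE stands for the sum of squares for error.

$$MSE = \frac{SSE}{n - r} = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2 + \cdots + (n_r - 1) s_r^2}{n - r}$$

where n is the total number of observations and

$$s_i = \sqrt{\frac{\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}{n_i - 1}}$$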
STEPS 4-5
   Step 4: Calculate the p-value.

   Step 5: Write a conclusion.
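
The MSG and MSE definitions above combine into the F statistic as follows. This is a minimal Python sketch using the standard library. The function name and the three small samples are hypothetical, not the pain data, and the p-value would still come from an F distribution with r - 1 and n - r degrees of freedom.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_groups, df_error, MSG, MSE) for a list of samples."""
    r = len(groups)                                   # number of groups
    n = sum(len(g) for g in groups)                   # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    # SSG: squared deviations of group means from the grand mean, weighted by n_i.
    ssg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # SSE: pooled within-group variation, (n_i - 1) * s_i^2 summed over groups.
    sse = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    msg = ssg / (r - 1)
    mse = sse / (n - r)
    return msg / mse, r - 1, n - r, msg, mse


# Made-up responses for three treatments (illustrative only).
drug_a = [4.0, 5.0, 6.0]
drug_b = [6.0, 7.0, 8.0]
drug_c = [8.0, 9.0, 10.0]
f_stat, df_groups, df_error, msg, mse = one_way_anova([drug_a, drug_b, drug_c])
print(f_stat, df_groups, df_error)
```

Here the group means (5, 7, 9) are far apart relative to the spread inside each group, which is the "large F" situation described earlier.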

ANOVA EXAMPLE
 A researcher would like to determine if three
drugs provide the same relief from pain.
 60 patients are randomly assigned to a treatment
(20 people in each treatment).

   Step 1: Formulate the Hypotheses
H0: μDrug A = μDrug B = μDrug C
Ha : The μi are not all equal.

STEPS 2-4
   JMP demonstration
Analyze → Fit Y By X
Y, Response: Pain
X, Factor: Drug

JMP OUTPUT AND CONCLUSION

 Step 5 Conclusion: There is strong evidence
that the drugs are not all the same.

FOLLOW-UP TEST
 The p-value of the overall F test indicates that
the mean level of pain is not the same for patients
taking drugs A, B and C.
 We would like to know which pairs of treatments
are different.
 One method is to use Tukey’s HSD (honestly
significant difference).

TUKEY TESTS
   Tukey’s test simultaneously tests
H0: μi = μi'
Ha: μi ≠ μi'
for all pairs of factor levels. Tukey’s HSD
controls the overall type I error rate.

JMP demonstration
Oneway Analysis of Pain By Drug → Compare Means → All Pairs, Tukey HSD
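
For readers who want to see what the pairwise comparison is built from, here is a hedged Python sketch of the studentized range statistic for each pair of groups, in the Tukey-Kramer form that allows unequal group sizes. It is not the JMP implementation. The function name and data are hypothetical, and the critical value is deliberately not computed: it would have to come from a studentized range table or from statistical software.

```python
import itertools
import math
import statistics


def tukey_q_statistics(groups):
    """groups: dict mapping label -> sample. Return {(a, b): q} for all pairs.

    q = |mean_a - mean_b| / sqrt((MSE / 2) * (1/n_a + 1/n_b)),
    where MSE is the one-way ANOVA mean square error.
    """
    r = len(groups)
    n = sum(len(v) for v in groups.values())
    mse = sum((len(v) - 1) * statistics.variance(v) for v in groups.values()) / (n - r)
    q = {}
    for a, b in itertools.combinations(groups, 2):
        diff = abs(statistics.mean(groups[a]) - statistics.mean(groups[b]))
        se = math.sqrt(mse / 2 * (1 / len(groups[a]) + 1 / len(groups[b])))
        q[(a, b)] = diff / se
    return q


# Made-up responses for three treatments (illustrative only).
samples = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [8.0, 9.0, 10.0],
}
q_stats = tukey_q_statistics(samples)
for pair, q in q_stats.items():
    print(pair, round(q, 3))
```

A pair is declared different when its q exceeds the studentized range critical value for r groups and n - r error degrees of freedom, which is what keeps the overall type I error rate under control across all pairs.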
JMP OUTPUT

 The JMP output shows that drugs A and C are
significantly different.
TWO-WAY ANALYSIS OF
VARIANCE
TWO-WAY ANOVA
 We are interested in the effect of two categorical
factors on the response.
 We are interested in whether either of the two
factors has an effect on the response and
whether there is an interaction effect.
   An interaction effect means that the effect on the
response of one factor depends on the level of the
other factor.
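
One way to make the idea of interaction concrete is a "difference of differences" on the cell means of a 2 x 2 design. The Python sketch below uses hypothetical cell means and a hypothetical function name. It only describes the pattern in the means and is not a significance test.

```python
# Hypothetical cell means for a 2 x 2 design (illustrative values only).
cell_means = {
    ("A low", "B low"): 10.0,
    ("A high", "B low"): 14.0,
    ("A low", "B high"): 12.0,
    ("A high", "B high"): 22.0,
}


def interaction_contrast(means):
    """Return the effect of factor A at each level of B, and their difference."""
    effect_at_b_low = means[("A high", "B low")] - means[("A low", "B low")]
    effect_at_b_high = means[("A high", "B high")] - means[("A low", "B high")]
    return effect_at_b_low, effect_at_b_high, effect_at_b_high - effect_at_b_low


low, high, contrast = interaction_contrast(cell_means)
print(low, high, contrast)
```

With these numbers, raising factor A increases the response by 4 when B is low but by 10 when B is high. A contrast of zero would correspond to parallel lines in an interaction plot, that is, no interaction.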

INTERACTION

[Figure: two plots of Response against Factor A (Low, High), with separate lines for Factor B Low and Factor B High; left panel: No Interaction, right panel: Interaction]
TWO-WAY ANOVA MODEL

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
Where
$y_{ijk}$ is the response of the kth trial on the ith factor A level and the jth factor B level
$\mu$ is the overall mean
$\alpha_i$ is the main effect of the ith level of factor A
$\beta_j$ is the main effect of the jth level of factor B
$(\alpha\beta)_{ij}$ is the interaction effect of the ith level of factor A and the jth level of factor B
$\varepsilon_{ijk} \sim N(0, \sigma^2)$
$i = 1, \ldots, a$
$j = 1, \ldots, b$
$k = 1, \ldots, n_{ij}$
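
To see how the terms of this model relate to a table of cell means, the Python sketch below splits hypothetical cell means into an overall mean, main effects, and interaction effects. It assumes a balanced design and the usual sum-to-zero constraints on the effects. The slides do not state these constraints, so treat them as an assumption of this illustration, and the function name and numbers as hypothetical.

```python
def decompose(cell_means):
    """cell_means[i][j]: mean for level i of factor A and level j of factor B.

    Returns (mu, alpha, beta, ab) under sum-to-zero constraints.
    """
    a, b = len(cell_means), len(cell_means[0])
    mu = sum(sum(row) for row in cell_means) / (a * b)
    # Main effect of A: row mean minus overall mean.
    alpha = [sum(row) / b - mu for row in cell_means]
    # Main effect of B: column mean minus overall mean.
    beta = [sum(cell_means[i][j] for i in range(a)) / a - mu for j in range(b)]
    # Interaction: what is left after removing mu and both main effects.
    ab = [[cell_means[i][j] - mu - alpha[i] - beta[j] for j in range(b)]
          for i in range(a)]
    return mu, alpha, beta, ab


# Hypothetical cell means for a 2 x 3 design (illustrative values only).
means = [
    [10.0, 12.0, 14.0],
    [12.0, 16.0, 20.0],
]
mu, alpha, beta, ab = decompose(means)
print(mu, alpha, beta, ab)
```

If every interaction entry were zero, each cell mean would equal the overall mean plus the two main effects, which is the no-interaction case.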
TWO-WAY ANOVA EXAMPLE
   We would like to determine the effect of two
alloys (low, high) and three cooling temperatures
(low, medium, high) on the strength of a wire.

   JMP demonstration
Analyze → Fit Model
Y: Strength
Highlight Alloy and Temp and click Macros →
Factorial to Degree

JMP OUTPUT

Conclusion: There is strong evidence of an
interaction between alloy and temperature.
ANALYSIS OF COVARIANCE
ANALYSIS OF COVARIANCE (ANCOVA)
 Covariates are variables that may affect the
response but cannot be controlled.
 Covariates are not of primary interest to the
researcher.
 We will look at an example with two covariates.
The model is

$$y_{ij} = \mu_i + \text{covariates} + \varepsilon_{ij}$$
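
As a simplified illustration of what adjusting for a covariate does, the Python sketch below handles a single numeric covariate with a common slope across groups, whereas the example in these slides uses two covariates fitted in JMP. The function name and data are hypothetical. The sketch estimates the pooled within-group slope and uses it to adjust each group mean to the overall covariate mean.

```python
import statistics


def ancova_adjusted_means(groups):
    """groups: dict label -> list of (x, y) pairs (x = covariate, y = response).

    Returns (pooled_slope, {label: adjusted_mean}) for the model
    y = group mean + slope * x + error, with one slope shared by all groups.
    """
    grand_x = statistics.mean([x for pairs in groups.values() for x, _ in pairs])
    sxy = sxx = 0.0
    means = {}
    for label, pairs in groups.items():
        x_bar = statistics.mean([x for x, _ in pairs])
        y_bar = statistics.mean([y for _, y in pairs])
        means[label] = (x_bar, y_bar)
        # Accumulate within-group cross products and squared deviations.
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in pairs)
        sxx += sum((x - x_bar) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    adjusted = {label: y_bar - slope * (x_bar - grand_x)
                for label, (x_bar, y_bar) in means.items()}
    return slope, adjusted


# Made-up (age, pain) pairs for two treatments (illustrative only).
data = {
    "Drug A": [(30.0, 5.0), (40.0, 6.0), (50.0, 7.0)],
    "Drug B": [(40.0, 4.0), (50.0, 5.0), (60.0, 6.0)],
}
slope, adjusted = ancova_adjusted_means(data)
print(slope, adjusted)
```

In this made-up data the raw group means differ by 1, but the Drug B patients are older. After adjusting both groups to the same age the difference grows to 2, which is the kind of shift a covariate can produce.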

ANCOVA EXAMPLE
   Consider the one-way ANOVA example where we
tested whether the patients receiving different
drugs reported different levels of pain. Age and
gender may also influence pain, so we can
use age and gender as covariates.

   JMP demonstration
Analyze → Fit Model
Y: Pain
Add: Drug, Age, Gender
JMP OUTPUT

CONCLUSION
   The one sample t-test allows us to test whether
the population mean of a group is equal to a
specified value.

   The two-sample t-test and paired t-test allow us
to determine if the population means of two
groups are different.

   ANOVA and ANCOVA methods allow us to
determine whether the population means of
several groups are statistically different.
SAS AND SPSS
   For information about using SAS and SPSS to do
ANOVA:

http://www.ats.ucla.edu/stat/sas/topics/anova.htm
http://www.ats.ucla.edu/stat/spss/topics/anova.htm

REFERENCES
   Fisher’s Irises Data (used in one sample and two
sample t-test examples).

   Flexibility data (paired t-test example):
Michael Sullivan III. Statistics: Informed
Decisions Using Data. Upper Saddle River, New
Jersey: Pearson Education, 2004: 602.
