Basic principles of probability theory

					                            Basics of ANOVA
•   Why ANOVA
•   Assumptions used in ANOVA
•   Various forms of ANOVA
•   Simple ANOVA tables
•   Interpretation of values in the table
•   R commands for ANOVA
•   Exercises
                                      Why ANOVA
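If we have two samples then, under mild conditions, we can use a t-test to test whether the difference between
       their means is significant. When there are more than two samples, using t-tests on every pair might become
       unreliable: the more pairwise tests we perform, the higher the chance that at least one of them is significant by chance alone.
ANalysis Of VAriance (ANOVA) is designed to test differences between means when there are more than two
       samples.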
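A quick calculation shows why. With 5 groups there are choose(5, 2) = 10 pairwise t-tests; if they were independent, each at level 0.05, the chance that at least one is falsely significant would be 1 - 0.95^10, about 0.40. The pairwise tests share groups and so are not independent; the simulation in this sketch (all group means equal, 5 groups of 4 as in the exercise example that follows; the seed and the 1000 repetitions are arbitrary choices) estimates the actual rate.

```r
# Family-wise error rate of all pairwise t-tests when every null hypothesis is true.
alpha    <- 0.05
n_groups <- 5
n_per    <- 4

n_pairs    <- choose(n_groups, 2)          # 10 pairwise comparisons
fwer_indep <- 1 - (1 - alpha)^n_pairs      # rate if the 10 tests were independent
print(fwer_indep)

# Simulation: every group is drawn from the same N(0, 1) distribution.
set.seed(1)
pairs <- combn(n_groups, 2)                # 2 x 10 matrix of group index pairs
any_significant <- replicate(1000, {
  y <- matrix(rnorm(n_groups * n_per), nrow = n_per)   # column g holds group g
  p <- apply(pairs, 2, function(ij) t.test(y[, ij[1]], y[, ij[2]])$p.value)
  any(p < alpha)                           # at least one false positive?
})
fwer_sim <- mean(any_significant)
print(fwer_sim)
```

The simulated rate should come out well above the nominal 0.05, which is the problem a single overall ANOVA test avoids.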
Examples of ANOVA: Suppose that we want to test the effect of various exercises on weight loss.
       We want to compare 5 different exercises. We recruit 20 men and assign four of them to each
       exercise. After a few weeks we record their weight loss. Let i = 1,2,3,4,5 denote the exercise
       number and j = 1,2,3,4 the person's number. Then Yij is the weight loss of the jth person on the ith
       exercise programme. This is a one-way balanced ANOVA: one-way because we have only one
       category (exercise programme), and balanced because we have exactly the same number of men
       on each exercise programme.
Another example: now we want to subdivide each exercise into 4 subcategories. For each
       subcategory of each exercise we recruit four men and measure their weight loss after a few weeks.
       i – exercise category
       j – exercise subcategory
       k – kth man
Then Yijk is the weight loss of the kth man in the jth subcategory of the ith category. The number of
       observations is 5x4x4 = 80. This is a two-fold nested ANOVA: subcategories are nested within
       categories, since subcategory j of one exercise has nothing to do with subcategory j of another.
We want to test: a) there are no significant differences between categories; b) there are no
       significant differences between subcategories within a category.
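The nested layout can be written down in R as in this sketch. The factor names are our choice and the response is simulated noise (hypothetical, not real measurements), used only to show that the nested formula `exercise + exercise:sub` gives 4 degrees of freedom for categories, 5 x 3 = 15 for subcategories within categories, and 80 - 20 = 60 for error.

```r
# Two-fold nested design: 5 exercise categories, 4 subcategories within each,
# 4 men per subcategory (5 x 4 x 4 = 80 observations). Names are illustrative.
design <- expand.grid(man = 1:4, sub = factor(1:4), exercise = factor(1:5))
print(nrow(design))                       # 80

# Hypothetical response: pure noise, only to illustrate the table layout.
set.seed(2)
design$loss <- rnorm(nrow(design), mean = 3, sd = 1)

# Subcategory labels 1..4 are reused in every category, so the nested term
# is the interaction exercise:sub, not sub on its own.
fit_nested <- lm(loss ~ exercise + exercise:sub, data = design)
tab <- anova(fit_nested)
print(tab)   # rows: exercise (hypothesis a), exercise:sub (hypothesis b), Residuals
```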
                                  Examples of ANOVA
One more example: we have 5 categories of exercises and 4 categories of diets. For each
        exercise-diet combination we hire 4 persons, so there will be 5x4x4 = 80 men. This is a two-way crossed
        ANOVA: two-way because we have categorised the men in two ways, by exercise and by diet.
        This model is also balanced: we have exactly the same number of men for each exercise-diet combination.
       i – exercise number
       j – diet number
       k – kth person
       Yijk – weight loss of the kth person on the ith exercise and jth diet.
In this case we can test two different types of hypotheses. Assume that the mean for each
        exercise-diet combination is $\mu_{ij}$. If we assume that the model is additive, i.e. the effects of exercise
        and diet add up, then $\mu_{ij} = \alpha_i + \beta_j$, where $\alpha_i$ is the effect of the ith exercise and $\beta_j$ is the effect
        of the jth diet. Then we want to test the following hypotheses: a) $\mu_{ij}$ does not depend on exercise
        ($\alpha_1 = \dots = \alpha_5$) and b) $\mu_{ij}$ does not depend on diet ($\beta_1 = \dots = \beta_4$).
Sometimes we do not want to assume additivity. Then we want to test one more hypothesis: that the
        model is additive. If the model is not additive, the other hypotheses can be difficult to
        interpret. In this case it might be useful to transform the observations
        to make the model additive.
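One common way to write the non-additive model adds a cell-specific interaction term, so that additivity becomes a hypothesis that can be tested (the symbol $\gamma_{ij}$ is our notation, not taken from the text):

```latex
\mu_{ij} = \alpha_i + \beta_j + \gamma_{ij},
\qquad
H_0:\ \gamma_{ij} = 0 \quad \text{for all } i = 1,\dots,5,\ j = 1,\dots,4 .
```

Under the usual constraints (the $\gamma_{ij}$ sum to zero over $i$ and over $j$) this hypothesis imposes $(5-1)(4-1) = 12$ independent constraints, so the interaction has 12 degrees of freedom.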
Models used for ANOVA can be made more and more complicated. We can design three- or four-way
        crossed models, or nested models, and we can combine nested and crossed models
        together. The number of possible ANOVA models is very large.
ANOVA models are special cases of linear models. We can write the model as:
$$Y = \mu + \varepsilon$$
where Y is the observation vector, $\mu$ is the vector of means composed of the treatment means,
     and $\varepsilon$ is the error vector. The basic assumptions in ANOVA models are:
1.   The expected values of the errors are 0
2.   All errors have the same variance
3.   The errors are independent
4.   The errors are normally distributed

All ANOVA procedures are very sensitive to assumptions 1)-3). F-tests are meant to be robust
     against violations of assumption 4): if assumptions 1)-3) hold, the F-test is valid at
     least asymptotically, i.e. for a large number of observations, even when the errors are not normal.
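Assumptions 2) and 4) can be checked informally in R. A sketch on the PlantGrowth data that ships with R (the choice of data and of tests is ours): `shapiro.test` on the residuals for normality and `bartlett.test` for equal variances across groups. Independence, assumption 3), depends on how the data were collected and cannot be checked this way.

```r
data(PlantGrowth)
fit <- lm(weight ~ group, data = PlantGrowth)

# Assumption 4): normality of the errors, judged from the residuals.
sw <- shapiro.test(residuals(fit))
print(sw)

# Assumption 2): equal error variance in the three groups.
# Note that Bartlett's test is itself sensitive to non-normality.
bt <- bartlett.test(weight ~ group, data = PlantGrowth)
print(bt)
```

Large p-values here mean only that the data give no evidence against the assumptions; residual plots (`plot(fit)`) remain a useful complement.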
                                        ANOVA tables
A standard ANOVA table looks like this:

             effect       df     SSh    MS                 F                  prob
             v1           d1     SS1    MS1=SS1/d1         MS1/MSe            pr1
             ...          ...    ...    ...                ...                ...
             vp           dp     SSp    MSp=SSp/dp         MSp/MSe            prp
             error        de     SSe    MSe=SSe/de
             total        N      SSt
Here v1, ..., vp are the effects we want to test for being 0. df is the number of degrees of freedom
      corresponding to the effect. SSh is the sum of squares corresponding to the effect (h denotes hypothesis),
      and MS is the mean square, SSh divided by df. F is the F-value; when the effect is 0 it follows an
      F-distribution with (di, de) degrees of freedom. prob is the corresponding probability: the chance of
      an F-value at least this large if the effect is 0. If this probability is very low, we reject the hypothesis
      that the effect is 0; if it is large enough, we do not reject the null hypothesis.
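To see how the columns are linked, the F value and its probability can be recomputed from the df and mean squares of the `pois` row of the R table in the poison example later in this section. A sketch; the numbers are copied from that table, and `pf(..., lower.tail = FALSE)` gives the upper-tail probability of the F-distribution.

```r
# Row "pois" of the two-way ANOVA table in the poison example:
# d1 = 2, MS1 = 0.51914; error row: de = 36, MSe = 0.02306.
d1 <- 2;  MS1 <- 0.51914
de <- 36; MSe <- 0.02306

F_value <- MS1 / MSe                                           # F = MS1 / MSe
p_value <- pf(F_value, df1 = d1, df2 = de, lower.tail = FALSE) # P(F(d1, de) > F_value)
print(c(F = F_value, p = p_value))
```

The small difference from the printed F value (22.5135) comes from the rounding of the mean squares in the table.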
These values are calculated using the likelihood ratio test. Let us say we want to test the hypothesis:
$$H_0: v_i = 0 \quad \text{vs} \quad H_1: v_i \neq 0$$
Then we maximise the likelihood under the null hypothesis and find the corresponding variance, then we
      maximise the likelihood under the alternative hypothesis and find the corresponding variance.
      Then we calculate the sums of squares for the null and alternative hypotheses and find the F-value.
                                         LR test for ANOVA
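Suppose the maximum-likelihood estimates of the variance are
$$\hat\sigma_0^2 \ \text{for the null hypothesis}, \qquad \hat\sigma_1^2 \ \text{for the alternative hypothesis.}$$
Then the sums of squares for the hypothesis and for the error (the alternative hypothesis) are
$$SS_h = N(\hat\sigma_0^2 - \hat\sigma_1^2), \qquad SS_e = N\hat\sigma_1^2,$$
where N is the number of observations (the maximum-likelihood estimate of the variance is the
residual sum of squares divided by N).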

If the null hypothesis is true, $SS_h/\sigma^2$ has a $\chi^2$ distribution with dfh degrees of freedom and
       $SS_e/\sigma^2$ has a $\chi^2$ distribution with dfe degrees of freedom, where $\sigma^2$ is the true error variance.
       Since they are independent, the ratio
$$F = \frac{SS_h/df_h}{SS_e/df_e}$$
       has an F-distribution with (dfh, dfe) degrees of freedom. If the null hypothesis is not true, the ratio
       has a non-central F-distribution.
The degrees of freedom of the hypothesis are defined by the number of constraints it implies; in the
       simplest case this is the number of levels in the category minus 1. The degrees of freedom of the
       error are, as usual, the number of observations minus the number of parameters.
Using this type of ANOVA table we can only tell whether there are significant differences between
       the means. It does not tell which of them are significantly different.
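The likelihood-ratio construction can be reproduced with two nested `lm` fits. A sketch on PlantGrowth (our choice of data): under H0 all three group means are equal (`weight ~ 1`), under H1 they are free (`weight ~ group`). The maximum-likelihood variance estimate is the residual sum of squares divided by N, so $SS_h = N(\hat\sigma_0^2 - \hat\sigma_1^2)$ and $SS_e = N\hat\sigma_1^2$; the built-in `anova(fit0, fit1)` should give the same F.

```r
data(PlantGrowth)
N <- nrow(PlantGrowth)                            # 30 observations

fit0 <- lm(weight ~ 1,     data = PlantGrowth)    # null: one common mean
fit1 <- lm(weight ~ group, data = PlantGrowth)    # alternative: 3 group means

# Maximum-likelihood variance estimates: residual sum of squares / N.
sigma2_0 <- sum(residuals(fit0)^2) / N
sigma2_1 <- sum(residuals(fit1)^2) / N

SSh <- N * (sigma2_0 - sigma2_1)                  # hypothesis sum of squares
SSe <- N * sigma2_1                               # error sum of squares
dfh <- 2                                          # equal means: 2 constraints on 3 means
dfe <- N - 3                                      # observations minus parameters

F_manual <- (SSh / dfh) / (SSe / dfe)
p_manual <- pf(F_manual, dfh, dfe, lower.tail = FALSE)

cmp <- anova(fit0, fit1)                          # built-in comparison of the two fits
print(cmp)
print(c(F = F_manual, p = p_manual))
```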
                                Example: Two way ANOVA
Let us consider an example taken from Box, Hunter and Hunter. The experiment was done
      on animals: survival times of the animals were recorded for three poisons (I, II, III) and four
      treatments (A, B, C, D), with four animals for each combination. The table is:
Poison    Treatment
          A       B       C       D
I         0.31    0.82    0.43    0.45
          0.45    1.10    0.45    0.71
          0.46    0.88    0.63    0.66
          0.43    0.72    0.76    0.62

II        0.36    0.92    0.44    0.56
          0.29    0.61    0.35    1.02
          0.40    0.49    0.31    0.71
          0.23    1.24    0.40    0.38

III       0.22    0.30    0.23    0.30
          0.21    0.37    0.25    0.36
          0.18    0.38    0.24    0.31
          0.23    0.29    0.22    0.33
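The data above can be entered and analysed in R as in this sketch. The data frame and column names (`time`, `pois`, `treat`) are our choice, picked to match the row labels of the R output that follows. With the values as transcribed here, the degrees of freedom match that output, but the sums of squares come out slightly different (about 1.033 for `pois` rather than 1.038), so the printed table was presumably computed from a slightly different version of the data.

```r
# Survival times from the table above: rows are replicates, columns are
# treatments A-D; the blocks for poisons I, II, III follow each other.
time <- c(0.31, 0.82, 0.43, 0.45,
          0.45, 1.10, 0.45, 0.71,
          0.46, 0.88, 0.63, 0.66,
          0.43, 0.72, 0.76, 0.62,   # poison I
          0.36, 0.92, 0.44, 0.56,
          0.29, 0.61, 0.35, 1.02,
          0.40, 0.49, 0.31, 0.71,
          0.23, 1.24, 0.40, 0.38,   # poison II
          0.22, 0.30, 0.23, 0.30,
          0.21, 0.37, 0.25, 0.36,
          0.18, 0.38, 0.24, 0.31,
          0.23, 0.29, 0.22, 0.33)   # poison III
poisons <- data.frame(
  time  = time,
  pois  = factor(rep(c("I", "II", "III"), each = 16)),
  treat = factor(rep(c("A", "B", "C", "D"), times = 12))
)

fit_int <- lm(time ~ pois * treat, data = poisons)   # main effects + interaction
tab <- anova(fit_int)
print(tab)
```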
                                      ANOVA table
ANOVA table produced by R:
           Df Sum Sq           Mean Sq      F value   Pr(>F)
pois        2 1.03828          0.51914      22.5135   4.551e-07 ***
treat      3  0.92569          0.30856      13.3814   5.057e-06 ***
pois:treat 6  0.25580          0.04263       1.8489    0.1170
Residuals 36 0.83013           0.02306

The most important columns are F value and Pr(>F).
The table contains tests for the poison (pois) and treatment (treat) effects, and also for the "interaction"
       between these two factors. An interaction means that the effects of the two factors cannot be
       separated: the effect of a treatment depends on which poison is used, so the two factors have to be
       considered together. Here Pr(>F) for the interaction (0.1170) is not very small, but it is not large
       enough to confidently discard the interaction either. In such situations a transformation of the
       observations might help. Let us use the transformation 1/y and compute the ANOVA table for the
       transformed observations. Now the ANOVA table looks like:
           Df    Sum Sq Mean Sq            F value    Pr(>F)
pois        2    34.903     17.452        72.2347      2.501e-13 ***
treat      3     20.449      6.816        28.2131      1.457e-09 ***
pois:treat 6      1.579      0.263         1.0892       0.3874
Residuals 36      8.697      0.242
According to this table, Pr(>F) for the interaction term is high, which means that the
     interaction for the transformed observations is not significant. We can therefore drop the
     interaction term and build the ANOVA table without it. It looks like:

         Df Sum Sq        Mean Sq          F value      Pr(>F)
pois      2 34.903        17.452           71.326       3.124e-14 ***
treat     3 20.449         6.816           27.858       4.456e-10 ***
Residuals 42 10.276       0.245
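The last two tables can be reproduced by fitting the reciprocal of the survival time with and without the interaction. A sketch on the data as transcribed above (re-entered so that the block runs on its own; the column name `rate` for 1/y is our choice). Comparing the additive fit with the full fit by `anova(fit_add, fit_int)` gives the same F and probability as the interaction row of the full table, since both test the same hypothesis against the same error mean square.

```r
time <- c(0.31, 0.82, 0.43, 0.45, 0.45, 1.10, 0.45, 0.71,
          0.46, 0.88, 0.63, 0.66, 0.43, 0.72, 0.76, 0.62,   # poison I
          0.36, 0.92, 0.44, 0.56, 0.29, 0.61, 0.35, 1.02,
          0.40, 0.49, 0.31, 0.71, 0.23, 1.24, 0.40, 0.38,   # poison II
          0.22, 0.30, 0.23, 0.30, 0.21, 0.37, 0.25, 0.36,
          0.18, 0.38, 0.24, 0.31, 0.23, 0.29, 0.22, 0.33)   # poison III
poisons <- data.frame(
  time  = time,
  pois  = factor(rep(c("I", "II", "III"), each = 16)),
  treat = factor(rep(c("A", "B", "C", "D"), times = 12))
)
poisons$rate <- 1 / poisons$time                      # transformation 1/y

fit_int <- lm(rate ~ pois * treat, data = poisons)    # with interaction
fit_add <- lm(rate ~ pois + treat, data = poisons)    # additive model
tab_int <- anova(fit_int)
tab_add <- anova(fit_add)
print(tab_int)
print(tab_add)

# Dropping the interaction: compare the additive fit with the full fit.
cmp <- anova(fit_add, fit_int)
print(cmp)
```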

Now we can say that there are significant differences between poisons as well as between treatments.

Sometimes it is wise to use a transformation to reduce the effect of interactions. Several
     different transformations (inverse, inverse square, log) could be tried, and an ANOVA table
     could be built for each of them. Then by inspection you can decide which transformation
     gives the best results. The following argument can be used to justify a transformation: if the
     effects of two different categories are multiplicative, then their logarithms have additive
     effects. Additive effects are easier to interpret than others.
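The inspection can be done in a loop. A sketch on the transcribed poison data (the list of transformations and the summary used, the interaction Pr(>F), are our choices): for each transformation fit the full two-way model and record the probability for the interaction; larger values suggest the transformed data are closer to additive. The log entry follows the argument above: if cell means are products $a_i b_j$, their logarithms $\log a_i + \log b_j$ are additive.

```r
time <- c(0.31, 0.82, 0.43, 0.45, 0.45, 1.10, 0.45, 0.71,
          0.46, 0.88, 0.63, 0.66, 0.43, 0.72, 0.76, 0.62,   # poison I
          0.36, 0.92, 0.44, 0.56, 0.29, 0.61, 0.35, 1.02,
          0.40, 0.49, 0.31, 0.71, 0.23, 1.24, 0.40, 0.38,   # poison II
          0.22, 0.30, 0.23, 0.30, 0.21, 0.37, 0.25, 0.36,
          0.18, 0.38, 0.24, 0.31, 0.23, 0.29, 0.22, 0.33)   # poison III
poisons <- data.frame(
  time  = time,
  pois  = factor(rep(c("I", "II", "III"), each = 16)),
  treat = factor(rep(c("A", "B", "C", "D"), times = 12))
)

# Candidate transformations of the response (illustrative selection).
transforms <- list(
  identity       = function(y) y,
  log            = log,
  inverse        = function(y) 1 / y,
  inverse_square = function(y) 1 / y^2
)

# Interaction probability Pr(>F) in the full two-way model, per transformation.
p_interaction <- sapply(transforms, function(f) {
  fit <- lm(f(time) ~ pois * treat, data = poisons)
  anova(fit)["pois:treat", "Pr(>F)"]
})
print(round(p_interaction, 4))
```

The probability is only one criterion; residual plots for each transformed fit are worth checking too.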
                              R commands for ANOVA
There are basically two types of commands in R: the first fits a linear model, and the second analyses the fitted model.
The command to fit a linear model is lm, used as lm(formula, data).
The formula defines the design matrix; see help(formula). For example, for the PlantGrowth data (available in R) we
    can use
data(PlantGrowth)                                  # load the data from a standard package
lmPlant = lm(weight ~ group, data = PlantGrowth)   # fit the one-way ANOVA model

Then the linear model will be fitted to the data and the result will be stored in lmPlant.
Now we can analyse it:
anova(lmPlant) will give the ANOVA table.
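Put together, these steps form a short script. A sketch using the `data =` form of `lm`, with `model.matrix` added to show the design matrix that the formula defines (with R's default treatment contrasts: an intercept column plus indicator columns for `trt1` and `trt2`).

```r
data(PlantGrowth)                          # 30 plants, 3 groups: ctrl, trt1, trt2
lmPlant <- lm(weight ~ group, data = PlantGrowth)

X <- model.matrix(lmPlant)                 # design matrix implied by the formula
print(dim(X))                              # 30 rows, 3 columns
print(head(X))                             # intercept plus indicators for trt1 and trt2

tab <- anova(lmPlant)
print(tab)                                 # one row for group, one for Residuals
```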
If there is more than one factor (category), then for a two-way crossed design we can use
lm(data~f1*f2) - It will fit complete model with interactions
lm(data~f1+f2) - It will fit only additive model
lm(data~f1+f1:f2) - It will fit f1 and interaction between f1 and f2. It is used for nested models.
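The three formulas can be compared on `warpbreaks`, a balanced two-factor data set that ships with R (our choice of example: number of breaks for two wool types and three tension levels, 9 observations per combination). In R's formula syntax `wool / tension` is shorthand for `wool + wool:tension`.

```r
data(warpbreaks)                     # 54 rows: breaks, wool (A, B), tension (L, M, H)

fit_full   <- lm(breaks ~ wool * tension,       data = warpbreaks)  # crossed, with interaction
fit_add    <- lm(breaks ~ wool + tension,       data = warpbreaks)  # additive only
fit_nested <- lm(breaks ~ wool + wool:tension,  data = warpbreaks)  # tension nested in wool
fit_slash  <- lm(breaks ~ wool / tension,       data = warpbreaks)  # same model as fit_nested

tab_full   <- anova(fit_full)
tab_add    <- anova(fit_add)
tab_nested <- anova(fit_nested)
print(tab_full); print(tab_add); print(tab_nested)

# Both the full crossed model and the nested model have one free mean per
# wool-tension cell, so their fitted values agree; only the split of the
# sums of squares (and hence the hypotheses tested) differs.
print(all.equal(fitted(fit_nested), fitted(fit_full)))
```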
Other useful commands for analysing a fitted linear model are
summary(lmPlant) - gives a summary of the fit (coefficients, standard errors, tests)
plot(lmPlant)       - produces several useful diagnostic plots

Please let me know if any of the results are not clear; then we can discuss them and try to sort out the problems.

Another useful command for ANOVA is

This command gives confidence intervals for some of the coefficients and
   therefore for differences between the effects of different factors.
To find confidence intervals between any two given effects one can use
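Two standard base-R functions for such intervals, offered as a sketch rather than as the commands intended on the slide, are `confint`, which gives intervals for the coefficients of an `lm` fit (with the default treatment contrasts these are differences from the first group), and `TukeyHSD`, which gives simultaneous intervals for all pairwise differences between group means and needs an `aov` fit.

```r
data(PlantGrowth)
lmPlant <- lm(weight ~ group, data = PlantGrowth)

# 95% intervals for the coefficients: intercept = mean of "ctrl",
# grouptrt1 / grouptrt2 = differences of trt1 / trt2 from ctrl.
ci <- confint(lmPlant, level = 0.95)
print(ci)

# Simultaneous 95% intervals for all pairwise differences of group means.
tk <- TukeyHSD(aov(weight ~ group, data = PlantGrowth), conf.level = 0.95)
print(tk)
```

Tukey's intervals are wider than unadjusted ones because they control the chance of any false positive across all three comparisons.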
                           Bootstrap for ANOVA
Algorithm for the bootstrap:

1)   Use lm to fit the model to the data
2)   Resample the residuals (with replacement) and add them to the fitted values of the observations
3)   Use lm to fit the model to these new data
4)   Save the coefficients
5)   Repeat steps 2-4 (around 200-2000 times)
6)   Build distributions for each coefficient and any other statistics of interest
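Steps 1)-6) can be sketched in R as follows. This is a minimal residual-bootstrap sketch on PlantGrowth, not the implementation referred to in the text; the number of replicates `B`, the seed and the object names (`boot_coef`, `ci_trt1`) are our choices. `boot_coef` has one row per coefficient and one column per replicate.

```r
data(PlantGrowth)
set.seed(4)

fit      <- lm(weight ~ group, data = PlantGrowth)  # step 1: fit the model
fitted_y <- fitted(fit)
res      <- residuals(fit)
B        <- 1000                                     # number of bootstrap replicates

boot_coef <- replicate(B, {                          # step 5: repeat B times
  # step 2: resample residuals with replacement, add them to the fitted values
  y_star <- fitted_y + sample(res, replace = TRUE)
  # steps 3-4: refit the same model and keep its coefficients
  coef(lm(y_star ~ group, data = PlantGrowth))
})
# step 6: each row now holds the bootstrap distribution of one coefficient
print(dim(boot_coef))                                # 3 coefficients x B replicates

# 95% percentile interval for the trt1 - ctrl difference (row 2)
ci_trt1 <- quantile(boot_coef[2, ], c(0.025, 0.975))
print(ci_trt1)
```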

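Then for confidence intervals one can use (assuming lmBoot$coefficients has one row per coefficient
and one column per bootstrap replicate):
lm1 = sort(lmBoot$coefficients[2,])   # sorted bootstrap values of the 2nd coefficient
l = length(lm1)                        # number of bootstrap replicates
lmlow = lm1[round(0.025*l)]
lmhigh = lm1[round(0.975*l)]
This gives the lower and upper limits of a 95% confidence interval.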
example of implementation is:
                                   Exercise 2
a)     Analyse these data using ANOVA
What do you think about the differences?

What do you think about differences?

1.   Stuart, A., Ord, K.J., Arnold, S. (1999) Kendall's Advanced Theory of
     Statistics, Volume 2A
2.   Box, G.E.P., Hunter, W.G., Hunter, J.S. (1978) Statistics for