					Psychology 215:
Statistics for Social Science



        Kate Bezrukova
 bezrukov@camden.rutgers.edu

         Introduction
• Our Increasingly Quantitative World!!!
• Data are not just numbers, but numbers
  that carry information
• Purposes:
  – producing trustworthy data
  – analyzing data to make their meaning clear,
    and
  – drawing practical conclusions from data
      Basic Idea of Statistics:

To make inferences about a population using data from only
a sample

   sample of data  -->  (statistical tools)  -->  inference about the population
      Overview of Course
• Methods of data collection
  – summarizing data
• Visualizing data
  – correlation and association
• Standard scores and Normal Curve
  – statistical inference
• Statistical tests
  – necessary skills
Variables
       Statistics vs. Parameters
•   sample – statistic – carries uncertainty
•   population – parameter – carries no uncertainty

Is there any interest in a group larger than the one you have?
1. if “yes” – you have a sample
2. if “no” – you have a population
                      Terms
• Data - numbers collected for a purpose in a
  particular context
  – Case/observational unit
• Variability – fundamental principle.
• Variables – any characteristics of a person/thing
  that can be assigned a number or category
  – measurement (continuous) variables – assumes a
    range of numerical values
  – categorical variables – simply records a category
    designation.
            Types of Variables
Variables = Aspect of a testing condition that can
            change or take on different
            characteristics with different conditions.

           • Dependent Variable (DV)
           • Independent Variable (IV)
           • Extraneous Variable
              – Confounded Variables
Graphs
            Distributions
• data distribution – pattern of
  variability.
  – the center of a distribution
  – the ranges
  – the shapes
• simple frequency distributions
• grouped frequency distributions
  – midpoint
Graphic Presentation of Data
  the frequency polygon
  (quantitative data)



  the histogram
  (quantitative data)



  the bar graph
  (qualitative data)
                  Distributions
• Bell-shaped (also known as “symmetric”
  or “normal”)

• Skewed:
  – positively (skewed to
    the right) – it tails off
    toward larger values
  – negatively (skewed to
    the left) – it tails off
    toward smaller values
Measures of Central
    Tendency
                                  Mean

           X̄ = ΣX / N

Σ       an instruction “to add”      X    a score (an observation)

X̄       the mean                     N    the number of observations

    characteristics:

1.     Σ(X – X̄) = 0
2.     Σ(X – X̄)² = minimum, “least squares”
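A minimal Python sketch (not from the slides), using the eight made-up scores from the frequency-distribution slide below, showing the mean and its two characteristics:

    # compute the mean and check the two characteristics of deviation scores
    scores = [20, 23, 15, 21, 15, 21, 15, 20]

    N = len(scores)
    mean = sum(scores) / N                     # X-bar = (sum of X) / N
    deviations = [x - mean for x in scores]

    print(mean)                                # 18.75
    print(round(sum(deviations), 10))          # 0.0 -> deviations sum to zero
    print(sum(d ** 2 for d in deviations))     # squared deviations are at a minimum ("least squares")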
            Median & Mode
• MEDIAN divides a distribution of scores
  (always arrange in order!) into two parts
  that are equal in size
• median location:
  – if N is odd – the middle score
  – if N is even – the mean of the two middle scores
• MODE is the most common value, i.e., the
  most frequently occurring score.
Simple Frequency Distributions
    raw-score distribution          frequency distribution

name               X                      X     f
Student1           20                     23    1
Student2           23                     21    2
Student3           15                     20    2
Student4           21                     15    3
Student5           15
Student6           21               Mean:  X̄ = ΣfX / N
Student7           15
Student8           20
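A minimal Python sketch (not from the slides): the same mean computed from the frequency distribution, X̄ = ΣfX / N:

    # frequency table taken from the slide above (X: f)
    freq = {15: 3, 20: 2, 21: 2, 23: 1}

    N = sum(freq.values())                          # 8 observations
    mean = sum(x * f for x, f in freq.items()) / N  # sum of f*X, divided by N
    print(mean)                                     # 18.75, same as the raw-score mean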
Measures of Variability
 Measures of Spread: Range
is simply a numerical distance b/w the
highest score and the lowest score

– how many hours did you sleep last night?


– how much sleep do you usually get on a
  typical night?
         Interquartile range
  tells the range of scores that make up
  the middle 50 percent of the
  distribution.
• IQR = 75th percentile – 25th percentile
• a location value (.25 x N)
  – similar to finding the median
• interpretation: “the middle 50 percent of
  the scores have values from XXX to
  YYY”
                Deviation Scores
      deviation score = raw score – mean
     for samples: X – X̄
     for populations: X – μ


1.   if X > X̄, the deviation score is positive
2.   if X < X̄, the deviation score is negative
3.   if X = X̄, the deviation score is 0

     interpretation: the number of points that a particular
     score deviates from the mean (Σ(X – X̄) = 0).
              Standard Deviation
    σ – used to describe the variability of a population (σ is a
    parameter)
    s – used to estimate σ from a sample of the population
    (s is a statistic)
    S – used to describe the variability of a sample when we
    have no desire to estimate σ (S is a statistic)

–      Choosing the correct SD:
      1.   how the data were gathered – was sampling used?
      2.   generalization – what is the purpose of the data?
–      Formulas:
      •    deviation-score formula
      •    raw-score formula
         Deviation-Score Method

   σ = √[ Σ(X – μ)² / N ]            S = √[ Σ(X – X̄)² / N ]

where        σ = standard deviation of the population
             S = standard deviation of the sample
             N = number of scores
  1.   find a deviation score for each raw score
  2.   square each deviation score
  3.   add them up
  4.   divide this sum by N and take the square root
          Raw-Score Method

   σ = √[ ( ΣX² – (ΣX)²/N ) / N ]

where   ΣX²   = sum of the squared scores
        (ΣX)² = square of the sum of the raw scores
        N     = number of scores
1.   square the sum of the raw scores and divide by N
2.   square each score and add the squares up
3.   subtract the result of step 1 from the result of step 2
4.   divide this difference by N and take the square root
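A minimal Python sketch (not from the slides) checking that the deviation-score and raw-score formulas give the same population standard deviation for a made-up set of scores:

    from math import sqrt

    scores = [20, 23, 15, 21, 15, 21, 15, 20]       # hypothetical example scores
    N = len(scores)
    mean = sum(scores) / N

    # deviation-score method
    sd_dev = sqrt(sum((x - mean) ** 2 for x in scores) / N)

    # raw-score method
    sum_x = sum(scores)
    sum_x2 = sum(x ** 2 for x in scores)
    sd_raw = sqrt((sum_x2 - sum_x ** 2 / N) / N)

    print(round(sd_dev, 4), round(sd_raw, 4))       # identical values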
Standard Deviation (Sample)

   s = √[ Σ(X – X̄)² / (N – 1) ]          s = √[ ( ΣX² – (ΣX)²/N ) / (N – 1) ]

   for a frequency distribution:

   s = √[ ( ΣfX² – (ΣfX)²/N ) / (N – 1) ]

 - ΣfX²  – square, multiply by f, then sum
 - (ΣfX)² – multiply by f, sum, then square
                  Variance
• the value before taking the square root

   σ² = Σ(X – μ)² / N              s² = Σ(X – X̄)² / (N – 1)

•   analysis of variance
•   N – population
•   N – 1 – sample
•   Σ(X – X̄)² – sum of squares
Other Descriptive Statistics
Combination Statistics: z scores
• standard scores are usually used to compare two scores
  from different distributions
• positive z scores = raw scores > mean
• negative z scores = raw scores < mean
• the absolute value of the z score, |z|, tells the
  number of standard deviations the score is from the
  mean

   z = (X – X̄) / S

Σz = 0 because Σ(X – X̄) = 0
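A minimal Python sketch (not from the slides): z scores for a made-up set of raw scores, using the descriptive (N-denominator) standard deviation:

    from math import sqrt

    scores = [20, 23, 15, 21, 15, 21, 15, 20]
    N = len(scores)
    mean = sum(scores) / N
    S = sqrt(sum((x - mean) ** 2 for x in scores) / N)

    z = [(x - mean) / S for x in scores]
    print([round(v, 2) for v in z])
    print(round(sum(z), 10))                   # 0.0 -> the z scores sum to zero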
  How much difference is there?
      Effect Size Index “d”
• describes the size of the difference between
  two distributions

   d = |μ1 – μ2| / σ          always a positive number!

       small effects           d = .20
       medium effects          d = .50
       large effects           d = .80

      other verbal labels: “huge”, “half the size of small”,
      “somewhat larger than”, “intermediate between”

      σ_pooled = √[ (σ1² + σ2²) / 2 ]
   Descriptive statistics report:
             Boxplot
- minimum score
- maximum score
- lower quartile
- upper quartile
- median
- mean
- the skew of the distribution:
   positive skew: mean > median & high-score whisker is longer
   negative skew: mean < median & low-score whisker is longer

Report:
1) boxplot (draw)
2) effect size index (calculate)
3) story (write): a) central tendency, b) form of the distributions,
   c) overlap of distributions, d) interpretation of the effect size index
 EXAMPLE: A descriptive statistics
    report on “Marriage Ages”
1.   The graph shows boxplots of marriage ages of women and men.

                    +++++++++insert your boxplot here+++++++++


2.   The mean age of women is 35 years old; the median is 33 years
     old. The mean age of men is 38.64 years old; the median is 34
     years old.
3.   The marriage ages for both women and men are positively
     skewed. More women and men get married at younger ages.
4.   Although the two distributions overlap, the middle 50% of the men
     are somewhat older than the middle 50% of the women.
5.   The difference in means produces an effect size index of .32.
     This value is somewhat larger than a small effect size.
Correlation and Regression
             Correlation Coefficient
a correlation coefficient (r) provides a quantitative way to express the
degree of linear relationship between two variables.

• Range: r is always between -1 and 1
• Sign of correlation indicates direction:
       - high with high and
         low with low -> positive
       - high with low and
         low with high -> negative
      - no consistent pattern -> near zero

• Magnitude (absolute value) indicates
strength (-.9 is just as strong as .9)
         .10 to .40 weak
         .40 to .80 moderate
         .80 to .99 high
         1.00 perfect
Pearson product – moment correlation
        coefficient: Formulas
Invented by Karl Pearson and used to describe the strength of the linear
relationship between two variables that are both either ratio or interval
variables.

Definitional Formula

   r = Σ(z_X · z_Y) / N

where      r   = Pearson product-moment correlation coefficient
           z_X = a z score for variable X
           z_Y = a z score for variable Y
           N   = number of pairs of X and Y values

Computational Formulas
   • blanched formula (next slide)
   • raw-score formula

The rule of thumb:
a correlation coefficient should be based on at least 30 pairs of observations
Pearson product – moment correlation
   coefficient: Blanched Formula

        ΣXY/N – (X̄)(Ȳ)
   r =  ----------------
           (S_X)(S_Y)

 where
 X and Y are paired observations
 ΣXY = sum of each X value multiplied by its paired Y value
 X̄ = mean of variable X
 Ȳ = mean of variable Y
 S_X = standard deviation of the X distribution
 S_Y = standard deviation of the Y distribution
 N = number of paired observations
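A minimal Python sketch (not from the slides): Pearson r on made-up paired data, computed with the blanched formula and checked against the definitional (z-score) formula:

    from math import sqrt

    X = [1, 2, 3, 4, 5, 6]                         # hypothetical paired observations
    Y = [2, 1, 4, 3, 6, 7]
    N = len(X)

    mx, my = sum(X) / N, sum(Y) / N
    Sx = sqrt(sum((x - mx) ** 2 for x in X) / N)   # N-denominator SDs
    Sy = sqrt(sum((y - my) ** 2 for y in Y) / N)

    # blanched formula
    r_blanched = (sum(x * y for x, y in zip(X, Y)) / N - mx * my) / (Sx * Sy)

    # definitional formula: mean of the products of paired z scores
    r_zscore = sum(((x - mx) / Sx) * ((y - my) / Sy) for x, y in zip(X, Y)) / N

    print(round(r_blanched, 4), round(r_zscore, 4))   # identical values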
 Correlation Coefficient: Limitations
1. The correlation coefficient is an appropriate measure of
   association only when the relationship is linear
2. The correlation coefficient is an appropriate measure of
   association only when the range of scores in the sample matches
   that in the population (a truncated range lowers r)
3. Correlation doesn't imply causality
   –   Using U.S. cities as cases, there is a strong positive correlation
       between the number of churches and the incidence of violent
       crime
   –   Does this mean churches cause violent crime, or violent crime
       causes more churches to be built?
   –   More likely, both are related to the population of the city (a third
       variable -- lurking or confounding variable)
 Coefficient of Determination
r2 tells the proportion of variance that two
variables in a bivariate distribution have in
common.
Introduction to Linear Regression
  linear regression is the statistical procedure of
  estimating the linear relationship between Y and X

Y = a + bX,
  where X and Y are variables representing scores on the X and Y
  axes,
  b = slope of the regression line
  a = intercept of the line with the Y axis

  positive slope -- the highest point on the line is to the right of the
  lowest point
  negative slope -- the highest point on the line is to the left of the
  lowest point
    Regression Coefficients

b = r (S_Y / S_X), where
  r = correlation coefficient for X and Y
  S_Y = standard deviation of the Y variable
  S_X = standard deviation of the X variable
a = Ȳ – bX̄, where
  Ȳ = mean of the Y scores
  b = regression coefficient
  X̄ = mean of the X scores
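A minimal Python sketch (not from the slides): the slope and intercept computed from r and the standard deviations, using the same made-up paired data as above:

    from math import sqrt

    X = [1, 2, 3, 4, 5, 6]
    Y = [2, 1, 4, 3, 6, 7]
    N = len(X)

    mx, my = sum(X) / N, sum(Y) / N
    Sx = sqrt(sum((x - mx) ** 2 for x in X) / N)
    Sy = sqrt(sum((y - my) ** 2 for y in Y) / N)
    r = (sum(x * y for x, y in zip(X, Y)) / N - mx * my) / (Sx * Sy)

    b = r * (Sy / Sx)          # slope of the regression line
    a = my - b * mx            # Y intercept
    print(f"Y-hat = {a:.2f} + {b:.2f}X")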
  Introduction to Probability
What are the odds that a card drawn at random
from a deck of cards will be an ace?




the probability is 4/52
         Theoretical Distributions and
           Theoretical Probabilities
• probabilities range from .000 to 1
• the expression p = .077 means that there are 7.7 chances in 100 that
  the event will occur.
 [Figure: theoretical frequency distribution for one card drawn from a deck –
 each value, Ace through King, has a frequency of 4]
Theoretical and Empirical
      Distributions


• theoretical distributions
  (logic, math)
• empirical distributions
  (observation)
  The Standard Normal Distribution
1. the mean, median and mode
   are the same score on the X
   axis where the curve peaks
2. the area to the left or right of
   the line = 50% of the total
   area
3. the tails of the curve are
   asymptotic to the X axis –
   they never cross the axis but
   continue in both directions
   indefinitely
4. the inflection points are
   where the curve is the
   steepest (–1σ and +1σ)
     The Normal Distribution table
       (Table C in Appendix C)

  Table C can be used to determine areas
  (proportions) of a normal distribution and obtain
  probability figures.
• Column A contains the z score
• Column B -- the area b/w the mean and the z score
• Column C -- the area beyond the z score

• all the proportions also hold for negative z because the curve is
  symmetrical
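A minimal Python sketch (not from the slides) reproducing the Table C columns for a given z with scipy's normal distribution (requires scipy):

    from scipy.stats import norm

    z = 1.5
    col_B = norm.cdf(z) - 0.5      # area between the mean and z
    col_C = 1 - norm.cdf(z)        # area beyond z
    print(round(col_B, 4), round(col_C, 4))   # 0.4332 0.0668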
EXAMPLE: The College Frisbee Golf Course
    On the average, it takes 27 throws to complete the
    College Frisbee Golf Course. The standard deviation
    about this mean is 4.
 Questions that we can answer:
   1. finding the proportion of a population that has
        scores of a particular size (e.g., what proportion of the
        population would be expected to score 22 or less?)
       – interpretation: there are XX.XX chances in 100
           that players would need 22 or fewer throws.
   2. finding the score that separates the population
        into two proportions (e.g., if 950 students played
        and a prize was given for scoring 20 or less, how
        many would get prizes?)
       – interpretation: XX of the 950 people would get
           prizes.
     Example: The College Frisbee Golf
              Course. Cont’d
3.  finding the extreme scores in a population (e.g.,
    suppose an experimenter wanted to find some very
    good Frisbee players to use in an experiment. She
    decided to use the top 10 percent from the CFGC.
    What is the cut-off score?)
   • interpretation: in order to be considered as a very
       good Frisbee player, you need to have a XXX score
       or higher.
4. finding the proportion of the population between
    two scores (e.g., what proportion would score
    between 25 and 30?)
   • interpretation: there are XX.XX chances in 100 that
       players would score between 25 and 30.
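A minimal Python sketch (not from the slides) answering the four Frisbee Golf questions with scipy, assuming the number of throws is normally distributed with mean 27 and standard deviation 4:

    from scipy.stats import norm

    mean, sd = 27, 4

    # 1. proportion of the population scoring 22 or less
    print(round(norm.cdf(22, mean, sd), 4))          # about .1056

    # 2. of 950 players, how many score 20 or less?
    print(round(950 * norm.cdf(20, mean, sd)))       # about 38 prizes

    # 3. score that cuts off 10 percent of the distribution in one tail
    print(round(norm.ppf(0.10, mean, sd), 2))        # lowest 10 percent: about 21.87 throws
    print(round(norm.ppf(0.90, mean, sd), 2))        # highest 10 percent: about 32.13 throws

    # 4. proportion scoring between 25 and 30
    print(round(norm.cdf(30, mean, sd) - norm.cdf(25, mean, sd), 4))   # about .4649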
Samples, Sampling Distributions,
   and Confidence Intervals
                        Sampling
     The essential idea of sampling is to learn about
     the whole by studying a part
     Two important terms:
    1.   population – the entire group of people or objects
    2.   sample -- a (typically small) part of the population


•    Biased Sampling -- a tendency to systematically
     overrepresent / underrepresent certain segments
     of the population
    – convenience samples
    – voluntary response
           Random Samples
• Random sample – any method that allows every
  possible sample of size N an equal chance to be
  selected.
• A method of getting a random sample -- a table of
  random numbers (Table B in Appendix C):
   – each position is equally likely to be occupied by any one of
     the digits 0,1,2,3,4,5,6,7,8,9
   – the occupant of any one position has no impact on the
     occupant of any other position
   Sampling Distributions: Describing
     the Relationship b/w X̄ and μ
• expected value is the mean of a
  sampling distribution
• standard error is the standard
  deviation

• the sampling distribution of the
  mean:
    – every sample is drawn randomly from
      a specific population
    – the sample size (N) is the same for
      all samples
    – the number of samples is very large
    – the mean (X̄) is calculated for each
      sample
    – the sample means are arranged into
      a frequency distribution
        Central Limit Theorem
• The sampling distribution of the mean
  approaches a normal curve as N increases.
• If you know μ and σ for the population:
  – the mean of the sampling distribution (expected
    value) = μ,
  – the standard deviation of the sampling distribution
    (standard error) = σ/√N.
  symbols:
   σ_X̄ – the standard error of the mean
   E(X̄) – the expected value of the mean
Calculating the Standard Error
         of the Mean

   σ_X̄ = σ / √N   (σ known)          s_X̄ = s / √N   (σ estimated from a sample)
Determining Probabilities About
       Sample Means
      z = (X̄ – μ) / σ_X̄

A test of computer anxiety has a population mean of 65 and a
standard deviation of 12. A random sample of size 36 is drawn from
the population with a sample mean of 68. What is the probability of
selecting a sample with a smaller mean?

1.   Compute the standard error of the mean: 12/ SQRT(36) = 12/6 = 2
2.   Compute a standard score for the mean of 68: Z = (68 - 65)/2 = 3/2 = +1.5

3.   The area between the mean and a z-score of +1.5 is .4332

4.   Thus, the probability of a sample mean below 68 is .5000 + .4332 = .9332
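A minimal Python sketch (not from the slides) reproducing the computer-anxiety example with scipy:

    from math import sqrt
    from scipy.stats import norm

    mu, sigma, N, xbar = 65, 12, 36, 68

    se = sigma / sqrt(N)              # standard error of the mean = 2.0
    z = (xbar - mu) / se              # +1.5
    print(round(norm.cdf(z), 4))      # 0.9332 = probability of a sample mean below 68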
                  The t distribution
• when you do not know σ and you don’t have population data to
  calculate it – use s as an estimate of σ.
• t distribution – depends on the sample size, with a different
  distribution for each N.
    – degrees of freedom: df = N – 1, ranging from 1 to ∞
   – t values are similar to the z scores used with the normal curve
   – as df increases, the t distribution approaches the normal distribution


 Table D in Appendix C:
   – the first column shows the degrees of freedom
   – the top row is for confidence intervals
   – each column is associated with a percent of probability
             Confidence Intervals
• a confidence interval establishes an interval within which a
  population parameter is expected to lie (with a certain degree of
  confidence)
• confidence intervals for population means produce an interval
  statistic (lower and upper limits) that is destined to contain μ 95
  percent of the time

   LL = X̄ – t(s_X̄)
   UL = X̄ + t(s_X̄)
        where X̄ is the mean of the sample from the population
                  t is a value from the t distribution table
                  s_X̄ is the standard error of the mean, calculated from a
                  sample
    EXAMPLE: Confidence Intervals
    There are several published tests of self-efficacy. Suppose that the
    population mean for a particular test is 20. A therapy class taught by
    a graduate student had just finished the term. A psychologist wanted
    to determine the effect of the course on students’ sense of self-
    efficacy. The following statistics were produced. Use a 95 percent
    confidence interval to analyze the data. Write a conclusion about the
    effect of the class.
•   N = 36
    ΣX = 684
    ΣX² = 13,136
•   Interpretation: The population mean for the test is 20, and the 95
    percent confidence interval about the mean of the class is entirely
    below 20. You can conclude (with 95 percent confidence) that the
    class was not sufficient to increase self-efficacy.
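A minimal Python sketch (not from the slides) working the self-efficacy example from the summary statistics; scipy supplies the 95 percent t value:

    from math import sqrt
    from scipy.stats import t

    N, sum_x, sum_x2 = 36, 684, 13136

    mean = sum_x / N                                  # 19.0
    s = sqrt((sum_x2 - sum_x ** 2 / N) / (N - 1))     # sample standard deviation (N - 1)
    se = s / sqrt(N)                                  # standard error of the mean
    t_crit = t.ppf(0.975, df=N - 1)                   # two-tailed 95 percent value

    print(round(mean - t_crit * se, 2), round(mean + t_crit * se, 2))
    # roughly (18.32, 19.68) -- an interval entirely below the population mean of 20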
Hypothesis Testing: One-
   Sample Designs
              Hypothesis Testing
•    the Frito-Lay® company claims that bags of Doritos®
     tortilla chips contain 269.3 grams. When you buy a bag
     of chips, how much do you get?
     μ0 – a parameter that describes a population (the company
     standard)
     μ1 – a parameter that describes the population of actual
     weights
•    there are three possible relationships b/w μ1 and μ0:
    1.   μ1 = μ0
    2.   μ1 < μ0
    3.   μ1 > μ0
                             Outline
1.    gather sample data from the population you are
      interested in and calculate a statistic
2.    recognize two logical possibilities for the population:
     1.   H0: a statement specifying an exact value for the parameter of
          the population
     2.   H1: a statement specifying all other values for the parameter
3.    using a sampling distribution that assumes H0 is
      correct, find the probability of the statistic you
      calculated
4.    if the probability is very small, reject Ho and accept H1.
      If the probability is large, retain both H0 and H1,
      acknowledging that the data do not allow you to reject
      H0
   Establishing a significance level:
            Setting Alpha (α)
• significance level is the choice
  of a probability level
• rejection region


• critical values: For example,
  t.05 (14 df) = 2.145
    – the sampling distribution that
      was used (t)
     – α level (.05)
    – degree of freedom (14)
    – critical value from the table
      (2.145, for a two-tailed test)
        The one-sample t test

   t = (X̄ – μ0) / s_X̄ ;   df = N – 1 , where

X̄ is the mean of the sample
μ0 is the hypothesized mean of the population
s_X̄ is the standard error of the mean
      EXAMPLE 1: “Doritos bags”,
          one-sample t-test
1.   the population mean is that of all bags; the sample
     mean (N = 8) is 270.675 grams
2.   H0: the mean weight of Doritos bags is 269.3 grams,
     the weight claimed by the Frito-Lay company.
     H1: the mean weight of Doritos bags is not 269.3
     grams
3.   the t distribution for 7 df (N-1) (Table D in Appendix C)
4.   if the probability is low – reject H0, meaning that the
     mean weight of Doritos bags is NOT 269.3 grams
     if the probability is high – retain H0, meaning that the
     mean weight of Doritos bags could be 269.3 grams
 EXAMPLE 2: “Misinformation test”
      one-sample t-test
• A psychologist who taught the introductory
  psychology course always gave her class a
  "misinformation test" on the first day of class.
  Her test contained some commonly held
  incorrect beliefs about psychology. Over the
  years the mean number of errors on this test had
  been 21.50. The data for this year follow.
  Analyze the data with a t test, calculate the effect
  size index, and write a conclusion.
ΣX = 198        ΣX² = 4,700       N = 11
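A minimal Python sketch (not from the slides) running the misinformation-test example from the summary statistics (a one-sample t test against mu0 = 21.50):

    from math import sqrt
    from scipy.stats import t

    N, sum_x, sum_x2, mu0 = 11, 198, 4700, 21.50

    mean = sum_x / N                                   # 18.0
    s = sqrt((sum_x2 - sum_x ** 2 / N) / (N - 1))      # sample standard deviation
    se = s / sqrt(N)                                   # standard error of the mean
    t_stat = (mean - mu0) / se                         # about -1.09
    p = 2 * t.cdf(-abs(t_stat), df=N - 1)              # two-tailed p-value
    d = abs(mean - mu0) / s                            # effect size index, about .33

    print(round(t_stat, 2), round(p, 3), round(d, 2))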
    Effect size index: How big is a
              difference?
   d = |X̄ – μ0| / σ

where   X̄ is the mean of the sample
        μ0 is the mean specified by the null hypothesis
        σ is the standard deviation of the null-hypothesis
  population
• small effect     d = .20
• medium effect    d = .50
• large effect     d = .80
        Type I and Type II errors
Four possibilities:
1.   The H0 is true but test rejects it ( Type I error)
2.   The H0 is false but test accepts it (Type II error)
3.   The H0 is true and test accepts it ( correct decision )
4.   The H0 is false and test rejects it (correct decision)

•    the probability of a Type I error is denoted by α (alpha)
•    the probability of a Type II error is denoted by β (beta)
           One- and two-tailed tests
1.       a two-tailed test of
         significance:
     •      the sample is from a
            population with a mean less
            than/greater than that of the
            Ho
2.       a one-tailed test of
         significance:
     •      the sample is from a
            population with a mean less
            than that of the Ho
     •      the sample is from a
            population with a mean
            greater than that of the Ho
α, Type I error, and the p-value

α is the probability of a Type I error (e.g., α =
  .05)
A Type I error occurs when we mistakenly
  reject H0.
The p-value is the probability of obtaining a
  sample statistic as extreme as (or more extreme
  than) the one actually obtained, if H0 is true
 Testing the statistical significance
     of correlation coefficients
• definitional formula:

   t = (r – ρ) / s_r ;   s_r = √[ (1 – r²) / (N – 2) ]

• working formula:

   t = r √[ (N – 2) / (1 – r²) ] ;   df = N – 2, where N = number of pairs

• Two uses of the t distribution:
   – to test a sample mean against a hypothesized population mean
   – to test a sample correlation coefficient against a hypothesized
     population correlation coefficient of .00
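A minimal Python sketch (not from the slides): testing a hypothetical sample r against a population correlation of .00 with the working formula:

    from math import sqrt
    from scipy.stats import t

    r, N = 0.45, 30                              # hypothetical values
    t_stat = r * sqrt((N - 2) / (1 - r ** 2))
    p = 2 * t.sf(abs(t_stat), df=N - 2)          # two-tailed p-value

    print(round(t_stat, 2), round(p, 4))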
Hypothesis Testing: Two-
   Sample Designs
                                    Logic
1.       begin with two logical possibilities:
     •      H0: μA = μno A -- treatment A does not have an effect; the difference
            between population means is zero
     •      H1: μA ≠ μno A -- treatment A does have an effect; the difference
            between population means is not zero
2.       assume that Treatment A has no effect (that is, assume Ho)
3.       decide on an alpha level (e.g., a = .05).
4.       choose an appropriate inferential statistical test:
     •      test statistic (e.g., mean)
     •      a sampling distribution of the test statistic (when Ho is true) (e.g.,
            t distribution)
     •      a critical value for the alpha level
5.       calculate a test statistic using the sample data
6.       compare a test statistic to the critical value from the sampling
         distribution
7.       write a conclusion
Independent-samples designs
 H0: μ1 = μ2
 H1: μ1 ≠ μ2

   t = (X̄1 – X̄2) / s_{X̄1 – X̄2}      where,

 s_{X̄1 – X̄2} is the standard error of a difference

 df = N1 + N2 – 2
 assumptions:
 1.  the DV scores are normally distributed and have equal
     variances
 2.  the samples are randomly assigned/selected
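A minimal Python sketch (not from the slides): an independent-samples t test on two made-up groups with scipy (equal variances assumed, as on the slide):

    from scipy.stats import ttest_ind

    group1 = [84, 79, 91, 88, 76, 85]            # hypothetical scores
    group2 = [72, 80, 69, 75, 78, 71]

    t_stat, p = ttest_ind(group1, group2)        # df = N1 + N2 - 2
    print(round(t_stat, 2), round(p, 4))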
        EXAMPLES: Two-sample t-tests,
            independent design
Example “Achievement Test Scores” (equal
N's design)
"Achievement test scores are declining all
around us," brooded Professor Probity. "Not in
my class," vouchsafed Professor Paragon.
"Here are my final-exam scores for last year
and this year on the same exam. Run your own
t test on them. Calculate the effect size index."
What did Probity discover?

Example “Yummi Very Vanilla” (unequal N's
design)
An experimenter randomly divided a group of
volunteers into two groups. One group fasted
for 24 hours and the other for 48 hours. One
person in the 24-hour group dropped out of the
study. Scores below represent the number of
ounces of Yummi (Very Vanilla) consumed
during the first 10 minutes after the fast.
Perform a t test, calculate d, and write a
sentence summary of your results.
    Paired (correlated) - samples
              designs
• natural pairs – we do not assign the participants to one
  group or the other
• matched pairs – we do assign the participants to one
  group or the other
• repeated measures – more than one measure is taken
  on each participant (e.g., before-and-after experiment)

• t = (X̄ – Ȳ) / s_D̄      where,
• s_D̄ = √( s_X̄² + s_Ȳ² – 2·r_XY·s_X̄·s_Ȳ )
  df = N – 1, where N = number of pairs
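A minimal Python sketch (not from the slides): a paired-samples t test on made-up before/after scores with scipy:

    from scipy.stats import ttest_rel

    before = [21, 18, 25, 30, 17, 22]            # hypothetical paired scores
    after  = [18, 17, 21, 26, 15, 20]

    t_stat, p = ttest_rel(before, after)         # df = number of pairs - 1
    print(round(t_stat, 2), round(p, 4))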
EXAMPLE: “Rats Learning a Simple Maze”
    two-sample t-test, paired design

•   A comparative psychologist was interested in the effect of X-irradiation
    on learning. A group of rats learned a simple maze and were paired on
    the basis of the number of errors. One group was X-irradiated; then both
    groups learned a new, more complicated maze, and errors were
    recorded. Identify the independent and dependent variables. Test for a
    difference between the two groups. Find the effect size index. Write a
    conclusion for the study.
               Effect size index
• effect size index for independent samples:

   d = |X̄1 – X̄2| / ŝ

       when N1 = N2:     ŝ = √N (s_{X̄1 – X̄2})
       when N1 ≠ N2:     ŝ = √[ ( ŝ1²(df1) + ŝ2²(df2) ) / ( df1 + df2 ) ]

• effect size index for paired (correlated) samples:

   d = |X̄ – Ȳ| / ŝ_D          ŝ_D = √N (s_D̄), where N is the number
                                           of pairs of participants
• interpretation of d:
     d = .20 – small effect
     d = .50 – medium effect
     d = .80 – large effect
 Establishing a confidence interval
     about a mean difference

• confidence interval for independent samples:
   – LL = (X̄1 – X̄2) – t (s_{X̄1 – X̄2})
   – UL = (X̄1 – X̄2) + t (s_{X̄1 – X̄2})
   – interpretation: we can expect, with 95 percent confidence, that
     the true difference between [use the terms of your experiment] is
     between XX and AA.
• confidence interval for correlated samples:
   – LL = (X̄ – Ȳ) – t (s_D̄)
   – UL = (X̄ – Ȳ) + t (s_D̄)
                       Power
• Power = 1 – β
• Factors that affect the power:
  – effect size
  – the standard error of a difference:
     • sample size
     • sample variability
  – alpha (α)
  To ensure plenty of power, use large N’s!
Analysis of Variance: One-
   Way Classification
    Example: “Social status and attitudes
       toward fate”, one-way ANOVA
•   The following hypothetical data are scores from a test which
    measures a person's attitudes toward fate. (An example of such a
    test is Rotter's internal-external scale.) A high score indicates that
    the person views fate as being out of his or her control. Low scores
    indicate that he or she feels directly responsible for what happens.
    Test for differences among the three social classes.

        Lower Class    Middle Class    Upper Class
             12              13              6
             10               4              5
             11               9              8
              9               6              2
              7              12              2
              3              10              1

•   Fill in:
•   Independent variable_________
•   Dependent variable_________
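A minimal Python sketch (not from the slides): a one-way ANOVA on the attitudes-toward-fate data with scipy:

    from scipy.stats import f_oneway

    lower  = [12, 10, 11, 9, 7, 3]
    middle = [13, 4, 9, 6, 12, 10]
    upper  = [6, 5, 8, 2, 2, 1]

    F, p = f_oneway(lower, middle, upper)
    print(round(F, 2), round(p, 4))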
 Analysis of Variance: One-Way
          Classification
      estimate of σ² based on the variation between treatment means
 F =  --------------------------------------------------------------
      estimate of σ² based on the variation within treatments

1. F is a ratio of two estimates of the population variance
   – a between-treatments variance over an error
   variance.
2. when H0 is true, both variances are good estimators of
   the population variance (the ratio is about 1)
3. when H0 is not true, the between-treatments
   variance over-estimates the population variance (the ratio
   is greater than 1)
                New Terms
• Sum of squares (SS) – sum of squared
  deviations, Σ(X – X̄)²
• Mean square (MS) is the ANOVA term for a
  variance, ŝ²
• Grand mean – the mean of all scores
• tot – the symbol stands for all such numbers in
  the experiment (ΣX_tot)
• t – the symbol applies to a treatment group (ΣX_t)
• K – is the number of treatments in the
  experiment (number of levels of the IV)
• F test – is the result of an analysis of variance
Factorial ANOVA
          Analysis of Variance: Factorial
                      Design
•         Suppose you wanted to see whether the number of bedrooms, a house’s style,
          and the interaction between style and number of bedrooms have an effect on
          the house price.

             traditional   modern                             1 year           30 years
1 BR                250K      300K       275K      1 BR                 279K       189K           234K
5 BR                500K      550K       525K      5 BR                 500K       250K           375K
                    375K      425K                                389.5K          219.5K

    [Figure: two bar graphs of mean house price (in $K) for 1 BR vs. 5 BR homes –
    one graph for each second factor (house style, and 1 year vs. 30 years)]
   Factorial Design Notation
• shorthand notations for factorial designs:
  “2 x 2”, “3 x 2” ,etc.
                            Factor A
                   A1           A2
                   1 year       30 years

            B1     Cell A1B1    Cell A2B1
            1 BR        279K         189K   234K
 Factor B
            B2     Cell A1B2    Cell A2B2
            5 BR        500K         250K   375K

                      389.5K       219.5K
Advantages of two-way ANOVA:

 • valuable resources can be spent
   more efficiently by studying two
   factors simultaneously rather
   than separately
 • we can investigate interactions
   between factors
                Exercise 1
• Identify the response variable, factors,
  state the number of levels for each factor,
  and the total number of observations (N).
  (a) A study of the productivity of tomato plants
  compares four varieties of tomatoes and
  two types of fertilizer. Five plants of each
  variety are grown with each type of
  fertilizer. The yield in pounds of tomatoes
  is recorded for each plant.
                 Exercise 2
• Identify the response variable, factors, state the
  number of levels for each factor, and the total
  number of observations (N).
  (b) A marketing experiment compares six
  different types of packaging for a laundry
  detergent. A survey is conducted to determine
  the attractiveness of the packaging in four
  different parts of the country. Each type of
  packaging is shown to 30 different consumers in
  each part of the country, who rate the
  attractiveness of the product on a 1 to 10 scale.
               Exercise 3
• Identify the response variable, factors,
  state the number of levels for each factor,
  and the total number of observations (N).
  (c) To compare the effectiveness of three
  different weight-loss programs, five men
  and five women are randomly assigned to
  each program. At the end of the program, the
  weight loss for each of the participants is
  recorded.
                  Exercise 4
• Numbers in the cell are means based on 8
  scores for each cell.




• Data Set is an example of:
  a. a simple ANOVA
  b. a 2 x 2 factorial ANOVA
  c. a 3 x 3 factorial ANOVA
                 Exercise 5
• The figure that shows an interaction is
             Growth and age
• Imagine a two-way factorial design to study the
  following scientific hypothesis: “Toddlers get
  taller; adults don’t.” Here’s a quick summary:
       Response: Height in inches
       Factor 1: Age groups – 2-year-olds and
  adults
       Factor 2: Time – at the start of the study, and
  three years later
Make a two-way table summarizing the results you
  would expect to get from this study. Then draw
  and label a graph showing the interaction
  pattern.

				