Hypothesis testing by yantingting



 Behavioural Science II
Week 1, Semester 2, 2002
      Hypothesis testing
• Null hypothesis is that there is no
  systematic relationship between
  independent variables (IVs) and
  dependent variables (DVs).
• Research hypothesis is that any
  relationship observed in the data is
  systematic rather than due to chance.

       Hypothesis testing
• Whereas research hypothesis tends to be
  imprecise about numerical differences
  between groups (e.g., difference in
  reaction times), null hypothesis states
  very specifically that difference should be
  zero.

   Null hypothesis versus
   alternative hypothesis
• The null hypothesis assumes that
  scores for different levels of the IV
  are random samples from the same
  population.
• The alternative hypothesis is that
  samples come from different
  populations.

    Null hypothesis versus
    alternative hypothesis
• For any single experiment, we are bound
  to see a difference, just as we see a
  difference between the means of two
  random samples in a distribution of
  sample means.
• If the null hypothesis is true, then
  the difference in mean scores reflects
  nothing more than two random samples
  from the same population.

Testing the null hypothesis
• A statistical test assesses the
  probability of obtaining a given
  sample or samples of scores,
  assuming the null hypothesis is
  true.

 Testing the null hypothesis
• If the probability is low enough (e.g.,
  p<.05), then the null hypothesis is rejected
  in favour of the alternative (research)
  hypothesis, and the IV is deemed to have
  a systematic effect.
• If the probability is not sufficiently low
  (e.g., p>.05), then the null hypothesis is
  not rejected but retained, and the IV is
  deemed to have no effect (i.e., the
  observed changes are due to chance).
    Statistical significance
• Statistical significance refers to the
  probability of the data obtained, given that
  the null hypothesis is true.
• A statistically significant result does not
  mean that the null hypothesis is
  false.
• There is an ongoing gap between
  statistical significance and substantive
  significance.

   Hypothesis testing and
   sampling distributions
• The decision to reject or not reject
  the null hypothesis usually is made
  with reference to the sampling
  distribution of a statistic of some
  kind (e.g., z-distribution,
  t-distribution).

   Example of hypothesis
 testing using z-distribution
• Null hypothesis population
  μ = 100
  σ = 15
• Random sample statistics
  Mean = 110
  N = 9

        Applying formulae
               15 15
    X             5
            N    9 3

         X  X     110 100 10
    Z                        2
           X          5      5
• Given that a z-score beyond ±1.96 corresponds to p < .05
  (two-tailed), we would reject the null hypothesis.
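
The z-test arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `z_test` and its parameter names are illustrative choices, not part of the lecture material.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, null_mean, sigma, n):
    """Return (standard error, z, two-tailed p) for a one-sample z-test."""
    standard_error = sigma / math.sqrt(n)           # sigma / sqrt(N)
    z = (sample_mean - null_mean) / standard_error  # (mean - mu) / standard error
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return standard_error, z, p_two_tailed

# Values from the slide: mu = 100, sigma = 15, sample mean = 110, N = 9
standard_error, z, p = z_test(sample_mean=110, null_mean=100, sigma=15, n=9)
reject_null = abs(z) > 1.96  # two-tailed cut-off for alpha = .05
print(standard_error, z, round(p, 4), reject_null)  # prints: 5.0 2.0 0.0455 True
```

Because |z| = 2 exceeds 1.96, the sketch reaches the same decision as the slide.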
   Example of hypothesis
 testing using t-distribution
• Null hypothesis population
  μ = 100
• Random sample statistics
  Mean = 110
  ∑x² = 960
  N = 9

           Applying formulae
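\[ \tilde{\sigma} = \sqrt{\frac{\sum x^2}{N - 1}} = \sqrt{\frac{960}{9 - 1}} = \sqrt{\frac{960}{8}} = 10.95 \]

\[ \tilde{\sigma}_{\bar{X}} = \frac{\tilde{\sigma}}{\sqrt{N}} = \frac{10.95}{\sqrt{9}} = \frac{10.95}{3} = 3.65 \]

\[ t = \frac{\bar{X} - \mu}{\tilde{\sigma}_{\bar{X}}} = \frac{110 - 100}{3.65} = \frac{10}{3.65} = 2.74 \]

• Given that a t-score beyond ±2.306 (df = 8) corresponds to p < .05,
  we would reject the null hypothesis.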
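
The t-test steps can be reproduced in the same way. This is a sketch only: the function name `one_sample_t` is hypothetical, ∑x² is taken to be the sum of squared deviations from the sample mean, and the critical value 2.306 is copied from the slide and not computed.

```python
import math

def one_sample_t(sample_mean, null_mean, sum_sq_dev, n):
    """Return (estimated SD, estimated standard error, t) for a one-sample t-test."""
    est_sd = math.sqrt(sum_sq_dev / (n - 1))  # estimated population SD
    est_se = est_sd / math.sqrt(n)            # estimated standard error of the mean
    t = (sample_mean - null_mean) / est_se
    return est_sd, est_se, t

# Values from the slide: mu = 100, sample mean = 110, sum of squares = 960, N = 9
est_sd, est_se, t = one_sample_t(sample_mean=110, null_mean=100, sum_sq_dev=960, n=9)
T_CRITICAL_DF8 = 2.306  # cut-off quoted on the slide for df = 8, p < .05
reject_null = abs(t) > T_CRITICAL_DF8
print(round(est_sd, 2), round(est_se, 2), round(t, 2), reject_null)
# prints: 10.95 3.65 2.74 True
```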
  Hypothesis testing using
    confidence intervals
• We reject the null hypothesis when the null
  population mean lies outside the
  confidence interval.
• We infer that the alternative population mean is
  higher than the null population mean if the lower
  limit of the confidence interval lies to the right of
  the null population mean, and lower if the upper
  limit of the confidence interval lies to the left of
  the null population mean.
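
The confidence-interval rule can be sketched as a small decision function. The example reuses the z-test numbers from earlier in the lecture (sample mean 110, standard error 5, cut-off 1.96); the function names and the wording of the returned labels are illustrative assumptions.

```python
def confidence_interval(sample_mean, standard_error, critical_value):
    """Return (lower, upper) limits around the sample mean."""
    margin = critical_value * standard_error
    return sample_mean - margin, sample_mean + margin

def decide(null_mean, lower, upper):
    """Apply the slide's rule: reject when the null mean lies outside the interval."""
    if lower > null_mean:
        return "reject: alternative mean is higher"
    if upper < null_mean:
        return "reject: alternative mean is lower"
    return "retain"

lower, upper = confidence_interval(sample_mean=110, standard_error=5, critical_value=1.96)
decision = decide(null_mean=100, lower=lower, upper=upper)
print(round(lower, 1), round(upper, 1), decision)
```

With these numbers the interval runs from about 100.2 to 119.8, so the null mean of 100 falls just below the lower limit and the null hypothesis is rejected.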
Errors in hypothesis testing
• Given the gap between statistical and
  substantive significance, a decision
  based on probability to retain or
  reject the null hypothesis can be
  wrong.

   When null hypothesis is
     true (Type I error)
• When null hypothesis is true, and it
  is rejected, this decision is called a
  Type I error.
• The probability of making such an
  error is designated alpha (α) and is
  equivalent to the significance level
  (e.g., p<.05).

   When null hypothesis is
     true (Type I error)
• If null hypothesis is true and alpha level is
  set at .05, then the null hypothesis will be
  rejected 5% of time even though it is true.
• One way to safeguard against a Type I
  error is to set a more stringent alpha level
  (e.g., p<.01).
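
The claim that a true null hypothesis is rejected about 5% of the time at alpha = .05 can be illustrated by simulation. The sketch below draws many samples from a population in which the null hypothesis holds; the population values (μ = 100, σ = 15, N = 9) are borrowed from the z-test example, while the number of simulated experiments and the random seed are arbitrary assumptions.

```python
import math
import random

def type_i_error_rate(z_crit, n_experiments=20000, n=9, mu=100, sigma=15, seed=1):
    """Proportion of experiments rejecting a TRUE null hypothesis (two-tailed z-test)."""
    rng = random.Random(seed)
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(n_experiments):
        # Every sample really does come from the null population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu) / standard_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_experiments

rate_05 = type_i_error_rate(z_crit=1.96)   # alpha = .05
rate_01 = type_i_error_rate(z_crit=2.576)  # alpha = .01, the more stringent level
print(rate_05, rate_01)
```

The two rates should come out close to .05 and .01 respectively, which shows how a more stringent alpha level reduces Type I errors.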

   When null hypothesis is
  false (Type II or III errors)
• When alternative hypothesis is true,
  and the statistic (mean) from
  alternative distribution falls within
  cut-off points (i.e., p>.05), then null
  hypothesis would be retained.

             Type II error
• Retaining null hypothesis when alternative
  hypothesis is true is called a Type II error.
• The probability of making a Type II error
  usually is symbolized as beta (β).
• The size of beta depends on how
  much the alternative hypothesis sampling
  distribution overlaps the retention region
  of the null hypothesis sampling
  distribution.
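
The overlap idea can be made concrete by computing beta directly for a normal sampling distribution. In this sketch the null mean (100) and standard error (5) come from the z-test example; the alternative means of 110 and 120 are hypothetical values chosen only for illustration, as is the function name.

```python
from statistics import NormalDist

def beta_two_tailed(null_mean, alt_mean, standard_error, z_crit=1.96):
    """Probability that a sample mean from the alternative distribution
    lands inside the retention region of the null distribution."""
    lower_cut = null_mean - z_crit * standard_error
    upper_cut = null_mean + z_crit * standard_error
    alt = NormalDist(mu=alt_mean, sigma=standard_error)
    return alt.cdf(upper_cut) - alt.cdf(lower_cut)

beta_near = beta_two_tailed(null_mean=100, alt_mean=110, standard_error=5)
beta_far = beta_two_tailed(null_mean=100, alt_mean=120, standard_error=5)
print(round(beta_near, 3), round(beta_far, 3))
```

The further the alternative distribution sits from the null distribution, the smaller the overlap and the smaller beta becomes.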

             Type III error
• It is also possible to make a Type III error,
  by rejecting a null hypothesis but inferring
  the incorrect alternative hypothesis.
• The probability of making a Type III error
  usually is symbolized as gamma (γ) and is
  equivalent to whatever percentage of
  scores in the alternative distribution falls
  in the far end of the null hypothesis
  distribution. The probability of making a
  Type III error is usually quite small.
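
To see why gamma is usually quite small, it can be computed as the area of the alternative distribution that falls in the wrong tail of the null distribution. This is a sketch under assumed values: null mean 100 and standard error 5 from the z-test example, and a hypothetical true alternative mean of 110.

```python
from statistics import NormalDist

def gamma_type_iii(null_mean, alt_mean, standard_error, z_crit=1.96):
    """Probability of rejecting the null hypothesis in the wrong direction."""
    if alt_mean == null_mean:
        raise ValueError("the null hypothesis is true, so no Type III error is possible")
    alt = NormalDist(mu=alt_mean, sigma=standard_error)
    if alt_mean > null_mean:
        # True mean is higher, so the wrong tail is the lower rejection region.
        return alt.cdf(null_mean - z_crit * standard_error)
    # True mean is lower, so the wrong tail is the upper rejection region.
    return 1 - alt.cdf(null_mean + z_crit * standard_error)

gamma = gamma_type_iii(null_mean=100, alt_mean=110, standard_error=5)
print(gamma)
```

For these values gamma is well below one in a thousand, in line with the slide's remark.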
      The power of a test
• Power is the probability of rejecting a false
  null hypothesis and correctly
  inferring the position or direction of
  the alternative hypothesis with
  respect to the null hypothesis.
• Factors affecting power and error

    Power is affected by
  significance (alpha) level
• Setting a less stringent significance
  level (e.g., .10 rather than .05) enlarges
  the rejection region and so increases
  power, as long as the
  alternative hypothesis is true.
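
The effect of the alpha level on power can be sketched numerically. The example assumes a normal sampling distribution with null mean 100 and standard error 5 (from the z-test example) and a hypothetical true alternative mean of 110 lying above the null mean; only rejections in the correct (upper) tail are counted as power, following the lecture's definition.

```python
from statistics import NormalDist

def power_for_alpha(alpha, null_mean=100, alt_mean=110, standard_error=5):
    """Power of a two-tailed z-test, assuming alt_mean lies above null_mean."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cut-off
    upper_cut = null_mean + z_crit * standard_error
    alt = NormalDist(mu=alt_mean, sigma=standard_error)
    return 1 - alt.cdf(upper_cut)  # area of alternative beyond the upper cut-off

powers = {alpha: power_for_alpha(alpha) for alpha in (0.01, 0.05, 0.10)}
for alpha, power in powers.items():
    print(alpha, round(power, 3))
```

Power rises as alpha moves from .01 to .05 to .10, at the cost of a higher Type I error rate.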

Power is affected by magnitude of
difference between sample means
• Increasing the difference between the
  means at differing levels of
  the IV increases the power of the
  test.

 Power is affected by sample size

• An increase in sample size increases
  the power of the test, if the
  alternative hypothesis is true.
• This is because as sample size
  increases, the standard error of the
  mean decreases, thus reducing the
  overlap between the null and
  alternative sampling distributions.
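
The link between sample size, standard error, and power can be traced with a short calculation. This sketch assumes the population values from the z-test example (μ = 100, σ = 15), a hypothetical true alternative mean of 110, and a two-tailed cut-off of 1.96; the sample sizes are arbitrary illustrations.

```python
import math
from statistics import NormalDist

def power_for_n(n, null_mean=100, alt_mean=110, sigma=15, z_crit=1.96):
    """Return (standard error, power) for a z-test, assuming alt_mean > null_mean."""
    standard_error = sigma / math.sqrt(n)
    upper_cut = null_mean + z_crit * standard_error
    alt = NormalDist(mu=alt_mean, sigma=standard_error)
    return standard_error, 1 - alt.cdf(upper_cut)

results = [power_for_n(n) for n in (9, 16, 36)]
for n, (se, power) in zip((9, 16, 36), results):
    print(n, round(se, 2), round(power, 3))
```

As N grows, the standard error shrinks and power climbs, because the two sampling distributions pull apart relative to their spread.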
             Effect size
• In order to gauge the effect of the IV,
  it makes sense to contrast the
  difference between the population
  mean for the null hypothesis and the
  population mean for the alternative
  hypothesis.

      Effect size formula
                       0  1
      Eff ect_ size 
• where
•  is standard deviation of population
  of dependent measure scores.

      Judging effect sizes
• According to Cohen (1988)
  .20 = small effect size
  .50 = medium effect size
  .80 = large effect size
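
The effect size formula and Cohen's benchmarks can be combined in a short sketch. The example values (null mean 100, alternative mean 110, σ = 15) are assumptions carried over from the z-test example, and treating the benchmarks as lower bounds of "small", "medium", and "large" bands is one common reading, not something the lecture specifies.

```python
def effect_size(null_mean, alt_mean, sigma):
    """Standardised difference between null and alternative population means."""
    return (null_mean - alt_mean) / sigma

def cohen_label(d):
    """Label the magnitude of d using Cohen's (1988) benchmarks as band edges."""
    size = abs(d)  # direction does not matter for the label
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"

d = effect_size(null_mean=100, alt_mean=110, sigma=15)
print(round(d, 2), cohen_label(d))  # prints: -0.67 medium
```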

 Do we really need the null hypothesis?
• A significant test of the null
  hypothesis does not mean the data
  are not a product of chance.
• The significant result may simply be
  a Type I error (falsely rejecting the null
  hypothesis).

 Do we really need the null hypothesis?
• Better to test the research hypothesis, if
  the size and direction of the effect are known.
• Even better, report a combination of
  outcome values (e.g., effect sizes,
  confidence intervals, strength of
  association).

One-tailed versus two-tailed
• Conventionally reject null hypothesis if
  obtained z-score or t-score falls beyond
  certain values in either tail of the relevant
  sampling distribution (i.e., a two-tailed
  test).
• In specific contexts, a one-tailed test
  might seem appropriate (e.g., reject null
  hypothesis only if test statistic fell in 5%
  left-hand tail of distribution).
One-tailed versus two-tailed
• Generally, two-tailed tests are preferred to
  one-tailed tests.
• The IV may have an effect in opposite
  direction to the one predicted.
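
The risk of a one-tailed test can be illustrated by comparing p-values for the same z-score under different tail choices. This is a sketch with hypothetical z-scores of +2 and -2; the function and key names are illustrative.

```python
from statistics import NormalDist

def p_values(z):
    """Two-tailed and one-tailed p-values for a z-score."""
    std = NormalDist()
    return {
        "two_tailed": 2 * (1 - std.cdf(abs(z))),
        "upper_tail": 1 - std.cdf(z),  # one-tailed test predicting an increase
        "lower_tail": std.cdf(z),      # one-tailed test predicting a decrease
    }

predicted = p_values(2.0)   # effect in the predicted (upper) direction
opposite = p_values(-2.0)   # effect of the same size in the opposite direction
print(round(predicted["two_tailed"], 4), round(predicted["upper_tail"], 4))
print(round(opposite["two_tailed"], 4), round(opposite["upper_tail"], 4))
```

When the effect runs opposite to the prediction, the upper-tail test gives a p-value near 1 and misses it entirely, whereas the two-tailed test still reports p < .05.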

