Threats to Internal and External Validity

					 Factors that threaten the
validity of research findings
 Material for this presentation has been
 taken from the seminal article by Don
 Campbell and Julian Stanley:

    Experimental and quasi-experimental designs
    for research on teaching,

 which was first published as Chapter 5 in
 N. L. Gage (1963), Ed., Handbook of
 Research on Teaching.
 Two classes of factors that jeopardize
   the validity of research findings
• Factors concerned with internal validity.
  – Do the research conditions warrant the
    conclusions?
  – Without internal validity results are
    uninterpretable.
• Factors concerned with external validity.
  – To what extent can the results be
    generalized?
  – To what populations, settings, treatment
    variables, and measurement variables?
     Factors affecting Internal Validity
Internal validity is threatened whenever there exists the
  possibility of uncontrolled extraneous variables that might
  otherwise account for the results of a study.
Eight classes of extraneous variables can be identified:
  • History
  • Maturation
  • Testing
  • Instrumentation
  • Statistical regression
  • Selection
  • Research mortality
  • Interactions w/ selection
              History
Specific events, in addition to the
 treatment, that occur between the first
 and second measurement.
The longer the interval between the
 pretest and posttest, the more viable
 this threat.
             Maturation
Changes in physical, intellectual, or
  emotional characteristics that occur
  naturally over time and that influence the
  results of a research study.
In longitudinal studies, for instance,
  individuals grow older, become more
  sophisticated, maybe more set in their
  ways.
              Testing
Also called “pretest sensitization,” this
 refers to the effects of taking a test
 upon performance on a second testing.
Merely having been exposed to the
 pretest may influence performance on a
 posttest.
Testing becomes a more viable threat to
 internal validity as the time between
 pretest and posttest is shortened.
          Instrumentation
Changes in the way a test or other
 measuring instrument is calibrated that
 could account for results of a research
 study (different forms of a test can
 have different levels of difficulty).
This threat typically arises from
 unreliability in the measuring
 instrument.
Can also be present when using observers.
       Statistical Regression

Occurs when individuals are selected for
 an intervention or treatment on the
 basis of extreme scores on a pretest.
Extreme scores are more likely to reflect
 larger (positive or negative) errors in
 measurement (chance factors).
Such extreme measurement errors are
 NOT likely to occur on a second testing.
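The regression effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the score model (a stable true score plus independent measurement error), the means and standard deviations, and the function name `simulate_regression_to_mean` are all invented for illustration and do not come from the original presentation. No treatment is applied, so any gain in the selected group is an artifact of selecting on extreme pretest scores.

```python
import random
import statistics


def simulate_regression_to_mean(n=10_000, cutoff_fraction=0.10, seed=0):
    """Select the lowest pretest scorers and compare their pre/post means.

    Assumption: observed score = stable true score + independent error.
    No treatment is applied between the two testings.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(100, 10) for _ in range(n)]
    pretest = [t + rng.gauss(0, 10) for t in true_scores]
    posttest = [t + rng.gauss(0, 10) for t in true_scores]

    # Pick the individuals with the most extreme (lowest) pretest scores.
    k = int(n * cutoff_fraction)
    lowest = sorted(range(n), key=lambda i: pretest[i])[:k]

    pre_mean = statistics.mean(pretest[i] for i in lowest)
    post_mean = statistics.mean(posttest[i] for i in lowest)
    return pre_mean, post_mean


pre_mean, post_mean = simulate_regression_to_mean()
print(f"selected group pretest mean:  {pre_mean:.1f}")
print(f"selected group posttest mean: {post_mean:.1f}")
```

In this sketch the selected group should score noticeably closer to the overall mean on the second testing, even though nothing was done to them in between.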
     Differential Selection
This can occur when intact groups are
   compared.
   The groups may have been different to
   begin with.
   If three different classrooms are each
   exposed to a different intervention, the
   classroom performances may differ only
   because the groups were different to begin
   with.
Selection-Maturation Interaction
Occurs when differential selection is
 confounded with maturational effects.
The treatment group might be composed
 of higher aptitude students, or…
The treatment group might have more
 students who are born during the
 summer months.
          Research Mortality
The differential loss of individuals from
 treatment and/or comparison groups.
This is often a problem when research
 participants are volunteers.
    Volunteers may drop out of the study if
    they find it is consuming too much of their
    time.
    Others may drop out if they find the task
    to be too arduous.
 Interaction of Selection with the Other
   Factors Affecting Internal Validity
Occurs when intact groups, which may not
 be equivalent, are selected to
 participate in research interventions.
As in a previous example, three different
 classrooms may be exposed to different
 treatments, but one of the classrooms
 might be composed of students having
 higher achievement trajectories.
           External Validity
Concerned with whether the results of a study
   can be generalized beyond the study itself:
  1. Population validity (when the sample does not
     adequately represent the population).
  2. Personological validity (when personal/
     psychological characteristics interact with the
     treatment).
  3. Ecological validity (when the situational
     characteristics of the study are not
     representative of the population).
    Factors affecting External Validity
External validity is threatened whenever conditions inherent
  in the research design are such that the generalizability
  of the results is limited.
Four classes of threats to external validity can be identified:
  • Reactive or interactive effects of testing
  • Interaction effect of selection bias and the intervention
  • Reactive effects of treatment
  • Multiple treatment interference
   Reactive effect of testing

Occurs whenever a pretest increases or
 decreases the respondents’ sensitivity
 to the treatment.
Studies involving self-report measures of
 attitude and interest are very
 susceptible to this threat.
      Selection x Treatment
This can occur when selected treatment
 or comparison groups are more or less
 sensitive to the treatment prior to
 initiating the treatment (or
Most likely to occur when the treatment
 and comparison groups are not randomly
  Reactive Effects of Experimental Arrangements
These can occur when the conditions of
 the study are such that the results are
 not likely to be replicated in non-
 experimental situations.
  –   Hawthorne effects
  –   John Henry effects
  –   Placebo effects
  –   Novelty effects
Multiple-treatment Interference

This is likely to occur whenever the
 same research participants are
 exposed to multiple treatments.
  – Sequence effects
  – Carry-over effects
         Research Designs
We will examine the operative threats to
 internal and external validity in twelve
 specific types of research designs.
Some symbols to be used:
   R = Random Assignment
   X = Treatment Intervention
   O = Observation or Measurement
Design 1: One-shot Case Study
This is a widely-used research design in
  education.
  – A single group receives a treatment or
    intervention.
  – Following the treatment individuals are
    measured on some outcome variable:
  – It can be diagramed as follows:

              X           O
               Design 1:
     One-shot Case Study, Continued

• This design is typical of a case study
• Inferences typically are based upon
  expectations of what the results would have
  been had X not occurred.
• These designs often are subject to the error
  of misplaced precision, since they often
  involve tedious collection of specific detail
  and careful observations.
• The problem is that there usually are
  numerous rival, plausible sources of effect on
  the outcome other than X.
                Design 2:
    One-group Pretest-Posttest Design
This, also, is a widely-used research design in
 education (see the diagram).
  A pretest is given, followed by a treatment or
    intervention, followed by a posttest.
  – The difference between O1 and O2 is used to infer
    an effect due to X.
  – This design is subject to five of the eight threats
    to internal validity and one of the threats to
    external validity. Can you name them?
            O1        X        O2
One-group Pretest-Posttest Design (Continued)
Threats to internal validity
  1.   History
         Many change-producing events may have occurred
         between O1 and O2 .
         History is more viable the longer the lapse between the
         pretest and posttest.
  2. Maturation
         During the time between O1 and O2 the individuals may
         have grown older, wiser, more tired, more wary, or more
  3. Testing
         The fact that the participants in the study were
         exposed to a pretest may, by itself, influence
         performance on the posttest.
One-group Pretest-Posttest Design (Continued)
Threats to internal validity (continued)
  4. Instrumentation
        If O1 and O2 are obtained from judges (or raters), for
        example, then the judges may become more skillful
        between the two sets of observations.
        Standardized achievement tests might be re-normed
        between pretesting and posttesting.
  5. Statistical regression
        For example, if students are selected to participate in a
        remedial intervention because of extremely low scores
        on a pretest they are very likely, as a group, to score
        higher upon receiving the same (or similar) test as a
        posttest.
        This results mainly from errors in measurement (or
        unreliability in the tests).
             Design 3:
      Static-group Comparison
In this design (diagramed below) a non-random
    treatment group is compared to a non-
    random comparison group.
Problems associated with this design stem from
    the fact that there is no way to
    substantiate that the treatment and
    comparison groups were equivalent to begin
    with.
                     X           O1
                                 O2
Static-group Comparison (Continued)
Threats to internal validity
  1.   Selection
         Here, intact groups are being compared. It is possible
         that the treatment group was already prepared to do
         better (or worse) than the comparison group on O;
         hence the treatment group might have performed
         differently from the comparison group even in the
         absence of X.
  2. Mortality
         It is possible that differences between O1 and O2 are
         due to the fact that the nature of the treatment is
         such that participants drop out at higher rates than do
         participants in the comparison group.
Static-group Comparison (Continued)
Threats to internal validity (continued)
  3. Interactive effects (e.g., selection and
       maturation)
       It may be that one of the groups being
       compared has a higher (or lower) achievement
       trajectory (e.g., when a more advanced class is
       compared with a lesser-advanced class).
The three designs discussed so far are usually
   referred to as pre-experimental designs.
We will now turn to a consideration of three
   true experimental designs.
         True Experiments
• True experiments are characterized by
  random assignment:
  – Random assignment of individuals to
    treatment conditions.
  – Random assignments of treatment
    conditions to individuals.
• When comparison groups are large
  enough (usually, n > 20) and individuals
  are selected at random, then
  representativeness can be assumed.
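Random assignment of individuals to treatment conditions can be sketched in a few lines. This is a minimal illustration, not a procedure taken from the presentation: the function name `randomly_assign`, the condition labels, and the participant IDs are hypothetical. Participants are shuffled and then dealt in turn to each condition so that group sizes stay balanced.

```python
import random


def randomly_assign(participants, conditions=("treatment", "comparison"), seed=None):
    """Shuffle participants, then deal them round-robin into the conditions."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, participant in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(participant)
    return groups


# Hypothetical participant IDs 1..40, used for illustration only.
groups = randomly_assign(range(1, 41), seed=42)
print({name: len(members) for name, members in groups.items()})
```

Passing a seed makes the assignment reproducible, which can help when the assignment itself needs to be documented.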
                   Design 4.
     Pretest-posttest Control Group Design

               R    O1      X      O3
               R    O2             O4
• Here, individuals are randomly assigned to one of two
  groups: the treatment group and a comparison group.
• The treatment group receives the intervention.
• The groups are compared in terms of their
  difference scores:

            (M_O3 - M_O1) vs. (M_O4 - M_O2)
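The difference-score comparison above amounts to simple arithmetic on group means. The sketch below assumes M denotes the mean of each observation; the scores and the helper name `mean_gain` are invented purely for illustration and are not data from any study.

```python
import statistics


def mean_gain(pretest, posttest):
    """Return mean(posttest) - mean(pretest) for one group."""
    return statistics.mean(posttest) - statistics.mean(pretest)


# Hypothetical scores, invented for illustration only.
o1 = [52, 48, 55, 50, 45]  # treatment group pretest
o3 = [60, 57, 63, 58, 52]  # treatment group posttest
o2 = [51, 49, 54, 50, 46]  # comparison group pretest
o4 = [53, 50, 56, 52, 49]  # comparison group posttest

treatment_gain = mean_gain(o1, o3)    # mean of O3 minus mean of O1
comparison_gain = mean_gain(o2, o4)   # mean of O4 minus mean of O2
effect_estimate = treatment_gain - comparison_gain
print(treatment_gain, comparison_gain, effect_estimate)
```

A real analysis would also test whether the difference between the two gains is larger than chance would produce; that step is left out of this sketch.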
  Pretest-posttest Control Group Design (Continued)

• This design, and the next two true-
  experimental designs, control for all eight of
  the threats to internal validity.
• Any differences between groups that might
  have existed prior to X are (assumed to be)
  controlled through random assignment.
• Any effects due to history, maturation,
  testing, instrumentation, regression and so on
  would be expected to occur with equal
  frequency in both groups.
  Pretest-posttest Control Group Design (Continued)

Factors affecting external validity:
  1.   Interactions between the treatment and testing.
         This occurs whenever the pretest sensitizes the
         treatment group to the effects of the treatment.
  2. Interactions between the treatment and group
         This can happen when the population from which the
         comparison group samples were selected is not the same
         as the target population.
  3. Reactive arrangements
         Sometimes the setting for the study is artificially
         restrictive. When this occurs generalizability suffers.
              Design 5.
      Solomon Four-group Design
               R   O1   X   O2
               R   O3       O4
               R        X   O5
               R            O6
This design enjoys several advantages.
1. Both the main effect of testing and the interaction of
   testing and treatment are determinable.
2. There are multiple tests of the effect of X:
   O2>O1 ; O2>O4 ; O5>O6 ; O5>O3
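The four comparisons listed for the Solomon design can be checked mechanically once a mean is available for each observation. The following is a sketch only: the function name `solomon_comparisons` and the group means are hypothetical, and a simple "greater than" check stands in for the significance test a real analysis would need.

```python
def solomon_comparisons(means):
    """Evaluate the four comparisons that each bear on the effect of X.

    `means` maps the observation labels "O1".."O6" to group means.
    """
    pairs = [("O2", "O1"), ("O2", "O4"), ("O5", "O6"), ("O5", "O3")]
    return {f"{a} > {b}": means[a] > means[b] for a, b in pairs}


# Hypothetical group means, invented for illustration only.
means = {"O1": 50, "O2": 58, "O3": 50, "O4": 52, "O5": 57, "O6": 51}
for comparison, holds in solomon_comparisons(means).items():
    print(comparison, holds)
```

If all four comparisons point the same way, the inference that X had an effect is strengthened, since each comparison is exposed to a different mix of rival explanations.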
                Design 6:
           Posttest-only Design
Pretests are not always necessary. Given randomization
  of subjects to treatment conditions we can assume
  that the groups were equivalent prior to the
  treatment intervention.
In this design all the threats to internal validity are
  controlled for.
As far as external validity is concerned we might still
  question whether there might be reactive effects.

                R        X      O1
                R               O2
           Design 8:
 Non-equivalent Pretest-Posttest Design
 Most widely-used quasi-experimental design in
 education research.
        O1 X O2

        O3   O4
The pretest is used to determine whether the
 groups were equivalent before onset of
 treatment (and to adjust where necessary).
                               Design 7:
                           Time Series Design
     O1   O3   O5   O7   X9    O11   O13   O15   O17
     O2   O4   O6   O8   X10   O12   O14   O16   O18
       Design 9:
Counterbalanced Designs

 X1   O1   X2   O2   X3   O3

 X3   O4   X1   O5   X2   O6

 X2   O7   X3   O8   X1   O9
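The treatment orders in the counterbalanced diagram follow a cyclic pattern: each group's sequence is the previous one rotated by one position, so every treatment appears once in every position. A minimal sketch of generating such orders is below; the function name `cyclic_orders` is hypothetical, and rotation is only one of several ways to build a counterbalanced (Latin square) arrangement.

```python
def cyclic_orders(treatments):
    """Return one treatment order per group, rotating one step right each time."""
    n = len(treatments)
    return [
        [treatments[(position - shift) % n] for position in range(n)]
        for shift in range(n)
    ]


orders = cyclic_orders(["X1", "X2", "X3"])
for group, order in enumerate(orders, start=1):
    print(f"group {group}: {' '.join(order)}")
```

Note that a simple rotation balances position but not which treatment immediately precedes which, so carry-over effects may still need separate attention.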
Treatment Reversal Design with
  Randomization

  R    O1 O3 X5 O7 X9 O11
  R    O2 O4 X6 O8 X10 O12
Treatment Reversal Design
  without Randomization

   O1 O3 X5 O7 X9 O11
   O2 O4 X6 O8 X10 O12
 Single (or few) Subject Designs
In certain types of situations these
  designs are very appropriate.
  When the target population is very small.
  Particularly applicable to clinical settings.
  When studying specific behaviors of unique
  Individuals serve as their own controls.
  When we want to show that an intervention
    can have an effect.
 Requirements of Single-Subject Designs
External validity is often difficult to establish.
Internal validity requires three things:
  Repeated and reliable measurement.
    Valid and reliable measuring instruments (or
  Baseline stability.
  Single variable rule (manipulate only one
   variable at a time.)
                   Design 8:
            A-B-A Withdrawal Design
This design involves alternating phases of
 baseline observation and treatment
 intervention, X:

0         0          0          0 | X            0        X          0         X          0
__________________________________   ________________________________________________________

    Baseline Phase                                 Treatment Phase

During the treatment phase the
 intervention is turned on and off.
               Design 9:
      A-B-A Single Subject Design

  0 0 0 0 X X X X 0 0 0 0
 _____________________________   _______________________________   ____________________________

     Baseline Phase                  Treatment Phase                 Post-treatment

One problem with this design is that it is
 sometimes considered unethical to
 discontinue treatment when the
 treatment has been shown to be
 effective.
            Design 10:
   A-B-A-B Single Subject Design

0 0 0 0 X X X X 0 0 0 0 X X X X
_________________   _____________________   __________________   _____________________

  Baseline            Treatment               Baseline             Treatment

The advantage is that it leaves an
 effective treatment in place.
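One common way to summarize data from an A-B-A-B design is to compare the level of the measured behavior across phases. The sketch below assumes one numeric measurement per session; the session values and the helper name `phase_means` are invented for illustration and are not part of the original presentation.

```python
import statistics


def phase_means(phases, observations):
    """Group observations by consecutive phase label; return (label, mean) pairs."""
    result = []
    current, bucket = None, []
    for label, value in zip(phases, observations):
        # A change of label closes the previous phase.
        if label != current and bucket:
            result.append((current, statistics.mean(bucket)))
            bucket = []
        current = label
        bucket.append(value)
    if bucket:
        result.append((current, statistics.mean(bucket)))
    return result


# Hypothetical counts of a target behavior per session (A = baseline, B = treatment).
phases = ["A"] * 4 + ["B"] * 4 + ["A"] * 4 + ["B"] * 4
observations = [9, 10, 11, 10, 5, 4, 4, 3, 9, 9, 10, 8, 4, 3, 3, 2]
for label, mean in phase_means(phases, observations):
    print(label, mean)
```

A pattern in which the level shifts each time the treatment is introduced and returns toward baseline each time it is withdrawn is what makes this design persuasive for a single individual.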
Other Single-Subject Designs
There are a wide variety of single-subject
  designs:
  Multiple baseline designs.
  Alternating treatment designs.
  Increasing/decreasing treatment
   intervention designs.
  Replicated single-subject designs.
