Validity and reliability


        Chapter 8
             Validity
This term refers to the appropriateness,
 meaningfulness, correctness, and
 usefulness of the inferences a researcher
 makes from the data collected.
Reliability refers to the consistency of scores
 or answers from one administration of an
 instrument to another, or from one set of
 items to another. A reliable instrument
 yields similar results if given to a similar
 population at different times.
Researchers make inferences from their
 studies. That is, they use the results to
 make decisions or judgments about what
 they were trying to find out.
     Content-related evidence
Content-related evidence of validity focuses
 on the content and format of an
 instrument. Is it appropriate?
 Comprehensive? Is it logical? How do the
 items or questions represent the content?
 Is the format appropriate?
     Criterion-related evidence
This refers to the relationship between the
 scores obtained using the instrument and
 the scores obtained using one or more
 other instruments or measures. For
 example, are students' scores on
 teacher-made tests consistent with their
 scores on standardized tests in the same subject?
     Construct-related evidence
This refers to the psychological construct or
 characteristic being measured. For
 example, if one is looking at problem
 solving in leaders, how well does a
 particular instrument explain the
 relationship between being able to
 problem solve and effectiveness as a leader?
Elements of content-related evidence

Adequacy of sampling: the size and scope
  of the questions must be large enough to
  cover the topic.
Format of the instrument: Clarity of
  printing, type size, adequacy of work area,
  appropriateness of language, clarity of
  directions, etc.
 How to achieve content validity
Consult other experts who rate the items.
 Rate items, eliminating or changing those
 that do not meet the specified content.
 Repeat until all raters agree on the
 questions and answers.
      Criterion-related validity
To obtain criterion-related validity,
 researchers identify a characteristic,
 assess it using one instrument (e.g., IQ
 test) and compare the score with
 performance on an external measure,
 such as GPA or an achievement test.
Predictive and concurrent validity
Predictive validity is used to look at validity
  over time. Researchers administer one
  assessment, allow time to lapse, then
  compare the earlier score to another
  measure of performance.
Concurrent validity looks at information
  gathered from two sources at the same time.
         Correlation coefficient
A correlation coefficient, symbolized with the letter
  r, indicates the degree of relationship between
  individuals' scores on two instruments. All
  correlation coefficients fall between +1.00 and
  -1.00. A strong positive correlation indicates that
  a high score on one instrument pairs with a high
  score on the other, and a low score on one with a
  low score on the other. A negative correlation
  indicates a high score on one paired with a low
  one on the other.
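As a sketch, the coefficient r can be computed by hand from paired scores; the score lists below are invented purely for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation r between paired scores; always in [-1.00, +1.00]."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    ss_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (ss_x * ss_y)

# Hypothetical scores from two instruments given to the same five people:
instrument_a = [10, 12, 14, 16, 18]
instrument_b = [11, 13, 15, 17, 19]
print(round(pearson_r(instrument_a, instrument_b), 2))  # 1.0 (perfect positive)
```

A validity coefficient is exactly this computation applied to a predictor and a criterion measure.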
           Validity coefficient
A validity coefficient is obtained by
  correlating a set of scores on one test (a
  predictor) with a set of scores on another
  (the criterion). The degree to which the
  predictor and the criterion relate is the
  validity coefficient. A predictor that has a
  strong relationship to a criterion test
  would have a high coefficient.
          Expectancy tables
This is a two-way chart with the predictor
 categories listed down the left-hand side
 of the chart and the criterion categories
 listed horizontally along the top of the
 chart.
           Expectancy table
            Avg. # of Yrs. Teaching Experience
 New VPs     '00  '01  '02  '03
 Male         12   11    9    8
 Female       14   13   11   12
      Construct-related validity
This type of validity is more typically
 associated with research studies than
 testing. It relates to psychological traits,
 so multiple sources are used to collect
 evidence. Often a combination of
 observation, surveys, focus groups, and
 other measures is used to identify how
 much of the trait the observed
 individual possesses.
             Reliability
This refers to the consistency of scores
 obtained from one instrument to another,
 or from the same instrument over
 different groups.
       Errors of measurement
Every test or instrument has associated with
  it errors of measurement. These can be
  due to a number of things: testing
  conditions, student health or motivation,
  test anxiety, etc. Test developers work
  hard to try to ensure that their errors are
  not grounded in flaws with the test itself.
         Reliability coefficient
This is a number that tells us how likely one
 instrument is to be consistent over
 repeated administrations.
          Reliability Methods
Test-retest: Same test to same group
Equivalent-forms: A different form of the
  same instrument is given to the same
  group of individuals
Internal consistency: Split-half procedure
Kuder-Richardson: Mathematically
  computes reliability from the # of items,
  the mean, and the standard deviation of
  the test.
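The Kuder-Richardson approach above can be sketched with the KR-21 formula (assumed here; the KR-20 variant requires item-level data), which needs only the item count, mean, and standard deviation. The numbers below are invented.

```python
def kr21(n_items, mean, sd):
    """KR-21 reliability estimate from item count, test mean, and test SD."""
    k, var = n_items, sd ** 2
    return (k / (k - 1)) * (1 - (mean * (k - mean)) / (k * var))

# Hypothetical 50-item test with mean score 40 and standard deviation 5:
print(round(kr21(50, 40, 5), 2))  # 0.69
```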
       Alpha or Cronbach's alpha
   This test is used on instruments where
    answers aren't scored "right" and "wrong".
    It is often used to test the reliability of
    survey instruments.
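A minimal sketch of Cronbach's alpha: it compares the summed item variances with the variance of the total scores, scaled by the item count. The Likert-style responses below are invented for illustration.

```python
def cronbach_alpha(items):
    """items: one list of scores per item, same respondent order in each."""
    k = len(items)       # number of items
    n = len(items[0])    # number of respondents
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))

# Three hypothetical survey items answered by five respondents (1-5 scale):
responses = [[3, 4, 5, 2, 4],
             [3, 5, 5, 1, 4],
             [2, 4, 4, 2, 3]]
print(round(cronbach_alpha(responses), 2))  # 0.94
```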
 Standard error of measurement

This is a calculation that shows the extent to
 which a measurement would vary under
 changed circumstances. In other words, it
 tells you how much of the error is due to
 issues related to measuring.
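The usual formula computes the standard error of measurement from the score standard deviation and the reliability coefficient: SEM = SD × √(1 − r). A quick sketch with invented numbers:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

# A test with SD 10 and reliability .91 has an SEM of about 3 points,
# so an observed score of 75 suggests a "true" score of roughly 72-78.
print(round(standard_error_of_measurement(10, 0.91), 1))  # 3.0
```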
           Qualitative Research
   Many qualitative researchers contend that
    validity and reliability are irrelevant to
    their work because they study one
    phenomenon and don‟t seek to generalize.
    Fraenkel and Wallen contend that any
     instrument or design used to collect data
     should be credible and backed by
     evidence, consistent with the standards
     of quantitative research.
Internal Validity

     Chapter 9
   Validity can be used in three ways. It can
    refer to instrument or measurement
    validity, as we just discussed; external or
    generalization validity, as presented in
    Chapter 6; or internal validity, which
    means that the relationship a researcher
    observes between two variables should be
    clear in its meaning rather than due to
    "something else."
       What is "something else"?
 Any one (or more) of these conditions:
 Age or ability of subjects
 Conditions under which the study was conducted
 Type of materials used in the study
     Technically, the "something else" is called a
     threat to internal validity.
        Threats to internal validity
   Subject characteristics
   Loss of subjects
   Location
   Instrumentation
   Testing
   History
   Maturation
   Attitude of subjects
   Regression
   Implementation
          Subject characteristics
   Subject characteristics can pose a threat if
    there is selection bias, or if there are
    unintended factors present within or
    among groups selected for a study. For
    example, in group studies, members may
    differ on the basis of age, gender, ability,
    socioeconomic background, etc. They
    must be controlled for in order to ensure
    that the key variables in the study, not
    these, explain differences.
       Subject characteristics
 Age            Intelligence
 Strength       Vocabulary
 Maturity       Reading ability
 Gender         Fluency
 Ethnicity      Manual dexterity
 Coordination   Socioeconomic status
 Speed          Religious/political belief
       Loss of subjects (mortality)
   Loss of subjects limits generalizability, but it
    can also affect internal validity if the
    subjects who don't respond or participate are
    overrepresented in a group. For example, if
    you were conducting a study to examine college
    students' participation in on-campus
    extracurricular activities and a substantial
    proportion of non-dorming students didn't
    respond, your results would be skewed toward
    the likely participants.
               Location
   The place where data collection occurs,
    i.e., "location," might pose a threat. For
    example, hot, noisy, unpleasant conditions
    might affect test scores; situations where
    privacy is important for the results, but
    where people are streaming in and out of
    the room, might pose a threat.
           Instrumentation
 Decay: If the nature of the instrument or
  the scoring procedure is changed in some
  way, instrument decay occurs.
 Data Collector Characteristics: The person
  collecting data can affect the outcome.
 Data Collector Bias: The data collector
  might hold an opinion that is at odds with
  respondents', and it affects the data collected.
               Testing
   In longitudinal studies, data are often
    collected through more than one
    administration of a test. If the previous
    test influences subsequent ones by getting
    the subject to engage in learning or some
    other behavior that he or she might not
    otherwise have done, there is a testing threat.
               History
   If an unanticipated or unplanned event
    occurs prior to a study or intervention,
    there might be a history threat.
           Attitude of subjects
   Sometimes the very fact of being studied
    influences subjects. The best known
    example of this is the Hawthorne Effect.
              Regression
   In testing this is known as regression
    toward the mean. It means that low and
    high performing groups, over time, tend
    to score more towards the average on
    subsequent tests regardless of what
    happens in the meantime.
            Implementation
   This threat can be caused by various
    things; different data collectors, teachers,
    conditions in treatment, method bias, etc.
         Minimizing Threats
 Standardize conditions of study
 Obtain more information on subjects
 Obtain as much information as possible on
  details of the study: location, history,
  instrumentation, subject attitude, etc.
 Choose an appropriate design
 Train data collectors
