Validity and Reliability - PowerPoint

Think in terms of ‘the purpose of tests’ and the ‘consistency’ with which that purpose is fulfilled or met.



Validity and Reliability

[Diagram: four targets illustrating the combinations: Neither Valid nor Reliable; Reliable but not Valid; Fairly Valid but not very Reliable; Valid & Reliable]

Validity
   Depends on the PURPOSE
   E.g. a ruler may be a valid measuring device for
    length, but isn’t very valid for measuring volume
   Measures what ‘it’ is supposed to measure
   Matter of degree (how valid?)
   Specific to a particular purpose!
   Must be inferred from evidence; cannot be directly
    measured
   Learning outcomes
    1.   Content coverage (relevance?)
    2.   Level & type of student engagement (cognitive, affective,
         psychomotor) – appropriate?
                                  Reliability
   Consistency in the type of result a test yields
       Time & space
       participants
   Not perfectly identical results, but results that come
    ‘very close to’ being similar
   When someone says you are a ‘reliable’
    person, what do they really mean?
   Are you a reliable person? 
                    What do you think…?
   Forced-choice assessment forms are high in reliability, but weak
    in validity (true/false)
   Performance-based assessment forms are high in both validity
    and reliability (true/false)
   A test item is said to be unreliable when most students answered
    the item wrongly (true/false)
   When a test contains items that do not represent the content
    covered during instruction, it is known as an unreliable test
    (true/false)
   Test items that do not successfully measure the intended
    learning outcomes (objectives) are invalid items (true/false)
   Assessments that do not represent student learning well
    enough are definitely invalid and unreliable (true/false)
   A valid test can sometimes be unreliable (true/false)
     If a test is valid, it is reliable! (by-product)
                           Question…

In the context of what you understand about
  VALIDITY and RELIABILITY, how do you go
  about establishing/ensuring them in your own
  test papers?
                  Indicators of quality
   Validity
   Reliability
   Utility
   Fairness

Question: how are they all inter-related?
           Types of validity measures
        Face validity
        Construct validity
        Content validity
        Criterion validity
    1.    Predictive
    2.    Concurrent
        Consequences validity
                                        Face Validity
   Does it appear to measure what it is supposed to
    measure?

   Example: Let’s say you are interested in measuring,
    ‘Propensity towards violence and aggression’. By simply
    looking at the following items, state which ones qualify to
    measure the variable of interest:
       Have you been arrested?
       Have you been involved in physical fighting?
       Do you get angry easily?
       Do you sleep with your socks on?
       Is it hard to control your anger?
       Do you enjoy playing sports?
                                     Construct Validity
       Does the test measure the ‘human’
        CHARACTERISTIC(s) it is supposed to?
       Examples of constructs or ‘human’ characteristics:
           Mathematical reasoning
           Verbal reasoning
           Musical ability
           Spatial ability
           Mechanical aptitude
           Motivation
       Applicable to PBA/authentic assessment
       Each construct is broken down into its component parts
    E.g. ‘motivation’ can be broken down into:
           Interest
           Attention span
           Hours spent
           Assignments undertaken and submitted, etc.
     All of these sub-constructs, put together, measure ‘motivation’
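
Note: as a rough illustration of how the sub-constructs above might be combined into a single ‘motivation’ score, here is a minimal Python sketch. The 0-to-1 scale, the equal weighting, and the sample ratings are assumptions made for the example, not part of the slides.

```python
# Minimal sketch: combining hypothetical sub-construct scores into one
# 'motivation' score. The 0-1 scale and equal weights are assumptions.

def composite_score(sub_scores, weights=None):
    """Weighted average of sub-construct scores (defaults to equal weights)."""
    names = list(sub_scores)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    return sum(sub_scores[name] * weights[name] for name in names)

# Hypothetical student ratings, each rescaled to 0-1 beforehand.
motivation_parts = {
    "interest": 0.80,
    "attention_span": 0.65,
    "hours_spent": 0.70,
    "assignments_submitted": 0.90,
}

print(round(composite_score(motivation_parts), 2))  # 0.76
```
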
                          Content Validity
   How well do the elements of the test relate to the content
     domain?
   How closely does the content of the test questions relate to
     the content of the curriculum?
   Directly relates to instructional objectives and the
    fulfillment of the same!
   Major concern for achievement tests (where content
    is emphasized)
   Can you test students on things they have not been
    taught?
               How to establish Content Validity?
        Instructional objectives (looking at your list)
        Table of Specification
        E.g.
        At the end of the chapter, the student will be able
         to do the following:
    1.    Explain what ‘stars’ are
     2.   Discuss the types of stars and galaxies in our universe
    3.    Categorize different constellations by looking at the stars
    4.    Differentiate between our stars, the sun, and all other
          stars
Table of Specification (An Example)

Content areas            Categories of Performance (Mental Skills)
                         Knowledge    Comprehension    Analysis    Total
1. What are ‘stars’?
2. Our star, the Sun
3. Constellations
4. Galaxies
Total                                                              Grand Total
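
Note: to make the blank template above concrete, here is a minimal Python sketch that tallies a hypothetical allocation of items across content areas and performance categories; all item counts are invented for illustration.

```python
# Minimal sketch: a Table of Specification as a nested dict.
# Content areas and categories follow the template above; the
# item counts are hypothetical.

table = {
    "What are 'stars'?": {"Knowledge": 3, "Comprehension": 2, "Analysis": 1},
    "Our star, the Sun": {"Knowledge": 2, "Comprehension": 2, "Analysis": 2},
    "Constellations":    {"Knowledge": 2, "Comprehension": 1, "Analysis": 2},
    "Galaxies":          {"Knowledge": 3, "Comprehension": 2, "Analysis": 1},
}

categories = ["Knowledge", "Comprehension", "Analysis"]

# Row totals: items planned per content area.
for area, cells in table.items():
    print(f"{area:<20} total = {sum(cells.values())}")

# Column totals and grand total: items planned per category overall.
for cat in categories:
    print(f"{cat:<15} total = {sum(row[cat] for row in table.values())}")

grand_total = sum(sum(row.values()) for row in table.values())
print("Grand total =", grand_total)
```
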
                          Criterion Validity
   The degree to which content on a test (predictor)
    correlates with performance on relevant criterion
    measures (concrete criterion in the "real" world?)
   If they do correlate highly, it means that the test
    (predictor) is a valid one!
   E.g. if you taught skills relating to ‘public speaking’
    and had students do a test on it, the test can be
    validated by looking at how it relates to actual
    performance (public speaking) of students inside or
    outside of the classroom
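
Note: ‘correlate highly’ can be checked numerically. The sketch below computes a Pearson correlation between hypothetical public-speaking test scores (predictor) and observed performance ratings (criterion); the data are invented, and NumPy is used here simply as one convenient tool.

```python
# Minimal sketch: criterion validity as a test-vs-criterion correlation.
# Scores and ratings are hypothetical.
import numpy as np

test_scores      = np.array([62, 75, 58, 90, 70, 84, 66, 79])   # predictor
observed_ratings = np.array([60, 78, 55, 88, 72, 80, 63, 82])   # criterion

r = np.corrcoef(test_scores, observed_ratings)[0, 1]
print(f"criterion validity coefficient r = {r:.2f}")

# The closer r is to 1.0, the stronger the evidence that the test
# reflects the real-world performance it is meant to predict.
```
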
Two Types of Criterion Validity
   Concurrent Criterion Validity = how well performance
    on a test estimates current performance on some valued
    measure (criterion)? (e.g. test of dictionary skills can
    estimate students’ current skills in the actual use of
    dictionary – observation)

   Predictive Criterion Validity = how well performance
    on a test predicts future performance on some valued
    measure (criterion)? (e.g. reading readiness test might
    be used to predict students’ achievement in reading)

   Both are only possible IF the predictors are VALID
             Consequences Validity
   The extent to which the assessment served
    its intended purpose
   Did the test improve performance?
    Motivation? Independent learning?
   Did it distort the focus of instruction?
   Did it encourage or discourage creativity?
    Exploration? Higher order thinking?
    Factors that can lower Validity
   Unclear directions
   Difficult reading vocabulary and sentence structure
   Ambiguity in statements
   Inadequate time limits
   Inappropriate level of difficulty
   Poorly constructed test items
   Test items inappropriate for the outcomes being measured
   Tests that are too short
   Improper arrangement of items (complex to easy?)
   Identifiable patterns of answers
   Teaching
   Administration and scoring
   Students
   Nature of criterion
                                     Reliability
   Measure of consistency of test results from one
    administration of the test to the next

   Generalizability – consistency (interwoven concepts)
    – if a test item is reliable, it can be correlated with
    other items to collectively measure a construct or
    content mastery

   A component of validity

   Length of assessment
                        Measuring Reliability
 Test – retest
Give the same test twice to the same group with any
  time interval between tests
 Equivalent forms (similar in content, difficulty level, arrangement, type of
    assessment, etc.)
Give two forms of the test to the same group in close
   succession
 Split-half
Test has two equivalent halves. Give test once, score
   two equivalent halves (odd items vs. even items)
 Cronbach’s Alpha (SPSS)
Inter-item consistency – one test – one administration
 Inter-rater Consistency (subjective scoring)
Calculate the percent of exact agreement, or use Pearson’s
   product-moment correlation and its coefficient of
   determination (SPSS); a worked sketch follows below
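
Note: the sketch below works through three of the methods above on a small hypothetical item-score matrix: split-half reliability with the Spearman-Brown correction, Cronbach’s alpha, and an inter-rater check. The scores are invented, and NumPy is used here in place of SPSS purely for illustration.

```python
# Minimal sketch: split-half, Cronbach's alpha, and inter-rater consistency
# on hypothetical data (rows = students, columns = items, scored 0-4).
import numpy as np

scores = np.array([
    [4, 3, 4, 2, 3, 4],
    [2, 2, 3, 1, 2, 2],
    [3, 3, 3, 2, 3, 3],
    [4, 4, 4, 3, 4, 4],
    [1, 2, 1, 1, 2, 1],
])

# Split-half: odd items vs. even items, corrected with Spearman-Brown.
odd_half  = scores[:, 0::2].sum(axis=1)
even_half = scores[:, 1::2].sum(axis=1)
r_halves = np.corrcoef(odd_half, even_half)[0, 1]
split_half = 2 * r_halves / (1 + r_halves)
print(f"split-half (Spearman-Brown) = {split_half:.2f}")

# Cronbach's alpha: inter-item consistency from a single administration.
k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1)
total_var = scores.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha            = {alpha:.2f}")

# Inter-rater consistency: two raters scoring the same essays.
rater_a = np.array([5, 3, 4, 2, 5, 4])
rater_b = np.array([5, 4, 4, 2, 4, 4])
exact_agreement = np.mean(rater_a == rater_b)
r_raters = np.corrcoef(rater_a, rater_b)[0, 1]
print(f"exact agreement = {exact_agreement:.0%}, r^2 = {r_raters**2:.2f}")
```
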
       How to improve Reliability?
   Quality of items; concise statements,
    homogeneous wording (some sort of uniformity)
   Adequate sampling of content domain;
    comprehensiveness of items
   Longer assessment – less distorted by
    chance factors (see the Spearman-Brown sketch below)
   Developing a scoring plan (esp. for subjective
    items – rubrics)
   Ensure VALIDITY
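
Note: the benefit of a longer assessment can be estimated with the Spearman-Brown prophecy formula, sketched below; the starting reliability of 0.70 and the doubling or halving of test length are hypothetical numbers.

```python
# Minimal sketch: Spearman-Brown prophecy formula.
# Predicts the reliability of a test whose length is changed by a factor k.

def spearman_brown(reliability, length_factor):
    """Predicted reliability when the test is made length_factor times longer."""
    k = length_factor
    return k * reliability / (1 + (k - 1) * reliability)

# Hypothetical: a 20-item test with reliability 0.70 is doubled to 40 items.
print(round(spearman_brown(0.70, 2), 2))    # about 0.82
# Halving the same test would lower the estimate instead.
print(round(spearman_brown(0.70, 0.5), 2))  # about 0.54
```
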

				