Defining and Measuring Variables (Part 3!!!)


PSYC 350

September 13, 2010
 Outline: Selecting a measurement

1. Modalities of measurement
2. Scales/levels of measurement
3. Validity and reliability of measurement
4. Other aspects of measurement
       Reliability and Validity
How can we be sure that the measurements
 obtained from an operational definition
 actually represent the intangible construct
 we're interested in?

Researchers have developed two general
 criteria for evaluating the quality of
 measurement procedures.
        Reliability and Validity
• Reliability
  – The consistency or stability of a measure
  – Does it measure the same thing each time?

• Validity
  – The truthfulness of a measure
  – Does it measure what it intends to measure?
    Reliability of measurement
Reliability
  – The consistency or stability of a measure
  – Does it measure the same thing each time?


A measurement is reliable if repeated
  measurements of the same individual
  under the same conditions produce
  identical (or nearly identical) values.
    Reliability of measurement
Reliability

Includes the notion that each individual
  measurement has an element of error.

    Measured score = True score + Error

A measurement is reliable if measurement
  error is small
     Reliability of measurement
           Measured score = True score + Error

E.g., IQ score:

Your measured score is determined partly by your level
  of intelligence (your true score) …

… but it is also influenced by other factors such as your mood,
 health, and luck in guessing (error).
     Reliability of measurement
       Measured score = True score + Error
As long as error is small, reliability is good.

  (e.g., IQ tests)

If the error component is large, the measurement
    is not reliable.
  (e.g., reaction time tests)
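The Measured score = True score + Error idea can be illustrated with a small simulation (hypothetical numbers; the true score of 110 and the two error sizes are arbitrary assumptions, not real IQ or reaction-time figures):

```python
import random
import statistics

random.seed(0)  # make the illustration reproducible

def repeated_measurements(true_score, error_sd, n=50):
    """Simulate n measurements of one person: measured = true + error."""
    return [true_score + random.gauss(0, error_sd) for _ in range(n)]

# Same true score, different amounts of measurement error.
small_error = repeated_measurements(110, error_sd=2)   # reliable measure
large_error = repeated_measurements(110, error_sd=15)  # unreliable measure

# The spread of repeated measurements reflects the error component.
print(round(statistics.stdev(small_error), 1))
print(round(statistics.stdev(large_error), 1))
```

With a small error term the repeated scores cluster tightly around the true score; with a large error term they scatter widely, which is exactly what "not reliable" means here.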
    Reliability of measurement
            Common Sources of Error
1. Observer error

  Simple human error (such as lack of precision)
  in measuring.

  E.g., four people with stopwatches recording the
  winner's time in a race will report slightly different
  times because of differences in judgment and reaction time.
    Reliability of measurement
            Common Sources of Error
2. Environmental changes

  It's not really possible to measure the same
  individual at different times under identical
  circumstances.

  Small environmental changes can influence
  measurement (e.g., temperature, time of day,
  background noise).
    Reliability of measurement
            Common Sources of Error
3. Participant changes

  The participant can change between
  measurements

  E.g., mood, body temperature, hunger, fatigue
    Reliability of measurement
• Forms of reliability
  – Test-retest reliability
  – Split-half reliability
  – Inter-rater reliability
                       Reliability
• Test-retest reliability
   – A test of stability over time
   – Test people with exactly the same test (or equivalent
     version of the test) at two time points and compare
     scores
   – Potential problems
       • First testing contaminates second
       • Participants change over time
                     Reliability
• Split-half reliability
   – A measure of consistency within a measure
   – Relatedness of items on a test or questionnaire
   When we give a multi-item test, we assume that the
    different questions measure a part or aspect of the
    construct.
   If this is true, there should be some consistency among
       the items.
   Researchers split the set of items in half, and then
    evaluate whether they are in agreement.
                  Reliability
• Inter-rater reliability
  – A measure of agreement between two
    observers of the same behaviors
  – High reliability demonstrates strong definitions
    of behavioral categories
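Inter-rater agreement can be sketched with hypothetical behavior codes; percent agreement is the simplest index, and Cohen's kappa (a standard statistic, not mentioned on the slide) corrects it for agreement expected by chance:

```python
from collections import Counter

# Hypothetical behavior codes from two observers over 10 intervals.
rater_a = ["hit", "push", "none", "hit", "none", "push", "none", "hit", "none", "push"]
rater_b = ["hit", "push", "none", "push", "none", "push", "none", "hit", "none", "none"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # percent agreement

# Chance agreement, given each rater's base rates for each category.
pa, pb = Counter(rater_a), Counter(rater_b)
expected = sum((pa[c] / n) * (pb[c] / n) for c in set(rater_a) | set(rater_b))

# Cohen's kappa: agreement above and beyond chance.
kappa = (observed - expected) / (1 - expected)
print(round(observed, 2), round(kappa, 2))
```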
      Validity of measurement
Validity
  – The truthfulness of a measure
  – Does it measure what it intends to measure?


  E.g., IQ: Think of the absent-minded professor
    with an IQ of 160 who functions poorly in
    everyday life.
      Validity of measurement
Types of Validity
   – Face validity
   – Concurrent validity
   – Predictive validity
   – Construct validity
      • Convergent validity
      • Divergent validity
                                Validity
• Face validity
    – Appearance of validity
    – Based on subjective judgment
    – Difficult to quantify
    – Little scientific value

Potential problem with high face validity: you don't always want
  participants to know what you're measuring.
                  Validity
• Concurrent validity
  – Scores obtained from a new measure covary
    with (are consistent with) scores from an
    established measure of the same construct
But … caution: the fact that two sets of
  measurement are related doesn't mean that they
  are measuring the same thing.
  E.g., measuring people's height by weighing
    them
                         Validity
• Predictive validity
   – Scores obtained from a measure accurately predict
     future behavior

Most theories make predictions about how different values
  of a construct will affect behavior.

If people's behavior is consistent with the results of a test,
    the measure has predictive validity.

E.g., "need for achievement" score predicted
  children's behavior in a game
                           Validity
• Construct validity
   – How well a test assesses the underlying construct
   – Is this measurement of the variable consistent with all the things
     we already know about the variable and its relationship to other
     variables?
   E.g., measuring people's height by weighing them is ok in terms of
     concurrent validity (height and weight are correlated), but not in
     terms of construct validity (height is not influenced by food
     deprivation but weight is).
         Construct validity
Two types of Construct Validity


  • Convergent validity

  • Divergent validity
               Construct validity
• Convergent validity
  – Scores on the measure are related to scores
    on measures of related constructs
  – Use two methods to measure the same
    construct and then show that the scores are
    strongly related
  Note: There is a subtle difference between convergent validity and
     concurrent validity. Concurrent validity compares a new measure with an
     established measure of the same construct, whereas convergent validity
     relates measures of related constructs. Please see p. 81 of your text.
           Construct validity
• Divergent validity
  – The test does not measure something it was
    not intended to measure
  – Demonstrate that we are measuring one
    specific construct (and not a mixture of two)
             Construct validity
•   Divergent validity
    – Use two methods to measure two
      constructs, then demonstrate:
    1. convergent validity for each construct
    2. no relationship between scores for the two
       constructs when measured by the same
       method.
               Construct validity
•    Convergent and divergent validity
    E.g.:
    –   want to measure aggression in children, but worried
        that your measure might reflect general activity level
    –   get observations of aggression and activity, and get
        teacher ratings of aggression and activity
    –   aggression measures should match, activity
        measures should match, the two observational
        measures should not match, and the two ratings
        measures should not match
             Reliability and validity
Indicate whether the researcher is interested in demonstrating reliability or
  validity. Specify the type of reliability or validity being described.

1. Dr. Chang has written a test to measure extroversion. He is worried
  that his test might simply measure social desirability (i.e., people will
  respond to the items the way they think they should respond, instead
  of being honest). To make sure his test does not measure social
  desirability, he gives 100 research participants his measure of
  extroversion and a measure of social desirability and calculates the
  correlation between the two scales. Dr. Chang is trying to
  demonstrate ______________________________.

2. Dr. Diefenbacher has written two versions of her Religiosity Scale.
  Each version is 20 items in length. She gives all 40 items to 100
  research participants. Then she examines the correlation between
  each set of 20 items. Dr. Diefenbacher is interested in demonstrating
  _____________________________.
Other aspects of measurement
• Sensitivity and range effects
  – Ceiling effects
  – Floor effects
Other aspects of measurement
• Participant reactivity
   – Social desirability
   – Demand characteristics

• Observer bias
   – Expectancy effects

• Potential solutions
   – Deception
   – Blinding (single-blind & double-blind studies)