Reliability by wuyunyi

									                      Validity

• In our last class, we began to discuss some of the
  ways in which we can assess the quality of our
  measurements.
• We discussed the concept of reliability (i.e., the
  degree to which measurements are free of random
  error).
  Why reliability alone is not enough

• Understanding the degree to which measurements
  are reliable, however, is not sufficient for evaluating
  their quality.
• In-class scale example
   – Recall that test-retest estimates of reliability tend
     to range between 0 (low reliability) and 1 (high
     reliability)
   – Note: An on-line correlation calculator is available
     at http://easycalculation.com/statistics/correlation.php
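A minimal sketch of how a test-retest estimate could be computed as a Pearson correlation between two measurement occasions. The scores below are made up for illustration and are not the in-class data; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for five people measured on two occasions.
time_1 = [10, 12, 15, 11, 18]
time_2 = [11, 12, 14, 12, 17]

r = pearson_r(time_1, time_2)
print(f"test-retest r = {r:.2f}")
```

A value near 1 means people keep roughly the same rank order across occasions.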
                      Validity

• In this example, the measurements appear reliable,
  but there is a problem . . .
• Validity reflects the degree to which measurements
  are free of both random error, E, and systematic
  error, S.
• O=T+E+S
• Systematic errors reflect the influence of any non-
  random factor beyond what we’re attempting to
  measure.
     Validity: Does systematic error
               accumulate?
• Question: If we sum or average multiple observations
  (i.e., using a multiple indicators approach), how will
  systematic errors influence our estimates of the "true"
  score?
   Validity: Does error accumulate?

• Answer: Unlike random errors, systematic errors
  accumulate.
• Systematic errors exert a constant source of
  influence on measurements. We will always
  overestimate (or underestimate) T if systematic error
  is present.
               O=   T    +   E   +   S
      Obs. 1   12   10       0       +2
      Obs. 2   12   10       0       +2
      Obs. 3   12   10       0       +2
      Obs. 4   12   10       0       +2
      Obs. 5   12   10       0       +2
      Obs. 6   12   10       0       +2
      Obs. 7   12   10       0       +2
     Average   12   10       0       +2



Note: Each measurement is 2 points higher than the
true value of 10. The errors do not average out.
                   O=   T    +   E    +   S
          Obs. 1   12   10        0       +2
          Obs. 2   11   10       -1       +2
          Obs. 3   12   10        0       +2
          Obs. 4   13   10       +1       +2
          Obs. 5   10   10       -2       +2
          Obs. 6   12   10        0       +2
          Obs. 7   14   10       +2       +2
         Average   12   10        0       +2



Note: Even when random error is present, E averages
to 0 but S does not. Thus, we have reliable measures
that have validity problems.
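The averages in the table above can be checked with a short sketch. The values of T, E, and S are taken from the table; the variable names and the `mean` helper are only illustrative.

```python
# Values from the table above: O = T + E + S for seven observations.
T = 10                           # true score
E = [0, -1, 0, +1, -2, 0, +2]    # random errors, one per observation
S = [+2] * 7                     # the same systematic error every time

# Build each observed score from its components.
O = [T + e + s for e, s in zip(E, S)]

def mean(values):
    return sum(values) / len(values)

print("O:", O)
print("mean E:", mean(E))          # random error averages out
print("mean S:", mean(S))          # systematic error does not
print("mean O - T:", mean(O) - T)  # remaining bias in the average
```

Averaging more observations shrinks the contribution of E, but the average of O stays offset from T by S.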
         Validity: Ensuring validity

• What can we do to minimize the impact of systematic
  errors?
• One way to minimize their impact is to use a variety
  of indicators—different sources of information.
• Different kinds of indicators of a latent variable may
  not share the same systematic errors
• If true, then S will behave like random error across
  measurements (but not within measurements)
                     Example

• As an example, let’s consider the measurement of
  self-esteem.
   – Some methods, such as self-report
     questionnaires, may lead people to over-estimate
     their self-esteem. Most people want to think
     highly of themselves.
   – Other methods, such as clinical ratings by
     trained observers, may lead to under-estimates of
     self-esteem. Clinicians, for example, may be
     prone to assume that people are not as well-off as
     they say they are.
                                    O=   T    +   E    +   S
Method 1: Self-reports    Obs. 1    13   10       +1       +2
                          Obs. 2    12   10        0       +2
                          Obs. 3    12   10        0       +2
                          Obs. 4    11   10       -1       +2
Method 2: Clinical        Obs. 5    10   10       +2       -2
          ratings         Obs. 6     8   10        0       -2
                          Obs. 7     8   10        0       -2
                          Obs. 8     6   10       -2       -2
                         Average    10   10        0        0


   Note: Method 1 systematically overestimates T whereas
   Method 2 systematically underestimates T. In
   combination, however, those systematic errors cancel
   out.
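A small sketch of the two-method table, using the E and S values shown there. The cancellation is exact here only because the two biases in this constructed example are equal in size and opposite in sign; with real methods the biases would at best partly offset each other.

```python
T = 10  # true score

# (E, S) pairs taken from the table: random and systematic error.
self_reports = [(+1, +2), (0, +2), (0, +2), (-1, +2)]   # Method 1
clinical     = [(+2, -2), (0, -2), (0, -2), (-2, -2)]   # Method 2

def observed(pairs):
    """Observed scores O = T + E + S for one method."""
    return [T + e + s for e, s in pairs]

def mean(values):
    return sum(values) / len(values)

o_self = observed(self_reports)
o_clin = observed(clinical)

print("self-report mean:", mean(o_self))        # overestimates T
print("clinical mean:", mean(o_clin))           # underestimates T
print("combined mean:", mean(o_self + o_clin))  # biases offset
```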
               Another example

• One problem with the use of self-report questionnaire
  rating scales is that some people tend to give high (or
  low) answers consistently (i.e., regardless of the
  question being asked).
• This is sometimes referred to as a "yea-saying" or
  "nay-saying" bias.
    1 = strongly disagree | 5 = strongly agree

Item                                            T    S      O
I think I am a worthwhile person.               4    +1     5
I have high self-esteem.                        4    +1     5
I am confident in my ability to meet
challenges in life.                             4    +1     5
My friends and family value me as a person.     4    +1     5
 Average score:                                 4    +1     5

In this example, we have someone with relatively high self-esteem, but
this person systematically rates questions one point higher than he or
she should.
    1 = strongly disagree | 5 = strongly agree

Item                                            T    S      O
I think I am a worthwhile person.               4    +1     5
I have high self-esteem.                        4    +1     5
I am NOT confident in my ability to meet
challenges in life.                             2    +1     3
My friends and family DO NOT value me as a
person.                                         2    +1     3
 Average score (after reverse keying):          4    +1     4

If we "reverse key" half of the items, the bias averages out. Responses
to reverse keyed items are counted in the opposite direction (on this
1-to-5 scale, 6 minus the response).

T: (4 + 4 + [6-2] + [6-2]) / 4 = 4
O: (5 + 5 + [6-3] + [6-3]) / 4 = 4
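The reverse-keying arithmetic can be sketched as follows. The responses are the observed (O) values from the table; `reverse_key` is a hypothetical helper that assumes the 1-to-5 scale shown above.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # 1 = strongly disagree, 5 = strongly agree

def reverse_key(response):
    """Flip a response so that 1 <-> 5, 2 <-> 4, and 3 stays 3."""
    return SCALE_MIN + SCALE_MAX - response

# (observed response, is the item reverse keyed?) for the four items.
items = [(5, False), (5, False), (3, True), (3, True)]

# Score each item, flipping only the reverse keyed ones.
scored = [reverse_key(r) if is_reversed else r for r, is_reversed in items]
average = sum(scored) / len(scored)

print("scored items:", scored)
print("average:", average)
```

The +1 bias pushes the regular items up and the rescored reverse keyed items down, so it drops out of the average.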
                      Validity

• To the extent to which a measure has validity, we say
  that it measures what it is supposed to measure
• Question: How do you assess validity?




                      ** Very tough question to answer! **
 Different ways to think about validity

• To the extent that a measure has validity, we can say
  that it measures what it is supposed to measure.

• There are different reasons for measuring
  psychological variables. The precise way in which
  we assess validity depends on the reason that we’re
  taking the measurements in the first place.
                    Prediction

• As an example, if one’s goal is to develop a way to
  determine who is at risk for developing
  schizophrenia, one’s goal is prediction.
              Predictive Validity

• We may begin by obtaining a group of people who
  have schizophrenia and a group of people who do
  not.
• Then, we may try to figure out which kinds of
  antecedent variables differentiate the two groups.
Antecedent variable                            Correct classifications

Lost a parent before the age of 10             10%
Parent or grandparent had schizophrenia        50%
Mother was cold and aloof to the person
when he or she was a child                     15%
               Predictive Validity

• In short, some of these variables appear to be better
  than others at discriminating schizophrenics from
  non-schizophrenics
• The degree to which a measure can predict what it is
  supposed to predict is called its predictive validity.
• When we are taking measurements for the purpose
  of prediction, we assess validity as the degree to
  which those predictions are accurate or useful.
                               Reality: Schizophrenic
                               No                Yes
Measure: Schizophrenic   No    40                10
                         Yes   10                40

  80% ( [40 + 40] / 100) of people were correctly classified
  (50% base rate)
                               Reality: Schizophrenic
                               No                Yes
Measure: Schizophrenic   No    10                10
                         Yes   40                40

  50% ( [40 + 10] / 100) of people were correctly classified
  (with a 50% base rate. Yuck.)
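                               Reality: Schizophrenic
                               No                Yes
Measure: Schizophrenic   No    98                0
                         Yes   1                 1

  99% ( [98 + 1] / 100) of people were correctly classified, but note
  the base rate problem: only 1 person in 100 is schizophrenic, so
  classifying everyone as "No" would also be 99% correct.
  Cohen’s kappa corrects for the agreement expected by chance. Kappa in
  this example is about .66.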
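A minimal sketch of percent correct and Cohen’s kappa for the three classification tables, using the standard formula kappa = (p_obs - p_exp) / (1 - p_exp). The function name and argument order are assumptions made for this illustration.

```python
def accuracy_and_kappa(no_no, no_yes, yes_no, yes_yes):
    """Cell counts are named measure_reality, e.g. no_yes means the
    measure said No while the person is in fact Yes."""
    total = no_no + no_yes + yes_no + yes_yes
    p_obs = (no_no + yes_yes) / total           # proportion correct

    # Marginal proportions for the measure (rows) and reality (columns).
    measure_no = (no_no + no_yes) / total
    measure_yes = (yes_no + yes_yes) / total
    reality_no = (no_no + yes_no) / total
    reality_yes = (no_yes + yes_yes) / total

    # Agreement expected by chance if measure and reality were independent.
    p_exp = measure_no * reality_no + measure_yes * reality_yes
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return p_obs, kappa

tables = {
    "table 1": (40, 10, 10, 40),
    "table 2": (10, 10, 40, 40),
    "table 3": (98, 0, 1, 1),
}
for name, cells in tables.items():
    acc, kappa = accuracy_and_kappa(*cells)
    print(f"{name}: correct = {acc:.2f}, kappa = {kappa:.2f}")
```

In the second table kappa is 0: the measure does no better than chance even though half of the people are classified correctly.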
              Construct Validity

• Sometimes we’re not interested in measuring
  something just for "technological" purposes, such as
  prediction.
• We may be interested in measuring a construct in
  order to learn more about it
   – Example: We may be interested in measuring self-
     esteem not because we want to predict something
     with the measure per se, but because we want to
     know how self-esteem develops, whether it
     develops differently for males and females, etc.
              Construct Validity

• Notice that this is much different than what we were
  discussing before. In our schizophrenia example, it
  doesn’t matter whether our measure of schizophrenia
  really measured schizophrenic tendencies per se.
• As long as the measure helps us predict
  schizophrenia well, we don’t really care what it
  measures.
              Construct Validity

• When we are interested in the theoretical construct
  per se, however, the issue of exactly what is being
  measured becomes much more important.
• The general strategy for assessing construct
  validity involves (a) explicating the theoretical
  relations among relevant variables and (b) examining
  the degree to which the measure of the construct
  relates to things that it should and fails to relate to
  things that it should not.
            Nomological Network

• The nomological network represents the interrelations
  among variables involving the construct of interest.

  [Figure: nomological network. Self-esteem is linked positively (+)
  to "achieve in school" and "ability to cope," and negatively (-) to
  "distrust friends."]
    Nomological Network & Validity

• The process of assessing construct validity basically
  involves determining the degree to which our
  measure of the construct behaves in the way
  assumed by the theoretical network in which it is
  embedded.
• If, theoretically, people with high self-esteem should
  be more likely to succeed in school, then our
  measure of self-esteem should be able to predict
  people’s grades in school.
              Construct Validity

• Notice here that establishing construct validity
  involves prediction. The difference between
  prediction in this context and prediction in the
  previous context is that we are no longer trying to
  predict school performance as best as we possibly
  can.
• Our measure of self-esteem should only predict
  performance to the degree to which we would expect
  these two variables to be related theoretically.
             Discriminant Validity

• The measure should also fail to be related to
  variables that, theoretically, are unrelated to self-esteem.
• The ability of a measure to fail to predict
  irrelevant variables is referred to as the
  measure’s discriminant validity.

  [Figure: the same nomological network (self-esteem linked + to
  "achieve in school" and "ability to cope," and - to "distrust
  friends"), with "like coffee" added as a variable that has no link
  to self-esteem.]
        Validity: Assessing validity

• Finally, it is useful, but not necessary, for a measure
  to have face validity.
• Face validity: The degree to which a measure
  appears to measure what it is supposed to
  measure.
• A questionnaire item designed to measure self-
  esteem that reads "I have high self-esteem" has face
  validity. An item that reads "I like cabbage in my
  Frosted Flakes" does not.
• In the context of prediction, face validity doesn’t
  matter. In the context of construct validity, it matters
  more.
  A Final Note on Construct Validity

• The process of establishing construct validity is one
  of the primary enterprises of psychological research.
• When we are measuring the association between two
  variables to assess a measure’s predictive or
  discriminant validity, we are evaluating both (a) the
  quality of the measure and (b) the soundness of the
  nomological network.
• It is not unusual for researchers to refine the
  nomological network as they learn more about how
  various measures are inter-related.

								