Validity _ Reliability by dandanhuanghuang


									Validity and Reliability
           Validity & Reliability
• Internal validity - are you measuring what you
  think you are measuring?
   – Effects on Internal Validity:
      •   History
      •   Maturation
      •   Testing
      •   Subject selection
      •   Subject mortality/attrition
      •   Instrumentation
      •   Statistical regression
      •   Selection bias
      •   Selection/maturation interaction
• External validity - generalizability?
  – Effects on External Validity:
     •   Hawthorne effect
     •   Replication
     •   Multiple treatments
     •   Researcher effect
• How can we control these?
  – Internal validity
     • Randomization
     • Calibration
     • Placebo, Blind, Double Blind Setups
  – External validity
     • Randomization
     • Ecological validity
     • Design
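A minimal sketch of randomization as a control, written in Python. The subject IDs, the group names, and the `randomize` helper are hypothetical and not from the slides; the seed is fixed only so the example is repeatable.

```python
import random


def randomize(subjects, groups=("treatment", "control"), seed=None):
    """Randomly assign subjects to groups of (nearly) equal size."""
    rng = random.Random(seed)          # local generator; seed is optional
    shuffled = list(subjects)          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    # Deal the shuffled subjects out to the groups in turn.
    for index, subject in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(subject)
    return assignment


# Hypothetical subject IDs S01..S10
subjects = [f"S{n:02d}" for n in range(1, 11)]
allocation = randomize(subjects, seed=1)
for group, members in allocation.items():
    print(group, members)
```

Because chance alone decides who lands in which group, subject characteristics (selection, maturation and so on) should be spread roughly evenly across groups rather than lined up with the treatment.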
     Four Types of Measurement Validity
• Face or logical
   – measure obviously involves the performance being measured
       • static balance - balancing on one foot
            – Researchers would like more objective evidence than this
• Criterion - validated vs. a criterion
   – 2 types:
       • Concurrent - an instrument is correlated with some criterion measured
         at the same time (concurrently)
            – Stratford et al. (1987). Diagnostic values of knee extension tracings in
              suspected ACL tears. Phys Ther.
       • Predictive - use of a criterion to predict later behavior
            – Harris et al. (1984). Predictive validity of the “Movement Assessment of
              Infants”. Devel Behav Ped.
  Measurement Validity (cont’d)
• Construct
  – degree to which a test measures some hypothetical
    construct; established by relating the test to some
     • What is strength?
         – move vs. gravity, speed specific torque, lift a weight a number of
           times in a specified time, functional task completed, type of
           muscle activity...
     • All are strength!
  – use of operational definitions
     • does not guarantee construct validity
         – is the operational definition appropriate? sensitive enough?
               Reliability
• “the degree to which a measurement is free from
  random error”
   – Effects on Reliability:
      •   Subject fatigue
      •   Subject motivation
      •   Subject learning
      •   Subject ability
      •   Tester skill
      •   Different testers
      •   Test environment
              Types of Reliability
• Test-retest
   – stability of an observation across two points in time.
• Split-half
   – do the parts of the instrument agree, i.e. measure the same thing?
• Inter-rater
   – scores of a trained observer on a measure agree with scores of a
     second trained observer.
• Intra-rater
   – scores of a trained observer agree with scores of the same trained
     observer on a repeat observation
       • test-retest, videotape
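A rough sketch of how two of these could be put into numbers, in Python. The slides do not name a statistic: Pearson correlation for test-retest and simple percent agreement for inter-rater are assumptions here (other indices, such as an intraclass correlation or kappa, are often preferred), and all scores are made up.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def percent_agreement(rater_1, rater_2):
    """Share of observations on which two raters gave the same score."""
    matches = sum(a == b for a, b in zip(rater_1, rater_2))
    return matches / len(rater_1)


# Hypothetical test-retest data: same subjects, two sessions
session_1 = [10, 12, 15, 11, 14]
session_2 = [11, 12, 14, 10, 15]
test_retest_r = pearson_r(session_1, session_2)

# Hypothetical inter-rater data: two testers grading the same six subjects
rater_a = [5, 4, 4, 3, 5, 4]
rater_b = [5, 4, 3, 3, 5, 4]
agreement = percent_agreement(rater_a, rater_b)

print(f"test-retest r = {test_retest_r:.2f}")
print(f"inter-rater agreement = {agreement:.0%}")
```

The same `percent_agreement` call would serve for intra-rater reliability if both lists came from one tester scoring the same videotape twice.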
               Measurement error
• Sources of error:
   – subject
      • mood, fatigue, injury type, practice, motivation, knowledge
   – testing
      • test directions, multiple testers
   – evaluation
      • tester's competency, criteria (subjective/objective)
   – instrumentation
      • calibration, sampling rate
    Other measurement issues
• Whenever you acquire measured data, you
  should make every effort to maximize its
  accuracy and precision.
  – Accuracy is how close the measurement comes
    to the true value.
  – Precision is how close the measurements are
    to each other (units – mm, grades, Newtons).
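One way to make the two ideas concrete, sketched in Python. Treating accuracy as the mean error against the true value and precision as the standard deviation of repeated measurements is an assumption of this sketch, not something the slides specify; the readings are invented.

```python
from statistics import mean, stdev


def accuracy_error(measurements, true_value):
    """Mean error: how far the average reading sits from the true value."""
    return mean(measurements) - true_value


def precision_spread(measurements):
    """Sample standard deviation: how tightly the readings cluster."""
    return stdev(measurements)


true_length_mm = 100.0                                   # hypothetical true value
precise_not_accurate = [104.9, 105.0, 105.1, 105.0]      # tight cluster, off target
accurate_not_precise = [96.0, 104.0, 98.0, 102.0]        # on target on average, scattered

for label, data in [("precise, not accurate", precise_not_accurate),
                    ("accurate, not precise", accurate_not_precise)]:
    print(f"{label}: error = {accuracy_error(data, true_length_mm):+.2f} mm, "
          f"spread = {precision_spread(data):.2f} mm")
```

A small error with a large spread means the instrument is right on average but any single reading is unreliable; a small spread with a large error points to a calibration problem.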
              Accuracy / Precision
[Three figure slides; the images and parts of their captions were lost.
 Surviving caption text: “Not precise”, “Accuracy but not very”,
 “Precise and”, “Look out!”]
    So what… Get to the point!
• We want both
  accuracy and precision.

• Think of PT tests and
  – VAS, MMT, Pain
    Provocation Tests etc.
      Screening and Diagnosis-
       Sensitivity, Specificity
• Sensitivity = proportion or percentage of
  individuals with a particular diagnosis who are
  correctly identified as positive by the test (true
  positive).
   – Compared to people who do have the condition.
• Specificity = proportion or percentage of individuals without a
  particular diagnosis who are correctly identified as
  negative by the test (true negative).
   – Compared to people who do not have the condition.
               Sensitivity / Specificity
                                   Condition compared to the
                                   Gold Standard
                                   Present            Absent
Condition based on     Present     A                  B
the test being         Absent      C                  D
evaluated

                 Sensitivity = A/(A+C)
                 Specificity = D/(B+D)
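The two formulas can be sketched in Python as below. The cell counts are hypothetical, chosen only to give round numbers; A to D follow the table (A = true positives, B = false positives, C = false negatives, D = true negatives).

```python
def sensitivity(a, c):
    """A/(A+C): share of people WITH the condition that the test flags."""
    return a / (a + c)


def specificity(b, d):
    """D/(B+D): share of people WITHOUT the condition that the test clears."""
    return d / (b + d)


# Hypothetical 2x2 counts (not from any study)
A, B = 45, 30   # test positive: condition present / absent
C, D = 5, 20    # test negative: condition present / absent

sens = sensitivity(A, C)
spec = specificity(B, D)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

Note that each formula uses only one column of the table: sensitivity looks only at people who have the condition, specificity only at people who do not.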
Predictive value – Tells the other
        part of the story.
• Positive predictive value = percentage of those
  identified by the test as positive who actually have
  the diagnosis.
   – Compared to people who do and do not have the condition.
• Negative predictive value = percentage of those
  identified by the test as negative who actually do
  not have the diagnosis.
   – Compared to people who do and do not have the condition.
                   Predictive value
                                   Compared to the Gold Standard
                                   Present            Absent
Based on the test      Present     A                  B
being evaluated        Absent      C                  D

               Positive predictive value = A/(A+B)
               Negative predictive value = D/(C+D)
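A matching Python sketch for the predictive values. As before, the counts are hypothetical and the function names are illustrative only.

```python
def positive_predictive_value(a, b):
    """A/(A+B): share of positive test results that are truly positive."""
    return a / (a + b)


def negative_predictive_value(c, d):
    """D/(C+D): share of negative test results that are truly negative."""
    return d / (c + d)


# Hypothetical 2x2 counts (not from any study)
A, B = 45, 30   # test positive: condition present / absent
C, D = 5, 20    # test negative: condition present / absent

ppv = positive_predictive_value(A, B)
npv = negative_predictive_value(C, D)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

Here each formula reads across a row of the table, so it mixes people who do and do not have the condition; this is why predictive values shift with how common the condition is in the sample, while sensitivity and specificity do not.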
     Example from the literature.
• Tennent, TD, Beach, WR and Meyers, JE (2003) A Review of the
  Special Tests Associated with Shoulder Examination: Part I-The
  Rotator Cuff Tests. Am J of Sports Med. 31: 154-160.
    – Hawkins Test – Forward flexing the humerus to 90 degrees and
      forcibly internally rotating the shoulder.
    – Sensitivity = 92% for bursitis and 88% for cuff abnormalities.
        • Got it right, positively identified, people that have it!
    – Specificity = 44% for bursitis and 43% for cuff abnormalities.
        • Got it right, negatively identified, people that don’t have it!
    – Positive predictive values = 39% for bursitis and 37% for cuff
      abnormalities.
        • Got it right, positively identified, mixed sample.
    – Negative predictive values = 93.1% for bursitis and 90% for cuff
      abnormalities.
        • Got it right, negatively identified, mixed sample.
• There are over 30 special tests associated
  with a shoulder examination (Rotator
  Cuff/Instability Tests).
  – “Unfortunately, often these tests are of little
    help in confirming a diagnosis and many, for
    example, Yergason’s sign, were originally
    described without recourse to evidence based
    medicine, but now have become part of the
    standard orthopedic criteria.” (Tennent et al., 2003, AJSM,
    p. 160)
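Why can a test with 92% sensitivity have a positive predictive value under 40%? A hedged Python sketch, using the Hawkins test sensitivity and specificity for bursitis quoted earlier. The prevalence values are hypothetical (the slides do not report the proportion of the sample with bursitis), and the formulas are the standard Bayes' rule rearrangement of the 2x2 table.

```python
def ppv(sens, spec, prevalence):
    """Positive predictive value from sensitivity, specificity, prevalence."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sens, spec, prevalence):
    """Negative predictive value from sensitivity, specificity, prevalence."""
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    return true_neg / (true_neg + false_neg)


HAWKINS_SENS = 0.92   # bursitis, as quoted on the slide
HAWKINS_SPEC = 0.44   # bursitis, as quoted on the slide

# Hypothetical prevalences, for illustration only
for prevalence in (0.10, 0.25, 0.50):
    print(f"prevalence = {prevalence:.0%}: "
          f"PPV = {ppv(HAWKINS_SENS, HAWKINS_SPEC, prevalence):.2f}, "
          f"NPV = {npv(HAWKINS_SENS, HAWKINS_SPEC, prevalence):.2f}")
```

With specificity this low, more than half of the people without the condition still test positive, so positives are dominated by false positives unless the condition is common. A negative result, by contrast, is fairly trustworthy.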
            You should ask…
– Are some tests we use better than others?
   •   Reliable?
   •   Specific?
   •   Sensitive?
   •   Positive prediction?
   •   Negative prediction?
– Are the tests we use performed as originally described?
– Are the findings of the tests we use interpreted as
  originally intended?
