# Validity and reliability: screening and diagnostic tests

```
Lecture 3
Validity of screening and diagnostic tests
• Reliability: kappa coefficient
• Criterion validity:
–   “Gold” or criterion/reference standard
–   Sensitivity, specificity, predictive value
–   Relationship to prevalence
–   Likelihood ratio
–   ROC curve
–   Diagnostic odds ratio

Clinical/public health applications
• screening:
– for asymptomatic disease (e.g., Pap test, mammography)
– for risk (e.g., family history of breast cancer)
• case-finding: testing of patients for diseases unrelated to their complaint
• diagnostic: to help make a diagnosis in symptomatic disease or to follow up on a screening test

Evaluation of screening and diagnostic tests
• Performance characteristics
– test alone
• Effectiveness (on outcomes of disease):
– test + intervention

Criteria for test selection
•   Reliability
•   Validity
•   Feasibility
•   Simplicity
•   Cost
•   Acceptability

Measures of inter- and intra-rater
reliability: categorical data
• Percent agreement
– limitation: value is affected by prevalence -
higher if very low or very high prevalence
• Kappa statistic
– takes chance agreement into account
– defined as fraction of observed agreement not
due to chance

Kappa statistic
Kappa = (p(obs) - p(exp)) / (1 - p(exp))

p(obs): proportion of observed agreement
p(exp): proportion of agreement expected by chance

Example of Computation of Kappa

Agreement between the First and the Second Readings to Identify Atherosclerosis Plaque
in the Left Carotid Bifurcation by B-Mode Ultrasound Examination in the
Atherosclerosis Risk in Communities (ARIC) Study

                       First reading
Second reading     Plaque     Normal     Total
Plaque                140         52       192
Normal                 69        725       794
Total                 209        777       986

Observed agreement = (140 + 725)/986 = 0.877

Chance agreement for plaque-plaque cell (expected count) = (209 x 192)/986 = 40.7

Chance agreement for normal-normal cell (expected count) = (777 x 794)/986 = 625.7

Total chance agreement = (40.7 + 625.7)/986 = 0.676

Kappa = (0.877 - 0.676) / (1 - 0.676) = 0.62

Interpretation of kappa
• Various suggested interpretations
• Example: Landis & Koch, Fleiss
excellent:    over 0.75
fair to good: 0.40 - 0.75
poor:         less than 0.40

Validity (accuracy) of
screening/diagnostic tests
• Face validity, content validity: judgement of the
appropriateness of content of measurement
• Criterion validity
– concurrent
– predictive

Normal vs abnormal
• Statistical definition
– “Gaussian” or “normal” distribution
• Clinical definition
– using criterion

Selection of criterion
(“gold” or criterion standard)
• Concurrent
– salivary screening test for HIV
– history of cough more than 2 weeks (for TB)
• Predictive
– APACHE (Acute Physiology and Chronic Health Evaluation) instrument for ICU patients
– blood lipid level
– maternal height

Sensitivity and specificity
Assess correct classification of:
• People with the disease (sensitivity)
• People without the disease (specificity)

                          "True" disease status
Screening test result     Present                    Absent
Positive                  A ("true positives")       B ("false positives")
Negative                  C ("false negatives")      D ("true negatives")

Sensitivity of screening test = A / (A + C)

Specificity of screening test = D / (B + D)

Predictive value of positive test = A / (A + B)

Predictive value of negative test = D / (C + D)

Predictive value
• More relevant to clinicians and patients
• Affected by prevalence

Choice of cut-point
If higher score increases probability of disease
• Lower cut-point:
– increases sensitivity, reduces specificity
• Higher cut-point:
– reduces sensitivity, increases specificity

Considerations in selection of
cut-point
Implications of false positive results
• burden on follow-up services
• labelling effect
Implications of false negative results
• Failure to intervene

Receiver operating characteristic (ROC) curve
• Evaluates test over range of cut-points
• Plot of sensitivity against 1-specificity
• Area under curve (AUC) summarizes
performance:
– AUC of 0.5 = no better than chance

Likelihood ratio
• Likelihood ratio (LR) for a positive test = sensitivity / (1 - specificity)
• Used to compute post-test odds of disease from pre-test odds:
    post-test odds = pre-test odds x LR
• pre-test odds derived from prevalence
• post-test odds can be converted to predictive value of positive test

Example of LR
• prevalence of disease in a population is 25%
• sensitivity is 80%
• specificity is 90%
• pre-test odds = 0.25 / (1 - 0.25) = 1/3
• likelihood ratio = 0.80 / (1 - 0.90) = 8

Example of LR (cont)
• If prevalence of disease in a population is 25%
• pre-test odds = 0.25 / (1 - 0.25) = 1/3
• post-test odds = 1/3 x 8 = 8/3
• predictive value of positive result = (8/3) / (1 + 8/3) = 8/(3 + 8) = 8/11 = 73%

Diagnostic odds ratio
• Ratio of odds of positive test in diseased to odds of positive test in non-diseased:
    DOR = (a/c) / (b/d) = (a x d) / (b x c)
• From previous example (per 40 people: a = 8, b = 3, c = 2, d = 27):
    DOR = (8 x 27) / (3 x 2) = 36

Summary: LR and DOR
• Values:
– 1 indicates that test performs no better than
chance
– >1 indicates better than chance
– <1 indicates worse than chance
• Relationship to prevalence?

Applications of LR and DOR
• Likelihood ratio: Primarily in clinical
context, when interest is in how much the
likelihood of disease is increased by use of
a particular test
• Diagnostic odds ratio: Primarily in research,
when interest is in factors that are
associated with test performance (e.g., using
logistic regression)

```
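The kappa computation in the ARIC example can be checked with a few lines of Python. This is a minimal sketch and not part of the original lecture: the function name `cohen_kappa` and the nested-list table layout (rows taken as the second reading, columns as the first reading) are assumptions made for illustration.

```python
def cohen_kappa(table):
    """Return (observed agreement, chance agreement, kappa) for a square table.

    table[i][j] is the count rated category i on one reading and
    category j on the other (hypothetical layout for this sketch).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    # Observed agreement: proportion of counts on the diagonal.
    p_obs = sum(table[i][i] for i in range(k)) / n

    # Chance agreement: expected diagonal counts (row total x column total / n),
    # summed and divided by n again to give a proportion.
    p_exp = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2

    kappa = (p_obs - p_exp) / (1 - p_exp)
    return p_obs, p_exp, kappa


# Counts from the ARIC plaque example in the lecture.
aric = [[140, 52],
        [69, 725]]
p_obs, p_exp, kappa = cohen_kappa(aric)
print(f"observed={p_obs:.3f} chance={p_exp:.3f} kappa={kappa:.2f}")
```

With the ARIC counts this reproduces the slide values: observed agreement 0.877, chance agreement 0.676, kappa 0.62.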
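The 2x2 definitions and the likelihood ratio example from the lecture (prevalence 25%, sensitivity 80%, specificity 90%) can be reproduced as follows. This is a sketch under stated assumptions, not the lecturer's code: the helper names `two_by_two` and `summarize` are hypothetical, and sensitivity and specificity are assumed to stay fixed as prevalence changes.

```python
def two_by_two(prevalence, sensitivity, specificity, n=1.0):
    """Expected cell values (A, B, C, D) in the lecture's 2x2 layout."""
    diseased = n * prevalence
    non_diseased = n - diseased
    a = diseased * sensitivity       # true positives
    c = diseased - a                 # false negatives
    d = non_diseased * specificity   # true negatives
    b = non_diseased - d             # false positives
    return a, b, c, d


def summarize(prevalence, sensitivity, specificity):
    """Predictive values, likelihood ratio, post-test odds and DOR."""
    a, b, c, d = two_by_two(prevalence, sensitivity, specificity)
    lr = sensitivity / (1 - specificity)
    pre_test_odds = prevalence / (1 - prevalence)
    post_test_odds = pre_test_odds * lr
    return {
        "ppv": a / (a + b),                               # A / (A + B)
        "npv": d / (c + d),                               # D / (C + D)
        "lr": lr,
        "post_test_odds": post_test_odds,
        "ppv_from_odds": post_test_odds / (1 + post_test_odds),
        "dor": (a * d) / (b * c),
    }


# Lecture example: prevalence 25%, sensitivity 80%, specificity 90%.
example = summarize(0.25, 0.80, 0.90)
print({name: round(value, 3) for name, value in example.items()})

# Same test applied at several prevalences.
for prevalence in (0.01, 0.10, 0.25, 0.50):
    s = summarize(prevalence, 0.80, 0.90)
    print(f"prevalence={prevalence:.2f} ppv={s['ppv']:.3f} "
          f"lr={s['lr']:.1f} dor={s['dor']:.1f}")
```

For the lecture example, the predictive value computed from the 2x2 cells and the one computed from post-test odds agree (8/11, about 73%). The loop shows that the predictive value of a positive test rises with prevalence, while the LR and DOR do not change as long as sensitivity and specificity are held fixed.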