Principles of Epidemiology Lecture 8
Case-Control Studies (I)
Wei J. Chen, MD, ScD
Institute of Epidemiology, College of Public Health, National Taiwan University
Outline
I. Rationale
  I-1. Sampling from a fixed cohort
  I-2. Sampling from a dynamic cohort (density sampling)
II. Source of controls
III. Comparability
IV. Variants of the case-control design
I. Rationale
Basic Elements of Case-Control Studies
Nature
  Select cases and controls
  Assess their exposure experience retrospectively
Estimates
  Odds
    Odds for exposure in cases: Pr(E+ | D+) / Pr(E- | D+)
    Odds for exposure in controls: Pr(E+ | D-) / Pr(E- | D-)
  Odds ratio
    OR = [Pr(E+ | D+) / Pr(E- | D+)] / [Pr(E+ | D-) / Pr(E- | D-)]
Rationales
Early
  Rare disease assumption
  Sampling from a fixed cohort
  Sampling from a dynamic cohort
Recent
  Density sampling, or sampling from the person-time pool
I-1. Sampling from a Fixed Cohort
                 Exposure = 1   Exposure = 0
Disease = 1          X1             X0
Disease = 0          Z1             Z0
Total                N1             N0

Cumulative incidence ratio (CIR)
  CIR = (X1/N1)/(X0/N0) = (X1/X0)/(N1/N0)
  If X1 « N1 and X0 « N0, then Z1 ≈ N1 and Z0 ≈ N0
  CIR ≈ (X1/X0)/(Z1/Z0) = exposure odds among cases / exposure odds among noncases
Noncases sampled from the population of noncases
  f: sampling fraction among noncases, giving Y1 = f·Z1 and Y0 = f·Z0 sampled noncases
  CIR ≈ (X1/X0)/(f·Z1/f·Z0) = (X1/X0)/(Y1/Y0)
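The rare-disease argument can be checked numerically. The sketch below uses a hypothetical cohort (the counts and the sampling fraction are illustrative assumptions, not figures from the lecture) and follows the slide's notation: X for cases, Z for noncases, N for totals, Y for sampled noncases.

```python
def cumulative_incidence_ratio(x1, n1, x0, n0):
    """CIR = (X1/N1) / (X0/N0)."""
    return (x1 / n1) / (x0 / n0)


def exposure_odds_ratio(x1, x0, y1, y0):
    """Exposure odds among cases divided by exposure odds among noncases."""
    return (x1 / x0) / (y1 / y0)


# Hypothetical rare-disease cohort (illustrative numbers only).
N1, N0 = 20_000, 80_000        # exposed / unexposed cohort members
X1, X0 = 40, 80                # exposed / unexposed cases
Z1, Z0 = N1 - X1, N0 - X0      # exposed / unexposed noncases
f = 0.01                       # assumed sampling fraction among noncases
Y1, Y0 = f * Z1, f * Z0        # expected numbers of sampled noncases

cir = cumulative_incidence_ratio(X1, N1, X0, N0)
or_all_noncases = exposure_odds_ratio(X1, X0, Z1, Z0)
or_sampled = exposure_odds_ratio(X1, X0, Y1, Y0)

print(f"CIR = {cir:.4f}")
print(f"OR using all noncases = {or_all_noncases:.4f}")
print(f"OR using sampled noncases = {or_sampled:.4f}")
```

Because f appears in both Y1 and Y0, it cancels in the odds ratio; the small gap between the OR and the CIR comes only from replacing N with Z.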
Example of Controls Selected from a Fixed Cohort
Breslow and Day (1980)
A true cohort
  N = 10,000, exposure rate = 30%
  IR (exposed) = 0.02 / year; IR (unexposed) = 0.01 / year
For 3 years
  Exposed cases: 3000 x (1 - e^(-IR·D)) = 3000 x (1 - e^(-0.06)) = 175
  Unexposed cases: 7000 x (1 - e^(-0.03)) = 207
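The case counts in this example follow from the exponential formula for cumulative incidence under a constant rate. A minimal sketch reproducing them (the function name is an illustrative choice):

```python
import math


def expected_cases(cohort_size, rate_per_year, years):
    """Expected cases in a closed cohort under a constant incidence rate."""
    return cohort_size * (1 - math.exp(-rate_per_year * years))


exposed_cases = expected_cases(3000, 0.02, 3)
unexposed_cases = expected_cases(7000, 0.01, 3)

print(round(exposed_cases), round(unexposed_cases))  # 175 207
```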
(1) Analyzed as a Case-control Study
Calculate OR instead of IRR
             Diseased   Disease-free   Total
Exposed         175         2825        3000
Unexposed       207         6793        7000
Total           382         9618       10000
Odds ratio = (175x6793)/(207x2825) = 2.03
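For comparison, the sketch below computes the odds ratio, the cumulative incidence ratio, and the incidence rate ratio from the same cohort figures. It is only an arithmetic check on the numbers given in the example.

```python
# Counts from the full-cohort table in the example.
exposed_cases, exposed_noncases = 175, 2825
unexposed_cases, unexposed_noncases = 207, 6793

odds_ratio = (exposed_cases * unexposed_noncases) / (unexposed_cases * exposed_noncases)
cir = (exposed_cases / 3000) / (unexposed_cases / 7000)
irr = 0.02 / 0.01  # ratio of the stated incidence rates

print(f"OR  = {odds_ratio:.2f}")  # 2.03
print(f"CIR = {cir:.2f}")         # 1.97
print(f"IRR = {irr:.2f}")         # 2.00
```

With a 3-year risk of roughly 3 to 6 percent, the three measures are close but not identical.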
(2) A Case-Control Study
All patients were ascertained as cases
Along with a 10% sample of controls
The sampling fraction for cases, and likewise for controls, must be the same regardless of exposure category

             Diseased   Disease-free   Total
Exposed         175          282         457
Unexposed       207          679         886
Total           382          961        1343
Odds ratio = (175x679)/(207x282) = 2.04
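A short sketch of why the 10% control sample leaves the odds ratio essentially unchanged: the same fraction is applied to exposed and unexposed noncases, so it cancels. Truncating the sampled counts to whole persons, as in the table, accounts for the small shift from 2.03 to 2.04.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Cross-product odds ratio for a 2x2 table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


exposed_cases, unexposed_cases = 175, 207
exposed_noncases, unexposed_noncases = 2825, 6793

# 10% sample of the disease-free group, truncated to whole persons.
exposed_controls = exposed_noncases // 10      # 282
unexposed_controls = unexposed_noncases // 10  # 679

or_full = odds_ratio(exposed_cases, unexposed_cases, exposed_noncases, unexposed_noncases)
or_sampled = odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)

print(f"Full cohort OR = {or_full:.2f}")     # 2.03
print(f"10% sample OR  = {or_sampled:.2f}")  # 2.04
```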
I-2. Density Sampling in Control Selection
Density case-control study design
  I1 = A1/T1, I0 = A0/T0
Goal of case-control design
  Use a control series in place of complete assessment of T1 and T0
  Density sampling: controls are selected in such a way that the relative sizes of T1 and T0 can be validly estimated
Nested within a source population
  The description of the source population corresponds to the ideal eligibility criteria for both cases and controls to be in the study
Pseudo-Rates and Odds Ratio
Goal of control sampling
  The exposure distribution among controls is the same as it is in the source population of cases
Control sampling rate
  B1/T1 = B0/T0 = r, if controls are selected independently of exposure
Pseudo-rate
  (A1/B1) / (A0/B0) = (A1/T1) / (A0/T0)
  The ratio of pseudo-rates is an estimate of the IR ratio
Penalty
  Loss of precision
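The cancellation of the control sampling rate r can be seen in a small numerical sketch. The person-time totals, case counts, and r below are hypothetical values chosen for illustration; A, B, and T follow the slide's notation for cases, controls, and person-time.

```python
# Hypothetical source population (illustrative values only).
T1, T0 = 10_000.0, 40_000.0   # exposed / unexposed person-years
A1, A0 = 200, 400             # exposed / unexposed cases
r = 0.01                      # assumed control sampling rate per person-year

# Controls sampled independently of exposure: B1/T1 = B0/T0 = r.
B1, B0 = r * T1, r * T0

incidence_rate_ratio = (A1 / T1) / (A0 / T0)
pseudo_rate_ratio = (A1 / B1) / (A0 / B0)

print(f"IR ratio          = {incidence_rate_ratio:.3f}")
print(f"Pseudo-rate ratio = {pseudo_rate_ratio:.3f}")
```

In a real study B1 and B0 are random counts, so the pseudo-rate ratio only estimates the IR ratio.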
Features of Density Sampling
A clear definition of the source population is needed
Sampling of controls and cases should be independent of exposure
Main advantage
  Easy to see the equivalence of the odds ratio to the IR ratio
  No rare disease assumption needed
A Hypothetical Scenario for Sampling from Person-time Pool
1) Select a date at random from the case accrual period
2) Select a person at random from the population list
3) Was the subject resident within the predetermined area as of the random date chosen?
4) Repeat 1-3 until the desired number of controls is reached
Asking exposure information: reference point
  Cases: onset of illness
  Controls: the random point in time
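The scenario above can be sketched as a simple rejection loop. The population list, residency dates, and function name are hypothetical; the point is that a person is accepted only if resident on the randomly chosen date, so selection probability is proportional to time spent in the source population.

```python
import random
from datetime import date, timedelta


def sample_density_controls(population, accrual_start, accrual_end, n_controls, rng):
    """population: list of (person_id, resident_from, resident_until)."""
    span_days = (accrual_end - accrual_start).days
    controls = []
    while len(controls) < n_controls:
        # Step 1: random date within the case accrual period.
        ref_date = accrual_start + timedelta(days=rng.randint(0, span_days))
        # Step 2: random person from the population list.
        person_id, resident_from, resident_until = rng.choice(population)
        # Step 3: keep the person only if resident on that date.
        if resident_from <= ref_date <= resident_until:
            controls.append((person_id, ref_date))  # ref_date is the exposure reference point
    return controls


# Hypothetical population list (illustrative only).
population = [
    ("p1", date(2000, 1, 1), date(2000, 12, 31)),
    ("p2", date(2000, 1, 1), date(2000, 6, 30)),
    ("p3", date(2000, 7, 1), date(2000, 12, 31)),
]

controls = sample_density_controls(
    population, date(2000, 1, 1), date(2000, 12, 31), n_controls=5, rng=random.Random(0)
)
print(controls)
```

The same person can be drawn more than once, each time with a different reference date.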
Guidelines of Density Sampling in Control Selection
From the same population that gives rise to cases
Independent of exposure status
Probability of selection proportional to person-time
Risk-set sampling
  Eligible time for a control is the time when one is eligible to become a case
  Risk set: the set of individuals in the source population who are at risk of becoming a case at the time that the case is diagnosed
  Controls are matched to the case with respect to sampling time
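A minimal sketch of risk-set sampling, assuming a hypothetical cohort in which each person has an entry time, an exit time (diagnosis or end of follow-up), and a case indicator. The data layout and function name are illustrative assumptions.

```python
import random


def risk_set_sample(cohort, n_controls, rng):
    """Return (case_id, case_time, controls) for every case in the cohort.

    cohort maps person_id -> (entry_time, exit_time, became_case).
    """
    case_times = sorted(
        (exit_time, pid)
        for pid, (entry, exit_time, became_case) in cohort.items()
        if became_case
    )
    matched_sets = []
    for t, case_id in case_times:
        # Risk set: everyone else under follow-up and still disease-free at time t.
        # People who become cases later than t are still eligible as controls here.
        risk_set = [
            pid
            for pid, (entry, exit_time, _) in cohort.items()
            if pid != case_id and entry <= t < exit_time
        ]
        controls = rng.sample(risk_set, min(n_controls, len(risk_set)))
        matched_sets.append((case_id, t, controls))
    return matched_sets


# Hypothetical cohort (illustrative only): "a" and "b" become cases at times 5 and 8.
cohort = {
    "a": (0, 5, True),
    "b": (0, 8, True),
    "c": (0, 10, False),
    "d": (2, 10, False),
    "e": (6, 10, False),
}

matched_sets = risk_set_sample(cohort, n_controls=2, rng=random.Random(1))
for case_id, t, controls in matched_sets:
    print(case_id, t, controls)
```

In this sketch person "b" can serve as a control for case "a" at time 5 and then appear as a case at time 8.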
Special Situation for Control Selection
An individual selected as a control who later develops the disease and is selected as a case
Counted both as a control and a case
The same person may appear in the control group two or more times
The same person at different times may provide different exposure (or confounder) information
Previous Guidelines on the Selection of Controls
Schlesselman (1982)
  “the control series is intended to provide an estimate of the exposure rate that would be expected to occur in the cases if there were no association between the study disease and exposure”
Miettinen (1976)
  “the controls should be selected in an unbiased manner from those individuals who would have been included in the case series, had they developed the disease under study”
II. Source of Controls
Source of Control Series
Population controls
  Cases are a representative sample of all cases in a precisely defined and identified population
  Control:
    A random sample from a registry
    Selection probability is proportional to the individual’s person-time at risk
Neighborhood controls
  Controls are matched to the cases on neighborhood
  Neighborhood may be related to exposure; this should be accounted for in the analysis
Random digit dialing
  Matched to cases on area code and prefix
Source of Control Series (cont.)
Hospital- or clinic-based controls
  The source population is often not identifiable
  Control selection
    Limit the diagnoses for controls to those not related to the exposure of interest
Other diseases
  In populations with established registries or insurance-claims databases
Friend controls
  May be related to the cases in exposure
  List provision dependent on the cases
  Overlapping
Dead controls
  Proxy respondents if cases are dead
Methods for Obtaining Population-based Controls
Random Digit Dialing (RDD)
  A two-stage sampling method to minimize the chances of calling telephone numbers that are not assigned to households (Waksberg, 1978)
  Any household with k>1 residential telephone numbers is subsampled with probability 1/k
  Screener question: “How many people living in this household (including yourself) are X to Y years old?”
  After enumeration, select a sample randomly (Kish’s sampling tables)
Area Probability Sampling (APS)
  Typically multi-stage
    Block groups
    Segments (one or more blocks)
    Listing of housing units
    Random sample of housing units
Steps for RDD
1) Obtain a list of all telephone area codes and existing prefix numbers (first 6 digits)
2) Add all possible choices for the next two digits; the resulting 8-digit numbers are the Primary Sampling Units (PSUs)
3) Randomly select an 8-digit number and also randomly select the final 2 digits
4) Dial the number
   1) If it is a residential address, select additional final 2 digits until the desired number k is reached; conduct interviews on k+1 numbers
   2) If it is not residential, reject the PSU
5) Repeat steps 1-4 until the desired number of PSUs, m, is reached
6) Total sample size: m(k+1); m and k are chosen to satisfy criteria for an optimal sampling design
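The steps above can be sketched as follows. This is only an illustration under stated assumptions: `is_residential` is a hypothetical stand-in for actually dialing a number, the prefixes are made up, and the sketch assumes each retained PSU contains at least k+1 residential numbers.

```python
import random


def waksberg_rdd(prefixes, is_residential, m, k, rng, max_tries=10_000):
    """Two-stage RDD sketch: retain m PSUs, k+1 residential numbers per PSU."""
    sample = []
    psus_kept = 0
    tries = 0
    while psus_kept < m and tries < max_tries:
        tries += 1
        # Steps 1-3: 6-digit prefix + 2 random digits = 8-digit PSU, then 2 final digits.
        psu = rng.choice(prefixes) + f"{rng.randrange(100):02d}"
        first_number = psu + f"{rng.randrange(100):02d}"
        # Step 4: "dial" the number; reject the PSU if it is not residential.
        if not is_residential(first_number):
            continue
        numbers = [first_number]
        other_endings = [e for e in range(100) if psu + f"{e:02d}" != first_number]
        rng.shuffle(other_endings)
        for ending in other_endings:
            if len(numbers) == k + 1:
                break
            candidate = psu + f"{ending:02d}"
            if is_residential(candidate):
                numbers.append(candidate)
        sample.extend(numbers)
        psus_kept += 1
    return sample


def is_residential(number):
    """Hypothetical oracle: treats every third number as residential."""
    return int(number) % 3 == 0


prefixes = ["212555", "310444"]  # hypothetical area code + prefix combinations
sample = waksberg_rdd(prefixes, is_residential, m=4, k=2, rng=random.Random(0))
print(len(sample), sample[:3])
```

The total sample size is m(k+1), here 4 x 3 = 12 numbers. The sketch does not guard against the same PSU being drawn twice.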
Example of One Kish Selection Table
Selection Table A
If the Number of Eligible Persons is:    Interview the Person Numbered:
1                                        1
2                                        1
3                                        1
4                                        1
5                                        1
6+                                       1

1. Count the Number of Eligible Persons in the Household
2. Locate Number in Left-Hand Column: Place a check in that Row
3. Place a Check in the Corresponding Right-Hand Column
4. The Number Next to the Check Corresponds to the Person Selected as the Respondent
5. Go to Col. 11I and Place an “R” in the Row Which Corresponds to the Selected Respondent
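The lookup in steps 1-4 amounts to a table lookup in which household sizes of six or more share one row. A minimal sketch using only Selection Table A as shown above (the dictionary layout and function name are illustrative assumptions):

```python
# Selection Table A: number of eligible persons -> person to interview.
# Key 6 stands for the "6+" row.
TABLE_A = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1}


def select_respondent(n_eligible, table):
    """Return the number of the person to interview for a household."""
    if n_eligible < 1:
        raise ValueError("household has no eligible persons")
    return table[min(n_eligible, 6)]


print(select_respondent(4, TABLE_A))
print(select_respondent(9, TABLE_A))
```

Other selection tables would be passed in the same way, with a different mapping for each table.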
12 Kish Selection Tables: to be used in order
Selection Tables A, B, C, D (If the Number of Eligible Persons is: 1 2 3 4 5 6+)
Interview the Person Numbered: 1 1 1 1 1 1
Selection Tables E, F, G, H (If the Number of Eligible Persons is: 1 2 3 4 5 6+)
Interview the Person Numbered: 1 1 2 2 3 3
Selection Tables I, J, K, L (If the Number of Eligible Persons is: 1 2 3 4 5 6+)
Interview the Person Numbered: 1 2 3 3 3 5
Interview the Person Numbered: 1 1 1 1 1 1
Interview the Person Numbered: 1 1 2 2 3 3
Interview the Person Numbered: 1 2 3 4 5 5
Interview the Person Numbered: 1 1 1 1 2 2
Interview the Person Numbered: 1 2 2 3 4 4
Interview the Person Numbered: 1 2 3 4 5 6
Interview the Person Numbered: 1 1 1 2 2 2
Interview the Person Numbered: 1 2 2 3 4 4
Interview the Person Numbered: 1 2 3 4 5 6
III. Comparability
Misconception about Control Selection
Representativeness
Wrong
Of all persons with the disease
Of the entire nondiseased population
Correct
the source population for the cases is the one that the controls should represent
Exposure opportunity
Not needed, as in a real follow-up study
Comparability of Information
Comparable or nondifferential error in exposure measurement tends to bias the observed odds ratio toward the null
  Not always true
    Unless exposure errors are also independent of errors in other variables
Efforts to ensure comparable exposure information lead to comparable information on other variables
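The simplest case of the first point can be illustrated numerically: a binary exposure, no other variables, and the same sensitivity and specificity in cases and controls. The sensitivity and specificity values are assumptions for illustration; the counts are those of the 10% control sample example earlier in the lecture. The sketch does not cover the exceptions noted above.

```python
def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts after imperfect exposure classification."""
    observed_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    observed_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return observed_exposed, observed_unexposed


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    return (case_exp / case_unexp) / (ctrl_exp / ctrl_unexp)


# True counts: cases 175 exposed / 207 unexposed, controls 282 / 679.
true_or = odds_ratio(175, 207, 282, 679)

# Assumed nondifferential error: same sensitivity and specificity in both groups.
sens, spec = 0.8, 0.9
case_exp, case_unexp = misclassify(175, 207, sens, spec)
ctrl_exp, ctrl_unexp = misclassify(282, 679, sens, spec)
observed_or = odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp)

print(f"True OR     = {true_or:.2f}")      # 2.04
print(f"Observed OR = {observed_or:.2f}")  # 1.65
```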
Number of Control Groups
The value of using more than one control group is quite limited
  A lack of difference between the groups only tells us that both groups incorporate similar net bias
  A difference only tells us that at least one is biased, but does not tell us which is better or which is worse
Timing of Classification and Diagnosis
For cases
  A lag period before diagnosis for exposure assessment
For controls
  Selection time
    Natural event analogous to the case diagnosis time, e.g., time of hospitalization for hospital controls
    Actual time of selection
IV. Variants of the Case-control Design
Variants of the Case-control Design
Case-cohort studies
Nested case-control studies
Cumulative (“Epidemic”) case-control studies
  Controls are selected from those who remain free of disease at the end of the epidemic
Case-only studies
  In studies of gene-environment interaction
Case-crossover studies
  Analogue to the classical crossover study for interventions without carry-over effect
  For each case
    Pre-disease time periods selected as control period
    Exposure at onset vs. exposure during control period
  Example: sexual activity and myocardial infarction
Variants of the Case-control Design (cont.)
Two-stage sampling
  The control series comprises a large number of individuals with limited information (e.g., exposure status)
  A subsample of the controls is investigated for more detailed information (e.g., covariates)
Case-control studies with prevalent cases
  In studies of
    Congenital malformations
    Chronic conditions with ill-defined onset times and limited effect on mortality (e.g., obesity)
Case-Cohort Study
For a fixed cohort
  Cases: all incident cases in a given risk period
  Controls: a random sample from the population at risk at the start of the risk period
Rationale
\[ P(D \mid E) = \frac{P(E \mid D)\,P(D)}{P(E)} \qquad (1) \]
\[ P(D \mid \bar{E}) = \frac{P(\bar{E} \mid D)\,P(D)}{P(\bar{E})} \qquad (2) \]
\[ \frac{(1)}{(2)}:\quad \frac{P(D \mid E)}{P(D \mid \bar{E})} = \frac{P(E \mid D)/P(E)}{P(\bar{E} \mid D)/P(\bar{E})} = \frac{P(E \mid D)/P(\bar{E} \mid D)}{P(E)/P(\bar{E})} \]
  Risk ratio = (exposure odds in cases) / (exposure odds in the total cohort at risk)
Example of Case-Cohort Study
An existing cohort
  Blood drawn on 10,000 individuals
  Control: 400 sampled from the original 10,000
    Typing results: 40+, 360-
Follow-up
  200 with rheumatoid arthritis
    Typing results: 80+, 120-
    OR = (80 x 360)/(40 x 120) = 6
  150 with ankylosing spondylitis
    Typing results: 15+, 135-
    OR = 1
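The two ratios in this example can be reproduced by dividing the exposure odds among cases by the exposure odds in the sampled subcohort, which under the case-cohort rationale estimates the risk ratio. A minimal arithmetic sketch:

```python
def exposure_odds(positive, negative):
    """Odds of a positive typing result."""
    return positive / negative


subcohort_odds = exposure_odds(40, 360)   # 400 controls sampled from the cohort

ratio_ra = exposure_odds(80, 120) / subcohort_odds   # rheumatoid arthritis
ratio_as = exposure_odds(15, 135) / subcohort_odds   # ankylosing spondylitis

print(f"Rheumatoid arthritis:   {ratio_ra:.2f}")  # 6.00
print(f"Ankylosing spondylitis: {ratio_as:.2f}")  # 1.00
```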
Example of Nested Case-control Studies
Risk-set sampling (Sahl et al., 1993)
  Mortality from various cancers and exposure to electromagnetic fields
  Case: cancer case from the worker cohort
  Control
    Individuals in the worker cohort who were alive on the date of death of the case
    Who had the same birth year, sex, and ethnicity as the case
    Randomly select 10 matching controls for each case
Further Readings on Control Selection
Potthoff RF (1994) Telephone sampling in epidemiologic research: to reap the benefits, avoid the pitfalls. American Journal of Epidemiology, 139, 967-978.
Reilly M (1996) Optimal sampling strategies for two-stage studies. American Journal of Epidemiology, 143, 92-100.
Brogan DJ et al. (2001) Comparison of Telephone Sampling and Area Sampling: Response Rates and Within-Household Coverage. American Journal of Epidemiology, 153, 1119-1127.
DiGaetano R & Waksberg J (2002) Commentary: tradeoffs in the development of a sample design for case-control studies. American Journal of Epidemiology, 155, 771-775.