Odds for success


Logistic Regression
Regression method to use when the dependent
variable is dichotomous (1=yes, 0=no).
Used to relate linear functions of explanatory
variables to the probability of “success”.
Example: Association between periodontal status
and 10-year CHD incidence (yes/no).
10-year CHD incidence by periodontal classification
Periodontal classification   No CHD: count (%)   CHD: count (%)
Healthy                      3622 (95.1)         187 (4.9)
Gingivitis                   2308 (93.9)         150 (6.1)
Periodontitis                1657 (86.5)         258 (13.5)
Edentulous                   1823 (80.0)         457 (20.0)
We’d like to adjust for the confounding effect of age.
The logistic model relates the probability of success (p) with the explanatory variable (X) via the relationship:
$$\log\left(\frac{p}{1-p}\right) = \alpha + \beta X$$
Could also have more than one explanatory variable.
The function log[p/(1-p)] is called the “logit”.
Why use the logit function?
Modelling p = α + βX directly may not make sense, because p is restricted to the range (0, 1), while α + βX is theoretically unrestricted.
Logit(p) = log[p/(1-p)] can range from -∞ to ∞.
p/(1-p) has a somewhat well-known interpretation.
It is the “odds” of success.
The interpretation of the regression coefficient, β, is fairly intuitive. See next slide…
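A minimal Python sketch of the logit and its inverse (often called expit). The intercept and slope are hypothetical values chosen only for illustration. The linear predictor α + βX takes values such as -8 or 2, which could never be probabilities, while the back-transformed p always lands in (0, 1) and p/(1-p) equals e^(α+βX).

```python
import math

def logit(p: float) -> float:
    """Log-odds log[p/(1-p)]; defined for 0 < p < 1, ranges over all reals."""
    return math.log(p / (1 - p))

def expit(eta: float) -> float:
    """Inverse logit: maps a real eta back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-eta))

# Hypothetical intercept and slope, chosen only for illustration.
alpha, beta = -3.0, 0.5

for x in [-10, 0, 6, 10]:
    eta = alpha + beta * x   # linear predictor: unrestricted
    p = expit(eta)           # probability: stays inside (0, 1)
    print(f"x={x:>3}  alpha+beta*x={eta:6.2f}  p={p:.4f}  odds={p / (1 - p):.4f}")
```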
Interpretation of coefficients in logistic regression
$$\text{Odds for success} = \frac{p}{1-p}$$
Logistic model:
$$\log\left(\frac{p}{1-p}\right) = \alpha + \beta X$$
→ odds for success when X = x is $e^{\alpha + \beta x}$
To compare odds of success for different values of X
we can look at the Odds Ratio.
In particular, the odds ratio for X = x+1 compared to
X = x (i.e. the effect of a one unit increase in X) is:
$$OR = \frac{\text{odds}(X = x+1)}{\text{odds}(X = x)} = \frac{e^{\alpha + \beta (x+1)}}{e^{\alpha + \beta x}} = e^{\beta}$$
→ β is the log(odds ratio)
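A quick numerical check of this result, as a sketch with hypothetical α and β: the odds are computed from p, and their ratio for a one-unit increase in X comes out the same at every x and matches e^β.

```python
import math

# Hypothetical coefficients, for illustration only.
alpha, beta = -2.0, 0.7

def odds(x: float) -> float:
    """Odds p/(1-p), with p taken from the logistic model at X = x."""
    p = 1 / (1 + math.exp(-(alpha + beta * x)))
    return p / (1 - p)

# The odds ratio for a one-unit increase is the same at every x ...
for x in [-1.0, 0.0, 2.5, 10.0]:
    print(x, odds(x + 1) / odds(x))
# ... and equals exp(beta).
print("exp(beta) =", math.exp(beta))
```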
Hypothesis testing and Confidence Intervals
The hypotheses of interest are usually:
H0: β = 0 vs. H1: β ≠ 0,
which is equivalent to testing whether or not there is an association between X and p.
The test statistic is
$$Z = \frac{b}{SE(b)},$$
where b is the regression estimate of β (estimated by a computer program).
Compare Z to a standard Normal distribution. There should be a large number of events for the normal approximation to be valid.
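A sketch of the Wald test in Python using only the standard library. It takes b and SE(b) as reported by the software; the two-sided p-value uses erfc, since P(|Z| ≥ |z|) = erfc(|z|/√2) for a standard normal Z. The inputs are the Gingivitis row of the unadjusted SPSS output in the example below. Because b and SE are rounded there, the results (Z ≈ 2.04, Z² ≈ 4.14, p ≈ 0.042) differ slightly from the reported Wald = 4.164 and Sig. = .041; the Wald column in that output is Z².

```python
import math

def wald_test(b: float, se: float) -> tuple[float, float]:
    """Return Z = b / SE(b) and its two-sided p-value from the standard normal."""
    z = b / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Gingivitis row of the unadjusted output: b = .230, SE = .113.
z, p = wald_test(0.230, 0.113)
print(f"Z = {z:.3f}, Z^2 = {z * z:.3f}, p = {p:.3f}")
```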
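100(1 − α)% confidence interval for β:
$$b \pm Z_{1-\alpha/2}\, SE(b)$$
100(1 − α)% confidence interval for the odds ratio:
$$\left(\exp\{b - Z_{1-\alpha/2}\, SE(b)\},\ \exp\{b + Z_{1-\alpha/2}\, SE(b)\}\right)$$
(Here α is the significance level, not the intercept of the logistic model.)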
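A sketch of both intervals in Python, using the standard library's NormalDist for Z_{1-α/2}. To avoid confusion with the intercept, the hypothetical parameter `level` is the confidence level 1 − α. The example uses the Periodontitis row of the unadjusted output below (b = 1.104, SE = .101), which gives a 95% interval for the odds ratio of roughly (2.47, 3.68) around exp(b) ≈ 3.016.

```python
import math
from statistics import NormalDist

def wald_ci(b: float, se: float, level: float = 0.95):
    """Wald CI for beta, and the exponentiated CI for the odds ratio."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # Z_{1-alpha/2}; about 1.96 for 95%
    lo, hi = b - z * se, b + z * se
    return (lo, hi), (math.exp(lo), math.exp(hi))

# Periodontitis row of the unadjusted output: b = 1.104, SE = .101.
beta_ci, or_ci = wald_ci(1.104, 0.101)
print("95% CI for beta:", [round(v, 3) for v in beta_ci])
print("95% CI for OR:  ", [round(v, 2) for v in or_ci])
```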
Example: Perio status, Age, risk of CHD
Logistic regression using SPSS. Probability of CHD within
10 years is the dependent variable.
Variables in the Equation
                      B        S.E.    Wald       df   Sig.   Exp(B)
periodontal status                     383.792    3    .000
  Gingivitis          .230     .113    4.164      1    .041   1.259
  Periodontitis       1.104    .101    120.608    1    .000   3.016
  Edentulous          1.580    .091    298.642    1    .000   4.856
Constant              -2.964   .075    1561.851   1    .000   .052
The odds ratio for periodontitis is 3.016. That says that the odds of CHD for people with periodontitis are about 3 times the odds for people with healthy gums (the reference category, which does not appear in the table).
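With only one categorical predictor (three indicators plus a constant), the logistic model is saturated, so its fitted odds in each group equal the observed odds. The Exp(B) values should therefore be the crude odds ratios from the first table, and the constant should be the log odds of CHD in the Healthy group. A short Python check, assuming the counts in that table are the data used for this fit:

```python
import math

# (no CHD, CHD) counts from the 10-year CHD incidence table.
counts = {
    "Healthy": (3622, 187),
    "Gingivitis": (2308, 150),
    "Periodontitis": (1657, 258),
    "Edentulous": (1823, 457),
}

def odds(no: int, yes: int) -> float:
    """Observed odds of CHD in one group: #CHD / #no CHD."""
    return yes / no

ref_odds = odds(*counts["Healthy"])   # Healthy is the reference category
print(f"Constant: B = {math.log(ref_odds):.3f}, Exp(B) = {ref_odds:.3f}")

crude_or = {}
for group, (no, yes) in counts.items():
    if group == "Healthy":
        continue
    crude_or[group] = odds(no, yes) / ref_odds
    print(f"{group:<13} B = {math.log(crude_or[group]):.3f}  Exp(B) = {crude_or[group]:.3f}")
```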
Now we adjust for age by including it in the model:
Variables in the Equation
                      B        S.E.    Wald       df   Sig.   Exp(B)
periodontal status                     33.032     3    .000
  Gingivitis          .144     .118    1.505      1    .220   1.155
  Periodontitis       .470     .106    19.557     1    .000   1.600
  Edentulous          .505     .099    25.833     1    .000   1.658
AGE                   .075     .003    563.351    1    .000   1.078
Constant              -6.677   .195    1173.673   1    .000   .001
Note that the odds ratio for perio is now only 1.600. A
statistically significant association is still indicated.
Thoughts on Odds Ratios
Odds ratios originated with the use of case-control studies.
These are studies where a group of people who have the
disease are sampled, and compared with a sample from the
non-disease population with respect to some exposure of
interest.
Case-control studies are usually done when the disease is rare, and a random sample of the general population would not produce enough people with the disease.
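The problem with a case-control study is that you cannot estimate P(disease | exposure): people are sampled according to disease status, so the proportion with the disease in the sample is set by the study design rather than by the exposed population.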
However, you can estimate P(exposure | disease).
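And it turns out that with odds ratios:
$$\frac{\text{odds}(\text{disease} \mid \text{exposed})}{\text{odds}(\text{disease} \mid \text{not exposed})} = \frac{\text{odds}(\text{exposed} \mid \text{disease})}{\text{odds}(\text{exposed} \mid \text{no disease})}$$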
Using case-control sampling we can still compare the odds
of getting the disease if you are or are not exposed.
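A small numeric illustration in Python, using hypothetical counts. Keeping every case but only 10% of the disease-free people (a case-control design) leaves the odds ratio unchanged, and the exposure-odds form gives the same value, but the ratio of sample risks no longer estimates the population relative risk (3.0 instead of about 3.74).

```python
def odds_ratio(a, b, c, d):
    """Disease-odds OR. a, b = exposed with/without disease; c, d = unexposed with/without."""
    return (a / b) / (c / d)

def exposure_odds_ratio(a, b, c, d):
    """Exposure-odds OR: odds(exposed | disease) / odds(exposed | no disease)."""
    return (a / c) / (b / d)

def relative_risk(a, b, c, d):
    """P(disease | exposed) / P(disease | not exposed)."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical population counts, for illustration only.
population = (30, 700, 10, 900)
a, b, c, d = population
# Hypothetical case-control sample: every case, but only 10% of the disease-free.
case_control = (a, b // 10, c, d // 10)

print("OR  population:", odds_ratio(*population), " case-control:", odds_ratio(*case_control))
print("exposure-odds OR (case-control):", exposure_odds_ratio(*case_control))
print("RR  population:", relative_risk(*population), " case-control:", relative_risk(*case_control))
```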
Though the odds ratio is necessary in case-control studies, the truly interpretable statistic is the relative risk:
$$RR = \frac{P(\text{disease} \mid \text{exposed})}{P(\text{disease} \mid \text{not exposed})}$$
The relative risk is not equivalent to the odds ratio:
$$OR = RR \times \frac{1 - P(\text{disease} \mid \text{not exposed})}{1 - P(\text{disease} \mid \text{exposed})}$$
The OR and RR differ by an amount determined by the
prevalence of disease. For very rare diseases, OR ≈ RR.
Otherwise, they can be quite different.
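The conversion formula can be sketched as a small Python function. Holding a hypothetical RR of 2 fixed, the implied OR is close to 2 when the baseline risk P(disease | not exposed) is 0.001 and grows to 6 when it is 0.4.

```python
def or_from_rr(rr: float, p0: float) -> float:
    """OR = RR * (1 - p0) / (1 - p1), where p0 = P(disease | not exposed), p1 = rr * p0."""
    p1 = rr * p0   # implied P(disease | exposed)
    if not 0 < p1 < 1:
        raise ValueError("rr * p0 must lie strictly between 0 and 1")
    return rr * (1 - p0) / (1 - p1)

# Hypothetical RR of 2 at increasing baseline risk.
for p0 in [0.001, 0.05, 0.2, 0.4]:
    print(f"p0={p0:<5}  RR=2.00  OR={or_from_rr(2.0, p0):.2f}")
```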
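Perio status    CHD (%)   Relative risk      Odds ratio
Healthy         4.9       4.9/4.9 = 1.00     (4.9/95.1)/(4.9/95.1) = 1.00
Gingivitis      6.1       6.1/4.9 = 1.24     (6.1/93.9)/(4.9/95.1) = 1.26
Periodontitis   13.5      13.5/4.9 = 2.74    (13.5/86.5)/(4.9/95.1) = 3.02
Edentulous      20.0      20.0/4.9 = 4.08    (20.0/80.0)/(4.9/95.1) = 4.86
(Ratios are computed from the unrounded proportions in the first table, so some differ slightly from ratios of the rounded percentages shown; for example, 13.5/4.9 ≈ 2.76.)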
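The table can be reproduced from the raw counts with a few lines of Python, as a sketch assuming the counts in the first table. Working from counts rather than rounded percentages gives the values shown (e.g. RR 2.74 and OR 3.02 for periodontitis).

```python
# (no CHD, CHD) counts from the 10-year CHD incidence table.
counts = {
    "Healthy": (3622, 187),
    "Gingivitis": (2308, 150),
    "Periodontitis": (1657, 258),
    "Edentulous": (1823, 457),
}

def risk(no: int, yes: int) -> float:
    """Proportion with CHD in a group."""
    return yes / (no + yes)

def odds(no: int, yes: int) -> float:
    """Odds of CHD in a group."""
    return yes / no

ref = counts["Healthy"]   # reference category
table = {}
for group, cell in counts.items():
    rr = risk(*cell) / risk(*ref)
    or_ = odds(*cell) / odds(*ref)
    table[group] = (rr, or_)
    print(f"{group:<13} CHD = {100 * risk(*cell):4.1f}%  RR = {rr:.2f}  OR = {or_:.2f}")
```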
Odds ratios can be deceptive since we usually interpret
them as relative risks.
Odds ratio/Relative Risk and Logistic Regression
In our perio/CHD example we adjusted for age. The model
for the logistic regression was:
$$\log\left(\frac{p}{1-p}\right) = \alpha + \beta_1\,\text{perio} + \beta_2\,\text{age}$$
exp(β1) = 1.60 is interpreted as the odds ratio for CHD with
and without periodontitis, for a given age.
Because the model contains no perio × age interaction term, the odds ratio will be the same at every age. However, the translation from OR to RR will differ with age, since age has a huge effect on the incidence of CHD.
[Figure: OR and RR estimates by age. Odds ratio (OR) and relative risk (RR) plotted against age (years), 30 to 70; vertical axis 1.3 to 1.7.]
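A sketch of how the OR and RR could be computed at each age from the age-adjusted model, assuming the comparison is periodontitis versus healthy gums (gingivitis and edentulous indicators set to 0) and using the rounded B values from the SPSS output. The OR stays at exp(0.470) ≈ 1.60 at every age, while the RR falls from about 1.59 at age 30 to about 1.43 at age 70 as the baseline risk rises. How the original figure was computed is not stated, so its exact values may differ.

```python
import math

def expit(eta: float) -> float:
    """Inverse logit: probability from log-odds."""
    return 1 / (1 + math.exp(-eta))

# B column of the age-adjusted SPSS output (rounded to 3 decimals).
CONST, B_PERIO, B_AGE = -6.677, 0.470, 0.075

def or_rr_at_age(age: float) -> tuple[float, float]:
    """OR and RR for periodontitis vs. healthy gums at a given age (other indicators = 0)."""
    p_healthy = expit(CONST + B_AGE * age)
    p_perio = expit(CONST + B_PERIO + B_AGE * age)
    odds_ratio = (p_perio / (1 - p_perio)) / (p_healthy / (1 - p_healthy))
    relative_risk = p_perio / p_healthy
    return odds_ratio, relative_risk

for age in range(30, 71, 10):
    or_, rr = or_rr_at_age(age)
    print(f"age {age}: OR = {or_:.2f}  RR = {rr:.2f}")
```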