Odds for success by 9u914yg

```
Logistic Regression

Regression method to use when the dependent
variable is dichotomous (1=yes, 0=no).

Used to relate linear functions of explanatory
variables to the probability of “success”.

Example: Association between periodontal status
and 10-year CHD incidence (yes/no).

                        10-year CHD incidence
Periodontal               No            Yes
classification       Count      %    Count      %
Healthy               3622   95.1      187    4.9
Gingivitis            2308   93.9      150    6.1
Periodontitis         1657   86.5      258   13.5
Edentulous            1823   80.0      457   20.0

We’d like to adjust for the confounding effect of age.
The logistic model relates the probability of success
(p) with the explanatory variable (X) via the
relationship:

$$\log\left(\frac{p}{1-p}\right) = \alpha + \beta X$$

Could also have more than one explanatory variable.

The function log[p/(1-p)] is called the “logit”.

Why use the logit function?

• Modelling p = α + βX may not make sense because
p has a limited range of (0,1), while α + βX
is theoretically unrestricted.

• Logit(p) = log[p/(1-p)] can range from -∞ to ∞.

• p/(1-p) has a somewhat well-known interpretation.
It is the “odds” of success.

• The interpretation of the regression coefficient β is
fairly intuitive. See next slide…
Interpretation of coefficients in Logistic reg.

Odds for success = $\frac{p}{1-p}$

Logistic model:

$$\log\left(\frac{p}{1-p}\right) = \alpha + \beta X$$

→ odds for success when X = x is $e^{\alpha + \beta x}$
To compare odds of success for different values of X
we can look at the Odds Ratio.

In particular, the odds ratio for X = x+1 compared to
X = x (i.e. the effect of a one unit increase in X) is:

$$OR = \frac{\text{odds}(X = x+1)}{\text{odds}(X = x)} = \frac{e^{\alpha + \beta(x+1)}}{e^{\alpha + \beta x}} = e^{\beta}$$

→ β is the log(odds ratio)
Hypothesis testing and Confidence Intervals
The hypotheses of interest are usually:
H0: β = 0 vs. H1: β ≠ 0,
which is equivalent to testing whether or not there is an
association between x and p.

The test statistic is

$$Z = \frac{b}{SE(b)},$$

where b is the regression estimate of β (estimated by a
computer program).

Compare Z to a standard Normal distribution. There should
be a large number of events for the Normal
approximation to be valid.

100(1 - α)% confidence interval for β is given by

$$b \pm Z_{1-\alpha/2}\,SE(b)$$

100(1 - α)% confidence interval for the Odds Ratio
is

$$\left(\exp\{b - Z_{1-\alpha/2}\,SE(b)\},\ \exp\{b + Z_{1-\alpha/2}\,SE(b)\}\right)$$
Example: Perio status, Age, risk of CHD
Logistic regression using SPSS. Probability of CHD within
10 years is the dependent variable.
Variables in the Equation

                          B    S.E.      Wald   df   Sig.  Exp(B)
periodontal status                    383.792    3   .000
  Gingivitis           .230    .113     4.164    1   .041   1.259
  Periodontitis       1.104    .101   120.608    1   .000   3.016
  Edentulous          1.580    .091   298.642    1   .000   4.856
Constant             -2.964    .075  1561.851    1   .000    .052

Odds ratio corresponding to perio is 3.016.

That says that the odds of CHD for those with periodontitis are
about 3 times the odds for those with healthy
gums (the reference category, which does not appear in the table above).
Now we adjust for age by including it in the model:
Variables in the Equation

                          B    S.E.      Wald   df   Sig.  Exp(B)
periodontal status                     33.032    3   .000
  Gingivitis           .144    .118     1.505    1   .220   1.155
  Periodontitis        .470    .106    19.557    1   .000   1.600
  Edentulous           .505    .099    25.833    1   .000   1.658
AGE                    .075    .003   563.351    1   .000   1.078
Constant             -6.677    .195  1173.673    1   .000    .001

Note that the odds ratio for perio is now only 1.600. A
statistically significant association is still indicated.
Thoughts on Odds Ratios

Odds ratios originated with the use of case-control studies.
These are studies where a group of people who have the
disease are sampled, and compared with a sample from the
non-disease population with respect to some exposure of
interest.

Case-control studies are usually done when the disease is
rare, and a random sample of the general population would
not produce enough people with the disease.

The problem with a case-control study is that you cannot estimate
P(disease | exposure): the numbers of cases and controls are fixed
by design, so you did not sample at random from the exposed population.

However, you can estimate P(exposure | disease).

And it turns out that with odds ratios:

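$$\frac{\text{odds}(\text{disease} \mid \text{exposed})}{\text{odds}(\text{disease} \mid \text{not exposed})} = \frac{\text{odds}(\text{exposed} \mid \text{disease})}{\text{odds}(\text{exposed} \mid \text{no disease})}$$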

Using case-control sampling we can still compare the odds
of getting the disease if you are or are not exposed.
Though the odds ratio is necessary in case-control studies,
the truly interpretable statistic is the relative risk,

$$RR = \frac{P(\text{disease} \mid \text{exposed})}{P(\text{disease} \mid \text{not exposed})}$$

The relative risk is not equivalent to the odds ratio:

$$OR = RR \times \frac{1 - P(\text{disease} \mid \text{not exposed})}{1 - P(\text{disease} \mid \text{exposed})}$$

The OR and RR differ by an amount determined by the
prevalence of disease. For very rare diseases, OR ≈ RR.
Otherwise, they can be quite different.

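Periodontal status   CHD (%)   Relative Risk       Odds Ratio
Healthy                  4.9   4.9/4.9   = 1.00    (4.9/95.1)/(4.9/95.1)   = 1.00
Gingivitis               6.1   6.1/4.9   = 1.24    (6.1/93.9)/(4.9/95.1)   = 1.26
Periodontitis           13.5   13.5/4.9  = 2.74    (13.5/86.5)/(4.9/95.1)  = 3.02
Edentulous              20.0   20.0/4.9  = 4.08    (20.0/80.0)/(4.9/95.1)  = 4.86

(The ratios are computed from the unrounded counts in the first table,
so a few differ slightly from ratios of the rounded percentages.)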

Odds ratios can be deceptive since we usually interpret
them as relative risks.
Odds ratio/Relative Risk and Logistic Regression

In our perio/CHD example we adjusted for age. The model
for the logistic regression was:

$$\log\left(\frac{p}{1-p}\right) = \alpha + \beta_1\,\text{perio} + \beta_2\,\text{age}$$

exp(β1) = 1.60 is interpreted as the odds ratio for CHD with
and without periodontitis, for a given age.

Because of the way the logistic regression model works, the odds ratio
is the same for any age. However, the translation
from OR to RR differs depending on age, since
age has a huge effect on the incidence of CHD.
[Figure: OR and RR estimates by AGE. Curves labeled OR and RR; vertical axis from 1.3 to 1.7; horizontal axis age (years), 30 to 70.]

```
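
The pattern in the OR and RR by age plot can be approximated from the age-adjusted fit. This is a sketch under assumptions: it compares periodontitis with healthy gums and uses the rounded Constant, Periodontitis, and AGE estimates, since the slides do not say exactly how the plotted curves were computed. The OR stays at e^0.470 ≈ 1.60 at every age, while the RR falls further below it as the baseline CHD risk rises with age.

```python
import math

# Rounded estimates from the age-adjusted output: Constant, Periodontitis, AGE.
ALPHA, BETA_PERIO, BETA_AGE = -6.677, 0.470, 0.075


def chd_risk(age: float, perio: bool) -> float:
    """Fitted 10-year CHD probability (inverse logit of the linear predictor)."""
    eta = ALPHA + BETA_PERIO * perio + BETA_AGE * age
    return 1.0 / (1.0 + math.exp(-eta))


for age in range(30, 71, 10):
    p1 = chd_risk(age, perio=True)   # periodontitis
    p0 = chd_risk(age, perio=False)  # healthy gums (reference)
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))  # e^0.470 at every age
    print(f"age {age}: risk (healthy) = {p0:.3f}  OR = {odds_ratio:.2f}  RR = {p1 / p0:.2f}")
```

Under these assumptions the RR drifts from about 1.59 at age 30 to about 1.43 at age 70, while the OR stays at 1.60.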