# Introduction to logistic regression

Posted: 8/8/2012

```
Introduction to Modeling
Beyond the Basics (Chapter 7)

Content

• Simple and multiple linear regression
• Simple logistic regression
– The logistic function
– Estimation of parameters
– Interpretation of coefficients
• Multiple logistic regression
– Interpretation of coefficients
– Coding of variables

How can we analyse these data?
Table 1   Age and systolic blood pressure (SBP) among 33 adult women

Age      SBP              Age     SBP              Age     SBP
22       131              41     139              52      128
23       128              41     171              54      105
24       116              46     137              56      145
27       106              47     111              57      141
28       114              48     115              58      153
29       123              49     133              59      157
30       117              49     128              63      155
32       122              50     183              67      176
33        99              51     130              71      172
35       121              51     133              77      178
40       147              51     144              81      217

[Figure: scatter plot of SBP (mm Hg) against age (years), with the fitted regression line]

SBP = 81.54 + 1.222 × Age

Adapted from Colton T. Statistics in Medicine. Boston: Little Brown, 1974.
Simple linear regression

• Relation between 2 continuous variables (SBP and age)

y = α + β1x1

[Figure: regression line of y against x; β1 is the slope]

• Regression coefficient β1
– Measures the association between y and x
– Amount by which y changes on average when x changes by one unit
– Estimated by the least squares method

Multiple linear regression

• Relation between a continuous variable and a set of i continuous or categorical variables

y = α + β1x1 + β2x2 + ... + βixi

• Partial regression coefficients βi
– Amount by which y changes on average when xi changes by one unit and all the other xi remain constant
– Measure the association between xi and y, adjusted for all other xi

• Example
– SBP versus age, weight, height, etc.

Multiple linear regression

y = α + β1x1 + β2x2 + ... + βixi

y                      x1, x2, ..., xi
Dependent              Independent variables
Predicted              Predictor variables
Response variable      Explanatory variables
Outcome variable       Covariables

Multivariate analysis

Model                              Outcome

Linear regression                  continuous
Poisson regression                 counts
Cox model                          survival
Logistic regression                binomial
...

• Choice of the tool depends on the study, its objectives, and the variables
– Control of confounding
– Model building, prediction

Logistic regression

• Models the relationship between a set of variables xi
– dichotomous (eat: yes/no)
– categorical (social class, ...)
– continuous (age, ...)

and

– a dichotomous variable Y

• A dichotomous (binary) outcome is the most common situation in biology and epidemiology

How can we analyse these data?

Table 2   Age and signs of Coronary Heart Disease (CHD), 33 women

Age       CHD             Age      CHD             Age      CHD
22        0               40       0               54       0
23        0               41       1               55       1
24        0               46       0               58       1
27        0               47       0               60       1
28        0               48       0               60       0
30        0               49       1               62       1
30        0               49       0               65       1
32        0               50       1               67       1
33        0               51       0               71       1
35        1               51       1               77       1
38        0               52       0               81       1

How can we analyse these data?

• Comparison of the mean age of diseased and
non-diseased women

– Non-diseased:   38.6 years
– Diseased:       58.7 years (p<0.0001)

• Linear regression?

Dot-plot: Data from Table 2
[Figure: signs of coronary disease (Yes/No) plotted against age (years)]

Linear Regression

Coefficients (a)

                   Unstandardized Coefficients    Standardized Coefficients
Model              B          Std. Error          Beta          t         Sig.
1   (Constant)     -.527      .218                              -2.415    .022
    age             .020      .004                .636           4.593    .000
a. Dependent Variable: chd

[Figure: dot plot of CHD signs (YES/NO) with the fitted linear regression line]

Y = -0.527 + 0.020 × AGE

Table 3 - Prevalence (%) of signs of CHD according to age group

Age group    # in group    # diseased    % diseased
20-29             5             0              0
30-39             6             1             17
40-49             7             2             29
50-59             7             4             57
60-69             5             4             80
70-79             2             2            100
80-89             1             1            100

Dot-plot: Data from Table 3
[Figure: CHD (%) from 0 to 100 plotted against age group (20-29 to 80-89)]

The logistic function (1)
P(y|x) = e^(α + βx) / (1 + e^(α + βx))

[Figure: S-shaped curve of the probability of disease (0.0 to 1.0) against x]

The logistic function (2)

P(y|x) = e^(α + βx) / (1 + e^(α + βx))

ln[ P(y|x) / (1 - P(y|x)) ] = α + βx

The left-hand side is the logit of P(y|x).

The logistic function (3)

– Simple transformation of P(y|x)
– Linear relationship with x
– Can be continuous (logit ranges from -∞ to +∞)
– Known binomial distribution (P between 0 and 1)
– Directly related to the notion of odds of disease

ln[ P / (1 - P) ] = α + βx        P / (1 - P) = e^(α + βx)

Binary Logistic Regression

Variables in the Equation

                         B        S.E.     Wald     df    Sig.    Exp(B)
Step 1 (a)  age           .132     .046    8.053     1    .005    1.141
            Constant    -6.708    2.354    8.121     1    .004     .001
a. Variable(s) entered on step 1: age.

Binary Logistic Regression

Variables in the Equation

                         B        S.E.     Wald     df    Sig.    Exp(B)
Step 1 (a)  age           .135     .050    7.418     1    .006    1.145
            sex          1.744    1.057    2.722     1    .099    5.719
            Constant    -7.537    2.610    8.337     1    .004     .001
a. Variable(s) entered on step 1: age, sex.

Binary Logistic Regression

Variables in the Equation

                         B        S.E.     Wald     df    Sig.    Exp(B)
Step 1 (a)  age           .121     .047    6.574     1    .010    1.128
            age by sex    .036     .022    2.582     1    .108    1.037
            Constant    -6.797    2.415    7.923     1    .005     .001
a. Variable(s) entered on step 1: age, age * sex.

Multiple logistic regression

• More than one independent variable
– Dichotomous, ordinal, nominal, continuous, ...

ln[ P / (1 - P) ] = α + β1x1 + β2x2 + ... + βixi

• Interpretation of βi
– Increase in log-odds for a one-unit increase in xi, with all the other xi held constant
– Measures the association between xi and the log-odds, adjusted for all other xi

Multiple logistic regression

• Effect modification
– Can be modelled by including interaction terms

ln[ P / (1 - P) ] = α + β1x1 + β2x2 + β3(x1 × x2)

dummy or indicator coded

Reference

• Hosmer DW, Lemeshow S. Applied logistic regression. Wiley & Sons, New York, 1989.


```
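
The estimation step for the simple logistic model, ln[P / (1 - P)] = α + βx, can be sketched in a few lines of Python. The sketch below re-enters the Table 2 data (age and CHD for the 33 women) and finds the maximum likelihood estimates by Newton-Raphson. The function name `fit_logistic`, the fixed iteration count, and the choice of Newton-Raphson are assumptions made for illustration; this is not the procedure that produced the SPSS output in the slides. If the data are transcribed correctly, the estimates should land close to the reported B = .132 and Constant = -6.708.

```python
import math

# Table 2 data, transcribed column by column: age and CHD (1 = signs present)
AGES = [22, 23, 24, 27, 28, 30, 30, 32, 33, 35, 38,
        40, 41, 46, 47, 48, 49, 49, 50, 51, 51, 52,
        54, 55, 58, 60, 60, 62, 65, 67, 71, 77, 81]
CHD = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
       0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]


def fit_logistic(x, y, iterations=25):
    """Fit ln[P/(1-P)] = alpha + beta*x by Newton-Raphson (illustrative helper)."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # 2x2 information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system with the explicit inverse
        det = h00 * h11 - h01 * h01
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta


alpha, beta = fit_logistic(AGES, CHD)
odds_ratio = math.exp(beta)  # odds ratio for a one-year increase in age
p_at_50 = 1.0 / (1.0 + math.exp(-(alpha + beta * 50)))  # fitted P(CHD) at age 50

print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
print(f"odds ratio per year of age = {odds_ratio:.3f}")
print(f"fitted probability of CHD at age 50 = {p_at_50:.2f}")
```

Here `math.exp(beta)` plays the role of the Exp(B) column in the SPSS tables: it is the factor by which the odds of CHD are multiplied for each additional year of age.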