Introduction to logistic regression

Introduction to Modeling
Navadeh S. MD MPH
Scientific Writing Unit
Kerman Medical University



                         Content

• Simple and multiple linear regression
• Simple logistic regression
   – The logistic function
   – Estimation of parameters
   – Interpretation of coefficients
• Multiple logistic regression
   – Interpretation of coefficients
   – Coding of variables




     How can we analyse these data?
Table 1   Age and systolic blood pressure (SBP) among 33 adult women

   Age      SBP              Age     SBP              Age     SBP
    22       131              41     139              52      128
    23       128              41     171              54      105
    24       116              46     137              56      145
    27       106              47     111              57      141
    28       114              48     115              58      153
    29       123              49     133              59      157
    30       117              49     128              63      155
    32       122              50     183              67      176
    33        99              51     130              71      172
    35       121              51     133              77      178
    40       147              51     144              81      217




[Figure: SBP (mm Hg) plotted against age (years) for the women in Table 1, with the fitted line]

      $\mathrm{SBP} = 81.54 + 1.222 \times \mathrm{Age}$

Adapted from Colton T. Statistics in Medicine. Boston: Little Brown, 1974.
            Simple linear regression

• Relation between 2 continuous variables (SBP and age)

      $y = \alpha + \beta_1 x_1$

   [Figure: straight line of y against x; β1 is the slope.]

• Regression coefficient β1
   – Measures the association between y and x
   – Amount by which y changes on average when x changes by
     one unit
   – Estimated by the least squares method
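
The least squares estimates can be reproduced from Table 1. The following is a minimal sketch in plain Python; the function name `least_squares` and the variable names are illustrative and do not come from the slides. The result should land close to the fitted line quoted with the scatter plot (about 81.5 for the intercept and 1.222 for the slope); small differences in the last digit come from rounding.

```python
# Table 1: age (years) and systolic blood pressure (mm Hg) for 33 women.
ages = [22, 23, 24, 27, 28, 29, 30, 32, 33, 35, 40,
        41, 41, 46, 47, 48, 49, 49, 50, 51, 51, 51,
        52, 54, 56, 57, 58, 59, 63, 67, 71, 77, 81]
sbp = [131, 128, 116, 106, 114, 123, 117, 122, 99, 121, 147,
       139, 171, 137, 111, 115, 133, 128, 183, 130, 133, 144,
       128, 105, 145, 141, 153, 157, 155, 176, 172, 178, 217]


def least_squares(x, y):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


alpha, beta1 = least_squares(ages, sbp)
print(f"SBP = {alpha:.2f} + {beta1:.3f} x Age")
```

The slope is read as in the bullet above: on average, SBP rises by about 1.2 mm Hg for each additional year of age.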


          Multiple linear regression

• Relation between a continuous variable and a set of
  i continuous or categorical variables

      $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$

• Partial regression coefficients βi
   – Amount by which y changes on average when xi changes by
     one unit and all the other xi remain constant
   – Measures the association between xi and y, adjusted for all other xi

• Example
   – SBP versus age, weight, height, etc



      Multiple linear regression

      $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$

Names for y              Names for the xi
Dependent variable       Independent variables
Predicted variable       Predictor variables
Response variable        Explanatory variables
Outcome variable         Covariables
              Multivariate analysis

   Model                              Outcome

   Linear regression                  continuous
   Poisson regression                 counts
   Cox model                          survival
   Logistic regression                binomial
   ...


• Choice of tool depends on the study, its objectives, and the
  variables
   – Control of confounding
   – Model building, prediction



                Logistic regression

• Models the relationship between a set of variables xi
   – dichotomous (eat: yes/no)
   – categorical (social class, ...)
   – continuous (age, ...)

                                and

   – a dichotomous variable Y

• A dichotomous (binary) outcome is the most common
  situation in biology and epidemiology


    How can we analyse these data?

Table 2   Age and signs of coronary heart disease (CHD), 33 women

   Age       CHD             Age      CHD             Age      CHD
   22        0               40       0               54       0
   23        0               41       1               55       1
   24        0               46       0               58       1
   27        0               47       0               60       1
   28        0               48       0               60       0
   30        0               49       1               62       1
   30        0               49       0               65       1
   32        0               50       1               67       1
   33        0               51       0               71       1
   35        1               51       1               77       1
   38        0               52       0               81       1




   How can we analyse these data?

• Comparison of the mean age of diseased and
  non-diseased women

   – Non-diseased:   38.6 years
   – Diseased:       58.7 years (p<0.0001)


• Linear regression?
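
The two group means can be checked directly against Table 2. This is a small sketch in plain Python with illustrative variable names; it reproduces only the means, not the p-value, since the slides do not say which test produced it.

```python
# Table 2: age (years) and signs of CHD (1 = yes, 0 = no) for 33 women.
ages = [22, 23, 24, 27, 28, 30, 30, 32, 33, 35, 38,
        40, 41, 46, 47, 48, 49, 49, 50, 51, 51, 52,
        54, 55, 58, 60, 60, 62, 65, 67, 71, 77, 81]
chd = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
       0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]

# Split the ages by disease status.
diseased = [a for a, d in zip(ages, chd) if d == 1]
non_diseased = [a for a, d in zip(ages, chd) if d == 0]

mean_diseased = sum(diseased) / len(diseased)
mean_non_diseased = sum(non_diseased) / len(non_diseased)

print(f"Non-diseased: n = {len(non_diseased)}, mean age = {mean_non_diseased:.1f}")
print(f"Diseased:     n = {len(diseased)}, mean age = {mean_diseased:.1f}")
```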




                              Dot-plot: Data from Table 2

[Figure: signs of coronary disease (No/Yes) plotted against age (years).]

                  Linear Regression

Coefficients (a)

                     Unstandardized coefficients    Standardized coefficients
Model                B          Std. Error          Beta                         t         Sig.
1   (Constant)      -.527        .218                                           -2.415     .022
    age              .020        .004                .636                        4.593     .000

a. Dependent variable: chd

[Figure: signs of CHD (No/Yes) plotted against age, with the fitted straight line]

      $Y = -0.527 + 0.020 \times \mathrm{AGE}$

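
One way to see why a straight line fits a yes/no outcome poorly is to refit it on Table 2 and look at the predictions at the extremes of age. The sketch below uses plain Python with illustrative names; it treats CHD as 0/1, as the regression output above does, and the fitted values can fall below 0 or above 1, which cannot be read as probabilities.

```python
# Table 2: age (years) and signs of CHD (1 = yes, 0 = no) for 33 women.
ages = [22, 23, 24, 27, 28, 30, 30, 32, 33, 35, 38,
        40, 41, 46, 47, 48, 49, 49, 50, 51, 51, 52,
        54, 55, 58, 60, 60, 62, 65, 67, 71, 77, 81]
chd = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
       0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]

# Ordinary least squares for a single predictor.
n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(chd) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, chd))
sxx = sum((x - mean_x) ** 2 for x in ages)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
print(f"Y = {intercept:.3f} + {slope:.3f} x AGE")

# Predictions for the youngest and the oldest woman in the table.
pred_youngest = intercept + slope * min(ages)
pred_oldest = intercept + slope * max(ages)
print(f"Predicted Y at age {min(ages)}: {pred_youngest:.3f}")
print(f"Predicted Y at age {max(ages)}: {pred_oldest:.3f}")
```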
Table 3   Prevalence (%) of signs of CHD according to age group

                                     Diseased
   Age group    No. in group     No.        %
   20-29             5            0          0
   30-39             6            1         17
   40-49             7            2         29
   50-59             7            4         57
   60-69             5            4         80
   70-79             2            2        100
   80-89             1            1        100

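
Table 3 is Table 2 grouped into ten-year age bands. A short sketch of that grouping, in plain Python with illustrative names, is given here; it assumes bands that start at multiples of ten, which matches the age groups shown.

```python
from collections import defaultdict

# Table 2: age (years) and signs of CHD (1 = yes, 0 = no) for 33 women.
ages = [22, 23, 24, 27, 28, 30, 30, 32, 33, 35, 38,
        40, 41, 46, 47, 48, 49, 49, 50, 51, 51, 52,
        54, 55, 58, 60, 60, 62, 65, 67, 71, 77, 81]
chd = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
       0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]

# Map the first year of each band to [number in group, number diseased].
counts = defaultdict(lambda: [0, 0])
for age, diseased in zip(ages, chd):
    band_start = (age // 10) * 10
    counts[band_start][0] += 1
    counts[band_start][1] += diseased

for band_start in sorted(counts):
    n_group, n_diseased = counts[band_start]
    prevalence = 100 * n_diseased / n_group
    print(f"{band_start}-{band_start + 9}: {n_diseased}/{n_group} = {prevalence:.0f}%")
```

Plotting these prevalences against age group gives the S-shaped pattern that motivates the logistic function.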
                    Dot-plot: Data from Table 3

[Figure: prevalence of CHD (%), from 0 to 100, plotted against age group (20-29 to 80-89).]

                       The logistic function (1)

      $P(y \mid x) = \dfrac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$

[Figure: S-shaped curve of the probability of disease (0.0 to 1.0) against x.]

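
The shape of the logistic function is easy to explore numerically. The sketch below is plain Python; the function name `logistic` and the values alpha = 0 and beta = 1 are chosen only for illustration and are not estimates from the slides.

```python
import math


def logistic(x, alpha, beta):
    """P(y|x) = exp(alpha + beta*x) / (1 + exp(alpha + beta*x))."""
    z = alpha + beta * x
    # Two algebraically equal forms, chosen so exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Illustrative parameters only (hypothetical, not fitted values).
ALPHA, BETA = 0.0, 1.0
for x in range(-6, 7, 2):
    print(f"x = {x:+d}  P(y|x) = {logistic(x, ALPHA, BETA):.3f}")
```

Whatever the value of x, the result stays between 0 and 1, and it equals 0.5 exactly where alpha + beta*x is 0.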
The logistic function (2)

      $P(y \mid x) = \dfrac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$

      $\ln\left[\dfrac{P(y \mid x)}{1 - P(y \mid x)}\right] = \alpha + \beta x$

The left-hand side is the logit of P(y|x).

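
The step from the logistic function to the logit is short algebra. Written out, with P standing for P(y|x) and the same α and β as in the formulas above, it might read:

```latex
\begin{align*}
P &= \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}} \\
1 - P &= \frac{1}{1 + e^{\alpha + \beta x}} \\
\frac{P}{1 - P} &= e^{\alpha + \beta x} \\
\ln\left(\frac{P}{1 - P}\right) &= \alpha + \beta x
\end{align*}
```

The third line is the odds of disease, so the logit is the natural logarithm of the odds.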
          The logistic function (3)

• Advantages of the logit
   – Simple transformation of P(y|x)
   – Linear relationship with x
   – Can be continuous (logit between −∞ and +∞)
   – Known binomial distribution (P between 0 and 1)
   – Directly related to the notion of odds of disease

      $\ln\left(\dfrac{P}{1 - P}\right) = \alpha + \beta x$

      $\dfrac{P}{1 - P} = e^{\alpha + \beta x}$

             Binary Logistic Regression

Variables in the Equation

                          B        S.E.     Wald     df    Sig.    Exp(B)
Step 1 (a)   age           .132     .046    8.053     1    .005    1.141
             Constant    -6.708    2.354    8.121     1    .004     .001

a. Variable(s) entered on step 1: age.

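
The coefficients in this output are maximum likelihood estimates. As a rough sketch of how such estimates can be obtained, the code below fits the one-variable model to Table 2 by Newton-Raphson iteration in plain Python. The function name `fit_simple_logistic`, the starting values, and the fixed number of iterations are assumptions made for illustration; the slides do not say which algorithm the statistics package used. The estimates should come out close to the B column above, and exp(B) for age gives the odds ratio per year of age.

```python
import math

# Table 2: age (years) and signs of CHD (1 = yes, 0 = no) for 33 women.
ages = [22, 23, 24, 27, 28, 30, 30, 32, 33, 35, 38,
        40, 41, 46, 47, 48, 49, 49, 50, 51, 51, 52,
        54, 55, 58, 60, 60, 62, 65, 67, 71, 77, 81]
chd = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
       0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]


def fit_simple_logistic(x, y, iterations=25):
    """Maximum likelihood fit of logit(P) = a + b*x by Newton-Raphson."""
    a, b = 0.0, 0.0  # assumed starting values
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += yi - p
            g1 += (yi - p) * xi
            # Information matrix (negative Hessian), a 2 x 2 symmetric matrix.
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2 x 2 system for the Newton step.
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


alpha, beta = fit_simple_logistic(ages, chd)
print(f"Constant = {alpha:.3f}, B(age) = {beta:.3f}, Exp(B) = {math.exp(beta):.3f}")
```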
              Binary Logistic Regression

Variables in the Equation

                          B        S.E.     Wald     df    Sig.    Exp(B)
Step 1 (a)   age           .135     .050    7.418     1    .006    1.145
             sex          1.744    1.057    2.722     1    .099    5.719
             Constant    -7.537    2.610    8.337     1    .004     .001

a. Variable(s) entered on step 1: age, sex.

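
The Wald and Exp(B) columns follow from B and S.E.: the Wald statistic is (B / S.E.) squared and Exp(B) is the odds ratio e^B. The sketch below recomputes both from the age and sex rows above, and adds a conventional 95% Wald confidence interval for the odds ratio, exp(B ± 1.96 × S.E.), which is not shown in the output and is included here as an assumption about how one would usually report it. Names such as `summarise` are illustrative.

```python
import math

# (B, S.E.) copied from the age + sex output; values are rounded to 3 decimals.
estimates = {"age": (0.135, 0.050), "sex": (1.744, 1.057)}


def summarise(b, se, z=1.96):
    """Return the Wald statistic, the odds ratio and a 95% Wald interval."""
    wald = (b / se) ** 2
    odds_ratio = math.exp(b)
    ci = (math.exp(b - z * se), math.exp(b + z * se))
    return wald, odds_ratio, ci


results = {name: summarise(b, se) for name, (b, se) in estimates.items()}
for name, (wald, odds_ratio, (low, high)) in results.items():
    print(f"{name}: Wald = {wald:.3f}, OR = {odds_ratio:.3f}, "
          f"95% CI {low:.2f} to {high:.2f}")
```

Because the inputs are rounded, the recomputed Wald statistic for age differs slightly from the printed 7.418. The interval for sex includes 1, in line with its Sig. value of .099.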
             Binary Logistic Regression

Variables in the Equation

                            B        S.E.     Wald     df    Sig.    Exp(B)
Step 1 (a)   age             .121     .047    6.574     1    .010    1.128
             age by sex      .036     .022    2.582     1    .108    1.037
             Constant      -6.797    2.415    7.923     1    .005     .001

a. Variable(s) entered on step 1: age, age * sex.

        Multiple logistic regression

• More than one independent variable
   – Dichotomous, ordinal, nominal, continuous …

      $\ln\left(\dfrac{P}{1 - P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$

• Interpretation of βi
   – Increase in log-odds for a one-unit increase in xi with all
     the other xi constant
   – Measures the association between xi and the log-odds, adjusted
     for all other xi

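
The "all other xi constant" reading can be checked numerically. The sketch below plugs the coefficients from the age and sex output shown earlier into the model and compares two people who differ in one variable only. It assumes sex is coded 0/1; the slides do not say which sex is coded 1, and the function names are illustrative.

```python
import math

# Coefficients from the age + sex output (Constant, age, sex).
ALPHA, B_AGE, B_SEX = -7.537, 0.135, 1.744


def predicted_probability(age, sex):
    """P from logit(P) = ALPHA + B_AGE*age + B_SEX*sex, with sex assumed 0/1."""
    logit = ALPHA + B_AGE * age + B_SEX * sex
    return 1.0 / (1.0 + math.exp(-logit))


def odds(p):
    return p / (1.0 - p)


# Same age, different sex: the odds ratio equals exp(B_SEX).
p_sex0 = predicted_probability(50, 0)
p_sex1 = predicted_probability(50, 1)
or_sex = odds(p_sex1) / odds(p_sex0)

# Same sex, one year apart: the odds ratio equals exp(B_AGE).
or_age = odds(predicted_probability(51, 0)) / odds(p_sex0)

print(f"P(age 50, sex 0) = {p_sex0:.3f}, P(age 50, sex 1) = {p_sex1:.3f}")
print(f"Odds ratio for sex = {or_sex:.3f}, per year of age = {or_age:.3f}")
```

The odds ratios do not depend on the age chosen for the comparison, whereas the predicted probabilities do.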
       Multiple logistic regression

• Effect modification
   – Can be modelled by including interaction terms

      $\ln\left(\dfrac{P}{1 - P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 (x_1 \times x_2)$

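
With an interaction term, the effect of one variable depends on the level of the other. As an illustration, the sketch below uses the age and age-by-sex coefficients from the output shown earlier (that model has no separate sex term) to compute the odds ratio per year of age at each level of sex. Sex is assumed to be coded 0/1, and the function name is hypothetical.

```python
import math

# Coefficients from the age + age*sex output.
B_AGE, B_AGE_BY_SEX = 0.121, 0.036


def age_odds_ratio(sex, years=1):
    """Odds ratio for an increase of `years` in age, at a given sex (0 or 1)."""
    # The slope for age is B_AGE when sex = 0 and B_AGE + B_AGE_BY_SEX when sex = 1.
    return math.exp((B_AGE + B_AGE_BY_SEX * sex) * years)


for sex in (0, 1):
    print(f"sex = {sex}: OR per year = {age_odds_ratio(sex):.3f}, "
          f"per 10 years = {age_odds_ratio(sex, 10):.2f}")
```

The ratio of the two per-year odds ratios is exp(0.036), the Exp(B) reported for the interaction term.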
Coding of variables: dummy (indicator) coding

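
Dummy (indicator) coding replaces a categorical variable with k levels by k − 1 variables coded 0/1, with one level kept as the reference. The sketch below shows the idea in plain Python; the function name `dummy_code`, the variable "social class", and its levels I, II, III are hypothetical and chosen only for illustration.

```python
def dummy_code(values, reference):
    """Return one dict of 0/1 indicators per value, omitting the reference level."""
    levels = sorted(set(values) - {reference})
    return [{f"is_{level}": int(v == level) for level in levels} for v in values]


# Hypothetical data: social class with three levels, class I as the reference.
social_class = ["I", "II", "III", "II"]
coded = dummy_code(social_class, reference="I")
for original, row in zip(social_class, coded):
    print(original, row)
```

Each indicator's coefficient in the logistic model is then the log odds ratio of that level compared with the reference level.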
                 Reference

• Hosmer DW, Lemeshow S. Applied Logistic Regression.
  New York: Wiley & Sons, 1989.



