Introduction to Modeling
Navadeh S. MD MPH
Scientific Writing Unit, Kerman Medical University

Content
• Simple and multiple linear regression
• Simple logistic regression
  – The logistic function
  – Estimation of parameters
  – Interpretation of coefficients
• Multiple logistic regression
  – Interpretation of coefficients
  – Coding of variables

How can we analyse these data?
Table 1 Age and systolic blood pressure (SBP) among 33 adult women

Age  SBP     Age  SBP     Age  SBP
22   131     41   139     52   128
23   128     41   171     54   105
24   116     46   137     56   145
27   106     47   111     57   141
28   114     48   115     58   153
29   123     49   133     59   157
30   117     49   128     63   155
32   122     50   183     67   176
33    99     51   130     71   172
35   121     51   133     77   178
40   147     51   144     81   217

[Figure: scatter plot of SBP (mm Hg) against age (years) with the fitted line $\text{SBP} = 81.54 + 1.222 \times \text{Age}$. Adapted from Colton T. Statistics in Medicine. Boston: Little Brown, 1974.]

Simple linear regression
• Relation between 2 continuous variables (SBP and age)

[Figure: straight line of y against x with slope $\beta_1$]

$y = \alpha + \beta_1 x_1$

• Regression coefficient $\beta_1$
  – Measures association between y and x
  – Amount by which y changes on average when x changes by one unit
  – Estimated by the least squares method

Multiple linear regression
• Relation between a continuous variable and a set of i continuous or categorical variables

$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$

• Partial regression coefficients $\beta_i$
  – Amount by which y changes on average when $x_i$ changes by one unit and all the other $x_i$ remain constant
  – Measures association between $x_i$ and y adjusted for all other $x_i$
• Example
  – SBP versus age, weight, height, etc.

Multiple linear regression

$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$

Names for y            Names for the $x_i$
Dependent variable     Independent variables
Predicted variable     Predictor variables
Response variable      Explanatory variables
Outcome variable       Covariables

Multivariate analysis

Model                  Outcome
Linear regression      continuous
Poisson regression     counts
Cox model              survival
Logistic regression    binomial
......
• Choice of the tool according to the study, its objectives, and the variables
  – Control of confounding
  – Model building, prediction

Logistic regression
• Models the relationship between a set of variables $x_i$
  – dichotomous (eat: yes/no)
  – categorical (social class, ...)
  – continuous (age, ...)
  and a dichotomous variable Y
• A dichotomous (binary) outcome is the most common situation in biology and epidemiology

How can we analyse these data?
Table 2 Age and signs of Coronary Heart Disease (CHD), 33 women

Age  CHD     Age  CHD     Age  CHD
22   0       40   0       54   0
23   0       41   1       55   1
24   0       46   0       58   1
27   0       47   0       60   1
28   0       48   0       60   0
30   0       49   1       62   1
30   0       49   0       65   1
32   0       50   1       67   1
33   0       51   0       71   1
35   1       51   1       77   1
38   0       52   0       81   1

How can we analyse these data?
• Comparison of the mean age of diseased and non-diseased women
  – Non-diseased: 38.6 years
  – Diseased: 58.7 years (p<0.0001)
• Linear regression?

Dot-plot: Data from Table 2
[Figure: signs of coronary disease (Yes/No) plotted against age (years), axis from 0 to 100]

Linear Regression

Coefficients (a)
                  Unstandardized Coefficients    Standardized Coefficients
Model             B          Std. Error          Beta                         t         Sig.
1  (Constant)     -.527      .218                                             -2.415    .022
   age             .020      .004                .636                          4.593    .000
a. Dependent Variable: chd

[Figure: signs of CHD (Yes/No) against age with the fitted line $Y = -0.527 + 0.020 \times \text{AGE}$]

Table 3 - Prevalence (%) of signs of CHD according to age group

Age group    # in group    Diseased #    Diseased %
20-29        5             0               0
30-39        6             1              17
40-49        7             2              29
50-59        7             4              57
60-69        5             4              80
70-79        2             2             100
80-89        1             1             100

Dot-plot: Data from Table 3
[Figure: CHD (%), from 0 to 100, plotted against age group, from 20-29 to 80-89]

The logistic function (1)
[Figure: S-shaped curve of the probability of disease (0.0 to 1.0) against x]

$P(y|x) = \dfrac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$

The logistic function (2)

$P(y|x) = \dfrac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$

$\ln\left[\dfrac{P(y|x)}{1 - P(y|x)}\right] = \alpha + \beta x$

The left-hand side is the logit of P(y|x).

The logistic function (3)
• Advantages of the logit
  – Simple transformation of P(y|x)
  – Linear relationship with x
  – Continuous (the logit ranges from $-\infty$ to $+\infty$)
  – Known binomial distribution (P between 0 and 1)
  – Directly related to the notion of odds of disease

$\ln\left(\dfrac{P}{1 - P}\right) = \alpha + \beta x \qquad \dfrac{P}{1 - P} = e^{\alpha + \beta x}$

Binary Logistic Regression

Variables in the Equation
                       B         S.E.     Wald     df    Sig.    Exp(B)
Step 1 (a)  age         .132     .046     8.053    1     .005    1.141
            Constant   -6.708    2.354    8.121    1     .004     .001
a. Variable(s) entered on step 1: age.

Binary Logistic Regression

Variables in the Equation
                       B         S.E.     Wald     df    Sig.    Exp(B)
Step 1 (a)  age         .135     .050     7.418    1     .006    1.145
            sex        1.744     1.057    2.722    1     .099    5.719
            Constant   -7.537    2.610    8.337    1     .004     .001
a. Variable(s) entered on step 1: age, sex.

Binary Logistic Regression

Variables in the Equation
                       B         S.E.     Wald     df    Sig.    Exp(B)
Step 1 (a)  age         .121     .047     6.574    1     .010    1.128
            age by sex  .036     .022     2.582    1     .108    1.037
            Constant   -6.797    2.415    7.923    1     .005     .001
a. Variable(s) entered on step 1: age, age * sex.

Multiple logistic regression
• More than one independent variable
  – Dichotomous, ordinal, nominal, continuous …

$\ln\left(\dfrac{P}{1 - P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$

• Interpretation of $\beta_i$
  – Increase in log-odds for a one unit increase in $x_i$ with all the other $x_i$ constant
  – Measures association between $x_i$ and log-odds adjusted for all other $x_i$

Multiple logistic regression
• Effect modification
  – Can be modelled by including interaction terms

$\ln\left(\dfrac{P}{1 - P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$

Coding of variables: dummy or indicator coded

Reference
• Hosmer DW, Lemeshow S. Applied Logistic Regression. New York: Wiley & Sons, 1989.
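
Worked sketch: least squares fit for Table 1
The simple linear regression slide names the least squares method without showing the arithmetic. The sketch below applies the usual closed-form estimates (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x) to the Table 1 values. The function name `least_squares` and the variable names are illustrative choices, not part of the slides. The result should land close to the fitted line quoted on the scatter plot slide; small differences in the last digit can come from rounding.

```python
# Table 1: age (years) and systolic blood pressure (mm Hg), 33 adult women,
# typed in column by column from the slide.
ages = [22, 23, 24, 27, 28, 29, 30, 32, 33, 35, 40,
        41, 41, 46, 47, 48, 49, 49, 50, 51, 51, 51,
        52, 54, 56, 57, 58, 59, 63, 67, 71, 77, 81]
sbp = [131, 128, 116, 106, 114, 123, 117, 122, 99, 121, 147,
       139, 171, 137, 111, 115, 133, 128, 183, 130, 133, 144,
       128, 105, 145, 141, 153, 157, 155, 176, 172, 178, 217]


def least_squares(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x.

    Hypothetical helper written for this sketch.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = least_squares(ages, sbp)
print(f"SBP = {intercept:.2f} + {slope:.3f} x Age")
```

The slope is read as on the slide: the average change in SBP (mm Hg) for each additional year of age.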
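
Worked sketch: simple logistic regression for Table 2
The content slide lists "Estimation of parameters", but the deck shows only the fitted output. As a rough illustration, the sketch below fits the model ln(P / (1 - P)) = α + βx to the Table 2 data by maximum likelihood using Newton-Raphson steps. This is one common way to maximise the likelihood and is an assumption here; the slides do not say which algorithm the statistical package used. The function name `fit_logistic`, the starting values and the stopping tolerance are illustrative. The estimates should be close to the B column of the age-only output table.

```python
import math

# Table 2: age (years) and signs of CHD (1 = yes, 0 = no), 33 women,
# typed in column by column from the slide.
ages = [22, 23, 24, 27, 28, 30, 30, 32, 33, 35, 38,
        40, 41, 46, 47, 48, 49, 49, 50, 51, 51, 52,
        54, 55, 58, 60, 60, 62, 65, 67, 71, 77, 81]
chd = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
       0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]


def logistic(alpha, beta, x):
    """P(y|x) = e^(alpha + beta*x) / (1 + e^(alpha + beta*x))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson maximum likelihood fit with one predictor.

    Hypothetical helper written for this sketch; returns (alpha, beta).
    """
    alpha, beta = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix entries
        for xi, yi in zip(x, y):
            p = logistic(alpha, beta, xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (information matrix) * step = gradient.
        det = h00 * h11 - h01 * h01
        step_alpha = (h11 * g0 - h01 * g1) / det
        step_beta = (h00 * g1 - h01 * g0) / det
        alpha += step_alpha
        beta += step_beta
        if abs(step_alpha) < tol and abs(step_beta) < tol:
            break
    return alpha, beta


alpha, beta = fit_logistic(ages, chd)
odds_ratio_per_year = math.exp(beta)
print(f"logit(P) = {alpha:.3f} + {beta:.3f} x age")
print(f"odds ratio per year of age = {odds_ratio_per_year:.3f}")
```

The exponentiated coefficient corresponds to the Exp(B) column: it is the factor by which the odds of CHD are multiplied for each additional year of age.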
