Week 11 - Generalized linear models
B. Aukema, NRES 798

Linear Models

- So far we have discussed linear models: ANOVA, regression, and ANCOVA.
- We can even build our models in a "mixed effects" framework by incorporating different error terms.

Generalized Linear Models

- We have always used a response variable (y) that is continuous and normally distributed.
- If the model errors weren't normally distributed, we simply transformed y and made it somewhat normally distributed.
- Generalized linear models are models that do not depend on a normally distributed response variable.
- The two most common types are logistic models for binary data and Poisson models for Poisson-distributed data.
- Generalized linear models, because they are linear models, can still do ANOVA, regression, and ANCOVA. In fact, they can even incorporate random effects as well. The only difference is that we can specify that the response variable is not normally distributed.
- Note that generalized linear models are abbreviated GLM. (SAS corporation is very sloppy with their terminology: they have named one procedure GLM for "general linear model", which is nonsensical.)
- I'll demonstrate how a generalized linear model works with a regression equation:

    yi = β0 + β1 xi + εi

  The β0 + β1 xi part is the "mean" part of the model if the response is normally distributed. It contains the coefficients in which we are most interested.
- A generalized linear model uses something called a link function between the response (not normally distributed!) and the "mean" part.
- Each "family" (e.g., Poisson, binomial, etc.) uses a different link function, sometimes known as the "canonical link" (there is only one per family).
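To make the link-function idea concrete, here is a minimal sketch in Python (the course materials use R; this is purely illustrative arithmetic, and the function names are my own). The logit and its inverse form the canonical pair for the binomial family:

```python
import math

def logit(p):
    """Canonical link for the binomial family: probability -> linear scale."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Inverse link: linear predictor (beta0 + beta1*x) -> probability."""
    return math.exp(eta) / (1 + math.exp(eta))

# The link and its inverse undo each other, so the model is fitted on the
# linear scale and probabilities are recovered afterwards.
p = 0.8
eta = logit(p)                    # about 1.386 on the linear scale
print(round(inv_logit(eta), 6))   # → 0.8
```

For the Poisson family the canonical link is the log, with exp as its inverse.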
Logistic models

- The link function for the binomial family is known as the "logit":

    log( p(Y) / (1 − p(Y)) ) = β0 + β1 x

  or whatever your "mean" part of the model is (e.g., it could be µ + αi for a one-way ANOVA).
- In practice, you specify to the computer:
  1. The response variable
  2. The model
  3. The data distribution family
- You do not transform the original data!

Logistic models: the (sort of) confusing part

- The mean response in a logistic model, the average of all the 0/1 data, will be the probability of the measured event.
- The link function is simply a transformation of the probability of your response.
- To retrieve this probability, you need to back-transform the model coefficients.
- For example, in our regression, the back-transformation to get the probability back is:

    p(Y) = exp(β0 + β1 x) / (1 + exp(β0 + β1 x))

Logistic regression: example

- Let's say you wanted to find the probability of y occurring at x = 1124. From the summary table, you can find the equation:

    Coefficients:
                  Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept) 10.753080    3.870467    2.778   0.005465 **
    x           -0.009576    0.003534   -2.709   0.006739 **

- On the link scale: y = 10.75 − 0.009576 × 1124 = −0.0134
- To get the probability, we back-transform this value:

    p(Y) = exp(−0.0134) / (1 + exp(−0.0134)) = 0.497
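The worked example above is easy to check in a few lines of Python (the course uses R; this sketch just redoes the arithmetic, using the full-precision coefficients from the summary table, which round to the same answer):

```python
import math

b0, b1, x = 10.753080, -0.009576, 1124   # coefficients from the summary table
eta = b0 + b1 * x                        # linear predictor: about -0.0103
p = math.exp(eta) / (1 + math.exp(eta))  # back-transform via the inverse logit
print(round(p, 3))                       # → 0.497
```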
Logistic models: model fitting

- Fitting a model is pretty similar to what we have already done.
- New things:
  - We use glm() instead of lm().
  - In a mixed model framework, we use glmmPQL() instead of lme().
- The syntax is entirely the same for model specification (i.e., specify the response, the model, the data source, AND now the data family as well (e.g., family="binomial" for logistic regression)).

Logistic model fitting

- Parameter estimation is performed via likelihood, not least squares.
- Hence, you will not find an R² value but instead an AIC value.

Logistic models: the concept of deviance

- The "residual" error is called "deviance" in logistic regression.
- With deviance, we no longer use our trusty t and F tests.

Logistic models: χ² tests

- New things: F tests have been replaced with χ² tests.
- So, for example, when using anova(), you will see χ² tests, especially in model selection.

Logistic models: Wald tests

- New things: instead of a t test to check whether or not your model parameters are different from zero, you now use a Z distribution.
- For logistic models, each test is called a Wald test.

Logistic regression: model assumptions

1. Data are independent
2. Model is correct

Logistic regression: model checking

- There is still some residual, which you can read from the model summary.
- It can be helpful to see how much noise there is.
- We do not, however, check residual plots: we cannot transform the data anyway!
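The likelihood-based fitting and the deviance mentioned above can be sketched from scratch. The following Python sketch uses made-up 0/1 data (not the course data set) and Newton's method, which is equivalent to the iteratively reweighted least squares that R's glm() performs internally; for 0/1 data the residual deviance is simply −2 times the maximized log-likelihood:

```python
import math

# Hypothetical 0/1 data: the event becomes less likely as x grows
# (made-up numbers for illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1,   1,   1,   0,   1,   0,   0,   0]

def fit_logistic(xs, ys, iters=25):
    """Maximize the binomial log-likelihood with Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score (gradient) and observed information (negative Hessian).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            w = p * (1 - p)                 # the "weight" in IRLS
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 Newton step by hand.
        det = h00 * h11 - h01 * h01
        b0 += ( h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

def deviance(xs, ys, b0, b1):
    """Residual deviance: -2 * maximized log-likelihood for 0/1 data."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(b0 + b1 * x)))
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -2 * ll

b0, b1 = fit_logistic(xs, ys)
dev = deviance(xs, ys, b0, b1)
aic = dev + 2 * 2          # AIC = deviance + 2 * (number of parameters)
print(round(b0, 3), round(b1, 3), round(dev, 3), round(aic, 3))
```

The fitted deviance is necessarily smaller than the null deviance (the intercept-only fit), and comparing the two is exactly the χ² test that anova() reports.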