Ch 3 Generalized Linear Models

The structural form of the model describes the patterns of association and interaction. The sizes of the model parameters determine the strength and importance of the effects.

The models this book presents are generalized linear models (GLMs). This broad

class of models includes ordinary regression and ANOVA models for continuous

responses as well as models for discrete responses.

3.1 Components of a generalized linear model

A GLM has three components: the random component, the systematic component, and the link function.

Random Component

The random component of a GLM identifies the response variable Y and selects a

probability distribution for it. Denote the observations on Y by (Y1,….,Yn).

Standard GLMs treat Y1,…,Yn as independent.

If each observation is a count, one may assume a binomial, Poisson, or negative binomial distribution for Y. If each observation is continuous, one may assume a normal distribution for Y.

Systematic component

The systematic component of a GLM specifies the explanatory variables. These

enter linearly as predictors on the right-hand side of the model equation. The linear

combination of the explanatory variables is called the linear predictor:
α + β1 x1 + ... + β k xk

Denote the expected value of Y by µ = E (Y ). The link function specifies a function g(.) that relates µ to the linear predictor as

g ( µ ) = α + β1 x1 + ... + β k xk

The function g(.), the link function, connects the random and systematic

components.

g ( µ ) = µ . Identity link. Regression model: µ = α + β1 x1 + ... + β k xk

g ( µ ) = log ( µ ) . Log link, loglinear model: log ( µ ) = α + β1 x1 + ... + β k xk

g ( µ ) = log[µ / (1 − µ)]. Logit link, logistic regression model:

log[µ / (1 − µ)] = α + β1 x1 + ... + β k xk
Each potential probability distribution for Y has one special function of the

mean that is called its natural parameter. The link function that uses the

natural parameter as g ( µ ) in the GLM is called the canonical link.
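The three link functions above can be written as one-line functions; a minimal sketch in Python (the function names are my own):

```python
import math

def identity_link(mu):
    # Identity link: g(mu) = mu, used by ordinary regression models
    return mu

def log_link(mu):
    # Log link: g(mu) = log(mu), used by loglinear models
    return math.log(mu)

def logit_link(mu):
    # Logit link: g(mu) = log(mu / (1 - mu)), used by logistic regression
    return math.log(mu / (1 - mu))

# The logit is 0 at mu = 0.5 and is antisymmetric about that point
print(logit_link(0.5))
print(logit_link(0.75), logit_link(0.25))
```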

Normal GLM

Ordinary regression models for continuous response are special cases of GLMs.

They assume a normal distribution for Y and model its mean directly.

g ( µ ) = µ = α + β1 x1 + ... + β k xk (identity link function)

1. Nonnormal responses

2. MLE of GLM

3.2 Generalized linear models for binary data

Many categorical response variables have only two categories: whether you take public transportation today (yes, no), or whether you have had a physical exam in the past year (yes, no). Represent the binary response by Y = 1 or 0; its mean is E (Y ) = π .

The distribution of Y is specified by probabilities: P (Y = 1) = π ; P (Y = 0 ) = 1 − π .

Linear Probability Model

In ordinary regression, π = E (Y ) is a linear function of x. For a binary response, a
model is π = α + β x . This is called a linear probability model. This model is a

GLM with binomial random component and identity link function.

This model is simple, but unfortunately it has a structural defect. Probabilities fall

between 0 and 1, whereas linear functions take values over the entire real line.
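The defect is easy to demonstrate numerically; a sketch with illustrative (made-up) coefficients:

```python
# Linear probability model: pi = alpha + beta * x
# alpha and beta here are illustrative values, not fitted from any data set
alpha, beta = 0.1, 0.08

def linear_prob(x):
    # Identity link: the "probability" is a linear function of x
    return alpha + beta * x

print(round(linear_prob(5), 2))    # a legitimate probability
print(round(linear_prob(15), 2))   # exceeds 1, not a valid probability
```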

If we ignored the binary nature of Y and used ordinary regression, the estimates of

the parameters would be the least squares estimates. They are the ML estimates

under the assumption of a normal response. An assumption of normality for a

binary response is not sensible.

Example: snoring and heart disease

Logistic regression model

Relationships between π ( x ) and x are usually nonlinear rather than linear.

π ( x ) = exp (α + β x ) / [1 + exp (α + β x )]

This is called the logistic regression function. Equivalently, the logit model:

log[π ( x ) / (1 − π ( x ))] = α + β x

The random component for the outcome has a binomial distribution. The link function is the logit function log[π / (1 − π )] of π , symbolized by “logit( π )”.
The parameter β in equation (3.2) determines the rate of increase or decrease of the curve: its sign determines the direction, and its magnitude determines how fast π ( x ) changes.

Example: for the snoring and heart disease data in Table 3.1,

logit[π ( x )] = log[π ( x ) / (1 − π ( x ))] = −3.87 + 0.40 x

Since β = 0.40 > 0, the estimated probability of heart disease increases as snoring level increases.
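Using the fitted equation above with the snoring-level scores {0, 2, 4, 5} used in the text, the fitted probabilities can be recovered by inverting the logit; a sketch in Python:

```python
import math

# Fitted logistic regression for snoring and heart disease:
# logit[pi(x)] = -3.87 + 0.40 x
alpha, beta = -3.87, 0.40

def fitted_prob(x):
    # Invert the logit: pi(x) = exp(alpha + beta*x) / (1 + exp(alpha + beta*x))
    eta = alpha + beta * x
    return math.exp(eta) / (1 + math.exp(eta))

# Snoring-level scores used in the text
for x in (0, 2, 4, 5):
    print(x, round(fitted_prob(x), 3))
```

The fitted probabilities increase with snoring level, as the positive sign of β implies.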

Probit regression model

Another model that has the S-shaped curves of Figure 3.2 is called the probit

model.

The link function for the model, called the probit link, transforms probabilities to z-scores from the standard normal distribution.

The probit model: probit[π ( x )] = α + β x

The probit link function satisfies, e.g., probit(0.05) = −1.645, probit(0.95) = 1.645, probit(0.975) = 1.96.

Example: for the snoring and heart disease data with scores {0, 2, 4, 5} for snoring level, the probit model fit is

probit π ( x )  = −2.061 + 0.188 x
        

At snoring level x = 0, the fitted probability of heart disease is Φ(−2.061) ≈ 0.020.

In practice, probit and logistic regression models provide similar fits.
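The probit link and its inverse are available through the standard normal distribution; a sketch using Python's `statistics.NormalDist` with the fitted equation above:

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal distribution

# Fitted probit model for the snoring data: probit[pi(x)] = -2.061 + 0.188 x
alpha, beta = -2.061, 0.188

def fitted_prob(x):
    # Invert the probit link with the standard normal CDF
    return nd.cdf(alpha + beta * x)

# probit(p) is the standard normal quantile function
print(round(nd.inv_cdf(0.975), 2))   # 1.96
print(round(fitted_prob(0), 3))      # fitted probability at snoring level x = 0
```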

3.3 Generalized linear models for count data
Many discrete response variables have counts as possible outcomes. This

section introduces GLMs for count data.

Poisson distribution: unimodal and skewed to the right over the possible values 0, 1, 2, .... It has a single parameter µ > 0, which is both its mean and its variance. As the mean increases, the skew decreases and the distribution becomes more bell-shaped.

Poisson regression
The Poisson loglinear model is a GLM that assumes a Poisson distribution for Y and

uses the log link function. For a single explanatory variable x, the Poisson

loglinear model
log µ = α + β x
⇒ µ = exp (α + β x ) = exp (α ) exp ( β x ) = e^α ( e^β )^x

A one-unit increase in x has a multiplicative impact of e β on µ .

If β = 0, the multiplicative factor is e^0 = 1, so the mean of Y does not change as x changes. If β < 0, then e^β < 1 and the mean decreases as x increases; if β > 0, then e^β > 1 and the mean increases as x increases.

Example: Female horseshoe crabs and their satellites

The study investigated factors that affect whether a female crab had any other males, called satellites, residing nearby.

The response outcome for each female crab is her number of satellites. The explanatory variable is the female crab’s shell width, with mean 26.3 cm and standard deviation 2.1 cm.

Figures 3.4 and 3.5: The sample means and the smoothed curve both show a

strong increasing trend. The trend seems approximately linear.

1. Let µ denote the expected number of satellites for a female crab, and let x denote her width. The Poisson loglinear model fit is

log µ = α + β x = −3.305 + 0.164 x

Since β = 0.164 > 0, width has a positive estimated effect on the number of satellites.

The model fit yields an estimated mean number of satellites µ (a fitted value) at any width. At the mean width x = 26.3, µ = exp(−3.305 + 0.164 × 26.3) = 2.7.

For this model, exp( β ) = exp(0.164) = 1.18 represents the multiplicative effect on the fitted value for each 1-unit increase in x. A 1 cm increase in width yields an 18% increase in the estimated mean number of satellites.
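The fitted value at the mean width and the multiplicative effect exp( β ) can be checked directly; a sketch in Python:

```python
import math

# Poisson loglinear fit for the horseshoe crab data: log(mu) = -3.305 + 0.164 x
alpha, beta = -3.305, 0.164

def fitted_mean(width):
    # mu = exp(alpha + beta * x)
    return math.exp(alpha + beta * width)

print(round(fitted_mean(26.3), 1))   # fitted mean at the average width, 2.7
print(round(math.exp(beta), 2))      # multiplicative effect per cm, 1.18

# The ratio of fitted means one unit apart equals exp(beta)
ratio = fitted_mean(27.3) / fitted_mean(26.3)
```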

2.   The Poisson regression model with identity link function has ML fit

µ = −11.53 + 0.550 x   (SE for β = 0.059)

This model is additive, rather than multiplicative. A 1 cm increase in width has a

predicted increase of 0.55 in the expected number of satellites.

Figure 3.6 plots the fitted number of satellites against width for the two models.

Overdispersion: greater variability than expected

For the grouped horseshoe crab data, Table 3.3 shows the sample mean and variance of the counts of the number of satellites for the female crabs in each width category. The variances are much larger than the means. This phenomenon, relative to the Poisson distribution, is called overdispersion.
Width (cm)      No. Cases   Sample Mean No. Satellites   Sample Variance
<= 23.25            14              1.00                      2.77
23.25-24.25         14              1.43                      8.88
24.25-25.25         28              2.39                      6.54
25.25-26.25         39              2.69                     11.38
26.25-27.25         22              2.86                      6.89
27.25-28.25         24              3.88                      8.81
28.25-29.25         18              3.94                     16.88
> 29.25             14              5.14                      8.29
Overdispersion is not an issue in ordinary regression models assuming a normally distributed Y, because the normal distribution has a separate variance parameter that is not determined by the mean.

Unlike the Poisson, the negative binomial distribution has an additional parameter, so its variance can exceed its mean:

E (Y ) = µ , Var (Y ) = µ + Dµ 2
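The two mean-variance relationships can be compared numerically; a sketch where the dispersion parameter D = 1.0 is an illustrative value, not an estimate from the text:

```python
def poisson_variance(mu):
    # Poisson: the variance equals the mean
    return mu

def negbin_variance(mu, D):
    # Negative binomial: Var(Y) = mu + D * mu^2; exceeds mu whenever D > 0
    return mu + D * mu ** 2

mu, D = 2.69, 1.0   # mu from the 25.25-26.25 width category; D is illustrative
print(poisson_variance(mu))
print(round(negbin_variance(mu, D), 2))
```

With D > 0 the negative binomial variance exceeds the mean, which is what the overdispersed satellite counts in Table 3.3 require.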

3.4 Statistical inference and model checking
Statistical inference based on the Wald statistic is simplest, but likelihood-ratio

inference is more trustworthy.

Wald approach: a 95% confidence interval for β equals β ± 1.96 ( SE ). To test H0 : β = 0, the Wald test statistic z = β / SE ~ N ( 0, 1 ) under H0.

Likelihood-ratio approach: H0 : β = 0

Test statistic: −2 log( ℓ0 / ℓ1 ) = −2 [ log( ℓ0 ) − log( ℓ1 ) ] = −2 ( L0 − L1 ), where ℓ0 and ℓ1 are the maximized likelihoods under H0 and under the alternative, and L0 and L1 are the corresponding maximized log-likelihoods. This test statistic has a large-sample chi-squared distribution with df = 1.
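Both procedures are short computations; a sketch with illustrative (made-up) estimates and log-likelihoods:

```python
# Wald inference for a coefficient; beta_hat and se are illustrative values
beta_hat, se = 0.40, 0.15
z = beta_hat / se                                   # Wald statistic for H0: beta = 0
ci = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)   # 95% confidence interval
print(round(z, 2), tuple(round(v, 3) for v in ci))

# Likelihood-ratio statistic from maximized log-likelihoods (hypothetical values)
L0, L1 = -150.2, -146.7
lr_stat = -2 * (L0 - L1)
print(round(lr_stat, 1))   # compared to a chi-squared distribution with df = 1
```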

Example: snoring and heart disease revisited

The deviance

Let LM denote the maximized log-likelihood value for a model M of interest. Let

LS denote the maximized log-likelihood value for the most complex model

possible.

This most complex model has a separate parameter for each observation, so it provides a perfect fit to the data; it is said to be saturated.

Deviance= −2 [ LM − LS ] is the likelihood-ratio statistic for comparing model

M to the saturated model. It is a test statistic for the hypothesis that all

parameters that are in the saturated model but not in model M equal zero. For

some GLMs, the deviance has approximately a chi-squared distribution.

For the snoring and heart disease data, the linear probability model describes

four binomial observations by two parameters. The deviance =1.0, df=4-2=2.

P-value=0.97.

Model comparison using the deviance

Now consider two models, denoted by M0 and M1, such that M0 is a special case of

M1. Given that the more complex model holds, the likelihood-ratio statistic for

testing that the simpler model holds is −2 [ L0 − L1 ] . Since

−2 [ L0 − L1 ] = −2 [ L0 − LS ] − {−2 [ L1 − LS ]} = Deviance0 − Deviance1

This test statistic is large when M0 fits poorly compared with M1. For large

samples, the statistic has an approximate chi-squared distribution, with df equal to

the difference between the residual df values for the separate models. (This df

value equals the number of additional parameters that are in M1 but not in M0)

H0: the simpler model M0 holds, i.e., the parameters in M1 but not in M0 all equal zero.

Ha: the more complex model M1 holds.

For the snoring and heart disease example, the two models’ deviances can be compared in this way.
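The comparison reduces to differencing deviances and residual df; a sketch with illustrative (made-up) numbers:

```python
# Deviances and residual df for nested models M0 (simpler) and M1 (more complex);
# the numbers are illustrative, not from the snoring example
dev0, df0 = 11.1, 4
dev1, df1 = 2.8, 2

lr_stat = dev0 - dev1   # likelihood-ratio statistic for testing that M0 holds
df = df0 - df1          # number of parameters in M1 but not in M0
print(round(lr_stat, 1), df)
```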
Residuals comparing observations to the model fit

For any GLM, goodness-of-fit statistics only broadly summarize how well models

fit data. We obtain further insight by comparing observed and fitted values

individually. For observation i, the difference yi − µi between an observed and fitted value has limited usefulness.

The Pearson residual is a standardized difference:

ei = ( yi − µi ) / sqrt[ Var ( yi ) ]

For the Poisson distribution, Var ( yi ) = µi , so Σi ei² = Σi ( yi − µi )² / µi . When the GLM is the model corresponding to independence for cells in a two-way contingency table, this sum is the Pearson chi-squared statistic X² for testing independence.
The standardized residual = ( yi − µi ) / SE, where SE is the standard error of yi − µi ; with it, it is easier to tell when a deviation is “large”. Standardized residuals larger than about 2 or 3 in absolute value are worthy of attention.
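For a Poisson GLM the Pearson residual takes the simple form ( yi − µi ) / sqrt( µi ); a sketch with illustrative observed counts and fitted means:

```python
import math

def pearson_residual_poisson(y, mu):
    # Pearson residual for a Poisson GLM: (y - mu) / sqrt(mu),
    # since the Poisson variance equals its mean mu
    return (y - mu) / math.sqrt(mu)

# Illustrative observed counts and fitted means (not from the text)
ys  = [1, 4, 0, 3]
mus = [2.0, 2.5, 1.0, 2.5]

residuals = [pearson_residual_poisson(y, mu) for y, mu in zip(ys, mus)]
x2 = sum(e ** 2 for e in residuals)   # sums to the Pearson chi-squared statistic
print([round(e, 2) for e in residuals])
print(round(x2, 2))
```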

3.5 Fitting generalized linear models

Wald, likelihood-ratio, and score inference use the likelihood function

We shall not present the general formula for score statistics, but many test statistics in this text are of this type.

An advantage of the score statistic is that it exists even when the ML estimate

β is infinite. In that case, one cannot compute the Wald statistic.

In a sense, the likelihood-ratio statistic uses the most information of the three types of test statistic. It is usually more reliable than the Wald statistic, especially when n is small to moderate.
