# ABSLec17 by wanghonghx

VIEWS: 6 PAGES: 21

• pg 1
```									      Bayesian General Linear Model

1) Review of the General Linear Model

2) Bayesian setup of the GLM

3) WinBugs Implementation of the GLM
-    Logit, Poisson, etc.
Review of GLMS
In general, statistical models contain both systematic and random
components.

For the standard linear model, we assume that Y (the dep. var) is a
vector of random variables whose components are independently
distributed with mean 

 represents the systematic component of the model (the expected
value of Y) and is assumed to be a linear function of explanatory
variables X and parameters b.

The random part of the model (the unexplainable error terms) are
assumed to be independent with constant error variance.

In the classical model, we assume that the random component follows
a normal distribution. Thus, components of Y are independent
normal variables with constant variance.

Thus, Y =  + e = XB + e
Review of GLMs
To segue to the general linear model, note that the classical model has three important
components:

1)   the random component—each observation or component of Y has an independent
normal distribution with E(Y) =  and constant variance 2.

2) the systematic component—covariates X produce a linear predictor  = XB

3) The link between the random and the systematic components  = .

For the normal linear model, the link states that the linear predictor  = XB is identical to
the expected value of the random component.

However, more generally, i = g(i), where g( ) is called the link function.

General linear models extend the above setup to the case where:
1) the random component follows a distribution other than the normal
2) the link function is a function other than that given above.
Review of GLMs
Common distributions other than the normal for the
random component include…

1) Poisson

2) Bernoulli (or binomial)

3) Weibull (for duration models)

4) Multinomial
Review of GLMs

The link function relates the linear predictor  to an observation y.

In classical linear models, where the dependent variable is normal, we
use the identity link. Since the expected value of the linear predictor
can take any value for the normal distribution, the identity link makes
sense.

For a Poisson (count) model,  > 0, so the identity link is less attractive
because the linear predictor  can be negative while  cannot.
So, for Poisson models we typically use the log link,  = log(), or its
inverse  = exp[].
This function maps the linear predictor which can take any value along the
real line to a set of plausible expected values.

For a Bernoulli model, 0 <  < 1, so the identity link is unattractive.
So, for Bernoulli models, we typically use the logit link,  = log(/(1- ))
Bayesian Setup of the GLM

The Bayesian setup for the GLM is a very natural extension of the framework
we have used for regression models.

Step 1. Specify the probability distribution for the dependent variable in your
model.
i.e. yi ~ N(i, t)

Step 2. Define the linear predictor for your model.
i.e. b1 + b2x2i

Step 3. Choose the link function that maps from the linear predictor  to a set of
plausible expected values for .
i.e. i = b1 + b2x2i

Step 4. Choose priors for all of the parameters in your model.
i.e. bj ~ N(0, .001) and t ~ G(.1 , .1)
Implementation for GLMs
1)     In most cases, you will need to specify a set of initial values for the
regression coefficients for the logit model.

2)     You may need to specify a more reasonable set of priors for the precision
terms for your parameters than the massively diffuse priors we used for
the normal linear model.

Failure to follow 1) and 2) may cause WinBugs to bomb. To see why, suppose we chose
diffuse priors for our regression coefficients and did not specify a set of initial
values. WinBugs will choose a reasonable set of initial values for us by basically
sampling from our priors. It is likely that the coefficients will be either very big or
very small numbers. Consequently, when these numbers are used to sample from
the linear predictor, the expected value will be a very large or very small number,
which when substituted into the link function will yield an impossibly large number
or a number so close to zero that the link function will be undefined.

3)     GLMs will take a longer time to run than the linear model (not iterations of
the Gibbs sampler necessarily, but cpu time). Tricks to speed
convergence will therefore become increasingly important because each
iteration is more costly.
Logistic Regression
Suppose that yi ~ Bernoulli(pi),

To ensure that 0 < pi < 1, we use the logit transformation so,
logit(pi) <- b1 + b2x2i + … bkxki

We assume that bj ~ N(0, .001) for all j

notice—there is no prior distribution for the variance of y because there
is not a parameter in the Bernoulli distribution for variance

Thus, the joint posterior distribution of the parameters is given by the
following expression:
p(b1,…,bk|y,x)  p(b1,…,bk)ip(yi | b1 + b2x2i + … bkxki)
 p(b1,…,bk) i piy (1-pi)1-y
 p(b1,…,bk)I logit-1(b1 + b2x2i +…+ bkxki)y
 logit-1(1-(b1 + b2x2i + … + bkxki))1-y
Application—Voter perceptions of
party differences
Dependent Variable: whether voters indicate that
Republicans are more conservative than Democrats on a
liberal-conservative ideology scale (1=successful
classification, 0=failure)

Independent Variables:
- Respondent level data:
race, gender, education, party identification, strength of
respondent’s ideology (folded self-placement)
- Contextual data:
degree of elite-level polarization, whether there was
divided government, and whether there was a
presidential election year
WinBugs Implementation of Example

model {
for (i in 1:29022) {
placement[i] ~ dbern(p[i])
logit(p[i]) <- b[1] + b[2]*race[i] + b[3]*gender[i] + b[4]*educ[i] + b[5]*pid[i] +
b[6]*strideol[i] + b[7]*elitepol[i] + b[8]*divgov[i] + b[9]*offyear[i]
}

for (j in 1:9) {
b[j] ~ dnorm(0,.001)
}
}
Comment 1—if all independent variables are standardized the model
converges quickly even without the multivariate normal prior distribution for
bj, though with 30,000 observations, things still take awhile.]
Model Interpretation in WinBugs

I’ve never tried this, but in principle the generation of predicted values
with confidence bands should be quite simple.

Set it up as follows: Let logit(mustar) = the expected value for the linear
logit(mustar) <- b[1] + b[2]*mean(race[]) + b[3]*mean(gender[])
+ … b[k]*(mean(educ[]) + stdev(educ[]))

Pred ~ dbern(mustar)

Set sample monitor tools to monitor pred, which generate a sample
summary of predicted values from the posterior predictive
distribution corresponding to the stated values of X.
Hierarchical Logit

The hierarchical logistic regression model is a very easy
extension of standard logit.

Likelihood
yij ~ Bernoulli(pij),
logit(pij) <- b1j + b2jx2ij + … bkjxkij

Priors
bjk ~ N(Bjk, Tk) for all j,k
Bjk <- k1 + k2 zj2 + … + km zjm
qr ~ N(0, .001) for all q,r
Tk ~ Gamma(.01, .01)
Poisson Regression

Suppose that yi ~ Poisson(i)

To ensure that i > 0, we use the log link function,
log(i) <- b1 + b2x2i + … bkxki

We assume that bj ~ N(0, .001) for all j
Application—Slave Revolts
Dependent Variable: the number of national slave revolts in year t
(1805-1860).

Independent Variables:
nominal slave prices
whether the U.S. was at war
whether it was a presidential election year
lagged revolts
interaction between war and election, war and lagged revolts, and
election and lagged revolts

Key variables of interest were the interaction terms with elections
because the theory predicted that elections would communicate
weakness of the police state to the slaves.
WinBugs Implementation

model {
for (i in 1:56) {
natrvlt[i] ~ dpois(lambda[i])
log(lambda[i]) <- b[1] + b[2]*nlslvpr[i] +
b[3]*War[i] + b[4]*Election[i] + b[5]*natrvltL1[i] +
b[6]*Election[i]*natrvltL1[i] + b[7]*War[i]*Election[i] +
b[8]*War[i]*natrvltL1[i]
}

for (j in 1:8) { b[j] ~ dnorm(0,.1) }

}
Multinomial Logit

Suppose that yi ~ multinomial([pi1,…,pim], 1)
Let xij represent the set of factors that might induce actor i to choose
outcome j.
Then the probability that individual i chooses outcome j can be
represented:
pij = exp(b1j + b2jxij)/ k=1 to m exp(b1k + b2kxik)
Let log(phiij) = b1j + b2jxij such that:
pij = phiij / (k=1 to m phiik)

To implement the multinomial logit, it is necessary to fix one of latent
utilities in the model (the log(phi’s)) to sum value such that the
predictors then influence how the attributes of one variables
influence the change in probability of choosing one outcome versus
the baseline.
Thus, for the fixed outcome j, we let b1j = b2j = 0 and assume diffuse
normal priors for everything else.
WinBugs Implementation of MNL
model {
for (i in 1:numberofobservations) {
y[i, 1:numberofchoices] ~ dmulti( p[i, 1:numberofchoices] , 1 )
for (j in 1: numberofchoices) {
p[i,j] <- phi[i,j] / sum(phi[i,])
log(phi[i,j]) <- b[j,1] + b[j,2]*x2[i]
}
}
# priors - fix the values for the first outcome variable to be zero to establish a baseline
b[1,1] <- 0
b[1,2] <- 0
# all other parameters influence the probability of an outcome relative to the baseline
for (j in 2:numberofchoices) {
b[j,1] ~ dnorm(0, .01)
b[j,2] ~ dnorm(0, .01)
}
}
# Data--the trick is to create a matrix for the dependent variable here that is consistent with the number
of observations and the number of choices.
Suppose the number of observations = 10 and the number of choices = ). Then:

list( y = structure(.Data = c(1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,1),.Dim=c(10,3)),
x2=c(1,1.1,1.2,2,2.1,2.2,3,3.1,3.2,3.3), numberofobservations=10, numberofchoices=3)
Autoregressive Model

Suppose that you have a model with serial correlation. That is, you
have a time-series model such that:
Yt = b1 + b2x2t + … bkxkt + et,
where et = et-1 + vt and 0  ||  1 and vt ~ N(0 , 2v)

It can be shown that:
Var(et) = 2v / (1- 2)     and       = cov(et , et-1) /2e

Recall that serial correlation decreases our model’s efficiency, but may
mask that loss because standard errors are biased (positive serial
correlation decreases standard errors, negative serial correlation
increases standard errors).
The standard correction for serial correlation

By Definition et = et-1 + vt
where et-1 = Yt-1 - b1 + b2x2t-1 + … bkxkt-1 + et-1

To correct for serial correlation, we leverage this expression for et into
our original expression for Yt such that for all t > 1:
Yt = b1 + b2x2t + … bkxkt + et-1 + vt
= b1 + b2x2t + … bkxkt + (Yt-1 - b1 + b2x2t-1 + … bkxkt-1 + et-1) + vt

Which can be rewritten for t > 1:
Yt - Yt-1 = b1 (1-) + b2(x2t - x2t-1) + … bk(xkt - xkt-1) + vt

This transformed equation has a normally distributed error term with
mean zero and constant variance.
Application—Reagan’s Presidential Approval
(Example brazenly lifted from Jackman’s MCMC page)

Dependent variable: Reagan’s aggregate
public approval score in month t.

Independent variables:
- Inflation rate
- Unemployment rate
WinBugs Implementation
model {
mu[1] <- b[1] + b[2]*infl[1] + b[3]*unemp[1]
app[1] ~ dnorm(mu[1],tau.u)

for (t in 2:96){               ## loop over obs 2 to T
mu[t] <- b[1]*(1-rho)
+ b[2]*(infl[t] - rho*infl[t-1])
+ b[3]*(unemp[t] - rho*unemp[t-1])
+ rho*app[t-1];
app[t] ~ dnorm(mu[t], tau.e);
}

sigma.e <- 1/tau.e        ## convert precision to variance for transformed equation
sigma.u <- sigma.e/(1+pow(rho,2)) ## regression error variance for original equation
tau.u <- 1/sigma.u         ## see definition of Var(et) from above

## priors
rho ~ dunif(-1,1);       ## uniform prior on stationary interval
b[1:3] ~ dmnorm(b0[], B0[ , ]); ## multivariate Normal prior
tau.e ~ dgamma(.05, .05);       ## vague prior on sigma
}

```
To top