ABSLec17 by wanghonghx


									      Bayesian General Linear Model

1) Review of the General Linear Model

2) Bayesian setup of the GLM

3) WinBugs Implementation of the GLM
  -    Logit, Poisson, etc.
                      Review of GLMS
In general, statistical models contain both systematic and random

For the standard linear model, we assume that Y (the dep. var) is a
   vector of random variables whose components are independently
   distributed with mean 

 represents the systematic component of the model (the expected
   value of Y) and is assumed to be a linear function of explanatory
   variables X and parameters b.

The random part of the model (the unexplainable error terms) are
  assumed to be independent with constant error variance.

In the classical model, we assume that the random component follows
    a normal distribution. Thus, components of Y are independent
    normal variables with constant variance.

Thus, Y =  + e = XB + e
                             Review of GLMs
To segue to the general linear model, note that the classical model has three important

1)   the random component—each observation or component of Y has an independent
     normal distribution with E(Y) =  and constant variance 2.

2) the systematic component—covariates X produce a linear predictor  = XB

3) The link between the random and the systematic components  = .

For the normal linear model, the link states that the linear predictor  = XB is identical to
     the expected value of the random component.

However, more generally, i = g(i), where g( ) is called the link function.

General linear models extend the above setup to the case where:
    1) the random component follows a distribution other than the normal
    2) the link function is a function other than that given above.
             Review of GLMs
Common distributions other than the normal for the
   random component include…

1) Poisson

2) Bernoulli (or binomial)

3) Weibull (for duration models)

4) Multinomial
                          Review of GLMs

The link function relates the linear predictor  to an observation y.

In classical linear models, where the dependent variable is normal, we
    use the identity link. Since the expected value of the linear predictor
    can take any value for the normal distribution, the identity link makes

For a Poisson (count) model,  > 0, so the identity link is less attractive
   because the linear predictor  can be negative while  cannot.
    So, for Poisson models we typically use the log link,  = log(), or its
      inverse  = exp[].
    This function maps the linear predictor which can take any value along the
      real line to a set of plausible expected values.

For a Bernoulli model, 0 <  < 1, so the identity link is unattractive.
    So, for Bernoulli models, we typically use the logit link,  = log(/(1- ))
                 Bayesian Setup of the GLM

The Bayesian setup for the GLM is a very natural extension of the framework
   we have used for regression models.

Step 1. Specify the probability distribution for the dependent variable in your
         i.e. yi ~ N(i, t)

Step 2. Define the linear predictor for your model.
         i.e. b1 + b2x2i

Step 3. Choose the link function that maps from the linear predictor  to a set of
   plausible expected values for .
         i.e. i = b1 + b2x2i

Step 4. Choose priors for all of the parameters in your model.
         i.e. bj ~ N(0, .001) and t ~ G(.1 , .1)
             General Comments on WinBugs
               Implementation for GLMs
1)     In most cases, you will need to specify a set of initial values for the
       regression coefficients for the logit model.

2)     You may need to specify a more reasonable set of priors for the precision
       terms for your parameters than the massively diffuse priors we used for
       the normal linear model.

Failure to follow 1) and 2) may cause WinBugs to bomb. To see why, suppose we chose
       diffuse priors for our regression coefficients and did not specify a set of initial
       values. WinBugs will choose a reasonable set of initial values for us by basically
       sampling from our priors. It is likely that the coefficients will be either very big or
       very small numbers. Consequently, when these numbers are used to sample from
       the linear predictor, the expected value will be a very large or very small number,
       which when substituted into the link function will yield an impossibly large number
       or a number so close to zero that the link function will be undefined.

3)     GLMs will take a longer time to run than the linear model (not iterations of
       the Gibbs sampler necessarily, but cpu time). Tricks to speed
       convergence will therefore become increasingly important because each
       iteration is more costly.
                     Logistic Regression
Suppose that yi ~ Bernoulli(pi),

To ensure that 0 < pi < 1, we use the logit transformation so,
   logit(pi) <- b1 + b2x2i + … bkxki

We assume that bj ~ N(0, .001) for all j

notice—there is no prior distribution for the variance of y because there
   is not a parameter in the Bernoulli distribution for variance

Thus, the joint posterior distribution of the parameters is given by the
   following expression:
p(b1,…,bk|y,x)  p(b1,…,bk)ip(yi | b1 + b2x2i + … bkxki)
 p(b1,…,bk) i piy (1-pi)1-y
 p(b1,…,bk)I logit-1(b1 + b2x2i +…+ bkxki)y
          logit-1(1-(b1 + b2x2i + … + bkxki))1-y
       Application—Voter perceptions of
                party differences
Dependent Variable: whether voters indicate that
  Republicans are more conservative than Democrats on a
  liberal-conservative ideology scale (1=successful
  classification, 0=failure)

Independent Variables:
 - Respondent level data:
   race, gender, education, party identification, strength of
   respondent’s ideology (folded self-placement)
 - Contextual data:
   degree of elite-level polarization, whether there was
   divided government, and whether there was a
   presidential election year
       WinBugs Implementation of Example

model {
 for (i in 1:29022) {
   placement[i] ~ dbern(p[i])
   logit(p[i]) <- b[1] + b[2]*race[i] + b[3]*gender[i] + b[4]*educ[i] + b[5]*pid[i] +
    b[6]*strideol[i] + b[7]*elitepol[i] + b[8]*divgov[i] + b[9]*offyear[i]

 for (j in 1:9) {
   b[j] ~ dnorm(0,.001)
Comment 1—if all independent variables are standardized the model
  converges quickly even without the multivariate normal prior distribution for
  bj, though with 30,000 observations, things still take awhile.]
          Model Interpretation in WinBugs

I’ve never tried this, but in principle the generation of predicted values
    with confidence bands should be quite simple.

Set it up as follows: Let logit(mustar) = the expected value for the linear
   component of your model. Then:
logit(mustar) <- b[1] + b[2]*mean(race[]) + b[3]*mean(gender[])
                  + … b[k]*(mean(educ[]) + stdev(educ[]))

Pred ~ dbern(mustar)

Set sample monitor tools to monitor pred, which generate a sample
  summary of predicted values from the posterior predictive
  distribution corresponding to the stated values of X.
                      Hierarchical Logit

The hierarchical logistic regression model is a very easy
  extension of standard logit.

   yij ~ Bernoulli(pij),
       logit(pij) <- b1j + b2jx2ij + … bkjxkij

   bjk ~ N(Bjk, Tk) for all j,k
     Bjk <- k1 + k2 zj2 + … + km zjm
   qr ~ N(0, .001) for all q,r
   Tk ~ Gamma(.01, .01)
                 Poisson Regression

Suppose that yi ~ Poisson(i)

To ensure that i > 0, we use the log link function,
  log(i) <- b1 + b2x2i + … bkxki

We assume that bj ~ N(0, .001) for all j
              Application—Slave Revolts
Dependent Variable: the number of national slave revolts in year t

Independent Variables:
   nominal slave prices
   whether the U.S. was at war
   whether it was a presidential election year
   lagged revolts
   interaction between war and election, war and lagged revolts, and
   election and lagged revolts

Key variables of interest were the interaction terms with elections
  because the theory predicted that elections would communicate
  weakness of the police state to the slaves.
               WinBugs Implementation

model {
for (i in 1:56) {
 natrvlt[i] ~ dpois(lambda[i])
 log(lambda[i]) <- b[1] + b[2]*nlslvpr[i] +
   b[3]*War[i] + b[4]*Election[i] + b[5]*natrvltL1[i] +
   b[6]*Election[i]*natrvltL1[i] + b[7]*War[i]*Election[i] +

for (j in 1:8) { b[j] ~ dnorm(0,.1) }

                       Multinomial Logit

Suppose that yi ~ multinomial([pi1,…,pim], 1)
Let xij represent the set of factors that might induce actor i to choose
   outcome j.
Then the probability that individual i chooses outcome j can be
   pij = exp(b1j + b2jxij)/ k=1 to m exp(b1k + b2kxik)
Let log(phiij) = b1j + b2jxij such that:
   pij = phiij / (k=1 to m phiik)

To implement the multinomial logit, it is necessary to fix one of latent
   utilities in the model (the log(phi’s)) to sum value such that the
   predictors then influence how the attributes of one variables
   influence the change in probability of choosing one outcome versus
   the baseline.
Thus, for the fixed outcome j, we let b1j = b2j = 0 and assume diffuse
   normal priors for everything else.
              WinBugs Implementation of MNL
model {
  for (i in 1:numberofobservations) {
    y[i, 1:numberofchoices] ~ dmulti( p[i, 1:numberofchoices] , 1 )
    for (j in 1: numberofchoices) {
        p[i,j] <- phi[i,j] / sum(phi[i,])
        log(phi[i,j]) <- b[j,1] + b[j,2]*x2[i]
# priors - fix the values for the first outcome variable to be zero to establish a baseline
b[1,1] <- 0
b[1,2] <- 0
# all other parameters influence the probability of an outcome relative to the baseline
  for (j in 2:numberofchoices) {
      b[j,1] ~ dnorm(0, .01)
      b[j,2] ~ dnorm(0, .01)
# Data--the trick is to create a matrix for the dependent variable here that is consistent with the number
    of observations and the number of choices.
Suppose the number of observations = 10 and the number of choices = ). Then:

list( y = structure(.Data = c(1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,1),.Dim=c(10,3)),
      x2=c(1,1.1,1.2,2,2.1,2.2,3,3.1,3.2,3.3), numberofobservations=10, numberofchoices=3)
                     Autoregressive Model

Suppose that you have a model with serial correlation. That is, you
  have a time-series model such that:
  Yt = b1 + b2x2t + … bkxkt + et,
  where et = et-1 + vt and 0  ||  1 and vt ~ N(0 , 2v)

It can be shown that:
    Var(et) = 2v / (1- 2)     and       = cov(et , et-1) /2e

Recall that serial correlation decreases our model’s efficiency, but may
  mask that loss because standard errors are biased (positive serial
  correlation decreases standard errors, negative serial correlation
  increases standard errors).
     The standard correction for serial correlation

By Definition et = et-1 + vt
   where et-1 = Yt-1 - b1 + b2x2t-1 + … bkxkt-1 + et-1

To correct for serial correlation, we leverage this expression for et into
   our original expression for Yt such that for all t > 1:
   Yt = b1 + b2x2t + … bkxkt + et-1 + vt
      = b1 + b2x2t + … bkxkt + (Yt-1 - b1 + b2x2t-1 + … bkxkt-1 + et-1) + vt

Which can be rewritten for t > 1:
   Yt - Yt-1 = b1 (1-) + b2(x2t - x2t-1) + … bk(xkt - xkt-1) + vt

This transformed equation has a normally distributed error term with
   mean zero and constant variance.
  Application—Reagan’s Presidential Approval
     (Example brazenly lifted from Jackman’s MCMC page)

Dependent variable: Reagan’s aggregate
 public approval score in month t.

Independent variables:
  - Inflation rate
  - Unemployment rate
                     WinBugs Implementation
model {
   mu[1] <- b[1] + b[2]*infl[1] + b[3]*unemp[1]
   app[1] ~ dnorm(mu[1],tau.u)

    for (t in 2:96){               ## loop over obs 2 to T
      mu[t] <- b[1]*(1-rho)
      + b[2]*(infl[t] - rho*infl[t-1])
      + b[3]*(unemp[t] - rho*unemp[t-1])
      + rho*app[t-1];
      app[t] ~ dnorm(mu[t], tau.e);

    sigma.e <- 1/tau.e        ## convert precision to variance for transformed equation
    sigma.u <- sigma.e/(1+pow(rho,2)) ## regression error variance for original equation
    tau.u <- 1/sigma.u         ## see definition of Var(et) from above

    ## priors
    rho ~ dunif(-1,1);       ## uniform prior on stationary interval
    b[1:3] ~ dmnorm(b0[], B0[ , ]); ## multivariate Normal prior
    tau.e ~ dgamma(.05, .05);       ## vague prior on sigma

To top