Bayesian Generalized Linear Models

1) Review of the generalized linear model
2) Bayesian setup of the GLM
3) WinBugs implementation of the GLM: logit, Poisson, etc.

Review of GLMs

In general, statistical models contain both systematic and random components. For the standard linear model, we assume that Y (the dependent variable) is a vector of random variables whose components are independently distributed with mean μ. The mean μ represents the systematic component of the model (the expected value of Y) and is assumed to be a linear function of explanatory variables X and parameters b. The random part of the model (the unexplainable error term) is assumed to be independent across observations with constant error variance. In the classical model, we assume that the random component follows a normal distribution. Thus, the components of Y are independent normal variables with constant variance:

Y = μ + e = Xb + e

To segue to the generalized linear model, note that the classical model has three important components:

1) The random component: each observation or component of Y has an independent normal distribution with E(Y) = μ and constant variance σ².
2) The systematic component: covariates X produce a linear predictor η = Xb.
3) The link between the random and the systematic components: η = g(μ). For the normal linear model, the link states that the linear predictor is identical to the expected value of the random component, μ = η = Xb. More generally, ηi = g(μi), where g(·) is called the link function.

Generalized linear models extend this setup to the case where:

1) the random component follows a distribution other than the normal, and
2) the link function is something other than the identity.

Common distributions other than the normal for the random component include:

1) Poisson
2) Bernoulli (or binomial)
3) Weibull (for duration models)
4) Multinomial

The link function relates the linear predictor to an observation y. In classical linear models, where the dependent variable is normal, we use the identity link. Since the expected value of Y can take any value on the real line for the normal distribution, the identity link makes sense. For a Poisson (count) model, μ > 0, so the identity link is less attractive: the linear predictor can be negative while μ cannot. For Poisson models we therefore typically use the log link, η = log(μ), or its inverse, μ = exp(η). This function maps the linear predictor, which can take any value along the real line, onto the set of plausible (positive) expected values. For a Bernoulli model, 0 < μ < 1, so the identity link is again unattractive. For Bernoulli models we typically use the logit link, η = log(μ/(1−μ)).

Bayesian Setup of the GLM

The Bayesian setup for the GLM is a natural extension of the framework we have used for regression models.

Step 1. Specify the probability distribution for the dependent variable in your model, e.g. yi ~ N(μi, τ).
Step 2. Define the linear predictor for your model, e.g. ηi = b1 + b2*x2i.
Step 3. Choose the link function that maps the linear predictor onto the set of plausible expected values for y; for the normal model this is the identity link, μi = b1 + b2*x2i.
Step 4. Choose priors for all of the parameters in your model, e.g. bj ~ N(0, .001) and τ ~ Gamma(.1, .1). (Recall that WinBugs parameterizes the normal by its precision, so N(0, .001) is a diffuse prior.)
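Putting the four steps together, a minimal sketch of the corresponding WinBugs model for a normal linear regression with one covariate (the node names y and x2 and the sample size N are hypothetical):

model {
  for (i in 1:N) {
    y[i] ~ dnorm(mu[i], tau)      # Step 1: normal sampling distribution
    mu[i] <- b[1] + b[2]*x2[i]    # Steps 2-3: linear predictor with identity link
  }
  b[1] ~ dnorm(0, .001)           # Step 4: diffuse priors on the coefficients
  b[2] ~ dnorm(0, .001)
  tau ~ dgamma(.1, .1)            # Step 4: vague gamma prior on the precision
}

The GLMs below follow exactly this template; only the sampling distribution and the link change.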
General Comments on WinBugs Implementation for GLMs

1) In most cases, you will need to specify a set of initial values for the regression coefficients of the logit model.
2) You may need to specify a more reasonable set of priors for the precision terms of your parameters than the massively diffuse priors we used for the normal linear model.

Failure to follow 1) and 2) may cause WinBugs to bomb. To see why, suppose we chose diffuse priors for our regression coefficients and did not specify a set of initial values. WinBugs will then choose a set of initial values for us, essentially by sampling from our priors. Under diffuse priors, those coefficients are likely to be either very large or very small numbers. Consequently, when these numbers are used to evaluate the linear predictor, the result will be a very large or very small number which, when substituted into the link function, yields an impossibly large value or a value so close to zero that the link function is undefined.

3) GLMs take longer to run than the linear model (not necessarily more iterations of the Gibbs sampler, but more CPU time). Tricks to speed convergence therefore become increasingly important, because each iteration is more costly.

Logistic Regression

Suppose that yi ~ Bernoulli(pi). To ensure that 0 < pi < 1, we use the logit transformation, so:

logit(pi) <- b1 + b2*x2i + … + bk*xki

We assume that bj ~ N(0, .001) for all j. Notice that there is no prior distribution for the variance of y, because the Bernoulli distribution has no separate variance parameter.

Thus, the joint posterior distribution of the parameters is given by the following expression:

p(b1,…,bk | y, x) ∝ p(b1,…,bk) ∏i p(yi | b1 + b2x2i + … + bkxki)
                 = p(b1,…,bk) ∏i pi^yi (1 − pi)^(1−yi)
                 = p(b1,…,bk) ∏i [logit^-1(b1 + b2x2i + … + bkxki)]^yi [1 − logit^-1(b1 + b2x2i + … + bkxki)]^(1−yi)

Application: Voter Perceptions of Party Differences

Dependent variable: whether voters indicate that Republicans are more conservative than Democrats on a liberal-conservative ideology scale (1 = successful classification, 0 = failure).

Independent variables:
- Respondent-level data: race, gender, education, party identification, strength of the respondent's ideology (folded self-placement)
- Contextual data: degree of elite-level polarization, whether there was divided government, and whether it was a presidential election year

WinBugs Implementation of the Example

model {
  for (i in 1:29022) {
    placement[i] ~ dbern(p[i])
    logit(p[i]) <- b[1] + b[2]*race[i] + b[3]*gender[i] + b[4]*educ[i]
                 + b[5]*pid[i] + b[6]*strideol[i] + b[7]*elitepol[i]
                 + b[8]*divgov[i] + b[9]*offyear[i]
  }
  for (j in 1:9) {
    b[j] ~ dnorm(0, .001)
  }
}

Comment 1: if all independent variables are standardized, the model converges quickly even without a multivariate normal prior distribution for the b[j], though with roughly 30,000 observations things still take a while.

Model Interpretation in WinBugs

I've never tried this, but in principle the generation of predicted values with confidence bands should be quite simple. Set it up as follows. Let mustar be the expected value implied by the linear component of your model, evaluated at chosen values of X. Then:

logit(mustar) <- b[1] + b[2]*mean(race[]) + b[3]*mean(gender[]) + … + b[k]*(mean(educ[]) + sd(educ[]))
pred ~ dbern(mustar)

Set the Sample Monitor Tool to monitor pred, which generates a sample summary of predicted values from the posterior predictive distribution corresponding to the stated values of X.

Hierarchical Logit

The hierarchical logistic regression model is a very easy extension of the standard logit (see the sketch after the specification).

Likelihood:

yij ~ Bernoulli(pij)
logit(pij) <- b1j + b2j*x2ij + … + bkj*xkij

Priors:

bkj ~ N(Bkj, τk) for all j, k
Bkj <- θk1 + θk2*zj2 + … + θkm*zjm
θqr ~ N(0, .001) for all q, r
τk ~ Gamma(.01, .01)
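In WinBugs this can be written with nested indexing. A minimal sketch with two coefficients per group (an intercept and a slope on x2), J groups indexed by group[i], and a single group-level covariate z; all node names and dimensions here are hypothetical:

model {
  for (i in 1:N) {
    y[i] ~ dbern(p[i])
    logit(p[i]) <- b[group[i],1] + b[group[i],2]*x2[i]
  }
  for (j in 1:J) {
    for (k in 1:2) {
      b[j,k] ~ dnorm(B[j,k], tau[k])          # group-specific coefficients
      B[j,k] <- theta[k,1] + theta[k,2]*z[j]  # group-level regression for the means
    }
  }
  for (k in 1:2) {
    theta[k,1] ~ dnorm(0, .001)
    theta[k,2] ~ dnorm(0, .001)
    tau[k] ~ dgamma(.01, .01)                 # precision of the coefficient distribution
  }
}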
Poisson Regression

Suppose that yi ~ Poisson(λi). To ensure that λi > 0, we use the log link function:

log(λi) <- b1 + b2*x2i + … + bk*xki

We assume that bj ~ N(0, .001) for all j.

Application: Slave Revolts

Dependent variable: the number of national slave revolts in year t (1805-1860).

Independent variables:
- nominal slave prices
- whether the U.S. was at war
- whether it was a presidential election year
- lagged revolts
- interactions between war and election, war and lagged revolts, and election and lagged revolts

The key variables of interest were the interaction terms involving elections, because the theory predicted that elections would communicate weakness of the police state to the slaves.

WinBugs Implementation

model {
  for (i in 1:56) {
    natrvlt[i] ~ dpois(lambda[i])
    log(lambda[i]) <- b[1] + b[2]*nlslvpr[i] + b[3]*War[i] + b[4]*Election[i]
                    + b[5]*natrvltL1[i] + b[6]*Election[i]*natrvltL1[i]
                    + b[7]*War[i]*Election[i] + b[8]*War[i]*natrvltL1[i]
  }
  for (j in 1:8) {
    b[j] ~ dnorm(0, .1)
  }
}

Multinomial Logit

Suppose that yi ~ Multinomial([pi1,…,pim], 1), and let xij represent the set of factors that might induce actor i to choose outcome j. Then the probability that individual i chooses outcome j can be represented as:

pij = exp(b1j + b2j*xij) / Σk=1..m exp(b1k + b2k*xik)

Let log(φij) = b1j + b2j*xij, so that:

pij = φij / Σk=1..m φik

To identify the multinomial logit, it is necessary to fix the latent utility (the log(φ)) for one of the outcomes to some value, so that the remaining coefficients measure how an actor's attributes change the probability of choosing each outcome relative to that baseline. Thus, for the fixed baseline outcome j, we let b1j = b2j = 0 and assume diffuse normal priors for everything else.

WinBugs Implementation of MNL

model {
  for (i in 1:numberofobservations) {
    y[i, 1:numberofchoices] ~ dmulti(p[i, 1:numberofchoices], 1)
    for (j in 1:numberofchoices) {
      p[i,j] <- phi[i,j] / sum(phi[i,])
      log(phi[i,j]) <- b[j,1] + b[j,2]*x2[i]
    }
  }
  # priors: fix the values for the first outcome to zero to establish a baseline
  b[1,1] <- 0
  b[1,2] <- 0
  # all other parameters influence the probability of an outcome relative to the baseline
  for (j in 2:numberofchoices) {
    b[j,1] ~ dnorm(0, .01)
    b[j,2] ~ dnorm(0, .01)
  }
}

Data: the trick is to create a matrix for the dependent variable that is consistent with the number of observations and the number of choices. Suppose the number of observations = 10 and the number of choices = 3. Then:

list(y = structure(.Data = c(1,0,0, 1,0,0, 1,0,0, 0,1,0, 0,1,0, 0,1,0,
                             0,0,1, 0,0,1, 0,0,1, 0,0,1), .Dim = c(10,3)),
     x2 = c(1, 1.1, 1.2, 2, 2.1, 2.2, 3, 3.1, 3.2, 3.3),
     numberofobservations = 10,
     numberofchoices = 3)
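One wrinkle when supplying your own initial values for this model: b[1,1] and b[1,2] are logical (fixed) nodes rather than stochastic ones, so an initial-values list has to mark those entries as NA. A sketch for the 3-choice example above, with the first row (the baseline outcome) left to its fixed values and the free coefficients started at zero:

list(b = structure(.Data = c(NA, NA,
                             0, 0,
                             0, 0), .Dim = c(3,2)))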
Autoregressive Model

Suppose that you have a model with serial correlation. That is, you have a time-series model such that:

Yt = b1 + b2*x2t + … + bk*xkt + et, where et = ρ*et-1 + vt, |ρ| < 1, and vt ~ N(0, σ²v)

It can be shown that:

Var(et) = σ²v / (1 − ρ²) and ρ = cov(et, et-1) / σ²e

Recall that serial correlation decreases our model's efficiency but may mask that loss because the standard errors are biased (positive serial correlation deflates the estimated standard errors, negative serial correlation inflates them).

The Standard Correction for Serial Correlation

By definition, et = ρ*et-1 + vt, where

et-1 = Yt-1 − (b1 + b2*x2,t-1 + … + bk*xk,t-1)

To correct for serial correlation, we substitute this expression for et-1 into our original expression for Yt, so that for all t > 1:

Yt = b1 + b2*x2t + … + bk*xkt + ρ*et-1 + vt
   = b1 + b2*x2t + … + bk*xkt + ρ*(Yt-1 − b1 − b2*x2,t-1 − … − bk*xk,t-1) + vt

which can be rewritten, for t > 1, as:

Yt − ρ*Yt-1 = b1*(1 − ρ) + b2*(x2t − ρ*x2,t-1) + … + bk*(xkt − ρ*xk,t-1) + vt

This transformed equation has a normally distributed error term with mean zero and constant variance.

Application: Reagan's Presidential Approval (example brazenly lifted from Jackman's MCMC page)

Dependent variable: Reagan's aggregate public approval in month t.

Independent variables:
- inflation rate
- unemployment rate

WinBugs Implementation

model {
  ## first observation: drawn from the stationary (marginal) distribution
  mu[1] <- b[1] + b[2]*infl[1] + b[3]*unemp[1]
  app[1] ~ dnorm(mu[1], tau.u)
  for (t in 2:96) {  ## loop over obs 2 to T
    mu[t] <- b[1]*(1-rho) + b[2]*(infl[t] - rho*infl[t-1])
           + b[3]*(unemp[t] - rho*unemp[t-1]) + rho*app[t-1];
    app[t] ~ dnorm(mu[t], tau.e);
  }
  sigma.e <- 1/tau.e                  ## convert precision to variance for the transformed equation
  sigma.u <- sigma.e/(1-pow(rho,2))   ## error variance for the original equation; see Var(et) above
  tau.u <- 1/sigma.u
  ## priors
  rho ~ dunif(-1,1);                  ## uniform prior on the stationary interval
  b[1:3] ~ dmnorm(b0[], B0[ , ]);     ## multivariate normal prior
  tau.e ~ dgamma(.05, .05);           ## vague prior on the precision
}
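Note that the hyperparameters b0[] (the prior mean vector) and B0[ , ] (the prior precision matrix) of the multivariate normal prior are not defined inside the model, so they must be supplied with the data. A vague choice, as a sketch:

list(b0 = c(0, 0, 0),
     B0 = structure(.Data = c(.001, 0, 0,
                              0, .001, 0,
                              0, 0, .001), .Dim = c(3,3)))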