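Econometric Analysis of Panel Data
William Greene
Department of Economics
Stern School of Business

6. Maximum Likelihood Estimation of the Random Effects Linear Model

The Random Effects Model

The random effects model:
\[
\begin{aligned}
y_{it} &= \mathbf{x}_{it}'\boldsymbol\beta + c_i + \varepsilon_{it}, && \text{observation for person } i \text{ at time } t \\
\mathbf{y}_i &= \mathbf{X}_i\boldsymbol\beta + c_i\mathbf{i} + \boldsymbol\varepsilon_i, && T_i \text{ observations in group } i \\
&= \mathbf{X}_i\boldsymbol\beta + \mathbf{c}_i + \boldsymbol\varepsilon_i, && \text{note } \mathbf{c}_i = (c_i, c_i, \ldots, c_i)' \\
\mathbf{y} &= \mathbf{X}\boldsymbol\beta + \mathbf{c} + \boldsymbol\varepsilon, && \textstyle\sum_{i=1}^{N} T_i \text{ observations in the sample} \\
\mathbf{c} &= (\mathbf{c}_1', \mathbf{c}_2', \ldots, \mathbf{c}_N')', && \textstyle\sum_{i=1}^{N} T_i \text{ by 1 vector}
\end{aligned}
\]
$c_i$ is uncorrelated with $\mathbf{x}_{it}$ for all $t$: $E[c_i \mid \mathbf{X}_i] = 0$ and $E[\varepsilon_{it} \mid \mathbf{X}_i, c_i] = 0$.

Error Components Model

Generalized Regression Model
\[
\begin{aligned}
y_{it} &= \mathbf{x}_{it}'\boldsymbol\beta + \varepsilon_{it} + u_i \\
E[\varepsilon_{it} \mid \mathbf{X}_i] &= 0, \qquad E[\varepsilon_{it}^2 \mid \mathbf{X}_i] = \sigma_\varepsilon^2 \\
E[u_i \mid \mathbf{X}_i] &= 0, \qquad E[u_i^2 \mid \mathbf{X}_i] = \sigma_u^2 \\
\mathbf{y}_i &= \mathbf{X}_i\boldsymbol\beta + \boldsymbol\varepsilon_i + u_i\mathbf{i} \quad \text{for } T_i \text{ observations}
\end{aligned}
\]
\[
\mathrm{Var}[\boldsymbol\varepsilon_i + u_i\mathbf{i}] =
\begin{pmatrix}
\sigma_\varepsilon^2 + \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\
\sigma_u^2 & \sigma_\varepsilon^2 + \sigma_u^2 & \cdots & \sigma_u^2 \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_u^2 & \sigma_u^2 & \cdots & \sigma_\varepsilon^2 + \sigma_u^2
\end{pmatrix}
\]

Notation

\[
\mathrm{Var}[\boldsymbol\varepsilon_i + u_i\mathbf{i}]
= \sigma_\varepsilon^2\mathbf{I}_{T_i} + \sigma_u^2\mathbf{i}\mathbf{i}' \quad (T_i \times T_i)
= \boldsymbol\Omega_i
\]
\[
\mathrm{Var}[\mathbf{w} \mid \mathbf{X}] =
\begin{pmatrix}
\boldsymbol\Omega_1 & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \boldsymbol\Omega_2 & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \boldsymbol\Omega_N
\end{pmatrix}
\quad \text{(Note these differ only in the dimension } T_i\text{.)}
\]

Maximum Likelihood

Assuming normality of $\varepsilon_{it}$ and $u_i$. Treat the $T_i$ joint observations on $[(\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{iT_i}), u_i]$ as one $T_i$-variate observation. The mean vector of $\boldsymbol\varepsilon_i + u_i\mathbf{i}$ is zero and the covariance matrix is $\boldsymbol\Omega_i = \sigma_\varepsilon^2\mathbf{I} + \sigma_u^2\mathbf{i}\mathbf{i}'$.

The joint density for $\boldsymbol\varepsilon_i = (\mathbf{y}_i - \mathbf{X}_i\boldsymbol\beta)$ (from here on $\boldsymbol\varepsilon_i$ denotes this composite disturbance) is
\[
f(\boldsymbol\varepsilon_i) = (2\pi)^{-T_i/2}\,|\boldsymbol\Omega_i|^{-1/2}
\exp\!\left[-\tfrac12 (\mathbf{y}_i - \mathbf{X}_i\boldsymbol\beta)'\boldsymbol\Omega_i^{-1}(\mathbf{y}_i - \mathbf{X}_i\boldsymbol\beta)\right]
\]
\[
\log L = \sum_{i=1}^{N} \log L_i, \quad \text{where}
\]
\[
\begin{aligned}
\log L_i(\boldsymbol\beta, \sigma_\varepsilon^2, \sigma_u^2)
&= -\tfrac12\left[T_i\log 2\pi + \log|\boldsymbol\Omega_i| + (\mathbf{y}_i - \mathbf{X}_i\boldsymbol\beta)'\boldsymbol\Omega_i^{-1}(\mathbf{y}_i - \mathbf{X}_i\boldsymbol\beta)\right] \\
&= -\tfrac12\left[T_i\log 2\pi + \log|\boldsymbol\Omega_i| + \boldsymbol\varepsilon_i'\boldsymbol\Omega_i^{-1}\boldsymbol\varepsilon_i\right]
\end{aligned}
\]

Panel Data Algebra (3)

\[
\boldsymbol\Omega_i^{-1} = \frac{1}{\sigma_\varepsilon^2}\left[\mathbf{I}_{T_i} - \frac{\sigma_u^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2}\,\mathbf{i}\mathbf{i}'\right]
\]
So,
\[
\begin{aligned}
\boldsymbol\varepsilon_i'\boldsymbol\Omega_i^{-1}\boldsymbol\varepsilon_i
&= \frac{1}{\sigma_\varepsilon^2}\left[\boldsymbol\varepsilon_i'\boldsymbol\varepsilon_i - \frac{\sigma_u^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2}\,\boldsymbol\varepsilon_i'\mathbf{i}\mathbf{i}'\boldsymbol\varepsilon_i\right] \\
&= \frac{1}{\sigma_\varepsilon^2}\left[\boldsymbol\varepsilon_i'\boldsymbol\varepsilon_i - \frac{\sigma_u^2\,(T_i\bar\varepsilon_i)^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2}\right]
\end{aligned}
\]

Panel Data Algebra (3, cont.)

\[
\boldsymbol\Omega_i = \sigma_\varepsilon^2\mathbf{I} + \sigma_u^2\mathbf{i}\mathbf{i}' = \sigma_\varepsilon^2[\mathbf{I} + \rho^2\mathbf{i}\mathbf{i}'] = \sigma_\varepsilon^2\mathbf{A}, \qquad \rho^2 = \sigma_u^2/\sigma_\varepsilon^2
\]
\[
|\boldsymbol\Omega_i| = (\sigma_\varepsilon^2)^{T_i}\prod_{t=1}^{T_i}\lambda_t, \qquad \lambda_t = \text{a characteristic root of } \mathbf{A}
\]
Roots are (real since $\mathbf{A}$ is symmetric) solutions to $\mathbf{A}\mathbf{c} = \lambda\mathbf{c}$:
\[
\mathbf{A}\mathbf{c} = \lambda\mathbf{c} = \mathbf{c} + \rho^2\mathbf{i}\mathbf{i}'\mathbf{c}
\quad\text{or}\quad
\rho^2\mathbf{i}(\mathbf{i}'\mathbf{c}) = (\lambda - 1)\mathbf{c}
\]
Any vector whose elements sum to zero ($\mathbf{i}'\mathbf{c} = 0$) is a characteristic vector that corresponds to root $\lambda = 1$. There are $T_i - 1$ such vectors, so $T_i - 1$ of the roots are 1. Suppose $\mathbf{i}'\mathbf{c} \neq 0$.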
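Premultiply by $\mathbf{i}'$ to find $\rho^2\mathbf{i}'\mathbf{i}(\mathbf{i}'\mathbf{c}) = (\lambda - 1)\mathbf{i}'\mathbf{c}$, that is, $T_i\rho^2(\mathbf{i}'\mathbf{c}) = (\lambda - 1)\mathbf{i}'\mathbf{c}$. Since $\mathbf{i}'\mathbf{c} \neq 0$, divide by it to obtain the remaining root $\lambda = 1 + T_i\rho^2$. Therefore,
\[
|\boldsymbol\Omega_i| = (\sigma_\varepsilon^2)^{T_i}\prod_{t=1}^{T_i}\lambda_t = (\sigma_\varepsilon^2)^{T_i}(1 + T_i\rho^2)
\]

Panel Data Algebra (3, conc.)

\[
\begin{aligned}
\log L_i &= -\tfrac12\left[T_i\log 2\pi + \log|\boldsymbol\Omega_i| + \boldsymbol\varepsilon_i'\boldsymbol\Omega_i^{-1}\boldsymbol\varepsilon_i\right] \\
&= -\tfrac12\left[T_i\log 2\pi + T_i\log\sigma_\varepsilon^2 + \log(1 + T_i\rho^2)
+ \frac{1}{\sigma_\varepsilon^2}\boldsymbol\varepsilon_i'\boldsymbol\varepsilon_i
- \frac{1}{\sigma_\varepsilon^2}\frac{\sigma_u^2\,(T_i\bar\varepsilon_i)^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2}\right]
\end{aligned}
\]
\[
\log L = \sum_{i=1}^{N}\log L_i
= -\tfrac12\left[(\log 2\pi + \log\sigma_\varepsilon^2)\sum_{i=1}^{N}T_i + \sum_{i=1}^{N}\log(1 + T_i\rho^2)\right]
- \frac{1}{2\sigma_\varepsilon^2}\sum_{i=1}^{N}\left[\boldsymbol\varepsilon_i'\boldsymbol\varepsilon_i - \frac{\sigma_u^2\,(T_i\bar\varepsilon_i)^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2}\right]
\]
Since $\rho^2 = \sigma_u^2/\sigma_\varepsilon^2$,
\[
\frac{\sigma_u^2\,(T_i\bar\varepsilon_i)^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2} = \frac{\rho^2\,(T_i\bar\varepsilon_i)^2}{1 + T_i\rho^2},
\]
\[
\log L_i = -\frac{T_i}{2}(\log 2\pi + \log\sigma_\varepsilon^2) - \tfrac12\log(1 + T_i\rho^2)
- \frac{1}{2\sigma_\varepsilon^2}\left[\boldsymbol\varepsilon_i'\boldsymbol\varepsilon_i - \frac{\rho^2\,(T_i\bar\varepsilon_i)^2}{1 + T_i\rho^2}\right]
\]

Maximizing the Likelihood

Difficult: "Brute force" + some elegant theoretical results: See Baltagi, pp. 20-21. (Back and forth from GLS to $\sigma_\varepsilon^2$ and $\sigma_u^2$.)

Somewhat less difficult and more practical: At any iteration, given estimates of $\sigma_\varepsilon^2$ and $\sigma_u^2$, the estimator of $\boldsymbol\beta$ is GLS (of course), so we iterate back and forth between these. See Hsiao, pp. 39-40.

0. Begin iterations with, say, FGLS estimates of $\boldsymbol\beta$, $\sigma_\varepsilon^2$, $\sigma_u^2$.
1. Given $\hat\sigma_{\varepsilon,r}^2$ and $\hat\sigma_{u,r}^2$, compute $\hat{\boldsymbol\beta}_{r+1}$ by FGLS$(\hat\sigma_{\varepsilon,r}^2, \hat\sigma_{u,r}^2)$.
2. Given $\hat{\boldsymbol\beta}_{r+1}$, compute $\hat\sigma_{\varepsilon,r+1}^2 = \dfrac{\sum_{i=1}^{N}\hat{\boldsymbol\varepsilon}_{i,r+1}'\mathbf{M}_D^i\,\hat{\boldsymbol\varepsilon}_{i,r+1}}{\sum_{i=1}^{N}(T_i - 1)}$, where $\mathbf{M}_D^i = \mathbf{I}_{T_i} - (1/T_i)\mathbf{i}\mathbf{i}'$ takes deviations from the group mean.
3. Given $\hat{\boldsymbol\beta}_{r+1}$, compute $\hat\sigma_{u,r+1}^2 = \dfrac{\sum_{i=1}^{N}\hat{\bar\varepsilon}_{i\cdot,r+1}^{\,2}}{N}$.
4. Return to step 1 and repeat until $\hat{\boldsymbol\beta}_{r+1} - \hat{\boldsymbol\beta}_r = \mathbf{0}$.

Direct Maximization of LogL

Simpler: Take advantage of the invariance of maximum likelihood estimators to transformations of the parameters.

Let $\theta = 1/\sigma_\varepsilon^2$, $\tau = \sigma_u^2/\sigma_\varepsilon^2$ (the ratio written $\rho^2$ above), $R_i = T_i\tau + 1$, $Q_i = \tau/R_i$. Then
\[
\log L_i = -\tfrac12\left[\theta\left(\boldsymbol\varepsilon_i'\boldsymbol\varepsilon_i - Q_i(T_i\bar\varepsilon_i)^2\right) + \log R_i - T_i\log\theta + T_i\log 2\pi\right]
\]
Can be maximized using ordinary optimization methods (not Newton, as suggested by Hsiao). Treat as a standard nonlinear optimization problem. Solve with iterative, gradient methods.

Application - ML vs. FGLS

Maximum Simulated Likelihood

Assuming $\varepsilon_{it}$ and $u_i$ are normally distributed. Write $u_i = \sigma_u v_i$ where $v_i \sim N[0,1]$. Then $y_{it} = \mathbf{x}_{it}'\boldsymbol\beta + \sigma_u v_i + \varepsilon_{it}$.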
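If $v_i$ were observed data, all observations would be independent, and
\[
\log f(y_{it} \mid \mathbf{x}_{it}, v_i) = -\tfrac12\left[\log 2\pi + \log\sigma_\varepsilon^2 + (y_{it} - \mathbf{x}_{it}'\boldsymbol\beta - \sigma_u v_i)^2/\sigma_\varepsilon^2\right]
\]
Let $\theta_\varepsilon^2 = 1/\sigma_\varepsilon^2$. The log of the joint density for $T_i$ observations with common $v_i$ is
\[
\log L_i(\boldsymbol\beta, \sigma_u, \theta_\varepsilon^2 \mid v_i) = \sum_{t=1}^{T_i} -\tfrac12\left[\log 2\pi - \log\theta_\varepsilon^2 + \theta_\varepsilon^2(y_{it} - \mathbf{x}_{it}'\boldsymbol\beta - \sigma_u v_i)^2\right]
\]
The conditional log likelihood for the sample is then
\[
\log L(\boldsymbol\beta, \sigma_u, \theta_\varepsilon^2 \mid \mathbf{v}) = \sum_{i=1}^{N}\sum_{t=1}^{T_i} -\tfrac12\left[\log 2\pi - \log\theta_\varepsilon^2 + \theta_\varepsilon^2(y_{it} - \mathbf{x}_{it}'\boldsymbol\beta - \sigma_u v_i)^2\right]
\]

Likelihood Function for Individual i

The unconditional log likelihood is obtained by integrating $v_i$ out of $L_i(\boldsymbol\beta, \sigma_u, \theta_\varepsilon^2 \mid v_i)$:
\[
L_i(\boldsymbol\beta, \sigma_u, \theta_\varepsilon^2)
= \int_{-\infty}^{\infty}\prod_{t=1}^{T_i}\frac{\theta_\varepsilon}{\sqrt{2\pi}}\exp\!\left[-\frac{\theta_\varepsilon^2}{2}(y_{it} - \mathbf{x}_{it}'\boldsymbol\beta - \sigma_u v_i)^2\right]\phi(v_i)\,dv_i
= E_{v_i}\!\left[L_i(\boldsymbol\beta, \sigma_u, \theta_\varepsilon^2 \mid v_i)\right]
\]
The integral usually does not have a closed form. (For the normal distribution above, actually, it does. We used that earlier. We ignore that for now.)

Log Likelihood Function

The full log likelihood function that needs to be maximized is
\[
\begin{aligned}
\log L &= \sum_{i=1}^{N}\log L_i(\boldsymbol\beta, \sigma_u, \theta_\varepsilon^2) \\
&= \sum_{i=1}^{N}\log\int_{-\infty}^{\infty}\prod_{t=1}^{T_i}\frac{\theta_\varepsilon}{\sqrt{2\pi}}\exp\!\left[-\frac{\theta_\varepsilon^2}{2}(y_{it} - \mathbf{x}_{it}'\boldsymbol\beta - \sigma_u v_i)^2\right]\phi(v_i)\,dv_i \\
&= \sum_{i=1}^{N}\log E_{v_i}\!\left[L_i(\boldsymbol\beta, \sigma_u, \theta_\varepsilon^2 \mid v_i)\right]
\end{aligned}
\]
This is the function to be maximized to obtain the MLE of $[\boldsymbol\beta, \theta_\varepsilon, \sigma_u]$.

Computing the Expected LogL

How to compute the integral: First note, $\phi(v_i) = \exp(-v_i^2/2)/\sqrt{2\pi}$, so
\[
\int_{-\infty}^{\infty}\prod_{t=1}^{T_i}\frac{\theta_\varepsilon}{\sqrt{2\pi}}\exp\!\left[-\frac{\theta_\varepsilon^2}{2}(y_{it} - \mathbf{x}_{it}'\boldsymbol\beta - \sigma_u v_i)^2\right]\phi(v_i)\,dv_i
= E_{v_i}\!\left[L_i(\boldsymbol\beta, \sigma_u, \theta_\varepsilon^2 \mid v_i)\right]
\]
(1) Numerical (Gauss-Hermite) quadrature for integrals of this form is remarkably accurate:
\[
\int_{-\infty}^{\infty} e^{-v^2} g(v)\,dv \approx \sum_{h=1}^{H} w_h\,g(a_h)
\]
Example: Hermite Quadrature Nodes and Weights, H=5
Nodes: -2.02018, -0.95857, 0.00000, 0.95857, 2.02018
Weights: 0.0199532, 0.39362, 0.94531, 0.39362, 0.0199532
(The outer weights are 1.99532 × 10^-2; the weights sum to $\sqrt{\pi}$.)
Applications usually use many more points, up to 96, and much more accurate (more digits) representations.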
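The quadrature rule can be checked against the closed form that the slides mention for the normal case. The following is a minimal sketch, not the course's own code: it assumes NumPy, takes the residual vector `e` (standing for $\mathbf{y}_i - \mathbf{X}_i\boldsymbol\beta$) as given, and the function names `re_likelihood_quadrature` and `re_likelihood_closed_form` are hypothetical. It uses the change of variable $v = \sqrt{2}\,a$, so the integral becomes $(1/\sqrt{\pi})\sum_h w_h\,g(a_h)$.

```python
import numpy as np

def re_likelihood_quadrature(e, sigma_u, sigma_eps, H=20):
    """Approximate L_i = E_v[ prod_t f(e_t - sigma_u * v) ], v ~ N(0, 1),
    by H-point Gauss-Hermite quadrature (weight function exp(-a^2))."""
    a, w = np.polynomial.hermite.hermgauss(H)   # nodes a_h and weights w_h
    theta = 1.0 / sigma_eps                     # theta^2 = 1 / sigma_eps^2
    # resid[h, t] = e_t - sigma_u * sqrt(2) * a_h   (v = sqrt(2) * a)
    resid = e[None, :] - sigma_u * np.sqrt(2.0) * a[:, None]
    dens = theta / np.sqrt(2.0 * np.pi) * np.exp(-0.5 * (theta * resid) ** 2)
    g = dens.prod(axis=1)                       # g(a_h): product over t
    return (w * g).sum() / np.sqrt(np.pi)

def re_likelihood_closed_form(e, sigma_u, sigma_eps):
    """Exact T_i-variate normal density with Omega = s_eps^2 I + s_u^2 ii'."""
    T = e.size
    omega = sigma_eps**2 * np.eye(T) + sigma_u**2 * np.ones((T, T))
    quad = e @ np.linalg.solve(omega, e)        # e' Omega^{-1} e
    _, logdet = np.linalg.slogdet(omega)        # log |Omega|
    return np.exp(-0.5 * (T * np.log(2.0 * np.pi) + logdet + quad))

# Illustrative values only (assumed, not from the slides).
e = np.array([0.3, -0.2, 0.5])
approx = re_likelihood_quadrature(e, sigma_u=0.5, sigma_eps=1.0)
exact = re_likelihood_closed_form(e, sigma_u=0.5, sigma_eps=1.0)
print(approx, exact)
```

`np.polynomial.hermite.hermgauss(5)` reproduces the five nodes listed in the example, and its weights sum to $\sqrt{\pi}$. Accuracy degrades when $T_i\sigma_u^2/\sigma_\varepsilon^2$ is large, which is one reason applications use many more points.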
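Quadrature

A change of variable is needed to get it into the right form: Each term then becomes
\[
L_{i,Q} = \frac{1}{\sqrt{\pi}}\sum_{h=1}^{H} w_h\prod_{t=1}^{T_i}\frac{\theta_\varepsilon}{\sqrt{2\pi}}\exp\!\left[-\frac{\theta_\varepsilon^2}{2}(y_{it} - \mathbf{x}_{it}'\boldsymbol\beta - \sigma_u\sqrt{2}\,a_h)^2\right]
\]
and the problem is solved by maximizing with respect to $\boldsymbol\beta$, $\theta_\varepsilon^2$, $\sigma_u$:
\[
\log L_Q = \sum_{i=1}^{N}\log L_{i,Q}
\]
(Maximization to be continued later in the semester.)

Gauss-Hermite Quadrature

\[
\int_{-\infty}^{\infty}\prod_{t=1}^{T_i}\frac{\theta_\varepsilon}{\sqrt{2\pi}}\exp\!\left[-\frac{\theta_\varepsilon^2}{2}(y_{it} - \mathbf{x}_{it}'\boldsymbol\beta - \sigma_u v_i)^2\right]\phi(v_i)\,dv_i,
\qquad \phi(v_i) = \exp(-v_i^2/2)/\sqrt{2\pi}
\]
Make a change of variable to $a_i = v_i/\sqrt{2}$, $v_i = \sqrt{2}\,a_i$, $dv_i = \sqrt{2}\,da_i$:
\[
\begin{aligned}
&= \int_{-\infty}^{\infty}\prod_{t=1}^{T_i}\frac{\theta_\varepsilon}{\sqrt{2\pi}}\exp\!\left[-\frac{\theta_\varepsilon^2}{2}(y_{it} - \mathbf{x}_{it}'\boldsymbol\beta - \sigma_u\sqrt{2}\,a_i)^2\right]\frac{1}{\sqrt{2\pi}}\exp(-a_i^2)\,\sqrt{2}\,da_i \\
&= \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty}\exp(-a_i^2)\prod_{t=1}^{T_i}\frac{\theta_\varepsilon}{\sqrt{2\pi}}\exp\!\left[-\frac{\theta_\varepsilon^2}{2}(y_{it} - \mathbf{x}_{it}'\boldsymbol\beta - \sigma_u\sqrt{2}\,a_i)^2\right]da_i \\
&= \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty}\exp(-a_i^2)\,g(a_i)\,da_i \approx \frac{1}{\sqrt{\pi}}\sum_{h=1}^{H} w_h\,g(a_h)
\end{aligned}
\]

Simulation

The unconditional log likelihood is the log of an expected value:
\[
\begin{aligned}
\log L_i(\boldsymbol\beta, \sigma_u, \theta_\varepsilon^2)
&= \log\int_{-\infty}^{\infty}\prod_{t=1}^{T_i}\frac{\theta_\varepsilon}{\sqrt{2\pi}}\exp\!\left[-\frac{\theta_\varepsilon^2}{2}(y_{it} - \mathbf{x}_{it}'\boldsymbol\beta - \sigma_u v_i)^2\right]\phi(v_i)\,dv_i \\
&= \log E_{v_i}\!\left[L_i(\boldsymbol\beta, \sigma_u, \theta_\varepsilon^2 \mid v_i)\right] = \log E_v[g(v_i)]
\end{aligned}
\]
An expected value can be 'estimated' by sampling observations and averaging them:
\[
\hat{E}_v[g(v_i)] = \frac{1}{R}\sum_{r=1}^{R}\prod_{t=1}^{T_i}\frac{\theta_\varepsilon}{\sqrt{2\pi}}\exp\!\left[-\frac{\theta_\varepsilon^2}{2}(y_{it} - \mathbf{x}_{it}'\boldsymbol\beta - \sigma_u v_{ir})^2\right]
\]
The simulated unconditional log likelihood function is then
\[
\sum_{i=1}^{N}\log\frac{1}{R}\sum_{r=1}^{R}\prod_{t=1}^{T_i}\frac{\theta_\varepsilon}{\sqrt{2\pi}}\exp\!\left[-\frac{\theta_\varepsilon^2}{2}(y_{it} - \mathbf{x}_{it}'\boldsymbol\beta - \sigma_u v_{ir})^2\right]
\]
This is a function of $(\boldsymbol\beta, \theta_\varepsilon^2, \sigma_u \mid \mathbf{y}_i, \mathbf{X}_i, v_{i,1}, \ldots, v_{i,R})$, $i = 1, \ldots, N$. The random draws on $v_{i,r}$ become part of the data, and the function is maximized with respect to the unknown parameters.

Convergence Results

Target is the log of the expected likelihood: $\log E_{v_i}[L(\boldsymbol\beta, \sigma_u, \theta_\varepsilon^2 \mid v_i)]$

Simulation estimator based on random sampling from the population of $v_i$:
\[
\log L_S(\boldsymbol\beta, \sigma_u, \theta_\varepsilon^2) = \sum_{i=1}^{N}\log\frac{1}{R}\sum_{r=1}^{R}\prod_{t=1}^{T_i}\frac{\theta_\varepsilon}{\sqrt{2\pi}}\exp\!\left[-\frac{\theta_\varepsilon^2}{2}(y_{it} - \mathbf{x}_{it}'\boldsymbol\beta - \sigma_u v_{ir})^2\right]
\]
The essential result is
\[
\operatorname*{plim}_{R\to\infty}\ \log L_S(\boldsymbol\beta, \sigma_u, \theta_\varepsilon^2) = \log E_{v_i}[L(\boldsymbol\beta, \sigma_u, \theta_\varepsilon^2 \mid v_i)]
\]
Conditions:
(1) General regularity and smoothness of the log likelihood
(2) $R$ increases faster than $\sqrt{N}$. ('Intelligent draws', e.g. Halton sequences, make this somewhat ambiguous.)
Result: The maximizer of $\log L_S$ converges to the maximizer of $\log E_{v_i}[L(\boldsymbol\beta, \sigma_u, \theta_\varepsilon^2 \mid v_i)]$.

MSL vs. ML - Application

Two Level Panel Data

Nested by construction
Unbalanced panels
  No real obstacle to estimation
  Some inconvenient algebra.
In 2 step FGLS of the RE, need "1/T" to solve for an estimate of $\sigma_u^2$. What to use?
\[
Q = 1/\bar{T}, \qquad
\overline{(1/T)} = (1/N)\sum_{i=1}^{N}(1/T_i) \ \text{(early NLOGIT)}, \qquad
Q_H = \Big[\prod_{i=1}^{N}(1/T_i)\Big]^{1/N} \ \text{(Stata)}
\]
(TSP, current NLOGIT, do not use this.)

Balanced Nested Panel Data

$Z_{i,j,k,t}$ = test score for student t, teacher k, school j, district i
$L$ = 2 school districts, i = 1,…,L
$M_i$ = 3 schools in each district, j = 1,…,$M_i$
$N_{ij}$ = 4 teachers in each school, k = 1,…,$N_{ij}$
$T_{ijk}$ = 20 students in each class, t = 1,…,$T_{ijk}$

Antweiler, W., "Nested Random Effects Estimation in Unbalanced Panel Data," Journal of Econometrics, 101, 2001, pp. 295-313.

Nested Effects Model

\[
y_{ijkt} = \mathbf{x}_{ijkt}'\boldsymbol\beta + u_{ijk} + v_{ij} + w_i + \varepsilon_{ijkt}
\]
Strict exogeneity, all parts uncorrelated. (Normality assumption added later.)
\[
\mathrm{Var}[u_{ijk} + v_{ij} + w_i + \varepsilon_{ijkt}] = \sigma_u^2 + \sigma_v^2 + \sigma_w^2 + \sigma_\varepsilon^2
\]
The overall covariance matrix $\boldsymbol\Omega$ is block diagonal over i, each diagonal block is block diagonal over j, each of these, in turn, is block diagonal over k, and each lowest level block has the form of $\boldsymbol\Omega$ we saw earlier.

GLS with Nested Effects

Define
\[
\begin{aligned}
\sigma_1^2 &= \sigma_\varepsilon^2 + T\sigma_u^2 \\
\sigma_2^2 &= \sigma_\varepsilon^2 + T\sigma_u^2 + NT\sigma_v^2 = \sigma_1^2 + NT\sigma_v^2 \\
\sigma_3^2 &= \sigma_\varepsilon^2 + T\sigma_u^2 + NT\sigma_v^2 + MNT\sigma_w^2 = \sigma_2^2 + MNT\sigma_w^2
\end{aligned}
\]
GLS is equivalent to OLS regression of
\[
\tilde{y}_{ijkt} = y_{ijkt} - \left(1 - \frac{\sigma_\varepsilon}{\sigma_1}\right)\bar{y}_{ijk\cdot} - \left(\frac{\sigma_\varepsilon}{\sigma_1} - \frac{\sigma_\varepsilon}{\sigma_2}\right)\bar{y}_{ij\cdot\cdot} - \left(\frac{\sigma_\varepsilon}{\sigma_2} - \frac{\sigma_\varepsilon}{\sigma_3}\right)\bar{y}_{i\cdot\cdot\cdot}
\]
on the same transformation of $\mathbf{x}_{ijkt}$. FGLS estimates are obtained by "three group-wise between estimators and the within estimator for the innermost group."

Unbalanced Nested Data

With unbalanced panels, all the preceding results fall apart. GLS, FGLS, even fixed effects become analytically intractable.
The log likelihood is very tractable.
Note a collision of practicality with nonrobustness. (Normality must be assumed.)

Log Likelihood (1)

Define: $\rho_u = \sigma_u^2/\sigma_\varepsilon^2$, $\rho_v = \sigma_v^2/\sigma_\varepsilon^2$, $\rho_w = \sigma_w^2/\sigma_\varepsilon^2$.

Construct:
\[
\theta_{ijk} = 1 + T_{ijk}\rho_u, \qquad \phi_{ij} = \sum_{k=1}^{N_{ij}}\frac{T_{ijk}}{\theta_{ijk}}, \qquad
\theta_{ij} = 1 + \phi_{ij}\rho_v, \qquad \phi_i = \sum_{j=1}^{M_i}\frac{\phi_{ij}}{\theta_{ij}}, \qquad
\theta_i = 1 + \rho_w\phi_i
\]
Sums of squares:
\[
A_{ijk} = \sum_{t=1}^{T_{ijk}} e_{ijkt}^2, \qquad e_{ijkt} = y_{ijkt} - \mathbf{x}_{ijkt}'\boldsymbol\beta
\]
\[
B_{ijk} = \sum_{t=1}^{T_{ijk}} e_{ijkt}, \qquad B_{ij} = \sum_{k=1}^{N_{ij}}\frac{B_{ijk}}{\theta_{ijk}}, \qquad B_i = \sum_{j=1}^{M_i}\frac{B_{ij}}{\theta_{ij}}
\]

Log Likelihood (2)

$H$ = total number of observations
\[
\log L = -\tfrac12\left[H\log(2\pi\sigma_\varepsilon^2) + \sum_{i=1}^{L}\left\{\log\theta_i + \sum_{j=1}^{M_i}\left\{\log\theta_{ij} + \sum_{k=1}^{N_{ij}}\left\{\log\theta_{ijk} + \frac{A_{ijk}}{\sigma_\varepsilon^2} - \frac{\rho_u}{\theta_{ijk}}\frac{B_{ijk}^2}{\sigma_\varepsilon^2}\right\} - \frac{\rho_v}{\theta_{ij}}\frac{B_{ij}^2}{\sigma_\varepsilon^2}\right\} - \frac{\rho_w}{\theta_i}\frac{B_i^2}{\sigma_\varepsilon^2}\right\}\right]
\]
(For 3 levels instead of 4, set $L = 1$ and $\rho_w = 0$.)

Maximizing Log L

Antweiler provides analytic first derivatives for gradient methods of optimization. Ugly to program.

Numerical derivatives: Let $\boldsymbol\delta$ be the full vector of $K+4$ parameters. Let $\boldsymbol\iota_r$ be a perturbation vector, with $\epsilon_r = \max(\epsilon_0, \epsilon_1|\delta_r|)$ in the rth position and zero in the other $K+3$ positions.
\[
\frac{\partial\log L}{\partial\delta_r} \approx \frac{\log L(\boldsymbol\delta + \boldsymbol\iota_r) - \log L(\boldsymbol\delta - \boldsymbol\iota_r)}{2\epsilon_r}
\]

Asymptotic Covariance Matrix

"Even with an analytic gradient, however, the Hessian matrix, Ψ is typically obtained through numeric approximation methods." Read "the second derivatives are too complicated to derive, much less program." Also, since logL is not a sum of terms, the BHHH estimator is not usable. Numerical second derivatives were used.

An Appropriate Asymptotic Covariance Matrix

The expected Hessian is block diagonal. We can isolate $\boldsymbol\beta$. Differentiating the log likelihood above twice with respect to $\boldsymbol\beta$, and writing $\mathbf{s}_{ijk} = \sum_{t=1}^{T_{ijk}}\mathbf{x}_{ijkt}$, $\mathbf{s}_{ij} = \sum_{k=1}^{N_{ij}}\mathbf{s}_{ijk}/\theta_{ijk}$, $\mathbf{s}_i = \sum_{j=1}^{M_i}\mathbf{s}_{ij}/\theta_{ij}$:
\[
-\frac{\partial^2\log L}{\partial\boldsymbol\beta\,\partial\boldsymbol\beta'}
= \frac{1}{\sigma_\varepsilon^2}\sum_{i=1}^{L}\sum_{j=1}^{M_i}\sum_{k=1}^{N_{ij}}\sum_{t=1}^{T_{ijk}}\mathbf{x}_{ijkt}\mathbf{x}_{ijkt}'
- \frac{\rho_u}{\sigma_\varepsilon^2}\sum_{i=1}^{L}\sum_{j=1}^{M_i}\sum_{k=1}^{N_{ij}}\frac{1}{\theta_{ijk}}\mathbf{s}_{ijk}\mathbf{s}_{ijk}'
- \frac{\rho_v}{\sigma_\varepsilon^2}\sum_{i=1}^{L}\sum_{j=1}^{M_i}\frac{1}{\theta_{ij}}\mathbf{s}_{ij}\mathbf{s}_{ij}'
- \frac{\rho_w}{\sigma_\varepsilon^2}\sum_{i=1}^{L}\frac{1}{\theta_i}\mathbf{s}_i\mathbf{s}_i'
\]
(The extracted slide labels the three correction terms w, v, u in that order; the order shown here is the one implied by the log likelihood above.)
The inverse of this, evaluated at the MLEs, provides the appropriate estimated asymptotic covariance matrix for $\hat{\boldsymbol\beta}$. Standard errors for the variance estimators are not needed.

Some Observations

Assuming the wrong (e.g., nonnested) error structure
  Still consistent: GLS with the wrong weights
  Standard errors (apparently) biased downward (Moulton bias)
Adding "time" effects or other nonnested effects is "very challenging." Perhaps do with "fixed" effects (dummy variables).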
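The central-difference gradient described under Maximizing Log L can be sketched in a few lines. This is an illustration under stated assumptions, not Antweiler's or the course's code: it assumes NumPy, the function name `numerical_gradient` is hypothetical, and the default values of `eps0` and `eps1` are arbitrary choices, since the slides do not give them.

```python
import numpy as np

def numerical_gradient(loglik, delta, eps0=1e-6, eps1=1e-4):
    """Central-difference gradient of loglik at delta.

    The step for parameter r is max(eps0, eps1 * |delta_r|), placed in
    position r of an otherwise zero perturbation vector.
    eps0 and eps1 are assumed defaults, not values from the source.
    """
    delta = np.asarray(delta, dtype=float)
    grad = np.empty_like(delta)
    for r in range(delta.size):
        step = max(eps0, eps1 * abs(delta[r]))
        iota = np.zeros_like(delta)
        iota[r] = step                      # perturb only the r-th parameter
        grad[r] = (loglik(delta + iota) - loglik(delta - iota)) / (2.0 * step)
    return grad

# Illustrative check on a function with a known gradient, -(d - m).
m = np.array([1.0, 2.0, 3.0])
toy_loglik = lambda d: -0.5 * np.sum((d - m) ** 2)
print(numerical_gradient(toy_loglik, np.array([0.0, 5.0, -2.0])))
```

Each gradient costs two log likelihood evaluations per parameter, which is why the scaled step matters: a step that is too small amplifies rounding error, and one that is too large adds truncation error.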
An Application

$Y_{1jkt}$ = log of atmospheric sulfur dioxide concentration at observation station k at time t, in country j.
H = 2621, 293 stations, 44 countries, various numbers of observations, not equally spaced.
Three levels, not 4 as in article.
$X_{jkt}$ = 1, log(GDP/km²), log(K/L), log(Income), Suburban, Rural, Communist, log(Oil price), average temperature, time trend.

Estimates

Variable  Dimension  Random Effects     Nested Effects
x1        . . .      -10.787 (12.03)    -7.103 (5.613)
x2        C S T        0.445 (7.921)     0.202 (2.531)
x3        C . T        0.255 (1.999)     0.371 (2.345)
x4        C . T       -0.714 (5.005)    -0.477 (2.620)
x5        C S T       -0.627 (3.685)    -0.720 (4.531)
x6        C S T       -0.834 (2.181)    -1.061 (3.439)
x7        C . .        0.471 (2.241)     0.613 (1.443)
x8        . . T       -0.831 (2.267)    -0.089 (2.410)
x9        C S T       -0.045 (4.299)    -0.044 (3.719)
x10       . . T       -0.043 (1.666)    -0.046 (10.927)
$\sigma_\varepsilon^2$                   0.330              0.329
$\sigma_u$                    1.807              1.017
$\sigma_v$                                       1.347
logL                 -2645.4            -2606.0
(t ratios in parentheses)

Rotating Panel-1

The structure of the sample and selection of individuals in a rotating sampling design are as follows: Let all individuals in the population be numbered consecutively. The sample in period 1 consists of $N_1$ individuals. In period 2, a fraction, $m_{e2}$ ($0 < m_{e2} < N_1$), of the sample in period 1 are replaced by $m_{i2}$ new individuals from the population. In period 3 another fraction of the sample in period 2, $m_{e3}$ ($0 < m_{e3} < N_2$) individuals, are replaced by $m_{i3}$ new individuals, and so on. Thus the sample size in period t is $N_t = \{N_{t-1} - m_{e,t-1} + m_{it}\}$. The procedure of dropping $m_{e,t-1}$ individuals selected in period t - 1 and replacing them by $m_{it}$ individuals from the population in period t is called rotating sampling. In this framework the total number of observations and of individuals observed are $\sum_t N_t$ and $N_1 + \sum_{t=2}^{T} m_{it}$, respectively.

Heshmati, A., "Efficiency measurement in rotating panel data," Applied Economics, 30, 1998, pp. 919-930.

Rotating Panel-2

The outcome of the rotating sample for farms producing dairy products is given in Table 1.
Each annual sample is composed of four parts or subsamples. For example, in 1980 the sample contains 79, 62, 98, and 74 farms. The first three parts (79, 62, and 98) are those not replaced during the transition from 1979 to 1980. The last subsample contains 74 newly included farms from the population. At the same time 85 farms are excluded from the sample in 1979. The difference between the excluded part (85) and the included part (74) corresponds to the change in the rotating sample size between these two periods, i.e. 313 - 324 = -11. This difference includes only the part of the sample where each farm is observed consecutively for four years, $N_{rot}$. The difference in the non-rotating part, $N_{non}$, is due to those farms which are not observed consecutively. The proportion of farms not observed consecutively, $N_{non}$, in the total annual sample varies from 11.2 to 22.6%, with an average of 18.7%.

Rotating Panels-3

Simply an unbalanced panel
  Treat with the familiar techniques
  Accounting is complicated
  Time effects may be complicated.
    Biorn and Jansen (Scand. J. E., 1983) households: cohort 1 has T = 1976, 1977 while cohort 2 has T = 1977, 1978.
But,… "Time in sample bias…" may require special treatment. Mexican labor survey has 3 periods rotation. Some families in 1 or 2 or 3 periods.

Pseudo Panels

T different cross sections.
\[
y_{i(t),t} = \mathbf{x}_{i(t),t}'\boldsymbol\beta + u_{i(t)} + \varepsilon_{i(t),t}, \qquad i(t) = 1, \ldots, N(t);\ t = 1, \ldots, T
\]
These are $\sum_{t=1}^{T} N(t)$ independent observations.
Define C cohorts, e.g., those born 1950-1955.
\[
\bar{y}_{c,t} = \bar{\mathbf{x}}_{c,t}'\boldsymbol\beta + \bar{u}_{c,t} + \bar\varepsilon_{c,t}, \qquad c = 1, \ldots, C;\ t = 1, \ldots, T
\]
Cohort sizes are $N_c(t)$. Assume large. Then $\bar{u}_{c,t} \approx u_c$ for each cohort.
Creates a fixed effects model:
\[
\bar{y}_{c,t} = \bar{\mathbf{x}}_{c,t}'\boldsymbol\beta + u_c + \bar\varepsilon_{c,t}, \qquad c = 1, \ldots, C;\ t = 1, \ldots, T.
\]
(See Baltagi 10.3 for issues relating to measurement error.)
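The pseudo panel construction can be sketched as two steps: collapse the repeated cross sections to cohort-by-period means, then apply the within (fixed effects) estimator to those means. The code below is a minimal illustration assuming NumPy. All function names and the simulated design (3 cohorts, 4 periods, 500 individuals per cell, a true slope of 1.5) are hypothetical choices made for the example, not taken from the slides, and the measurement error issues flagged in Baltagi 10.3 are ignored.

```python
import numpy as np

def simulate_cross_sections(beta=1.5, C=3, T=4, n_cell=500, seed=0):
    """Hypothetical data: T independent cross sections, C birth cohorts."""
    rng = np.random.default_rng(seed)
    cohort = np.repeat(np.arange(C), T * n_cell)
    period = np.tile(np.repeat(np.arange(T), n_cell), C)
    n = cohort.size
    # Regressor mean differs by cohort and drifts at a cohort-specific rate.
    x = cohort + 0.5 * period * (1 + cohort) + rng.normal(size=n)
    u = 2.0 * cohort + rng.normal(size=n)   # individual effect, cohort mean 2c
    y = beta * x + u + rng.normal(size=n)
    return cohort, period, y, x[:, None]

def cohort_means(cohort, period, y, x):
    """Collapse individual data to cohort-by-period cell means."""
    cells = sorted(set(zip(cohort.tolist(), period.tolist())))
    c_id = np.array([c for c, _ in cells])
    ybar = np.empty(len(cells))
    xbar = np.empty((len(cells), x.shape[1]))
    for n, (c, t) in enumerate(cells):
        mask = (cohort == c) & (period == t)
        ybar[n] = y[mask].mean()
        xbar[n] = x[mask].mean(axis=0)
    return c_id, ybar, xbar

def within_estimator(group, ybar, xbar):
    """Fixed effects estimator: demean within each cohort, then OLS."""
    yd = ybar.astype(float).copy()
    xd = xbar.astype(float).copy()
    for g in np.unique(group):
        m = group == g
        yd[m] -= yd[m].mean()
        xd[m] -= xd[m].mean(axis=0)
    beta, *_ = np.linalg.lstsq(xd, yd, rcond=None)
    return beta

cohort, period, y, x = simulate_cross_sections()
c_id, ybar, xbar = cohort_means(cohort, period, y, x)
print(within_estimator(c_id, ybar, xbar))
```

With large cells the cohort mean of the individual effects is nearly constant over time, so demeaning within cohort removes it; with small cells that approximation is poor and the sampling error in the cell means becomes the measurement error problem the slide refers to.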