Econometric Analysis of Panel Data
William Greene
Department of Economics
Stern School of Business



6. Maximum Likelihood Estimation of the Random Effects Linear Model
The Random Effects Model

The random effects model:

$$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + c_i + \varepsilon_{it}, \quad \text{observation for person } i \text{ at time } t$$

$$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + c_i\mathbf{i} + \boldsymbol{\varepsilon}_i, \quad T_i \text{ observations in group } i$$
$$\phantom{\mathbf{y}_i} = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{c}_i + \boldsymbol{\varepsilon}_i, \quad \text{note } \mathbf{c}_i = (c_i, c_i, \ldots, c_i)'$$

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{c} + \boldsymbol{\varepsilon}, \quad \textstyle\sum_{i=1}^N T_i \text{ observations in the sample}$$

$$\mathbf{c} = (\mathbf{c}_1', \mathbf{c}_2', \ldots, \mathbf{c}_N')', \quad \textstyle\sum_{i=1}^N T_i \times 1 \text{ vector}$$

c_i is uncorrelated with x_it for all t:
E[c_i | X_i] = 0
E[ε_it | X_i, c_i] = 0
Error Components Model

Generalized regression model:

$$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it} + u_i$$
$$E[\varepsilon_{it} \mid \mathbf{X}_i] = 0, \qquad E[\varepsilon_{it}^2 \mid \mathbf{X}_i] = \sigma_\varepsilon^2$$
$$E[u_i \mid \mathbf{X}_i] = 0, \qquad E[u_i^2 \mid \mathbf{X}_i] = \sigma_u^2$$

$$\mathrm{Var}[\boldsymbol{\varepsilon}_i + u_i\mathbf{i}] =
\begin{bmatrix}
\sigma_\varepsilon^2 + \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\
\sigma_u^2 & \sigma_\varepsilon^2 + \sigma_u^2 & \cdots & \sigma_u^2 \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_u^2 & \sigma_u^2 & \cdots & \sigma_\varepsilon^2 + \sigma_u^2
\end{bmatrix}$$

$$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \boldsymbol{\varepsilon}_i + u_i\mathbf{i} \quad \text{for } T_i \text{ observations}$$
Notation

$$\mathrm{Var}[\boldsymbol{\varepsilon}_i + u_i\mathbf{i}] = \sigma_\varepsilon^2\mathbf{I}_{T_i} + \sigma_u^2\,\mathbf{ii}' \quad (T_i \times T_i) = \boldsymbol{\Omega}_i$$

$$\mathrm{Var}[\mathbf{w} \mid \mathbf{X}] =
\begin{bmatrix}
\boldsymbol{\Omega}_1 & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \boldsymbol{\Omega}_2 & \cdots & \mathbf{0} \\
\vdots & & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \boldsymbol{\Omega}_N
\end{bmatrix}$$

(Note the blocks differ only in the dimension T_i.)
Maximum Likelihood

Assume normality of ε_it and u_i. Treat the T_i joint observations on [(ε_i1, ε_i2, ..., ε_iT_i), u_i] as one T_i-variate observation. The mean vector of ε_i + u_i i is zero and the covariance matrix is Ω_i = σ_ε² I + σ_u² ii'.

The joint density for ε_i = (y_i − X_i β) is

$$f(\boldsymbol{\varepsilon}_i) = (2\pi)^{-T_i/2}\,|\boldsymbol{\Omega}_i|^{-1/2}\exp\left(-\tfrac{1}{2}(\mathbf{y}_i - \mathbf{X}_i\boldsymbol{\beta})'\boldsymbol{\Omega}_i^{-1}(\mathbf{y}_i - \mathbf{X}_i\boldsymbol{\beta})\right)$$
logL= N logL i where
           i=1

                       -1
logL i (β, 2 ,u ) =
                2
                           Ti log 2  log | Ωi | ( y i - X iβ)Ωi-1 ( y i - X iβ) 
                        2                                                           
            


                       -1
                     =     Ti log 2  log | Ωi |  εΩi-1ε i 
                       2                                       
                                                       i
Panel Data Algebra (3)

$$\boldsymbol{\Omega}_i^{-1} = \frac{1}{\sigma_\varepsilon^2}\left[\mathbf{I}_{T_i} - \frac{\sigma_u^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2}\,\mathbf{ii}'\right]$$

So,

$$\boldsymbol{\varepsilon}_i'\boldsymbol{\Omega}_i^{-1}\boldsymbol{\varepsilon}_i = \frac{1}{\sigma_\varepsilon^2}\left[\boldsymbol{\varepsilon}_i'\boldsymbol{\varepsilon}_i - \frac{\sigma_u^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2}\,\boldsymbol{\varepsilon}_i'\mathbf{ii}'\boldsymbol{\varepsilon}_i\right]
= \frac{1}{\sigma_\varepsilon^2}\left[\boldsymbol{\varepsilon}_i'\boldsymbol{\varepsilon}_i - \frac{\sigma_u^2\,(T_i\bar\varepsilon_i)^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2}\right]$$
Panel Data Algebra (3, cont.)

$$\boldsymbol{\Omega}_i = \sigma_\varepsilon^2\mathbf{I} + \sigma_u^2\,\mathbf{ii}' = \sigma_\varepsilon^2\left[\mathbf{I} + \frac{\sigma_u^2}{\sigma_\varepsilon^2}\,\mathbf{ii}'\right] = \sigma_\varepsilon^2\mathbf{A}$$

$$|\boldsymbol{\Omega}_i| = (\sigma_\varepsilon^2)^{T_i}\prod_{t=1}^{T_i}\lambda_t, \quad \lambda_t = \text{a characteristic root of } \mathbf{A}$$

The roots are (real, since A is symmetric) solutions to Ac = λc:

$$\mathbf{Ac} = \lambda\mathbf{c} = \mathbf{c} + \frac{\sigma_u^2}{\sigma_\varepsilon^2}\,\mathbf{ii}'\mathbf{c}, \quad \text{or} \quad \frac{\sigma_u^2}{\sigma_\varepsilon^2}\,\mathbf{i}(\mathbf{i}'\mathbf{c}) = (\lambda - 1)\mathbf{c}$$

Any vector whose elements sum to zero (i'c = 0) is a characteristic vector corresponding to the root λ = 1. There are T_i − 1 such vectors, so T_i − 1 of the roots are 1. Suppose i'c ≠ 0. Premultiply by i' to find

$$\frac{\sigma_u^2}{\sigma_\varepsilon^2}\,\mathbf{i}'\mathbf{i}(\mathbf{i}'\mathbf{c}) = T_i\frac{\sigma_u^2}{\sigma_\varepsilon^2}(\mathbf{i}'\mathbf{c}) = (\lambda - 1)(\mathbf{i}'\mathbf{c}).$$

Since i'c ≠ 0, divide by it to obtain the remaining root λ = 1 + T_i σ_u²/σ_ε². Therefore,

$$|\boldsymbol{\Omega}_i| = (\sigma_\varepsilon^2)^{T_i}\prod_{t=1}^{T_i}\lambda_t = (\sigma_\varepsilon^2)^{T_i}\left(1 + T_i\frac{\sigma_u^2}{\sigma_\varepsilon^2}\right)$$
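A quick numerical check of the inverse and determinant identities above (a minimal sketch; the group size and variance values are arbitrary illustration choices, not from the slides):

```python
import numpy as np

# Illustrative values (arbitrary): group size and variance components
Ti, s2e, s2u = 5, 1.5, 0.8

i = np.ones((Ti, 1))
Omega = s2e * np.eye(Ti) + s2u * (i @ i.T)          # Omega_i = s2e*I + s2u*ii'

# Closed-form inverse: (1/s2e) [I - s2u/(s2e + Ti*s2u) ii']
Omega_inv = (np.eye(Ti) - (s2u / (s2e + Ti * s2u)) * (i @ i.T)) / s2e
assert np.allclose(Omega_inv, np.linalg.inv(Omega))

# Closed-form determinant: (s2e)^Ti * (1 + Ti*s2u/s2e)
det = s2e**Ti * (1.0 + Ti * s2u / s2e)
assert np.isclose(det, np.linalg.det(Omega))
```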
Panel Data Algebra (3, conc.)

$$\log L_i = -\frac{1}{2}\left[T_i\log 2\pi + \log|\boldsymbol{\Omega}_i| + \boldsymbol{\varepsilon}_i'\boldsymbol{\Omega}_i^{-1}\boldsymbol{\varepsilon}_i\right]$$
$$= -\frac{1}{2}\left[T_i\log 2\pi + T_i\log\sigma_\varepsilon^2 + \log\left(1 + T_i\frac{\sigma_u^2}{\sigma_\varepsilon^2}\right)\right] - \frac{1}{2\sigma_\varepsilon^2}\left[\boldsymbol{\varepsilon}_i'\boldsymbol{\varepsilon}_i - \frac{\sigma_u^2(T_i\bar\varepsilon_i)^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2}\right]$$

$$\log L = \sum_{i=1}^N \log L_i = -\frac{1}{2}\left[(\log 2\pi + \log\sigma_\varepsilon^2)\sum_{i=1}^N T_i + \sum_{i=1}^N\log(1 + \gamma T_i)\right] - \frac{1}{2\sigma_\varepsilon^2}\sum_{i=1}^N\left[\boldsymbol{\varepsilon}_i'\boldsymbol{\varepsilon}_i - \frac{\sigma_u^2(T_i\bar\varepsilon_i)^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2}\right]$$

Since γ = σ_u²/σ_ε²,

$$\frac{\sigma_u^2(T_i\bar\varepsilon_i)^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2} = \frac{\gamma(T_i\bar\varepsilon_i)^2}{1 + \gamma T_i},$$

so

$$\log L_i = -\frac{T_i}{2}(\log 2\pi + \log\sigma_\varepsilon^2) - \frac{1}{2}\log(1 + \gamma T_i) - \frac{1}{2\sigma_\varepsilon^2}\left[\boldsymbol{\varepsilon}_i'\boldsymbol{\varepsilon}_i - \frac{\gamma(T_i\bar\varepsilon_i)^2}{1 + \gamma T_i}\right]$$
Maximizing the Likelihood

- Difficult: "brute force" + some elegant theoretical results. See Baltagi, pp. 20-21. (Back and forth from GLS to σ_ε² and σ_u².)
- Somewhat less difficult and more practical: at any iteration, given estimates of σ_ε² and σ_u², the estimator of β is GLS (of course), so we iterate back and forth between these (see the sketch after the steps below). See Hsiao, pp. 39-40.

0. Begin iterations with, say, FGLS estimates of β, σ_ε², σ_u².
1. Given σ̂²_{ε,r} and σ̂²_{u,r}, compute β̂_{r+1} by FGLS(σ̂²_{ε,r}, σ̂²_{u,r}).
2. Given β̂_{r+1}, compute
$$\hat\sigma^2_{\varepsilon,r+1} = \frac{\sum_{i=1}^N \hat{\boldsymbol{\varepsilon}}_{i,r+1}'\mathbf{M}_D\hat{\boldsymbol{\varepsilon}}_{i,r+1}}{\sum_{i=1}^N (T_i - 1)}$$
(M_D is the within-group deviations projection).
3. Given β̂_{r+1} and σ̂²_{ε,r+1}, compute
$$\hat\sigma^2_{u,r+1} = \frac{\sum_{i=1}^N \hat{\bar\varepsilon}_{i\cdot,r+1}^2}{N}$$
4. Return to step 1 and repeat until β̂_{r+1} − β̂_r = 0.
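A minimal sketch of this back-and-forth in Python/numpy, following the slide's steps. The data layout (a list of per-group (y_i, X_i) arrays), the starting values, and the convergence tolerance are assumptions for illustration; FGLS is done by the usual partial-deviations transformation.

```python
import numpy as np

def fgls_beta(groups, s2e, s2u):
    """GLS of y on X given variance components, via theta-differencing."""
    XtX, Xty = 0.0, 0.0
    for y, X in groups:
        Ti = len(y)
        theta = 1.0 - np.sqrt(s2e / (s2e + Ti * s2u))   # partial-deviations weight
        yt = y - theta * y.mean()
        Xt = X - theta * X.mean(axis=0)
        XtX += Xt.T @ Xt
        Xty += Xt.T @ yt
    return np.linalg.solve(XtX, Xty)

def iterate_ml(groups, s2e, s2u, tol=1e-8, max_iter=500):
    """Iterate: (1) FGLS for beta; (2)-(3) update the variance components."""
    beta = fgls_beta(groups, s2e, s2u)
    for _ in range(max_iter):
        # Step 2: within (deviations-from-group-means) sum of squares
        sse_within = sum(np.sum(((y - X @ beta) - (y - X @ beta).mean())**2)
                         for y, X in groups)
        s2e = sse_within / sum(len(y) - 1 for y, _ in groups)
        # Step 3: average squared group-mean residual (as on the slide)
        s2u = np.mean([((y - X @ beta).mean())**2 for y, X in groups])
        beta_new = fgls_beta(groups, s2e, s2u)
        if np.max(np.abs(beta_new - beta)) < tol:       # step 4: convergence
            break
        beta = beta_new
    return beta, s2e, s2u
```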
Direct Maximization of LogL

Simpler: take advantage of the invariance of maximum likelihood estimators to transformations of the parameters. Let

$$\theta = 1/\sigma_\varepsilon^2, \quad \gamma = \sigma_u^2/\sigma_\varepsilon^2, \quad R_i = \gamma T_i + 1, \quad Q_i = \gamma/R_i,$$

$$\log L_i = -\tfrac{1}{2}\left[\theta\left(\boldsymbol{\varepsilon}_i'\boldsymbol{\varepsilon}_i - Q_i(T_i\bar\varepsilon_i)^2\right) + \log R_i - T_i\log\theta + T_i\log 2\pi\right]$$

This can be maximized using ordinary optimization methods (not Newton, as suggested by Hsiao). Treat it as a standard nonlinear optimization problem and solve with iterative, gradient methods, as in the sketch below.
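A minimal sketch of that direct maximization with scipy.optimize, using the data layout from the previous sketch. Optimizing over log θ and log γ to keep both positive is my implementation choice, not part of the slide.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, groups, K):
    """-logL in the (beta, log theta, log gamma) parameterization."""
    beta = params[:K]
    theta, gamma = np.exp(params[K]), np.exp(params[K + 1])
    ll = 0.0
    for y, X in groups:
        Ti = len(y)
        e = y - X @ beta
        Ri = gamma * Ti + 1.0
        Qi = gamma / Ri
        ll += -0.5 * (theta * (e @ e - Qi * (Ti * e.mean())**2)
                      + np.log(Ri) - Ti * np.log(theta) + Ti * np.log(2 * np.pi))
    return -ll

# Usage (hypothetical data): groups = [(y_i, X_i), ...]; K = number of regressors
# res = minimize(neg_loglik, x0=np.zeros(K + 2), args=(groups, K), method="BFGS")
```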
Application – ML vs. FGLS
Maximum Simulated Likelihood

Assume ε_it and u_i are normally distributed. Write u_i = σ_u v_i where v_i ~ N[0,1]. Then y_it = x_it'β + σ_u v_i + ε_it. If v_i were observed data, all observations would be independent, and

$$\log f(y_{it} \mid \mathbf{x}_{it}, v_i) = -\tfrac{1}{2}\left[\log 2\pi + \log\sigma_\varepsilon^2 + (y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta} - \sigma_u v_i)^2/\sigma_\varepsilon^2\right]$$

Let θ² = 1/σ_ε². The log of the joint density for T_i observations with common v_i is

$$\log L_i(\boldsymbol{\beta}, \sigma_u, \theta^2 \mid v_i) = \sum_{t=1}^{T_i} -\tfrac{1}{2}\left[\log 2\pi - \log\theta^2 + \theta^2(y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta} - \sigma_u v_i)^2\right]$$

The conditional log likelihood for the sample is then

$$\log L(\boldsymbol{\beta}, \sigma_u, \theta^2 \mid \mathbf{v}) = \sum_{i=1}^N \sum_{t=1}^{T_i} -\tfrac{1}{2}\left[\log 2\pi - \log\theta^2 + \theta^2(y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta} - \sigma_u v_i)^2\right]$$
Likelihood Function for Individual i

The unconditional likelihood is obtained by integrating v_i out of L_i(β, σ_u, θ² | v_i):

$$L_i(\boldsymbol{\beta}, \sigma_u, \theta^2) = \int_{-\infty}^{\infty} \prod_{t=1}^{T_i} \frac{\theta\exp[-(\theta^2/2)(y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta} - \sigma_u v_i)^2]}{\sqrt{2\pi}}\,\phi(v_i)\,dv_i = E_{v_i}\!\left[L_i(\boldsymbol{\beta}, \sigma_u, \theta^2 \mid v_i)\right]$$

The integral usually does not have a closed form. (For the normal distribution above it actually does; we used that earlier. We ignore that for now.)
Log Likelihood Function

The full log likelihood function to be maximized is

$$\log L = \sum_{i=1}^N \log L_i(\boldsymbol{\beta}, \sigma_u, \theta^2)
= \sum_{i=1}^N \log\left\{\int_{-\infty}^{\infty} \prod_{t=1}^{T_i} \frac{\theta\exp[-(\theta^2/2)(y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta} - \sigma_u v_i)^2]}{\sqrt{2\pi}}\,\phi(v_i)\,dv_i\right\}
= \sum_{i=1}^N \log E_{v_i}\!\left[L_i(\boldsymbol{\beta}, \sigma_u, \theta^2 \mid v_i)\right]$$

This is the function to be maximized to obtain the MLE of [β, θ, σ_u].
Computing the Expected LogL

How to compute the integral: first note that φ(v_i) = exp(−v_i²/2)/√(2π), and we need

$$E_{v_i}\!\left[L_i(\boldsymbol{\beta}, \sigma_u, \theta^2 \mid v_i)\right] = \int_{-\infty}^{\infty} \prod_{t=1}^{T_i} \frac{\theta\exp[-(\theta^2/2)(y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta} - \sigma_u v_i)^2]}{\sqrt{2\pi}}\,\phi(v_i)\,dv_i$$

(1) Numerical (Gauss-Hermite) quadrature for integrals of this form is remarkably accurate:

$$\int_{-\infty}^{\infty} e^{-v^2} g(v)\,dv \approx \sum_{h=1}^H w_h\,g(a_h)$$

Example: Hermite quadrature nodes and weights, H = 5
Nodes:   -2.02018, -0.95857, 0.00000, 0.95857, 2.02018
Weights:  0.01995,  0.39362, 0.94531, 0.39362, 0.01995

Applications usually use many more points, up to 96, and much more accurate (more digits) representations.
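These nodes and weights are available directly in numpy; a minimal check, which also verifies the rule on a known integral, ∫e^(−v²)dv = √π:

```python
import numpy as np

# Gauss-Hermite nodes a_h and weights w_h for H = 5
nodes, weights = np.polynomial.hermite.hermgauss(5)
print(nodes)    # [-2.02018287 -0.95857246  0.          0.95857246  2.02018287]
print(weights)  # [ 0.01995324  0.39361932  0.94530872  0.39361932  0.01995324]

# Check: with g(v) = 1, the rule should reproduce integral exp(-v^2) dv = sqrt(pi)
assert np.isclose(weights.sum(), np.sqrt(np.pi))
```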
Quadrature

A change of variable is needed to get the integral into the right form (it is worked out in detail on the Gauss-Hermite Quadrature slide below). Each term then becomes

$$L_{i,Q} = \frac{1}{\sqrt{\pi}}\sum_{h=1}^H w_h \prod_{t=1}^{T_i} \frac{\theta\exp[-(\theta^2/2)(y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta} - \sigma_u\sqrt{2}\,a_h)^2]}{\sqrt{2\pi}}$$

and the problem is solved by maximizing

$$\log L_Q = \sum_{i=1}^N \log L_{i,Q}$$

with respect to β, θ², σ_u. (Maximization to be continued later in the semester.)
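A minimal sketch of L_{i,Q} for one group, with the same assumed (y_i, X_i) data layout as in the earlier sketches; the default H is an arbitrary choice:

```python
import numpy as np

def L_i_quadrature(y, X, beta, sigma_u, theta, H=32):
    """Gauss-Hermite approximation to the group-i likelihood contribution."""
    nodes, weights = np.polynomial.hermite.hermgauss(H)
    e = y - X @ beta                                 # residuals without the effect
    # One column per node: residual net of sigma_u * sqrt(2) * a_h
    eh = e[:, None] - sigma_u * np.sqrt(2.0) * nodes[None, :]
    dens = theta * np.exp(-0.5 * theta**2 * eh**2) / np.sqrt(2.0 * np.pi)
    return (weights * dens.prod(axis=0)).sum() / np.sqrt(np.pi)
```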
Gauss-Hermite Quadrature

The integral is

$$\int_{-\infty}^{\infty} \prod_{t=1}^{T_i} \frac{\theta\exp[-(\theta^2/2)(y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta} - \sigma_u v_i)^2]}{\sqrt{2\pi}}\,\phi(v_i)\,dv_i, \qquad \phi(v_i) = \frac{\exp(-v_i^2/2)}{\sqrt{2\pi}}$$

Make a change of variable to a_i = v_i/√2, so v_i = √2 a_i and dv_i = √2 da_i:

$$= \frac{\sqrt{2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp(-a_i^2) \prod_{t=1}^{T_i} \frac{\theta\exp[-(\theta^2/2)(y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta} - \sigma_u\sqrt{2}\,a_i)^2]}{\sqrt{2\pi}}\,da_i$$

$$= \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} \exp(-a_i^2)\,g(a_i)\,da_i \approx \frac{1}{\sqrt{\pi}}\sum_{h=1}^H w_h\,g(a_h)$$
Simulation

The unconditional log likelihood is an expected value:

$$\log L_i(\boldsymbol{\beta}, \sigma_u, \theta^2) = \log\int_{-\infty}^{\infty} \prod_{t=1}^{T_i} \frac{\theta\exp[-(\theta^2/2)(y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta} - \sigma_u v_i)^2]}{\sqrt{2\pi}}\,\phi(v_i)\,dv_i = \log E_{v_i}\!\left[L_i(\boldsymbol{\beta}, \sigma_u, \theta^2 \mid v_i)\right] = \log E_v[g(v_i)]$$

An expected value can be 'estimated' by sampling observations and averaging them:

$$\hat E_v[g(v_i)] = \frac{1}{R}\sum_{r=1}^R \prod_{t=1}^{T_i} \frac{\theta\exp[-(\theta^2/2)(y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta} - \sigma_u v_{ir})^2]}{\sqrt{2\pi}}$$

The unconditional log likelihood function is then

$$\sum_{i=1}^N \log\left\{\frac{1}{R}\sum_{r=1}^R \prod_{t=1}^{T_i} \frac{\theta\exp[-(\theta^2/2)(y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta} - \sigma_u v_{ir})^2]}{\sqrt{2\pi}}\right\}$$

This is a function of (β, θ², σ_u | y_i, X_i, v_{i,1}, ..., v_{i,R}), i = 1, ..., N. The random draws on v_{i,r} become part of the data, and the function is maximized with respect to the unknown parameters.
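A minimal sketch of that simulated log likelihood; the draws are generated once and held fixed across optimizer iterations, which is what makes them "part of the data." The data layout is assumed as before.

```python
import numpy as np

def simulated_loglik(beta, sigma_u, theta, groups, draws):
    """MSL objective: sum_i log( (1/R) sum_r prod_t f(y_it | v_ir) )."""
    ll = 0.0
    for (y, X), v_i in zip(groups, draws):   # v_i: R fixed N(0,1) draws for group i
        e = y[:, None] - (X @ beta)[:, None] - sigma_u * v_i[None, :]
        dens = theta * np.exp(-0.5 * theta**2 * e**2) / np.sqrt(2.0 * np.pi)
        ll += np.log(dens.prod(axis=0).mean())
    return ll

# Draws are generated once, before optimization begins:
# rng = np.random.default_rng(12345)
# draws = [rng.standard_normal(R) for _ in groups]
```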
Convergence Results

The target is the expected log likelihood: log E_{v_i}[L(β, θ² | v_i)].

The simulation estimator is based on random sampling from the population of v_i:

$$\log L_S(\boldsymbol{\beta}, \theta^2) = \sum_{i=1}^N \log\left\{\frac{1}{R}\sum_{r=1}^R \prod_{t=1}^{T_i} \frac{\theta\exp[-(\theta^2/2)(y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta} - \sigma_u v_{ir})^2]}{\sqrt{2\pi}}\right\}$$

The essential result is

$$\mathrm{plim}_{(R\to\infty)}\,\log L_S(\boldsymbol{\beta}, \theta^2) = \log E_{v_i}[L(\boldsymbol{\beta}, \theta^2 \mid v_i)]$$

Conditions:
(1) General regularity and smoothness of the log likelihood.
(2) R increases faster than √N. ('Intelligent draws' - e.g., Halton sequences - make this somewhat ambiguous.)

Result: the maximizer of log L_S(β, θ²) converges to the maximizer of log E_{v_i}[L(β, θ² | v_i)].
MSL vs. ML - Application
Two Level Panel Data

- Nested by construction
- Unbalanced panels
  - No real obstacle to estimation
  - Some inconvenient algebra.
  - In two-step FGLS of the RE model, a "1/T" is needed to solve for an estimate of σ_u². What to use (see the sketch below)?

$$Q = \overline{(1/T)} = \frac{1}{N}\sum_{i=1}^N \frac{1}{T_i} \quad \text{(early NLOGIT)}$$

$$Q_H = \left[\prod_{i=1}^N \frac{1}{T_i}\right]^{1/N} \quad \text{(Stata)}$$

(TSP and current NLOGIT do not use this.)
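A minimal sketch of the two candidate averages (the T_i values are illustrative):

```python
import numpy as np

Ti = np.array([4, 6, 6, 8, 10])              # illustrative group sizes
Q  = np.mean(1.0 / Ti)                       # arithmetic mean of 1/T_i (early NLOGIT)
QH = np.prod(1.0 / Ti) ** (1.0 / len(Ti))    # geometric mean of 1/T_i (Stata)
print(Q, QH)
```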
Balanced Nested Panel Data

- z_ijkt = test score for student t, teacher k, school j, district i
- L = 2 school districts, i = 1,…,L
- M_i = 3 schools in each district, j = 1,…,M_i
- N_ij = 4 teachers in each school, k = 1,…,N_ij
- T_ijk = 20 students in each class, t = 1,…,T_ijk

Antweiler, W., "Nested Random Effects Estimation in Unbalanced Panel Data," Journal of Econometrics, 101, 2001, pp. 295-313.
Nested Effects Model

$$y_{ijkt} = \mathbf{x}_{ijkt}'\boldsymbol{\beta} + u_{ijk} + v_{ij} + w_i + \varepsilon_{ijkt}$$

Strict exogeneity; all parts uncorrelated. (The normality assumption is added later.)

$$\mathrm{Var}[u_{ijk} + v_{ij} + w_i + \varepsilon_{ijkt}] = \sigma_u^2 + \sigma_v^2 + \sigma_w^2 + \sigma_\varepsilon^2$$

The overall covariance matrix Ω is block diagonal over i, each diagonal block is block diagonal over j, each of these, in turn, is block diagonal over k, and each lowest-level block has the form of the Ω_i we saw earlier.
GLS with Nested Effects

Define

$$\theta_1^2 = \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_u^2}, \qquad
\theta_2^2 = \frac{\sigma_\varepsilon^2}{NT\sigma_v^2 + T\sigma_u^2 + \sigma_\varepsilon^2}, \qquad
\theta_3^2 = \frac{\sigma_\varepsilon^2}{MNT\sigma_w^2 + NT\sigma_v^2 + T\sigma_u^2 + \sigma_\varepsilon^2}$$

GLS is equivalent to OLS regression of

$$y_{ijkt}^* = y_{ijkt} - (1 - \theta_1)\bar y_{ijk\cdot} - (\theta_1 - \theta_2)\bar y_{ij\cdot\cdot} - (\theta_2 - \theta_3)\bar y_{i\cdot\cdot\cdot}$$

on the same transformation of x_ijkt (a sketch follows). FGLS estimates are obtained by "three group-wise between estimators and the within estimator for the innermost group."
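A minimal sketch of that transformation on a balanced four-level data array; the array shape and variance values are illustrative assumptions:

```python
import numpy as np

def nested_gls_transform(y, s2e, s2u, s2v, s2w):
    """Partial-deviations transform for a balanced nested panel.
    y has shape (L, M, N, T): district, school, teacher, student."""
    L, M, N, T = y.shape
    th1 = np.sqrt(s2e / (s2e + T * s2u))
    th2 = np.sqrt(s2e / (N * T * s2v + T * s2u + s2e))
    th3 = np.sqrt(s2e / (M * N * T * s2w + N * T * s2v + T * s2u + s2e))
    m_ijk = y.mean(axis=3, keepdims=True)           # class means
    m_ij  = y.mean(axis=(2, 3), keepdims=True)      # school means
    m_i   = y.mean(axis=(1, 2, 3), keepdims=True)   # district means
    return y - (1 - th1) * m_ijk - (th1 - th2) * m_ij - (th2 - th3) * m_i

# Usage: apply to y and to each column of x, then run OLS on the transformed data.
```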
Unbalanced Nested Data

- With unbalanced panels, all of the preceding results fall apart.
- GLS, FGLS, even fixed effects become analytically intractable.
- The log likelihood, however, is very tractable.
- Note a collision of practicality with nonrobustness: normality must be assumed.
Log Likelihood (1)

Define:

$$\gamma_u = \frac{\sigma_u^2}{\sigma_\varepsilon^2}, \quad \gamma_v = \frac{\sigma_v^2}{\sigma_\varepsilon^2}, \quad \gamma_w = \frac{\sigma_w^2}{\sigma_\varepsilon^2}$$

Construct:

$$\phi_{ijk} = 1 + T_{ijk}\gamma_u, \qquad \phi_{ij} = \sum_{k=1}^{N_{ij}} \frac{T_{ijk}}{\phi_{ijk}}$$
$$\psi_{ij} = 1 + \phi_{ij}\gamma_v, \qquad \psi_i = \sum_{j=1}^{M_i} \frac{\phi_{ij}}{\psi_{ij}}$$
$$\xi_i = 1 + \psi_i\gamma_w$$

Sums of squares, with e_ijkt = y_ijkt − x_ijkt'β:

$$A_{ijk} = \sum_{t=1}^{T_{ijk}} e_{ijkt}^2, \qquad B_{ijk} = \sum_{t=1}^{T_{ijk}} e_{ijkt}, \qquad B_{ij} = \sum_{k=1}^{N_{ij}} \frac{B_{ijk}}{\phi_{ijk}}, \qquad B_i = \sum_{j=1}^{M_i} \frac{B_{ij}}{\psi_{ij}}$$
Log Likelihood (2)

With H = the total number of observations,

$$\log L = -\frac{1}{2}\Bigg[H\log(2\pi\sigma_\varepsilon^2) + \sum_{i=1}^L\Bigg\{\log\xi_i + \sum_{j=1}^{M_i}\Bigg\{\log\psi_{ij} + \sum_{k=1}^{N_{ij}}\Bigg\{\log\phi_{ijk} + \frac{A_{ijk} - \gamma_u B_{ijk}^2/\phi_{ijk}}{\sigma_\varepsilon^2}\Bigg\} - \frac{\gamma_v B_{ij}^2}{\sigma_\varepsilon^2\,\psi_{ij}}\Bigg\} - \frac{\gamma_w B_i^2}{\sigma_\varepsilon^2\,\xi_i}\Bigg\}\Bigg]$$

(For 3 levels instead of 4, set L = 1 and γ_w = 0.)
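A minimal sketch of this computation for unbalanced data. The nested-list data layout is my own assumption; the symbols match the two slides above.

```python
import numpy as np

def nested_loglik(resid, s2e, gu, gv, gw):
    """Nested RE log likelihood in the form above.
    resid[i][j][k] is the 1-D residual vector e_ijk for teacher k,
    school j, district i (lengths T_ijk may differ)."""
    H = sum(len(e) for dist in resid for school in dist for e in school)
    ll = H * np.log(2.0 * np.pi * s2e)
    for dist in resid:                       # district i
        psi_i, B_i = 0.0, 0.0
        for school in dist:                  # school j
            phi_ij, B_ij = 0.0, 0.0
            for e in school:                 # teacher k
                T = len(e)
                phi_ijk = 1.0 + T * gu
                A, B = e @ e, e.sum()
                ll += np.log(phi_ijk) + (A - gu * B**2 / phi_ijk) / s2e
                phi_ij += T / phi_ijk
                B_ij += B / phi_ijk
            psi_ij = 1.0 + phi_ij * gv
            ll += np.log(psi_ij) - gv * B_ij**2 / (s2e * psi_ij)
            psi_i += phi_ij / psi_ij
            B_i += B_ij / psi_ij
        xi_i = 1.0 + psi_i * gw
        ll += np.log(xi_i) - gw * B_i**2 / (s2e * xi_i)
    return -0.5 * ll
```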
Maximizing Log L

- Antweiler provides analytic first derivatives for gradient methods of optimization. Ugly to program.
- Numerical derivatives (sketched below): let δ be the full vector of K+4 parameters, and let μ_r be a perturbation vector with ε_r = max(ε_0, ε_1|δ_r|) in the rth position and zero in the other K+3 positions. Then

$$\frac{\partial\log L}{\partial\delta_r} \approx \frac{\log L(\boldsymbol{\delta} + \boldsymbol{\mu}_r) - \log L(\boldsymbol{\delta} - \boldsymbol{\mu}_r)}{2\varepsilon_r}$$
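A minimal sketch of that central-difference gradient; the default values of ε_0 and ε_1 are illustrative:

```python
import numpy as np

def num_gradient(loglik, delta, eps0=1e-7, eps1=1e-5):
    """Central-difference gradient of loglik at parameter vector delta (floats)."""
    g = np.zeros_like(delta)
    for r in range(len(delta)):
        eps = max(eps0, eps1 * abs(delta[r]))
        mu = np.zeros_like(delta)
        mu[r] = eps                          # perturb only the r-th position
        g[r] = (loglik(delta + mu) - loglik(delta - mu)) / (2.0 * eps)
    return g
```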
Asymptotic Covariance Matrix

"Even with an analytic gradient, however, the Hessian matrix, Ψ, is typically obtained through numeric approximation methods." Read: "the second derivatives are too complicated to derive, much less program." Also, since logL is not a sum of terms over independent observations, the BHHH estimator is not usable. Numerical second derivatives were used.
An Appropriate Asymptotic Covariance Matrix

The expected Hessian is block diagonal, so we can isolate β:

$$-E\left[\frac{\partial^2\log L}{\partial\boldsymbol{\beta}\,\partial\boldsymbol{\beta}'}\right]
= \frac{1}{\sigma_\varepsilon^2}\sum_{i=1}^L\sum_{j=1}^{M_i}\sum_{k=1}^{N_{ij}}\sum_{t=1}^{T_{ijk}} \mathbf{x}_{ijkt}\mathbf{x}_{ijkt}'$$
$$- \frac{\gamma_u}{\sigma_\varepsilon^2}\sum_{i=1}^L\sum_{j=1}^{M_i}\sum_{k=1}^{N_{ij}} \frac{1}{\phi_{ijk}}\left(\sum_{t=1}^{T_{ijk}}\mathbf{x}_{ijkt}\right)\left(\sum_{t=1}^{T_{ijk}}\mathbf{x}_{ijkt}\right)'$$
$$- \frac{\gamma_v}{\sigma_\varepsilon^2}\sum_{i=1}^L\sum_{j=1}^{M_i} \frac{1}{\psi_{ij}}\left(\sum_{k=1}^{N_{ij}}\frac{1}{\phi_{ijk}}\sum_{t=1}^{T_{ijk}}\mathbf{x}_{ijkt}\right)\left(\sum_{k=1}^{N_{ij}}\frac{1}{\phi_{ijk}}\sum_{t=1}^{T_{ijk}}\mathbf{x}_{ijkt}\right)'$$
$$- \frac{\gamma_w}{\sigma_\varepsilon^2}\sum_{i=1}^L \frac{1}{\xi_i}\left(\sum_{j=1}^{M_i}\frac{1}{\psi_{ij}}\sum_{k=1}^{N_{ij}}\frac{1}{\phi_{ijk}}\sum_{t=1}^{T_{ijk}}\mathbf{x}_{ijkt}\right)\left(\sum_{j=1}^{M_i}\frac{1}{\psi_{ij}}\sum_{k=1}^{N_{ij}}\frac{1}{\phi_{ijk}}\sum_{t=1}^{T_{ijk}}\mathbf{x}_{ijkt}\right)'$$

The inverse of this matrix, evaluated at the MLEs, provides the appropriate estimated asymptotic covariance matrix for β̂. Standard errors for the variance estimators are not needed.
Some Observations

- Assuming the wrong (e.g., nonnested) error structure:
  - Still consistent - GLS with the wrong weights
  - Standard errors (apparently) biased downward (Moulton bias)
- Adding "time" effects or other nonnested effects is "very challenging." Perhaps do this with "fixed" effects (dummy variables).
An Application

- y_ijkt = log of atmospheric sulfur dioxide concentration at observation station k at time t, in country i.
- H = 2621 observations; 293 stations, 44 countries; various numbers of observations, not equally spaced.
- Three levels here, not 4 as in the article.
- x: 1, log(GDP/km²), log(K/L), log(Income), Suburban, Rural, Communist, log(Oil price), average temperature, time trend.
Estimates

Variable  Dimension   Random Effects     Nested Effects
x1        . . .       -10.787 (12.03)    -7.103  (5.613)
x2        C S T         0.445 (7.921)     0.202  (2.531)
x3        C . T         0.255 (1.999)     0.371  (2.345)
x4        C . T        -0.714 (5.005)    -0.477  (2.620)
x5        C S T        -0.627 (3.685)    -0.720  (4.531)
x6        C S T        -0.834 (2.181)    -1.061  (3.439)
x7        C . .         0.471 (2.241)     0.613  (1.443)
x8        . . T        -0.831 (2.267)    -0.089  (2.410)
x9        C S T        -0.045 (4.299)    -0.044  (3.719)
x10       . . T        -0.043 (1.666)    -0.046 (10.927)
σ_ε²                    0.330             0.329
σ_u                     1.807             1.017
σ_v                       --              1.347
logL                -2645.4           -2606.0
(t-ratios in parentheses)
Rotating Panel-1

The structure of the sample and selection of individuals in a rotating sampling design are as follows. Let all individuals in the population be numbered consecutively. The sample in period 1 consists of N_1 individuals. In period 2, a fraction, m_e2 (0 < m_e2 < N_1), of the sample in period 1 is replaced by m_i2 new individuals from the population. In period 3 another fraction of the sample in period 2, m_e2 (0 < m_e2 < N_2) individuals, is replaced by m_i3 new individuals, and so on. Thus the sample size in period t is N_t = N_{t−1} − m_{e,t−1} + m_{it}. The procedure of dropping the m_{e,t−1} individuals selected in period t − 1 and replacing them with m_{it} individuals from the population in period t is called rotating sampling. In this framework the total number of observations and the number of individuals observed are Σ_t N_t and N_1 + Σ_{t=2}^T m_{it}, respectively.

Heshmati, A., "Efficiency Measurement in Rotating Panel Data," Applied Economics, 30, 1998, pp. 919-930.
Rotating Panel-2

The outcome of the rotating sample for farms producing dairy products is given in Table 1. Each annual sample is composed of four parts or subsamples. For example, in 1980 the sample contains 79, 62, 98, and 74 farms. The first three parts (79, 62, and 98) are those not replaced during the transition from 1979 to 1980. The last subsample contains 74 newly included farms from the population. At the same time, 85 farms were excluded from the sample in 1979. The difference between the excluded part (85) and the included part (74) corresponds to the change in the rotating sample size between these two periods, i.e., 313 − 324 = −11. This difference includes only the part of the sample where each farm is observed consecutively for four years, N_rot. The difference in the non-rotating part, N_non, is due to those farms which are not observed consecutively. The proportion of farms not observed consecutively, N_non, in the total annual sample varies from 11.2 to 22.6% with an average of 18.7 per cent.
Rotating Panels-3

- Simply an unbalanced panel
  - Treat with the familiar techniques
  - Accounting is complicated
- Time effects may be complicated.
  - Biorn and Jansen (Scand. J. E., 1983): households in cohort 1 have T = 1976, 1977 while cohort 2 has T = 1977, 1978.
- But… "time in sample bias…" may require special treatment. The Mexican labor survey has a 3-period rotation; some families appear in 1, 2, or 3 periods.
Pseudo Panels

T different cross sections:

$$y_{i(t),t} = \mathbf{x}_{i(t),t}'\boldsymbol{\beta} + u_{i(t)} + \varepsilon_{i(t),t}, \quad i(t) = 1, \ldots, N(t);\ t = 1, \ldots, T$$

These are Σ_{t=1}^T N(t) independent observations. Define C cohorts, e.g., those born 1950-1955, and average within cohort-period cells:

$$\bar y_{c,t} = \bar{\mathbf{x}}_{c,t}'\boldsymbol{\beta} + \bar u_{c,t} + \bar\varepsilon_{c,t}, \quad c = 1, \ldots, C;\ t = 1, \ldots, T$$

The cohort sizes are N_c(t). Assume they are large. Then ū_{c,t} ≈ u_c for each cohort, which creates a fixed effects model:

$$\bar y_{c,t} = \bar{\mathbf{x}}_{c,t}'\boldsymbol{\beta} + u_c + \bar\varepsilon_{c,t}, \quad c = 1, \ldots, C;\ t = 1, \ldots, T.$$

(See Baltagi, Section 10.3, for issues relating to measurement error.)
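A minimal sketch of the cohort aggregation with pandas; the column names and cohort edges are illustrative assumptions:

```python
import pandas as pd

# df has one row per individual observation, with columns 'birth_year',
# 'year', 'y', and regressors 'x1', 'x2' (hypothetical names).
def make_pseudo_panel(df, cohort_edges):
    """Average individual data into cohort-by-period cells."""
    df = df.copy()
    df["cohort"] = pd.cut(df["birth_year"], bins=cohort_edges)
    cells = df.groupby(["cohort", "year"], observed=True)[["y", "x1", "x2"]].mean()
    sizes = df.groupby(["cohort", "year"], observed=True).size().rename("n_ct")
    # Estimate the fixed effects model on the cell means (e.g., cohort dummies);
    # small n_ct cells reintroduce measurement error (Baltagi, Section 10.3).
    return cells.join(sizes).reset_index()

# Example: cohorts born 1950-55, 1956-61, 1962-67
# panel = make_pseudo_panel(df, cohort_edges=[1950, 1956, 1962, 1968])
```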

								