

Applied Econometrics with EViews ... Lecture 16
M Daniel Westbrook

Panel Data
Suppose that we have data on three firms' output and labor for a given year and that the scatter
diagram looks like this:

As time goes by we collect additional data; the next scatter diagram shows our regression line
after we have collected data for three years.

After seven years have passed we have a regression that looks like this:

It seems clear that the regression line does not accurately represent the relationship between
the explanatory variable and the dependent variable for a particular firm. In fact, the following
set of regressions seems more appropriate:

Clearly, we could represent the data for these three firms with a regression that includes a pair
of dummy variables, but we will find that the POOL object of EViews, which is designed to
handle panel data, is much more flexible and powerful than anything we can easily manage with
dummy variables.

Advantages of panel data: estimation of parameters when the cross-section dimension is very
short, elimination of the omitted variables bias in certain cases, rich specification of the error
covariance structure.

A panel data set consists of a set of time-series observations on a set of cross-sectional units.
If each of N time-series has the same length T, then the NxT observations comprise a
balanced panel. Unbalanced panels do not require special analytical methods, but the notation
is a bit more complex than for balanced panels. Most panel data software can handle
unbalanced panels with ease. We will concentrate on balanced panels. EViews requires that
the representation of the panel data conforms to a balanced panel but the data that fill the
representation need not be balanced.

It is easiest to think of panel data sets as time-series vectors stacked vertically, though this is
only one of several ways that panel data can be represented in EViews POOL objects. For
example, if variable Y is the dependent variable in a panel data model, we can envision its
elements arranged as:

         $$
         Y = \begin{bmatrix}
         Y_{1,1} \\ Y_{1,2} \\ \vdots \\ Y_{1,T} \\
         Y_{2,1} \\ Y_{2,2} \\ \vdots \\ Y_{2,T} \\
         \vdots \\
         Y_{N,1} \\ Y_{N,2} \\ \vdots \\ Y_{N,T}
         \end{bmatrix}
         $$
               

This vector of data contains NxT elements. The first subscript indexes cross-sectional units;
the second indexes time periods.
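
The stacking can be made concrete with a few lines of Python. This is only an illustrative sketch, not EViews code: the values in `panel` are made up, and the helper name `position` is an assumption introduced here to show where element Y_{i,t} lands in the stacked vector.

```python
# Hypothetical balanced panel: N = 3 cross-sectional units, T = 4 periods.
# panel[i][t] holds Y for unit i+1 in period t+1 (values are made up).
panel = [
    [10.0, 11.0, 12.0, 13.0],   # unit 1
    [20.0, 21.0, 22.0, 23.0],   # unit 2
    [30.0, 31.0, 32.0, 33.0],   # unit 3
]

N = len(panel)
T = len(panel[0])

# Stack the N time series vertically: all T observations for unit 1,
# then all T observations for unit 2, and so on.
stacked = [panel[i][t] for i in range(N) for t in range(T)]


def position(i, t, T):
    """0-based position of Y_{i,t} in the stacked vector (i and t are 1-based)."""
    return (i - 1) * T + (t - 1)


print(len(stacked))                 # number of elements, N x T
print(stacked[position(2, 3, T)])   # the element Y_{2,3}
```

Because the first subscript moves slowest, the observations for one cross-sectional unit always sit in one contiguous block of length T.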

Observations on regressors and the symbols for the unobservable stochastic disturbance terms
can be arranged similarly.

Linear Models for Panel Data

The basic element for building a linear model for panel data is this equation:

         $$Y_{it} = \alpha_i + \beta_{i1} X_{1it} + \beta_{i2} X_{2it} + \cdots + \beta_{iK} X_{Kit} + \varepsilon_{it}$$

This specification allows the regression intercept and partial regression coefficients to vary
across cross-sectional units but not over time. This is the class of models that EViews' POOL
object estimates most easily. Notice that we do not assume that $X_{1it} = 1$. The intercepts are
separately represented by the symbols $\alpha_i$.

This equation allows the investigator to represent a variety of models depending on the
assumptions made about the regression intercepts, the partial regression coefficients, and the
stochastic disturbance terms. The richness of the panel data set allows a rich variety of
specifications and requires sophisticated treatment of the stochastic disturbance terms. The
taxonomy of panel data models reflects choices across the following dimensions:

Alternative Assumptions for the Intercept

1.      None:                      $\alpha_i = 0$ for every $i$.

2.      Common:                    $\alpha_i = \alpha$ for every $i$.

3.      Fixed Effects:             the $\alpha_i$ differ across cross-sectional units and $E[\alpha_i \varepsilon_{it}] = 0$.

4.      Random Effects:            cross-sectional units have different intercepts that are realizations
                                   of a random variable:

                                   $\alpha_i = \alpha + \nu_i$, with $E[\nu_i \varepsilon_{it}] = 0$ and $E[\nu_i X_{kit}] = 0$.

Alternative Assumptions for the Partial Regression Coefficients

1.      Common:                    at least some of the partial regression coefficients are
                                   common across cross-sectional units.

2.      Cross-Section-Specific:

                                   at least some of the partial regression coefficients are different for
                                   different cross-sectional units.

Assumptions about the Stochastic Error Terms

The error terms may have complicated variance and covariance structures. Consider a fixed-
effects model of firms' output in which we assume that all firms have the same output elasticity
with respect to capital and the same output elasticity with respect to labor, but they have
different intercepts, perhaps because they differ in the frequency with which their electricity
supplies are interrupted.

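         $$
         \begin{aligned}
         \log(Y_{1t}) &= \alpha_1 + \beta_1 \log(K_{1t}) + \beta_2 \log(L_{1t}) + \varepsilon_{1t} \\
         \log(Y_{2t}) &= \alpha_2 + \beta_1 \log(K_{2t}) + \beta_2 \log(L_{2t}) + \varepsilon_{2t} \\
         &\;\;\vdots \\
         \log(Y_{Nt}) &= \alpha_N + \beta_1 \log(K_{Nt}) + \beta_2 \log(L_{Nt}) + \varepsilon_{Nt}
         \end{aligned}
         $$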

        The stochastic error terms may be heteroskedastic across firms but homoskedastic
        within each firm:

         $$E[\varepsilon_{it}^2] = \sigma_i^2$$

        Note that there is no t subscript on the variance.

        If the firms experience similar random shocks contemporaneously, the stochastic error
        terms may exhibit contemporaneous cross-equation correlation:

         $$E[\varepsilon_{it} \varepsilon_{jt}] = \sigma_{ij}$$

        Each firm may have a stochastic error term that is autocorrelated; the autocorrelation
        may be common across firms, or it may be distinct across firms:

         $$\varepsilon_{it} = \rho\, \varepsilon_{i(t-1)} + \xi_{it} \quad \text{or} \quad \varepsilon_{it} = \rho_i\, \varepsilon_{i(t-1)} + \xi_{it}$$

        Notice that the combination of the second and third assumptions allows the stochastic
        errors of one firm to be correlated with the past stochastic errors of other firms.
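
The two autoregressive specifications can be traced through by hand. The following is a minimal Python sketch, not EViews code: the function name `ar1_errors`, the pre-sample starting value of zero, the firm labels, and the shock values are all assumptions made for illustration.

```python
def ar1_errors(shocks, rho, start=0.0):
    """Build e_t = rho * e_{t-1} + xi_t from a list of shocks xi_t.

    `start` is the assumed pre-sample error (an illustrative choice).
    """
    errors = []
    previous = start
    for xi in shocks:
        current = rho * previous + xi
        errors.append(current)
        previous = current
    return errors


# One unit shock in the first period and nothing afterwards (made-up values).
shocks = [1.0, 0.0, 0.0, 0.0]

# Common autocorrelation: every firm shares the same rho.
common = {firm: ar1_errors(shocks, rho=0.5) for firm in ("firm1", "firm2")}

# Firm-specific autocorrelation: each firm has its own rho_i.
rho_by_firm = {"firm1": 0.5, "firm2": 0.25}
specific = {firm: ar1_errors(shocks, rho) for firm, rho in rho_by_firm.items()}

print(common["firm1"])
print(specific["firm2"])
```

With a common coefficient the shock dies out at the same rate in both firms; with firm-specific coefficients it dies out faster in the firm with the smaller coefficient.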

EViews provides very convenient tools to estimate all of these cases.

Estimation Concepts

The appropriate estimation technique is determined by the configuration of the regression
coefficients and the assumptions concerning the stochastic disturbance term. The simplest
cases arise when the classical assumptions (no heteroskedasticity, no contemporaneous cross-
equation correlation, and no autocorrelation) pertain to the behavior of ε i t .

Under these circumstances, OLS yields Best Linear Unbiased Estimators for a variety of
models. The simplest model would be the one in which all cross-sectional units share the same
intercepts and partial regression coefficients:

         $$Y_{i,t} = \alpha + \beta_1 X_{1,i,t} + \beta_2 X_{2,i,t} + \cdots + \beta_K X_{K,i,t} + \varepsilon_{i,t}$$

This is a highly-restricted model and the hypothesis that all cross-sectional units share the same
intercept is often rejected.

Within Estimation

The most common specification for panel data may be the fixed-effects model. The partial
regression coefficients are assumed to be common across cross-sectional units, but the
regression intercepts are taken to be distinct across cross-sectional units. The equations look
like this:

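         $$
         \begin{aligned}
         Y_{1,t} &= \alpha_1 + \beta_1 X_{1,1,t} + \beta_2 X_{2,1,t} + \cdots + \beta_K X_{K,1,t} + \varepsilon_{1,t} \\
         Y_{2,t} &= \alpha_2 + \beta_1 X_{1,2,t} + \beta_2 X_{2,2,t} + \cdots + \beta_K X_{K,2,t} + \varepsilon_{2,t} \\
         &\;\;\vdots \\
         Y_{N,t} &= \alpha_N + \beta_1 X_{1,N,t} + \beta_2 X_{2,N,t} + \cdots + \beta_K X_{K,N,t} + \varepsilon_{N,t}
         \end{aligned}
         $$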

This set of equations could be estimated by OLS with an appropriate set of dummy variables. If
an investigator wants to use all of the dummies, omission of the regression intercept will enable
her to avoid the dummy variable trap.

On the other hand, if the panel is a large one, it may prove quite tedious to construct the set of
dummy variables required. The "within" approach allows us to estimate the partial regression
coefficients without specifying any dummy variables, and it also allows us to illustrate the
circumstances under which panel data enables us to avoid the omitted variables bias.

Assume that associated with each cross-sectional unit is some unobservable variable that does
not change over time. For example, consider an equation meant to relate grades to hours
studied. We might collect data for a set of students over several semesters and run a
regression, but one variable that we cannot easily observe is intelligence.

Write the intercept of each equation so that it looks like this:

         $$\alpha_i = \alpha' + \gamma_i Z_i$$

Keep in mind that $Z_i$ does not change over time, so $Z_{it} = Z_i$ for every $t$.

Define the cross-section specific means of the data like this:

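         $$\bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} Y_{i,t}$$

The means $\bar{x}_{k,i}$ and $\bar{\varepsilon}_i$ are defined in the same way. Obviously

         $$\bar{z}_i = \frac{1}{T} \sum_{t=1}^{T} Z_{it} = Z_i$$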

Now, re-cast the model in terms of deviations from cross-sectional means:

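         $$Y_{i,t} - \bar{y}_i = (\alpha' - \alpha') + \gamma_i (Z_i - \bar{z}_i) + \beta_1 (X_{1,i,t} - \bar{x}_{1,i}) + \cdots + \beta_K (X_{K,i,t} - \bar{x}_{K,i}) + (\varepsilon_{i,t} - \bar{\varepsilon}_i)$$

Interestingly, the terms associated with the intercept and with the unobservable variable vanish,
because neither changes over time:

         $$Y_{i,t} - \bar{y}_i = \beta_1 (X_{1,i,t} - \bar{x}_{1,i}) + \cdots + \beta_K (X_{K,i,t} - \bar{x}_{K,i}) + (\varepsilon_{i,t} - \bar{\varepsilon}_i)$$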

OLS performed on the data in deviation form is referred to as "within" estimation because all of
the variation between cross sections has been subtracted out: the only variation with which the
partial regression coefficients are estimated is the variation within the cross-sectional units over
time.
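
The within transformation is easy to check numerically. Below is a minimal Python sketch, not EViews output: the three firms, their intercepts, the true slope of 2, and the noise values are all made up, and the helper names `mean` and `ols_slope` are assumptions introduced for illustration.

```python
# Made-up balanced panel: 3 firms, 4 years, one regressor.
# y_it = alpha_i + 2 * x_it + e_it, with firm-specific intercepts alpha_i.
x = {
    "firm1": [1.0, 2.0, 3.0, 4.0],
    "firm2": [3.0, 4.0, 5.0, 6.0],
    "firm3": [5.0, 6.0, 7.0, 8.0],
}
alpha = {"firm1": 10.0, "firm2": 5.0, "firm3": 0.0}
noise = [0.1, -0.1, -0.1, 0.1]   # sums to zero within each firm
y = {f: [alpha[f] + 2.0 * xv + e for xv, e in zip(x[f], noise)] for f in x}


def mean(values):
    return sum(values) / len(values)


def ols_slope(xs, ys):
    """Slope from a simple regression of ys on xs (with an intercept)."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
    sxx = sum((a - xbar) ** 2 for a in xs)
    return sxy / sxx


# Pooled OLS ignores the firm-specific intercepts.
pooled_x = [v for f in x for v in x[f]]
pooled_y = [v for f in x for v in y[f]]
b_pooled = ols_slope(pooled_x, pooled_y)

# Within estimation: subtract each firm's own mean, then run OLS
# on the stacked deviations (no intercept is needed).
dev_x, dev_y = [], []
for f in x:
    xbar, ybar = mean(x[f]), mean(y[f])
    dev_x += [v - xbar for v in x[f]]
    dev_y += [v - ybar for v in y[f]]
b_within = sum(a * b for a, b in zip(dev_x, dev_y)) / sum(a * a for a in dev_x)

# Recover the fixed effects from the firm means.
alpha_hat = {f: mean(y[f]) - b_within * mean(x[f]) for f in x}

print("pooled slope:", round(b_pooled, 4))
print("within slope:", round(b_within, 4))
print("fixed effects:", {f: round(a, 4) for f, a in alpha_hat.items()})
```

In this contrived data set the firm intercepts fall as the firm means of x rise, so the pooled slope is far below the true value of 2, while the within slope recovers it and the implied intercepts match the ones used to build the data.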

Cross-Section Specific Partial Regression Coefficients

Going beyond the fixed effects model to models in which some of the partial regression
coefficients are cross-section specific can easily be accomplished by creating dummy variable /
interaction terms for the variables whose coefficients vary across cross sections. Doing this
mechanically is very tedious if many cross-sectional dummies are involved; EViews does this
very easily, however.

Generalizations of the Error Covariance Structure

If the classical assumptions for the stochastic error terms do not hold, then OLS estimators
remain unbiased but are not efficient. Worse, the estimated coefficient variance-covariance
matrix is biased and inconsistent, so the t-statistics and F-statistics are invalid.

As mentioned above, in panel data models a variety of assumptions about the stochastic error
covariance structure are more realistic than the classical assumptions. Under these alternative
sets of assumptions the appropriate estimation technique is Feasible Generalized Least
Squares (FGLS).

In this section we focus on specification of the error covariance matrix.

EViews and Panel Data

EViews POOL objects operate on variables that have special two-part names. The first part is
the name of the variable, and the second part of the name is the cross-section identifier that
indicates which cross-sectional unit the variable belongs to.

I generally begin cross-section identifiers with an underscore mark to make the full variable
names more readable.

Example: I want to work with a panel data set on the USA, Canada, and Mexico. The variables
that I want to use are GDP, Population, and Trade Flows.

        Variable Name First Part:                               GDP, POP, TRA

        Variable Name Second Part (Cross-Section Identifier):   _USA, _CAN, _MEX

                 GDP_USA           GDP_CAN        GDP_MEX
                 POP_USA           POP_CAN        POP_MEX
                 TRA_USA           TRA_CAN        TRA_MEX

Thus, there are nine variables in my EViews workfile. It is easy to see that panel data sets can
quickly become very large. For example, Thay Randy is working with a panel of 61
Vietnamese provinces for which he has ten-year time series on 8 variables related to
agriculture: his workfile has (61 x 8) = 488 variables.

After you have named all of your variables and have got the data into an EViews workfile, you
are ready to create the POOL object.

You do this by clicking the following sequence:

        Objects / New Object / Pool

A window will open with space for you to list your cross-section identifiers:

In the POOL object you refer to variables by the first part of the name followed by a question
mark. Thus, if I type a command using GDP?, EViews uses all three GDP series: those for the USA,
Canada, and Mexico.

Notice the button PoolGenr. PoolGenr is used to create new variables according to rules that
are similar to the rules for ordinary Genr. For example, if I want to create GDP Per Capita for all
three countries in my POOL, I would click PoolGenr and then type the equation:

        GDPPC? = GDP? / POP?
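
The effect of the question mark can be mimicked outside EViews. The sketch below is plain Python, not EViews syntax: the dictionary `workfile`, the function name `pool_ratio`, and every number in it are assumptions made for illustration. It shows only the idea that one pool expression is applied once per cross-section identifier.

```python
# Hypothetical workfile: series are stored under two-part names.
# All numbers are invented for illustration only.
workfile = {
    "GDP_USA": [100.0, 110.0], "POP_USA": [10.0, 10.0],
    "GDP_CAN": [40.0, 44.0],   "POP_CAN": [4.0, 4.0],
    "GDP_MEX": [30.0, 36.0],   "POP_MEX": [6.0, 6.0],
}
identifiers = ["_USA", "_CAN", "_MEX"]   # cross-section identifiers in the pool


def pool_ratio(workfile, identifiers, target, numerator, denominator):
    """Mimic 'target? = numerator? / denominator?' for every identifier."""
    for ident in identifiers:
        num = workfile[numerator + ident]
        den = workfile[denominator + ident]
        workfile[target + ident] = [a / b for a, b in zip(num, den)]


pool_ratio(workfile, identifiers, "GDPPC", "GDP", "POP")
print(sorted(name for name in workfile if name.startswith("GDPPC")))
```

One call creates a per-capita series for each country, which is what a single PoolGenr equation does for every cross-sectional unit in the pool.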

For estimation, EViews has one window in which the user specifies the equation and the
assumptions regarding the stochastic disturbance term. That window is shown here:

We will describe each element of the specification window.

Dependent Variable

The dependent variable will be typed in according to its name and question mark. For example,
you might use GDPPC?

Common Coefficients

In this field you list all explanatory variables that you assume have the same partial regression
coefficient for every cross-sectional unit. Enter each one in the form VAR?, that is, the first part
of the name followed by a question mark. You may add AR(p) terms if you want to model
autocorrelation. Keep in mind that your panel data set should have a rather long time-series
dimension in order to get reliable estimators of the autocorrelation coefficients.

Cross-Section Specific Coefficients

In this field you type the names of all explanatory variables that you assume have different
partial regression coefficient values for different cross-sectional units. Use the same VAR? format.


Intercept

Here you specify whether your model has

No intercept ...                   this case is rare.
Common intercept ...               this case is unusual.
Fixed Effects ...                  the typical specification.
Random Effects ...                 this specification is not often used because it requires strong
                                   assumptions that are difficult to meet in practice.


Weighting

Here, weighting refers to "feasible weighted least squares."

No weighting ...                   no equation-specific heteroskedasticity.
Cross-section weights ...          feasible WLS to correct for equation-specific heteroskedasticity.
SUR ...                            accounts for contemporaneous cross-equation correlation of
                                   errors and equation-specific heteroskedasticity. To use this, the
                                   time-series dimension must exceed the cross-section dimension
                                   (T > N).

Iterate to Convergence ...         causes the program to compute new residuals based on the
                                   feasible GLS coefficient estimators, then update the feasible GLS
                                   coefficient estimators from those residuals, and to repeat these
                                   two steps until the coefficient estimators stop changing.
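
The cross-section weights option and the Iterate to Convergence switch can be illustrated with a small calculation. What follows is a minimal Python sketch of one common way to carry out feasible WLS with unit-specific error variances. It is not EViews' own routine: the function names `wls` and `cross_section_fgls`, the single-regressor model with a common intercept, the variance formula without a degrees-of-freedom correction, and the data are all assumptions made for illustration.

```python
def wls(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return ybar - b * xbar, b


def cross_section_fgls(x, y, max_iter=50, tol=1e-10):
    """x, y: dicts mapping unit name -> list of T observations."""
    units = list(x)
    xs = [v for u in units for v in x[u]]
    ys = [v for u in units for v in y[u]]
    sigma2 = {u: 1.0 for u in units}      # equal weights: first pass is OLS
    a = b = None
    for step in range(max_iter):
        # Each observation is weighted by 1 / (its unit's error variance).
        ws = [1.0 / sigma2[u] for u in units for obs in x[u]]
        a_new, b_new = wls(xs, ys, ws)
        # Update each unit's error variance from its own residuals.
        for u in units:
            res = [yv - a_new - b_new * xv for xv, yv in zip(x[u], y[u])]
            sigma2[u] = sum(r * r for r in res) / len(res)
        converged = a is not None and abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b, sigma2


# Made-up data: true line y = 1 + 2x; firm B is much noisier than firm A.
x = {"A": [1.0, 2.0, 3.0, 4.0], "B": [1.0, 2.0, 3.0, 4.0]}
noise = {"A": [0.1, -0.1, -0.1, 0.1], "B": [1.0, -1.0, 1.0, -1.0]}
y = {u: [1.0 + 2.0 * xv + e for xv, e in zip(x[u], noise[u])] for u in x}

all_x = [v for u in x for v in x[u]]
all_y = [v for u in x for v in y[u]]
a_ols, b_ols = wls(all_x, all_y, [1.0] * len(all_x))
a_fgls, b_fgls, sigma2 = cross_section_fgls(x, y)

print("OLS slope: ", round(b_ols, 4))
print("FGLS slope:", round(b_fgls, 4))
print("estimated variances:", {u: round(s, 4) for u, s in sigma2.items()})
```

In this contrived example the noisy firm pulls the OLS slope away from 2. Once the estimated variances are used as weights, the noisy firm counts for less and the slope moves back toward 2.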


Options

There is only one option: White's heteroskedasticity-consistent covariance matrix (HCCM) can be
produced if you do not choose SUR.

Hypothesis Testing

In panel data models (as in single-equation multiple-regression models) we are interested in
testing two types of hypotheses: hypotheses about the variances and covariances of the
stochastic error terms and hypotheses about the regression coefficients.

A bit of art is involved, but the general to simple procedure provides a good guide.

Before testing hypotheses about the regression coefficients, it is important to have a good
specification of the error covariance matrix so that the test statistics for the regression
coefficients are reliable.

Testing Hypotheses About The Error Covariance Matrix

It is helpful to think about restricted and unrestricted error covariance matrices.

An error covariance matrix is a square matrix with the error variances of the individual cross-
sectional equations along the diagonal and the contemporaneous error covariances in the
off-diagonal positions. All covariance matrices are symmetric, so if we specify an error
covariance matrix for a panel model with five cross-sectional units we have a (5 x 5) matrix with
five diagonal elements and ten distinct off-diagonal elements:

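         $$
         \begin{bmatrix}
         \sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} & \sigma_{15} \\
                    & \sigma_2^2  & \sigma_{23} & \sigma_{24} & \sigma_{25} \\
                    &             & \sigma_3^2  & \sigma_{34} & \sigma_{35} \\
                    &             &             & \sigma_4^2  & \sigma_{45} \\
                    &             &             &             & \sigma_5^2
         \end{bmatrix}
         $$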

If we click the button for SUR, EViews will estimate all of these parameters. On the other hand,
if we believe that the cross-sectional units do not have any contemporaneous cross-equation
error covariances, we would click the button for Cross-Section Weighting and EViews would
impose zero restrictions on all of the off-diagonal elements of the matrix. Only the diagonal
elements would be estimated:

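         $$
         \begin{bmatrix}
         \sigma_1^2 & 0          & 0          & 0          & 0 \\
                    & \sigma_2^2 & 0          & 0          & 0 \\
                    &            & \sigma_3^2 & 0          & 0 \\
                    &            &            & \sigma_4^2 & 0 \\
                    &            &            &            & \sigma_5^2
         \end{bmatrix}
         $$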

The second model involves imposing ten restrictions, compared to the first model.

Finally, if we assume that our stochastic disturbances are free of cross-sectional
heteroskedasticity, we click the No Weighting button and EViews estimates only one
diagonal element instead of five: four restrictions are imposed, compared to the second
model:

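         $$
         \begin{bmatrix}
         \sigma^2 & 0        & 0        & 0        & 0 \\
                  & \sigma^2 & 0        & 0        & 0 \\
                  &          & \sigma^2 & 0        & 0 \\
                  &          &          & \sigma^2 & 0 \\
                  &          &          &          & \sigma^2
         \end{bmatrix}
         $$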
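
The restriction counts quoted for the three specifications follow from simple counting. Here is a short Python sketch of that arithmetic; the function name `covariance_parameter_counts` and the dictionary labels are assumptions introduced for illustration.

```python
def covariance_parameter_counts(n_units):
    """Free parameters in each error covariance specification."""
    diagonal = n_units                             # one variance per unit
    off_diagonal = n_units * (n_units - 1) // 2    # distinct covariances
    return {
        "SUR": diagonal + off_diagonal,
        "cross-section weights": diagonal,
        "no weighting": 1,
    }


counts = covariance_parameter_counts(5)
restrictions_sur_to_weights = counts["SUR"] - counts["cross-section weights"]
restrictions_weights_to_none = counts["cross-section weights"] - counts["no weighting"]

print(counts)
print(restrictions_sur_to_weights, restrictions_weights_to_none)
```

For five cross-sectional units this reproduces the ten and four restrictions mentioned in the text. These differences are the degrees of freedom for the likelihood ratio tests that compare the specifications.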

Testing these restrictions is easily accomplished by means of a test called a likelihood ratio test.
In earlier estimation adventures you may have noticed a statistic called the Log-Likelihood
reported among the EViews output. This is the logarithm of the likelihood: the joint probability
(or joint density) of the observed sample, evaluated at the point estimates of the parameters.
The log-likelihood itself is not confined to the interval from zero to one and is frequently
negative; what matters for testing is how much it changes from one model to another.

All of our estimation methods aim to maximize this log-likelihood. In many applications,
maximizing the log-likelihood leads to exactly the same estimator as the Least-Squares method
does, but the analytical work required is heavier, so we follow the Least-Squares approach.

Our interest here is in the extent to which imposing restrictions on the error covariance matrix
reduces the log-likelihood statistic.

If we form a ratio of the likelihood of a restricted model $L_R$ divided by the likelihood of an
unrestricted model $L_U$, we expect the ratio to be less than 1 because the maximum likelihood
subject to a restriction can be no greater than the maximum likelihood of the unrestricted model.

Define the likelihood ratio:

         $$\lambda = \frac{L_R}{L_U}, \qquad 0 \le \lambda \le 1$$

If the restricted model is not significantly different from the unrestricted model we expect the
likelihood ratio to be close to 1. The distribution theory of the likelihood ratio itself is a bit
cumbersome. However, it is well known that the distribution of $-2 \log \lambda$ is asymptotically
Chi-Square, so that in any application with a sufficiently large sample size we can use:

         $$-2 \log \lambda = -2 \left( \log \hat{L}_R - \log \hat{L}_U \right) \overset{approx}{\sim} \chi^2_q$$

where q is the number of restrictions.

Under the null hypothesis we expect $-2 \log \lambda$ to be close to zero; we reject the null hypothesis
if the realized value of the likelihood ratio statistic exceeds an appropriate critical value or if the
p-value of the test is smaller than the pre-selected significance level.
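
The test can be computed directly from the two log-likelihoods that EViews reports. The following is a minimal Python sketch. The log-likelihood values are hypothetical, and the function names `chi2_sf` and `lr_test` are assumptions introduced here. The Chi-Square tail probability is computed with a standard series for the incomplete gamma function so that the sketch needs only the standard library; it is adequate for moderate values of the statistic.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = math.exp(-z + a * math.log(z) - math.lgamma(a + 1.0))
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total)


def lr_test(loglik_restricted, loglik_unrestricted, q):
    """Return the statistic -2 (log L_R - log L_U) and its p-value."""
    stat = -2.0 * (loglik_restricted - loglik_unrestricted)
    return stat, chi2_sf(stat, q)


# Hypothetical values: cross-section weights (restricted) against SUR
# (unrestricted) with five units, so q = 10 restrictions.
stat, p_value = lr_test(loglik_restricted=-520.0,
                        loglik_unrestricted=-508.0, q=10)
print("LR statistic:", stat)
print("p-value:", round(p_value, 4))
```

With these invented numbers the p-value is below 0.01, so the ten zero restrictions on the covariances would be rejected at the usual significance levels.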

Maintained Model

While testing hypotheses about restrictions on the error covariance matrix, some specification of
the panel data regression model must be maintained. It is recommended that the maintained
model be "general" in the sense that we used that term in describing the "general-to-simple"
modeling strategy.

Testing Restrictions on the Panel Data Model

After a sound specification for the error covariance structure has been established, tests
associated with the general to simple modeling strategy may be undertaken. These tests may
be the usual Wald tests or t-tests on individual coefficients.

Keep in mind that when the cross-section weights or SUR methods or any AR(p) specification is
used, the results are all asymptotically based, so that the t-stats are approximately standard
normal and the Wald statistics are approximately Chi-Square.

Unrestricted Model

The completely unrestricted model is this one:

         $$Y_{it} = \alpha_i + \beta_{i1} X_{1it} + \beta_{i2} X_{2it} + \cdots + \beta_{iK} X_{Kit} + \varepsilon_{it}$$

In this model, the intercepts and the partial regression coefficients vary across cross-sectional
units. If either the no-weighting or the cross-section weights option is chosen for the error
covariance structure, then the coefficient estimates will be exactly the same as those obtained
by applying OLS separately to the data for each cross-sectional unit.

If the SUR option is chosen, then efficiency will be enhanced by exploiting the information
contained in the cross-equation error covariances. Remember that ( T > N ) is required to use
this option.

Partially Restricted Model

In many panel data sets the time-series dimension is quite short so it is impractical to estimate
the model for which all parameters vary across cross-sectional units. In this case the most
general feasible model is the fixed-effects model: only the intercepts vary across cross-
sectional units; the partial regression coefficients are the same for all cross-sectional units.

Of course, there are models in which some partial regression coefficients are identical across
cross-sectional units while others vary.

Restricted Model

The most restrictive model is the one in which the intercepts and the partial regression
coefficients are identical for all cross-sectional units.

Testing model restrictions may be done via the Wald Coefficient test or via the likelihood ratio
test. The two methods are asymptotically equivalent, though they may give different results for
a particular finite sample.

As you move through the general-to-simple modeling strategy it is sensible to re-check the
error covariance structure as you impose restrictions on the model's partial regression
coefficients and intercepts. Even though you may fail to reject the hypotheses that represent
the restrictions that you impose, the hypotheses may not be perfectly true, and that may affect
the estimators and tests of the error covariances.

Problem Set 16 guides you through the entire process for a panel consisting of five firms over
thirty years.
