POSTED ON: 8/26/2010
Applied Econometrics with EViews ... Lecture 16
M Daniel Westbrook

Panel Data

Suppose that we have data on three firms' output and labor for a given year and that the scatter diagram looks like this:

[Figure: scatter diagram of output and labor for three firms, one year of data]

As time goes by we collect additional data; the next scatter diagram shows our regression line after we have collected data for three years.

[Figure: scatter diagram with fitted regression line, three years of data]

After seven years have passed we have a regression that looks like this:

[Figure: scatter diagram with fitted regression line, seven years of data]

It seems clear that the regression line does not accurately represent the relationship between the explanatory variable and the dependent variable for a particular firm. In fact, the following set of regressions seems more appropriate:

[Figure: scatter diagram with a separate regression line for each firm]

Clearly, we could represent the data for these three firms with a regression that includes a pair of dummy variables, but we will find that the POOL object of EViews, which is designed to handle panel data, is much more flexible and powerful than anything we can easily manage with dummy variables.

Advantages of panel data: estimation of parameters when the cross-section dimension is very short, elimination of the omitted variables bias in certain cases, and rich specification of the error covariance structure.

A panel data set consists of a set of time-series observations on a set of cross-sectional units. If each of the N time series has the same length T, then the N x T observations comprise a balanced panel. Unbalanced panels do not require special analytical methods, but the notation is a bit more complex than for balanced panels. Most panel data software can handle unbalanced panels with ease. We will concentrate on balanced panels. EViews requires that the representation of the panel data conform to a balanced panel, but the data that fill the representation need not be balanced.
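The distinction between balanced and unbalanced panels can be made concrete with a small sketch. The Python below is only an illustration, not part of EViews: the function name `panel_shape` and the firm labels and years are made up for the example. It treats a panel as balanced when every cross-sectional unit is observed in every time period.

```python
from collections import defaultdict

def panel_shape(observations):
    """Return (N, T, is_balanced) for an iterable of (unit, period) pairs.

    N is the number of cross-sectional units, T the number of distinct
    periods seen anywhere in the data. The panel is balanced when every
    unit is observed in all T periods.
    """
    periods_by_unit = defaultdict(set)
    for unit, period in observations:
        periods_by_unit[unit].add(period)
    all_periods = set().union(*periods_by_unit.values())
    is_balanced = all(p == all_periods for p in periods_by_unit.values())
    return len(periods_by_unit), len(all_periods), is_balanced

# Hypothetical example: three firms observed over 2001-2003.
balanced = [(firm, year) for firm in ("A", "B", "C") for year in (2001, 2002, 2003)]
# Drop one firm-year to create a gap, which makes the panel unbalanced.
unbalanced = [obs for obs in balanced if obs != ("B", 2002)]

print(panel_shape(balanced))
print(panel_shape(unbalanced))
```

In the second case the representation still has N = 3 and T = 3 slots per firm, but one slot is empty, which mirrors the remark that the representation is balanced even when the data filling it are not.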
It is easiest to think of panel data sets as time-series vectors stacked vertically, though this is only one of several ways that panel data can be represented in EViews POOL objects. For example, if variable Y is the dependent variable in a panel data model, we can envision its elements arranged as:

$$
Y = \begin{bmatrix}
Y_{1,1} \\ Y_{1,2} \\ \vdots \\ Y_{1,T} \\
Y_{2,1} \\ Y_{2,2} \\ \vdots \\ Y_{2,T} \\
\vdots \\
Y_{N,1} \\ Y_{N,2} \\ \vdots \\ Y_{N,T}
\end{bmatrix}
$$

This vector of data contains N x T elements. The first subscript indexes cross-sectional units; the second indexes time periods. Observations on regressors and the symbols for the unobservable stochastic disturbance terms can be arranged similarly.

Linear Models for Panel Data

The basic element for building a linear model for panel data is this equation:

$$
Y_{it} = \alpha_i + \beta_{i1} X_{1it} + \beta_{i2} X_{2it} + \cdots + \beta_{iK} X_{Kit} + \varepsilon_{it}
$$

This specification allows the regression intercept and partial regression coefficients to vary across cross-sectional units but not over time. This is the class of models that EViews' POOL object estimates most easily. Notice that we do not assume that $X_{1it} = 1$. The intercepts are separately represented by the symbols $\alpha_i$.

This equation allows the investigator to represent a variety of models depending on the assumptions made about the regression intercepts, the partial regression coefficients, and the stochastic disturbance terms. The richness of the panel data set allows a rich variety of specifications and requires sophisticated treatment of the stochastic disturbance terms. The taxonomy of panel data models reflects choices across the following dimensions:

Alternative Assumptions for the Intercept

1. None: $\alpha_i = 0$ for every i.
2. Common: $\alpha_i = \alpha$ for every i.
3. Fixed Effects: the $\alpha_i$ differ across cross-sectional units and $E[\alpha_i \varepsilon_{it}] = 0$.
4. Random Effects: cross-sectional units have different intercepts that are realizations of a random variable: $\alpha_i = \alpha + \nu_i$, with $E[\nu_i \varepsilon_{it}] = 0$ and $E[\nu_i X_{kit}] = 0$.

Alternative Assumptions for the Partial Regression Coefficients

1. Common: at least some of the partial regression coefficients are common across cross-sectional units.
2. Cross-Section-Specific: at least some of the partial regression coefficients are different for different cross-sectional units.

Assumptions about the Stochastic Error Terms

The error terms may have complicated variance and covariance structures. Consider a fixed-effects model of firms' output in which we assume that all firms have the same output elasticity with respect to capital and the same output elasticity with respect to labor, but they have different intercepts, perhaps because they differ in the frequency with which their electricity supplies are interrupted.

$$
\begin{aligned}
\log(Y_{1t}) &= \alpha_1 + \beta_1 \log(K_{1t}) + \beta_2 \log(L_{1t}) + \varepsilon_{1t} \\
\log(Y_{2t}) &= \alpha_2 + \beta_1 \log(K_{2t}) + \beta_2 \log(L_{2t}) + \varepsilon_{2t} \\
&\;\;\vdots \\
\log(Y_{Nt}) &= \alpha_N + \beta_1 \log(K_{Nt}) + \beta_2 \log(L_{Nt}) + \varepsilon_{Nt}
\end{aligned}
$$

The stochastic error terms may be heteroskedastic across firms and homoskedastic within firms:

$$
E[\varepsilon_{it}^2] = \sigma_i^2
$$

Note that there is no t subscript on the variance.

If the firms experience similar random shocks contemporaneously, the stochastic error terms may exhibit contemporaneous cross-equation correlation:

$$
E[\varepsilon_{it} \varepsilon_{jt}] = \sigma_{ij}
$$

Each firm may have a stochastic error term that is autocorrelated; the autocorrelation may be common across firms, or it may be distinct across firms:

$$
\varepsilon_{it} = \rho\, \varepsilon_{i(t-1)} + \xi_{it}
\quad \text{or} \quad
\varepsilon_{it} = \rho_i\, \varepsilon_{i(t-1)} + \xi_{it}
$$

Notice that the combination of the second and third assumptions allows the stochastic errors of one firm to be correlated with the past stochastic errors of other firms.

EViews provides very convenient tools to estimate all of these cases.
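The three error-structure assumptions above each correspond to a simple sample moment of the residuals. The following Python sketch shows one way those moments could be computed; it is an illustration under stated assumptions, not a description of what EViews does internally. The function names and the residual values are made up, and the residuals are assumed to have mean zero so no centering is applied.

```python
def unit_variances(resid):
    """Estimate sigma_i^2 = E[e_it^2] for each unit i.

    resid[i][t] is the residual of cross-sectional unit i in period t.
    """
    return [sum(e * e for e in row) / len(row) for row in resid]

def contemporaneous_cov(resid, i, j):
    """Estimate sigma_ij = E[e_it * e_jt] by averaging over periods t."""
    n_periods = len(resid[i])
    return sum(resid[i][t] * resid[j][t] for t in range(n_periods)) / n_periods

def ar1_coefficient(series):
    """Estimate rho in e_t = rho * e_(t-1) + xi_t by OLS without intercept."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den

# Made-up residuals for two firms over four periods (illustration only).
resid = [[1.0, -1.0, 1.0, -1.0],
         [2.0, -2.0, 2.0, -2.0]]

print(unit_variances(resid))             # firm-specific variances
print(contemporaneous_cov(resid, 0, 1))  # cross-equation covariance
print(ar1_coefficient(resid[0]))         # firm 1 autocorrelation
```

Passing each firm's residuals separately to `ar1_coefficient` gives the firm-specific $\rho_i$; pooling the numerator and denominator sums across firms would give a common $\rho$.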
Estimation Concepts

The appropriate estimation technique is determined by the configuration of the regression coefficients and the assumptions concerning the stochastic disturbance term. The simplest cases arise when the classical assumptions (no heteroskedasticity, no contemporaneous cross-equation correlation, and no autocorrelation) pertain to the behavior of $\varepsilon_{it}$. Under these circumstances, OLS yields Best Linear Unbiased Estimators for a variety of models. The simplest model would be the one in which all cross-sectional units share the same intercepts and partial regression coefficients:

$$
Y_{i,t} = \alpha + \beta_1 X_{1,i,t} + \beta_2 X_{2,i,t} + \cdots + \beta_K X_{K,i,t} + \varepsilon_{i,t}
$$

This is a highly restricted model, and the hypothesis that all cross-sectional units share the same intercept is often rejected.

Within Estimation

The most common specification for panel data may be the fixed-effects model. The partial regression coefficients are assumed to be common across cross-sectional units, but the regression intercepts are taken to be distinct across cross-sectional units. The equations look like this:

$$
\begin{aligned}
Y_{1,t} &= \alpha_1 + \beta_1 X_{1,1,t} + \beta_2 X_{2,1,t} + \cdots + \beta_K X_{K,1,t} + \varepsilon_{1,t} \\
Y_{2,t} &= \alpha_2 + \beta_1 X_{1,2,t} + \beta_2 X_{2,2,t} + \cdots + \beta_K X_{K,2,t} + \varepsilon_{2,t} \\
&\;\;\vdots \\
Y_{N,t} &= \alpha_N + \beta_1 X_{1,N,t} + \beta_2 X_{2,N,t} + \cdots + \beta_K X_{K,N,t} + \varepsilon_{N,t}
\end{aligned}
$$

This set of equations could be estimated by OLS with an appropriate set of dummy variables. If an investigator wants to use all N of the dummies, omitting the regression intercept will enable her to avoid the dummy variable trap. On the other hand, if the panel is a large one, it may prove quite tedious to construct the set of dummy variables required.

The "within" approach allows us to estimate the partial regression coefficients without specifying any dummy variables, and it also allows us to illustrate the circumstances under which panel data enable us to avoid the omitted variables bias.
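The dummy-variable route described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the EViews implementation: the helper names (`solve`, `ols`, `lsdv`) and the three-firm data set are made up, the data are noise-free, and there is a single regressor. One dummy per unit is included and the common intercept is left out, which is the arrangement that avoids the dummy variable trap.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is nonsingular (no zero-pivot check).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)

def lsdv(units, x, y):
    """Regress y on one dummy per unit plus x, with no common intercept."""
    labels = sorted(set(units))
    X = [[1.0 if u == lab else 0.0 for lab in labels] + [xi]
         for u, xi in zip(units, x)]
    coefs = ols(X, y)
    return dict(zip(labels, coefs[:-1])), coefs[-1]

# Hypothetical data: three firms, common slope 2, intercepts 1, 5 and 9.
units = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
x = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0]
true_alpha = {"A": 1.0, "B": 5.0, "C": 9.0}
y = [true_alpha[u] + 2.0 * xi for u, xi in zip(units, x)]

intercepts, slope = lsdv(units, x, y)
pooled = ols([[1.0, xi] for xi in x], y)  # common intercept, common slope
print(intercepts, slope)
print(pooled)
```

On these made-up data the dummy-variable regression recovers the common slope of 2, while the pooled regression with a single intercept produces a slope near 4, because firms with larger intercepts also have larger values of x. That is the same pattern as the motivating scatter diagrams.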
Assume that associated with each cross-sectional unit is some unobservable variable that does not change over time. For example, consider an equation meant to relate grades to hours studied. We might collect data for a set of students over several semesters and run a regression, but one variable that we cannot easily observe is intelligence. Write the intercept of each equation so that it looks like this:

$$
\alpha_i = \alpha' + \gamma_i Z_i
$$

Keep in mind that $Z_i$ does not change over time, so $Z_{it} = Z_i$ for every t. Define the cross-section-specific means of the data like this:

$$
\bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} Y_{it}
$$

Obviously

$$
\bar{z}_i = \frac{1}{T} \sum_{t=1}^{T} Z_{it} = Z_i
$$

Now, re-cast the model in terms of deviations from cross-sectional means:

$$
Y_{i,t} - \bar{y}_i = (\alpha' - \alpha') + \gamma_i (Z_i - \bar{z}_i) + \beta_1 (X_{1,i,t} - \bar{x}_{1,i}) + \cdots + \beta_K (X_{K,i,t} - \bar{x}_{K,i}) + (\varepsilon_{i,t} - \bar{\varepsilon}_i)
$$

Interestingly, the terms associated with the intercept and with the unobservable variable vanish:

$$
Y_{i,t} - \bar{y}_i = \beta_1 (X_{1,i,t} - \bar{x}_{1,i}) + \cdots + \beta_K (X_{K,i,t} - \bar{x}_{K,i}) + (\varepsilon_{i,t} - \bar{\varepsilon}_i)
$$

OLS performed on the data in deviation terms is referred to as "within" estimation because all of the variation between cross sections has been subtracted out: the only variation with which the partial regression coefficients are estimated is the variation within the cross-sectional units over time.

Cross-Section Specific Partial Regression Coefficients

Going beyond the fixed-effects model to models in which some of the partial regression coefficients are cross-section specific can easily be accomplished by creating dummy variable / interaction terms for the variables whose coefficients vary across cross sections. Doing this mechanically is very tedious if many cross-sectional dummies are involved; EViews does this very easily, however.
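The within transformation derived earlier in this part of the lecture can be sketched for the single-regressor case. The Python below is an illustration only: the function names and the three-firm data are made up, and the data are noise-free so the answer is exact. The step that backs out each intercept as the unit mean of Y minus the slope times the unit mean of X is a standard result that the lecture does not state explicitly.

```python
from collections import defaultdict

def group_means(units, values):
    """Mean of `values` within each cross-sectional unit."""
    sums, counts = defaultdict(float), defaultdict(int)
    for u, v in zip(units, values):
        sums[u] += v
        counts[u] += 1
    return {u: sums[u] / counts[u] for u in sums}

def within_slope(units, x, y):
    """OLS slope on deviations from unit means (single regressor)."""
    xbar, ybar = group_means(units, x), group_means(units, y)
    xd = [xi - xbar[u] for u, xi in zip(units, x)]
    yd = [yi - ybar[u] for u, yi in zip(units, y)]
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

def implied_intercepts(units, x, y, beta):
    """Back out alpha_i = ybar_i - beta * xbar_i for each unit."""
    xbar, ybar = group_means(units, x), group_means(units, y)
    return {u: ybar[u] - beta * xbar[u] for u in xbar}

# Hypothetical data: three firms, common slope 2, intercepts 1, 5 and 9.
units = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
x = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0]
true_alpha = {"A": 1.0, "B": 5.0, "C": 9.0}
y = [true_alpha[u] + 2.0 * xi for u, xi in zip(units, x)]

beta = within_slope(units, x, y)
alphas = implied_intercepts(units, x, y, beta)
print(beta, alphas)
```

No dummy variables are constructed at any point: subtracting unit means removes the intercepts, and anything else that is constant over time within a unit, before the slope is estimated.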
Generalizations of the Error Covariance Structure

If the classical assumptions for the stochastic error terms do not hold, then OLS estimators remain unbiased but are not efficient. Worse, the estimated coefficient variance-covariance matrix is biased and inconsistent, so the t-statistics and F-statistics are invalid.

As mentioned above, in panel data models a variety of assumptions about the stochastic error covariance structure are more realistic than the classical assumptions. Under these alternative sets of assumptions the appropriate estimation technique is Feasible Generalized Least Squares (FGLS). In this section we focus on specification of the error covariance matrix.

EViews and Panel Data

EViews POOL objects operate on variables that have special two-part names. The first part is the name of the variable, and the second part of the name is the cross-section identifier that indicates which cross-sectional unit the variable belongs to. I generally begin cross-section identifiers with an underscore mark to make the full variable names more readable.

Example: I want to work with a panel data set on the USA, Canada, and Mexico. The variables that I want to use are GDP, Population, and Trade Flows.

Variable Name First Part:
    GDP    POP    TRA

Variable Name Second Part (Cross-Section Identifier):
    _USA    _CAN    _MEX

Variables:
    GDP_USA    GDP_CAN    GDP_MEX
    POP_USA    POP_CAN    POP_MEX
    TRA_USA    TRA_CAN    TRA_MEX

Thus, there are nine variables in my EViews workfile. It is easy to see that panel data sets can quickly become very large. For example, Thay Randy is working with a panel of 61 Vietnamese provinces for which he has ten-year time series on 8 variables related to agriculture: his workfile has (61 x 8) = 488 variables.

After you have named all of your variables and have got the data into an EViews workfile, you are ready to create the POOL object.
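The two-part naming convention is easy to generate programmatically when the panel is large. The short Python sketch below only illustrates the naming arithmetic; it does not interact with EViews, and the function name `series_names` and the placeholder province and variable labels are made up for the example.

```python
def series_names(base_names, identifiers):
    """Combine each variable name with each cross-section identifier."""
    return [base + ident for base in base_names for ident in identifiers]

# The three-country example from the text.
names = series_names(["GDP", "POP", "TRA"], ["_USA", "_CAN", "_MEX"])
print(names)

# Same arithmetic for a panel of 61 units and 8 variables
# (labels are placeholders, not the actual province or variable names).
provinces = ["_P%02d" % k for k in range(1, 62)]
variables = ["V%d" % k for k in range(1, 9)]
print(len(series_names(variables, provinces)))
```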
You do this by clicking the following sequence:

    Objects / New Object / Pool

A window will open with space for you to list your cross-section identifiers:

[Figure: POOL object window with the list of cross-section identifiers]

In the POOL object you refer to variables by the first part of the name followed by a question mark. Thus, if I type a command using GDP?, EViews uses all three GDP series for the USA, Canada, and Mexico.

Notice the button PoolGenr. PoolGenr is used to create new variables according to rules that are similar to the rules for ordinary Genr. For example, if I want to create GDP per capita for all three countries in my POOL, I would click PoolGenr and then type the equation:

    GDPPC? = GDP? / POP?

For estimation, EViews has one window in which the user specifies the equation and the assumptions regarding the stochastic disturbance term. That window is shown here:

[Figure: pool estimation specification window]

We will describe each element of the specification window.

Dependent Variable

The dependent variable is typed in according to its name and question mark. For example, you might use GDPPC?

Common Coefficients

In this field you list all explanatory variables that you assume have the same partial regression coefficient for every cross-sectional unit. Use the format VAR? You may use AR(p) specifications if you want to model autocorrelation. Keep in mind that your panel data set should have a rather long time-series dimension in order to get reliable estimators of the autocorrelation coefficients.

Cross-Section Specific Coefficients

In this field you type the names of all explanatory variables that you assume have different partial regression coefficient values for different cross-sectional units. Use the format VAR?

Intercept

Here you specify whether your model has:

No intercept ... this case is rare.
Common intercept ... this case is unusual.
Fixed Effects ... the typical specification.
Random Effects ... this specification is not often used because it requires strong assumptions that are difficult to meet in practice.

Weighting

Here, weighting refers to "feasible weighted least squares."

No weighting ... no equation-specific heteroskedasticity.
Cross-section weights ... feasible WLS to correct for equation-specific heteroskedasticity.
SUR ... accounts for contemporaneous cross-equation correlation of errors and equation-specific heteroskedasticity. To use this, the time-series dimension must exceed the cross-section dimension (T > N).
Iterate to Convergence ... causes the program to compute new residuals based on the feasible GLS coefficient estimators, then update the feasible GLS coefficient estimators; compute new residuals based on the updated coefficient estimators, then update the estimators again; and so on until the estimates stop changing.

Options

There is only one option: White's heteroskedasticity-consistent covariance matrix (HCCM) can be produced if you do not choose SUR.

Hypothesis Testing

In panel data models (as in single-equation multiple-regression models) we are interested in testing two types of hypotheses: hypotheses about the variances and covariances of the stochastic error terms and hypotheses about the regression coefficients. A bit of art is involved, but the general-to-simple procedure provides a good guide. Before testing hypotheses about the regression coefficients, it is important to have a good specification of the error covariance matrix so that the test statistics for the regression coefficients are reliable.

Testing Hypotheses About The Error Covariance Matrix

It is helpful to think about restricted and unrestricted error covariance matrices. An error covariance matrix is a square matrix with the error variances of the individual cross-sectional equations along the diagonal and with the contemporaneous error covariances in the off-diagonal elements.
All covariance matrices are symmetric, so if we specify an error covariance matrix for a panel model with five cross-sectional units we have a (5 x 5) matrix with five diagonal elements and ten distinct off-diagonal elements:

$$
\begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} & \sigma_{15} \\
 & \sigma_2^2 & \sigma_{23} & \sigma_{24} & \sigma_{25} \\
 & & \sigma_3^2 & \sigma_{34} & \sigma_{35} \\
 & & & \sigma_4^2 & \sigma_{45} \\
 & & & & \sigma_5^2
\end{bmatrix}
$$

If we click the button for SUR, EViews will estimate all of these parameters. On the other hand, if we believe that the cross-sectional units do not have any contemporaneous cross-equation error covariances, we would click the button for Cross-Section Weighting and EViews would impose zero restrictions on all of the off-diagonal elements of the matrix. Only the diagonal elements would be estimated:

$$
\begin{bmatrix}
\sigma_1^2 & 0 & 0 & 0 & 0 \\
 & \sigma_2^2 & 0 & 0 & 0 \\
 & & \sigma_3^2 & 0 & 0 \\
 & & & \sigma_4^2 & 0 \\
 & & & & \sigma_5^2
\end{bmatrix}
$$

The second model involves imposing ten restrictions, compared to the first model. Finally, if we assumed that our stochastic disturbances were free of cross-sectional heteroskedasticity, we would click the No Weighting button and EViews would estimate only one diagonal element instead of five: four restrictions would be imposed, compared to the second model.

$$
\begin{bmatrix}
\sigma^2 & 0 & 0 & 0 & 0 \\
 & \sigma^2 & 0 & 0 & 0 \\
 & & \sigma^2 & 0 & 0 \\
 & & & \sigma^2 & 0 \\
 & & & & \sigma^2
\end{bmatrix}
$$

Testing these restrictions is easily accomplished by means of a test called a likelihood ratio test. In earlier estimation adventures you may have noticed a statistic called the Log-Likelihood reported among the EViews output. This is the logarithm of the likelihood, which is the joint probability (strictly, for continuous data, the joint density) of the observed sample evaluated at the point estimates of the parameters. Maximum likelihood estimation chooses the parameter values that maximize this log-likelihood. In many applications, maximizing the log-likelihood leads to exactly the same estimator as the Least-Squares method does, but the analytical work required is heavier, so we follow the Least-Squares approach.

Our interest here is in the extent to which imposing restrictions on the error covariance matrix reduces the log-likelihood statistic.
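The restriction counts for the three covariance structures above follow from counting free parameters, and they supply the degrees of freedom for the likelihood ratio tests. The Python sketch below does that bookkeeping for any number of cross-sectional units; the function name and the structure labels are made up for the illustration and are not EViews option names.

```python
def free_parameters(n_units, structure):
    """Number of distinct parameters in the contemporaneous error covariance."""
    if structure == "sur":
        # n variances plus n(n-1)/2 distinct covariances.
        return n_units * (n_units + 1) // 2
    if structure == "cross_section_weights":
        # One variance per unit, all covariances restricted to zero.
        return n_units
    if structure == "no_weighting":
        # A single common variance.
        return 1
    raise ValueError("unknown structure: %r" % structure)

def restrictions(n_units, restricted, unrestricted):
    """Restrictions imposed when moving from `unrestricted` to `restricted`."""
    return free_parameters(n_units, unrestricted) - free_parameters(n_units, restricted)

print(restrictions(5, "cross_section_weights", "sur"))           # 10
print(restrictions(5, "no_weighting", "cross_section_weights"))  # 4
```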
If we form a ratio of the likelihood of a restricted model, $\hat{L}_R$, divided by the likelihood of an unrestricted model, $\hat{L}_U$, we expect the ratio to be no greater than 1 because the maximum likelihood subject to a restriction can be no greater than the maximum likelihood of the unrestricted model.

Define the likelihood ratio:

$$
l = \frac{\hat{L}_R}{\hat{L}_U}, \qquad \text{so that } 0 \le l \le 1
$$

If the restricted model is not significantly different from the unrestricted model we expect the likelihood ratio to be close to 1. The distribution theory of the likelihood ratio is a bit cumbersome. However, it is well known that the distribution of $-2 \log(l)$ is asymptotically Chi-Square, so that in any application with a sufficiently large sample size we can use:

$$
-2 \log(l) = -2 \left( \log(\hat{L}_R) - \log(\hat{L}_U) \right) \overset{\text{approx}}{\sim} \chi^2_q
$$

where q is the number of restrictions. Under the null hypothesis we expect $-2 \log(l)$ to be close to zero; we reject the null hypothesis if the realized value of the likelihood ratio statistic exceeds an appropriate critical value or if the p-value of the test is smaller than the pre-selected significance level.

Maintained Model

While testing hypotheses about restrictions on the error covariance matrix, some specification of the panel data regression model must be maintained. It is recommended that the maintained model be "general" in the sense that we used that term in describing the "general-to-simple" modeling strategy.

Testing Restrictions on the Panel Data Model

After a sound specification for the error covariance structure has been established, tests associated with the general-to-simple modeling strategy may be undertaken. These tests may be the usual Wald tests or t-tests on individual coefficients.
Keep in mind that when the cross-section weights or SUR methods or any AR(p) specification is used, the results are all asymptotically based, so that the t-stats are approximately standard normal and the Wald statistics are approximately Chi-Squared (an F-statistic multiplied by its number of restrictions is approximately Chi-Squared).

Unrestricted Model

The completely unrestricted model is this one:

$$
Y_{it} = \alpha_i + \beta_{i1} X_{1it} + \beta_{i2} X_{2it} + \cdots + \beta_{iK} X_{Kit} + \varepsilon_{it}
$$

In this model, the intercepts and the partial regression coefficients vary across cross-sectional units. If either the no-weighting or the cross-section weights option is chosen for the error covariance structure, then the coefficient estimates will be exactly the same as those from applying OLS to the data for each cross-sectional unit separately. If the SUR option is chosen, then efficiency will be enhanced by exploiting the information contained in the cross-equation error covariances. Remember that (T > N) is required to use this option.

Partially Restricted Model

In many panel data sets the time-series dimension is quite short, so it is impractical to estimate the model for which all parameters vary across cross-sectional units. In this case the most general feasible model is the fixed-effects model: only the intercepts vary across cross-sectional units; the partial regression coefficients are the same for all cross-sectional units. Of course, there are models in which some partial regression coefficients are identical across cross-sectional units while others vary.

Restricted Model

The most restrictive model is the one in which the intercepts and the partial regression coefficients are identical for all cross-sectional units.

Testing model restrictions may be done via the Wald Coefficient test or via the likelihood ratio test. The two methods are asymptotically equivalent, though they may give different results for a particular finite sample.
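The likelihood ratio test mentioned above needs only the two log-likelihood values reported in the estimation output and the number of restrictions. The Python sketch below is one way to carry out the arithmetic by hand; it is an illustration, not an EViews feature. The function names are made up, the two log-likelihood values are invented numbers, and the upper-tail Chi-Square probability is computed with a standard recurrence that is valid for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        tail, nu = math.erfc(math.sqrt(half)), 1   # exact tail for df = 1
    else:
        tail, nu = math.exp(-half), 2              # exact tail for df = 2
    # Each step raises the degrees of freedom by two:
    # Q(nu + 2) = Q(nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        tail += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return tail

def lr_test(loglik_restricted, loglik_unrestricted, n_restrictions):
    """Return the LR statistic -2(log L_R - log L_U) and its asymptotic p-value."""
    stat = -2.0 * (loglik_restricted - loglik_unrestricted)
    return stat, chi2_sf(stat, n_restrictions)

# Invented log-likelihoods: cross-section weights (restricted) against
# SUR (unrestricted) with five units, so q = 10 restrictions.
stat, p_value = lr_test(-112.4, -105.1, 10)
print(stat, p_value)
```

The null hypothesis (the restrictions hold) is rejected when the p-value falls below the chosen significance level. Because the result is asymptotic, it is only approximate in a short panel.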
As you move through the general-to-simple modeling strategy it is sensible to re-check the error covariance structure as you impose restrictions on the model's partial regression coefficients and intercepts. Even though you may fail to reject the hypotheses that represent the restrictions you impose, the hypotheses may not be perfectly true, and that may affect the estimators and tests of the error covariances.

Problem Set 16 guides you through the entire process for a panel consisting of five firms over thirty years.