4. Using panel data

4.1 The basic idea
4.2 Linear regression
4.3 Logit and probit models
4.4 Other models

4.1 The basic idea

Panel data = data that are pooled for the same companies across time.

    Company   Year
    A         1996
    A         1997
    B         1996
    B         1997

In panel data, there are likely to be unobserved company-specific characteristics that are relatively constant over time. I have already explained that it is necessary to control for this time-series dependence in order to obtain unbiased standard errors. In STATA we can do this using the robust cluster() option.

The first advantage of panel data is that we are using a larger sample than in the case where we have only one observation per company. The larger sample gives greater statistical power, so the coefficients can be estimated more precisely. Since the standard errors are lower (even when they are adjusted for time-series dependence), we are more likely to find statistically significant coefficients.

    use "C:\phd\Fees.dta", clear
    gen fye=date(yearend, "mdy")
    format fye %d
    gen year=year(fye)
    sort year
    gen lnaf=ln(auditfees)
    gen lnta=ln(totalassets)
    by year: reg lnaf lnta, robust cluster(companyid)
    reg lnaf lnta, robust cluster(companyid)

The second advantage of panel data is that we can estimate “dynamic” models. For example, suppose we believe that audit fees depend not only on the company’s size but also on its rate of growth.

    sort companyid fye
    gen growth=lnta-lnta[_n-1] if companyid==companyid[_n-1]
    reg lnaf lnta growth, robust cluster(companyid)

We find that audit firms offer lower fees to companies that are growing more quickly. If we had had only one year of data, we would not have been able to estimate this model.
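The growth variable can also be built with STATA's time-series operators. This is a minimal sketch, not the approach used in these slides: it assumes that lnaf, lnta and year have already been created as above, that companyid is numeric, and that each company has at most one observation per year.

```stata
* Sketch only: assumes lnaf, lnta and year already exist (see the commands
* above) and that (companyid, year) uniquely identifies each observation.

* Declare the panel structure: companyid is the panel, year is the time index.
xtset companyid year

* L.lnta is lnta for the same company in the previous year. It is missing
* for a company's first year and wherever a year is skipped.
gen growth_l = lnta - L.lnta

* Same regression as in the slides, with the new growth variable.
reg lnaf lnta growth_l, vce(cluster companyid)
```

The two versions differ only when a company has gaps in its years: the [_n-1] version uses the previous row, whereas L. uses the previous calendar year and returns a missing value if that year is absent.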
The third, and most important, advantage of panel data is that we are able to control for unobservable company-specific effects that are correlated with the observed explanatory variables.

Let’s start with a simple regression model:

    $Y_{it} = \beta_0 + \beta_1 X_{it} + \varepsilon_{it}$

Let’s assume that the error term has an unobserved company-specific component that does not vary over time and an idiosyncratic component that is unique to each company-year observation:

    $\varepsilon_{it} = u_i + e_{it}$

Putting the two together:

    $Y_{it} = \beta_0 + \beta_1 X_{it} + u_i + e_{it}$    (1)

Recall that the standard error of $\hat{\beta}_1$ will be biased if we do not adjust for time-series dependence; this adjustment is easy using the robust cluster() option.

The OLS estimate of the coefficient $\beta_1$ will be unbiased as long as the unobservable company-specific component (ui) is uncorrelated with Xit.

Unfortunately, the assumption that ui is uncorrelated with Xit is unlikely to hold in practice. If ui is correlated with Xit then $\varepsilon_{it}$ is also correlated with Xit. The OLS estimate of $\beta_1$ will be biased if $\varepsilon_{it}$ is correlated with Xit (recall our previous discussion and notes on omitted variable bias).

An example can illustrate this bias. Go to http://ihome.ust.hk/~accl/Phd_teaching.htm

    use "C:\phd\beatles.dta", clear
    list

This dataset is a panel of four individuals observed over three years (1968-70). In each year they were asked how satisfied they are with their lives; this is the lsat variable, which takes larger values for increasing satisfaction.

You want to test how age affects life satisfaction.

    reg lsat age

It appears that they became slightly more satisfied as they got older.
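The omitted-variable bias discussed before this example can be written down explicitly. As a sketch, take the one-regressor model in which Yit depends linearly on Xit with slope $\beta_1$ (the symbol is an assumption here) and the error is ui + eit, and assume that eit is uncorrelated with Xit. The standard result for pooled OLS is then:

```latex
\operatorname{plim}\,\hat{\beta}_1^{\,OLS}
  \;=\; \beta_1 \;+\; \frac{\operatorname{Cov}(X_{it},\,u_i)}{\operatorname{Var}(X_{it})}
```

Pooled OLS therefore overstates $\beta_1$ when ui and Xit are positively correlated and understates it when they are negatively correlated. The bias disappears only when the covariance is zero.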
Suppose you now include dummy variables for each individual.

    tab persnr, gen(dum_)

Recall that you must omit one dummy variable or the intercept in order to avoid perfect collinearity (see the previous notes about multicollinearity).

    reg lsat age dum_1 dum_2 dum_4
    reg lsat age dum_1 dum_2 dum_3 dum_4, nocons

There now appears to be a highly significant negative impact of age on life satisfaction. What’s going on here?

Recall that fitting a simple OLS model (lsat on age) is equivalent to plotting a line of best fit through the data.

    twoway (lfit lsat age) (scatter lsat age)

I am now going to introduce a new command, separate , by()

    separate lsat, by(persnr)

This creates four separate life satisfaction variables, one for each of the four individuals. Now graph the relationship between life satisfaction and age for each of the four people.

    twoway (lfit lsat1 age) (scatter lsat1 age)
    twoway (lfit lsat2 age) (scatter lsat2 age)
    twoway (lfit lsat3 age) (scatter lsat3 age)
    twoway (lfit lsat4 age) (scatter lsat4 age)

It is clear that each of the four individuals became less satisfied as they got older. The simple OLS regression was biased because John and Ringo (who happened to be older) were generally more satisfied than Paul and George (who happened to be younger). The multiple OLS regression controlled for these idiosyncratic differences by including dummy variables for each person.

We can see this by plotting the simple OLS results and the multiple OLS results.

    reg lsat age dum_1 dum_2 dum_3 dum_4, nocons
    predict lsat_hat
    separate lsat_hat, by(persnr)
    twoway (line lsat_hat1-lsat_hat4 age) (lfit lsat age) (scatter lsat1-lsat4 age)

What does all this have to do with panel data being advantageous? Without panel data we would not have been able to control for the idiosyncrasies of the four individuals.
If we had had data for only one year, we would not have known that the age coefficient was biased in the simple regression. We can demonstrate this by running a regression of lsat on age for each year in the sample.

    sort time
    by time: reg lsat age

Without panel data, we would have incorrectly concluded that people get happier as they get older.

In the multiple regression, we include dummy variables (dum_1 dum_2 dum_3 dum_4) which control for the individual-specific effects (ui). Without the person dummies, our estimate of $\beta_1$ would be biased because the dummies are correlated with age.

The person dummies “explain” all the cross-sectional variation in life satisfaction across the four individuals. The only variation that is left is the change in satisfaction within each person as he gets older. Therefore, the model with dummies is sometimes called the “within” estimator or the “fixed-effects” model.

In small datasets like this, it is easy to create dummy variables for each person (or each company). In large datasets, we may have thousands of individuals or companies. The number of variables in STATA is restricted due to memory limits. Also, it is not very convenient to have results for thousands of dummy variables (just imagine how long your log file would be!).

Instead of including dummy variables, we can control for idiosyncratic effects by transforming the Y and X variables. Taking averages of eq. (1) over time gives:

    $\bar{Y}_i = \beta_0 + \beta_1 \bar{X}_i + u_i + \bar{e}_i$    (2)

Subtracting eq. (2) from eq. (1) gives:

    $Y_{it} - \bar{Y}_i = \beta_1 (X_{it} - \bar{X}_i) + (e_{it} - \bar{e}_i)$

The key thing to note here is that the individual-specific effects (ui) have been “differenced out”, so they will not bias our estimate of $\beta_1$.

Another transformation that will do the same trick is to take differences rather than subtract means. Lagging eq. (1) by one period:

    $Y_{i,t-1} = \beta_0 + \beta_1 X_{i,t-1} + u_i + e_{i,t-1}$    (2)

Subtracting eq. (2) from eq. (1) gives:

    $Y_{it} - Y_{i,t-1} = \beta_1 (X_{it} - X_{i,t-1}) + (e_{it} - e_{i,t-1})$

Again the individual-specific effects (ui) have been “differenced out”, so they will not bias our estimate of $\beta_1$.
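When there are too many panel units to create dummies by hand, the dummy-variable regression can still be run without generating the dummies. The following is a minimal sketch, assuming beatles.dta is loaded as above. The areg command and the i. factor-variable prefix are standard STATA features but are not used in these slides, and the i. prefix assumes Stata 11 or later.

```stata
* Sketch only: assumes beatles.dta is in memory with lsat, age and persnr.

* areg absorbs one categorical variable: it fits the person dummies
* without creating them or listing their coefficients.
areg lsat age, absorb(persnr)

* Factor-variable notation (assumes Stata 11 or later) builds the person
* dummies on the fly; one category is omitted as the base.
reg lsat age i.persnr
```

Both commands should reproduce the age coefficient from the regression on dum_1 to dum_4, because they fit the same model.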
Class exercise 4a

Estimate the following models, where Y = life satisfaction and X = age:

    $Y_{it} - Y_{i,t-1} = \beta_1 (X_{it} - X_{i,t-1}) + (e_{it} - e_{i,t-1})$
    $Y_{it} - \bar{Y}_i = \beta_1 (X_{it} - \bar{X}_i) + (e_{it} - \bar{e}_i)$

Compare the age coefficients in these models to the age coefficient in the untransformed model with person dummies (ignore the standard errors of the age coefficients because they are biased).

You should find that the age coefficients are exactly the same. First, we create the variables:

    sort persnr time
    gen chlsat=lsat-lsat[_n-1] if persnr==persnr[_n-1]
    gen chage=age-age[_n-1] if persnr==persnr[_n-1]

(NB: the chage variable is just a constant because each person gets older by one from one year to the next; list persnr time chage)

    by persnr: egen avlsat=mean(lsat)
    by persnr: egen avage=mean(age)
    gen difflsat=lsat-avlsat
    gen diffage=age-avage

Next, we run the three regressions without constant terms (recall that the chage variable is a constant).

    reg chlsat chage, nocons
    reg difflsat diffage, nocons
    reg lsat age dum_1 dum_2 dum_3 dum_4, nocons

4.2 Linear regression using panel data (xtreg, fe i())

Fortunately, STATA has a command that:
• allows us to avoid creating dummy variables for each person
• corrects the standard errors

xt is a prefix that tells STATA we want to estimate a panel data model.
The fe option tells STATA we want to estimate a fixed-effects model; in OLS this is equivalent to including dummy variables to control for person-specific effects.
The i() term tells STATA the variable that identifies each unique person.

    xtreg lsat age, fe i(persnr)

Note that the age coefficient and t-statistic are exactly the same as in the OLS model that includes person dummies:

    reg lsat age dum_1 dum_2 dum_3 dum_4, nocons

There are 12 person-years and 4 persons, and the minimum, average and maximum number of observations per person is 3.

Since we are estimating a within-effects model, it is the within R² that is directly relevant (93.2%).
If we used the same independent variables to estimate a “between-effects” model, we would have an R² of 88.4% (I will explain later what we mean by the “between-effects” model). If we used the same independent variables to estimate a simple OLS model, we would get an R² of 16.5% (reg lsat age).

The F-statistic is a test that the coefficient(s) on the X variable(s) (i.e., age) are all zero.

sigma_u is the standard deviation of the estimates of the fixed effects, ui ($\sigma_u$).
sigma_e is the standard deviation of the estimates of the residuals, eit ($\sigma_e$).

    $\rho = \sigma_u^2 / (\sigma_u^2 + \sigma_e^2) = 4.93^2 / (4.93^2 + 0.47^2) = 0.99$

The correlation between ui and Xit is -0.83. This correlation is high, confirming our prior finding that the fixed effects are correlated with age.

The F-test allows us to reject the hypothesis that there are no fixed effects. If we had not rejected this hypothesis, we could estimate a simple OLS model instead of the fixed-effects model.

4.2 Linear regression (predict)

After running the fixed-effects model, we can obtain various predicted statistics using the predict command:

    predict , xb
    predict , u
    predict , e
    predict , ue

For example:

    xtreg lsat age, fe i(persnr)
    drop lsat_hat
    predict lsat_hat, xb
    predict lsat_u, u
    predict lsat_e, e
    predict lsat_ue, ue

Checking that lsat_ue = lsat_u + lsat_e:

    list lsat_u lsat_e lsat_ue

Checking that the correlation between ui and Xit is -0.83:

    corr lsat_hat lsat_u

4.2 Linear regression

I have explained that there are three main advantages of panel data:
• The larger sample increases power, so the coefficients are estimated more precisely.
• We can estimate models that incorporate dynamic variables (e.g., the effect of growth on audit fees).
• We can control for unobservable fixed effects (e.g., company-specific or person-specific characteristics) by estimating fixed-effects models.

Are there any disadvantages?
Yes, unfortunately we cannot investigate the effect of explanatory variables that are held constant over time.
• From a technical point of view, this is because the time-invariant variable would be perfectly collinear with the person dummies.
• From an economic point of view, this is because fixed-effects models are designed to study what causes the dependent variable to change within a given person. A time-invariant characteristic cannot cause such a change.

For example, suppose that the height of the four persons is constant over the three years. Let’s create a height variable and test the effect of height on life satisfaction.

    gen height=185 if dum_1==1
    replace height=180 if dum_2==1
    replace height=175 if dum_3==1
    replace height=170 if dum_4==1
    list persnr height

Note that the height variable is a constant for each person. We can estimate the effect of height as long as we do not control for unobservable person-specific effects.

    reg lsat age height

If we try to control for person-specific effects by including dummy variables:

    reg lsat age height dum_1 dum_2 dum_3 dum_4, nocons

Note that STATA has to throw away either a dummy variable or the height variable. The reason is that the height variable is collinear with the four dummy variables. The only way we can include dummies for each person is if we do not include the height variable.

    reg lsat age dum_1 dum_2 dum_3 dum_4, nocons

If we try to estimate the effect of height using the xtreg, fe i() command, STATA will inform us that there is a problem of perfect collinearity.

    xtreg lsat age height, fe i(persnr)

Note that the height coefficient can be estimated if there is some variation over time for one or more persons. The fixed-effects estimator can exploit this time variation to estimate the effect of height on life satisfaction. For example, suppose that each person became 1cm taller in 1970.
    replace height=height+1 if time==1970
    xtreg lsat age height, fe i(persnr)

The xtreg, fe i() command estimates the following fixed-effects model:

    $Y_{it} - \bar{Y}_i = \beta_1 (X_{it} - \bar{X}_i) + (e_{it} - \bar{e}_i)$

Recall that we derived this model by taking averages:

    $\bar{Y}_i = \beta_0 + \beta_1 \bar{X}_i + u_i + \bar{e}_i$

The averages model is sometimes called the “between” estimator because the comparison is cross-sectional between persons rather than over time. Like OLS, the between estimator provides unbiased estimates of $\beta_1$ only if the unobservable company-specific component (ui) is uncorrelated with Xit.

If we wanted to estimate the “between-effects” model, the command in STATA is xtreg , be i()

    xtreg lsat age, be i(persnr)

Note that the age coefficient is positive. The reason is that we are not controlling for person-specific effects, which are correlated with age. Therefore, the between-effects estimate of the age coefficient is biased.

Since we are estimating a between-effects model, it is the between R² that is relevant (88.4%). Note that this is also the between-effects R² that was previously reported using the fixed-effects model.

Note that the R² for the between-effects model is high even though the age coefficient is severely biased. Again, this reinforces the fact that a high R² does not imply that the model is well specified.

The between estimator is also less efficient than simple OLS because it throws away all the variation over time in the dependent and independent variables.
In fact the between estimator is equivalent to estimating an OLS model on the averages for just one year. Recall that we have already created averages for the lsat and age variables (avlsat avage).

    reg avlsat avage if time==1968
    reg avlsat avage if time==1969
    reg avlsat avage if time==1970
    xtreg lsat age, be i(persnr)

Since we actually have three years of data, it seems silly (and it is silly) to throw data away.

4.2 Linear regression (xtreg)

Normally, then, we would never be interested in estimating a between-effects model:
• The estimates are biased if the person-specific effects are correlated with the X variables.
• The estimates are inefficient because we are ignoring any time-series variation in the data.

The fixed-effects estimator is attractive because it controls for any correlation between ui and Xit. An unattractive feature is that it is forced to estimate a fixed parameter for each person or company in the data; you can think of these parameters as being the coefficients on the person dummy variables.

An alternative is the “random-effects” model, in which the ui are assumed to be randomly distributed with a mean of zero and a constant variance ($u_i \sim IID(0, \sigma_u^2)$) rather than fixed.

Intuitively, the random-effects model is like having an OLS model where the constant term varies randomly across individuals i. Like simple OLS, the random-effects model assumes that there is zero correlation between ui and Xit. If ui and Xit are correlated, the random-effects estimates are biased.

The random-effects model can be thought of as an intermediate case of OLS and the fixed-effects model:

    $Y_{it} - \theta \bar{Y}_i = \beta_0 (1 - \theta) + \beta_1 (X_{it} - \theta \bar{X}_i) + [(1 - \theta) u_i + (e_{it} - \theta \bar{e}_i)]$

The OLS model corresponds to $\theta = 0$. The fixed-effects model corresponds to $\theta = 1$. The random-effects model ($0 < \theta < 1$) is also known as the “generalized least squares” model (i.e., it is more general than OLS or the fixed-effects model).
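The slides do not say how the weight is chosen. As a sketch, writing the weight as $\theta$ (an assumed symbol) and assuming a balanced panel with $T$ observations per individual, the usual textbook expression is shown below. The worked number uses $T = 3$ together with the sigma_u (1.03) and sigma_e (0.47) values that these slides report for the random-effects model on the Beatles data.

```latex
\theta \;=\; 1 - \sqrt{\frac{\sigma_e^2}{T\,\sigma_u^2 + \sigma_e^2}}

% Limiting cases:
%   \sigma_u^2 = 0          \Rightarrow \theta = 0   (pooled OLS)
%   T\sigma_u^2 \to \infty  \Rightarrow \theta \to 1 (fixed effects)

% Worked example with T = 3, \sigma_u = 1.03, \sigma_e = 0.47:
\theta \;=\; 1 - \sqrt{\frac{0.47^2}{3 \times 1.03^2 + 0.47^2}}
       \;=\; 1 - \sqrt{\frac{0.2209}{3.4036}}
       \;\approx\; 1 - 0.255 \;=\; 0.745
```

A value this close to 1 means the random-effects estimates sit nearer to the fixed-effects estimates than to pooled OLS, which is consistent with the negative age coefficient reported for the random-effects model.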
If we want to estimate a random-effects model, the command is xtreg , re i(). For example:

    xtreg lsat age, re i(persnr)

Note that because we have controlled for (random) unobserved person effects, the age coefficient is estimated with the correct negative sign.

The rest of the output is similar to the fixed-effects model except:
• We use a Wald statistic instead of an F statistic to test the significance of the independent variables. Here we can reject the hypothesis that age is insignificant.
    • The Wald statistic is used because only the asymptotic properties of the random-effects estimator are known.
• The output explicitly tells us that we have imposed the assumption that ui and Xit are uncorrelated.
    • This is the key difference between the random-effects and fixed-effects models.

We can test whether ui and Xit are correlated. If they are correlated, we should use the fixed-effects model rather than OLS or the random-effects model (otherwise the coefficients are biased). If they are not correlated, it is better to use the random-effects model (because it is more efficient).

The test was devised by Hausman:
• If ui and Xit are correlated, the random-effects estimates are biased (inconsistent) while the fixed-effects coefficients are unbiased (consistent).
    • In this case, there will be a large difference between the random-effects and fixed-effects coefficient estimates.
• If ui and Xit are uncorrelated, the random-effects and fixed-effects coefficients are both unbiased (consistent); the fixed-effects coefficients are inefficient while the random-effects coefficients are efficient.
    • In this case, there will not be a large difference between the random-effects and fixed-effects coefficient estimates.

The Hausman test indicates whether the two sets of coefficient estimates are significantly different.

Null hypothesis (H0): ui and Xit are uncorrelated.

The Hausman statistic is distributed as chi2 and is computed as

    $H = (\beta_c - \beta_e)' (V_c - V_e)^{-1} (\beta_c - \beta_e)$

where $\beta_c$ and $V_c$ are the coefficients and covariance matrix from the consistent (fixed-effects) estimator, and $\beta_e$ and $V_e$ are those from the efficient (random-effects) estimator.

If the chi2 statistic is positive and statistically significant, we can reject the null hypothesis. This would mean that the fixed-effects model is preferable because the coefficients are consistent.

If the chi2 statistic is not positive and statistically significant, we cannot reject the null hypothesis. This would mean that the random-effects model is preferable because the coefficients are consistent and efficient.

NB: The $(V_c - V_e)^{-1}$ matrix is guaranteed to be positive only asymptotically. In small samples, this asymptotic result may not hold, in which case the computed chi2 statistic will be negative.

4.2 Linear regression (estimates store, hausman)

The procedure for executing a Hausman test is as follows.

Save the coefficients that are consistent even if the null is not true:
    • xtreg lsat age, fe i(persnr)
    • estimates store fixed_effects

Save the coefficients that are inconsistent if the null is not true:
    • xtreg lsat age, re i(persnr)
    • estimates store random_effects

The command for the Hausman test is:
    • hausman name_consistent name_efficient
    • hausman fixed_effects random_effects

b is the fixed-effects coefficient while B is the random-effects coefficient.

The $(V_c - V_e)^{-1}$ matrix has a negative value on the leading diagonal and, as a result, the square root of the leading diagonal is undefined. This is why the chi2 statistic is negative.

Since the chi2 statistic is not significantly positive, we might decide that we cannot reject the null hypothesis (see p. 57 of the STATA reference manual for the Hausman test).
On the other hand, this result is not very reliable because the asymptotic assumption fails to hold in this small sample.

If we reject the null hypothesis that ui and Xit are uncorrelated, the fixed-effects model is preferable to the OLS and random-effects models.

If we cannot reject the null hypothesis that ui and Xit are uncorrelated, we need to determine whether the ui are distributed randomly across individuals. Recall that the random-effects model is like having an OLS model where the constant term varies randomly across individuals i. Therefore, we need to test whether there is significant variation in ui across individuals.

    $\rho = \sigma_u^2 / (\sigma_u^2 + \sigma_e^2) = 1.03^2 / (1.03^2 + 0.47^2) = 0.83$

$\sigma_u^2$ captures the variation in ui across individuals. If $\sigma_u^2$ is significantly positive, the random-effects model is preferable to the OLS model. The Breusch and Pagan (1980) Lagrange multiplier test is used to investigate whether $\sigma_u^2$ is significantly positive.

We perform the Breusch-Pagan test by typing xttest0 after xtreg, re

Our estimate of $\sigma_u^2$ is 1.067 (note that $\sigma_u$ is estimated to be 1.032, which is the same as sigma_u on the previous slide). We are unable to reject the hypothesis that $\sigma_u^2 = 0$. Therefore, we cannot conclude that the random-effects model is preferable to the OLS model.

NB: Our Hausman and LM tests lack power because the sample consists of only 12 observations. In larger samples, we are more likely to reject the hypothesis that $\sigma_u^2 = 0$ and we are more likely to reject the hypothesis that ui and Xit are uncorrelated.

Class exercise 4b

Estimate models in which the dependent variable is the log of audit fees. Estimate the models using:
• OLS without controlling for ui
• Fixed-effects models
• Random-effects models

How do the coefficient estimates vary across the different models? Which of these models is preferable?
Class exercise 4b (answers)

• The lnta coefficients are largest in the OLS model that does not control for ui.
• The lnta coefficients are smallest in the fixed-effects model.
• The Hausman test rejects the hypothesis that ui and Xit are uncorrelated. Therefore, the fixed-effects model is preferable.
• The LM test rejects the hypothesis that $\sigma_u^2 = 0$ (given that ui and Xit are significantly correlated, we would not actually need to carry out this test).

    use "C:\phd\Fees.dta", clear
    gen fye=date(yearend, "mdy")
    format fye %d
    gen year=year(fye)
    sort year
    gen lnaf=ln(auditfees)
    gen lnta=ln(totalassets)
    reg lnaf lnta
    xtreg lnaf lnta, fe i(companyid)
    estimates store fixed_effects
    xtreg lnaf lnta, re i(companyid)
    estimates store random_effects
    hausman fixed_effects random_effects
    xttest0

4.2 Linear regression

Compared to economics and finance, there are not many accounting studies that exploit panel data in order to control for unobserved company-specific effects (ui). Most studies simply report OLS estimates on the pooled data.

Some studies even fail to adjust the OLS standard errors for time-series dependence. This can be a very serious mistake, especially when the panels are long (e.g., the sample period covers many years).

If you use the xtreg command, STATA automatically recognizes that you are using panel data and it will give you the correct standard errors. Therefore, there is no need to use the robust cluster() option and, in fact, there is no robust cluster() option with xtreg:
    • xtreg lnaf lnta, fe i(companyid) robust cluster(companyid)

Ke and Petroni (2004) is an example of an accounting study that estimates fixed-effects regressions to control for unobservable company-specific effects.
• Their dependent variable is the change in the ownership of institutional investors in companies.
• They test whether there are significant changes in institutional ownership prior to a break in a string of consecutive quarterly earnings increases.
Bhattacharya et al. (2003) is an example of an accounting study that estimates fixed-effects regressions to control for unobservable country-specific effects.
• Their dependent variable is the cost of equity for 34 countries over 1984-1998 (they are using a cross-country panel).
• They test how earnings opacity at the country level affects the cost of equity.
• They acknowledge that there is a potentially serious problem of omitted variable bias.
• Bhattacharya et al. (2003) argue that they largely avoid this problem because they control for fixed country-specific effects.

It is important to recognize that the fixed-effects estimator relies only on the time-series variation in Y and X within a given company. If the extent of time-series variation is small, either $(Y_{it} - \bar{Y}_i)$ or $(X_{it} - \bar{X}_i)$ will be close to zero. In this case, the fixed-effects estimator is not reliable because there is insufficient variation in either the dependent or the treatment variable.

As in any model, we require a reasonable amount of variation in the Y and X variables. If either variable displays little variation, the results may be unreliable. We saw an example of this previously: except for one observation, the independent variable is a constant. As a result the fitted regression line is unreliable.

This point was made by Zhou (JFE, 2001), who criticized the use of fixed-effects models when the treatment variable is management ownership. Because management ownership usually remains constant from one year to the next, the term $(X_{it} - \bar{X}_i)$ is typically equal to zero (or very small).

4.3 Logit and probit models

When the dependent variable is continuous, it is easy to transform the model such that unobserved firm-specific effects are “washed” away. When the dependent variable is binary, the required transformation is different and more complicated.
• If you are interested in the derivation, see the Baltagi textbook (pages 178-180).
• In the fixed-effects logit, the fixed effects (ui) are not actually estimated; instead they are “conditioned” out of the model.
• The fixed-effects logit model is not equivalent to logit + dummy variables.

4.3 Logit models (xtlogit)

We can estimate a fixed-effects logit model using the command xtlogit , fe i()

NB: Your version of STATA 9.0 may have a problem with estimating the fixed-effects logit model. You can instead use version 8.0 or 10.0.

    version 8.0

Before we estimate the fixed-effects logit model, we need to understand a complication that arises because the dependent variable is binary. Suppose we have five annual observations on two companies.

    Id   Year   Y
    1    2000   0
    1    2001   0
    1    2002   0
    1    2003   0
    1    2004   0
    2    2000   1
    2    2001   1
    2    2002   0
    2    2003   0
    2    2004   1

For company 1, there is no variation in the dependent variable over time (Y = 0 in every year). A fixed effect for this company will perfectly predict the outcome (Y = 0). Consequently, the first company will be dropped from the estimation sample. In fact, the fixed-effects logit model will drop all companies that exhibit no variation in the dependent variable over time.

    use "C:\phd\xtlogit.dta", clear
    list

The sample consists of three companies. Company 1 exhibits no variation in the dependent variable over time while companies 2 & 3 do exhibit time-series variation.

There is no problem estimating this model on the full sample if we do not control for fixed effects.

    logit y x

Running a fixed-effects logit model results in the first company being thrown away.

    xtlogit y x, fe i(id)

In many empirical settings, we are likely to find a large number of companies that exhibit no variation in the binary dependent variable during the sample period.

Example #1: Yit = 1 if company i is engaged in fraud in year t; Yit = 0 otherwise. The vast majority of companies do not engage in fraud at any point in time (Yit = 0 for all t).
All such non-fraud companies would be dropped from the estimation sample. The estimation sample would include only the companies that commit fraud at some point during the sample period.

Example #2: Yit = 1 if company i hires a Big 6 auditor in year t; Yit = 0 if company i hires a non-Big 6 auditor in year t. The vast majority of companies keep the same auditor in the following year, and switches between Big 6 and non-Big 6 auditors are especially rare. All companies that do not switch between Big 6 and non-Big 6 auditors would be dropped from the sample. The estimation sample would include only the companies that switch between Big 6 and non-Big 6 auditors at some point during the sample period.

Alternatively, we can estimate a random-effects logit model using the command xtlogit , re i()

The company effects (ui) are now assumed to be random rather than fixed. Consequently, the random-effects model does not throw away companies that lack time-series variation in the dependent variable. For example:

    xtlogit y x, re i(id)

The estimation sample is now 15 rather than 10 (i.e., all 3 companies are included in the sample).

    lnsig2u = $\ln(\sigma_u^2)$ = -1.625
    sigma_u = $\sigma_u$ = 0.444 = $[\exp(-1.625)]^{0.5}$
    rho = $\sigma_u^2 / (\sigma_u^2 + \sigma_e^2)$ = 0.056

If rho = $\sigma_u^2 / (\sigma_u^2 + \sigma_e^2)$ = 0, there would be no variation in the ui across companies (i.e., each company would have the “same” ui). In this case, there would be no need to control for company-specific effects, i.e., we could rely on logit instead of estimating xtlogit , re i()

The likelihood-ratio statistic tests the null hypothesis that rho equals zero. If we reject this hypothesis, the random-effects model is preferable to ordinary logit. In our data, we are unable to reject, so we could use an ordinary logit model instead of the random-effects logit model. This would be a good idea because the ordinary logit is more efficient (fewer parameters need to be estimated).
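The reported sigma_u and rho can be reproduced by hand from lnsig2u. This is a sketch under one stated assumption that the slides do not spell out: in the random-effects logit the idiosyncratic error is logistic, so its variance is fixed at $\pi^2/3$ rather than estimated.

```latex
\sigma_u^2 = \exp(-1.625) \approx 0.197
\qquad
\sigma_u = \sqrt{0.197} \approx 0.444

% Assumption: logistic idiosyncratic error, so \sigma_e^2 = \pi^2/3 \approx 3.290
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}
     = \frac{0.197}{0.197 + 3.290}
     \approx 0.056
```

Under that assumption the arithmetic matches the values listed above, which is why rho is small here even though sigma_u is not zero.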
Recall that we previously used a Hausman test to determine whether the xtreg, fe i() or xtreg, re i() model is preferable. Fortunately, we can do the same test in STATA for deciding whether the fixed-effects or random-effects logit models are preferable.

The only difference is that we have to use the equations() option with the Hausman test [actually, this point is not explained in the STATA manual but a question and answer were posted about this topic on the statalist (www.stata.com/statalist/archive/2004-01/msg00669.html)].
• The equations() option specifies, by number, the pairs of equations that are to be compared.
• Usually, we are estimating just one equation in each model, in which case the option is equations(1:1)

For example:

    xtlogit y x, fe i(id)
    estimates store fixed_effects
    xtlogit y x, re i(id)
    estimates store random_effects
    hausman fixed_effects random_effects

STATA is telling us there is an error (we need to specify the equation numbers).

    hausman fixed_effects random_effects, eq(1:1)

The chi2 statistic is negative (again there is a small-sample problem which causes the asymptotic assumption to fail).

Class exercise 4c

Open the Fees.dta data set. Estimate models in which big6 is the dummy dependent variable using:
• ordinary logit
• fixed-effects logit
• random-effects logit

Why is the estimation sample much smaller in the fixed-effects model? Which of the three models is most preferable?

    use "C:\phd\Fees.dta", clear
    gen lnta=ln(totalassets)
    logit big6 lnta, robust cluster(companyid)
    xtlogit big6 lnta, fe i(companyid)
    estimates store fixed_effects
    xtlogit big6 lnta, re i(companyid)
    estimates store random_effects
    hausman fixed_effects random_effects, eq(1:1)

The estimation sample is much smaller in the fixed-effects model because the majority of companies do not switch between Big 6 and non-Big 6 auditors during the sample period.
The likelihood ratio test of rho = 0 indicates that the random-effects model is preferable to the ordinary logit. The Hausman test indicates that the fixed-effects model is preferable to the random-effects logit.

4.3 Probit models (xtprobit)

Recall that there are two commands available when the dependent variable is binary (“ordinary” logit and probit). There is no command for a fixed-effects probit model because no-one has yet found a transformation that will allow the fixed effects to be “washed” out. If you type xtprobit big6 lnta, fe i(companyid) you will get an error message.

A random-effects probit model is available, however:

    xtprobit big6 lnta, re i(companyid)

Just as with the random-effects logit model, there is a likelihood ratio test that helps us to choose between the random-effects probit and the ordinary probit models. In our data, we can reject the hypothesis that rho = 0, so we may decide not to use an ordinary probit model.

4.4 Other models

    Dependent variable (Y)     Examples                                     Estimation method(s)         STATA command
    -------------------------  -------------------------------------------  ---------------------------  -------------
    Discrete and unordered     Method of transport (train, bus, car,        Multinomial logit            mlogit
    (Y = 0, 1, 2, ...)         bicycle); type of company (private,          Multinomial probit           mprobit
                               public unquoted, quoted)
    Discrete and ordered       Type of peer review report (adverse,         Ordered probit               oprobit
    (Y = 0, 1, 2, ...)         modified, unmodified)                        Ordered logit                ologit
    Discrete count data        Number of weaknesses disclosed in peer       Poisson                      poisson
    (Y = 0, 1, 2, ...)         review report                                Negative binomial            nbreg
    Continuous and censored    Non-audit fees; football attendance          Tobit                        tobit
    (kL ≤ Y < kH)                                                           Interval regression          intreg
    Duration data (often       Duration of unemployment; CEO tenure;        Cox proportional hazards     stcox
    censored, kL ≤ Y < kH)     company survival

If you look at the STATA manual for panel data (“Cross-Sectional Time-Series”), you will find:
• Fixed-effects and random-effects models are available for count data (xtpoisson and xtnbreg).
    • We can test which model is preferable using a Hausman test.
• Random-effects models are available for censored data (xttobit and xtintreg).
    • Fixed-effects models are not available.
    • Therefore there is no need for a Hausman test.

Duration data is, by its very nature, in the form of panel data.

What about the multinomial and ordered models that we previously looked at (mlogit, mprobit, ologit, oprobit)? It appears that STATA does not have random- or fixed-effects versions of these models.

You can use the search command in STATA to find out if a command is available. The search command looks through official STATA commands, frequently asked questions (on the STATA website), the STATA Journal (SJ) and the STATA Technical Bulletins (STBs). The SJ and STBs are where you can sometimes find commands that will appear in future versions of STATA.

    search multinomial logit

We can find the multinomial logit command but there does not appear to be any command specifically for the multinomial model with panel data.

Even if the command you want is not available from STATA, you may be able to find a STATA user who has already written the program that you need. Statalist (www.stata.com/statalist/) is an email listserver where over 2,500 Stata users discuss all things statistical and Stata.
Click on “Archives provided by Statacorp” and search the archives.

4.4 Other models

For example, suppose you want to estimate a random-effects ordered probit.
Typing this into the statalist archive, I found that someone has written a program with this command (reoprob):
www.stata.com/statalist/archive/2006-02/msg00509.html
The message tells us we can download it to STATA by typing
    findit reoprob

4.4 Other models

If you cannot find someone who has already written the program, and if it is a command that you really do need, you will either have to write the program yourself or wait for someone else to do it.
In fact, it is not too difficult to learn how to write new programs in STATA
• you would need to take a STATA programming course
• www.stata.com/netcourse/ (net courses 151 & 152)

Summary

There are three advantages to using panel data:
We can control for unobservable fixed effects that might otherwise bias the coefficient estimates.
• these unobservable fixed effects can be company-specific, country-specific, or person-specific.
The larger sample means that the coefficients are estimated more precisely.
We can include lagged or change variables in our models.

Summary

The xtreg command is used to estimate fixed-effects and random-effects models (where the dependent variable is continuous).
We can test whether the fixed-effects or random-effects model is preferable using the hausman test.
If there is a significant correlation between ui and Xit, the fixed-effects model is preferable to the OLS and random-effects models.
If there is no significant correlation between ui and Xit, we can test whether the OLS or random-effects model is preferable using an LM test.

Summary

When the dependent variable is binary we can estimate fixed-effects or random-effects logit models.
Again, we can test which model is preferable using a Hausman test.
Only the random-effects model is available in the case of the probit model.
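
The model-selection steps in the summary can be sketched as a short do-file. This is only an illustration, not part of the original slides: it assumes the Fees.dta variables used earlier (auditfees, totalassets, big6, companyid) and a STATA version in which xtset is available to declare the panel variable in place of the i() option. The stored-estimate names (fe_reg, re_reg, fe_logit, re_logit) are arbitrary.

```stata
* Sketch only: variable names follow the earlier Fees.dta examples.
use "C:\phd\Fees.dta", clear
gen lnaf = ln(auditfees)
gen lnta = ln(totalassets)
xtset companyid                      // declare the panel identifier once

* Continuous dependent variable: fixed effects versus random effects
xtreg lnaf lnta, fe
estimates store fe_reg
xtreg lnaf lnta, re
estimates store re_reg
hausman fe_reg re_reg                // rejection favours fixed effects

* Only relevant if the Hausman test does not reject:
* random effects versus pooled OLS (Breusch-Pagan LM test)
quietly xtreg lnaf lnta, re
xttest0

* Binary dependent variable: fixed-effects versus random-effects logit
xtlogit big6 lnta, fe
estimates store fe_logit
xtlogit big6 lnta, re
estimates store re_logit
hausman fe_logit re_logit, eq(1:1)   // equation numbers as in section 4.3
```

As noted earlier, the Hausman statistic can come out negative in small samples, so the output should be checked before relying on it.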
