# Managers' perceptions of product


```					        4. Using panel data
 4.1 The basic idea
 4.2 Linear regression
 4.3 Logit and probit models
 4.4 Other models

4.1 The basic idea
 Panel data = data that are pooled for the same companies across time.

      Company   Year
      A         1996
      A         1997
      B         1996
      B         1997

 In panel data, there are likely to be unobserved company-specific
characteristics that are relatively constant over time.
 I have already explained that it is necessary to control for this
time-series dependence in order to obtain unbiased standard errors.
    In STATA we can do this using the robust cluster() option

4.1 The basic idea
   The first advantage of panel data is that we are using a larger
sample compared to the case where we have only one observation
per company.
   The larger sample permits greater estimation power, so the
coefficients can be estimated more precisely.
   Since the standard errors are lower (even when they are adjusted
for time-series dependence), we are more likely to find statistically
significant coefficients.
   use "C:\phd\Fees.dta", clear
   gen fye=date(yearend, "mdy")
   format fye %d
   gen year=year(fye)
   sort year
   gen lnaf=ln(auditfees)
   gen lnta=ln(totalassets)
   by year: reg lnaf lnta, robust cluster(companyid)
   reg lnaf lnta, robust cluster(companyid)
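   * Illustrative addition (not in the original slides): in STATA 10 and
   * later, the same company-clustered standard errors are usually
   * requested with the vce() option. This assumes the variables
   * created above (lnaf, lnta, companyid) are in memory.
   reg lnaf lnta, vce(cluster companyid)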

4.1 The basic idea
   The second advantage of panel data is that we can
estimate “dynamic” models.
   For example, suppose we believe that audit fees depend
not only on the company’s size but also its rate of growth
   sort companyid fye
   gen growth= lnta- lnta[_n-1] if companyid== companyid[_n-1]
   reg lnaf lnta growth, robust cluster( companyid)
   We find that audit firms offer lower fees to companies
that are growing more quickly
   If we had had only one year of data, we would not have
been able to estimate this model.
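   * Illustrative addition (not in the original slides): the growth
   * variable can also be built with time-series operators once the
   * panel structure is declared. This sketch assumes each company has
   * at most one observation per year; growth_ts is a hypothetical name.
   * xtset needs STATA 10 or later (older versions: tsset companyid year).
   xtset companyid year
   gen growth_ts = D.lnta
   reg lnaf lnta growth_ts, robust cluster(companyid)
   * Unlike the [_n-1] approach, D.lnta is missing when the previous
   * calendar year is absent for that company.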

4.1 The basic idea
   The third – and most important – advantage of panel
data is that we are able to control for unobservable
company-specific effects that are correlated with the
observed explanatory variables

   Let’s assume that the error term (εit) has an unobserved
company-specific component (ui) that does not vary over
time and an idiosyncratic component (eit) that is unique to
each company-year observation:
      \varepsilon_{it} = u_i + e_{it}

4.1 The basic idea
   Putting the two together:
      Y_{it} = \alpha + \beta X_{it} + u_i + e_{it}
   Recall that the standard error of β will be biased
if we do not adjust for time-series dependence
   this adjustment is easy using the robust cluster()
option
   The OLS estimate of the β coefficient will be
unbiased as long as the unobservable company-
specific component (ui) is uncorrelated with Xit

4.1 The basic idea
   Unfortunately, the assumption that ui is uncorrelated with
Xit is unlikely to hold in practice.
   If ui is correlated with Xit then the error term (εit = ui + eit)
is also correlated with Xit

   The OLS estimate of β will be biased if εit is correlated
with Xit (recall our previous discussion and notes on
omitted variable bias)

4.1 The basic idea
   An example can illustrate this bias.
   Go to http://ihome.ust.hk/~accl/Phd_teaching.htm
   use "C:\phd\beatles.dta", clear
   list
   This dataset is a panel of four individuals observed over
three years (1968-70)
   In each year they were asked how satisfied they are with
their lives
   this is the lsat variable which takes larger values for increasing
satisfaction
   You want to test how age affects life satisfaction
   reg lsat age
   It appears that they became slightly more satisfied as they got
older.

4.1 The basic idea
   Suppose you now include dummy variables for
each individual
   tab persnr, gen(dum_)
   Recall that you must omit one dummy variable
or the intercept in order to avoid perfect
collinearity (see the previous notes about
multicollinearity)
   reg lsat age dum_1 dum_2 dum_4
   reg lsat age dum_1 dum_2 dum_3 dum_4, nocons
 There now appears to be a highly significant
negative impact of age on life satisfaction
 What’s going on here?

4.1 The basic idea
   Recall that fitting a simple OLS model (lsat on age) is
equivalent to plotting a line of best fit through the data
 twoway (lfit lsat age) (scatter lsat age)

4.1 The basic idea
   I am now going to introduce a new command,
separate , by()
   separate lsat, by(persnr)
   This creates four separate life satisfaction variables
(lsat1-lsat4), one for each of the four individuals
   Now graph the relationship between life
satisfaction and age for each of the four people
   twoway (lfit lsat1 age) (scatter lsat1 age)
   twoway (lfit lsat2 age) (scatter lsat2 age)
   twoway (lfit lsat3 age) (scatter lsat3 age)
   twoway (lfit lsat4 age) (scatter lsat4 age)

 It is clear that each of the four individuals
became less satisfied as they got older.
 The simple OLS regression was biased because
John and Ringo (who happened to be older)
were generally more satisfied than Paul and
George (who happened to be younger)
 The multiple OLS regression controlled for these
idiosyncratic differences by including dummy
variables for each person
 We can see this by plotting the simple OLS
results and the multiple OLS results
   reg lsat age dum_1 dum_2 dum_3 dum_4, nocons
   predict lsat_hat
   separate lsat_hat, by(persnr)
   twoway (line lsat_hat1-lsat_hat4 age) (lfit lsat age) (scatter lsat1-lsat4 age)
4.1 The basic idea
 What does all this have to do with panel data?
 Without panel data we would not have been able to
control for the idiosyncrasies of the four individuals.
 If we had had data for only one year, we would not have
known that the age coefficient was biased in the simple
regression.
 We can demonstrate this by running a regression of lsat
on age for each year in the sample
   sort time
   by time: reg lsat age
   Without panel data, we would have incorrectly concluded
that people get happier as they get older

4.1 The basic idea
   In the multiple regression, we include dummy variables
(dum_1 dum_2 dum_3 dum_4) which control for the
individual-specific effects (ui)

   Without including the person dummies, our estimate of β
would be biased because the dummies are correlated
with age.
   The person dummies “explain” all the cross-sectional
variation in life satisfaction across the four individuals.
   The only variation that is left is the change in satisfaction
within each person as he gets older.
   Therefore, the model with dummies is sometimes called
the “within” estimator or the “fixed-effects” model.
4.1 The basic idea
 In small datasets like this, it is easy to create
dummy variables for each person (or each
company).
 In large datasets, we may have thousands of
individuals or companies.
 The number of variables in STATA is restricted
due to memory limits.
 Also, it is very inconvenient to have results
for thousands of dummy variables (just imagine
how long your log file would be!).

4.1 The basic idea
   Instead of including dummy variables, we can control for
idiosyncratic effects by transforming the Y and X variables.
      Y_{it} = \alpha + \beta X_{it} + u_i + e_{it}                         (1)
   Taking averages of eq. (1) over time gives:
      \bar{Y}_i = \alpha + \beta \bar{X}_i + u_i + \bar{e}_i                 (2)
   Subtracting eq. (2) from eq. (1) gives:
      Y_{it} - \bar{Y}_i = \beta (X_{it} - \bar{X}_i) + (e_{it} - \bar{e}_i)
   The key thing to note here is that the individual-specific
effects (ui) have been “differenced out” so they will not bias
our estimate of β.
4.1 The basic idea
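   Another transformation that will do the same trick is to take
differences rather than subtract means
      Y_{it} = \alpha + \beta X_{it} + u_i + e_{it}                         (1)
   Lagging by one period
      Y_{i,t-1} = \alpha + \beta X_{i,t-1} + u_i + e_{i,t-1}                (2)
   Subtracting eq. (2) from eq. (1) gives:
      Y_{it} - Y_{i,t-1} = \beta (X_{it} - X_{i,t-1}) + (e_{it} - e_{i,t-1})
   Again the individual-specific effects (ui) have been “differenced out”
so they will not bias our estimate of β.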

Class exercise 4a
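   Estimate the following models, where Y = life
satisfaction and X = age.
      Y_{it} - Y_{i,t-1} = \beta (X_{it} - X_{i,t-1}) + (e_{it} - e_{i,t-1})
      Y_{it} - \bar{Y}_i = \beta (X_{it} - \bar{X}_i) + (e_{it} - \bar{e}_i)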

   Compare the age coefficients in these models to the
age coefficient in the untransformed model with
person dummies (ignore the standard errors of the
age coefficients because they are biased)

Class exercise 4a
   You should find that the age coefficients are exactly the same.
   First, we create the variables:
   sort persnr time
   gen chlsat=lsat-lsat[_n-1] if persnr==persnr[_n-1]
   gen chage=age-age[_n-1] if persnr==persnr[_n-1]
   (NB: the chage variable is just a constant because each person gets
older by one from one year to the next; list persnr time chage)
   by persnr: egen avlsat=mean(lsat)
   by persnr: egen avage=mean(age)
   gen difflsat=lsat-avlsat
   gen diffage=age-avage
   Next, we run the three regressions without constant terms (recall
that the chage variable is a constant)
   reg chlsat chage, nocons
   reg difflsat diffage, nocons
   reg lsat age dum_1 dum_2 dum_3 dum_4, nocons

4.2 Linear regression using panel data (xtreg, fe i())
   Fortunately, STATA has a command that:
   allows us to avoid creating dummy variables for each
person
   corrects the standard errors
 xt is a prefix that tells STATA we want to
estimate a panel data model
 The fe option tells STATA we want to estimate a
fixed effects model
   in OLS this is equivalent to including dummy variables
to control for person-specific effects
   The i() term tells STATA the variable that
identifies each unique person
   xtreg lsat age , fe i(persnr)
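   * Illustrative addition (not in the original slides): areg absorbs the
   * person dummies without creating them, so its age coefficient should
   * match the one from xtreg, fe and from the regression with dummies.
   areg lsat age, absorb(persnr)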
   Note that the age coefficient and t-statistic are exactly
the same as in the OLS model that includes person
dummies
   reg lsat age dum_1 dum_2 dum_3 dum_4, nocons
   There are 12 person-years, 4 persons, and the
minimum, average and maximum number of
observations per person is 3.

   Since we are estimating a within-effects model, it
is the within R2 that is directly relevant (93.2%).
   If we used the same independent variables to
estimate a “between-effects” model, we would have
an R2 of 88.4% (I will explain later what we mean by
the “between-effects” model).
   If we used the same independent variables to
estimate a simple OLS model, we would get an R2 of
16.5%. (reg lsat age)
   The F-statistic is a test that the coefficient(s) on
the X variable(s) (i.e., age) are all zero.
 sigma_u is the standard deviation of the
estimates of the fixed effects, ui (σu)
 sigma_e is the standard deviation of the
estimates of the residuals, eit (σe)
 rho = σu² / (σu² + σe²)
= 4.93² / (4.93² + 0.47²) = 0.99

 The correlation between ui and Xit is -0.83.
 This correlation appears to be high,
confirming our prior finding that the fixed
effects are correlated with age.
 The F-test allows us to reject the
hypothesis that there are no fixed effects.
   If we had not rejected this hypothesis, we
could estimate a simple OLS instead of the
fixed-effects model.

4.2 Linear regression (predict)
 After running the fixed-effects model, we
can obtain various predicted statistics
using the predict command
(newvar stands for the name of the variable to be created)

   predict newvar, xb
   predict newvar, u
   predict newvar, e
   predict newvar, ue

4.2 Linear regression (predict)
   For example:
   xtreg lsat age , fe i(persnr)
   drop lsat_hat
   predict lsat_hat, xb
   predict lsat_u, u
   predict lsat_e, e
   predict lsat_ue, ue
   Checking that lsat_ue = lsat_u + lsat_e
   list lsat_u lsat_e lsat_ue
   Checking that the correlation between ui and Xit is -0.83
   corr lsat_hat lsat_u
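   * Illustrative addition (not in the original slides): the first check
   * can also be done numerically. ue_check is a hypothetical name; it
   * should be zero for every observation, apart from rounding error.
   gen ue_check = lsat_ue - (lsat_u + lsat_e)
   summarize ue_check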

4.2 Linear regression
 I have explained that there are three main advantages of panel data:
   The larger sample increases power, so the
coefficients are estimated more precisely
   We can estimate models that incorporate
dynamic variables (e.g., the effect of growth
on audit fees)
   We can control for unobservable fixed effects
(e.g., company-specific or person-specific
characteristics) by estimating fixed-effects
models.

4.2 Linear regression
 Unfortunately, fixed-effects models also have a drawback: we
cannot investigate the effect of explanatory variables that are
held constant over time.
   From a technical point of view, this is because the
time-invariant variable would be perfectly collinear
with the person dummies.
   From an economic point of view, this is because
fixed-effect models are designed to study what
causes the dependent variable to change within a
given person. A time-invariant characteristic cannot
cause such a change.

4.2 Linear regression
   For example, suppose that the height of the four persons
is constant over the three years.
   Let’s create a height variable and test the effect of height
on life satisfaction
   gen height=185 if dum_1==1
   replace height=180 if dum_2==1
   replace height=175 if dum_3==1
   replace height=170 if dum_4==1
   list persnr height
   Note that the height variable is a constant for each
person.
   We can estimate the effect of height as long as we do
not control for unobservable person-specific effects
   reg lsat age height

4.2 Linear regression
   If we try to control for person-specific effects by including
dummy variables:
   reg lsat age height dum_1 dum_2 dum_3 dum_4, nocons
   Note that STATA has to throw away either a dummy
variable or the height variable.
   The reason is that the height variable is collinear with the
four dummy variables.
   The only way we can include dummies for each person
is if we do not include the height variable.
   reg lsat age dum_1 dum_2 dum_3 dum_4, nocons
   If we try to estimate the effect of height using the
xtreg, fe i() command, STATA will inform us that there is
a problem of perfect collinearity
   xtreg lsat age height, fe i( persnr)

4.2 Linear regression
 Note that the height coefficient can be estimated
if there is some variation over time for one or
more persons.
 The fixed-effects estimator can exploit this time
variation to estimate the effect of height on life
satisfaction.
 For example, suppose that each person became
1cm taller in 1970.
   replace height= height+1 if time==1970
   xtreg lsat age height, fe i( persnr)

   The xtreg, fe i() command estimates the following fixed-effects
model:
      Y_{it} - \bar{Y}_i = \beta (X_{it} - \bar{X}_i) + (e_{it} - \bar{e}_i)
   Recall that we derived this model by taking averages:
      \bar{Y}_i = \alpha + \beta \bar{X}_i + u_i + \bar{e}_i
 The averages model is sometimes called the “between”
estimator because the comparison is cross-sectional between
persons rather than over time.
 Like OLS, the between estimator provides unbiased estimates
of β only if the unobservable company-specific component (ui)
is uncorrelated with Xit
 If we wanted to estimate the “between effects” model, the
command in STATA is xtreg , be i()
 xtreg lsat age, be i( persnr)

   Note that the age coefficient is positive
   the reason is that we are not controlling for person-specific
effects, which are correlated with age.
   therefore, the between-effects estimate of the age coefficient is
biased.
   Since we are estimating a between-effects model, it is
the between R2 that is relevant (88.4%).
   Note that this is also the between-effects R2 that was previously
reported using the fixed-effects model.
   Note that the R2 for the between-effects model is high
even though the age coefficient is severely biased. Again,
this reinforces the fact that a high R2 does not imply that
the model is well specified.

   The between estimator is also less efficient than simple
OLS because it throws away all the variation over time in
the dependent and independent variables.
   In fact the between estimator is equivalent to estimating
an OLS model on the averages for just one year
   Recall that we have already created averages for the lsat
and age variables (avlsat avage)
   reg avlsat avage if time==1968
   reg avlsat avage if time==1969
   reg avlsat avage if time==1970
   xtreg lsat age, be i( persnr)
   Since we actually have three years of data, it seems silly
(and it is silly) to throw data away
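   * Illustrative addition (not in the original slides): the same
   * equivalence can be seen by collapsing the data to one row of
   * averages per person and running OLS on the collapsed data.
   * preserve and restore leave the original panel untouched.
   preserve
   collapse (mean) lsat age, by(persnr)
   reg lsat age
   restore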

4.2 Linear regression (xtreg)
   Normally, then, we would never be interested in
estimating a between-effects model:
   The estimates are biased if the person-specific effects are
correlated with the X variables
   The estimates are inefficient because we are ignoring any
time-series variation in the data
   The fixed effects estimator is attractive because it
controls for any correlation between ui and Xit
   An unattractive feature is that it is forced to estimate a
fixed parameter for each person or company in the data
   you can think of these parameters as being the coefficients
on the person dummy variables

4.2 Linear regression (xtreg)
 An alternative is the “random effects” model in
which the ui are assumed to be randomly
distributed with a mean of zero and a constant
variance (ui ~ IID(0, σu²)) rather than fixed.
 Intuitively, the random effects model is like
having an OLS model where the constant term
varies randomly across individuals i.
 Like simple OLS, the random effects model
assumes that there is zero correlation between
ui and Xit
 If ui and Xit are correlated, the random-effects
estimates are biased.

4.2 Linear regression (xtreg)
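   The random-effects model can be thought of as an
intermediate case of OLS and the fixed-effects model:
      Y_{it} - \theta \bar{Y}_i = (1 - \theta) \alpha + \beta (X_{it} - \theta \bar{X}_i) + [(1 - \theta) u_i + (e_{it} - \theta \bar{e}_i)]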

4.2 Linear regression (xtreg)

   The OLS model corresponds to θ = 0 (no part of the
person-specific averages is subtracted from Y and X).

   The fixed-effects model corresponds to θ = 1 (the full
averages are subtracted).

   The random-effects model (0 ≤ θ ≤ 1) is also known as
the “generalized least squares” model (i.e., it is more
general than OLS or the fixed-effects model).

4.2 Linear regression (xtreg)
 If we want to estimate a random effects model,
the command is xtreg , re i()
 For example:
   xtreg lsat age, re i( persnr)

   Note that because we have controlled for (random)
unobserved person effects, the age coefficient is
estimated with the correct negative sign.

   The rest of the output is similar to the fixed-
effects model except:
   We use a Wald statistic instead of an F statistic to test
the significance of the independent variables. Here
we can reject the hypothesis that age is insignificant.
• The Wald statistic is used because only the asymptotic
properties of the random-effects estimator are known.
   The output explicitly tells us that we have imposed the
assumption that ui and Xit are uncorrelated.
• This is the key difference between the random-effects and
fixed-effects models.

   We can test whether ui and Xit are correlated.
   If they are correlated, we should use the fixed-effects model
rather than OLS or the random-effects model (otherwise the
coefficients are biased).
   If they are not correlated, it is better to use the random-effects
model (because it is more efficient).
   The test was devised by Hausman
   if ui and Xit are correlated, the random-effects estimates are
biased (inconsistent) while the fixed-effects coefficients are
unbiased (consistent)
• In this case, there will be a large difference between the random-
effects and fixed-effects coefficient estimates
   if ui and Xit are uncorrelated, the random-effects and fixed-effects
coefficients are both unbiased (consistent); the fixed-effects
coefficients are inefficient while the random-effects coefficients
are efficient.
• In this case, there will not be a large difference between the
random-effects and fixed-effects coefficient estimates
   The Hausman test indicates whether the two sets of
coefficient estimates are significantly different
   Null hypothesis (H0): ui and Xit are uncorrelated
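   The Hausman statistic is distributed as chi2 and is computed as
      H = (b_c - b_e)' (V_c - V_e)^{-1} (b_c - b_e)
   where b_c and V_c are the coefficients and covariance matrix of the
consistent (fixed-effects) estimator, and b_e and V_e are those of the
efficient (random-effects) estimator.
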
   If the chi2 statistic is positive and statistically significant, we can
reject the null hypothesis. This would mean that the fixed-effects
model is preferable because the coefficients are consistent.
   If the chi2 statistic is not positive and statistically significant, we
cannot reject the null hypothesis. This would mean that the random-
effects model is preferable because the coefficients are consistent
and efficient.
   NB: The (Vc-Ve)^-1 matrix is guaranteed to be positive definite only
asymptotically. In small samples, this asymptotic result may not hold,
in which case the computed chi2 statistic can be negative.
4.2 Linear regression (estimates store, hausman)
   The procedure for executing a Hausman test is
as follows:
   Save the coefficients that are consistent even if the
null is not true:
• xtreg lsat age, fe i( persnr)
• estimates store fixed_effects
   Save the coefficients that are inconsistent if the null is
not true:
• xtreg lsat age, re i( persnr)
• estimates store random_effects
   The command for the Hausman test is:
• hausman name_consistent name_efficient
• hausman fixed_effects random_effects
   b is the fixed-effects coefficient while B is the random-effects
coefficient.
   The (Vc-Ve)^-1 matrix has a negative value on the leading diagonal
and, as a result, the square root of the leading diagonal is
undefined. This is why the Chi2 statistic is negative.
   Since the Chi2 statistic is not significantly positive, we might decide
that we cannot reject the null hypothesis (see p. 57 of the STATA
reference manual for the Hausman test).
   On the other hand, this result is not very reliable because the
asymptotic assumption fails to hold in this small sample.
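   * Illustrative addition (not in the original slides): hausman has a
   * sigmamore option that bases both covariance matrices on the
   * disturbance variance estimate from the efficient model. It is often
   * suggested when the default statistic comes out negative, but it is
   * no cure for a sample of only 12 observations.
   hausman fixed_effects random_effects, sigmamore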
 If we reject the null hypothesis that ui and Xit are
uncorrelated, the fixed-effects model is
preferable to the OLS and random-effects
models.
 If we cannot reject the null hypothesis that ui and
Xit are uncorrelated, we need to determine
whether the ui are distributed randomly across
individuals.
 Recall that the random-effects model is like
having an OLS model where the constant term
varies randomly across individuals i.
 Therefore, we need to test whether there is
significant variation in ui across individuals.
   rho = σu² / (σu² + σe²)
= 1.03² / (1.03² + 0.47²)
= 0.83
   σu² captures the variation in ui across individuals.
   If σu² is significantly positive, the random-effects model is
preferable to the OLS model.
   The Breusch and Pagan (1980) Lagrange multiplier test is used to
investigate whether σu² is significantly positive.

 We perform the Breusch-Pagan test by typing xttest0 after xtreg, re
 Our estimate of σu² is 1.067 (note that σu is estimated to be 1.032
which is the same as sigma_u on the previous slide).
 We are unable to reject the hypothesis that σu² = 0. Therefore, we
cannot conclude that the random-effects model is preferable to the
OLS model.
 NB: Our Hausman and LM tests lack power because the sample consists
of only 12 observations. In larger samples, we are more likely to
reject the hypothesis that σu² = 0 and we are more likely to reject
the hypothesis that ui and Xit are uncorrelated.

Class exercise 4b
 Estimate  models in which the dependent
variable is the log of audit fees.
 Estimate the models using:
   OLS without controlling for ui
   Fixed-effects models
   Random-effects models
 How  do the coefficient estimates vary
across the different models?
 Which of these models is preferable?

Class exercise 4b
 The lnta coefficients are largest in the OLS
model that does not control for ui
 The lnta coefficients are smallest in the fixed-
effects model
 The Hausman test rejects the hypothesis that ui
and Xit are uncorrelated. Therefore, the fixed-
effects model is preferable.
 The LM test rejects the hypothesis that σu² = 0
(given that ui and Xit are significantly correlated,
we would not actually need to carry out this test).
Class exercise 4b
   use "C:\phd\Fees.dta", clear
   gen fye=date(yearend, "mdy")
   format fye %d
   gen year=year(fye)
   sort year
   gen lnaf=ln(auditfees)
   gen lnta=ln(totalassets)
   reg lnaf lnta
   xtreg lnaf lnta, fe i(companyid)
   estimates store fixed_effects
   xtreg lnaf lnta, re i(companyid)
   estimates store random_effects
   hausman fixed_effects random_effects
   xttest0

4.2 Linear regression
   Compared to economics and finance, there are not many
accounting studies that exploit panel data in order to
control for unobserved company-specific effects (ui).
   Most studies simply report OLS estimates on the pooled
data.
   Some studies even fail to adjust the OLS standard errors
for time-series dependence
   this can be a very serious mistake especially when the panels
are long (e.g., the sample period covers many years).
   If you use the xtreg command, STATA automatically recognizes
that you are using panel data and it will give you the correct
standard errors.
   Therefore, there is no need to use the robust cluster() option
and, in fact, the version of xtreg used in these notes does not accept
a robust cluster() option (the following command returns an error):
• xtreg lnaf lnta, fe i(companyid) robust cluster(companyid)
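   * Illustrative addition (not in the original slides): more recent
   * STATA releases (version 10 onwards; treat the exact version as an
   * assumption) accept clustered standard errors with xtreg through
   * the vce() option once the panel variable has been declared.
   xtset companyid
   xtreg lnaf lnta, fe vce(cluster companyid)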

4.2 Linear regression
   Ke and Petroni (2004) is an example of an accounting
study that estimates fixed-effects regressions to control
for unobservable company-specific effects.
   Their dependent variable is the change in the ownership of
institutional investors in companies.
   They test whether there are significant changes in institutional
ownership prior to a break in a string of consecutive quarterly
earnings increases.
   Bhattacharya et al. (2003) is an example of an
accounting study that estimates fixed-effects regressions
to control for unobservable country-specific effects.
   Their dependent variable is the cost of equity for 34 countries
from 1984 to 1998 (they are using a cross-country panel)
   They test how earnings opacity at the country level affects the
cost of equity
   They acknowledge that there is a potentially serious problem of
omitted variable bias
   Bhattacharya et al. (2003) argue that they largely avoid
this problem because they control for fixed country-
specific effects

4.2 Linear regression
   It is important to recognize that the fixed effects
estimator relies only on the time-series variation
in Y and X within a given company

 If the extent of time-series variation is small,
either Y_{it} - \bar{Y}_i or X_{it} - \bar{X}_i will be close to zero.
 In this case, the fixed effects estimator is not
reliable because there is insufficient variation in
either the dependent or treatment variable.

4.2 Linear regression
 As in any model, we require a reasonable amount of
variation in the Y and X variables.
 If either variable displays little variation, the results may
be unreliable.

 We saw an example of this previously.
 Except for one observation, the independent variable is a constant.
 As a result the fitted regression line is unreliable.
4.2 Linear regression
   This point was made by Zhou (JFE, 2001) who criticized
the use of fixed effects models when the treatment
variable is management ownership.
   Because management ownership usually remains
constant from one year to the next, the X_{it} - \bar{X}_i term is
typically equal to zero (or very small).

4.3 Logit and probit models
   When the dependent variable is continuous, it is easy to
transform the model such that unobserved firm-specific
effects are “washed” away

   When the dependent variable is binary, the required
transformation is different and more complicated
   if you are interested in the derivation, see the Baltagi textbook
(pages 178-180).
   in the fixed-effects logit, the fixed effects (ui) are not actually
estimated, instead they are “conditioned” out of the model.
   the fixed-effects logit model is not equivalent to logit + dummy
variables.

4.3 Logit models (xtlogit)
   We can estimate a fixed-effects logit model
using the command xtlogit , fe i()
   NB: Your version of STATA 9.0 may have a problem
with estimating the fixed effects logit model. You can
instead use version 8.0 or 10.0.
version 8.0
 Before we estimate the fixed-effects logit model,
we need to understand a complication that
arises because the dependent variable is binary.

   Suppose we have five annual observations on two companies.

      Id   Year   Y
      1    2000   0
      1    2001   0
      1    2002   0
      1    2003   0
      1    2004   0
      2    2000   1
      2    2001   1
      2    2002   0
      2    2003   0
      2    2004   1

   For company 1, there is no variation in the dependent variable
over time (Y = 0 in every year).
   A fixed effect for this company will perfectly predict the
outcome (Y = 0)
   Consequently, the first company will be dropped from the
estimation sample.
   In fact, the fixed-effects logit model will drop all companies
that exhibit no variation in the dependent variable over time.

4.3 Logit models (xtlogit)
   use "C:\phd\xtlogit.dta", clear
   list
 The sample consists of three companies.
 Company 1 exhibits no variation in the
dependent variable over time while companies 2
& 3 do exhibit time-series variation.
 There is no problem estimating this model on
the full sample if we do not control for fixed
effects
   logit y x
   Running a fixed effects logit model results in the
first company being thrown away
   xtlogit y x, fe i(id)
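   * Illustrative addition (not in the original slides): before running
   * xtlogit, fe we can flag the companies that will be dropped because
   * y never changes. ymin, ymax and novar are hypothetical names.
   bysort id: egen ymin = min(y)
   bysort id: egen ymax = max(y)
   gen novar = (ymin == ymax)
   tab id novar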

4.3 Logit models (xtlogit)
   In many empirical settings, we are likely to find a large
number of companies that exhibit no variation in the
binary dependent variable during the sample period.
   Example #1:
   Yit = 1 if company i is engaged in fraud in year t; Yit = 0
otherwise.
   The vast majority of companies do not engage in fraud at any
point in time (Yit = 0 for all t).
   All such non-fraud companies would be dropped from the
estimation sample.
   The estimation sample would include only the companies that
commit fraud at some point during the sample period.

4.3 Logit models (xtlogit)
   Example #2:
   Yit = 1 if company i hires a Big 6 auditor in year t; Yit =
0 if company i hires a non-Big 6 auditor in year t.
   The vast majority of companies keep the same
auditor in the following year and switches between
Big 6 and non-Big 6 auditors are especially rare.
   All companies that do not switch between Big 6 and
Non-Big 6 auditors would be dropped from the
sample.
   The estimation sample would include only the
companies that switch between Big 6 and Non-Big 6
auditors at some point during the sample period.

4.3 Logit models (xtlogit)
 Alternatively, we can estimate a random-effects
logit model using the command xtlogit , re i()
 The company effects (ui) are now assumed to be
random rather than fixed.
 Consequently, the random effects model does
not throw away companies that lack time-series
variation in the dependent variable.
 For example:
   xtlogit y x, re i(id)

   The estimation sample is now 15 rather than 10 (i.e., all 3
companies are included in the sample).
   lnsig2u = ln(σu²) = -1.625
   sigma_u = σu = 0.444 = [exp(-1.625)]^0.5
   rho = σu² / (σu² + σe²) = 0.056

   If rho = σu² / (σu² + σe²) = 0, there would be no variation in the ui
across companies (i.e., each company would have the “same” ui).
   In this case, there would be no need to control for company-specific
effects, i.e., we could rely on logit instead of estimating xtlogit , re i()
   The likelihood-ratio statistic tests the null hypothesis that rho equals
zero.
   If we reject this hypothesis, the random effects model is preferable
to ordinary logit.
   In our data, we are unable to reject, so we could use an ordinary
logit model instead of the random effects logit model. This would be
a good idea because the ordinary logit is more efficient (fewer
parameters need to be estimated).

4.3 Logit models (xtlogit)
   Recall that we previously used a Hausman test to
determine whether the xtreg, fe i() or xtreg, re i() model
is preferable.
   Fortunately, we can do the same test in STATA for
deciding whether the fixed-effects or random-effects logit
models are preferable.
   The only difference is that we have to use the
equations() option with the Hausman test
   [actually, this point is not explained in the STATA manual but it is
discussed in a Statalist posting
(www.stata.com/statalist/archive/2004-01/msg00669.html)]
   the equations() option specifies, by number, the pairs of
equations that are to be compared.
   usually, we are estimating just one equation in each model, in
which case the option is equations(1:1)

4.3 Logit models (xtlogit)
   For example:
   xtlogit y x, fe i(id)
   estimates store fixed_effects
   xtlogit y x, re i(id)
   estimates store random_effects
   hausman fixed_effects random_effects
   STATA is telling us there is an error (we need to
specify the equation numbers)
   hausman fixed_effects random_effects, eq(1:1)
   The Chi2 statistic is negative (again there is a
small sample problem which causes the
asymptotic assumption to fail).
71
Class exercise 4c
 Open the Fees.dta data set.
 Estimate models in which big6 is the dummy
dependent variable using:
   ordinary logit
   fixed-effects logit
   random-effects logit
   Why is the estimation sample much smaller in the
fixed effects model?
   Which of the three models is most preferable?

72
Class exercise 4c
   use "C:\phd\Fees.dta", clear
   gen lnta=ln(totalassets)
   logit big6 lnta, robust cluster(companyid)
   xtlogit big6 lnta, fe i(companyid)
   estimates store fixed_effects
   xtlogit big6 lnta, re i(companyid)
   estimates store random_effects
   hausman fixed_effects random_effects, eq(1:1)
   The estimation sample is much smaller in the fixed
effects model because the majority of companies do not
switch between Big 6 and Non-Big 6 auditors during the
sample period.
   The likelihood ratio test of rho = 0 indicates that the
random-effects model is preferable to the ordinary logit.
   The Hausman test indicates that the fixed-effects model
is preferable to the random-effects logit.
73
4.3 Probit models (xtprobit)
 Recall that there are two commands available when the
dependent variable is binary (“ordinary” logit and probit).
 There is no command for a fixed-effects probit model
because no-one has yet found a transformation that will
allow the fixed effects to be “washed” out.
 If you type xtprobit big6 lnta, fe i(companyid) you will get
an error message.
 A random-effects probit model is available, however:
   xtprobit big6 lnta, re i(companyid)
   Just as with the random-effects logit model, there is a likelihood
ratio test that helps us to choose between the random-effects
probit and the ordinary probit models.
   In our data, we can reject the hypothesis that rho = 0, so the
random-effects probit is preferable to the ordinary probit model.

74
4.4 Other models
Dependent variable (Y)     Examples                               Estimation method(s)   STATA
-----------------------------------------------------------------------------------------------
Discrete and unordered     Method of transport                    Multinomial logit      mlogit
(Y = 0, 1, 2, ...)         (train, bus, car, bicycle)             Multinomial probit     mprobit
                           Type of company
                           (private, public unquoted, quoted)

Discrete and ordered       Type of peer review report             Ordered probit         oprobit
(Y = 0, 1, 2, ...)         (adverse, modified, unmodified)        Ordered logit          ologit

Discrete count data        Number of weaknesses disclosed in      Poisson                poisson
(Y = 0, 1, 2, ...)         peer review report                     Negative binomial      nbreg

Continuous and censored    Non-audit fees                         Tobit                  tobit
(kL ≤ Y < kH)              Football attendance                    Interval regression    intreg

Duration data              Duration of unemployment               Cox proportional       stcox
(often censored,           CEO tenure                             hazards
kL ≤ Y < kH)               Company survival
75
4.4 Other models
   If you look at the STATA manual for panel data
(“Cross-Sectional Time-series”), you will find:
   Fixed-effects and random-effects models are
available for count data (xtpoisson and xtnbreg)
• We can test which model is preferable using a Hausman test

   Random-effects models are available for censored
data (xttobit and xtintreg)
• fixed-effects models are not available
• therefore there is no need for a Hausman test
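   * A hedged sketch of the Hausman comparison for the count-data models,
   * reusing the hypothetical y, x and id names from the xtlogit example.
   * Whether eq(1:1) is needed here is an assumption carried over from the
   * logit case; try the test without the option first.
   xtpoisson y x, fe i(id)
   estimates store fixed_effects
   xtpoisson y x, re i(id)
   estimates store random_effects
   hausman fixed_effects random_effects, eq(1:1)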

76
4.4 Other models
 Duration data is, by its very nature, in the
form of panel data.
 What about the multinomial and ordered
models that we previously looked at
(mlogit, mprobit, ologit, oprobit)? It
appears that STATA does not have
random- or fixed-effects versions of these
models.

77
4.4 Other models
 You can use the search command in STATA to
find out if a command is available.
 The search command looks through official STATA
materials (on the STATA website), the STATA Journal (SJ)
and the STATA Technical Bulletins (STBs).
 The SJ and STBs are where you can sometimes
find commands that will appear in future
versions of STATA
   search multinomial logit
   We can find the multinomial logit command but there
does not appear to be any command specifically for
the multinomial model with panel data
78
4.4 Other models
 Even if the command you want is not available
from STATA, you may be able to find a STATA
user who has already written the program that
you need.
 Statalist (www.stata.com/statalist/) is an email
listserver where over 2,500 Stata users discuss
all things statistical and Stata.
 Click on “Archives provided by Statacorp” and
search the archives

79
4.4 Other models
 For example, suppose you want to estimate a
random-effects ordered probit
 Typing this into the Statalist archive, I found that
someone has written a program for this
model (reoprob):
www.stata.com/statalist/archive/2006-02/msg00509.html
 You can locate the program from within STATA by typing
   findit reoprob

80
4.4 Other models

 If you cannot find someone who has already
written the program and if it is a command that
you really do need, you will either have to write
the program yourself or wait for someone else to
do it.
 In fact, it is not too difficult to learn how to write
new programs in STATA
   you would need to take a STATA programming
course
   www.stata.com/netcourse/
• net courses 151 & 152
81
Summary
   There are three advantages to using panel data:
   We can control for unobservable fixed effects that
might otherwise bias the coefficient estimates.
• these unobservable fixed effects can be company-specific,
country-specific, or person-specific.
   The larger sample means that the coefficients are
estimated more precisely.
   We can include lagged or change variables in our
models.

82
Summary
   The xtreg command is used to estimate fixed-
effects and random-effects models (where the
dependent variable is continuous).
   We can test whether the fixed-effects or random-effects
model is preferable using the Hausman test.
   If there is a significant correlation between u_i and X_it,
the fixed-effects model is preferable to the OLS and
random-effects models.
   If there is no significant correlation between u_i and X_it,
we can test whether the OLS or random-effects model
is preferable using an LM test.
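   * One possible command sequence for the decision rules above (a sketch,
   * not the course's own solution). It assumes Fees.dta is loaded and that
   * lnaf and lnta have been generated as earlier in these notes. Using
   * xttest0 (the Breusch-Pagan LM test) is an assumption; the summary does
   * not name the command.
   xtreg lnaf lnta, fe i(companyid)
   estimates store fixed_effects
   xtreg lnaf lnta, re i(companyid)
   estimates store random_effects
   xttest0
   hausman fixed_effects random_effects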

83
Summary
 When the dependent variable is binary we
can estimate fixed-effects or random-effects
logit models.
   Again, we can test which model is preferable
using a Hausman test.
   Only the random-effects model is available in
the case of the probit model.

84

```