Discussion-questions: Philosophy of Science and Theory Building


									Lab session 13.03.12

Lab-session 4


1. The table below contains two variables (x and y) with 20 observations each; y is the
   dependent and x the independent variable, and i indicates the observation number.
   Estimate the linear relationship between the two variables by calculating the two
   parameters alpha and beta with simple ordinary least squares (OLS) for the model:
   y_i = alpha + beta*x_i + epsilon_i


                       i            x            y
                       1            6           11
                       2           -1           -5
                       3            3            5
                       4           -1          -21
                       5            8           53
                       6            0          -11
                       7           -3            2
                       8            1           10
                       9            4           14
                      10            7           29
                      11            2           23
                      12            1           -4
                      13            7           31
                      14           -4          -10
                      15            4           30
                      16            4           30
                      17            4           22
                      18           -1          -12
                      19           -3          -28
                      20            3           25

2. Calculate the standard error of beta.
3. Conduct a two-sided t-test for the significance of beta at the 5% significance level.
   Note: with n=20 observations (n-2=18 degrees of freedom) the critical values are 2.101 and -2.101.
4. Calculate the 95% confidence interval for beta.
5. Calculate the R-squared for the above model.
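
The calculations in questions 1 to 5 can be checked with a short script. The following is a minimal sketch in plain Python (not part of the lab's Stata material); the function name simple_ols and the dictionary keys are illustrative choices, and the critical value 2.101 is the one given in question 3.

```python
import math

# (x, y) pairs copied from the table in question 1
data = [(6, 11), (-1, -5), (3, 5), (-1, -21), (8, 53), (0, -11), (-3, 2),
        (1, 10), (4, 14), (7, 29), (2, 23), (1, -4), (7, 31), (-4, -10),
        (4, 30), (4, 30), (4, 22), (-1, -12), (-3, -28), (3, 25)]


def simple_ols(pairs, t_crit):
    """OLS for y_i = alpha + beta*x_i + epsilon_i with one regressor."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)

    beta = sxy / sxx                      # slope
    alpha = y_bar - beta * x_bar          # intercept
    ssr = sum((y - alpha - beta * x) ** 2 for x, y in pairs)
    se_beta = math.sqrt(ssr / (n - 2) / sxx)   # standard error of beta
    return {
        "alpha": alpha,
        "beta": beta,
        "se_beta": se_beta,
        "t": beta / se_beta,              # t statistic for H0: beta = 0
        "ci": (beta - t_crit * se_beta, beta + t_crit * se_beta),
        "r2": 1 - ssr / syy,
    }


result = simple_ols(data, t_crit=2.101)
for name, value in result.items():
    print(name, value)
```

The null hypothesis beta = 0 is rejected when the absolute t statistic exceeds the critical value, which is equivalent to the confidence interval not containing zero.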

6. Download the file garmit_lab.dta to your M drive. This file contains a panel of 18 OECD
   countries from 1961 to 1994. Background reading: the Garrett-Mitchell paper on government
   spending (sent around by e-mail).

7. Open Stata.

8. Change the working directory to your M drive and open the data file in Stata:
      cd "c:\...\..."
      use garmit_lab.dta

9. Summarize the main characteristics of the variables of interest:

   sum spend unem growthpc depratio left cdem trade lowwage fdi skand

10. Estimate a basic model for government spending and interpret the regression table:
    coefficients, standard errors, confidence intervals, R², and F-test.

   reg spend unem growthpc depratio left cdem trade lowwage fdi


11. Now change the confidence level (note: the default setting is 95%):

       set level 99
       set level 90

Estimate the same regression again after each change. What has changed, how, and why?

Set the level back to default:

       set level 95

estimate the model again:

   reg spend unem growthpc depratio left cdem trade lowwage fdi


12. Now run the same regression but estimate standardized coefficients. How should
    standardized coefficients be interpreted, and why do we need them?

   reg spend unem growthpc depratio left cdem trade lowwage fdi, beta
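
To make the idea behind standardized coefficients concrete: a standardized coefficient is the raw slope multiplied by sd(x)/sd(y), so it does not depend on the units of x. The sketch below is plain Python with made-up illustration data (not taken from garmit_lab.dta); the function names are hypothetical.

```python
import statistics


def slope(x, y):
    """Raw OLS slope of y on a single regressor x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


def standardized_slope(x, y):
    """Slope in standard-deviation units: b * sd(x) / sd(y)."""
    return slope(x, y) * statistics.stdev(x) / statistics.stdev(y)


# Hypothetical illustration data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
x_rescaled = [1000 * v for v in x]   # same variable in different units

print(slope(x, y), slope(x_rescaled, y))                            # raw slopes differ
print(standardized_slope(x, y), standardized_slope(x_rescaled, y))  # standardized slopes agree
```

Rescaling x changes the raw coefficient by the scaling factor but leaves the standardized coefficient unchanged, which is why standardized coefficients are used to compare variables measured on different scales.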


Heteroscedasticity and Omitted Variable Bias

13. calculate predicted values and residuals:

       predict spend_hat
       predict spend_resid, resid

14. Now create scatterplots of the residuals against some of the explanatory variables and the
country codes. What can you see? Are there signs of omitted variable bias or heteroskedasticity?

       twoway (scatter spend_resid unem)
       twoway (scatter spend_resid cc)
       twoway (scatter spend_resid growthpc)

15. Stata provides built-in tests for omitted variable bias and heteroskedasticity. How should
the test results be interpreted?

       estat hettest
       estat ovtest
       estat szroeter unem growthpc depratio left cdem trade lowwage fdi



16. What can we do about heteroskedasticity and omitted variable bias?

       reg spend unem growthpc depratio left cdem trade lowwage fdi skand

What can be observed? How should the coefficient for "skand" be interpreted?

17. Now do the same tests again for the new model and interpret the results:

       predict spend_hat2
       predict spend_resid2, resid
       estat hettest
       estat ovtest
       twoway (scatter spend spend_hat2 if year==1984, mlabel(country))
       twoway (scatter spend_resid2 cc)
       twoway (scatter spend_resid2 unem)
       estat szroeter unem growthpc depratio left cdem trade lowwage fdi skand

18. Correcting only the standard errors: robust (White) SEs and alternatives:

       reg spend unem growthpc depratio left cdem trade lowwage fdi, robust

       reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(robust)

       reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(cluster cc)
       reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(bootstrap)
       reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(jackknife)
       reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(hc2)
       reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(hc3)
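
To see what a heteroskedasticity-robust standard error changes, the sketch below compares the classical standard error of a slope with the White (HC0) version and with HC0 scaled by n/(n-k), which is, as far as we know, the finite-sample factor behind Stata's robust option. It is plain Python for a single regressor with made-up data; the function name is hypothetical.

```python
import math


def slope_standard_errors(x, y):
    """Return (classical, HC0, HC1) standard errors of the OLS slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [v - x_bar for v in x]
    sxx = sum(d * d for d in dx)
    beta = sum(d * (v - y_bar) for d, v in zip(dx, y)) / sxx
    alpha = y_bar - beta * x_bar
    resid = [v - alpha - beta * u for u, v in zip(x, y)]

    # Classical: one common error variance for all observations
    classical = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # White / HC0: each observation contributes its own squared residual
    hc0 = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    # HC1: HC0 variance times n / (n - k), here with k = 2 parameters
    hc1 = hc0 * math.sqrt(n / (n - 2))
    return classical, hc0, hc1


# Hypothetical data whose spread grows with x
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.8, 3.4, 3.5, 5.9, 4.8, 8.6, 6.1]
classical_se, hc0_se, hc1_se = slope_standard_errors(x, y)
print(classical_se, hc0_se, hc1_se)
```

The coefficient itself is identical in all three cases; only the estimated uncertainty around it changes.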

19. Feasible GLS with heteroskedastic panels (xtgls with the p(h) option):

       xtgls spend unem growthpc depratio left cdem trade lowwage fdi, p(h)


open stata
change the working directory to your m drive:

        cd "c:\...\..."

open the data file in stata:
       use garmit_lab.dta

open log file

       log using lab5.log

estimate a basic model for government spending and interpret the regression table:

       reg spend unem growthpc depratio left cdem trade lowwage fdi

Multicollinearity:

Calculate correlation coefficients for all explanatory variables. Do you find problems of
multicollinearity?

       corr unem growthpc depratio left cdem trade lowwage fdi skand

What is the trade-off between multi-collinearity and omitted variable bias?

With perfect multicollinearity we don't have to worry; Stata handles it automatically.

A linear transformation of a variable is perfectly collinear with the original variable:

       gen unem2=2*unem+3
       corr unem unem2
       reg spend unem unem2 if year==1986

What does Stata do in case of perfect multicollinearity?
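
Why a perfectly collinear regressor cannot be estimated can be shown numerically: if one column of the design matrix is an exact linear function of the others, X'X is singular and the OLS normal equations have no unique solution. A minimal sketch in plain Python, using made-up integer values for unem (not the lab data) and a hand-written 3x3 determinant:

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def gram(columns):
    """X'X for a design matrix given as a list of columns."""
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in columns]
            for ci in columns]


# Hypothetical values; integers keep the arithmetic exact
unem = [4, 5, 7, 8, 6]
unem2 = [2 * u + 3 for u in unem]     # same transformation as in the lab
constant = [1] * len(unem)

xtx = gram([constant, unem, unem2])
print(det3(xtx))   # zero: X'X cannot be inverted
```

Because the determinant is zero, one of the collinear variables has to be removed before the model can be estimated.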

Now look at two variables that are highly correlated but not perfectly collinear. What is the
problem?

       gen unem3=2*unem+3*invnorm(uniform())
       corr unem unem3
       reg spend unem unem3 if year==1986
       reg spend unem skand if year==1986
       reg spend unem unem3 skand if year==1986


run the original regression again:

       reg spend unem growthpc depratio left cdem trade lowwage fdi

Run a variance inflation factor test for higher order multicollinearity:

       estat vif
       estat vif, uncentered

How should the VIF be interpreted? What does it measure?
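
The variance inflation factor of regressor j is 1/(1 - R_j²), where R_j² comes from regressing x_j on all other regressors. With only two regressors, R_j² is simply their squared correlation. The sketch below illustrates this special case in plain Python with made-up data; the function names are hypothetical.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    saa = sum((u - a_bar) ** 2 for u in a)
    sbb = sum((v - b_bar) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_regressors(x1, x2):
    """VIF when the model has exactly two regressors: 1 / (1 - r^2)."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical, strongly correlated regressors
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3]
print(vif_two_regressors(x1, x2))

# Uncorrelated regressors: no variance inflation
print(vif_two_regressors([-1, 0, 1], [1, -2, 1]))
```

A VIF of 1 means the regressor is uncorrelated with the others; the larger the VIF, the more the variance of that coefficient is inflated by collinearity.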



Added-variable plots (AV plots) show, for multivariate models, the ceteris paribus effect of
a single independent variable:

       avplots
       avplot unem

For linear OLS models the marginal effect of a single independent variable is the same at all
values of the independent variable:

       mfx
       mfx compute, at(mean)
       mfx compute, at(median)


Autocorrelation:

Now let's look at a single time series, e.g. Germany (or the UK). What has changed, and how
does this differ from the models above?

       reg spend unem growthpc depratio left cdem trade lowwage fdi if country=="Germany"

Do the same tests again: what can you observe?

       estat hettest
       estat ovtest

Now let's turn to another violation of the Gauss-Markov assumptions: autocorrelation (serial
correlation). Test for it with the Durbin-Watson statistic and the Breusch-Godfrey test. How
should the results be interpreted?

       estat dwatson
       estat bgodfrey
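
The Durbin-Watson statistic is computed from the residuals as d = sum((e_t - e_{t-1})²) / sum(e_t²). It is close to 2 when there is no first-order autocorrelation, below 2 for positive and above 2 for negative autocorrelation. A minimal sketch in plain Python, with made-up residual series for illustration:

```python
def durbin_watson(residuals):
    """Durbin-Watson d statistic for a residual series in time order."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residual series
persistent = [1.0, 1.0, 1.0, 1.0]       # residuals never change sign
alternating = [1.0, -1.0, 1.0, -1.0]    # residuals flip sign every period

print(durbin_watson(persistent))    # 0.0, extreme positive autocorrelation
print(durbin_watson(alternating))   # 3.0, strong negative autocorrelation
```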

Or a simpler test: what can we observe?

       predict spend_residger, resid
       gen lagspend_residger=l.spend_residger

       reg spend_residger lagspend_residger unem growthpc depratio left cdem trade lowwage fdi if country == "Germany"

The simplest way to deal with serial correlation is to include the lagged value of the
dependent variable on the right-hand side of the model: the LDV (lagged dependent variable).
BUT this opens another can of worms… (we will talk about this in more detail when we talk
about time series models). How should the coefficient of the LDV be interpreted?

       reg spend spendl unem growthpc depratio left cdem trade lowwage fdi if country=="Germany"


And do the tests again:

       estat dwatson
       estat bgodfrey

Run a Prais-Winsten model:

       prais spend unem growthpc depratio left cdem trade lowwage fdi if country=="Germany"


Play around a little with including and excluding variables from the model and do some more
omitted variable bias and heteroskedasticity tests…

Now do the same for the UK!


Interpretation of a dummy variable and interaction effects:

       reg spend skand if year==1986
       reg spend unem skand
       gen skand_unem=unem*skand
       reg spend unem skand skand_unem
       reg spend unem skand skand_unem growthpc depratio left cdem trade lowwage fdi



Interaction effects:

       gen unem_trade=unem*trade

       reg spend unem trade growthpc depratio left cdem lowwage fdi

       reg spend unem trade unem_trade growthpc depratio left cdem lowwage fdi

       sslope spend unem trade unem_trade growthpc depratio left cdem lowwage fdi, i(unem trade unem_trade)

       sslope spend unem trade unem_trade growthpc depratio left cdem lowwage fdi, i(trade unem unem_trade)

graphical display of IA effects:

       sslope spend unem trade unem_trade growthpc depratio left cdem lowwage fdi, i(unem trade unem_trade) graph

       sslope spend unem trade unem_trade growthpc depratio left cdem lowwage fdi, i(trade unem unem_trade) graph

      sum trade unem
Program marginal effects of IA effects:


capture drop MV-lower

reg spend unem trade unem_trade growthpc depratio left cdem lowwage fdi
generate MV=((_n-1)*10)

replace   MV=. if _n>17

matrix b=e(b)
matrix V=e(V)

scalar b1=b[1,1]
scalar b2=b[1,2]
scalar b3=b[1,3]

scalar varb1=V[1,1]
scalar varb2=V[2,2]
scalar varb3=V[3,3]

scalar covb1b3=V[1,3]
scalar covb2b3=V[2,3]

scalar list b1 b2 b3 varb1 varb2 varb3 covb1b3 covb2b3

gen conb=b1+b3*MV if _n<=17
gen conse=sqrt(varb1+varb3*(MV^2)+2*covb1b3*MV) if _n<=17

gen a=1.96*conse
gen upper=conb+a
gen lower=conb-a
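
The three gen lines above implement the standard result for a model with an interaction term: the marginal effect of unem at a given value MV of trade is b1 + b3*MV, with variance var(b1) + MV²*var(b3) + 2*MV*cov(b1,b3). The sketch below restates that calculation in plain Python; the coefficient and variance values are made up for illustration and are not estimates from garmit_lab.dta.

```python
import math


def conditional_effect(b1, b3, var_b1, var_b3, cov_b1b3, z, crit=1.96):
    """Marginal effect of x at moderator value z, with confidence bounds.

    Model: y = ... + b1*x + b2*z + b3*x*z, so dy/dx = b1 + b3*z.
    """
    effect = b1 + b3 * z
    se = math.sqrt(var_b1 + var_b3 * z ** 2 + 2 * cov_b1b3 * z)
    return effect, effect - crit * se, effect + crit * se


# Hypothetical estimates, for illustration only
b1, b3 = 0.9, -0.006
var_b1, var_b3, cov_b1b3 = 0.04, 0.000004, -0.00035

for z in range(0, 170, 40):   # values of the moderator (trade openness)
    print(z, conditional_effect(b1, b3, var_b1, var_b3, cov_b1b3, z))
```

The effect is statistically distinguishable from zero at those moderator values where the lower and upper bounds have the same sign.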


graph twoway (line conb  MV, clwidth(medium) clcolor(black)) /*
          */ (line upper MV, clpattern(dash) clwidth(thin) clcolor(black)) /*
          */ (line lower MV, clpattern(dash) clwidth(thin) clcolor(black)), /*
          */ xlabel(0 20 40 60 80 100 120 140 160, labsize(2.5)) /*
          */ ylabel(-0.5 0 0.5 1 1.5 2, labsize(2.5)) yscale(noline) xscale(noline) /*
          */ legend(col(1) order(1 2) label(1 "Marginal Effect of Unemployment on Spending") /*
          */ label(2 "95% Confidence Interval") label(3 " ")) yline(0, lcolor(black)) /*
          */ title("Marginal Effect of Unemployment on Spending as Trade Openness changes", size(4)) /*
          */ subtitle(" " "Dependent Variable: Government Spending" " ", size(3)) /*
          */ xtitle("Trade Openness", size(3)) xsca(titlegap(2)) ysca(titlegap(2)) /*
          */ ytitle("Marginal Effect of Unemployment", size(3)) /*
          */ scheme(s2mono) graphregion(fcolor(white))

graph export      m:\...\unem_trade1.eps, replace

translate @Graph m:\...\unem_trade1.wmf, replace

capture drop MV-lower




or much easier:
download grinter.ado from Fred Boehmke

      net from http://myweb.uiowa.edu/fboehmke/stata/grinter

      reg spend unem trade unem_trade growthpc depratio left cdem lowwage fdi

        grinter unem, inter(unem_trade) const02(trade) depvar(spend) kdensity yline(0)



test for functional form:

        reg spend unem growthpc depratio left cdem trade lowwage fdi

        acprplot unem, mspline
        acprplot left, mspline

        etc…

        gen ln_unem = ln(unem)
        reg spend ln_unem growthpc depratio left cdem trade lowwage fdi

        gen sqrt_unem=unem^2
        reg spend unem sqrt_unem growthpc depratio left cdem trade lowwage fdi


        What effect can we observe? Calculate the “turning point”!
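
For a model with a linear and a squared term, y = ... + b1*x + b2*x², the marginal effect of x is b1 + 2*b2*x, so the effect changes sign at x* = -b1/(2*b2). A minimal sketch in plain Python; the coefficient values are made up for illustration and the function name is hypothetical.

```python
def turning_point(b1, b2):
    """Value of x at which b1*x + b2*x**2 reaches its maximum or minimum."""
    if b2 == 0:
        raise ValueError("no turning point: the squared term has coefficient 0")
    return -b1 / (2 * b2)


# Hypothetical coefficients on unem and its square
b_unem, b_unem_squared = 2.0, -0.25
x_star = turning_point(b_unem, b_unem_squared)
print(x_star)   # 4.0: the effect is positive below and negative above this value
```

With b2 < 0 the turning point is a maximum (inverted U-shape); with b2 > 0 it is a minimum. It is worth checking whether the turning point lies inside the observed range of the variable.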



Outliers:

test for outliers:

        reg spend unem growthpc depratio left cdem trade lowwage fdi

        dotplot spend
        symplot spend

        rvfplot

        lvr2plot
        lvr2plot, ml(country)

        dfbeta

Solution: jackknife and bootstrapping:

        jackknife _b _se, eclass saving(name.dta): reg spend unem growthpc depratio left cdem trade lowwage fdi skand

        bootstrap _b _se, reps(1000) saving(name.dta): reg spend unem growthpc depratio left cdem trade lowwage fdi skand
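
The idea behind the jackknife can be illustrated in a few lines: re-estimate the model n times, each time leaving out one observation, and use the spread of the n estimates as a measure of uncertainty. The sketch below is plain Python for a single-regressor slope with made-up data containing one outlier; it uses the common delete-one formula sqrt((n-1)/n * sum((b_i - mean(b))²)), which we assume corresponds to the default in most software.

```python
import math


def slope(pairs):
    """OLS slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    return sxy / sxx


def jackknife_slope(pairs):
    """Leave-one-out slopes and the jackknife standard error."""
    n = len(pairs)
    loo = [slope(pairs[:i] + pairs[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((b - mean_loo) ** 2 for b in loo))
    return loo, se


# Hypothetical data: y = 2x except for the last point, an outlier
sample = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 30)]
loo_slopes, jk_se = jackknife_slope(sample)
print(slope(sample))
print(loo_slopes)   # the estimate obtained without the outlier stands out
print(jk_se)
```

Looking at the individual leave-one-out estimates shows which observations drive the result, which is the same question dfbeta answers.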