Stata Tutorial 2
1. Load the data
cd "C:\Documents and Settings\Owner\Desktop"
insheet using survey.csv
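In newer versions of Stata the same file can be read with import delimited. The lines below are a minimal sketch, assuming Stata 13 or later and that survey.csv sits in the working directory set above; the variable list passed to summarize is taken from the regressions later in this tutorial.

```stata
* Read the CSV file, replacing any data currently in memory
import delimited using survey.csv, clear
* Check that the variables were read in with the expected names and types
describe
summarize friendhrs internet gamehrs socialpos time2school
```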
2. Regression
reg friendhrs internet socialpos time2school
friendhrs   |     Coef.  Std. Err.       t  P>|t|  [95% Conf.  Interval]
internet    | -.3251806   .8691222   -0.37  0.711   -2.105497   1.455136
socialpos   | -.1412255   4.701476   -0.03  0.976   -9.771763   9.489312
time2school |  .3710433   .2256273    1.64  0.111   -.0911332   .8332198
_cons       |  11.19527    6.42476    1.74  0.092   -1.965257   24.35579
reg friendhrs gamehrs socialpos time2school
friendhrs   |     Coef.  Std. Err.       t  P>|t|  [95% Conf.  Interval]
gamehrs     |  1.838271   .7095352    2.59  0.015    .3824248   3.294117
socialpos   | -2.493186   4.314445   -0.58  0.568    -11.3457   6.359324
time2school |   .492213   .2095013    2.35  0.026    .0623518   .9220743
_cons       |  3.930471   4.580964    0.86  0.398   -5.468892   13.32983
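P value: the probability of obtaining a test statistic at least as extreme as the one observed,
assuming the null hypothesis is true. The lower the p value, the stronger the evidence against the
null and the more statistically significant the result. For example, gamehrs (p = 0.015) is
significant at the 5% level, while internet (p = 0.711) is not.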
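The p value in the regression table can be reproduced from the t statistic with Stata's ttail() function. This is a small sketch: the first line uses the rounded t statistic for gamehrs and the 27 residual degrees of freedom from the output in this tutorial, so it should come out close to the reported 0.015; the second part assumes the gamehrs regression has just been run.

```stata
* Two-sided p value from the rounded t statistic for gamehrs (t = 2.59, df = 27)
display 2*ttail(27, abs(2.59))
* The same calculation from the stored results of the regression
quietly regress friendhrs gamehrs socialpos time2school
display 2*ttail(e(df_r), abs(_b[gamehrs]/_se[gamehrs]))
```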
3. F test
The F test tests the joint significance of the independent variables. When testing the overall
significance of the regression (goodness of fit), the null hypothesis is that the coefficients on
all independent variables are jointly equal to zero.
$$F = \frac{(RSS_R - RSS_U)/m}{RSS_U/(n-k)}$$
m = number of restrictions
n = number of observations
k = number of parameters in the unrestricted model
RSS_U = unrestricted RSS
RSS_R = restricted RSS
If the F-statistic is below the critical value, we fail to reject the null and say the goodness of
fit is not significant. If it is above the critical value, we reject the null.
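The F statistic can also be computed by hand from the two residual sums of squares. The lines below are a sketch under the assumption that the unrestricted model is the gamehrs regression from section 2 and the restricted model contains only the constant; the scalar names (rss_u, rss_r, df_u, m, F) are made up for this illustration.

```stata
* Unrestricted model: three regressors plus the constant
quietly regress friendhrs gamehrs socialpos time2school
scalar rss_u = e(rss)
* Residual degrees of freedom, n - k
scalar df_u = e(df_r)
* Restricted model: constant only, i.e. the three slopes are set to zero
quietly regress friendhrs
scalar rss_r = e(rss)
* Number of restrictions
scalar m = 3
scalar F = ((rss_r - rss_u)/m) / (rss_u/df_u)
display "F = " F
```

The result should agree with the F statistic reported by the test command for the same three variables.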
test gamehrs socialpos time2school
(1) gamehrs = 0
(2) socialpos = 0
(3) time2school = 0
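F( 3, 27) = 3.59 (from the F table, the 5% critical value is 2.96)
Prob > F = 0.026
m=3: number of restrictions is 3
n-k=27: denominator degrees of freedom is 27
Since 3.59 > 2.96 (equivalently, Prob > F = 0.026 < 0.05), we reject the null at the 5% level: the
independent variables are jointly significant.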
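Instead of looking up the F table, the critical value and the p value can be obtained with Stata's invFtail() and Ftail() functions. This is a brief sketch using the numbers from the output above; the results should be close to 2.96 and 0.026.

```stata
* 5% critical value of the F distribution with 3 and 27 degrees of freedom
display invFtail(3, 27, 0.05)
* p value for the observed statistic F = 3.59
display Ftail(3, 27, 3.59)
* After regress, the overall F statistic and its degrees of freedom are stored in e()
quietly regress friendhrs gamehrs socialpos time2school
display e(F) "  " e(df_m) "  " e(df_r)
```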
4. Predicting Y
Obtain predictions:
Given the coefficient estimates and the values of the independent variables (x), we want to find
the fitted values of y.
predict friendhrshat
predict yhat
(note: the two commands produce the same results under different variable names; use the “list” command to check)
Calculate standard errors of the predictions
predict e, stdp
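To see what predict does, the fitted values can be rebuilt by hand from the stored coefficients. The sketch below assumes the most recent model is the gamehrs regression and is meant to be run from a do-file (because of the /// line continuation); the variable names yhat_manual and uhat are chosen for this illustration only.

```stata
quietly regress friendhrs gamehrs socialpos time2school
* Fitted values computed directly from the coefficient estimates
generate double yhat_manual = _b[_cons] + _b[gamehrs]*gamehrs ///
    + _b[socialpos]*socialpos + _b[time2school]*time2school
* Residuals: actual value minus fitted value
predict double uhat, residuals
* Compare the first few observations
list friendhrs yhat_manual uhat in 1/5
```

The values in yhat_manual should match those produced by predict without options, up to rounding.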
5. Ramsey RESET / Davidson MacKinnon specification tests
The RESET test is designed to detect omitted variables and incorrect functional form. It proceeds
as follows:
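Suppose we have the model $y = X\beta + \varepsilon$.
After estimating it by OLS, we obtain the coefficient estimates, and with the predict command
described above we obtain the fitted values yhat.
Consider the artificial model $y = X\beta + Z\alpha + \varepsilon$, where $Z$ contains powers of the
fitted values (yhat^2, yhat^3 and yhat^4 in Stata's implementation).
A test for misspecification is a test of $\alpha = 0$ against the alternative $\alpha \neq 0$.
Rejection of the null (which means $\alpha$ is different from zero) implies the original model is
inadequate and can be improved. Failure to reject the null says the test has not been able to detect
misspecifications.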
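The procedure just described can be sketched by hand before turning to the built-in command. The lines below assume the gamehrs regression is the model being tested and use the second, third and fourth powers of the fitted values, as Stata's own RESET implementation does; the variable names fit, fit2, fit3 and fit4 are chosen for this illustration.

```stata
* Original model and its fitted values
quietly regress friendhrs gamehrs socialpos time2school
predict double fit, xb
* Powers of the fitted values
generate double fit2 = fit^2
generate double fit3 = fit^3
generate double fit4 = fit^4
* Artificial model: original regressors plus the powers
regress friendhrs gamehrs socialpos time2school fit2 fit3 fit4
* Joint test that the coefficients on the added terms are zero
test fit2 fit3 fit4
```

The F statistic from the last line should be close to the one reported by estat ovtest, since both test the same three added terms.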
Ramsey RESET test using powers of the fitted values of friendhrs
estat ovtest
Ho: model has no omitted variables
F(3, 24) = 2.74
Prob > F = 0.0654
The null is not rejected at the 5% level, since Prob > F = 0.0654 > 0.05.
If the null is rejected, try to correct the model by including new independent variables or
changing the functional form.
6. BP or White test for heteroskedasticity
One of the 4 assumptions for classical linear regression is homoskedasticity, i.e. the variance of
the error term is constant across observations. If this assumption is violated (heteroskedasticity),
the OLS coefficient estimates remain unbiased but are no longer efficient, and the usual standard
errors are biased, so t and F tests become unreliable. We can use the BP test or the White test to
check for heteroskedasticity.
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
estat hettest
Ho: Constant variance
Variables: fitted values of friendhrs
chi2(1) = 0.67
Prob > chi2 = 0.4135
The null hypothesis is not rejected (p = 0.4135), so there is no evidence of heteroskedasticity.
(note: a large p value, i.e. a small chi2 value, means the null is not rejected and the data are
consistent with homoskedasticity; a small p value, i.e. a large chi2 value, indicates
heteroskedasticity is present)
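By default the test models the error variance as a function of the fitted values. The following sketch shows two variants that may be worth trying; it assumes a recent version of Stata in which estat hettest accepts the rhs and iid options, and that the gamehrs regression is the model of interest.

```stata
quietly regress friendhrs gamehrs socialpos time2school
* Default: variance depends on the fitted values
estat hettest
* Use the right-hand-side variables instead of the fitted values
estat hettest, rhs
* Version of the test that drops the normality assumption on the errors
estat hettest, iid
```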
White test for heteroskedasticity
imtest, white
White's test for Ho: homoskedasticity
against Ha: unrestricted heteroskedasticity
chi2(8) = 3.03
Prob > chi2 = 0.9324
Cameron & Trivedi's decomposition of IM-test
Source              |   chi2   df        p
Heteroskedasticity  |   3.03    8   0.9324
Skewness            |   3.11    3   0.3744
Kurtosis            |   1.14    1   0.2859
Total               |   7.28   12   0.8384
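In current versions of Stata the White test is run as a postestimation command. The sketch below assumes the gamehrs regression is the model being tested; the last line recomputes the p value from the reported statistic with chi2tail(), which should be close to 0.9324. Since this p value is well above 0.05, the null of homoskedasticity is not rejected.

```stata
quietly regress friendhrs gamehrs socialpos time2school
* White test in postestimation syntax
estat imtest, white
* p value implied by chi2 = 3.03 with 8 degrees of freedom
display chi2tail(8, 3.03)
```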
7. Robust standard errors
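We can use robust standard errors to make inference valid when heteroskedasticity is present.
Robust standard errors do not change the coefficient estimates; only the standard errors, t
statistics, p values and confidence intervals are adjusted. When the errors are heteroskedastic,
robust standard errors are less biased than the usual OLS standard errors.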
reg friendhrs gamehrs socialpos time2school, robust
            |               Robust
friendhrs   |     Coef.  Std. Err.       t  P>|t|  [95% Conf.  Interval]
gamehrs     |  1.838271    .747191    2.46  0.021    .3051614    3.37138
socialpos   | -2.493186    3.74317   -0.67  0.511   -10.17354   5.187164
time2school |   .492213   .2217612    2.22  0.035    .0371966   .9472295
_cons       |  3.930471   4.856262    0.81  0.425   -6.033755    13.8947
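To compare the two sets of standard errors side by side, the results can be stored and tabulated. This is a sketch, assuming a version of Stata that accepts vce(robust) as the equivalent of the robust option; the names ols_se and rob_se are chosen for this illustration.

```stata
* Conventional standard errors
quietly regress friendhrs gamehrs socialpos time2school
estimates store ols_se
* Heteroskedasticity-robust standard errors
quietly regress friendhrs gamehrs socialpos time2school, vce(robust)
estimates store rob_se
* Coefficients with standard errors underneath, one column per model
estimates table ols_se rob_se, b(%9.4f) se(%9.4f)
```

The coefficient rows should be identical in the two columns; only the standard errors differ.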
Reference:
Baltagi, B. (2001). Econometric Analysis of Panel Data, second edition, New York, John Wiley &
Sons.
Online resources:
www.nd.edu/~rwilliam/stats2/l25.pdf
http://homepages.nyu.edu/~sc129/econometrics_handouts/hetero_tests_stata.pdf
http://www.economics.soton.ac.uk/courses/ECON3012/Lecture2-2.pdf