UNIVERSITY OF OSLO
DEPARTMENT OF ECONOMICS
Exam:
ECON 301B - APPLIED STATISTICS AND ECONOMETRICS
Date of exam: Monday, 9 December 2002
Time for exam: 9 a.m. - 3 p.m.
The problem set covers 7 pages including computer output
Resources allowed: All printed books and private notes as well as
calculators.
All questions should be answered.
The grade scale is A, B, C, D, Fail (with A as the best grade).
Scientific journals constitute the medium of communication between scientists, and
also the memory (storage) of science. The economics of (scientific) journals is
interesting. Bergstrom¹ argues that journals owned by private publishers are grossly
overpriced, and he recommends several actions to reduce the large profits made by
these publishers. Bergstrom provides data to substantiate his case. There are 180
economics journals in his database, of which 16 are published by scholarly societies
such as the American Economic Association. These 16 journals are published on a
non-profit basis, as opposed to the remaining journals, which have private publishers.
We shall concentrate on the following variables:
P : Library subscription price for the journal per year.
Y : Number of libraries subscribing to the journal.
C : Total number of times papers in the journal were cited in 1998.
A : Age of the journal.
N : Number of pages in the journal in 1998.
S : Binary variable (dummy); 1 if non-profit (scholarly society), 0 otherwise.
It is rare for an article in an economics journal to be as explicit as Bergstrom's in its
policy recommendations aimed at reducing the profits of economic agents, but
Bergstrom clearly has a dual role: as a disinterested analyst, and as an academic
economist with an economic interest. In his section “What can we do”, Bergstrom
suggests: (i) to expand the much cheaper and also generally better non-profit journals
owned by professional societies; (ii) to support new electronic journals; and (iii) to
punish overpriced journals by cancelling library subscriptions, defecting from editorial
boards, not sending good papers to these journals, and refusing to referee papers for
them.

¹ Bergstrom, T.C. 2000. Free Labor for Costly Journals? Journal of Economic Perspectives. 15: 183-198.
(a) In Figure 1 in the appendix, price P is plotted against the number of pages N.
The circles represent non-profit journals. Comment on the graph.
(b) Figure 1 does not show a relationship between P and N that agrees well with
the classical assumptions behind OLS. Why? Explain from the figure why
$LP = \ln(P)$ might be close to linear in $LN = \ln(N)$, and why the classical
assumptions might be better satisfied on this log-log scale. Use L as a prefix to
denote logged variables throughout.
(c) A matrix of pair-wise scatter plots for logged variables is given in Figure 2 for
non-profit journals, and in Figure 3 for privately published journals. Regarding
LP as the response variable, how does this variable seem to respond to the
other variables? You might comment further on the plots, but be brief.
(d) Consider the regression
$$LP = \beta_1 + \beta_2 S + \beta_3 LN + u,$$
where u is a stochastic error term, and S is the dummy variable defined in the
variable list above. The OLS results for this regression are given in Table 1 in the
appendix. Explain what is meant by R-squared and Adj R-squared. What are the
interpretations of $\beta_2$ and $\beta_3$, respectively?
(e) A more general model to consider is
$$LP = \beta_1 + \beta_2 S + \beta_3 LN + \beta_4 LA + \beta_5 LC + \beta_6 (S \cdot LN) + \beta_7 (S \cdot LA) + \beta_8 (S \cdot LC) + \beta_9 LN^2 + u.$$
Would you interpret $\beta_2$ differently for this model than for the model in (d)?
The OLS results for this model are given in Table 2, where $SLN = S \cdot LN$ etc.,
and $LN2 = LN^2$. Calculate a 95% confidence interval for $\beta_3$. What is your
point estimate of the elasticity $e = \partial \ln(P) / \partial \ln(N)$ for private journals with
the median number of pages, $LN = 6.54$? What is the estimated elasticity for a
non-profit journal of the same size?
(f) A rationale for introducing the interaction term $S \cdot LC$ is that private journals
maximize profit, and the more cited a journal is, the more valuable it is.
Comment on the estimated signs of $\beta_5$ and $\beta_8$. Discuss also the estimated signs
of the other coefficients.
(g) A third model is obtained by reducing model (e) to
$$LP = \beta_1 + \beta_3 LN + \beta_4 LA + \beta_5 LC + \beta_8 (S \cdot LC) + \beta_9 LN^2 + u.$$
The results by OLS are given in Table 3. Which of the three models considered
so far would you prefer? Discuss and test!
(h) Table 4 gives the variance inflation factors for model (g). What do these
numbers tell you? Suggest a change of variables that will reduce the unwanted
effects of large inflation factors, but without changing the essence of model (g).
(i) Return to Bergstrom’s paper. Do you agree that private journals are
overpriced? Based on your preferred model, describe the pricing policy and profit
generation in private journals.
(j) Are economists in academia loyal to their non-profit journals in the sense that
university libraries are more prone to subscribe to a journal published by a
scholarly society when everything else is equal? To address this question, the
following model is considered:
$$LY = \beta_1 + \beta_2 S + \beta_3 LP + \beta_4 LN + \beta_5 LA + \beta_6 LC + u.$$
The OLS results for this model are given in Table 5. Discuss the issue raised.
Note that the supplier side in the journal market is a mixed bag. Non-profit
journals are generally priced according to real production cost, with the hard
work of editing and refereeing done on a no-pay basis. These journals are thus
priced with little regard to what could have been their market price.
(k) Inspecting the empirical residuals from model (j), a pattern is noted. The pattern
seems to be
$$E[\ln u^2] = 2.3 - 0.82\,LA - 0.42\,LC.$$
This formula is obtained by regression. Several regression models were
attempted to find a reasonable model. Explain why this finding indicates
heteroscedasticity. How can the formula be used to construct weights for a
weighted regression? The results from such a weighted regression are given in
Table 6. Discuss the pros and cons of using this particular weighted regression
rather than OLS. Which of the 95% confidence intervals for $\beta_2$ given in
Table 5 and Table 6, respectively, would you prefer?
(l) Our data consists of 180 journals in economics. This is pretty much the
collection of academic journals in this field that use the English language. This
collection is thus not a random sample from some existing population. Explain
the statistical meaning of a confidence interval, say the one in point (e), and discuss
the difficulties involved in this interpretation, since we do not sample in any
simple sense.
APPENDIX
(Output based on Stata)
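[Figure 1: scatter plot of price (axis labels 20 and 2120) against number of pages N (axis labels 167 and 2632), with separate markers for price_NonProfit and price_Private.]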
Figure 1. Price by number of pages for non-profit and private journals.
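As a rough illustration of the log-log idea in question (b), the sketch below assumes a hypothetical power law P = a * N^b. The values of a and b are illustrative assumptions, not estimates from the journal data. Under that assumption ln(P) = ln(a) + b ln(N), so the slope between any two points is constant on the log-log scale, while it varies on the raw scale.

```python
import math

# Hypothetical power-law parameters (illustrative assumptions, not estimates).
a, b = 0.5, 0.8

def price(n_pages):
    """Price implied by the assumed power law P = a * N**b."""
    return a * n_pages ** b

# Page counts spanning the range marked on the N-axis of Figure 1.
pages = [167, 500, 1000, 2632]

# On the raw scale the slope between neighbouring points changes ...
raw_slopes = [
    (price(n2) - price(n1)) / (n2 - n1)
    for n1, n2 in zip(pages, pages[1:])
]

# ... while on the log-log scale it is constant and equal to b.
log_slopes = [
    (math.log(price(n2)) - math.log(price(n1))) / (math.log(n2) - math.log(n1))
    for n1, n2 in zip(pages, pages[1:])
]

print(raw_slopes)
print(log_slopes)
```

A multiplicative error on the raw scale would likewise become an additive error after taking logs, which is one reason the classical assumptions may fit better there.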
[Figure 2: 4 x 4 scatter-plot matrix of LP, LA, LC and LN, in that order along the diagonal. Axis ranges: LP 4 to 7, LA 2.5 to 4.5, LC 5 to 9, LN 5 to 8.]
Figure 2. Scatter plots for logged variables. The plot for LC (on the y-axis)
versus LN is, for example, found in row 3 and column 4. Non-profit journals.
[Figure 3: 4 x 4 scatter-plot matrix of LP, LA, LC and LN, in that order along the diagonal. Axis ranges: LP 2 to 8, LA 2 to 5, LC 2 to 10, LN 5 to 8.]
Figure 3. Scatter plots for logged variables. Private journals.
      Source |          SS    df          MS     Number of obs =     180
-------------+------------------------------     F(  2,   177) =   27.34
       Model |  36.8357611     2  18.4178806     Prob > F      =  0.0000
    Residual |  119.232662   177  .673630857     R-squared     =  0.2360
-------------+------------------------------     Adj R-squared =  0.2274
       Total |  156.068423   179  .871890631     Root MSE      =  .82075
------------------------------------------------------------------------------
    price LP |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   society S |  -1.183172   .2207267    -5.36   0.000    -1.618767   -.7475775
    pages LN |    .812689   .1315476     6.18   0.000     .5530854    1.072293
       _cons |   .3748265   .8661977     0.43   0.666    -1.334578    2.084231
------------------------------------------------------------------------------
Table 1. Regression results for model (d). OLS. Stata output with variable text added
(price as a reminder that LP is ln(price), etc.).
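The R-squared figures in Table 1 can be reproduced from the sums of squares in the same output. The sketch below assumes the usual definitions, R-squared = 1 - RSS/TSS and Adj R-squared = 1 - (RSS/df_resid)/(TSS/df_total). The variable names are chosen here for illustration.

```python
# Sums of squares and degrees of freedom copied from the Table 1 output.
rss, df_resid = 119.232662, 177   # Residual row
tss, df_total = 156.068423, 179   # Total row

# R-squared: share of the total variation in LP explained by the model.
r_squared = 1 - rss / tss

# Adjusted R-squared: same idea, but each sum of squares is divided by its
# degrees of freedom, which penalises additional regressors.
adj_r_squared = 1 - (rss / df_resid) / (tss / df_total)

print(round(r_squared, 4), round(adj_r_squared, 4))
```

Note that the two mean squares used in the adjusted version are exactly the MS entries in the Residual and Total rows.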
      Source |          SS    df          MS     Number of obs =     180
-------------+------------------------------     F(  8,   171) =   12.50
       Model |  57.5912891     8  7.19891114     Prob > F      =  0.0000
    Residual |  98.4771338   171  .575889671     R-squared     =  0.3690
-------------+------------------------------     Adj R-squared =  0.3395
       Total |  156.068423   179  .871890631     Root MSE      =  .75887
------------------------------------------------------------------------------
    price LP |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   society S |   2.866088   2.946723     0.97   0.332    -2.950549    8.682725
    pages LN |  -4.143584   2.472981    -1.68   0.096
      age LA |  -.4891446   .1040427    -4.70   0.000     -.694518   -.2837712
citations LC |  -.0161615    .064683    -0.25   0.803    -.1438415    .1115185
         SLN |  -.4852809   .4685923    -1.04   0.302    -1.410251    .4396893
         SLC |  -.1073762   .2245599    -0.48   0.633    -.5506426    .3358902
         SLA |   .0218138   .3993713     0.05   0.957    -.7665187    .8101464
         LN2 |   .3910415   .1870774     2.09   0.038     .0217631    .7603198
       _cons |   17.69035   8.153315     2.17   0.031     1.596248    33.78446
------------------------------------------------------------------------------
Table 2. Regression results for model (e). OLS.
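One possible way to approach the calculations asked for in question (e) is sketched below. It makes two assumptions: the t critical value is backed out from the reported interval of the LN2 row, since all rows share the same 171 degrees of freedom, and the elasticity is taken as the derivative of the fitted model (e) with respect to LN. The function and variable names are illustrative.

```python
# Estimates copied from Table 2 (model (e)).
b_ln, se_ln = -4.143584, 2.472981      # coefficient and std. error on LN
b_ln2, se_ln2 = 0.3910415, 0.1870774   # coefficient and std. error on LN2
b_sln = -0.4852809                     # coefficient on SLN = S * LN
ln2_upper = 0.7603198                  # reported upper 95% limit for LN2

# Back out the t critical value (171 df) that Stata used for the LN2 row.
t_crit = (ln2_upper - b_ln2) / se_ln2

# 95% confidence interval for the LN coefficient, built the same way.
ci_ln = (b_ln - t_crit * se_ln, b_ln + t_crit * se_ln)

def elasticity(ln_pages, society):
    """d LP / d LN = b_LN + 2 * b_LN2 * LN + b_SLN * S under model (e)."""
    return b_ln + 2 * b_ln2 * ln_pages + b_sln * society

median_ln = 6.54
print(t_crit, ci_ln)
print(elasticity(median_ln, 0), elasticity(median_ln, 1))
```

The elasticity depends on LN because of the squared term, so the coefficient on LN alone is not the elasticity in this model.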
      Source |          SS    df          MS     Number of obs =     180
-------------+------------------------------     F(  5,   174) =   19.89
       Model |  56.7578429     5  11.3515686     Prob > F      =  0.0000
    Residual |  99.3105799   174  .570750459     R-squared     =  0.3637
-------------+------------------------------     Adj R-squared =  0.3454
       Total |  156.068423   179  .871890631     Root MSE      =  .75548
------------------------------------------------------------------------------
    price LP |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    pages LN |  -3.750736   2.416129    -1.55   0.122     -8.51943    1.017957
      age LA |  -.4836628   .0991712    -4.88   0.000    -.6793961   -.2879295
citations LC |  -.0071069   .0617511    -0.12   0.909    -.1289846    .1147708
         SLC |  -.1690801   .0313662    -5.39   0.000    -.2309874   -.1071729
         LN2 |   .3557514   .1820416     1.95   0.052    -.0035425    .7150454
       _cons |   16.57367   7.994926     2.07   0.040     .7941495    32.35318
------------------------------------------------------------------------------
Table 3. Regression results for model (g). OLS.
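Since models (d) and (g) are both nested in model (e), one way to test them, as question (g) asks, is the standard F test based on residual sums of squares. The sketch below assumes the usual formula F = ((RSS_r - RSS_u)/q) / (RSS_u/df_u). The helper name `nested_f` is made up for this illustration.

```python
def nested_f(rss_restricted, rss_unrestricted, n_restrictions, df_unrestricted):
    """F statistic for testing a restricted model against a nesting model."""
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / df_unrestricted
    return numerator / denominator

# Residual sums of squares from Tables 1, 2 and 3.
rss_d, rss_e, rss_g = 119.232662, 98.4771338, 99.3105799
df_e = 171  # residual degrees of freedom of the general model (e)

# Model (g) drops S, S*LN and S*LA from model (e): 3 restrictions.
f_g_vs_e = nested_f(rss_g, rss_e, 3, df_e)

# Model (d) drops LA, LC, S*LN, S*LA, S*LC and LN^2 from model (e): 6 restrictions.
f_d_vs_e = nested_f(rss_d, rss_e, 6, df_e)

print(f_g_vs_e, f_d_vs_e)
```

The two statistics would then be compared with critical values of the F(3, 171) and F(6, 171) distributions from standard tables.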
    Variable |       VIF       1/VIF
-------------+----------------------
         LN2 |    423.92    0.002359
          LN |    419.79    0.002382
          LC |      2.11    0.474096
          LA |      1.24    0.803837
         SLC |      1.21    0.823833
-------------+----------------------
    Mean VIF |    169.65
Table 4. Variance inflation factors for model (g).
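A common remedy for collinearity between a variable and its square is to centre the variable before squaring it. The sketch below illustrates the effect on hypothetical LN values (an evenly spaced grid, not the exam data), using the two-regressor identity VIF = 1/(1 - r^2). The helper `correlation` is written out here only to keep the example self-contained.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical LN values on an even grid (illustrative, not the exam data).
ln_pages = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0]
mean_ln = sum(ln_pages) / len(ln_pages)

# Raw specification: LN and LN^2 are almost perfectly correlated.
r_raw = correlation(ln_pages, [v ** 2 for v in ln_pages])

# Centred specification: (LN - mean) and (LN - mean)^2.
centred = [v - mean_ln for v in ln_pages]
r_centred = correlation(centred, [c ** 2 for c in centred])

# With only these two regressors, VIF = 1 / (1 - r^2).
vif_raw = 1 / (1 - r_raw ** 2)
vif_centred = 1 / (1 - r_centred ** 2)

print(r_raw, vif_raw)
print(r_centred, vif_centred)
```

On a perfectly symmetric grid the centred correlation vanishes. With real, skewed data it would typically be reduced rather than eliminated. The fitted curve and the coefficient on the squared term are unchanged by centring, which is why the essence of the model is preserved.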
      Source |          SS    df          MS     Number of obs =     180
-------------+------------------------------     F(  5,   174) =   56.40
       Model |  139.755649     5  27.9511297     Prob > F      =  0.0000
    Residual |   86.234402   174  .495600012     R-squared     =  0.6184
-------------+------------------------------     Adj R-squared =  0.6074
       Total |  225.990051   179  1.26251425     Root MSE      =  .70399
----------------------------------------------------------------------------------
subscriptions LY |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
       society S |  -.2190234   .2061648    -1.06   0.290    -.6259291    .1878823
        price LP |  -.4394106   .0695569    -6.32   0.000    -.5766944   -.3021268
        pages LN |   .3482928   .1591566     2.19   0.030     .0341669    .6624188
          age LA |   .4272673   .0983372     4.34   0.000     .2331801    .6213546
    citations LC |   .4110117   .0571062     7.20   0.000     .2983018    .5237217
           _cons |   1.209037   .8558679     1.41   0.160    -.4801824    2.898256
----------------------------------------------------------------------------------
Table 5. OLS results for model (j).
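For the loyalty question in (j), the coefficient on S can be read as a relative difference in subscriptions. The sketch below assumes the standard log-linear interpretation, under which a dummy coefficient b corresponds to a factor exp(b) on the level of the response, and it recomputes the reported t statistic as a check. The variable names are illustrative.

```python
import math

# Coefficient and standard error on the society dummy S from Table 5.
b_s, se_s = -0.2190234, 0.2061648

# t statistic for H0: the coefficient on S is zero (reported as -1.06).
t_stat = b_s / se_s

# In a log-linear model a dummy coefficient b corresponds to a relative
# difference of exp(b) - 1 in the level of the response, other things equal.
relative_difference = math.exp(b_s) - 1

print(round(t_stat, 2), round(relative_difference, 3))
```

The point estimate is negative, but the reported confidence interval for S includes zero, so the sign by itself should be read with care.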
. regress LY S LP LN LA LC [weight=w]
(analytic weights assumed)
(sum of wgt is 3.1952e+03)
      Source |          SS    df          MS     Number of obs =     180
-------------+------------------------------     F(  5,   174) =   84.20
       Model |  157.285638     5  31.4571277     Prob > F      =  0.0000
    Residual |  65.0093556   174  .373616986     R-squared     =  0.7076
-------------+------------------------------     Adj R-squared =  0.6992
       Total |  222.294994   179  1.24187147     Root MSE      =  .61124
----------------------------------------------------------------------------------
subscriptions LY |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
       society S |  -.1202159   .1696996    -0.71   0.480    -.4551505    .2147186
        price LP |  -.4362878   .0587879    -7.42   0.000     -.552317   -.3202587
        pages LN |   .3057825   .1365795     2.24   0.026     .0362167    .5753484
          age LA |   .5066978    .095116     5.33   0.000     .3189682    .6944274
    citations LC |   .4090865   .0528861     7.74   0.000     .3047057    .5134673
           _cons |   1.212265   .7035282     1.72   0.087    -.1762821    2.600813
----------------------------------------------------------------------------------
Table 6. Weighted regression results for model (j).
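A minimal sketch of how weights such as w in the command above might be constructed from the fitted log-variance formula in question (k). It assumes that both slope terms enter with a negative sign and that the weight is the reciprocal of the estimated variance, which matches the definition of Stata's analytic weights as inversely proportional to the observation variance. The function names and the two example journals are hypothetical.

```python
import math

# Coefficients of the fitted log-variance formula from question (k).
# Assumption: both slope terms enter with a negative sign.
CONST, B_LA, B_LC = 2.3, -0.82, -0.42

def fitted_variance(la, lc):
    """Estimated error variance: exp of the fitted value of ln(u^2)."""
    return math.exp(CONST + B_LA * la + B_LC * lc)

def weight(la, lc):
    """Analytic weight, inversely proportional to the estimated variance."""
    return 1.0 / fitted_variance(la, lc)

# Two hypothetical journals (illustrative values of LA and LC).
young_rarely_cited = weight(la=2.5, lc=4.0)
old_often_cited = weight(la=4.5, lc=8.0)

print(young_rarely_cited, old_often_cited)
```

Under these assumptions older and more cited journals get larger weights, because their errors are estimated to be less variable.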