Joint Funding Bodies’ Review of Research Assessment
Response from the Humanities Division – University of Oxford
(Written by Ralph Walker and endorsed by the Divisional Board)
In the hope of aiding comparison with the responses from different groups within the
University, I shall follow, as far as possible, the order of paragraphs in Annex B of the
“Invitation to Contribute”.
7. (a) Should the assessments be prospective, retrospective or a combination of the two?
A combination of the two. It would be pointless to reward past activity if that activity were
expected to cease. It would also be unwise to base an assessment on promise alone.
(b) What objective data should assessors consider?
This is difficult to answer without seeming unhelpful, so perhaps it is best to give the
impression of unhelpfulness at the beginning, and to say that the very conception of objective
data is a difficult one. For the purposes of a certain sort of experiment, in which the range of
possible results is quite clear, one can think of the experimental data as straightforwardly
objective, but when the questions themselves are not completely clear the notion of objective
data can become unhelpful and misleading. It is not that the data lack objectivity in
themselves, but that the principles for selecting them are liable to carry with them
assumptions and expectations that may not be appropriate. The types of quantitative data that
are normally suggested in this context are open to this objection. The judgement involved in
choosing, as an indicator of research quality, any one or any weighted group of these factors
is thoroughly contestable (and, of course, in each case it has been thoroughly contested). It is
a mistake to think that by collecting quantitative data one can achieve an objectivity that will
turn aside the accusation that judgement is involved, for in matters like these judgement must
always be involved.
The assessors should determine what quantitative data they require, and should treat them
with considerable caution; and if they are indeed experts that is what they will do. No doubt
they would be interested in volume of publication, for instance, and they might also be
interested in bibliometric measures, but they would handle the data about these in full
knowledge of how misleading they can be.
(c) At what level should assessments be made – individuals, groups, departments,
research institutes, or higher education institutions?
Assessment at the level of departments has become generally accepted both here and in the
States; people find the information useful at that level. Sensitivity to cross-departmental
activity must of course be built in.
(d) Is there an alternative to organising the assessment around subjects or thematic
areas? If this is unavoidable, roughly how many should there be?
It is not clear what alternative would be equally helpful. We have some reservations about the
exact Units of Assessment used by the 2001 RAE, but we do think the number is about right.
(e) What are the major strengths and weaknesses of this approach?
No approach is infallible, nor is anyone’s judgement. But if you want to get a sound
assessment it makes sense to ask the experts. The risk, of course, is that you fail to choose
your experts properly, and choose instead a team of yesterday’s men. This risk, however,
should be a lot easier to avoid than the risks attendant on the other approaches.
It is not obvious, however, that the way to use the experts is to set up RAE-like teams of
people who assess everyone’s work at longish intervals. An obvious alternative is to ask the
experts in each department at each university to give an annual rating of what they think of all
the other departments in the same subject at other institutions of the same general type (where
research-orientated universities might constitute a type). The main disadvantage of this
approach is of course that it does not involve a massive waste of public money, and that has
so far proved to be a decisive objection.
10. (a) Is it, in principle, acceptable to assess research entirely on the basis of metrics?
No. See 7 (b) above. However it is slightly puzzling to see included in the list in paragraph 8
“measures of reputation based on surveys”. If what is meant here is the sort of assessment that
is widely relied on in the United States, yielding the Peer Assessment Scores given by US
News and World Report or Leiter’s Gourmet Index for Philosophy, then it does seem a
promising method to adopt, but exactly because it is not simply “automatic”, and does “leave
room for subjective assessment”. The system works by collecting a wide range of
assessments, which (being made by people) are presumably “subjective”, but it does not
collect assessments from everybody: it collects them from people who are supposed to know
what they are talking about (and deciding who those are requires judgement again).
As it is operated in the States this system is not perfect, but it provides a model that could be
refined and developed. It is unfortunate that US News itself gets carried away by the search
for a spurious objectivity, supposedly to be gained by producing overall institutional scores
that take into account a variety of unsuitable quantitative measures as well as the Peer
Assessment Scores. However that is largely because US News is trying to do a variety of other
things besides assessing research, so as to provide guidance for various kinds of student about
what to expect.
In any assessment that works along these lines, much depends on the proper selection of
experts. They must genuinely be expert, and their judgement must be based on a familiarity
with the work of the departments they are assessing, including the work of younger scholars.
This however is a requirement in any system that is to be effective.
(b) What metrics are available?
Setting aside “measures of reputation based on surveys”, the list of metrics in paragraph 8
might make some sense for certain Science subjects but certainly does not for the Humanities.
External research income is no sort of guide to our research activity (despite the insistence of
certain bodies in this University on describing it as such). It is not even an inverse reflection
of research activity. It is presumably not necessary to rehearse the objections to counting
publications, and it is no doubt boring to reiterate that the nature of “publication” is changing.
But it may be worth saying that it is changing at different rates and in different ways in
different subjects. Publication is a matter of dissemination of ideas to the learned world, and
in certain Humanities subjects (but not equally in all) this is now largely done by putting the
latest cutting-edge material on the web. Material published in traditional form may be out of
date by the time it is printed, so that it is the latest web version that matters. What traditional
publishers can certainly claim is that they provide a way in which people who are relatively
unknown can establish themselves. However this can be done in other ways as well – for
example by successful presentations at major conferences. Traditional publication, never
much of a guide to research quality, is becoming less and less significant except in so far as it
has been given an artificial importance by exercises like the RAE.
(c) Can the available metrics be combined to provide an accurate picture of the location
of research strength?
No. See 7 (b) and 10 (a) above.
(d) If funding were tied to the available metrics, what effects would this have upon
behaviour? Would the metrics themselves continue to be reliable?
If external research income were the metric, it would stimulate the effort to obtain research
grants, as is already happening here because of the RAM. Up to a certain point this is of value
in most subjects, but a substantial proportion of the best research in the Humanities is always
going to be done in a manner largely independent of research grants. Moreover, if external
research income were used as a basis for differential funding of subjects, the Humanities
would suffer because they can never attract grants of the size needed in Medicine and the Sciences.
The use of publications as a metric would further encourage the present drive to churn out
material, inundating the world with vastly more articles and books than are needed, most of
them making little real contribution because they are put together too hurriedly in order to meet the
demands of the RAE. They do, however, quote one another assiduously, thus rendering
worthless any attempt at assessment through citations. It is with bibliometric measures in
particular that adopting a measure undermines whatever reliability it might otherwise have
been thought to have.
It may be the case in the Sciences, or in some Sciences, that the number of research students
is a reliable guide to the research that is being done. The relationship between research
students and supervisors is entirely different in the Humanities, and counting research
students would be of very little value.
I imagine that “measures of financial sustainability” is included as a joke. By that standard,
the fact that all UK universities are practically bankrupt would presumably establish that all
their research was worthless.
(e) What are the major strengths and weaknesses of this approach?
If we leave out the American use of surveys, which seems cheap, effective, and as reliable as
any other method we are likely to think of, I do not think this approach has any strengths.
Some would claim that its strength lies in its objectivity, but that objectivity is spurious, as I
have pointed out. For its weaknesses, see above.
I can see so little in favour of this approach that I shall not attempt the questions individually.
Of course the key is question (d), “How might we credibly validate institutions’ own
assessment of their own work?” The last RAE asked for honesty in self-assessments; such
honesty is not very conspicuous. Self-assessments advantage those who are best at spinning,
because even the most expert of experts will use them as a starting point, and a cleverly
designed self-assessment can be used to cover up much.
16. (a) Is it acceptable to employ a system that effectively acknowledges that the distribution
of research strength is likely to change very slowly?
If the question is put in this way a lot of people will answer No. However changes in research
strength are, on the whole, slow, and this approach would provide a degree of stability and
would make possible meaningful planning – something of a shock, no doubt, since we have
become so accustomed to meaningless planning, but we should get used to it.
If one relied on the assessment of reputation by the use of surveys of the experts in similar
departments in other institutions, one would find that these assessments shifted slowly too. It
is a potential objection to that method that they tend to shift a little too slowly; a department’s
reputation rises or falls slightly later than the actual quality of its work does. However no
method will deliver perfect results all the time, and since this particular defect does have the
advantage of allowing stability, it is a merit as well as a flaw.
This answers questions (b), (c) and (e).
(d) What would be the likely effects upon behaviour?
People could get back to doing research and disseminating it to the academic community,
instead of publishing prematurely and wasting time on RAE forms.
17. (a) What should/could an assessment of the research base be used for?
To promote research (through the funding mechanisms), and through this also to promote two
other goals: the training of those who will carry on research in future, and the education of
those who may not carry out research themselves, but who may gain in knowledge and
understanding by coming to see how research is done and what the point of it is.
It cannot be the responsibility of those carrying out an assessment if others use it for purposes
for which it was not intended. If a research assessment is designed to do something different
from assessing research, on the grounds that somebody is going to do something different
with it, the whole thing becomes hopelessly muddled.
(b) How often should research be assessed?
If we adopted some variant of the American system it could be assessed annually, because the
amount of effort would be so slight; though it is not clear that there would be any value in so
frequent an assessment. However, given the British enthusiasm for bureaucracy we need the
assessment to be as infrequent as is compatible with being fair (or as fair as so bureaucratic a
system allows), in order to give some people some time to do some research between bouts of
form-filling. At the moment it looks as though once every six years is about the best we’re
going to get, though perhaps we could suggest once every seven.
(c) What is excellence in research?
I should say that good research is research that enhances our understanding of things in some
original and significant way. However it is true that the Research Councils and the AHRB
have so extended their use of the word “research” that it has no single meaning at all. If the
word has to be used in their way no single answer can be given.
(d) Should research assessment determine the proportion of the available funding
directed towards each subject?
Yes. The standards should be international, and they should be academic. If the national
interest requires that particular avenues should be pursued for economic reasons, then they
should certainly be pursued, but we should be clear that this is an entirely different
consideration from whether the research concerned is excellent. Otherwise we just get
It should however be borne in mind that the value of research and the value of teaching
cannot clearly be separated. It is through being taught by those at the cutting edge of research,
and by being introduced in this way to new problems and new possibilities, that the ablest
minds can best be trained. Only through sustaining first-class research can the best teaching
(e) Should each institution be assessed in the same way?
To compare very different institutions in the same way is both unfair and unhelpful to those
who are least able to compete effectively. The middle position (providing a ladder of
improvement) seems to be the sensible one.
(f) Should each subject or group of subjects be assessed in the same way?
We should rely upon the experts in each subject to determine what the appropriate assessment
is. Who else is as well qualified to do it? There were considerable differences between the
requirements laid down by the Humanities panels in the 2001 RAE, and on reflection they all
seemed to be perfectly comprehensible given the different natures of the subjects involved –
though it might have been better if the divergences had been greater. At the same time, there
does need to be proper comparability between the results of subject groups. In 2001 different
Panels awarded markedly different proportions of 5s and 5*s, and these did not appear to be
justifiable by reference to the international standing of the UK in the subjects concerned.
(g) How much discretion should institutions have in putting together their submissions?
If we adopted some version of the American system this question would not arise. Supposing
we do not, I am puzzled by the thought that a more rigid system, or one in which submissions
were made by individuals or smaller groups, would “provide more objective results”. It is
clear that we have in some cases (e.g. German) suffered from too much control of RAE
entries by individuals or small groups, who have presented things badly. I can see that a rigid
external system might enable people to draw up tables in which numbers were put in boxes
for each department and institution, but in what way is that more “objective”? The standard,
and very very usual, problem with that kind of assessment is that it misses all the fine points
and produces an outcome that is entirely unfair – it’s the method of the League Tables.
(h) How can a research assessment process be designed to support equality of treatment
for all groups of staff in Higher Education?
Any assessment must depend upon judgement, and judgement is made by people. Hence there
is no way to guarantee absolutely that a particular assessment will not reinforce a culture or a
stereotype. What one can do is to get as close as one can to this assurance, by fairly selecting
those experts who carry the assessment out. I do not wish to defend the past research
assessment processes in general; the unfairnesses they produced, however, had nothing to do
with gender bias, racial bias, or anything of that sort. They were unfair because they favoured
institutions that put a great deal of time, effort, and intelligent guesswork into designing their
submissions in such a way that they met the requirements of the Review Body. It was
particularly unfair, for example, that nobody was told how much it would matter (either in
1996 or in 2001) whether a Unit entered itself as B rather than A. In 1996 we guessed that it
wouldn’t matter much, and we did better than Cambridge as a result. In 2001 we guessed (in
some areas) that it would matter considerably, and as a result Cambridge did better than us.
An “assessment” that produces results of this kind is unjust and unreasonable. It does not,
however, show any of the kinds of bias referred to in the paper.
(i) Priorities: what are the most important features of an assessment process?
Asked to select three from the list supplied, I should say: not burdensome, fair, minimally
expensive. I take it however that the first of these entails that it be administratively efficient,
which is another item on the list, and itself entails that it be flexible, which is also on the list.
The second entails two further items – to be fair it must be rigorous and of course resistant to
Have we missed anything?
The American method should, at the very least, be closely examined to see whether an
effective British variant can be devised.