Perspectives on Assessment
Tony Gardner-Medwin, UCL
These are personal perspectives on a number of assessment issues, as I see them at
UCL in 2006. Assessment must not be something divorced from teaching and learning.
Firstly, summative assessment obviously does much to determine the ways in which
students study and revise, especially given the traditional UK HE scenario where a
degree class is seen as the main 'deliverable' of a university education, to use current
jargon. But more profoundly, the lasting objective of HE ought to be - in my view and in
that of an increasing number of teachers - an ability to be critical and to self-assess one's
own work in a chosen field -- at least as much as the acquisition of specific knowledge
and skills. This requires that assessment, reflection, peer interaction, self-criticism and
judgement of the limits of one's knowledge all be part of the student learning experience.
1. Learning oriented assessment
LOA (Learning Oriented Assessment) is new jargon, to some extent supplanting the word
"formative" for assessment that helps students study and learn, as opposed to
"summative" assessment that does not. The term helps to divorce such assessment from
mere preparation for exams (summative assessment). LOA is indeed a concept that
applies to exams too, since the way exams operate affects the way students learn, and
exams should of course be planned with this in mind. I like the term. It stresses the principal role of
assessment as something to stimulate learning. Assessment that simply ranks or
classifies students is increasingly regarded as a marginal (possibly unnecessary and
even stultifying) role of universities. Most assessment effort should aim to directly
stimulate effective learning and enhance ability to judge one's own work.
2. Marking criteria, defined outcomes and alignment with course objectives
Quality audits tend to stress these. I regard 'marking schemes' for written work (in the
form of lists of points that should be included) as very hazardous, and the notion that they
allow non-experts to mark fairly as nonsense. A more sensible strategy may be to make
different assessment items target distinct circumscribed skills: e.g. "This Q will be marked
on how well you apply principles to an unfamiliar problem"; "In this answer you must be
concise and logical, and you must be careful not to introduce irrelevant information.";
"This Q asks for more than core textbook knowledge, and you may include speculative
ideas but you must ensure that you identify and distinguish points that you understand
and are sure of, from those where you are unsure."
3. Degree of choice in examinations
This tends to be an uncontrolled facet of exams and in my experience is often so great as
to encourage students to take the risk of ignoring large parts of a course. Of course
exams often want to test deep knowledge, which may be realistically achievable for only
a fraction of the course material. It is fine for a course explicitly to encourage optional
specialisation if appropriate, but this should be clear in the course objectives, not just in
the structure of the exam. Where breadth of knowledge is also important, this can often
be tested economically alongside the specialised Qs with a form of 'cheap' assessment
(e.g. compulsory True/False or MCQ Qs) that will detect undue selectivity in student
learning. Assessing breadth by marking a large number of essays is extravagant and unnecessary.
4. Problems in marking large classes
I personally can't mark more than about 30 essays on the same topic in a day. Others
may do better! But this shows the problem of handling classes of 300+. The only solution
is to reduce the amount of regurgitative written work expected. (I know we don't ask them
to regurgitate but they do, and this will generally achieve a pass mark, if not a first). One
way to cope is to use computer-based or Optical Mark Reader assessment (below);
another is simply less assessment by staff, perhaps combined with peer assessment
which though less reliable may provide a better learning experience for the students.
Assessment based on more personal, taxing and constructive Qs may be another approach,
one less likely to produce the mind-numbing regurgitation that makes exam marking so difficult.
E.g. perhaps "Describe something about membrane transport that you had trouble
understanding fully at first, why this was, what you now understand, and how you got
there." I think I could mark 100 of those in a day!
5. Plagiarism and rote learning
In law I guess, plagiarism is a tort and rote learning simply a strategy; but they are closely
related. They are both communication without evidence of understanding. One strategy
is to ask Qs where the student must use understanding, and thus application or
transformation of learned or sourced material in literature or on the web. Plagiarism is
best prevented by clear criteria of what is expected. For example, I always insist with
project students that each idea or conclusion expressed in a report must either have a
clear source where I can read more about it, or be clearly identified as the student's own
idea in whole or in part. Students are often strangely reluctant to tag their own ideas,
though of course these are often what we are looking for in assessing quality work.
Plagiarism software is available to compare submitted text with databases of likely
sources, but detection is only half the problem for staff, and leads to time-consuming
investigation. Prevention is much preferable and requires that students
(a) understand clearly what is and is not OK,
(b) see detection as possible or probable (simply asking for an e-copy of an essay
that might be submitted to plagiarism detection software may help in this, even if it
is seldom used),
(c) see the consequence of breaking the rules as serious.
Plagiarism and regurgitation are seldom a problem with Oxbridge style tutorials, where
the student will be embarrassed and humiliated if unable to discuss something s/he
claims to have written. A less time-consuming strategy may be a short personal viva on
essay and project work. In my view, and for this reason, a viva should be included in the
assessment of all project work - not as a major quantitative part of the assessment but to
motivate an honest approach by the student and to help reveal qualitative problems that might otherwise go undetected.
6. Efficacy of feedback from assessments
Assessment that doesn't give effective (acted upon) feedback that improves learning or
performance should be seen as largely a waste of time (1 above). Feedback like "you
fail" or "you got 68%, nearly a first" may stimulate learning slightly, but is not worth the
staff time it takes to generate. Students often learn more from giving and getting
peer assessment than from teachers' comments. Software systems exist to help teachers
give clearer comments about work from distinct perspectives. The aim of feedback
should always be to help the student improve in future, rather than to mark or rank them.
7. Use of objective assessment (MCQ, T/F, EMQ, computer marked)
This can relieve teachers of some work. It should not be seen as a substitute for teacher
involvement. It allows teachers to focus on tasks that require teachers, while still ensuring
breadth of learning and providing self-assessment and stimulation of effective study.
Computerised assessments cannot replace small group tutorials, but should allow a
smaller number of tutorials to be more interesting and valuable once the task of teaching,
practising and testing basic skills is removed. In exams likewise, they can help focus and
reduce hand marked content and ensure coverage.
8. Certainty-based marking (CBM) to enhance learning and assessment
Although CBM in various guises has been researched a great deal and found to be both
pedagogically constructive and statistically beneficial in assessments with practised
students, its use at UCL is almost certainly the largest-scale, simplest and most
successful implementation to date. This has been a HEFCE-funded initiative that I have led,
and it has received much interest nationally and internationally in pedagogy & learning
technology circles. I rather despair however that its benefits (as a study tool and a fairer
and more reliable means of objective exam marking) have been little adopted at UCL
outside Phases 1,2 of the medical course. This is not because anyone explicitly
challenges the benefits - rather I suspect sheer inertia. The CBM principle (refer to the
website www.ucl.ac.uk/lapt ) is popular with students, transparent in operation, easy to
implement with links within WebCT or Moodle, and proven as a more reliable and more
valid measure of knowledge (with medical T/F exams) than simple right/wrong marking. If
you use it in summative assessment you must ensure that students are well practised,
but since one of its primary purposes is to stimulate more critical learning and thinking
habits this is obviously sensible.
CBM rewards a student who can distinguish reliable from unreliable knowledge.
This student gets more marks than one who gives the same answers, but cannot identify
which are sound and which uncertain. It motivates and rewards reflection on justifications
for an answer, to the point that either high certainty is merited, or reasons emerge for
reservation and low confidence: both ways the student will gain. It weights confident
answers more than uncertain ones, and penalises confident errors. It is important to
realise that none of this is about personality traits (self-confidence or diffidence).
Evidence shows that there are no gender or ethnic biases in data from our (well practised)
students. CBM addresses a serious problem that arises for many highly selected UCL
students: they are smart enough that they have sailed through exams with little need to
access more than superficial ideas or associations to get good marks. We need to
provide the incentive to think and study more deeply. A university that fails to adopt CBM
is, frankly, in my view, failing its students. If you disagree with this view, I would welcome discussion.
9. Optical Mark Reader Technology for computer- marked questions
Both the UCL Records Office and the Medical School office have Optical Mark Reader
(OMR) machines and cards, for running exams or formative tests. OMR cards are
available with or without Certainty-Based Marking (CBM: 8 above), and for True/False,
Best of 5, or EMQ (Best of 12 or so) question formats. Customised OMR cards can also
be ordered through this office or from Speedwell Computing Services
(www.speedwell.co.uk). You need to number your question paper with question text and
graphics in the same way as the cards you will use. OMR technology has recently
become very reliable and the service is much better and cheaper than the old Senate
House system. Elsa Taddesse ( email@example.com ) in the Records Office currently
runs a very good service. In addition to the Speedwell software, there is very versatile
software at www.ucl.ac.uk/lapt/speedwell/analyse.zip that can enhance analysis with or
without CBM. Relevant tips:
Pencils must be used, with care taken to erase corrections thoroughly.
Barcode labels are the best way to ensure correct student candidate numbers. Otherwise
much time can be wasted through incorrect entry of numbers on a grid, which seems
too hard for some students!
Some cards include a "Don't know" option. Advise students never to use this, since
even slightly informed guesses are usually better than chance - unless you warn them
that you are using a very strong negative marking scheme that penalises guesses on
average to be worse than blanks. If you want to reduce the variance due to guessing,
consider certainty-based marking (8, above).
View the format of OMR cards for Certainty-Based Marking (CBM) at:
http://www.ucl.ac.uk/lapt/author/Speedwell_S2065_A.pdf (side A, 180 T/F Qs)
http://www.ucl.ac.uk/lapt/author/Speedwell_S2065_B.pdf (side B, 135 best of 5 MCQs)
http://www.ucl.ac.uk/lapt/author/Speedwell_S2487.pdf (combined Q types)
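The advice about the "Don't know" option can be made concrete with a little expected-value arithmetic. The marking schemes below are illustrative assumptions, not UCL's; a blank is taken to score 0:

```python
# Expected mark per True/False question for a guess that is right with
# probability p, under illustrative marking schemes. A blank scores 0, so
# guessing beats leaving a blank whenever the expected mark is positive.

def expected(p: float, mark_right: float, mark_wrong: float) -> float:
    """Expected mark for answering, given probability p of being right."""
    return p * mark_right + (1 - p) * mark_wrong

# Pure chance on T/F is p = 0.5; a 'slightly informed' guess might be p = 0.6.
for p in (0.5, 0.6):
    simple   = expected(p, 1, 0)    # right/wrong marking: guessing never loses
    negative = expected(p, 1, -1)   # mild penalty: chance guessing breaks even
    harsh    = expected(p, 1, -2)   # strong penalty: even informed guesses lose
    print(p, simple, negative, harsh)
```

Under the mild (-1) penalty a slightly informed guess still has positive expected value, which is why the advice above is to answer rather than blank; only the strong (-2) scheme makes a p = 0.6 guess worse on average than a blank.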
10. Re-use of questions available for practice
This can be a problem both in written exams and in objective testing. In written exams
(using identical or similar Qs to past papers - especially combined with excessive choice
(3 above)) re-use encourages selective learning and writing without understanding (5
above). In objective tests, as with recent True/False exams in Years 1,2 of the medical
course, it has become a serious problem - the better medical students have complained
vigorously that through sound study they are unable to do as well as others in the class
who have simply memorised banks of questions from past papers. Because of staff
reluctance to generate new Qs, re-use gradually rose to over 50% in 2004. Analysis of
one exam from 2004 revealed an average percentage correct mark that was 5% above a
nominal pass mark of 75% on re-used Qs and 2% below it on new Qs. This is clearly an
important issue for standard setting, and it also has a deleterious effect on students'
learning strategy, encouraging rote-learning and seriously undermining both
care in question design and the use of certainty-based marking.
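The figures quoted for the 2004 exam can be used to estimate how the apparent cohort mark drifts with the re-use fraction. The blending below is my illustration, using only the numbers in the text:

```python
# Effect of question re-use on the apparent cohort mark, using the figures
# quoted for the 2004 exam: nominal pass mark 75%, re-used Qs averaging 80%
# correct (5% above pass), new Qs 73% (2% below). The blend is illustrative.

PASS_MARK = 75.0
REUSED_AVG = PASS_MARK + 5.0   # 80% average on re-used questions
NEW_AVG = PASS_MARK - 2.0      # 73% average on new questions

def cohort_mark(reuse_fraction: float) -> float:
    """Average mark for a paper with the given fraction of re-used Qs."""
    return reuse_fraction * REUSED_AVG + (1 - reuse_fraction) * NEW_AVG

# At the ~50% re-use reached in 2004, the paper sits 3.5 points above what
# an all-new paper would have given the same cohort.
print(cohort_mark(0.5) - NEW_AVG)
```

Even this crude estimate shows why uncontrolled re-use matters for standard setting: the shift is of the same order as the distance between the cohort and the pass mark itself.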
An issue here is the publication of past exam papers (including MCQ or TF Qs).
UCL has had a tradition of doing this. Some institutions try to keep databases secret for
re-use, but this opens them to abuse by student cliques, e.g. students reconstructing
papers by distributed memorisation ('you write down Q37 after the exam') and
maintaining a black market in past papers. Many past medical MCQ papers are available
on LAPT with CBM (8, above) though currently feedback about individual answers is
withheld online to reduce the student temptation to memorise answers and gamble on re-
use (for discussion, see http://www.ucl.ac.uk/lapt/web/comment.php?feedback ). Partial
re-use may be good practice to help check on drift of standards over the years and to
calibrate the quality of Qs, but it is important where there may be access to previously
used Qs to monitor and research the impact of this on exam performance. The risks of
re-use should be balanced against the benefits of such analysis, when employed.
11. Need for diversity in assessments
People have diverse talents and lacunae, and need to be aware of these. Assessment
cannot identify and stimulate diverse talents if it simply provides single scores and
rankings. To develop students you must test, reward and provide feedback for a range of
skills and types of knowledge, demonstrated under both exam and 'authentic' conditions
(18 below). Every course module should identify in its objectives the activities and skills
that it enhances and tests, and these should range widely in different modules.
12. Stimulating learning through student-student interaction
Students often claim to learn more through interaction with each other than from teachers,
computers or books. Though I doubt this is true overall, they probably do learn more per
hour of peer interaction than per hour spent on the alternatives. Problem based learning, group work, peer
assessment (13 below), student presentations, e-tutorials and discussion fora (14) are all
constructive modes. A key issue is an atmosphere of constructive comment and criticism.
The statistical reliability of associated assessments is probably not important. The
experience of assessment (both as assessor and assessed) is highly valid as an educational experience in itself.
13. Peer assessment
I have no experience of this, but people I have heard talk about their experience see it as
hugely beneficial. A missed opportunity is often the occasion of student presentations,
when students should be motivated to think what is good and bad in the work of their
peers. If talks are attended only by staff and by one or two students next in line to talk,
they are a terrible wasted opportunity.
14. Discussion forum assessment
It is hard to get students to overcome inhibitions about raising issues in e-mail or bulletin
board style discussion forums, just as in tutorials. But people who have succeeded in
overcoming this find most students get a lot out of it. Something as crude as awarding a
small element of in-course marks simply for participation, regardless of content, can be
an effective catalyst. Anonymity of contributions is probably not to be encouraged, but it
is as well to offer a route by which it is possible, to help overcome an initial hurdle of
diffidence, or deal with issues where embarrassment could be an issue. Editing of
student postings by staff is a controversial issue. I think it is worthwhile on occasion to
help avoid time-wasting duplication or distracting mistakes, but of course an essential
principle is that any editing should be indicated as such.
15. Key skills assessment
Even crude but focussed assessments for things like IT skills, numeracy, writing skill, oral
presentation, help to make students aware how a portfolio of skills will contribute to
getting them a good job, and how they can actively seek help and practice at things they
are bad at, or actively generate evidence of things they are good at. 'Assessment for jobs'.
16. Portfolio development and assessment
As with artists, architects and designers, students in almost any field should learn the
value of presenting examples of their own skills. Software exists for handling e-portfolios
and incorporating authenticated staff comments, progressive diaries of reflection ('blogs')
and the like in a package. The objective is self-selected evidence of quality and of the
ability to identify quality. I don't know if assessment of portfolios by teachers is often very
successful, but some universities introduce external lay assessors (often alumni) to
discuss and comment on portfolios in what seem very constructive ways.
17. Assessment of self-monitoring ability
The supreme skill in the workplace is often considered to be the ability to know when one
is doing it 'right', as much as or even more than the ability to do it right. A useful perspective is
that assessment strategies have failed if a student cannot eventually tell whether a piece
of work will get a good or bad mark. Some people adopt a 'meta-assessment' - marking a
student's reflection on what in an essay is strong and what weak. Once one reaches the
summit of university formal academic assessment, the PhD viva, this becomes almost
the essence of assessment. But it is a valuable approach much earlier, and is almost
instinctive in face-to-face discussion of a student's work. Don't worry if such assessment
may at first seem inappropriate or even unfair to the student. It's the benefit that counts.
Encourage students to be explicit in both course-work and exam-work when they think
there are doubts or weak points in an argument or observation. In my experience it is
hard to get them to do this, but it will never harm their assessment and will often benefit it.
18. Authentic assessment (related to workplace tasks)
This is most relevant I suppose to explicitly vocational courses. Certainly not all of a
student's assessment should be 'academic' in the sense of having little relevance to a
real job. I think there are good examples of effective authentic assessment in clinical
skills, with OSCEs (objective structured clinical exams) and associated formative peer
assessment. Since many students think about their university education in terms of
preparation for the job market rather than simply developing their intellects, even a
modicum of authenticity in assessment can be motivating.
19. Oral assessment
Vivas are often decried as being unreliable (different assessors give widely different
marks). This is a reason they should not be the only or dominant element in an
assessment for ranking or pass/fail. But an excellent reason to include them is that they
are the strongest motivator to a student to ensure s/he knows what s/he is talking about.
Even the most rank bullshitter is capable of some humiliation in a viva, while a student
who thinks carefully but slowly can shine in a viva. They are also the only kind of
assessment where I at any rate feel I am genuinely in touch with whether the student
knows anything - which brings job satisfaction. It is controversial whether a good viva
should be grounds for passing a student who has failed an exam on the basis of written
work, and certainly the range over which this can happen should be very limited. But
pass/fail vivas are valuable in other ways: for example to help students psychologically
come to terms with the fact that they really don't know very much, and to help identify the
nature of their problem (for example, language comprehension, poor expression,
inclination to guessing, etc.).
20. Identity in assessment (who did the work?)
This is a potential problem with most forms of in-course assessment and distance
assessment (e.g. non-invigilated computer work). It ceases to be a problem though once
assessment is seen by both students and staff as an exercise to stimulate and improve
learning, using a diversity of styles and with marks much less important than feedback.
Of course some assessments must feed into the processes of progression and reward to
individual students. Where course-work may possibly not have been completed by the
student who submits, or where the possibility of more expert help is an issue, a useful
strategy is to pair such assessment, relying on student honesty, with smaller scale
invigilated assessments with related content, so that there is at least the potential to
follow up discrepancies with, for example, oral assessment. Oral assessment aimed at identifying
plagiarism is probably seldom unreliable in the way that it may be for more general
assessment issues (19, above).
21. Move to transcripts and portfolios, away from (just) classification as the degree 'deliverable'
A full transcript of course marks throughout a student's career is more useful and
indicative of talents, consistency and motivation than is just a degree class, e.g. '2.2', to a
prospective employer. Consistent provision of this information along with academic
references will help motivate students to take seriously their performance over all their
courses in every year. Students should appreciate that the fact that something may count
little towards their final degree classification does not mean it is unimportant. It is
sometimes argued that a poor first year performance should be 'forgotten' if later work
shows that a student has learned to do better. This may often be fair (though 1st year
exams tend to test different, not just more elementary skills than 3rd year exams). But a
prospective employer can perfectly well make this decision in the light of whatever the
student may say about it. Portfolios of quality work generated by the student (16, above)
can provide even more useful information to an employer, but these are different from
transcripts in that it will normally be the student's, not the university's responsibility to
authenticate them as true examples of his/her work.
22. Does UCL assess too much?
Yes, too much ranking assessment. No, not enough learning oriented assessment! We
all hate mindless hours of essay marking with ill defined objectives. Assessment time
should above all provide feedback that improves learning.
23. The opportunities presented by electronic assessment
It is widely believed, but nonsense, that e-assessment can only test factual knowledge. There
are good & bad Qs of any type, and each may require factual knowledge and/or logical
skills. E-assessment should relieve teachers from the job of ensuring that students have
studied throughout the range of their subject, and should allow staff to assess qualities
that only staff can judge, like the ability to select, organise and communicate ideas.
24. Overseas students and cultural responses to assessment
Cultural issues (including linguistic and educational background, communication style,
self-confidence, expectations, etc.) demand above all learning-oriented feedback.
Strategies like peer-assessment, group work, certainty-based assessment and computer
self-assessment can help provide this without being staff intensive.
25. Radical approaches to assessment
The most radical is to abolish ranking and marked assessment and develop portfolio
based, peer & self-assessment (see e.g. www.alverno.edu , with Georgina Loacker a well
known advocate). I think UCL wants to be radical in less radical ways. We have a
different clientele. But we need a much more radical approach to implementing good
ideas, both our own and others'.