Why do we Need
Director, York Trials Unit
• In most areas, education, health, criminal
justice, etc, we want to know WHAT or
WHETHER something works.
» Do ‘bootcamps’ reduce criminal behaviour?
» Are teaching volunteers effective?
» Are computers effective at improving literacy?
• Of secondary importance is HOW.
The WHAT question
• The ONLY way we can find out whether
something works or not is by using a
RANDOMISED CONTROLLED TRIAL.
• All other evaluative methods are
INFERIOR ways of answering the WHAT
question and some cannot answer it at all
(e.g., qualitative research).
Structure of Session
• Randomised Controlled Trials ARE the
‘gold-standard’ evaluation method.
» What is wrong with other research methods?
» Why should we do trials?
Before and After Methods
Clinical Practice in the 18th Century
• "It is incident to physicians, I am afraid,
beyond all other men, to mistake
subsequence for consequence."
Samuel Johnson, 1734
• Traditionally most interventions have been
evaluated using a pre-test post-test or before
and after design.
• Participants are tested, treated and then tested
again; any improvements are attributed to the
intervention.
• Currently this is probably the most POPULAR
evaluative method in most fields.
Who uses before and after?
• Policy makers
• Teachers assessing individual children.
• Action researchers.
• We all do.
• Problems include:
» Temporal changes;
» Regression to the mean.
• Self-learning irrespective of teaching
• As children mature they will become better
• Any intervention or treatment is mixed up
with these temporal changes difficult to
Changes in Outcomes
• If we measure outcomes using public
examination results we will see an
improvement. Is this because the
intervention has worked? Or is it because
exams have got easier? Or have children
become more intelligent?
• Without a control group we CANNOT know.
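A minimal sketch of why the control group matters here. All scores are hypothetical, invented only for illustration. The before and after gain mixes any effect of the intervention with everything else that changed over time (easier exams, maturation); subtracting the gain seen in a comparable control group, ideally one formed by randomisation, removes the shared change.

```python
def before_after_gain(pre, post):
    """Gain seen in a single group measured twice."""
    return post - pre


def controlled_estimate(pre_treated, post_treated, pre_control, post_control):
    """Gain in the treated group minus the gain in the control group."""
    return (before_after_gain(pre_treated, post_treated)
            - before_after_gain(pre_control, post_control))


# Hypothetical mean exam scores (assumed numbers, not taken from any study).
treated = {"pre": 50, "post": 58}
control = {"pre": 50, "post": 57}

naive = before_after_gain(treated["pre"], treated["post"])
adjusted = controlled_estimate(treated["pre"], treated["post"],
                               control["pre"], control["post"])

print("before and after gain:", naive)
print("gain relative to control:", adjusted)
```

In this invented example most of the apparent improvement also appears in the untreated group, so it cannot be credited to the intervention.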
Regression to the Mean
• As well as temporal changes, before and
after studies are confounded by a
statistical phenomenon known as
‘regression to (or towards) the mean’.
Regression to the mean
• This is a GROUP phenomenon and occurs
when the group are measured with an
inexact measurement tool and then
remeasured. Those individuals with
‘extreme’ values will have a high
probability of regressing towards the mean
on the second measurement.
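The phenomenon can be reproduced with a small simulation. This is only a sketch under assumed values: each person has a stable true value (mean 100, SD 10) and each measurement adds independent error (SD 10). Nobody is treated between the two measurements. The function name and all parameters are illustrative.

```python
import random
import statistics


def simulate_retest(n=10_000, seed=0):
    """Measure n people twice with an inexact tool and compare extremes."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        true_value = rng.gauss(100, 10)            # stable underlying value
        first = true_value + rng.gauss(0, 10)      # first noisy measurement
        second = true_value + rng.gauss(0, 10)     # second noisy measurement
        people.append((first, second))

    # Rank everyone on the FIRST measurement only.
    people.sort(key=lambda pair: pair[0])
    tenth = n // 10

    def means(group):
        return (statistics.mean(f for f, _ in group),
                statistics.mean(s for _, s in group))

    return {
        "lowest tenth": means(people[:tenth]),
        "highest tenth": means(people[-tenth:]),
        "everyone": means(people),
    }


results = simulate_retest()
for label, (first_mean, second_mean) in results.items():
    print(f"{label}: first = {first_mean:.1f}, second = {second_mean:.1f}")
```

The group selected for low first scores rises on remeasurement and the group selected for high first scores falls, while the mean of the whole group barely moves.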
History of RTM
• Galton’s work from 1869 started to provide
the understanding of the phenomenon.
• By 1886 Galton had described the
phenomenon among the heights of
children and their parents (children of tall
parents tend to be shorter than their parents, and
vice versa – ‘regression to mediocrity’).
Economists and RTM
• “I suspect that the regression fallacy is the
most common fallacy in the statistical
analysis of economic data”
Milton Friedman 1992
Marking Exam Scripts
• For the MSc in Health Sciences there is a system of
double marking: markers are blind to
student identity and the other marker’s mark.
• There is a tendency to disagree with
marks at the extreme of the distribution.
• Explanation: Regression to mean.
RTM and exam scripts
[Figure: plot of exam marks on a 0 to 100 scale; R = 0.7788]
Annual Increase in offences
[Figure: annual increase in offences for 2000, 2001/02 and 2002/03]
Did the Amnesty work?
• Unclear. The year preceding the amnesty
had a large, unexpected increase in
offences, so we would expect through
regression to the mean that in the
following year the rate of increase would
‘regress’ back towards the ‘average’.
• Wheldall selected 40 pupils whose reading
was at least 2 years behind their peers.
• Half were exposed to an intervention.
Wheldall Educational Review 2000;52:29.
Before and after reading
Difference highly statistically significant p < 0.001
Before and after reading
Differences between groups NOT statistically significant
• “the mean gain scores translated to impressive
effect sizes of 0.6.”
• “It could be argued that it is asking too much of
any program to demonstrate enhanced efficacy
on top of such high existing efficacy”
• “…control group gains were largely attributable
to pre-existing …literacy programme..”
• Perhaps, BUT much of the gain will be due to
regression to the mean.
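A sketch of how a design like this can show large gains in both groups even when the intervention does nothing. It is not a reanalysis of Wheldall's data: the pool size, score scale and error SD are assumptions, and the simulated intervention is given a true effect of exactly zero. Forty pupils with the lowest first-test scores are selected and randomly split in half.

```python
import random
import statistics


def simulate_selected_trial(pool_size=1000, n_selected=40, seed=0):
    """Select low scorers, randomise them, retest with NO true effect."""
    rng = random.Random(seed)

    # Every pupil has a stable ability and a noisy first test score.
    pupils = []
    for _ in range(pool_size):
        ability = rng.gauss(100, 10)
        pre = ability + rng.gauss(0, 10)
        pupils.append((ability, pre))

    # Select the pupils with the lowest first-test scores.
    selected = sorted(pupils, key=lambda p: p[1])[:n_selected]

    # Random allocation: shuffle, then split in half.
    rng.shuffle(selected)
    half = n_selected // 2
    groups = {"intervention": selected[:half], "control": selected[half:]}

    gains = {}
    for name, members in groups.items():
        # Retest: same ability, fresh measurement error, no treatment effect.
        gains[name] = statistics.mean(
            (ability + rng.gauss(0, 10)) - pre for ability, pre in members
        )
    return gains


gains = simulate_selected_trial()
print("mean gain, intervention:", round(gains["intervention"], 1))
print("mean gain, control:     ", round(gains["control"], 1))
print("difference between groups:",
      round(gains["intervention"] - gains["control"], 1))
```

Both groups gain substantially purely because they were selected for low scores; only the difference between the randomised groups estimates the effect of the intervention.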
RTM and School Exclusions
• A qualitative and before and after
evaluation of an intervention to reduce
school exclusions said
» “an RCT would not have been able to
adequately address fundamental problems
concerning the reliability and validity of
quantitative data in relation to exclusions”
• Selected schools with HIGH exclusion
rates on which to intervene. Therefore we
would EXPECT exclusions to fall.
• They did by 15%.
• BUT schools with the fewest exclusions
INCREASED exclusions by 55% whilst
schools with the highest exclusions had a
fall of 32%.
• In England, part of the KS3 Strategy
• Backed by Government and private
• ‘Mentoring’ means a lot of different things
• Research evidence is
» Case studies
» Feelings and perceptions of participants
» Completely inadequate to infer impact
Neil Appleby’s Experiment
• A randomised controlled trial involving 20
underachieving Y8 (12-13 year-old) students
• Matched in pairs on ability and gender
• Randomly allocated: in each pair, one mentored, the
other not
• Mentored group had 20 mins individually every two
weeks (11 sessions)
» ‘It nearly killed me’
» Cost estimated at between £170 and £410 per mentored
pupil, which represents between 8% and 19% of the school’s annual per
pupil funding for the whole of their education
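The matched-pairs allocation described above can be sketched as follows. This is an illustration only, not the procedure actually used in the trial: the pupil identifiers are hypothetical, the pairs are assumed to have been matched already, and a seeded coin toss decides which member of each pair is mentored.

```python
import random


def allocate_matched_pairs(pairs, seed=None):
    """Within each matched pair, pick one member at random for mentoring."""
    rng = random.Random(seed)
    allocation = []
    for first, second in pairs:
        if rng.random() < 0.5:
            mentored, control = first, second
        else:
            mentored, control = second, first
        allocation.append({"mentored": mentored, "control": control})
    return allocation


# Hypothetical identifiers: 10 matched pairs, i.e. 20 pupils.
pairs = [(f"pupil_{2 * i + 1:02d}", f"pupil_{2 * i + 2:02d}") for i in range(10)]

allocation = allocate_matched_pairs(pairs, seed=1)
for row in allocation:
    print(row["mentored"], "mentored;", row["control"], "control")
```

Randomising within pairs keeps the two groups balanced on the matching variables while leaving the choice of who is mentored to chance rather than to the teacher.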
What the teachers said about
the mentored students …
• “**** is a changed person this year she
has progressed greatly and is a superb
• “Better now, has achieved more, more
• “Generally a great improvement recently.”
• “****’s attitude and effort have improved
over the year. He is a lot pleasanter and
more willing to participate in lessons
particularly oral work, he responds well to
What they said about the control
• “Has improved overall this term.”
• “****’s attitude and effort have improved over
the last few months, she is now trying very hard
to achieve her target. Great effort.”
• “Commended for attitude and progress.”
• “**** has settled since the beginning of the
• “**** has undergone quite a transformation
since September. Her attitude towards the
teacher and her learning have improved
drastically and she should be congratulated.”
Change in Teachers’ Ratings
of progress, effort and attitude
(English, maths and science combined)
[Figure: overall rating of change, axis from -6 to 10; ‘+’ marks the group median]
What this proves
• If you identify a group of underachieving
pupils at a particular time and then come
back to them after a few months, many of
them will have improved, whatever you do.
• Others (the ‘hard cases’) will not have
improved, whether mentored or left alone.
• The interpretation of this would have been
very different without a ‘control’ group
RTM and League Tables
• RTM is GREAT for Governments: it helps persuade the
credulous that what they do works.
• In any league table those at the bottom will tend
to ‘regress’ upwards to the mean whilst those at
the top regress down. This lends apparent support to
naming and shaming or extra financial help to
those at the bottom.
Dealing with RTM
• The only way to reliably deal with the
problem is through randomised trials.
• Which is why before and after data are
generally regarded, by the cognoscenti,
as almost USELESS.
History of Controlled Trials
• Because of temporal and regression to
mean effects we MUST have a control group.
• Many researchers over the centuries have
seen the need for a ‘control’ group to avoid
the inherent biases in the before and after
• Controlled trials have been conducted for
several hundred years, probably
occasionally using randomisation.
• Scurvy was a very prevalent condition
among sailors before the 19th Century.
• A controlled trial in the middle of the 18th
Century of 12 sailors showed that the two
sailors allocated to receive oranges and
lemons recovered and were able to care for
their ship mates allocated to vinegar or
Lack of Dissemination
• An even earlier trial in scurvy prevention
used a ‘cluster’ design whereby a whole
ship’s crew were allocated citrus fruit and
were compared with two ships’ crews who
• The treatment worked but the lesson was forgotten.
• After second trial took Navy 50 years to
• Fisher is usually thought of as the
originator of randomisation in the 1920s in
• He was concerned with the statistical
properties of ‘randomness’ as well as the
formation of unbiased groups.
• In 1937 a classic experiment – the
Cambridge-Somerville trial was launched.
• The aim was to show that social worker
intervention among ‘delinquent’ boys
would reduce ‘criminality’.
• 650 boys were identified by their teachers
as having delinquent behaviour that put
them at later risk of criminal activity.
• 325 pairs were formed and one from each
pair was allocated a social worker
supported by psychiatrists.
Results – early follow-up % of
boys indulging in crime.
[Figure: bar chart by offence type (property, assault, sex, drunk, traffic); green bars indicate the intervention group]
Results later follow-up
• In 1975 the ‘boys’ were followed up again
as middle-aged men.
• 58% of intervention group had NOT had a
• BUT 68% of control group had NOT had a
• If a control group had not been used
success of the intervention would be
Consequences of the Trial
• The social work profession largely
ABANDONED the RCT as a method of
evaluation as it failed to give the RIGHT
answer.
RCTs and education
• Lindquist, writing about experimental
methods in 1940, observed that in advanced
textbooks “all illustrations given are in
the field of agricultural experimentation
and are concerned with “plots” “blocks”
“yields” “treatments” etc, rather than with
“schools” “classes” “scores” “methods”
Lindquist Statistical Analysis in Educational Research, 1940.
The Importance of Design in Educational
• In 1940 in his book on statistics in educational
research Lindquist quite clearly describes
appropriate RCTs for educational research.
• His book also contains the first description of the
appropriate techniques to be used in analysing
pupils’ scores in classes (i.e., cluster analysis),
which was an advance on Fisher’s Design of
Experiments.
• In health statistics Lindquist’s statistical methods
were largely ignored until the late 1980s, when it
became accepted to use the methods he
advocated to analyse clustered data, although
even now most cluster trials are badly analysed.
• But 64 years on what about his descriptions on
how to rigorously evaluate educational
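One way to see why clustered data need special handling is the standard design effect, 1 + (m - 1) × ICC, where m is the number of pupils per class and ICC is the intraclass correlation. The sketch below uses this textbook approximation for equal-sized classes; it is not presented as Lindquist's own method, and the class size and ICC are hypothetical values chosen for illustration.

```python
def design_effect(cluster_size, icc):
    """Variance inflation when pupils are sampled in equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_pupils, cluster_size, icc):
    """Number of independent pupils the clustered sample is 'worth'."""
    return n_pupils / design_effect(cluster_size, icc)


# Hypothetical trial: 20 classes of 25 pupils, assumed ICC of 0.2.
n_classes, class_size, icc = 20, 25, 0.2
n_pupils = n_classes * class_size

deff = design_effect(class_size, icc)
n_eff = effective_sample_size(n_pupils, class_size, icc)

print(f"pupils recruited: {n_pupils}")
print(f"design effect: {deff:.2f}")
print(f"effective sample size: {n_eff:.0f}")
```

Analysing such a trial as if every pupil were independent overstates the amount of information and makes the results look more precise than they are.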
Educational Trials: UK
• Not many trials in education have been
undertaken in the UK.
• Most educational trials are from the USA.
• WHY? (my personal view)
» Futility of the ‘paradigm war’;
» Failure to understand their importance;
» Trials often give the ‘wrong’ answer;
» Lack of funding.
Opposition to Trials is
• In health care many doctors will refuse to
believe the results of a trial and argue the
trial was faulty or poorly conducted if the
result was ‘wrong’.
• Recent example: WHI study of hormone
replacement – many doctors REFUSE to
accept the finding of this study that hormone
replacement INCREASES the risk of heart disease.
Opposition to Polio Trial
• “I found but one person who rigidly
adhered to the idea of a placebo control
and he is a bio-statistician who, if he did
not adhere to this view, would have had to
admit his own purposelessness in life”
1950s to 1970s
• The use of trials expanded rapidly within and
• In the social sciences experiments included:
» Negative income tax;
» Public vs private schools;
» Prevention of spousal abuse.
Health Care Trials
• Although ALL new medicines have to be
evaluated using RCTs many medical
treatments do not.
• HOWEVER, health care is ‘fortunate’
because we bury our disasters; we KNOW
how important trials are as a protection for
Health Care Disasters
• Opposition to RCTs has declined over the
years, partly due to a number of
catastrophes, from unevaluated
• Harmful treatments are still in widespread
use today – we just don’t know which
Disasters among babies
• Routine practice in 40s and 50s to give
premature infants pure oxygen. At the same
time it was noted that there was an ‘epidemic’ of
blindness among babies. Linked to oxygen use.
• Routine practice in 50s to give prophylactic
antibiotics to premature infants; this caused brain
damage and death.
• BOTH of these problems only discovered
AFTER an RCT was undertaken.
• Interestingly an early trial of pure oxygen
for neonates was sabotaged by nurses
who secretly gave oxygen to some of the
controls because they KNEW that it was
• Because of this ARROGANCE they
contributed to the blinding of healthy
• On the basis of ‘before and after’ evidence and
anecdote, driver education was widely
implemented among older pupils
(in the USA).
• It was thought that this would reduce car
• Did it? Fortunately, some ‘sceptics’
undertook a series of trials in the USA.
Driver Education - Results
• Roberts and colleagues (see Campbell
Collaboration) reviewed these trials and
undertook a meta-analysis.
• They found that driver education
INCREASED the likelihood of deaths in
car accidents as it increased the
prevalence of young motorists.
UK Policy makers
• Have IGNORED these results and
implemented driver education in some
• This will directly increase deaths among
Computers in Schools
• Introduction of computers into schools has not
been preceded by large RCTs.
• The best evidence we have is from a ‘quasi-
experiment’ from Israel, which showed that
introduction of computers into half the state
schools led to no change in Hebrew literacy but
a DECLINE in maths.
• The Israeli Government has since introduced
computers into all schools!!!
Volunteers in Schools
• The use of volunteers to help children learn to
read is widespread – but are they effective?
• In a systematic review of RCTs only 7 trials
could be identified, the largest with ONLY 99
• The effect of volunteering was very slight (0.19,
-0.31 to 0.68) and not statistically significant.
Torgerson et al. 2002 Ed Studies, 28 No 4.
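A rough check of the "not statistically significant" statement can be made from the reported figures alone. The sketch below assumes, as is conventional but not stated on the slide, that the interval is a 95% confidence interval and that a normal approximation is adequate; the function name is illustrative.

```python
import math


def summary_from_ci(estimate, lower, upper, z_crit=1.96):
    """Recover the standard error, z statistic and two-sided p value
    from an estimate and its (assumed 95%) confidence interval."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
    return se, z, p


se, z, p = summary_from_ci(0.19, -0.31, 0.68)

print(f"standard error: {se:.3f}")
print(f"z statistic:    {z:.2f}")
print(f"two-sided p:    {p:.2f}")
print("interval includes zero:", -0.31 < 0 < 0.68)
```

Because the interval spans zero, the data are compatible with no effect at all, as well as with a modest benefit or a modest harm.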
• Virtually all new interventions need to be
evaluated using RCTs.
• Unlike in health care, children are compelled
to have education. Therefore it is even
more urgent that they should not be
exposed to ineffective educational
We need more trials