Direct Observation of Procedural Skills (DOPS) Instrument by lindayy


Direct Observation of Procedural Skills (DOPS)

Instrument description

Direct observation of procedural skills (DOPS) is the observation and evaluation of a procedural skill
performed by a trainee on a real patient. Procedural skills are also known as technical skills or practical
skills. The procedural skills assessed using DOPS range from relatively simple and common procedures,
such as venepuncture, through to more advanced surgical skills, such as endoscopic retrograde
cholangiopancreatography. Evaluation by an experienced doctor is carried out using either a checklist of
defined tasks, a global rating scale, or a combination of both.

To define DOPS properly and distinguish it from several other assessments which closely resemble it,
a few key features of this assessment instrument should be pointed out. Firstly, DOPS involves the
assessment of procedural skills rather than other clinical skills such as taking patient histories or performing
physical examination. Observation of these clinical skills would better be described as long cases with
observation. Secondly, DOPS involves the evaluation of a specific patient encounter rather than a rating
given based on observation over a period of time, as is the case in “supervisor” or “ward” evaluations.
Thirdly, DOPS involves the performance of procedures on actual patients rather than cadavers, simulations
or animal models.

DOPS is the term used to describe one of the new assessment instruments being piloted for use in the
Foundation Programme¹ in the UK. It is sometimes described as a new instrument in the literature on
Foundation Programmes; however, assessment involving direct observation of procedural skills has existed in
some form or another for a long time, although perhaps not in as structured a format.

Competency Domain

DOPS are used for the assessment of procedural (technical) skills. Competency in procedural skills involves
more than just dexterity. Darzi & Mackay, in the context of surgical skills, suggest that in addition to manual
dexterity a surgeon needs judgement and a knowledge base(1). Other important skills include getting consent
and general communication skills. Checklists and rating scales contain items which reflect the necessity of
competency in these domains also(2).

Competency level

Since students are observed performing procedures on patients in real settings, DOPS tests at the “does” level
of Miller’s pyramid.

Assessment context

Several authors comment on the lack of rigorous testing of procedural skills(3). Much of this literature relates
to junior doctors, either undergoing basic skills training during their first few years as doctors or residents
undergoing specialist training. DOPS is an instrument typically used to assess the procedural skills of doctors
at these levels. DOPS is one of the new assessments being piloted in the UK as part of the new “Foundation
Programme” for medical graduates in the first two years as doctors before they begin specialist training(4).
Residents from all three of the major specialisations, namely surgery, internal medicine and general practice,
are also assessed using direct observation. It is most prevalent amongst surgical residents
due to the higher frequency of procedures performed by them, and some authors have suggested that it is
not used as much as it should be in internal medicine(5) and general practice(6, 7).

¹ The Foundation Programme is a two-year general training programme which forms the bridge between medical school
and specialist/general practice training. Trainees gain experience in a series of placements in a variety of specialties and
healthcare settings.

Although not a new instrument, the use of DOPS for junior doctors has been invigorated in recent years. In
some training programs it is replacing other instruments used for the assessment of procedural skills such as
logbooks and supervisor evaluations(3, 8). Logbooks record the number and types of procedures performed
over a period of time, including records of complications and informal assessments by supervisors.
Supervisor evaluations, as discussed above, assess general performance over a period of time rather than a
specific encounter. Logbooks are criticised on the grounds that they do not assess performance but rather are
a crude measure of expected competence. Furthermore, there is the issue of determining how many
procedures must be carried out to demonstrate competency; this can vary from procedure to procedure and
from trainee to trainee(5, 9). Supervisor evaluations have been found to be unreliable(10).

DOPS is not widely used for assessing senior doctors. One exception is the use of DOPS by the Royal
Australasian College of Physicians as part of its maintenance of professional standards program since
1994(11). Under this programme, which is technically not mandatory, fellows of the college have the option to
either complete continuing medical education, engage in research, or have their practice reviewed. Practice
review is done by a site visit, which includes an extensive review of case records, observation of
performance with new and revisiting patients, and observation of technical procedures where appropriate.
Only a minority of fellows choose the practice review option; however, it has been shown to be feasible and
acceptable. It may not be as feasible on a larger scale, since reviewers currently perform their
services voluntarily. A similar instrument was once used by the American Board of Medical Specialties as part
of its re-certification procedures; however, it is no longer used due to its high cost(12).

Instrument quality

There is very little psychometric data on DOPS. This is perhaps due to the fact that direct observation is
often carried out informally. However, it is seen as an intrinsically high-quality instrument since it tests at the
“does” level.

Further studies on the validity of DOPS need to be done. Authors have commented on the lack of studies
assessing the validity and reliability of DOPS, despite it being fairly widely used to assess the competency of
residents(13). A review by Wilkinson et al in 2003 found that there were no validated methods of procedural
performance assessment described in the literature(14). However it is anticipated that there will be several
studies on DOPS as part of the introduction of the Foundation Programme in the UK(2, 14, 15), and thus
evidence regarding instrument quality should start to emerge soon.

Despite the lack of evidence on its quality, direct observation of an individual’s procedural skills certainly
has high face validity(12). Examinees are observed in a situation which very closely resembles normal clinical
practice, since there are real patients and the procedures are selected from normal routine. The only real
authenticity issue is that doctors may not perform according to their normal standards due to the anxiety of
knowing they are being assessed. If a doctor knows they are being observed this may influence their
behaviour, so it may be argued that this method is not assessing performance, but maximum competence(12).
Despite this criticism, the Royal College of Physicians, which developed a DOPS instrument for the
Foundation Programme, anticipates that it will be found to be a highly valid and reliable instrument,
particularly compared to the previous logbook-based system(2, 14, 15).

Further studies on the reliability of DOPS also need to be done. The main issues appear to be determining the
number of procedures that need to be observed to achieve adequate reliability and determining appropriate
checklists and rating scales for different procedures. The number of encounters needed to ensure adequate
reliability will be addressed by pilot studies for the Foundation Programme(14).
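
The sources cited above do not say how the required number of encounters would be derived. One standard psychometric approach, offered here only as an illustration, is the Spearman-Brown prophecy formula, which estimates how the reliability of an averaged score grows with the number of observed encounters. In the sketch below the single-encounter reliability of 0.3 and the target of 0.8 are assumed values chosen for the example, not figures from the DOPS literature, and the function names are hypothetical.

```python
import math


def composite_reliability(single_reliability: float, n_encounters: int) -> float:
    """Spearman-Brown: reliability of the mean of n comparable encounters."""
    r = single_reliability
    return n_encounters * r / (1 + (n_encounters - 1) * r)


def encounters_needed(single_reliability: float, target: float) -> int:
    """Smallest whole number of encounters whose mean reaches the target."""
    if not (0 < single_reliability < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    r = single_reliability
    # Spearman-Brown solved for n, then rounded up to a whole encounter.
    n = target * (1 - r) / (r * (1 - target))
    return max(1, math.ceil(n))


# Assumed, illustrative inputs (not taken from any DOPS study).
n = encounters_needed(single_reliability=0.3, target=0.8)
print(n, round(composite_reliability(0.3, n), 3))
```

With these assumed inputs the formula suggests about ten observed encounters. The formula treats encounters as interchangeable, which is a simplification given that procedures differ in difficulty.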

In terms of determining appropriate checklists and rating scales for DOPS, one of the issues is the degree to
which they should be structured. Studies which compared the use of checklists and global rating scales in the
context of standardised patient exams have found global ratings to be more reliable(16-18), suggesting that
some degree of flexibility improves reliability. However, for the assessment of procedural skills, which are
perhaps more mechanistic than other clinical skills, a structured approach may be required. Reznick, in a
review of instruments used to assess procedural skills amongst surgical residents, argues that the more
objective and structured the criteria for assessment of technical skills, the more reliable the process will be(19).
Based on a survey of a small group of anaesthetists, Greaves and Grant suggest that the assessment of procedural
skills, as opposed to more general clinical skills, is more effective when observation is structured and
the tasks are broken down into their components(20).
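
As a rough illustration of combining the two approaches, a single DOPS encounter could be recorded as a set of yes/no checklist items plus one global rating. The item names, the 1 to 6 scale and the class name below are hypothetical, chosen for the example rather than taken from any published DOPS form.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DopsEncounter:
    """One observed procedure: structured checklist plus a global rating."""
    procedure: str
    checklist: Dict[str, bool] = field(default_factory=dict)
    global_rating: int = 1  # hypothetical scale: 1 (poor) to 6 (excellent)

    def __post_init__(self) -> None:
        if not 1 <= self.global_rating <= 6:
            raise ValueError("global_rating must be between 1 and 6")

    def checklist_score(self) -> float:
        """Fraction of checklist items the assessor marked as done."""
        if not self.checklist:
            return 0.0
        return sum(self.checklist.values()) / len(self.checklist)


# Hypothetical encounter; item names are illustrative only.
encounter = DopsEncounter(
    procedure="venepuncture",
    checklist={
        "obtained consent": True,
        "prepared equipment": True,
        "maintained asepsis": False,
        "communicated with patient": True,
    },
    global_rating=4,
)
print(encounter.checklist_score(), encounter.global_rating)
```

Keeping the checklist and the global rating as separate fields lets the two be compared later, which is the comparison the studies above are concerned with.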

Educational impact
There is little evidence-based research on the educational impact of DOPS. However, commentators are
generally positive about the educational value of DOPS. The feature of DOPS which is most commonly cited
as being responsible for its high educational value is the opportunity it creates for pertinent feedback from
more experienced doctors. Within the Foundation Programme, feedback is given immediately after the
encounter takes place and includes highlighting strengths and weaknesses and agreeing upon an action plan to
address developmental needs(15).

Feasibility

DOPS have been shown to be a feasible instrument(11); however, their feasibility may vary from venue to
venue and may also depend on the particular procedure being assessed. Certainly the idea behind DOPS is
that they can be easily integrated into normal routine, and therefore should be feasible. However, some
feasibility issues remain.

The main feasibility issue for DOPS is finding sufficient time for supervisors to observe trainee doctors
performing skills. This problem may become greater as DOPS becomes more widespread, since the logistics
are likely to get more difficult when attempts are made to implement it on a wide scale(12). One study of the
feasibility of DOPS in emergency medicine departments found that, over 270 observation periods each
lasting two hours, faculty interacted with trainees on average only 20% of the time(21). This was lower
than the authors expected, especially since the emergency department is considered to be more
team based than most departments. Within the Foundation Programme, the issue of finding sufficient
supervisor time for observation is addressed by making it the responsibility of the trainee to select when
the assessment takes place and who will assess them(15). A study by Morris et al looked at the practical issues
of implementing DOPS in a pilot study in the Foundation Programme(22). They found that because some
procedures were not frequently required it was difficult to find opportunities to observe the skill. When such
an opportunity did arise it was not always convenient for the assessor to make themselves available at short
notice. Furthermore, at times such procedures were performed outside of normal working hours, when
assessors were not present. It was easier to arrange DOPS for more common procedures. The emergency
department and routine operating lists were common places where DOPS were performed.

Another feasibility issue is that the checklists and rating scales used for DOPS instruments will need to be
customised for different specialties, since the range of procedures varies greatly from one specialty to the
next.

Acceptability

There is little research on the acceptability of DOPS; however, they appear to be acceptable to both
examinees and examiners. Trainees generally welcome the opportunity to be observed by someone more
experienced than themselves and given immediate feedback(15, 22). Greaves and Grant surveyed a small group of
anaesthetists and found that they felt the procedural skills of trainees could be accurately assessed by more
senior physicians(20). However, there are some public safety issues in assessing surgical skills of trainees on
actual patients, and as such bench models provide a more acceptable alternative(24).


For educators
McKinley et al comment on reliability issues as they apply to direct observation of history taking and
physical examination skills, although their observations are probably equally applicable to DOPS(25). There
can be significant inter-case variations in direct observation, which decreases reliability due to both poor
content sampling and significant variation in case difficulty. This problem might be controlled to
some extent by increasing the number of cases on which students are assessed and selecting cases according
to set criteria. There can also be significant inter-rater variation in direct observation. McKinley et al suggest
that examiner variability can be reduced by using multiple assessors, ensuring that they use explicit
assessment criteria and by training them.

Videotaping of the performance of procedures may be an alternative to direct observation. Reznick notes that
the disadvantage of this method is that reviewing videotapes might be time consuming and expensive(8). It
might also be said that any feedback given will not be as immediate as for direct observation, although
perhaps it could be more thorough.

The assessment of the procedural skills of general practitioners poses a significant challenge, due to the
decentralised nature of practice and the wide variety of clinical tasks. In choosing an assessment instrument
there is a trade-off between one which is valid and reliable, and one which is feasible on a large scale. The
use of video observation provides a possible solution to this problem(26).

An alternative to assessing students performing real procedures is the use of bench models. Bench models
are used for teaching and assessing a wide range of surgical skills from simple simulations which test
suturing and knotting to advanced simulations which test actual operative procedures such as laparoscopy(27).
Bench models can be used singly or as part of a multi-station exam, commonly called an objective
structured assessment of technical skills (OSATS)(24). OSATS are to technical skills what OSCEs are to
clinical skills. Bench models have an educational advantage over DOPS in that surgical skills can be taught
in a less threatening and safer environment. There have been mixed results on the validity and reliability of
using bench models, although the evidence so far appears to be positive(3, 24, 27-31). They have also been shown
to be feasible, although relatively expensive(29). However, an extensive review of the literature on bench
models and OSATS has not been performed.

Another variation of the DOPS is the integrated procedural performance instrument (IPPI), similar to
OSATS, except that candidates are observed remotely by video by assessors who collect data on the
performance and send it back to the supervising clinician(32). The main advantage of the IPPI is that
observing each performance remotely minimises the effect that direct observation can have in altering the
performance of a trainee (due to anxiety), and therefore the assessment approximates clinical reality more
closely. It is, however, very resource intensive.

Other more novel methods for teaching and assessing surgical skills include hand-motion analysis and virtual
reality technology(3, 33). Hand-motion analysis detects how many hand motions a subject uses to perform
standardised procedures. It has been shown to be an effective index of technical skills in both laparoscopic
and open procedures. Virtual reality has been used to assess procedures such as endoscopy. However, there
are as yet few studies validating this method. It is suggested that educators will need to keep abreast of the
literature in this field as newer technologies emerge and there are further studies validating them.

For researchers

As discussed above, there is a demand for further research to determine the validity and reliability of DOPS.
One of the main issues is determining the number of DOPS required to achieve adequate reliability and
validity(2, 14). Future studies will need to investigate the use of DOPS with different procedures.

Jowell et al suggest that, rather than assessing the competence of residents in
endoscopic retrograde cholangiopancreatography (ERCP) based on the number of procedures completed,
which is the method in common use, competency should be assessed based on success rates of
procedures performed(34). Success is determined by successful completion of various components of the
procedure as judged by an attending physician, as well as an overall grading of competence. The reasoning
behind the adoption of success rates as the measure of competence is that ERCP is extremely difficult
compared to most procedures, so residents will often complete only a small proportion of the total
procedure before their supervisor takes over, and yet still be credited for the procedure. The authors suggest
that this methodology could be applied to many other procedures.
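
The contrast between counting procedures and measuring success rates can be sketched as follows. The component names, the way success is recorded per component, and the sample data are all assumptions made for the example; the grading scheme actually used by Jowell et al is not reproduced here.

```python
from collections import defaultdict

# Each record is one attempted procedure; every value notes whether the
# trainee completed that component unaided, as judged by the supervisor.
# Component names and data are hypothetical.
attempts = [
    {"cannulation": True, "cholangiography": True, "stent placement": False},
    {"cannulation": True, "cholangiography": False, "stent placement": False},
    {"cannulation": False, "cholangiography": False, "stent placement": False},
    {"cannulation": True, "cholangiography": True, "stent placement": True},
]


def component_success_rates(records):
    """Return the fraction of attempts in which each component succeeded."""
    successes = defaultdict(int)
    totals = defaultdict(int)
    for record in records:
        for component, completed in record.items():
            totals[component] += 1
            successes[component] += int(completed)
    return {name: successes[name] / totals[name] for name in totals}


procedures_logged = len(attempts)          # what a logbook count would show
rates = component_success_rates(attempts)  # what a success-rate measure shows
print(procedures_logged, rates)
```

In this made-up data a logbook would credit four procedures, while the per-component rates show that only the first component is completed in most attempts.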


1.       Darzi A, Mackay S. Assessment of surgical competence. Quality in Health Care 2001;10:64-69.
2.       Wragg A, Wade W, Fuller G, Cowan G, Mills P. Assessing the performance of specialist registrars.
Clinical Medicine 2003;3(2):131-4.
3.       Sidhu R, Grober E, Musselman L, Reznick R. Assessing competency in surgery: Where to begin?
Surgery 2004;135(1):6-20.
4.       Beard J, Strachan A, Davies H. Developing an education and assessment framework for the
Foundation Programme. Medical Education 2005;39(8):841-51.
5.       Wigton R. Measuring procedural skills. Annals of Internal Medicine 1996;125(12):1003-1004.
6.       Fraser J. Teaching practical procedures in general practice. A primer for supervisors of medical
students and registrars. Australian Family Physician 2003;32(7):540-543.
7.       Tenore J, Sharp L, Lipsky M. A national survey of procedural skill requirements in family practice
residency programs. Family Medicine 2001;33(1):28-38.
8.       Carr S. The Foundation Programme assessment tools: an opportunity to enhance feedback to
trainees? Postgraduate Medical Journal 2006;82(971):576-9.
9.       Wicks A, Robertson G, Veitch P. Structured training and assessment in ERCP has become essential
for the Calman era. Gut 1999;45:154-6.
10.      Turnbull J, Gray J, MacFadyen J. Improving in-training evaluation programs. Journal of General
Internal Medicine 1998;13(5):317-323.
11.      Newble D, Paget N, McLaren B. Revalidation in Australia and New Zealand: approach of the Royal
Australasian College of Physicians. BMJ 1999;319:1185-8.
12.      Hays R, Davies H, Beard J. Selecting performance assessment methods for experienced physicians.
Medical Education 2002;36(10):910-7.
13.      Mandel L, Lentz G, Goff B. Teaching and evaluating surgical skills. Obstetrics and Gynecology
14.      Wilkinson J, Benjamin A, Wade W. Assessing the performance of doctors in training. BMJ
15.      Davies H, Archer J, Heard S. Assessment tools for Foundation Programmes—a practical guide. BMJ
Career Focus 2005;330(7484):195-6.
16.      Gray J. Global rating scales in residency education. Academic Medicine 1996;71(1):s55-63.
17.      Regehr G, MacRae H, Reznick R, Szalay D. Comparing the psychometric properties of checklists
and global rating scales for assessing performance on an OSCE-format Examination. Academic Medicine
18.      Reznick R, Regehr G, Yee G, Rothman A, Blackmore D, Dauphinee D. Process-rating forms versus
task-specific checklists in an OSCE for medical licensure. Academic Medicine 1998;73:S97–99.
19.      Reznick R. Teaching and testing technical skills. American Journal of Surgery 1993;165:358-61.

20.      Greaves J, Grant J. Watching anaesthetists work: using the professional judgement of consultants to
assess the developing clinical competence of trainees. British Journal of Anaesthesia 2000;84(4):525-33.
21.      Chisholm C, Whenmouth L, Daly E, Cordell W, Giles B, Brizendine E. An evaluation of emergency
medicine resident interaction time with faculty in different teaching venues. Academic Emergency Medicine
22.      Morris A, Hewitt J, Roberts C. Practical experience of using directly observed procedures, mini
clinical evaluation examinations, and peer observation in pre-registration house officer (FY1) trainees.
Postgraduate Medical Journal 2006;82:285-88.
23.      Griffiths CEM. Competency assessment of dermatology trainees in the UK. Clinical and
Experimental Dermatology 2004;29(5):571-575.
24.      Reznick R, Regehr G, McRae H, Martin J, McCulloch W. Testing technical skills via an innovative
"bench station" examination. American Journal of Surgery 1997;173(3):226-30.
25.      McKinley R, Fraser R, van der Vleuten C, Hastings A. Formative assessment of the consultation
performance of medical students in the setting of general practice using a modified version of the Leicester
Assessment Package. Medical Education 2000;34(7):573-9.
26.      Ram P, Grol R, Rethans J, Schouten B, van der Vleuten C, Kester A. Assessment of general
practitioners by video observation of communicative and medical performance in daily practice: issues of
validity, reliability and feasibility. Medical Education 1999;33(6):447–54.
27.      Paisley A, Baldwin P, Paterson-Brown S. Validity of surgical simulation for the assessment of
operative skill. British Journal of Surgery 2001;88(11):1525-32.
28.      Datta V, Bann S, Aggarwal R, Mandalia M, Hance J, Darzi A. Technical skills examination for
general surgical trainees. British Journal of Surgery 2006;93(9):1139-1146.
29.      Goff B, Nielsen P, Lentz G, Chow G, Chalmers R, Fenner D, et al. Surgical skills assessment: A
blinded examination of obstetrics and gynecology residents. American Journal of Obstetrics & Gynecology
30.      Goff BA, Lentz GM, Lee D, Houmard B, Mandel LS. Development of an objective structured
assessment of technical skills for obstetric and gynecology residents. Obstet Gynecol 2000;96(1):146-150.
31.      Martin J, Regehr G, Reznick R, Macrae H, Murnaghan J, Hutchison C, et al. Objective structured
assessment of technical skill (OSATS) for surgical residents. British Journal of Surgery 1997;84(2):273-278.
32.      Kneebone R, Nestel D, Yadollahi F, Brown R, Nolan C, Durack J, et al. Assessing procedural skills
in context: exploring the feasibility of an Integrated Procedural Performance Instrument (IPPI). Medical
Education 2006;40(11):1105-1114.
33.      Darzi A, Smith S, Taffinder N. Assessing operative skill: needs to become more objective. BMJ
34.      Jowell PS, Baillie J, Branch MS, Affronti J, Browning CL, Bute BP. Quantitative Assessment of
Procedural Competence: A Prospective Study of Training in Endoscopic Retrograde
Cholangiopancreatography. Ann Intern Med 1996;125(12):983-989.

