
Effect of Directed Training on Reader Performance for CT Colonography: Multicenter Study

European Society of Gastrointestinal and Abdominal Radiology CT Colonography Study Group Investigators

Purpose: To define the interpretative performance of radiologists experienced in computed tomographic (CT) colonography and to compare it with that of novice observers who had undergone directed training, with colonoscopy as the reference standard.

Materials and Methods: Physicians at each participating center received ethical committee approval and followed the committees’ requests regarding informed consent. Nine experienced radiologists, nine trained radiologists, and 10 trained technologists from nine centers read 40 CT colonographic studies selected from a data set of 51 studies and modeled to simulate a population with positive fecal occult blood test results: Studies were obtained in eight patients with cancer, 12 patients with large polyp, four patients with medium polyp, and 27 patients without colonic lesions. Findings were verified with colonoscopy. An experienced radiologist used 50 endoscopically validated studies to train novice observers before they were allowed to participate. Observers used one software platform to read studies over 2 days. Responses were collated and compared with the known diagnostic category for each subject. The number of correctly classified subjects was determined for each observer, and differences between groups were examined with bootstrap analysis.

Results: Overall, 28 observers read 1084 studies and detected 121 cancers, 134 large polyps, and 33 medium polyps; 448 healthy subjects were categorized correctly. Experienced radiologists detected 116 lesions; trained radiologists and technologists detected 85 and 87 lesions, respectively. Overall accuracy of experienced observers (74.2%) was significantly better than that of trained radiologists (66.6%) and technologists (63.2%). There was no significant difference (P = .33) between overall accuracy of trained radiologists and that of technologists; however, some trainees reached the mean performance achieved by experienced observers.

Conclusion: Experienced observers interpreted CT colonographic images significantly better than did novices trained with 50 studies. On average, no difference between trained radiologists and trained technologists was found; however, individual performance was variable and some trainees outperformed some experienced observers.

The complete list of investigators and affiliations is listed at the end of this article. Received June 15, 2005; revision requested August 15; revision received February 10, 2006; accepted March 6; final version accepted July 13. Supported by a grant from the European Association of Radiology administered by the European Society of Gastrointestinal and Abdominal Radiology and a Kodak Scholarship administered by the Royal College of Radiologists, United Kingdom. Address correspondence to Steve Halligan, MD, FRCP, FRCR, Department of Radiology, University College London, Level 2 Podium, 235 Euston Rd, London NW1 2BU, England (e-mail: s.halligan

RSNA, 2007

                           152                                                                                                       Radiology: Volume 242: Number 1—January 2007
GASTROINTESTINAL IMAGING: Directed Training and Reader Performance                                            ESGAR CT Colonography Study Group

Advances in Knowledge

Experienced observers interpreted CT colonography studies significantly better than did novice readers trained with 50 studies.

On average, we found no difference between trained radiologists and trained technologists; however, individual performance was variable and some trainees outperformed some experienced observers.

Published online
10.1148/radiol.2421051000
Radiology 2007; 242:152–161

Author contributions:
Guarantors of integrity of entire study, S.H., D.B.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; manuscript final version approval, all authors; literature research, S.H., D.B., W.A.; clinical studies, S.H., D.B., H.F., R.F., S.T., D.N., J.W.B., J.F., M.P., V.V.d.H., P.L., J.M., G.D., A.O., S.F., E.N., P.V., R.I., F.M., D.R.; statistical analysis, S.H., D.G.A., P.B.; and manuscript editing, S.H., D.B., W.A., C.B., H.F., A.L., J.S.

See Materials and Methods for pertinent disclosures.

Computed tomographic (CT) colonography is available for colorectal cancer screening in the United States (1,2), and a survey revealed that 36% of radiology departments in the United Kingdom offered this service (3). To date, most research has focused on the technical capabilities of CT colonography; however, it is increasingly being realized that observer experience and training are equally important (4,5). A study in which 18 radiologists were asked to interpret CT colonographic images revealed that observer performance was related to prior experience (6).

At the time of this writing, there were no evidence-based guidelines for training; however, a working group suggested that supervised interpretation of at least 40 validated studies might be adequate for this purpose (7). However, when this suggestion was tested on a small scale, it was shown that observer response to such training is highly unpredictable and that performance may even deteriorate (8). Also, a recent study revealed that some novice observers exposed to a training module could outperform their more-experienced colleagues (9).

The European Society of Gastrointestinal and Abdominal Radiology is interested in developing evidence-based guidelines for training and accrediting radiologists in the interpretation of CT colonographic studies. With this goal in mind, the purpose of our study was to define the interpretative performance of radiologists with experience in CT colonography and to compare their performance with that of novice observers who had undergone directed training, with colonoscopy serving as the reference standard.

Materials and Methods

Physicians at each participating center received ethical committee approval and followed the committee’s requests regarding informed patient consent. Voxar (Edinburgh, Scotland) provided the software that was used with laptop computers, and E-Z-Em (Westbury, NY) provided the two workstations and software that were used at the trial office. The authors had full control of all data and information submitted for publication.

Data Set Composition and Accrual

We collated a data set of normal and abnormal CT colonographic studies submitted by the seven participating centers. In this data set, prevalence and morphology of neoplasia were modeled to simulate those expected in patients with positive fecal occult blood test results (prevalence of cancer, 10%; prevalence of large polyps, 30%; prevalence of medium polyps, 10%; prevalence of normal colorectum, 50%) (10–12). The aim of this procedure was to create a mix of studies with normal and abnormal findings to investigate sensitivity for different classes of neoplasia and specificity between observer groups. This mix also ensured that the data set was clinically relevant.

Physicians at each center were asked to submit 10 studies that were obtained in subjects aged 50–69 years and that matched the expected prevalence of neoplasia on the basis of fecal occult blood test results (ie, one patient with cancer, three patients with large polyps, one patient with medium polyps, and five subjects with no colonic lesion) (10–12). The four diagnostic categories were as follows: cancer, large polyps, medium polyps, and normal. In line with the results of fecal occult blood test trials, a large polyp was defined as a polyp measuring 10 mm or more in diameter and a medium polyp was defined as a polyp measuring less than 10 mm in diameter (6–9 mm in diameter for the purposes of this study).

To reflect normal variation in data quality, subjects from each center were recruited in a strictly chronologically consecutive fashion. That is to say, consecutive subjects were assigned to an appropriate diagnostic category until all four categories were full. Thus, the first patient with cancer completed recruitment to this category, whereas three consecutive patients with large polyps were necessary to complete recruitment to this category. Patients with multiple lesions were assigned to a category according to the largest lesion detected, which was referred to as the index lesion. To ensure that studies accurately reflected the natural and inevitable technical variation found in day-to-day practice, centers were obliged to submit all eligible studies, with the exception of those that were deemed nondiagnostic (ie, any study in which the local principal investigator would normally recommend repeat colonography or another examination because of insurmountable technical problems, such as segmental collapse or retained fluid). In all subjects, CT findings were defined by subsequent same-day colonoscopic findings obtained by experienced practitioners. Polyp size was based on the colonoscopic measurement, which was estimated with adjacent biopsy forceps.

Participating centers were chosen because they had active CT colonography research programs at the time the study protocol was developed and because they could contribute studies. We also stated that as long as studies were chronologically consecutive, centers could


submit retrospective studies, provided the technical stipulations for CT colonography were satisfied. Five centers submitted only retrospective data, and two centers submitted only prospective data. Ethical permission for data sharing was covered by the local stipulations at each center. Four centers that submitted retrospective data did not require additional specific ethical committee approval for this study because data were collected as part of a local study, and ethical committee approval and patient informed consent were applicable for additional analyses and data sharing. The fifth center that submitted retrospective data obtained ethical committee approval and verbal consent via telephone from the subjects selected. Physicians at the two centers that submitted prospective data obtained patient informed consent, ethical committee approval, and permission for additional analyses and data sharing with ongoing studies, provided patient identifying information was removed before data sharing. We applied this stipulation to all data collected for this study.

Images were acquired with the patient in the prone and supine positions with full bowel purgation, a collimation that was no greater than 2.5 mm, and use of a multi–detector row CT scanner. Gas insufflation and spasmolytic use were left to the discretion of local physicians. Low-radiation-dose protocols were permissible, but administration of intravenous contrast material was impermissible on the grounds that contrast agents are unlikely to be used in a screening program. Fecal tagging was not permitted since this procedure was not common practice at the time of data accrual (May to November 2003). Studies were archived on a compact disk and transferred to the trial office; technical and diagnostic category data were included for each study.

Of the seven centers that submitted data, three provided data only from symptomatic patients, one provided data only from asymptomatic patients, and three provided data from both symptomatic and asymptomatic patients. Four centers submitted studies obtained in a full data set of 10 subjects; however, the file containing one study could not be opened at the trial office. The three remaining centers submitted five, four, and three studies because of difficulties satisfying protocol requirements, notably, those related to age. Thus, our study included 51 subjects, of whom 27 (53%) had no colonic lesion and 24 (47%) had an index lesion. Of the 24 patients with an index lesion, eight had cancer, 12 had large polyps, and four had medium polyps. Seven (29%) of the patients with an index lesion had a second lesion: Two patients with cancer each had an additional large polyp, and one patient with cancer and four patients with large polyps each had an additional medium polyp.

Observers

The data set was interpreted by the following three groups of observers: experienced radiologists, trained radiologists, and trained radiologic technologists. Nine centers (including all seven that submitted imaging studies) provided an observer for each group; one center provided two technologists.

An experienced radiologist was defined as a radiologist who had considerable practical and/or research experience with CT colonography prior to this study. Individual experience ranged from evaluation of 325 to evaluation of 1200 studies (median, 750 studies), with between 120 and 600 studies (median, 200 studies) validated with colonoscopy.

Each experienced radiologist identified a local radiologist and radiologic technologist who had interpreted 10 or fewer studies prior to this study. We stipulated that radiologists be familiar with the interpretation of standard abdominopelvic CT studies and that technologists be familiar with the acquisition of abdominopelvic CT studies. The experienced radiologists used normal and abnormal studies that had been acquired locally and verified with subsequent colonoscopy to train inexperienced radiologists and technologists to interpret CT colonographic images. There was no attempt to use the same training data set at all participating centers because we wanted to emulate existing training programs for conventional CT (in which trainees generally learn by using studies acquired locally). However, we did stipulate that 50 individual studies should be interpreted; interpretation was to be unaided initially and then followed by face-to-face discussion with the local trainer on a patient-by-patient basis, so as to closely mimic standard day-to-day training practice. Trainers and trainees used the preferred local reading platform, in line with everyday practice. We stipulated that training should occur over several separate sessions and several weeks to reflect standard teaching practice.

Reading Conditions and Outcome Measures

After training, an individualized test data set of 40 studies was prepared by the trial coordinator for each participating center. These 40 studies were sampled from the data set of 51 studies and balanced in terms of the prevalence of abnormalities; studies submitted by a center were excluded from the data set sent to that center. The order of studies was randomized to mix abnormal and normal cases, and all readers read the studies in the same order. All patient identifiers were removed. The experienced radiologist (n = 9), trained radiologist (n = 9), and trained technologist(s) (n = 10) at each center then interpreted this data set over 2 days. The trial coordinator visited each center to supervise reading, which was conducted with individual laptop computers equipped with 17-inch (43.18-cm) screens and software that allowed a primary two-dimensional analysis, with three-dimensional analysis available for problem solving (Voxar ColonScreen, version 2.2; Barco, Edinburgh, Scotland). Observers were familiarized with the software, when necessary, and the supervisor was available at all times. Reading was performed in a quiet environment with ambient light. Observers were asked to read at their own pace, with no requirement to finish within a prespecified time. Observers had read the study protocol and knew that studies obtained at their own institutions (if any) had been excluded, but they had no


specific information about the composition of their individualized data set.

Observers used a data sheet to categorize each subject as either healthy or unhealthy. Subjects designated as unhealthy were further categorized as having cancer, a large polyp, or a medium polyp. Large polyps had a maximal two-dimensional transverse diameter of 10 mm or larger, whereas medium polyps had a diameter of 6–9 mm; software calipers were used to obtain these measurements. Observers noted any polyp that measured 5 mm or less but categorized subjects with such polyps as healthy; this practice allowed false-negative findings due to measurement error to be distinguished from false-negative findings due to perceptual error. Observers were unaware of each other’s responses. Prone and supine image coordinates and segmental location were recorded for each perceived abnormality so that false-positive responses could be distinguished from true-positive responses in the same patient. Multiple responses were possible. There were six bowel segments (rectum, sigmoid colon, descending colon, transverse colon, ascending colon, and cecum), and observers were provided with an annotated diagram of segmental definitions. Observers were free to classify a study as technically inadequate, although steps had been taken when designing the study protocol to avoid including nondiagnostic studies.

Data sheets were collated, and observers’ responses were compared with the known diagnostic category. The trial coordinator (who had experience with more than 300 endoscopically verified studies) independently evaluated each study to confirm both the CT findings reported by the submitting center and the CT coordinates of the abnormality, which were then used to determine whether observers’ responses were true-positive or false-positive. All but one of the endoscopically validated lesions could be identified. However, observers encountered difficulty locating four flat adenomas (which measured 40, 30, 15, and 12 mm in diameter), one of which was only visible when standard abdominal CT window settings were used (window level, 40 HU; window width, 400 HU) (Fig 1). One flat adenoma (40 mm) could not be identified despite good bowel preparation and distention and a thorough review of endoscopic data.

Statistical Analysis

Observer responses were compared with the known diagnostic category and lesion coordinates; the numbers of true-positive, true-negative, false-positive, and false-negative classifications were determined. Individual and group performance was determined by calculating the number and percentage of studies in which the index lesion (and second lesion in seven patients) was correctly identified and the number and percentage of normal studies that were correctly categorized. The number of false-positive classifications in healthy subjects and in patients known to have an index lesion was determined.

Two measures were derived for each reader: sensitivity for lesions (number of lesions correctly seen divided by number of lesions present) and accuracy (the overall percentage of correct categorizations). For both measures, studies classified as technically inadequate by readers were included. For the most part, observers read the same studies and observations were correlated to some extent; therefore, a bootstrap analysis was used to investigate differences between observer groups. A total of 1999 samples were redrawn randomly from the original sample, with replacement and analysis of each resultant data set. The results of interest were calculated for each bootstrap sample, and the distribution of values was used to obtain a bootstrap confidence interval. A probability value was also calculated by considering how many of the values were farther from zero than the actual value observed with the data. Results were considered statistically significant at a probability level of 5%. Statistical analysis was performed with Stata, version 8.0, software (Stata, College Station, Tex).
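The two reader measures and the bootstrap comparison described above can be illustrated with a minimal Python sketch. It is not the Stata analysis used in the study. Several details are assumptions made for illustration only: the function names, the choice to resample cases (studies) with replacement while keeping every reader's response to a case together, the percentile form of the confidence interval, and the reading of the probability value as the share of recentered bootstrap differences lying at least as far from zero as the observed difference. The example responses are hypothetical and are not study data.

```python
import random


def accuracy(correct_flags):
    """Share of studies categorized correctly; flags are 1 (correct) or 0."""
    return sum(correct_flags) / len(correct_flags)


def sensitivity(lesions_seen, lesions_present):
    """Number of lesions correctly seen divided by number of lesions present."""
    return lesions_seen / lesions_present


def group_accuracy(readers, cases):
    """Pooled accuracy of one observer group over a list of case ids.

    `readers` holds one dict per reader mapping case id -> 1 or 0.
    Cases that a reader did not read are skipped.
    """
    flags = [reader[c] for reader in readers for c in cases if c in reader]
    return accuracy(flags)


def bootstrap_group_difference(group_a, group_b, cases, n_boot=1999, seed=0):
    """Bootstrap the difference in pooled accuracy between two groups."""
    rng = random.Random(seed)
    observed = group_accuracy(group_a, cases) - group_accuracy(group_b, cases)

    diffs = []
    for _ in range(n_boot):
        # Assumption: cases are the resampling unit, so the correlated
        # responses of all readers to the same case stay together.
        resample = [rng.choice(cases) for _ in cases]
        diffs.append(
            group_accuracy(group_a, resample) - group_accuracy(group_b, resample)
        )
    diffs.sort()

    # Percentile 95% interval; with 1999 resamples these are the
    # 50th and 1950th ordered values.
    lo_idx = max(0, round(0.025 * (n_boot + 1)) - 1)
    hi_idx = min(n_boot - 1, round(0.975 * (n_boot + 1)) - 1)
    interval = (diffs[lo_idx], diffs[hi_idx])

    # Assumption: recenter the bootstrap differences on the observed value
    # and count how many lie at least as far from zero as the observed value.
    extreme = sum(1 for d in diffs if abs(d - observed) >= abs(observed))
    p_value = (extreme + 1) / (n_boot + 1)
    return observed, interval, p_value


if __name__ == "__main__":
    # Hypothetical responses for 10 cases and two readers per group.
    cases = list(range(10))
    experienced = [{c: 1 for c in cases}, {c: int(c != 3) for c in cases}]
    trained = [{c: int(c % 2 == 0) for c in cases}, {c: int(c < 6) for c in cases}]
    diff, interval, p = bootstrap_group_difference(experienced, trained, cases)
    print(f"difference={diff:.3f}, 95% interval={interval}, P={p:.4f}")
```

Resampling whole cases rather than individual readings is one way to respect the correlation between observers who read the same studies; resampling readers, or both readers and cases, would be an equally defensible choice that the text does not rule out.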

The 28 observers read a total of 1084 individual studies; 22 (79%) readers (including all nine experienced observers) read all 40 studies assigned to them, two read 39, one read 37, one read 35, and two read 27 because of time constraints.
    Overall, 736 (68%) patients were correctly classified (Table 1). The number of lesions correctly classified declined in conjunction with decreased size of the index lesion: Cancer was detected in 121 (79%) patients, large polyps were detected in 134 (47%), and medium polyps were detected in 33 (36%) (Table 1). In the remaining 348 patients, the index lesion was missed in 239, findings were false-positive in 73, and studies were deemed technically inadequate in 36. Of the 36 technically inadequate studies, 23 (64%) related to one subject who had no colonic lesion. Of the 13 other technically inadequate studies, 11 related to subjects who had no colonic lesion. Overall, the false-positive rate was 13% (22 of 176 studies) for experienced radiologists, 12% (21 of 169 studies) for trained radiologists, and 16% (30 of 188 studies) for technologists. One experienced radiologist and two technologists did not assign any false-positive diagnoses. Six readers (one experienced observer, two radiologists, and three technologists) had false-positive rates of 20% or more.

Figure 1: Transverse CT images acquired with a four-detector row scanner (100 mA, 120 kV) in a 62-year-old woman with a 12-mm-diameter sigmoid flat adenoma. All observers missed this polyp. (a) The adenoma (arrows) is barely visible with standard CT colonography window settings (window level, 150 HU; window width, 1500 HU). (b) The adenoma (arrows) is more clearly visible with standard abdominal CT window settings (window level, 40 HU; window width, 400 HU).

Table 1

Relationship between Patient Category and Observer Assessment for All Observer Groups Combined

Patient Category     Correct Classification    Incorrect Classification    False-Positive Finding    Inadequate Study    Total
Cancer               121 (79)                  33 (21)                     ...                       0                   154
Large polyp          134 (47)                  147 (52)                    ...                       2                   283
Medium polyp         33 (36)                   59 (64)                     ...                       0                   92
No colonic lesion    448 (81)                  ...                         73                        34                  555
Total                736                       239                         73                        36                  1084

Note: Data are numbers of studies. Data in parentheses are percentages.
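
The percentages reported for Table 1 and the false-positive rates quoted in the text can be rechecked directly from the raw counts. The following is a minimal sketch of that arithmetic, not part of the original analysis: the names `TABLE_1`, `FALSE_POSITIVES`, `percent`, and `summarize` are hypothetical, and rounding halves up is an assumption chosen because it reproduces the published 13% for 22 of 176 studies.

```python
# Study counts transcribed from Table 1 as (correct, total) pairs.
# All names here are illustrative and do not come from the article.
TABLE_1 = {
    "Cancer": (121, 154),
    "Large polyp": (134, 283),
    "Medium polyp": (33, 92),
    "No colonic lesion": (448, 555),
}

# False-positive findings and denominators quoted in the text,
# as (false positives, studies) pairs per observer group.
FALSE_POSITIVES = {
    "Experienced radiologists": (22, 176),
    "Trained radiologists": (21, 169),
    "Trained technologists": (30, 188),
}


def percent(numerator, denominator):
    """Whole-number percentage, rounding halves up (assumed convention)."""
    return int(100 * numerator / denominator + 0.5)


def summarize(counts):
    """Map each label to its whole-number percentage."""
    return {label: percent(n, d) for label, (n, d) in counts.items()}


# Column totals across all patient categories.
correct_total = sum(n for n, _ in TABLE_1.values())
study_total = sum(d for _, d in TABLE_1.values())

if __name__ == "__main__":
    print(summarize(TABLE_1))
    print(correct_total, study_total, percent(correct_total, study_total))
    print(summarize(FALSE_POSITIVES))
```

Python's built-in `round` rounds exact halves to the nearest even integer, so it would turn 12.5 into 12; the explicit half-up helper is used here only to match the figures as printed.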

Observer Performance
Overall, more lesions were detected by experienced radiologists (66%) than by trained radiologists (51%) or technologists (47%) (Table 2). This was also the case when all subgroups of lesions were considered individually.
    Subset analysis revealed that some polyps were clearly more difficult to detect than others; this phenomenon applied across all observer groups. For example, in the 12 patients whose index lesion was a large polyp, two polyps were missed by all 24 observers who read these two studies. The large polyp in another two of these 12 patients was identified and categorized correctly by only one observer (an experienced reader). All four of these difficult-to-detect polyps were morphologically flat. In the four patients whose index lesion was a medium polyp, only one (4%) of 23 observers identified the index lesion in one study (Fig 2), and only two (10%) of 21 observers identified the index lesion in another study. Thus, there were two difficult-to-detect medium polyps. Overall, the six difficult-to-detect polyps (four large and two medium polyps) had considerable influence on our results and decreased accuracy for observer groups and individuals.

Figure 2: Transverse CT images acquired with a four-detector row scanner (100 mA, 120 kV) in a 66-year-old woman with an 8-mm adenoma in the descending colon. This polyp was visible on only those images obtained with the patient in the prone position, and it was missed by all readers except one experienced observer. (a) Polyp (arrow) is visible with CT colonography window settings (window level, 150 HU; window width, 1500 HU). (b) Three-dimensional volume-rendered endoluminal view shows this same polyp (arrow).

Accuracy and Sensitivity
In regard to the bootstrap analysis, overall accuracy and sensitivity values were significantly higher for experienced radiologists than for trained radiologists or technologists (Table 3). This was the case for all analyses, regardless of whether the six difficult-to-detect polyps were included. However, there was no significant difference in measures of accuracy or sensitivity when the trained radiologists were compared with the trained technologists. Although these results show that experienced readers performed best on average, there was considerable overlap between the observer groups when individual performance was considered (Figs 3, 4); for example, the level of accuracy achieved by one technologist and two trained radiologists was higher than the mean accuracy achieved by experienced observers.

Secondary Lesions
The ability of observers to detect the seven secondary lesions is shown in Table 4. The secondary lesion was a large
polyp in 14 of the studies read by experienced radiologists, 14 of the studies read by trained radiologists, and 15 of the studies read by technologists. This lesion was detected by all of the experienced radiologists, 13 (93%) of the trained radiologists, and 12 (80%) of the technologists (Table 4). The secondary lesion was a medium polyp in 35 of the studies read by experienced radiologists, 33 of the studies read by trained radiologists, and 38 of the studies read by technologists. This lesion was detected by 13 (37%) experienced radiologists, seven (21%) trained radiologists, and six (16%) technologists.

Prior Experience Levels
We performed a subset analysis to compare experienced readers whose a priori experience exceeded 1000 studies (four individuals) with those whose experience did not exceed 1000 studies (six individuals) and found no significant difference: Rates for detection of cancer, large polyps, and medium polyps were 100%, 56%, and 43%, respectively, for readers with the most experience and 86%, 58%, and 53%, respectively, for readers with the least experience. False-positive rates were also similar (80% for readers with the most experience vs 84% for readers with the least experience).

Table 2

Summary of Lesion Detection Rates according to Observer Experience

                            All Lesions           Cancer                Large Polyps          Medium Polyps
Observer Group              Seen       Missed     Seen       Missed     Seen       Missed     Seen       Missed
Experienced radiologists    116 (66)   60 (34)    47 (92)    4 (8)      54 (57)    40 (43)    15 (48)    16 (52)
Trained radiologists        85 (51)    82 (49)    34 (71)    14 (29)    42 (47)    48 (53)    9 (31)     20 (69)
Trained technologists       87 (47)    97 (53)    40 (73)    15 (27)    38 (39)    59 (61)    9 (28)     23 (72)

Note: Data are numbers of lesions. Data in parentheses are percentages.

Table 3

Observer Accuracy and Sensitivity for All Lesions and When Six Difficult-to-Detect Polyps Were Excluded

A: Accuracy and Sensitivity for Each Observer Group

Observer Group              Overall Accuracy    Overall Accuracy Excluding Difficult Cases    Sensitivity    Sensitivity Excluding Difficult Cases
Experienced radiologists    74.2                83.7                                          65.5           85.3
Trained radiologists        66.6                76.9                                          50.7           69.7
Trained technologists       63.2                72.0                                          47.3           63.5

B: Difference in Accuracy and Sensitivity between Observer Groups

                                                                                 Difference in Overall                                        Difference in
                                                    Difference in                Accuracy Excluding               Difference in               Sensitivity Excluding
Group Comparison                                    Overall Accuracy    P Value  Difficult Cases         P Value  Sensitivity        P Value  Difficult Cases        P Value
Experienced radiologists vs trained radiologists    7.6 (1.2, 14.3)     .017     6.8 (0.5, 13.1)         .035     14.9 (4.3, 25.2)   .007     15.6 (5.7, 25.8)       .004
Experienced radiologists vs trained technologists   11.0 (0.5, 17.7)    .003     11.7 (5.2, 17.9)        .001     18.2 (8.3, 28.3)   .002     21.9 (6.2, 16.9)       .001
Trained radiologists vs trained technologists       3.4 (-3.4, 10.1)    .33      4.9 (-1.8, 11.7)        .16      3.4 (-7.2, 13.7)   .52      6.1 (-0.8, 10.7)       .31

Note: Unless otherwise indicated, data are percentages. Data in parentheses are 95% confidence intervals. Accuracy refers to the correct classification of patients with and without lesions, whereas sensitivity refers to detection of cancer and polyps only.

Figure 3: Graph shows overall observer accuracy. Mean values were 74.2% for experienced observers, 66.6% for trained radiologists, and 63.2% for technologists.

Figure 4: Graph shows observer accuracy after six difficult-to-detect lesions were excluded. Mean values were 83.7% for experienced observers, 76.9% for trained radiologists, and 72.0% for technologists.

Table 4

Ability to Detect a Secondary Lesion in Patients with More than One Lesion

                            All Lesions           Large Polyps          Medium Polyps
Observer Group              Seen       Missed     Seen       Missed     Seen       Missed
Experienced radiologists    27 (55)    22 (45)    14 (100)   0 (0)      13 (37)    22 (63)
Trained radiologists        20 (43)    27 (57)    13 (93)    1 (7)      7 (21)     26 (79)
Trained technologists       18 (34)    35 (66)    12 (80)    3 (20)     6 (16)     32 (84)

Note: Data are numbers of observations. Data in parentheses are percentages. Seven patients had a second lesion (ie, a lesion other than the index lesion).

Discussion
Unsurprisingly, prior experience enhances performance. Investigators in a prior study found the average area under the receiver operating characteristic curve was 0.80 for the most experienced readers and 0.77 for the least experienced readers (6); this is a small difference in relative terms. Highly experienced individuals agree that specific and supervised training is a prerequisite for acceptable performance (7). Moreover, they specified that such training should involve interpretation of 40–50 endoscopically validated studies. However, there is little evidence to support or refute this recommendation. Our data show that the overall sensitivity of novice observers trained with this scheme is significantly inferior to that of experienced observers. Both groups of trained observers detected approximately 70% of cancers, whereas experienced observers detected 92% of cancers. This discrepancy seems to suggest that the training we administered was inadequate.
    However, it could be argued that our proposition that trainees should achieve the competence of experienced observers is flawed. A distinction can be made between “best achievable” and “acceptable” performance. On average, subspecialist radiologists perform better than their generalist peers because subspecialists are able to make decisions on the basis of prior experience (13). Whether all radiologists interpreting CT colonographic studies need to be as capable as those with extensive experience is a question to be answered by the wider radiologic community. The answer will depend on whether the examination is performed by generalists or subspecialists. A number of observations will indicate this. First, subspecialization has affected radiology since the 1920s (14); since then, it has become more prevalent for a number of reasons, not the least of which is that it is thought to benefit patients (15). The ultimate position of CT colonography as a specialist examination is thus more likely today than it was previously. A parallel may be drawn with a barium enema examination, which is widely considered a general examination despite compelling evidence that interpretation of barium enema studies is best handled by those with extensive experience (16). Furthermore, since the inception of CT colonography, the diagnostic performance of this modality has been compared with that of colonoscopy, which is reportedly a more effective test than a barium enema examination (17). Comparisons between skilled colonoscopists and less-skilled colonographers damage the reputation of CT colonography (18).
    It may be possible to stratify acceptable performance contingent on the clinical setting. For example, it has been argued that for mammography, the highest aptitude is necessary for screening because patients are asymptomatic and lesions are often difficult to detect (19). The same principle might apply to CT colonography. Like mammography, colonography may be used to examine symptomatic patients (who actually constitute the largest group that undergoes this procedure in research studies). Symptomatic colonic lesions tend to be larger and easier to detect than asymptomatic lesions; thus, it may be possible that less interpretative skill is needed to detect symptomatic lesions. This hypothesis is supported by our findings, which show that detection rates increased in line with lesion size for all observer groups. However, while cancers were the easiest index lesions for the trained observers to detect,
whether a potential patient or health-policy maker would be satisfied with a 70% average detection rate is a subject for wider debate.
    It was not the aim of our study to investigate the performance characteristics of CT colonography. Rather, we aimed to determine the performance of novice observers relative to that of experienced observers after novice observers had undergone training with a schedule that was in line with proposed guidelines (7). With this approach, no aspect of individual aptitude is taken into account. It is inevitable that some individuals will outperform others despite similar professional backgrounds and training. Our data revealed considerable overlap in individual performance between all groups. Notably, there were two trained radiologists and one trained technologist whose accuracy exceeded the mean accuracy achieved by the experienced observers. Conversely, the accuracy of one experienced observer was below the mean accuracy achieved by both trained groups, even after difficult-to-detect lesions were excluded. Our data suggest that competence might be achieved by certain talented individuals after they complete a training program based on 50 validated studies. Merely completing such training is insufficient to guarantee competency, and it is self-evident that competent individuals will need to be identified in some other way, possibly with an examination. Again, this is a subject for wider debate. It should be noted that because our sample data set was relatively small, the observed variability between observers will likely exceed the real variability because of sampling error. A larger study would likely reveal performance that regressed toward the mean value for each group. Because of this, it would be unwise to overemphasize the performance of individuals in the present study.
    Considering aptitude further, on average, we found no difference between the trained radiologists and the trained technologists, despite the radiologists’ relative wealth of interpretative experience with CT. Also, the range of individual abilities was similar between these two groups. This suggests that the paradigm for interpretation of CT colonographic studies differs from that for interpretation of routine CT studies; thus, radiologists may not have an intrinsic advantage (unless we also consider the detection of extracolonic lesions, which we chose not to address). This may be explained by the fact that one organ is being examined for one disease (ie, neoplasia); therefore, an extensive medical knowledge base confers no substantial advantage. Furthermore, the skills required for colonic navigation are different from those used to interpret conventional CT studies, and interpretation takes longer, with a greater potential for observer fatigue and error (20). Our data possibly support the concept that radiographic technologists may be a valuable resource for interpretation of studies, especially when radiologists are in short supply. This is already the case for interpretation of barium enema studies, and it is a cost-effective measure (21,22).
    Although our primary aim was to assess the relative performance of experienced and trained observers, we should explore the reasons behind the overall detection rate of only 57% of large polyps, which lags behind that in some studies (23) and meta-analyses (24,25). This was undoubtedly influenced by the disproportionately high percentage of flat adenomas (a third of large polyps were flat, and one was invisible on CT images, even in retrospect), and it may not translate to series that are more representative of the general population. The findings of large series in which dye-spray colonoscopy was used suggest that 13%–15% of large adenomas are flat (26,27). Ironically, the higher percentage of flat adenomas in our study was a result of our attempts to make the data set reflect conditions in everyday practice. We prevented investigators from submitting only their best studies by stipulating that studies be accrued in a chronologically consecutive fashion. Some contributing centers had ongoing research relating to hereditary cancer, which increased the prevalence of flat lesions in our study. The consequence of this was twofold: Most obviously, detection rates were reduced. Also, flat adenomas diminished our power to discriminate between groups because they present a challenge to all observers (28). However, they can be detected if observers are careful in their interpretation (28); for example, one experienced observer identified two flat adenomas. The proportion of flat adenomas should be reported in future studies of CT colonography.
    Our study did have limitations. We originally intended that all participating centers would contribute studies obtained in 10 patients; however, not all centers did this. Three centers did not contribute any studies because they could not satisfy protocol stipulations.
    Although the data set was designed to reflect what might be expected in a fecal occult blood test screening program, it was by necessity a simulation and can be regarded as a convenience sample. Assumptions for the bootstrap analysis best suit a random sample. For example, cancers detected at screening are in an earlier stage than those that are detected in patients who present with symptoms (10–12). Conversely, adenomas detected with the fecal occult blood test are larger than those in asymptomatic patients (10–12). We have discussed the difficulties posed by the proportion of flat adenomas.
    Reading conditions were, by necessity, artificial. Image interpretation induces fatigue (6), and practitioners are currently unlikely to read 20 studies per day. However, this was a pragmatic necessity for this study, and this paradigm has been adopted successfully in other high-profile studies that have involved large numbers of observers from several centers (6). Our original intention was for observers to use their preferred software platform, but difficulties uploading studies prevented this. Instead, we assembled the data set onto laptop computers that could be transported easily to each center. These computers had high-resolution screens, and the software used a two-dimensional approach, with a three-dimensional approach available for problem solving; this was the preferred method of analysis for the majority of experienced readers at the time of the study. All normal software functions were preserved on the computers. Because some readers had been trained to use another platform locally, we ensured that the software used in this study was easy to learn, and the study supervisor was available at all times to help, if necessary. While there is some evidence that the type of software platform used does not influence accuracy (6), it is possible that accuracy may have improved if a primary three-dimensional approach had been available (23). However, it should be stressed that we aimed to investigate the relative performance of observers and not the confounding effect of the software platform. Whether the reading platform used has a differential effect on experienced observers versus trained readers clearly merits further research. The use of laptop computers also meant that study loading times were longer than those of a workstation, and this may have frustrated some readers.
    Investigators have examined the effect of implementing an identical training schedule for novice observers, with use of a teaching file and test set (29); however, we decided to leave the patient selection and training schedule largely to the discretion of the local trainer (beyond stipulations relating to the number of studies and length of training) because we thought this would better reflect current teaching practice. As a result, differences in performance potentially could be explained by variations in the quality of local training,

may have been increased. Also, not all observers read the same studies to prevent recall bias due to interpretation of studies obtained at an observer’s own center; however, we did balance the prevalence of abnormalities across all data sets so that they would remain comparable.
    In conclusion, experienced observers asked to interpret CT colonographic studies performed significantly better on average than did novice observers who were trained with 50 endoscopically validated studies. However, individual performance is variable, and some trainees may outperform some experienced radiologists. On average, we found no performance difference between trained radiologists and trained radiographic technologists, which suggests that prior interpretation of conventional abdominal CT studies may not be of benefit for interpretation of CT colonographic studies.

Acknowledgments: The investigators are grateful to Voxar and E-Z-Em for providing workstations and CT colonography interpretation software.

European Society of Gastrointestinal and Abdominal Radiology (ESGAR) CT Colonography Study Group Investigators: Principal investigator: Steve Halligan, FRCR (University College London). Trial coordinator and data manager: David Burling, FRCR (St Mark’s Hospital, London, England). Writing committee: Steve Halligan, FRCR; David Burling, FRCR, Wendy Atkin, PhD, and Clive Bartram, FRCR (St Mark’s Hospital); Helen Fenlon, MD (Mater Misericordiae University Hospital, Dublin, Ireland); Andrea Laghi, MD (La Sapienza, Rome, Italy); and Jaap Stoker, MD (Amsterdam Medical Centre, Amsterdam, the Netherlands). Statisticians: Douglas G. Altman, DSc (Cancer Research UK/NHS

Gasthuis, Amsterdam, the Netherlands); Philippe Lefere, MD, Jesse Marrannes, and Guido Dessey, MD (Stedelijk Ziekenhuis, Roeselare, Belgium); Helen Fenlon, MD, Alan O’Hare, MD, and Shane Foley (Mater Misericordiae University Hospital, Dublin, Ireland); Emmanuele Neri, MD, Paola Vagli, MD, and Benedetta Politi (University of Pisa, Pisa, Italy); Riccardo Iannaccone, MD, Filipo Mangiapane, MD, and Sante Ori (La Sapienza, Rome, Italy); and Teresa Gallo, MD, Giulia Nieddu, MD, Saverio Signoretta, and Daniele Regge, MD (Candiolo Oncologic Hospital, Turin, Italy).

References

 1. Kalish GM, Bhargavan M, Sunshine JH, Forman HP. Self referred whole-body imaging: where are we now? Radiology 2004;233:353–358.
 2. Illes J, Fan E, Koenig BA, Raffin TA, Kann D, Atlas SW. Self-referred whole-body CT imaging: current implications for health care consumers. Radiology 2003;228:346–351.
 3. Burling D, Halligan S, Taylor SA, Usiskin S, Bartram CI. CT colonography practice in the United Kingdom: a national survey. Clin Radiol 2004;59:39–43.
 4. Halligan S, Taylor SA, Burling D. Virtual colonoscopy [letter]. JAMA 2004;292:432.
 5. Ferrucci J, Barish M, Choi R, et al. Virtual colonoscopy [letter]. JAMA 2004;292:431–432.
 6. Johnson CD, Toledano AY, Herman BA, et al. Computerized tomographic colonography: performance evaluation in a retrospective multicenter setting. Gastroenterology 2003;125:688–695.
 7. Soto JA, Barish MA, Ferrucci JT. CT colonography interpretation: guidelines for training courses [abstr]. In: Radiological Society of North America scientific assembly and annual meeting program. Oak Brook, Ill: Radiological Society of North America, 2004; SSQ09-07.
which are precisely what occur in resi-        Centre for Statistics in Medicine, Wolfson Col-
                                                                                                   8. Taylor SA, Halligan S, Burling D, et al. CT
dency programs in general. For exam-           lege, Oxford, England) and Paul Bassett, BSc
                                                                                                      colonography: effect of experience and train-
ple, some trainers may have empha-             (Statistical Consultant, Ruislip, Middlesex, En-
                                                                                                      ing on reader performance. Eur Radiol 2004;
sized the importance of careful soft-          gland). ESGAR liaison: Roger Frost, FRCR
                                               (Salisbury NHS Trust, Salisbury, England). Study
tissue reading when looking for flat
                                               readers and local coordinators: Stuart Taylor,      9. Rockey DC, Paulson E, Niedzwiecki D, et al.
lesions, whereas other trainers may not        FRCR (University College London); Clive Bar-           Prospective comparison of colon imaging
have stressed this point. Whether an           tram, FRCR, Lesley Honeyfield, DCR, and                 tests: a determination of the relative sensi-
identical training scheme and materials        Melinda De Villiers, DCR (St Mark’s Hospital);         tivity of air contrast barium enema, com-
administered via a training course are         David Nicholson, FRCR, Velauthan Rudraling-            puted tomographic colonography, and colonos-
superior to more prolonged but less            ham, FRCR, and Lisa Renaut, DCR (Hope Hospi-           copy. Lancet 2005;365:305–311.
                                               tal, Salford, England); Clive Kay, FRCR, Andy
standardized local training is a subject                                                          10. Hardcastle JD, Chamberlain JO, Robinson
                                               Lowe, FRCR, and Jane Williams-Butt, DCR
that needs further investigation.              (Royal Infirmary, Bradford, England); Jasper
                                                                                                      MH, et al. Randomised controlled trial of
     We have already stated that be-                                                                  faecal-occult-blood screening for colorectal
                                               Florie, MD, and Martin Poulus (Academic Medi-
                                                                                                      cancer. Lancet 1996;348:1472–1477.
cause our data set was relatively small,       cal Center, Amsterdam, the Netherlands); Vic-
observed variability between readers           tor Van der Hulst, MD (Onze Lieve Vrouwe           11. Kronborg O, Fenger C, Olsen J, Jorgensen

160                                                                                                       Radiology: Volume 242: Number 1—January 2007

OD, Sondergaard O. Randomised study of screening for colorectal cancer with faecal-occult-blood test. Lancet 1996;348:1467–1471.
12. Mandel JS, Bond JH, Church TR, et al. Reducing mortality from colorectal cancer by screening for fecal occult blood. N Engl J Med 1993;328:1365–1371.
13. Halligan S. Subspecialist radiology. Clin Radiol 2002;57:982–983.
14. Alderson PO. A balanced subspecialization strategy for radiology in the new millennium. AJR Am J Roentgenol 2000;175:7–8.
15. Capp MP. Subspecialization in radiology. AJR Am J Roentgenol 1990;155:451–454.
16. Halligan S, Marshall M, Taylor SA, et al. Observer variation in the detection of colorectal neoplasia on double contrast barium enema: implications for colorectal cancer screening and training. Clin Radiol 2003;58:948–954.
17. Rex DK, Vining D, Kopecky KK. An initial experience with screening for colon polyps using spiral CT with and without CT colography (virtual colonoscopy). Gastrointest Endosc 1999;50:309–313.
18. Cotton PB, Durkalski VL, Pineau BC, et al. Computed tomographic colonography (virtual colonoscopy): a multicenter comparison with standard colonoscopy for detection of colorectal neoplasia. JAMA 2004;291:1713–1719.
19. Sickles EA, Wolverton DE, Dee KE. Performance parameters for screening and diagnostic mammography: specialist and general radiologists. Radiology 2002;224:861–869.
20. Johnson CD, Harmsen WS, Wilson LA, et al. Prospective blinded evaluation of computed tomographic colonography for screen detection of colorectal polyps. Gastroenterology
21. Culpan DG, Mitchell AJ, Hughes S, Nutman M, Chapman AH. Double contrast barium enema sensitivity: a comparison of studies by radiographers and radiologists. Clin Radiol 2002;57:604–607.
22. Brown L, Desai S. Cost-effectiveness of barium enemas performed by radiographers. Clin Radiol 2002;57:129–131.
23. Pickhardt PJ, Choi JR, Hwang I, et al. Computed tomographic virtual colonoscopy to screen for colorectal neoplasia in asymptomatic adults. N Engl J Med 2003;349:2191–2200.
24. Sosna J, Morrin MM, Kruskal JB, Lavin PT, Rosen MP, Raptopoulos V. CT colonography of colorectal polyps: a metaanalysis. AJR Am J Roentgenol 2003;181:1593–1598.
25. Halligan S, Altman DG, Taylor SA, et al. CT colonography in the detection of colorectal polyps and cancer: systematic review, meta-analysis, and proposed minimum data set for study level reporting. Radiology 2005;237:893–904.
26. Rembacken BJ, Fujii T, Cairns A, et al. Flat and depressed colonic neoplasms: a prospective study of 1000 colonoscopies in the UK. Lancet 2000;355:1211–1214.
27. Suzuki N, Talbot IC, Saunders BP. The prevalence of small, flat, colorectal cancers in a Western population. Colorectal Dis 2004;6:
28. Fidler JL, Johnson CD, MacCarty RL, Welch TJ, Hara AK, Harmsen WS. Detection of flat lesions in the colon with CT colonography. Abdom Imaging 2002;27:292–300.
29. Fidler JL, Fletcher JG, Johnson CD, et al. Understanding interpretative errors in radiologists learning computed tomography colonography. Acad Radiol 2004;11:750–756.
