Automation of Patient Narratives
Using SAS ® and Analysis Datasets
David C. Izard, GlaxoSmithKline, Collegeville, PA
Eric M. Simms, GlaxoSmithKline, Collegeville, PA
FDA and ICH guidance documents call for the submission of subjects' study experience in narrative form for
subjects who meet specific criteria. This narrative text is typically developed manually with word processing
software, using information sourced from subject line listings or individual case report form tabulations.
Historically this has been a labor-intensive process requiring significant human resources and lengthy timelines.
This paper explores the use of SAS to identify subjects requiring narratives, automate the development of subject
narrative text, and make that text available to clinical colleagues in a readily usable form, thus reducing the effort
and time required to meet this regulatory requirement.
The International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals
for Human Use (ICH), in its guideline "Structure and Content of Clinical Study Reports" (better known as the "E3"
guideline), specifies in Section 12.3.2 the conditions under which subject narratives must be provided as part of a
clinical study report (1):
"There should be brief narratives describing each death, each other serious adverse event, and
those of the other significant adverse events that are judged to be of special interest because of
clinical importance."
The guideline goes on to specify the content of each narrative, including but not limited to the following
information about the event:
• investigator and sponsor opinion on nature, causality and intensity of the event
• clinical experience leading up to the event
• timing of study medication in relationship to event onset
• relevant laboratory measurements
• other relevant safety measurements
as well as other relevant subject information:
• subject identifier
• subject age and sex
• general condition of the subject
• relevant prior / concomitant medical conditions
• relevant prior / concomitant medications with dosage details
• disease / medical condition being treated, including duration of condition / current experience
Before the ICH guideline, the US Food and Drug Administration (FDA) specified in its 1988 guideline (2) that
narratives should be produced in similar situations:
"For all deaths and all potentially serious adverse experiences, there should be a brief narrative
describing each event and assessing the likelihood that the drug was responsible. [Section
III.10.b.6] … Although minor transient changes may not warrant detailed discussion, marked
changes (defined by the applicant) … that do not lead to discontinuation should also be
discussed in narrative form. [Section III.10.c]"
This guideline is less detailed than the ICH guideline, but it does provide a basis for including subject narratives
in a clinical study report and in regulatory applications seeking approval of a medication, device or procedure for
use within its domain.
Statistics & Programming departments are typically left out of the subject narrative production process, and
gratefully so. Narrative production has historically been a manual, labor-intensive process that is frequently left
until the last minute, contracted out to a CRO, or both. Typical sources of information include subject line
listings, individual case report form tabulations, analysis datasets available through a rudimentary query or
reporting tool such as SAS/JMP or the SAS System Viewer and, for narratives supporting a comprehensive
regulatory submission, the individual study reports contained in the submission.
A second look at what goes into a subject narrative, though, is all it takes to realize that a combination of good
problem definition, programming capability and determination can significantly reduce the manual effort of
producing subject narratives. What spurs on the programming capability to make this happen? Resources (or a
lack thereof), time (or a lack thereof) and urgency (or an abundance thereof).
Our work group was supporting an application to the FDA for a new chemical entity. At a Pre-NDA meeting, the FDA
spelled out requirements for our application that called for significant review of clinical findings, because of
issues uncovered for this entire class of drugs as well as pre-clinical findings for our specific drug. These
requirements included additional subject narratives for all new onset cardiovascular abnormalities of any kind, as
well as for all adverse experiences from one specific body system regardless of treatment period. The workload was
compounded by a large subject population (approximately 5000 subjects), an indication for a chronic condition
typically accompanied by cardiovascular co-morbidities, and a broad definition of what was considered "new onset".
As a result, an estimated 2000 to 3000 subject narratives had to be written in a five-month period while the group
was simultaneously reporting out two pivotal studies and developing the submission deliverables.
In consultation with our clinical colleagues, Statistics & Programming reviewed narratives that had been written for
previous clinical study reports for this compound and discussed the general structure of the desired narratives for
this submission. Discussion focused on the wide variety of previous narrative formats, but also on the observation
that each narrative contained the same demographic characteristics and event descriptors, along with specific
language for each type of narrative being written. The result was agreement on a skeleton narrative template, the
key characteristic being that the demographic and presenting conditions section would be standardized while the
clinical account (i.e., the reason the narrative was being written in the first place) would remain open and
free-form.
This standard yet flexible narrative format allowed Statistics & Programming to go one step further, automating
the generation of the skeleton of each subject narrative. All of the information in this skeleton was also
contained in the analysis datasets derived from the CRF data captured by our clinical database management
system. The remaining task was to devise a method of accurately capturing the relevant information from the
analysis datasets, converting it to narrative text and making it available in a format that our clinical colleagues
could readily use to complete the section of the narrative requiring medical interpretation and expertise.
FUNCTIONAL & BUSINESS REQUIREMENTS
Aside from the requirements laid out in developing the business case in the previous section, the three key
business and functional requirements of this application were:
• to deliver a comprehensive solution that could be passed off to any capable group or organization for
narrative finalization, regardless of company affiliation
• to guarantee that the suite of tools that comprised this solution was internally consistent
• to guarantee that the suite of tools that comprised this solution was consistent with the tables, figures, listings
and electronic submission deliverables used to write the clinical study report and included with the regulatory
submission
To satisfy these requirements, the project team decided to programmatically identify who would need a narrative,
and why, based on criteria developed jointly by Clinical and Statistics & Programming. From this list of subjects
and associated narrative reasons we would produce:
• a "report", in our case the output of a SAS program produced and delivered as a PDF file attached to an
electronic mail message, that detailed which subjects required a narrative and why
• case report form tabulations for the subjects who required a narrative
• a "file", in our case an ASCII file reformatted as RTF via Microsoft Word, containing narrative text
representing the portion of the individual subject narrative that could be standardized
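As a rough illustration of this identification step, the following sketch applies a set of narrative criteria to adverse event records and collapses the results to one record per subject, listing every reason that applies. It is written in Python for brevity rather than the SAS used in the application described here, and the criteria, record layout and field names (subjid, outcome, serious, body_system, new_onset) are hypothetical placeholders, not the criteria agreed for this submission.

```python
from collections import defaultdict

# Hypothetical criteria: reason label -> test applied to one adverse event record.
# The real criteria were agreed jointly by Clinical and Statistics & Programming
# and are not reproduced in the paper.
CRITERIA = {
    "Death": lambda ae: ae["outcome"] == "FATAL",
    "Serious adverse event": lambda ae: ae["serious"] == "Y",
    "New onset cardiovascular event": lambda ae: (
        ae["body_system"] == "CARDIOVASCULAR" and ae["new_onset"] == "Y"
    ),
}


def who_needs_a_narrative(ae_records):
    """Return {subject id: sorted narrative reasons} for every flagged subject."""
    reasons = defaultdict(set)
    for ae in ae_records:
        for reason, test in CRITERIA.items():
            if test(ae):
                reasons[ae["subjid"]].add(reason)
    # One entry per subject: the single driver for the report, the CRTs
    # and the narrative start file.
    return {subj: sorted(found) for subj, found in sorted(reasons.items())}


if __name__ == "__main__":
    sample = [
        {"subjid": "12345", "outcome": "RECOVERED", "serious": "Y",
         "body_system": "CARDIOVASCULAR", "new_onset": "Y"},
        {"subjid": "23456", "outcome": "RECOVERED", "serious": "N",
         "body_system": "SKIN", "new_onset": "Y"},
    ]
    for subj, why in who_needs_a_narrative(sample).items():
        print(subj, "; ".join(why))
```

Keeping every reason for a subject on a single record mirrors the role of the driver dataset: the same list feeds the "who needs a narrative and why" report, the CRTs and the narrative start file.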
FLOW CHART OF SUBJECT NARRATIVE PROCESS
[Figure: from Identify Subjects to the "Who Needs a Narrative and Why" report, then to production of the Case
Report Form File and the Narrative Start File.]
A key point is that these case report form tabulations were produced solely to support the narrative writing
process; they were not developed for all subjects, nor were they submitted to regulatory authorities. All
submissions to US regulatory authorities from GlaxoSmithKline are now fully electronic; SAS transport files of
subject data satisfy the need for reviewing subject-level data, so CRTs are not required.
Using a single source for all deliverables from Statistics & Programming is the key to ensuring that deliverables
are consistent with each other and with other deliverables for a given reporting effort, and this principle was
preserved here. The same analysis datasets used to produce the tables, figures and listings, as well as the
electronic submission deliverables, were used for this effort. A dataset of subjects who required a narrative was
produced and used as the driver for both the CRT production application and the narrative start file. To keep the
CRTs consistent with the study tables, figures and listings, they were implemented in a modular fashion using the
same reporting software used for study reporting, as described in (3).
The standard portion of a subject narrative is a collection of sentences grouped into paragraphs. These
sentences contain static text and text based on the contents of the analysis datasets. A typical generic first
paragraph for a subject narrative could look like:
"Subject 12345 was a 47 year old Caucasian female from the United States. This subject entered
the study on January 1, 2000 and was first exposed to Treatment X during the double-blind phase of
the study on January 16, 2000. Medications taken by the subject in the four weeks leading up to
study entry include Tylox (codeine phosphate, acetaminophen) and Advil (ibuprofen)."
Restating this paragraph with each piece of subject-specific text replaced by the source of that information in the
analysis datasets, the narrative would look like:
"Subject [dem.subjid] was [a|an] [dem.age] [dem.age_unit] [dem.race_txt] [dem.sex_text] from
[[the] dem.country]. This subject entered the study on [min(visit.dat)] and was first exposed to
[trx.trx where trx.session = "DOUBLE BLIND"] during the double-blind phase of the study on
[min(visit.dat) where visit.session = "DOUBLE BLIND"]. Medications taken by the subject in the
four weeks leading up to study entry include [pcmed.verbatim (pcmed.generic (series))]."
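A minimal sketch of how the first two template sentences can be filled from one subject's records is shown below, again in Python for brevity rather than SAS. The dictionary layouts, the assumption that AGE_UNIT holds "year", the "SCREENING" session label and the list of countries that take "the" are all hypothetical; the [a|an] choice is a simple heuristic for ages spoken with a leading vowel sound.

```python
from datetime import date

# Hypothetical list of country names that read naturally after "the".
COUNTRIES_WITH_THE = {"United States", "United Kingdom", "Netherlands", "Philippines"}


def indefinite_article(age):
    """Return 'an' for ages spoken with a leading vowel sound (8, 11, 18, 80-89)."""
    return "an" if str(age).startswith("8") or age in (11, 18) else "a"


def long_date(d):
    """Format a date as 'January 1, 2000' (English month names, default locale)."""
    return f"{d:%B} {d.day}, {d.year}"


def opening_sentences(dem, visits, treatments):
    """Fill the first two template sentences from one subject's records."""
    country = dem["country"]
    if country in COUNTRIES_WITH_THE:
        country = "the " + country
    # Assumes AGE_UNIT holds "year"; "old" is static text.
    first = (f"Subject {dem['subjid']} was {indefinite_article(dem['age'])} "
             f"{dem['age']} {dem['age_unit']} old {dem['race_txt']} "
             f"{dem['sex_text']} from {country}.")
    # [min(visit.dat)] and [min(visit.dat) where visit.session = "DOUBLE BLIND"]
    entered = min(v["dat"] for v in visits)
    db_start = min(v["dat"] for v in visits if v["session"] == "DOUBLE BLIND")
    # [trx.trx where trx.session = "DOUBLE BLIND"]
    db_trx = next(t["trx"] for t in treatments if t["session"] == "DOUBLE BLIND")
    second = (f"This subject entered the study on {long_date(entered)} and was "
              f"first exposed to {db_trx} during the double-blind phase of the "
              f"study on {long_date(db_start)}.")
    return f"{first} {second}"


if __name__ == "__main__":
    dem = {"subjid": "12345", "age": 47, "age_unit": "year", "race_txt": "Caucasian",
           "sex_text": "female", "country": "United States"}
    visits = [{"dat": date(2000, 1, 1), "session": "SCREENING"},
              {"dat": date(2000, 1, 16), "session": "DOUBLE BLIND"}]
    treatments = [{"session": "DOUBLE BLIND", "trx": "Treatment X"}]
    print(opening_sentences(dem, visits, treatments))
```

With these sample records the function reproduces the first two sentences of the example paragraph above.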
Recognizing the role the analysis dataset contents play in producing this text reduces the task of generating the
narratives to the following programming design expressed as pseudo code:
%** start of program ;
%macro get_data_for_narratives;
  %** GET DEMOGRAPHIC DATA (SUBJID, AGE, AGE_UNIT, RACE_TXT, SEX_TEXT, COUNTRY) ;
  %** GET TREATMENT DATA (one record per SESSION, including TRX and START DATE) ;
  %** GET PRIOR MEDS (one record per TRADE NAME / GENERIC) ;
  %** GET [additional analysis dataset data] ;
%mend get_data_for_narratives;
%macro build_sentences;
  %** BUILD SENTENCE 1 ;
  %** BUILD SENTENCE 2 ;
  %** BUILD SENTENCE … ;
  %** BUILD SENTENCE N ;
%mend build_sentences;
%macro write_sentences_to_file;
  %** WRITE SENTENCE 1 ;
  %** WRITE SENTENCE 2 ;
  %** WRITE SENTENCE … ;
  %** WRITE SENTENCE N ;
%mend write_sentences_to_file;
%** MAIN PART OF PROGRAM ;
%get_data_for_narratives;
%build_sentences;
%write_sentences_to_file;
%** END MAIN PART OF PROGRAM ;
%** end of program ;
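The separation of the three routines can be sketched as follows, in Python for illustration only; the function names, fields and sample records are hypothetical stand-ins for the SAS macros and analysis datasets. Sentences are built once into a dictionary keyed by name, and only the writing step decides their order, so a different implementation can reorder or omit sentences without touching the builders.

```python
import sys


def get_data_for_narratives():
    """GET DATA FOR NARRATIVES: one record per subject (stand-in for dataset reads)."""
    return [
        {"subjid": "12345", "age": 47, "entry": "January 1, 2000"},
        {"subjid": "23456", "age": 62, "entry": "March 3, 2000"},
    ]


def build_sentences(subj):
    """BUILD SENTENCES: return every sentence keyed by name, with no placement."""
    return {
        "identity": f"Subject {subj['subjid']} was {subj['age']} years old.",
        "entry": f"This subject entered the study on {subj['entry']}.",
    }


def assemble(sentences, order):
    """Join prebuilt sentences in the requested order, skipping unknown names."""
    return " ".join(sentences[name] for name in order if name in sentences)


def write_sentences_to_file(subjects, order, out):
    """WRITE SENTENCES TO FILE: one paragraph per subject on a text stream."""
    for subj in subjects:
        out.write(assemble(build_sentences(subj), order) + "\n\n")


if __name__ == "__main__":
    # MAIN PART OF PROGRAM: another implementation changes only `order`
    # (and would pass an open ASCII file instead of standard output).
    write_sentences_to_file(get_data_for_narratives(), ["identity", "entry"], sys.stdout)
```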
The design and implementation of modules to prepare the subject narrative text are largely an exercise in text
generation. A notable design feature is separating the generation of a sentence from the recording of that
sentence in the final narrative file: a sentence you always produce may need to appear in a different location for
each implementation, and separating these tasks allows you to do that. A key issue to consider is how to handle
items that could have zero, one or many contributing data points in the analysis datasets, such as adverse
experiences and concomitant medications. Another issue to consider is how to handle obvious errors in the analysis
datasets, and what role this process plays in providing feedback to the work group responsible for capturing the
transaction-level CRF data and making it available electronically.
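To make the zero, one or many issue concrete, the following Python sketch formats the prior medication clause from the example paragraph. It assumes one (trade name, list of generic names) pair per medication; the sentence produced when no medications are recorded is hypothetical wording, not text from the submission. Like the example, the series is written without a serial comma.

```python
def series(items, conjunction="and"):
    """Join zero, one or many items: '', 'A', 'A and B', 'A, B and C'."""
    items = list(items)
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + f" {conjunction} " + items[-1]


def prior_medication_sentence(meds):
    """Build the prior medication sentence from (trade name, [generic names]) pairs."""
    if not meds:
        # Zero contributing records: hypothetical wording so the paragraph
        # still reads as complete text.
        return "No medications were recorded in the four weeks leading up to study entry."
    names = [f"{trade} ({', '.join(generics)})" for trade, generics in meds]
    return ("Medications taken by the subject in the four weeks leading up to "
            f"study entry include {series(names)}.")


if __name__ == "__main__":
    print(prior_medication_sentence([
        ("Tylox", ["codeine phosphate", "acetaminophen"]),
        ("Advil", ["ibuprofen"]),
    ]))
    print(prior_medication_sentence([]))
```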
IMPACT OF PROJECT
The key value driver for any software project is how the effort spent delivering the product will benefit
everyone. As mentioned in the "Business Case" section, there was a significant number of narratives to write in
a very short amount of time. All CROs approached to bid on the contract to produce subject narratives for this
project had serious concerns about the overall timeline; the winning bidder agreed to the contract contingent on
significant resources from the sponsor company being available at many points during the project. When this
process and the associated deliverables were presented to the CRO to support their efforts, they reduced their
bid by $100.00 per subject narrative, declared the timelines adequate and removed their sponsor company resource
contingencies beyond routine administrative support and periodic review of progress. The hard dollar savings to
the sponsor company amounted to approximately $265,000.00 (US); the soft savings were that timelines were met and
the quality of the final product met or exceeded all expectations.
CONSIDERATIONS FOR FUTURE WORK
The two key aspects of improving this application for future use and applicability would be to make the application
fully modular and to use SAS ODS effectively to produce all deliverables, particularly the final narrative start file.
The application was written as a number of internal macros, but they were tightly linked and written for a single
purpose: to deliver subject narratives for a specific regulatory submission. With a bit of effort, this application
could be rewritten as a driver program calling external macros, either standard macros available at the company or
department level or work group level macros written for a specific purpose.
This application was developed under SAS Version 8.1e, when delivering output as PDF and RTF was still a
challenge. We chose to produce ASCII output and take advantage of desktop software (MS Word, Adobe
Acrobat) to convert these items to RTF and PDF files. With SAS Version 8.2 now available and Version 9.0 on
the way, seamless generation of RTF and PDF output is no longer the challenge it used to be; this could easily be
incorporated into the application.
Subject narratives are typically handled outside of the Statistics & Programming arena by clinical colleagues using
their expertise, available data resources and a lot of effort. This paper has shown that Statistics & Programming
can reduce the effort involved in producing these items by using available analysis datasets and a bit of expertise
to deliver tools that simplify and streamline the work clinical colleagues typically pour into this task. Not only does
this reduce the actual effort, but automating the process also enforces the use of a common data source for the
contents of this deliverable, raising the overall quality of the regulatory submission.
(1) International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for
Human Use, "Structure and Content of Clinical Study Reports: E3", 30 November 1995
(2) United States Food and Drug Administration, "Guideline for the Format and Content of the Clinical and
Statistical Sections of an Application", July 1988
(3) Izard, David C. and Yeh, Shi-Tao, "An Automation System for Generating Case Report Form Tabulations",
2002 Pharmaceutical SAS Users Group, Inc. Conference Proceedings, pgs. 33-38, May 2002
David C. Izard
GlaxoSmithKline, Mail Stop UP4310
1250 South Collegeville Road
Collegeville, PA 19426
Eric M. Simms
GlaxoSmithKline, Mail Stop UP4335
1250 South Collegeville Road
Collegeville, PA 19426
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.