Informatics for Clinical and Translational Research
Daniel R. Masys, MD Professor and Chair Department of Biomedical Informatics Professor of Medicine Vanderbilt University School of Medicine
Session Outline
• Goals for this session • Size and scope of clinical research, i.e., why this topic is important • Role of informatics in clinical and translational research • Informatics-related topics in the design and execution of clinical research studies
– Regulatory context: Good Clinical Practice standards and 21 CFR 11 – Design of forms and databases – Data security and HIPAA compliance – Specialized technologies
• Putting it all together
Premise and Goals for this session
• This week is a 'teach the teachers' session for change agents • You might want (or be asked) to teach some of the material covered here • The topics touched upon during this session are taught in a 20-credit-hour Master's course, and all PowerPoint files, readings, quizzes and course outlines are available for your use. • This session is a sampler of topics from the course
Full Course website: http://dbmichair.mc.vanderbilt.edu/courses/crestdata/
Educational objectives Session outlines Readings PowerPoint files
Widely used text: Second Edition 2007
Clinical Research as an activity
• Fundamental to translation of basic research to medically useful interventions • Big business: est. $95 B spent annually in U.S. in biomedical research/drug and device testing • Academic centers lag behind commercial clinical trials organizations in knowledge and skills related to efficient and high quality clinical research.
– Academic center market share of clinical trials now est. at 20%, was 80+% in 1990 – Generally inferior performance with respect to error rates, missing data, timeliness of submission
National Context
• NIH Roadmap and Clinical & Translational Science Awards (CTSA) envision networks of clinical researchers • FDA rules now being applied to NIH sponsored research • Clinical research training has historically included biostatistics and trial design, but little regarding informatics and data management
Barriers to Research
• Administrative bottlenecks • Poor integration of translational resources • Delay in the completion of clinical studies • Difficulties in human subject recruitment • Little investment in methodologic research • Insufficient bi-directional information flow • Increasingly complex resources needed • Inadequate models of human disease • Reduced financial margins • Difficulty recruiting, training, mentoring scientists
CTSA – A roadmap initiative
“It is the responsibility of those of us involved in today’s biomedical research enterprise to translate the remarkable scientific innovations we are witnessing into health gains for the nation.”
NIH CTSAs: Home for Clinical and Translational Science
[Diagram: the CTSA home linked to Clinical Research Ethics, Trial Design, Biomedical Informatics, Biostatistics, Clinical Resources, Regulatory Support, Advanced Degree-Granting Programs, and Participant & Community Involvement, and to NIH, Industry, and Other Institutions]
Re-engineering Clinical Research
[Diagram: NIH Roadmap spanning Bench, Bedside, and Practice; labels include Interdisciplinary Research, Innovator Award, Public-Private Partnerships, Building Blocks and Pathways, Molecular Libraries, Bioinformatics, Computational Biology, Nanomedicine, Translational Research Initiatives, Integrated Research Networks, Clinical Research Informatics, NIH Clinical Research Associates, Clinical Outcomes, Harmonization, and Training]
Role of Informatics in Clinical and Translational Research
• Structured observation and structured record keeping are the essence of science • The primary difference between routine clinical care and research is how processes are controlled (i.e., protocol-driven) and how information is managed to make it useful for analysis
“Classical” Data Management Flow for Clinical Research
Scientific Hypotheses → Specific Data Elements Required to Test Hypotheses → Data Acquisition Instruments (forms) → Computer Data Model → Tool Selection to Support Model and Output to Analytical Software
People and Process Development (Who does What, When and Where)
Documentation: Standard Operating Policies & Procedures
Research Data Management Goals
• Create processes and systems that result in research data that is:
– Accurate – Complete – Timely – Verifiable – Secure – Available for analysis
Regulatory Context: Good Clinical Practice and 21 CFR 11 Electronic Records & Signatures
Regulatory Context: Good Clinical Practice standards
• General and uniform set of principles for conducting clinical research • Two themes
– Respecting rights of participants – Conducting research so that data is accurate and verifiable
• Required by FDA but a good (higher) standard for NIH and other sponsored research
http://www.fda.gov/oc/gcp/
GCP Standards address...
• Responsibilities of participating sites • Responsibilities of coordinating centers for multisite trials • Quality Assurance methods for data • Audits • Reporting to regulatory agencies
GCP Principles of Data Management
• All data should be independently verifiable
– Normally done by comparison with locally kept medical records in interventional trials
• Structured approach to record keeping
– Physical structure: tabbed participant folders with dividers for different classes of information – Logical structure: database designs and tracking systems
GCP Principles of Data Management, cont'd
• Research records are separately maintained from healthcare-related records • Source document = place where observation first recorded • Source document verification: comparison of Case Report Forms (CRFs) with source documents
– corollary: CRFs are not usually considered source documents
GCP standards example: Paper Case Report Forms
• Follow instructions • Write legibly • Originals normally go to Coordinating Center; copies local • No marginalia (literally outside the box) • Forms designed so that all variables have a current value (may be code for Pending, Missing, Missed) • Correct units of measurement (best included with value as separate field)
• Proper methods of correction
– Line through incorrect value (value still visible) – Correct value added – Correction initialed – White-out is always red in an auditor's eyes: no correction fluids or erasures
GCP standards for Case Report Forms, cont'd
• Check forms for completeness prior to submission • Double check and verify ID info on CRF • Submit on time
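The rule that every variable carries a current value, with units kept in a separate field, carries over directly to electronic CRFs. The sketch below is a minimal illustration, not a prescribed design: the status codes (Pending, Missing, Missed) come from the slide, while the class names, field names and the comments describing each code are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    """Hypothetical status codes: every CRF variable holds a value or one of these."""
    RECORDED = "recorded"
    PENDING = "pending"   # e.g., result expected but not yet available
    MISSING = "missing"   # e.g., should exist but cannot be found
    MISSED = "missed"     # e.g., protocol event did not take place


@dataclass(frozen=True)
class CrfValue:
    name: str
    status: Status
    value: Optional[float] = None
    units: Optional[str] = None  # units live in their own field, not inside the value

    def __post_init__(self) -> None:
        # A recorded value must have both a number and units; a coded status
        # must not carry a value, so a blank field is never ambiguous.
        if self.status is Status.RECORDED:
            if self.value is None or not self.units:
                raise ValueError(f"{self.name}: recorded value needs value and units")
        elif self.value is not None:
            raise ValueError(f"{self.name}: status {self.status.value} cannot carry a value")


weight = CrfValue("Weight", Status.RECORDED, 72.5, "kg")
creatinine = CrfValue("Creatinine", Status.PENDING)
```

Rejecting inconsistent combinations at entry time is one way to make the right action easier than the wrong one.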
21 CFR 11: Electronic Records & Signatures
• Applies (only) to data submitted to FDA in support of drug & device applications • Addresses issues related to paperless data management systems where there is no source document for verification • Subpart C covers electronic signatures • Full compliance requires formal software validation testing and certification • To date, has paradoxically impeded rather than advanced use of electronic research data management systems
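21 CFR 11 calls for computer-generated, time-stamped audit trails in which record changes do not obscure previously recorded information; this is the electronic counterpart of striking a single line through an incorrect value on paper. A minimal in-memory sketch follows. The `AuditedRecord` class is hypothetical, and requiring a reason for every change is a design choice borrowed from the spirit of GCP's initialed corrections. It is not a validated, Part 11-compliant system, which would also need secure storage, validation testing and signature controls.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class AuditEntry:
    timestamp: datetime    # generated by the system, never typed by the user
    user_id: str           # unique login of the person making the entry
    field_name: str
    old_value: Any         # preserved, like the struck-through value on paper
    new_value: Any
    reason: Optional[str]  # why the value changed (analogous to initialing a correction)


@dataclass
class AuditedRecord:
    """Hypothetical record type: current values plus an append-only history."""
    values: Dict[str, Any] = field(default_factory=dict)
    trail: List[AuditEntry] = field(default_factory=list)

    def set(self, user_id: str, field_name: str, new_value: Any,
            reason: Optional[str] = None) -> None:
        old_value = self.values.get(field_name)
        if old_value is not None and not reason:
            # Existing data may be corrected, but never silently overwritten.
            raise ValueError("a reason is required when changing an existing value")
        self.trail.append(AuditEntry(datetime.now(timezone.utc), user_id,
                                     field_name, old_value, new_value, reason))
        self.values[field_name] = new_value

    def history(self, field_name: str) -> List[AuditEntry]:
        return [e for e in self.trail if e.field_name == field_name]


rec = AuditedRecord()
rec.set("jdoe", "BPsystolic", 182)
rec.set("jdoe", "BPsystolic", 128, reason="transcription error; digits reversed")
```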
Forms Design
General Forms Design Principles
• Have definitions of all data to be collected in hand before starting the study
– Avoids unnecessary forms revisions that often confuse Clinical Research Associates (CRAs) and participants and create statistical complexities – Avoids 'fishing expedition' approach to iterative protocol modification
Forms Design
• Think of forms as chronologically ordered data containers • Define container size and content on the basis of who, when, where:
– Who fills out the form – When the data becomes available in the course of the study – Where the data will be collected
General Forms Principles, cont'd
• Design forms for ease of use
– this can be difficult since same form often serves different user groups
• Strive to be clear and unambiguous
– only achievable when others review the forms
• Create forms that make it more likely to do the right thing than to make mistakes
– amount of training required is an indicator
• Get draft forms early enough in the planning of the trial that they can be pilot tested prior to study launch
Forms Design principles, cont'd
• Format should anticipate three purposes
– Data acquisition (recording on paper forms or entering data into electronic forms) – Data entry from paper to computer – Data retrieval (inspection, QA, inference)
Forms Design principles, cont'd
• Top three considerations: – Consistency, consistency, consistency • Includes consistency of overall layout, and consistency of coding • Use standardized form header for 'referential integrity' items: – Study ID – Participant ID – Event ID and/or Date – Site ID • Consider systems approach to ID data: barcoded labels or barcoded unique forms
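A standardized header can be enforced in software as well as on paper. The sketch below validates the referential integrity items listed above before a form is accepted. The ID formats in `ID_PATTERNS` and the class name `FormHeader` are hypothetical; a real study would take them from its own data dictionary.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

# Hypothetical ID formats, for illustration only.
ID_PATTERNS = {
    "study_id": re.compile(r"[A-Z]{2,6}-\d{3}"),
    "participant_id": re.compile(r"\d{6}"),
    "site_id": re.compile(r"S\d{2}"),
}


@dataclass(frozen=True)
class FormHeader:
    """Referential integrity items shared by every form in the study."""
    study_id: str
    participant_id: str
    site_id: str
    event_id: Optional[str] = None
    event_date: Optional[date] = None

    def __post_init__(self) -> None:
        for name, pattern in ID_PATTERNS.items():
            if not pattern.fullmatch(getattr(self, name)):
                raise ValueError(f"{name} {getattr(self, name)!r} has the wrong format")
        if self.event_id is None and self.event_date is None:
            raise ValueError("header needs an event ID and/or a date")

    def key(self) -> Tuple:
        # The tuple that links this form to every other form for the same event.
        return (self.study_id, self.participant_id, self.site_id,
                self.event_id, self.event_date)


h = FormHeader("HTN-001", "004217", "S03", event_id="V02")
h2 = FormHeader("HTN-001", "004217", "S03", event_date=date(2008, 2, 12))
```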
Web browser (“thin client”) electronic forms for data entry and retrieval
• Strengths
– Deploy to any location on the Internet – Platform independent (sort of… be careful and test all software on all potential clients) – No software to install or license on users' machines
• Weaknesses
– Less efficient (compact interface) – Fewer controls available – Limited repertoire of 'widgets' (buttons, lists, etc.) – Slower – Dependent upon Internet connectivity
Specialized Software
Specialized Software for Clinical Trials
• Registration • Randomization • Participant tracking • Site communications
– Transaction or batch upload of local data to coordinating center – Websites for protocols, forms, administrative info
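The slide does not say which randomization scheme such software uses. As one common choice, permuted-block randomization keeps the arms balanced as accrual proceeds; the sketch below assumes two hypothetical arms and a block size of 4.

```python
import random
from typing import Iterator, List, Sequence


def permuted_blocks(arms: Sequence[str], block_size: int, seed: int) -> Iterator[str]:
    """Yield treatment assignments in randomly permuted blocks.

    Each block contains every arm equally often, so the arms are exactly
    balanced at the end of each completed block.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    rng = random.Random(seed)  # seeded so the allocation list is reproducible for audit
    per_arm = block_size // len(arms)

    def generate() -> Iterator[str]:
        while True:
            block: List[str] = [arm for arm in arms for _ in range(per_arm)]
            rng.shuffle(block)
            yield from block

    return generate()


schedule = permuted_blocks(["A", "B"], block_size=4, seed=2024)
first_twelve = [next(schedule) for _ in range(12)]
```

In practice the allocation sequence is concealed from site staff, and varying the block size or stratifying by site are common refinements.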
Specialized Software for Clinical Trials, cont‟d
• Performance measures
– Site actual vs. projected accrual – Data completeness – Data accuracy – Data timeliness
• Usually displayed as trends over time • Performance measures should include reference values for performance at all sites combined
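Most site performance measures are simple ratios that can be set beside the same ratio for all sites combined. The sketch below computes accrual against projection and data completeness per site, plus a pooled "ALL" reference row. The field names are hypothetical and the counts are illustrative, not real data; accuracy and timeliness would be added the same way.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SiteReport:
    site_id: str
    accrued: int          # participants actually enrolled
    projected: int        # enrollment projected for the same date
    fields_expected: int  # CRF fields due so far
    fields_received: int  # CRF fields received with a value or status code


def measures(r: SiteReport) -> Dict[str, float]:
    return {
        "accrual_ratio": r.accrued / r.projected,
        "completeness": r.fields_received / r.fields_expected,
    }


def with_reference(reports: List[SiteReport]) -> Dict[str, Dict[str, float]]:
    """Per-site measures plus an 'ALL' row computed from pooled counts."""
    table = {r.site_id: measures(r) for r in reports}
    pooled = SiteReport(
        "ALL",
        sum(r.accrued for r in reports),
        sum(r.projected for r in reports),
        sum(r.fields_expected for r in reports),
        sum(r.fields_received for r in reports),
    )
    table["ALL"] = measures(pooled)
    return table


# Illustrative counts only.
sites = [SiteReport("S01", 18, 20, 1000, 960), SiteReport("S02", 6, 12, 400, 300)]
report = with_reference(sites)
```

Pooling the counts, rather than averaging the site ratios, weights each site by its size; computing one row per reporting period gives the trend over time.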
Data Acquisition Technologies
Keyboard Data Entry
• Average keystroke error rates typically range from 0.1% to 1%, depending upon data type • Improve accuracy over baseline by:
– Double entry and file comparison ('gold standard' but inefficient and expensive) – Special technologies for referential integrity items (e.g., barcode visit and participant ID) – Event-driven auditing and source document verification of scientifically important variables
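To see why these rates matter, assume for illustration only that errors occur independently at rate p per keystroke and that a form needs n = 200 keystrokes (a hypothetical figure). The expected number of errors per form and the probability that a singly entered form is error-free are then:

```latex
E[\text{errors}] = np = 0.2 \text{ to } 2, \qquad
P(\text{error-free form}) = (1-p)^{n}:\quad 0.999^{200} \approx 0.82, \quad 0.99^{200} \approx 0.13
```

At the upper end of the range, most forms of that length would contain at least one keying error, which is what the accuracy measures listed above are meant to catch.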
Data acquisition Technologies:
Double keying
• Common "best practice": forms entered by two different data entry operators • Computer generates difference (diff) file • Third person (usually data manager with clinical expertise) reviews and resolves differences • Increases personnel costs by a factor of 2 to 2.5 over single entry plus sample-based auditing
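The difference file in double keying is a field-by-field comparison of two independent entries of the same forms. A minimal sketch follows, assuming each keyed form is identified by a hypothetical (participant ID, visit ID) key and stored as field-to-text mappings.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]   # field name -> value exactly as keyed
Key = Tuple[str, str]     # (participant ID, visit ID) identifies one form


def diff_entries(first: Dict[Key, Record],
                 second: Dict[Key, Record]) -> List[Tuple[Key, str, str, str]]:
    """Compare two independent keyings of the same forms.

    Returns (form key, field, operator 1 value, operator 2 value) for every
    disagreement, including forms or fields keyed by only one operator.
    """
    diffs = []
    for key in sorted(first.keys() | second.keys()):
        a, b = first.get(key, {}), second.get(key, {})
        for field_name in sorted(a.keys() | b.keys()):
            va = a.get(field_name, "<absent>")
            vb = b.get(field_name, "<absent>")
            if va != vb:
                diffs.append((key, field_name, va, vb))
    return diffs


op1 = {("004217", "V02"): {"Weight": "72.5", "Sodium": "140"}}
op2 = {("004217", "V02"): {"Weight": "75.2", "Sodium": "140"},
       ("004218", "V01"): {"Weight": "80.1"}}
to_review = diff_entries(op1, op2)
```

Values are compared as keyed text on purpose, so "72.5" versus "72.50" is also flagged for the reviewer rather than silently treated as equal.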
Data acquisition Technologies:
Barcoding
• Applications
– Referential integrity items: identifiers for participant, study, site, protocol, event/visit – Physical object tracking: e.g., tissue specimens, freezer inventory management systems
• System-generated barcode labels
– Various barcode standards: 3-of-9 generally used for scientific applications – Produced by TrueType fonts or dedicated barcode printers
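With a TrueType Code 39 font, the software's job is mostly to produce the right text: the data may use only the Code 39 character set and must be wrapped in the "*" start/stop character. Code 39 also defines an optional modulo-43 check character. The function name below is hypothetical, and whether to append the check character is an assumption, since scanners must be configured to expect it.

```python
# Standard Code 39 characters, in check-value order (0-9 -> 0-9, A-Z -> 10-35, then - . space $ / + %).
CODE39_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%"


def code39_text(data: str, add_check: bool = False) -> str:
    """Return the string to print in a Code 39 barcode font."""
    bad = [c for c in data if c not in CODE39_CHARS]
    if bad:
        # Lowercase and '*' are rejected rather than silently changed.
        raise ValueError(f"characters not in Code 39 set: {bad}")
    if add_check:
        # Optional mod 43 check character: sum of character values modulo 43.
        data += CODE39_CHARS[sum(CODE39_CHARS.index(c) for c in data) % 43]
    return f"*{data}*"  # '*' is the Code 39 start/stop character


label = code39_text("004217")
checked = code39_text("004217", add_check=True)
```

For "004217" the character values sum to 14, so the check character is "E".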
Data acquisition Technologies:
Barcoding, cont'd
• Barcode readers
– “Keyboard wedge” - wand or handheld scanner plugged between keyboard and computer – Self-contained scanners with infrared or USB bulk data upload (derived from warehouse inventory systems)
Barcodes and reading devices
[Figure: workstation accessories; Code 39 (3-of-9) barcodes with and without human-readable text; self-contained reader unit]
Note: printing barcodes without human-readable text is not a security measure, and it increases errors.
Data acquisition Technologies:
Mark-sensing Technologies • Example: Scantron (www.scantronforms.com) • Strengths
– Mature technology – Efficient for re-usable form scanning
• Weaknesses
– Low information density: poor for most biomedical uses – Susceptible to "frame shift" errors by users – Requires forms printing – Cost-effective at volumes of ~100K forms
Mark sensing technologies
Data acquisition Technologies:
POF: Plain Old Fax
• Design issues
– Include signature or initials on faxable forms
• Strengths
– Widely used surrogate for paper
• Weaknesses
– Not considered a source document – Legibility – Requires additional effort to enter data into computable form
Data acquisition Technologies:
Fax + Optical Character Recognition • Example: Teleform (www.cardiff.com) • Strengths
– Can substitute for data entry staff – Includes design, recognition, and verification functionality – 90+% recognition accuracy depending upon data type
• Weaknesses
– Error rates equivalent to single entry, higher than double entry – Cost vs. person hours becomes favorable only at large numbers of forms (50-100K)
Data acquisition Technologies:
Direct Computer Entry by Participants
• Can use thin client (HTML forms) or 'thick client', i.e., workstation forms (e.g., MS Access)
• Strengths
– If well designed, eliminates data entry step – Can add multimedia explanations and tutorials – Can be more enjoyable for study participants than paper forms
• Weaknesses
– Requires basic computer skills (mouse +/- keyboard) – Requires literacy skills – Requires staff assistance and verification
Data acquisition Technologies:
Computer to Computer Messaging
• Example: import lab results from lab system directly into research database for study participants. • Strengths – If well designed, eliminates data entry step – Timeliness – Accuracy • Weaknesses – Requires specialized computer programming expertise – Requires standards for representing clinical data (most widely used = HL7) – Requires willingness of systems managers at source of data (e.g., medical center Information Services) to allow data connections
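HL7 version 2 messages are delimited text: segments separated by carriage returns, fields by "|", components by "^". The simplified sketch below pulls numeric lab results (OBX segments) and the patient identifier (PID-3) from a result message. It assumes the default delimiters, ignores escape sequences and repetitions, and uses a made-up sample message with hypothetical local test codes; a production interface would normally use an integration engine or a dedicated HL7 library.

```python
from typing import Any, Dict, List


def parse_oru(message: str) -> Dict[str, Any]:
    """Extract the MRN and results from a simplified HL7 v2 ORU message."""
    mrn = None
    results: List[Dict[str, str]] = []
    for segment in message.strip().split("\r"):
        fields = segment.split("|")  # for non-MSH segments, fields[n] is field n
        if fields[0] == "PID":
            mrn = fields[3].split("^")[0]          # PID-3, first component = ID number
        elif fields[0] == "OBX":
            results.append({
                "test": fields[3].split("^")[1],   # OBX-3, second component = test name
                "value": fields[5],                # OBX-5 observation value
                "units": fields[6],                # OBX-6 units
                "status": fields[11],              # OBX-11 result status (F = final)
            })
    return {"mrn": mrn, "results": results}


sample = "\r".join([
    "MSH|^~\\&|LAB|HOSP|RESEARCH|CRU|200801151200||ORU^R01|MSG0001|P|2.3",
    "PID|1||004217^^^HOSP^MR||DOE^JANE",
    "OBX|1|NM|NA^SODIUM^L||140|mmol/L|136-145|N|||F",
    "OBX|2|NM|K^POTASSIUM^L||4.1|mmol/L|3.5-5.1|N|||F",
])
parsed = parse_oru(sample)
```

The MRN would then be matched to a ParticipantID through the separately stored Person table before results are written to the research database.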
Data acquisition Technologies:
PDAs
• Example: Pendragon software • Strengths – Portable, relatively low cost – Nonprogrammer interfaces to MS Access • Weaknesses – Limited screen size and navigation speed – Not suitable for text entry – Security: lost or stolen PDA
Data Archiving and Database design
Commonly used data archiving and analysis software
• Single investigator, simple trial:
– Spreadsheet (MS Excel)
• Beware using spreadsheets for HIPAA-regulated data – no audit trail capability
– Workgroup-capable database management software (MS Access, Filemaker Pro, 4th Dimension, MS Visual FoxPro)
• Data Center, multiple studies
– Enterprise relational database system
• Sybase, Oracle, MS SQL Server
– Dedicated statistical analysis packages
• SAS, BMDP, SPSS, S Plus, JMP
Commonly used data archiving and analysis software, cont'd
• Pharmaceutical companies - multiple drugs, multiple sites, multiple studies, FDA audits
– Dedicated clinical trials software (e.g., BBN ClinTrials, Oracle Clinical)
Sample data model for one-time administration of a survey
Study_Data: *ParticipantID, Date, Answer1, Answer2, Answer3, Answer4, Last_update, Update_by
Person (Participant): *ParticipantID [primary key], Last_name, First_name, Address, City, State, Zip, Phone, Fax, E-mail, MRN, Birthdate, SSN, Gender, Last_update, Update_by
Relationship: one Person record to one Study_Data record, linked by ParticipantID
Best practices: store the Person table on removable media with physical security, OR store it encrypted with a privately held key
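The one-to-one model can be sketched in SQLite using Python's standard library. To mirror the advice to keep the Person table apart, this sketch places it in a separately attached database (in memory here so the example runs anywhere; in practice the file might sit on physically secured removable media). Table and column names follow the slide, with the Person columns abbreviated; the schema name `person_db` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                       # study data
conn.execute("ATTACH DATABASE ':memory:' AS person_db")  # stand-in for the separately stored Person file

conn.execute("""
    CREATE TABLE person_db.Person (
        ParticipantID TEXT PRIMARY KEY,
        Last_name TEXT, First_name TEXT, MRN TEXT, Birthdate TEXT,
        Last_update TEXT, Update_by TEXT
    )""")
conn.execute("""
    CREATE TABLE main.Study_Data (
        ParticipantID TEXT PRIMARY KEY,  -- one survey per participant
        Date TEXT, Answer1 INTEGER, Answer2 INTEGER, Answer3 INTEGER, Answer4 INTEGER,
        Last_update TEXT, Update_by TEXT
    )""")

conn.execute("INSERT INTO person_db.Person (ParticipantID, Last_name, First_name) "
             "VALUES ('004217', 'Doe', 'Jane')")
conn.execute("INSERT INTO main.Study_Data (ParticipantID, Date, Answer1) "
             "VALUES ('004217', '2008-01-15', 3)")

# Routine analysis touches only the table without direct identifiers.
analysis_rows = conn.execute("SELECT ParticipantID, Answer1 FROM main.Study_Data").fetchall()

# Identity is joined in only when the Person database is available and needed.
linked_rows = conn.execute(
    "SELECT p.Last_name, s.Answer1 FROM main.Study_Data s "
    "JOIN person_db.Person p USING (ParticipantID)").fetchall()
```

SQLite cannot enforce foreign keys across attached databases, so the link rests on ParticipantID alone, which is also what lets the identity file be stored and secured separately.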
Simple clinical study with a variable number of identical repeat visits
Person (Participant): *ParticipantID, Last_name, First_name, Address, City, State, Zip, Phone, Fax, E-mail, MRN, Birthdate, SSN, Gender, Last_update, Update_by
Study_Data: *ParticipantID, VisitID, VisitDate, BPsystolic, BPdiastolic, Weight, Sodium, Potassium, Chloride, Bicarb, BUN, Creatinine, Last_update, Update_by
Relationship: one Person record to many Study_Data records
Note: In best practice, the primary key of Study_Data is the combination of ParticipantID and VisitID (the study visit), which defines a unique protocol event. VisitDate is the calendar date on which that event occurs.
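The composite key can be enforced by the database itself, so a second record for the same participant and visit is rejected rather than silently stored. A minimal SQLite sketch with an abbreviated column list; the VisitID values such as 'V01' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Study_Data (
        ParticipantID TEXT NOT NULL,    -- NOT NULL: SQLite otherwise allows NULLs in such keys
        VisitID TEXT NOT NULL,          -- protocol event, e.g. 'V01', 'V02'
        VisitDate TEXT,                 -- calendar date the event actually occurred
        BPsystolic INTEGER, BPdiastolic INTEGER, Weight REAL,
        Last_update TEXT, Update_by TEXT,
        PRIMARY KEY (ParticipantID, VisitID)
    )""")

insert = ("INSERT INTO Study_Data (ParticipantID, VisitID, VisitDate, BPsystolic) "
          "VALUES (?, ?, ?, ?)")
conn.execute(insert, ("004217", "V01", "2008-01-15", 142))
conn.execute(insert, ("004217", "V02", "2008-02-12", 136))

try:
    conn.execute(insert, ("004217", "V02", "2008-02-13", 150))  # same protocol event twice
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Study_Data").fetchone()[0]
```

Keying on VisitID rather than VisitDate means a visit that happens a day late is still recognized as the same protocol event.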
Clinical study with a baseline evaluation followed by variable number of identical repeat visits
Person (Participant): *ParticipantID, Last_name, First_name, Address, City, State, Zip, Phone, Fax, E-mail, MRN, Birthdate, SSN, Gender, Last_update, Update_by
Baseline: *ParticipantID, VisitDate, DataItem1, DataItem2, DataItem3, Last_update, Update_by
Follow_Up: *ParticipantID, VisitID, VisitDate, BPsystolic, BPdiastolic, Weight, Sodium, Potassium, Chloride, Bicarb, BUN, Creatinine, Last_update, Update_by
Relationships: one Person record to one Baseline record; one Person record to many Follow_Up records
Data Security
Information Security Elements
• Availability - when and where needed • Authentication - a person or system is who they purport to be (preceded by Identification) • Access Control - only authorized persons, for authorized uses • Confidentiality - no unauthorized information disclosure • Integrity - information content not alterable except under authorized circumstances • Attribution/non-repudiation - actions taken are reliably traceable
Research Records Security, General Principles
• Physical Security
– Locked file storage for physical files
• Programmable locks best • Change combination on a regular basis (common practice: twice a year)
– Person-identifiable data
• Keep separate from other study data • Consider additional protections such as two person access requirements
Research Records Security, cont'd
• Electronic Security
– No workstations viewable from public areas – Password-protected login – Screensaver timeouts – Separate login and password for database access – Store demographics data separately and encrypted if feasible – Regular backups and offsite backup storage
Research Records Security, cont'd
• Network Security
– Safest but least useful: disconnect workstations with research data from network – Keep all workstations and servers patched with latest security updates – Run antivirus software on all machines – Consider firewall computer to protect Internet access point, and/or workstation firewall software
HIPAA Security Rule: Basic Concepts
• Applies security principles well established in other industries • Like the Privacy Rule, affects Covered Entities that create, store, use or disclose Protected Health Information (PHI) • Unlike the Privacy Rule, affects only PHI in electronic format (not oral or paper-based) • Like the Privacy Rule, written for health care; research not the principal focus • Scalable: burden relative to size and complexity of organization
Three Categories of Standards
• Administrative safeguards
– Policies and procedures to prevent, detect, contain and correct information security violations
• Physical Safeguards
– IT equipment and media protections
• Technical Safeguards
– Controls (mostly software) for access, information integrity, audit trails
Administrative Safeguards
• Required
1. Risk Analysis 2. Risk Management Plan 3. Sanctions Policy 4. Information System Activity Review (audits) 5. Security Incident Response & Reporting 6. Data Backup Plan 7. Disaster Recovery Plan 8. Emergency Mode Operations 9. Periodic Evaluations of Standards Compliance
Physical Safeguards
• Required
1. Workstation Use Analysis 2. Workstation Security 3. Disposal of media
– deletion of PHI prior to disposal, or – Secure disposal so data nonrecoverable
4. Media Reuse
– Deletion of PHI prior to re-use
Technical Safeguards
• Required
1. Unique User Identification
– No shared logins
2. Emergency access procedures 3. Audit controls
– Logs of who created, edited or viewed PHI
4. Person and/or Entity Authentication
– No systems without access control
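Audit controls cover viewing PHI as well as creating and editing it, and every access must be tied to a unique user. The minimal sketch below assumes the user has already been authenticated and uses a hypothetical in-memory permission registry and log; a real system would rely on the database's or operating system's own access control and protected log storage.

```python
from datetime import datetime, timezone
from typing import Dict, List, Set

# Hypothetical registry: unique user IDs (no shared logins) -> permitted actions.
PERMISSIONS: Dict[str, Set[str]] = {
    "jdoe": {"view", "edit"},
    "asmith": {"view"},
}

access_log: List[dict] = []  # stand-in for a protected, append-only log


def access_phi(user_id: str, action: str, participant_id: str) -> None:
    """Check authorization, then record who did what to which record, and when."""
    allowed = action in PERMISSIONS.get(user_id, set())
    access_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user_id,
        "action": action,
        "participant": participant_id,
        "allowed": allowed,  # denied attempts are logged too
    })
    if not allowed:
        raise PermissionError(f"{user_id} may not {action} PHI")


access_phi("asmith", "view", "004217")
try:
    access_phi("asmith", "edit", "004217")
except PermissionError:
    pass
```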
If a research project maintains e-PHI…
• Responsible group must designate a Security Officer who has responsibility for implementing HIPAA-compliant policies and procedures for research use of e-PHI • Must do and document a risk analysis • Must create risk management plan based on the risk analysis • Must create and keep current a HIPAA Security Rule compliance document that includes description of how 17 Required elements are met, and decisions regarding Addressable elements
Widespread current research practices that don't meet the standard
• Research workgroups that create or use PHI in electronic format but have no written security procedures, policies or training • Workstations with no login security (e.g., Windows 98) • Data management and analysis applications used to store PHI that have no ability to generate audit trails
– E.g., Excel spreadsheets with PHI in them
Using the Internet for Clinical Research
Internet Functionality for Clinical Research
• E-mail
– Avoid putting HIPAA PHI in e-mail
• Study participant recruitment
• Private FTP site as 'drop box' for study-related file communications
– encrypt files if they contain PHI
• Data submission and reporting • Multi-site coordination and administration
Approved Internet Technologies relevant to Clinical Research
Functions: Advertise services; Data submission; Data reporting¹; Trial administration¹
Technologies compared: Standard Web; 128-bit SSL secure Web; SMTP e-mail; File Transfer Protocol²
¹ Containing person-identifiable data, i.e., HIPAA PHI
² Must be encrypted to the HCFA/CMS standard
Sample Project administration website for multi-center study
Putting it All Together: Research Data Management
• An artful selection of physical and electronic management methods
– Signed informed consent documents – Paper forms – Regulatory and project management binders – Data models and databases – Data acquisition and display technologies – Communications technologies for project management as well as data management
Attributes of Successful Data Management
• Attention to detail • Explicit structure and process • Robust designs
– Anticipate failures, lapses and mistakes – Design systems that identify and correct them
• Mechanisms for verification • Well documented
Lessons Learned about Data Management in Clinical and Translational Research
• Effective data management is a continuous process, not a point in time analysis • Historically, health care organizations and providers have invested suboptimally in information systems and this provides an uneven infrastructure for clinical research • In health care organizations, data management and information systems implementation is “20% technology and 80% sociology” (R. Gardner) – plan accordingly
Research Data Trends
• Data Tsunamis
– Genome, proteome, regulome, new forms of imaging – High dimensionality: variables >> subjects
Please submit online evaluations
Questions?