R for Clinical Trial Reporting: Reproducible Research, Quality and Validation

Frank E Harrell Jr
Department of Biostatistics, Vanderbilt University School of Medicine

useR! 2007 Conference, 10 Aug 2007

Slides and Code at http://biostat.mc.vanderbilt.edu/Rreport

Outline

    1   Software Quality and Validation
            Quality and Error Sources
            What is Called “Validation” and What Should it Be?
            Example of a Comprehensive Analysis Validation

    2   High-Level Tools for Reproducible Analysis and Reporting
            Background
            Tools
            Statistical Methods
            Example

Quality and Error Sources: Example

    Since around 1967, SAS has treated NA as −∞
    Key analysis from Duke U. published in NEJM used
        IF stroke time < follow up time THEN stroke=1;
    Patients with a missing stroke time were categorized as having a stroke (secondary endpoint; primary was death)
    Never corrected

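The failure mode on this slide can be sketched in a few lines of Python. This is an illustration only: the function names and the two patient records are hypothetical, and `sas_less_than` merely imitates a comparison in which a missing value sorts below every number. It is not the code used in the published analysis.

```python
import math

def sas_less_than(a, b):
    """Imitate a comparison in which a missing value (None) sorts below every number."""
    a = -math.inf if a is None else a
    b = -math.inf if b is None else b
    return a < b

def stroke_flag_naive(stroke_time, follow_up_time):
    # Mirrors: IF stroke time < follow up time THEN stroke=1;
    return 1 if sas_less_than(stroke_time, follow_up_time) else 0

def stroke_flag_guarded(stroke_time, follow_up_time):
    # Assumption for this sketch: a missing stroke time means no stroke was
    # recorded, and a missing follow-up time means the flag cannot be derived.
    if follow_up_time is None:
        return None
    if stroke_time is None:
        return 0
    return 1 if stroke_time < follow_up_time else 0

# Hypothetical records: patient 1 had a stroke, patient 2 has no stroke time.
patients = [
    {"id": 1, "stroke_time": 120, "follow_up_time": 365},
    {"id": 2, "stroke_time": None, "follow_up_time": 365},
]
naive = [stroke_flag_naive(p["stroke_time"], p["follow_up_time"]) for p in patients]
guarded = [stroke_flag_guarded(p["stroke_time"], p["follow_up_time"]) for p in patients]
print(naive)    # [1, 1]: the patient with a missing stroke time is flagged
print(guarded)  # [1, 0]
```

The point of the guarded version is that missingness is handled explicitly before any comparison is made, rather than being left to the comparison rules of the package.
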
Some Sources of Errors

    Original information source
    Data entry and OCR
    Derived variables
    Data management and storage
    Data import/conversion package
    Data manipulation and analysis file creation
    Statistical package/system
    User analysis code
    Transcription of results into report
    Error in insertion or typesetting results
    Interpretation of results

Most Common Errors Involving Analysts

    Derived variables
    Data manipulation and analysis file creation
    Errors in user analysis code

What is “Validation”?

    Traditionally it involves validating general statistical packages through
        code inspection
        test cases
        simulation
    Such validation cannot envision all possible combinations of options / analyses, or all possible data configurations

What Should Validation Really Emphasize?

    Validation of analyses
        Entire process of analysis file creation, analysis, graphics
        Resources seldom available for first part
            Analysis file creation tested interactively, merge datasets and derive variables two ways, etc.
        Validation is not static; it is per-analysis
        For pivotal analyses, compare results (point estimates, confidence intervals, P-values) with those from another package
        For checking R calculations, the ideal independent and highly programmable package is probably Stata

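Two of the checks named on this slide, deriving a variable two ways and comparing pivotal results across packages, can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function names, the dates, and the numbers attributed to the two packages are all made up for illustration, and the tolerance is arbitrary.

```python
import math
from datetime import date

def followup_days_v1(start, end):
    # First derivation: direct date subtraction.
    return (end - start).days

def followup_days_v2(start, end):
    # Second, independently written derivation: difference of day ordinals.
    return end.toordinal() - start.toordinal()

def agree(a, b, rel_tol=1e-8):
    # Two numeric results agree if they match within a stated tolerance.
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-12)

# Hypothetical (start, end) dates for two subjects.
records = [(date(2006, 1, 15), date(2006, 7, 1)),
           (date(2006, 2, 28), date(2007, 3, 1))]
mismatches = [(s, e) for s, e in records
              if followup_days_v1(s, e) != followup_days_v2(s, e)]

# Made-up pivotal results as they might be reported by two packages.
package_a = {"estimate": 0.41237, "lower": 0.10512, "upper": 0.71962, "p_value": 0.0085}
package_b = {"estimate": 0.41237, "lower": 0.10512, "upper": 0.71962, "p_value": 0.0091}
disagreements = [k for k in package_a if not agree(package_a[k], package_b[k])]

print(mismatches)     # empty when both derivations agree on every record
print(disagreements)  # names of the quantities that need investigation
```

Any entry in either list is a prompt to find out which derivation or which package call is wrong; the check itself says nothing about which side is correct.
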
A Comprehensive Validation

    Statistical Center (SC) at Vanderbilt does not use SAS for any aspect of data processing or analysis, except sometimes to export data from SAS
    Sponsor uses SAS for all data manipulation, derived variables, analysis
    SC created a dummy randomization and an unblinded study report; sent to sponsor
    Sponsor recreated all pivotal calculations
    Worked to obtain exact agreement
    Biggest challenge: getting exactly the same study samples (e.g., “efficacy population”)

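Agreement on study samples can be checked before any estimates are compared. The following Python sketch assumes each group can list the subject IDs in its version of a population; the function name and the IDs are hypothetical.

```python
def compare_populations(center_ids, sponsor_ids):
    """Report subjects that appear in only one of two versions of a study sample."""
    center, sponsor = set(center_ids), set(sponsor_ids)
    return {
        "only_center": sorted(center - sponsor),
        "only_sponsor": sorted(sponsor - center),
        "n_common": len(center & sponsor),
    }

# Hypothetical subject IDs in each group's "efficacy population".
center_efficacy = ["001", "002", "003", "005"]
sponsor_efficacy = ["001", "002", "004", "005"]

report = compare_populations(center_efficacy, sponsor_efficacy)
print(report)
```

Listing the discrepant subjects, rather than only comparing counts, matters because two samples of the same size can still differ in membership.
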
High-Level Tools: Purpose

    Data Monitoring Committees
    Enhance safety and risk/benefit review by DMC
    Methods useful for general RCT reports
    Provide efficient and state-of-the-art statistical reporting
    High-quality graphics (a la Bill Cleveland) and tables
    Hard copy and on-screen review

Problems to Solve

    Reproducible research: no transcription of results
    Repeated reports, main changes are updates to data
    Many response variables and repeated measurements
    Non-normality of data (especially clinical chemistry)
    Dropouts and missing data
    Graphical methods for judging differences in point estimates

Tools Needed

    Batch mode capability (scripting)
    Fine control (graphics, tables, text)
    High-level, flexible statistical language
        graphics
        statistical analysis
        easy to implement new functions
        functions are data-sensitive (unlike macros)
        advanced tables

Tools Needed, cont.

    Document processing (typesetting)
        easy handling of Greek letters, subscripts, superscripts, font changes
        no cut and paste
        easy inclusion of chunks of text, tables, graphics
        automatic cross-referencing and hyperlinking
        let software worry about formatting details

Tools Selected and Developed

    R: open source statistical language
    LaTeX
    Hmisc package
        advanced table making
        latex functions to convert S objects to LaTeX code
        graphics
        Lan-DeMets sequential monitoring stopping bands
    Design package for survival curve plotting

Tools, cont.

    New series (rreport package) of higher-level report generation functions
        completenessReport, accrualReport, baselineReport, mixedvarReport, repVarclus, complianceReport, dropoutReport, aeReport, labReport, publishPdf, mockTable functions
        uses data attributes (value levels, variable labels, units)
        generates all tables, graphs, figure and table captions
        unified mapping of treatments to line types, with graphical legends in text captions
        generates some sentences
        conditional inclusion of certain graphics and sentences

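The idea of generating tables and captions from data attributes, with no transcription, can be sketched independently of any particular package. The Python sketch below is not the rreport or Hmisc implementation: the data, the metadata layout, and the `latex_table` function are hypothetical, and it only shows labels, units, and computed values flowing straight into LaTeX source.

```python
from statistics import median

# Hypothetical variables carrying their own label and units, with made-up
# values for two treatment groups.
variables = {
    "sbp": {"label": "Systolic blood pressure", "units": "mmHg",
            "values": {"A": [118, 126, 131, 140], "B": [115, 122, 128, 135]}},
    "alt": {"label": "ALT", "units": "U/L",
            "values": {"A": [18, 22, 25, 60], "B": [17, 21, 24, 30]}},
}

def latex_table(variables, treatments=("A", "B"), caption="Medians by treatment"):
    """Build LaTeX source for a table of medians; nothing is typed in by hand."""
    lines = [r"\begin{table}",
             r"\caption{%s}" % caption,
             r"\begin{tabular}{l" + "r" * len(treatments) + "}",
             r"\hline",
             "Variable & " + " & ".join(treatments) + r" \\",
             r"\hline"]
    for v in variables.values():
        # Row label comes from the variable's attributes, not from the script.
        name = "%s (%s)" % (v["label"], v["units"])
        cells = ["%g" % median(v["values"][t]) for t in treatments]
        lines.append(name + " & " + " & ".join(cells) + r" \\")
    lines += [r"\hline", r"\end{tabular}", r"\end{table}"]
    return "\n".join(lines)

tex = latex_table(variables)
print(tex)
```

When the data are updated for the next report, rerunning the script regenerates the table, which is the property the slides call reproducible reporting.
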
Tools, cont.

    All non-graphical output files are LaTeX
    Generates all LaTeX \includegraphics calls
    Simultaneously generates open and closed meeting components
    User writes calls to modular functions, study-specific text
    pdf file created directly by pdflatex
    hyperref style used for automatic hyperlinking
    publishPdf function copies reports to secure web server, creates html index file for them, and e-mails committee members and assistants URLs, access IDs, and (sep. e-mail) passwords

Graphical Method for Interpreting Differences

    Confidence limits have more information than P-values
    Graphs showing CLs for multiple treatment groups are busy
    Confidence interval for difference in two parameters not directly obtainable from individual confidence intervals
    Best to show individual estimates and include a separate panel to show difference and its CLs
    Compromise: draw half-width of CL centered at midpoint of two estimates

        \[ \frac{\bar{Y}_1 - \bar{Y}_2}{se} > z \]
        \[ \bar{Y}_1 - \bar{Y}_2 > z \times se \]
        \[ \text{Width of CL} = 2 \times z \times se \]

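The compromise display can be worked through numerically. The Python sketch below assumes that se is the standard error of the difference between the two estimates and that z is a normal critical value; the function name and the example numbers are hypothetical. It draws a bar of total length z × se (half the width of the CL for the difference) centered at the midpoint of the two estimates, so the estimates differ at the chosen level exactly when they fall outside the bar.

```python
from statistics import NormalDist

def half_width_bar(y1, y2, se_diff, level=0.95):
    """Return (lower, upper, differs) for a bar centered at the midpoint of y1 and y2.

    se_diff is assumed to be the standard error of the difference y1 - y2.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    midpoint = (y1 + y2) / 2
    half_length = z * se_diff / 2  # bar has total length z * se_diff
    # abs() is used so that the order of the two groups does not matter.
    differs = abs(y1 - y2) > z * se_diff
    return midpoint - half_length, midpoint + half_length, differs

# Hypothetical estimates for two treatments.
lower, upper, differs = half_width_bar(10.0, 6.0, se_diff=1.5)
print(round(lower, 2), round(upper, 2), differs)
```

Each estimate lies a distance of half the difference from the midpoint, and the bar extends z × se / 2 on either side, so the bar misses both estimates exactly when the difference exceeds z × se.
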
Example

    Data from an actual clinical trial, contributed by a pharmaceutical company
    Not included in example report:
        efficacy analysis
        study design
        data monitoring plan
        summary of previous closed reports
        interpretation
        protocol changes
        screening
        eligibility
        waiting time until treatment commencement
    See Ellenberg, Fleming, DeMets: Data Monitoring Committees in Clinical Trials, 2002

R for Clinical Trial Reporting: Reproducible Research, Quality and Validation
Frank E Harrell Jr
Vanderbilt University

Reliability of analysis software is of paramount importance in clinical and pharmaceutical research. Classical software “validation” has little to do with quality, as most errors are committed when deriving variables and when manipulating and analyzing data. Validation should be directed towards checking the analysis at hand.

The methods often used for generating statistical reports for clinical trials have a number of drawbacks. The most commonly used statistical software packages require users to specify somewhat tedious low-level commands, and the resulting tabular and graphical output is not optimal. Too often, statisticians still overuse tabular reports even though most consumers of the reports would rather review graphics. And in an era in which reproducible research is starting to become popular, most statisticians still engage in some level of manual intervention, such as insertion of calculated values in sentences. These issues are particularly important in reporting for data monitoring committees.

This talk will describe an approach that uses free open-source software (R and LaTeX) to produce advanced tables and graphics using a very high-level language. The component tables and graphics are automatically assembled and indexed by LaTeX, resulting in an Adobe Acrobat PDF file with hyperlinks for easy navigation. Example open- and closed-session DMC reports will be shown, which include tables and graphics describing data completeness, subject accrual, baseline variables, compliance, dropouts, adverse events, and lab data. Some issues in statistical graphics will be discussed, such as a way to depict confidence limits for differences between treatments in graphs that show individual treatment responses.

				