
Data Management: Policies and Practices

        June 13, 2006

      Eric M. Cottington, Ph.D.
Associate Vice President for Research
• What is research data?
• Why is research data management important?
• How should research data be managed?
• Who owns the research data?
• How is data management affected by collaborative research and peer review issues?
• What resources are available to assist with research data management?
                 Case Study
A post-doc suddenly tells his faculty adviser one day that
he will be leaving at the end of the month (when his first
year as a post-doc ends) for various reasons. Two days
before he leaves, he indicates to the faculty adviser that all
of the data that he has collected and analyzed on the NIH
supported project is on the lab computer and the notebooks
are in the lab desk drawer. The faculty member does not
verify this. A few days after the post-doc has left, the
faculty adviser checks the computer and the desk and none
of the data are there. The faculty member subsequently
learns that the post-doc has prepared a manuscript for
     What is research data?
• Little consensus on the definition
• Definition varies depending upon the regulatory agency or sponsor
     Definitions of Research Data
• FAR: “recorded information regardless of the form or media on which it may be recorded”
• PHS: none
• EPA: “any laboratory worksheets, memoranda, notes or exact copies thereof, that are the result(s) of original observations and activities of a study and are necessary for the reconstruction and evaluation of the report of that study”
    Definition of Research data
• The definition may be less important than the type of data
• Three types of data:
  1. Primary
  2. Compiled
  3. Derived
    The importance of primary data
• Various institutions such as Johns Hopkins, MIT, Harvard, University of Michigan, Emory University, and Case have issued guidelines stating that original or primary data should be kept
• Professional organizations such as AFCR and AAMC, and the journals of APHA and APA, dictate that primary and original data should be retained
Case Definition of Research Data
  Case Policy on the Custody of Research Data

  “Research data are defined as the material,
  originally recorded by or for the investigator,
  commonly accepted in the scientific community
  as necessary to validate research findings”
                Case Study
A graduate student finally finishes his dissertation work
(based on a group project with his adviser) and takes a
position at another institution. The faculty member, while
signing off on the work, does not think the data is worthy
of publication (although a draft manuscript has been
submitted). The graduate student does not take the data
with him, but makes a request later to get a copy of his
data and some other data from the project (including some
software code). The faculty adviser refuses to let the
former graduate student have the data.
     Why is data management important?
• Integrity of research
• Intellectual property protection
• Ensuring confidentiality
• Compliance with sponsor’s requirements
      Data Management and
        Research Integrity
• Research data, including detailed experimental protocols, all primary data, and procedures of reduction and analysis, are the essential components of scientific progress (NIH)
• Integrity requires meticulous attention to the acquisition and maintenance of research data
• Questions about the integrity of the research are often answered by inspecting and reanalyzing the primary data
• If the veracity of published results is challenged, data can be subject to subpoena
     Data Management and
      Intellectual Property
• Research data are legal documents for purposes of establishing patent rights (NIH)
• Legal challenges to inventorship often require producing the original data with recorded dates
• Proprietary issues can also drive data access and sharing practices
     Data Management and
     Confidentiality
• Sponsors and/or the university may want data to be kept confidential for proprietary or security reasons
• Regulations to protect human subjects may require data to be kept confidential
• Confidentiality concerns will dictate how data are collected, retained and shared
     Data Management and
     Sponsor Requirements
  Requirements can include:
• How long data should be kept
• With whom data can be shared
• Who has rights to the data
    Data Management Practices
• Data Acquisition
• Data Analysis
• Data Sharing
• Data Retention
                    Case Study
A clinical trial is being conducted at a number of sites to test the
efficacy of a new drug. Dr. Porter, the principal investigator, has
written a research protocol for training staff who will be
collecting lab data. However, due to an unusually high rate of
staff attrition, not all staff replacements received adequate
training in data collection and documentation. In addition,
regularly scheduled site visits by investigators were not strictly
adhered to.

Preliminary analyses reveal inconsistencies in the recorded
measurements among staff members both within and between
sites. A possible explanation is that some staff may have
deviated from the required collection procedures (a consequence
of inadequate training). This breakdown in both quality
assurance and quality control compromises a portion of the data.
Dr. Porter must decide how to address the issue for the
remainder of the data collection period as well as how to handle
the data that has been collected up to this point.
           Data Acquisition
• Research data should be carefully recorded in a form that will allow continuous access for analysis and review (NIH)
• Attention should be given to annotating and indexing notebooks and documenting computerized information to facilitate review of data (NIH)
• The general standard of practice is to provide information regarding how the data was generated such that another individual can repeat or extend the study (NAS)
   OMB guidelines (Oct. 2001)
                           Case Study
A patient compliance intervention has been ongoing for one year in a county
Public Health tuberculosis clinic. The recruitment efforts are focused on a
vulnerable population that is primarily of lower socioeconomic status, unemployed,
or homeless, and that suffers from a number of other medical and social problems. The
research protocol requires research staff to conduct a medical chart review and
administer a baseline interview following recruitment. Although informed consent
is obtained, some of the data collected will be highly sensitive. After
twelve months (the data collection period is 24 months), the principal investigator
finds that 20% of the recruits have withdrawn for various reasons (some have
died, moved away, or been incarcerated).

While the research protocol is fairly clear on recruitment procedures, there is
minimal information regarding procedures following participant withdrawal.
Although the participants' informed consent provided permission to collect and
analyze data, a research assistant questions the principal investigator's decision to
use data from participants who have withdrawn from the study. The issue in
question is whether participants who withdraw should be viewed as also withdrawing
their permission to analyze the collected data. A decision has to be made at the
midpoint of data collection. What should the researchers do with this data?
            Data Analysis
• All data, even from observations and experiments not directly leading to publication, should be treated comparably
• “It is a violation of the most fundamental aspect of the scientific research process to set forth measurements that have not, in fact, been performed (fabrication) or to ignore or change relevant data that contradict the reported findings (falsification)”. (NAS)
• Models used to analyze data should be made available to the public (OMB)
                    Case Study
At the conclusion of a three-year multi-campus study collecting
data on the use of alcohol among undergraduate students, Dr.
Capeless is offered and accepts a research position at an institution
that had not participated in the study. Dr. Capeless intends to
take the data and continue to run analyses. However, the study's
PI, Dr. Brockner, is reluctant to allow this.

Even though the original multi-site agreement had established data
sharing rights and obligations among and between institutions,
there was no stipulation for extending that to individual
researchers who leave for non-participating institutions. Does the
researcher’s change of position and institution call into
question his or her access to the data?

Should the leaving researcher have access to the data? To whom
does the data belong?
             Data Sharing
• General norms of science emphasize the principle of openness (NAS)
• Some scientists may share materials as part of a collaborative agreement in exchange for co-authorship on publications
• Researchers may be willing to exchange scientific data of possible economic significance without regard for financial or institutional
• Data should be made available to the public (does not override compelling interests) (OMB)
     NIH Data Sharing Policy
• Sharing data, particularly unique data, is essential for expedited translation of research results into knowledge, products, and procedures to improve human health.
• The NIH endorses the sharing of final research data to serve these and other important scientific goals.
• The NIH expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers.
• Starting with the October 1, 2003 receipt date, investigators submitting an NIH application seeking $500,000 or more in direct costs in any single year are expected to include a plan for data sharing or state why data sharing is not possible.
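The funding trigger above is a simple rule, so it can be written down as a check. The sketch below encodes the threshold only as stated on this slide ($500,000 or more in direct costs in any single year, for receipt dates from October 1, 2003); the function name `needs_sharing_plan` and its list-of-yearly-costs input are hypothetical, and this is not an NIH tool.

```python
from datetime import date
from typing import Sequence

# Values as stated on the slide (2003 NIH data sharing policy).
THRESHOLD_DIRECT_COSTS = 500_000  # US dollars, per single budget year
EFFECTIVE_RECEIPT_DATE = date(2003, 10, 1)


def needs_sharing_plan(yearly_direct_costs: Sequence[int], receipt_date: date) -> bool:
    """Return True if an application is expected to include a data sharing plan.

    yearly_direct_costs: requested direct costs for each budget year
    (a hypothetical input format chosen for this sketch).
    """
    if receipt_date < EFFECTIVE_RECEIPT_DATE:
        # The expectation applies starting with the October 1, 2003 receipt date.
        return False
    # "$500,000 or more ... in any single year": test each year, not the total.
    return any(cost >= THRESHOLD_DIRECT_COSTS for cost in yearly_direct_costs)


if __name__ == "__main__":
    # Total is $1,350,000, but no single year reaches $500,000.
    print(needs_sharing_plan([450_000, 450_000, 450_000], date(2004, 2, 1)))  # False
    # One year at exactly $500,000 meets the "or more" threshold.
    print(needs_sharing_plan([300_000, 500_000], date(2004, 2, 1)))  # True
```

Because the check looks at individual years, a multi-year application whose total exceeds $500,000 but whose yearly direct costs stay below it does not trigger the expectation under this reading of the rule.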
        NIH Data Sharing Policy
• “Unique data” means data that cannot be replicated.
• “Final research data” is recorded factual material commonly accepted in the scientific community as necessary to validate research findings.
• Does not include lab notebooks, partial datasets, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as gels or laboratory specimens.
• “Timely release” means no later than the acceptance for publication of the main findings from the final data set.
             Data Sharing
• Case Policy on the Custody of Research Data
  “In group research projects, the PI is obligated to give co-investigators access to the research data or copies thereof for review and/or use in follow-on research, with proper acknowledgement”.
  “Where applicable, appropriate measures to protect confidential information must be taken”.
                Case Study
At the completion of a healthcare study about infants, the
research team decides to archive certain parts of the data
and discard the remaining part of the data that may not be
necessary for replicating the study in the future or
conducting further analyses. One of the project team
members is assigned the task of discarding data by
shredding, deleting computer files, and reformatting the
disks. While exploring the data to be discarded, the team
member realizes that the data slated for disposal
contains information about particular infants that could be
useful to the parents if a cure or treatment were ever
discovered for the ailment covered in the study. However, the
project team is responsible only for this particular
healthcare study on infants, as directed by an agency.
            Data Retention
• Consensus is that the principal investigator or project director should be the gatekeeper of all original data produced by his or her project
• Opinions vary on how long research data should be retained (3 years to indefinitely)
• Retention period should be long enough to allow for challenges to research results by the scientific community
           Data Retention
Case Policy on the Custody of Research Data
• “Research data must be archived for not less than three years after the final close-out or publication, whichever occurs last, with original data retained whenever possible”.
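The Case rule starts its three-year clock at whichever event comes last. A minimal sketch of that date arithmetic, assuming the close-out and publication dates are known; the helper names are hypothetical, and the `retention_years` parameter is an added assumption to allow for a sponsor that requires a longer period.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; Feb 29 becomes Feb 28 in a non-leap target year."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only Feb 29 can fail to exist in the target year.
        return d.replace(year=d.year + years, day=28)


def earliest_disposal_date(close_out: date,
                           publication: Optional[date] = None,
                           retention_years: int = 3) -> date:
    """Earliest date archived data could be discarded under a
    'not less than N years after close-out or publication, whichever
    occurs last' rule (hypothetical helper; N defaults to three)."""
    # Start the clock at the later of the two events.
    start = close_out if publication is None else max(close_out, publication)
    return add_years(start, retention_years)


if __name__ == "__main__":
    # Publication follows close-out, so the clock starts at publication.
    print(earliest_disposal_date(date(2005, 6, 30), date(2006, 3, 15)))  # 2009-03-15
```

The policy also asks that original data be retained whenever possible, so this date is a floor for disposal, not a schedule.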
       Who owns your data?
• “The products of research conducted by a faculty member or a researcher in the course of employment and developed with institutional support…reside with the institutional employer” (Estelle Fishbein, JHU)
• “Research data generated under PHS funding generally is owned by the grantee institution, not the principal investigator or the researcher producing the data” (ORI)
      Who owns your data?
• Universities have generally been silent on the subject of data ownership
• Many universities have addressed the topic indirectly through intellectual property policies
• Ownership terms can often be stipulated by contracts with the sponsor
Standard NIH Subcontract Clause
“The results and data developed by this consortium, if
jointly developed, will be jointly owned by the parties,
and if developed solely by one party, will be owned
solely by that party. Each party grants to the other
party a non-exclusive, royalty-free license to use the
results and data developed solely by each other
provided that each party uses such results and data
only for its own internal research and educational
purposes. The parties agree to negotiate in good faith
in the event that either requests a license for
commercial purposes.”
      Who owns your data?
Faculty generally own copyright to
 “learning material”
Case copyright policy:
“In the absence of a prior agreement between
 the author and the University, it is assumed
 that materials developed through the normal
 activities of faculty…are the property of the
 faculty member”.
                    Case Study
A collaborative research venture involves institutions from the United
States and a foreign country. Although the study will be conducted
abroad, the U.S. researchers’ protocol underwent review by their institutional
IRB, while their foreign colleagues have no such requirement.
While the study is underway, a situation threatens the promise of
confidentiality for participants and thus the integrity of the research
design. These participants are HIV-positive patients receiving care
who are citizens of the foreign country.

The foreign investigators inform their U.S. counterparts that
they are providing patients’ National Health numbers so that
outside laboratories can be paid, but they show no concern,
since they are not obliged to follow the same regulations as U.S.
researchers. Dr. Alexie Filipnova (co-principal investigator) and Dr.
Syd Shingles, the U.S. researchers, discuss what they perceive as a
threat to data integrity with a representative of the foreign research
team, Dr. Raymond Lagarde.
 Data Management Issues in
   Collaborative Science
• Increasing emphasis on collaborative research
• Research papers involving authors from several institutions have become increasingly common
• Organizational structure and function of “research teams” can be critical
• The complexity can increase as industry becomes a partner in the research program
Data Management Issues in
  Collaborative Science
It is important to set the ground rules early
in collaborations with regard to:
– access to data (when, to what level, cost)
– authorship issues
– transfer of data, materials
– confidentiality provisions
– pre-existing obligations to sponsor(s)
    Data Management Issues in
      Collaborative Science
• Tension between traditional academic interests in prompt publication and free access to research results and commercial interests in restricted access to data of proprietary value
• Agreements with industry often address confidentiality, non-disclosure, ownership of data/IP and publication rights
  Data Management Issues in
    Collaborative Science
Standard clause in contracts with industry:
“The University will advise the Principal
  Investigator that if the Principal Investigator
  proposes to publish any results or
  conclusions from the Research Program, he
  or she must allow the Company to review
  any proposed publication thirty days prior
  to submitting it for publication”.
              Case Study
A graduate student alleges that a faculty advisor
has instructed him and his fellow graduate
students to read an NIH grant that the faculty
member received to review and to “identify things
that could benefit their research”. The faculty
advisor states that he was giving the graduate
students an opportunity to assist in the review of a
grant – something they will have to do when they
    Data Management Issues in
          Peer Review
• Peer review can be defined as the expert critique of a scientific treatise, a grant proposal, a research protocol, or a research
• Peer review is an essential component of the conduct of science
• All material under review is privileged
    Data Management Issues
         in Peer Review
• Data and material under review should not be used to the benefit of the reviewer unless they have previously been made public
• They should not be shared with anyone unless necessary to the review process
• Material under review should not be copied and retained or used in any manner by the reviewer unless specifically permitted
    Data Management Issues
         In Peer Review
• Expert peer reviewers are frequently from a small field of individuals who know each other
• Peer reviewers often sign a document that obligates them to protect the confidentiality of the research being reviewed
• Nevertheless, reviewers are influenced by the privileged information to which they have access
           Resources for Data Management
• Case Policies and Guidelines (e.g., Data Custody, IP, Authorship, Human Subjects)
• Case Sponsored Projects Administration and Compliance
• Office of Research Integrity
• Cintas Records Management Agreement
           Cintas Records Management
• Can be used to store all types of “paper-based” records
• OSPA will cover one-time charges associated with sponsored project records (e.g., initial pick-up, delivery, shelving, destruction)
• Department/School is responsible for ongoing storage charges ($0.20 per cubic foot per month)
• Contact Cintas at 440-838-8611 to get started
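The ongoing charge a department pays is volume × months × rate. A small sketch, assuming charges accrue linearly at the $0.20 per cubic foot per month quoted above and excluding the one-time charges OSPA covers; `storage_cost` is a hypothetical helper, and `Decimal` is used so cent amounts stay exact.

```python
from decimal import Decimal

# Rate quoted on the slide (2006): dollars per cubic foot per month.
RATE_PER_CUBIC_FOOT_MONTH = Decimal("0.20")


def storage_cost(cubic_feet: Decimal, months: int) -> Decimal:
    """Ongoing storage charge in dollars, assuming charges accrue linearly.

    One-time charges (pick-up, delivery, shelving, destruction) are excluded,
    since OSPA covers them for sponsored project records.
    """
    total = cubic_feet * months * RATE_PER_CUBIC_FOOT_MONTH
    # Round to whole cents.
    return total.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # 30 cubic feet kept for three years (36 months).
    print(storage_cost(Decimal("30"), 36))  # 216.00
```

For example, 30 cubic feet stored for the three-year minimum retention period comes to 30 × 36 × $0.20 = $216.00.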
