Data Management Plan

This Data Management Plan outlines how a company will handle its data, both during
the research phase and after the project is completed. The goal of a data management
plan is to consider the many aspects of data management, metadata generation, data
preservation, and analysis before the project begins. This tool can be used to provide
guidance and to establish policy and protocols regarding data management of a project.
This document should be used by a company when undertaking a project requiring a
data management plan.
Table of Contents
1     Data Management Policy
    1.1     Research Context
    1.2     Key Terminology
    1.3     Information Model
    1.4     Intellectual Property
    1.5     Data Access and Distribution
    1.6     Referencing and Citation
    1.7     Funding Arrangements
    1.8     Other Responsibilities
2     Research and Data Protocols
    2.1     Data Collection, Deposit and Quality Control
    2.2     Access Protocols
    2.3     Data Maintenance, Persistence, and Archival Practice
    2.4     Decommissioning/Destruction/Sanitisation
3     Technical Requirements
    3.1     Current Infrastructure and Requirements
    3.2     Future Infrastructure Requirements
    3.3     Interoperability
    3.4     Data Security
    3.5     Availability, Reliability, Support and Response




© Copyright 2011 Docstoc Inc.                                                                                                             2
1 Data Management Policy
1.1 Research Context
        This section shall provide basic information about the research being conducted in the project,
        group or department. It shall indicate the research discipline and briefly outline how the research
        will be conducted, without going into detail. It shall also cover the initial planning and
        decisions regarding data management.

1.2 Key Terminology
        This section shall list and briefly describe key terms or acronyms/abbreviations used throughout
        this document. Consistent use of these terms will lead to a more readable plan. The following
        terms are suggestions that can be removed or modified as per the context.
        Data element type
        Data elements that are collected for the same purpose, under similar methodology, and may have
        the same file format and metadata schema, are considered to have the same “data element type.”
        One such example is “survey data.”
        Data element
        This is an abbreviation of “data element type.” This may also refer to the actual data itself, such
        as raw data and aggregate data (including files and collections) that fall under a specific “data
        element type.” An example is the results of a particular survey, or the aggregate survey data.
        Repository
        A repository structure includes collections with multiple files. Each collection/file is usually
        accompanied by metadata. Repositories are user-focused tools that allow for data entry
        workflow, data management, search and online representation, and referencing.
        Database
        A database structure includes tables of information that have specific attributes. Each table
        consists of multiple elements that conform to the same attribute schema (i.e., each element has the
        same attributes).
        Content Management System (CMS)
        A CMS is usually an online tool that allows document storage and management as well as
        collaborative Web page editing.
        Metadata
        Information surrounding data that is not usually found within the data itself. Examples include
        author, description, comments, and experimental parameters, references to subsequent updates, or
        predecessor data. In most repositories, files/collections will be accompanied by metadata.
        Metadata is often used for data discovery: researchers search the metadata, which is a
        human-readable representation of the data's attributes.
        Online representation
        A description, summary or abstract of a data element that can be obtained online. This is not
        usually the data or object itself, but may include its metadata.
        Online metadata
        An online representation of a data element that is composed of its metadata. A citable identifier
        for data may not resolve to the data itself, but may resolve to online metadata.
        Identifier resolve or resolution
        An identifier (or reference or citation) for an object (data or paper) must aid researchers in finding
        the object itself. Finding an object from its identifier is termed “resolving.” This is usually
        accomplished by passing the identifier to a “resolution service,” which provides instructions on
        how to obtain the object. For example, an identifier that can be resolved by a Web browser will
        automatically take the browser to a resolution service, which will in turn redirect the browser to
        the object or online representation.
        Data Maintenance
        Over time, computer systems and files may need to be moved, software and technologies may
        change, and file formats and metadata schemas may become obsolete. Data maintenance
        includes the actions required to ensure that data can be discovered and used by researchers over
        time.
        Datasets / Collections
        A collection of files to be managed as one object or data element, usually stored in a repository.
        Metadata Schema
        This is the approved format for metadata to be associated with a data element. This includes
        attributes that are to be used, how they are defined, and acceptable values.
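
        The "identifier resolve or resolution" term defined above can be illustrated with a minimal
        sketch. It assumes a hypothetical in-memory resolution table; the identifier scheme and the
        URLs are invented for illustration and are not part of this plan.

```python
# Hypothetical in-memory resolution service; identifiers and URLs are invented.
from typing import Optional

# Maps a citable identifier to where the object (or its online metadata) lives.
RESOLUTION_TABLE = {
    "dmp:survey-2011-001": "https://repository.example.org/collections/survey-2011-001",
    "dmp:survey-2011-002": "https://repository.example.org/metadata/survey-2011-002",
}


def resolve(identifier: str) -> Optional[str]:
    """Return the current location for an identifier, or None if unknown."""
    return RESOLUTION_TABLE.get(identifier)


def relocate(identifier: str, new_location: str) -> None:
    """Point an existing identifier at a new location; the identifier itself is unchanged."""
    if identifier not in RESOLUTION_TABLE:
        raise KeyError(f"unknown identifier: {identifier}")
    RESOLUTION_TABLE[identifier] = new_location
```

        Keeping the identifier stable and changing only the table entry is what allows citations to
        keep working when data is moved.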

1.3 Information Model
        The information model shall list and outline the key data/information elements in context and
        how they relate to one another. What are the types of data, supporting records, source materials,
        notes, communications, etc.? This is the data that should be kept, registered, findable, able to be
        referenced, and maintained.

1.4 Intellectual Property
        Who owns each of the data element types in the information model? If multiple people or
        organizations collaborate on the project, it is best to clarify ownership up front. Does the project
        deal with copyrighted material belonging to someone else? What licenses are applicable to the
        data?

1.5 Data Access and Distribution
        When and on what basis is data to be shared and made available for access by other
        researchers? How and when is data to be deposited into a database or repository? Is
        privacy/confidentiality applicable, or is some information sensitive? Are there inappropriate ways
        to use the project data?


1.6 Referencing and Citation
        Of the data elements, which need to be cited or have identifiers? Considering what will be
        available for access, download, or distribution may help. What types of identifiers will be
        used, and how are they generated? How will the plan recommend or mandate references, citations,
        and attribution of owners?

1.7 Funding Arrangements
        Does this research come under a granting body or funding organization?

1.8 Other Responsibilities
        This is a statement that the researchers are aware of their responsibilities. It is intended to
        summarize those responsibilities for new and existing colleagues. Summarize important
        requirements and reference the relevant external policies. This section may include institutional
        policies, funding requirements, ethics, consent, licensing, legislation, and reporting requirements.
        One needs to determine any legal obligations imposed on the research project or individual
        researchers and how intellectual property rights are to be managed.

2 Research and Data Protocols
        These are the processes, protocols, or standard operating procedures for the management of data
        in the department, group, or research project.
        For research departments or groups, this section shall include, at minimum, subsections
        addressing the following:
        Departmental/Group Roles and Responsibilities — aside from Head of Department and
        researchers in general, are there other specialized roles with research data responsibilities, such as
        Research Manager, Data Manager, and IT Manager?
        Authorization of Protocols for Research Data Management — this shall cover authorization of
        research project data management plans, data collection, and storage protocols.
        Registering Research Data and Records — descriptions of how a departmental register of
        research data is kept up-to-date. This may cover data held physically within the group or
        department and/or all data owned by researchers (including remote data sources).
        Research Data Maintenance — this section shall include departmental storage procedures,
        protocols for data updates (if any), data replication, and archiving and relocation protocols.
        Destruction — this shall include retention periods and how they are enforced as well as protocols
        for how data are identified for destruction, evaluated and finally destroyed or sanitized. The
        minimum data retention responsibilities and guidelines must be met (research data = 5 years,
        clinical trials data = 15 years or longer, Funding/Patent/Legal requirements).
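
        The minimum retention figures above can be turned into a review date with simple date
        arithmetic. The sketch below is illustrative only: the category labels and the function name
        are assumptions, and funding, patent, or legal requirements may call for longer periods than
        the minimums encoded here.

```python
from datetime import date

# Minimum retention periods in years, taken from the guideline figures above.
# The category names are hypothetical labels chosen for this sketch.
MINIMUM_RETENTION_YEARS = {
    "research_data": 5,
    "clinical_trials_data": 15,
}


def earliest_destruction_date(category: str, retention_start: date) -> date:
    """Return the first date on which data in this category may be reviewed for destruction."""
    years = MINIMUM_RETENTION_YEARS[category]
    try:
        return retention_start.replace(year=retention_start.year + years)
    except ValueError:
        # 29 February in a non-leap target year: fall forward to 1 March.
        return date(retention_start.year + years, 3, 1)
```

        A date computed this way is a lower bound for review, not an instruction to destroy the data.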
        For research projects, this section might include the subsections that follow, each of which
        focuses on a different aspect of the lifecycle of the key data elements. How is the data recorded,
        and what has happened to it throughout the project (the creation, history, and ancestry of each
        element)? What aspects of the research method or protocols need to be described in a Data
        Management Plan (DMP)?


        For both projects and departments/groups, the Head of Department must authorize procedures for
        storage and destruction of data and records. The Department or Individual has the responsibility
        of maintaining a data register. The register shall include at least who is responsible for data,
        where data is kept, when it is due for destruction/review, and the paper trail or equivalent. This
        section shall outline who is responsible for changing the data register, data maintenance, and
        relocation.

2.1 Data Collection, Deposit and Quality Control
        This section shall describe how records about the digital and physical data (i.e., metadata) are
        kept. Rich descriptions of collections or files can be added here. Who is responsible for this? What’s in a
        file, where did it come from, how can it be recreated, how is it different, and are there any known
        problems?
        It is also recommended to collect records (metadata) on other entities, e.g., Experimental or
        Software configuration, Laboratories, Apparatus, Data Sources, Methods, Activities, Subprojects,
        Research Subjects.
        The metadata needs to be stored, managed, and secured (e.g., backed up) like the rest of the data.
        The simplest metadata system may consist of well-structured text files stored along with data files
        or collections. If so, this section should detail how these are created and their content.
        If the project has a centralized data register, it might contain information such as key data element
        type, data file/collection ID or code, data location (physical and/or digital), registered date, and
        description of data or other metadata. This might be as simple as a spreadsheet.
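
        As a rough illustration of a spreadsheet-style register, the sketch below writes and reads
        register entries as CSV. The column names are assumptions derived from the fields suggested
        above, and the example entry is invented.

```python
import csv
import io

# Column names follow the fields suggested in the text; the exact spelling is an assumption.
REGISTER_FIELDS = [
    "data_element_type",
    "data_id",
    "location",
    "registered_date",
    "description",
]


def write_register(entries, stream):
    """Write register entries (a list of dicts) as CSV with a header row."""
    writer = csv.DictWriter(stream, fieldnames=REGISTER_FIELDS)
    writer.writeheader()
    writer.writerows(entries)


def read_register(stream):
    """Read register entries back as a list of dicts."""
    return [dict(row) for row in csv.DictReader(stream)]


# Example entry with invented values.
example_entries = [
    {
        "data_element_type": "survey data",
        "data_id": "SURVEY-001",
        "location": "shared drive /projects/example/survey/",
        "registered_date": "2012-01-04",
        "description": "Raw responses from the first survey round",
    }
]

buffer = io.StringIO()
write_register(example_entries, buffer)
buffer.seek(0)
round_tripped = read_register(buffer)
```

        In practice the stream would be a file kept under the same backup regime as the data; an
        in-memory buffer is used here only to keep the sketch self-contained.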

2.2 Access Protocols
        Who gains access; which roles? What are the different levels of security for people or for
        different roles? Are there different levels of access for different data and different access
        situations (location, time, emergencies)? Are there situations dictating Restricted access,
        Conditions of Use, or “I Agree” Declarations? How and when do people obtain access, and is the
        process auditable (e.g., a paper trail)? What licenses are applicable?
        How should the data be used/not be used or processed? How is it expected to be used in the
        future? How is data tagged/marked/flagged with known problems?

2.3 Data Maintenance, Persistence, and Archival Practice
        How are the data stored/copied/transferred over time/distance/software? How are the Department
        Register, project registers, and records updated? The data must remain usable and accessible,
        and nothing must be lost. Remember the minimum data retention responsibilities
        (research data = 5 years; clinical trials data = 15 years or longer; Funding/Patent/Legal
        requirements).
        Please consider the following issues:
        Regular Data Maintenance
        Data shall be regularly backed up together with its metadata. Identifier systems must undergo
        the same backup cycle. Snapshots of databases and regular automated tests of data and links
        should also be considered.
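
        One possible form of a regular automated test of data is a checksum manifest that is built at
        backup time and verified later. The sketch below is an assumption about how such a test could
        look, not a prescribed procedure; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict) -> list:
    """Return relative paths that are missing or whose content has changed."""
    problems = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(relative_path)
    return problems
```

        Because data files and their metadata files sit under the same root, both are covered by the
        same manifest.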
        Please consider the following issue:
        Data Updates and Format Upgrades
        Data is collected and stored using approved/appropriate file and metadata formats. However,
        data may need to be transformed out of obsolete formats to ensure its viability. Every <x>
        months, the Data Manager will review existing formats and newer or alternate versions. If there
        is sufficient risk of obsolescence, the Data Manager will notify the Project Managers or the Head
        of Department to make a decision. Previous formats of the same data will be preserved with their
        identifiers. Newer file or dataset formats will be assigned new identifiers. Some information
        sources may need to be updated with the new identifiers (or links). The metadata of the new data
        element should link back to the prior formats, and the metadata of the old formats shall, where
        possible, link forward to the new datasets.
        A similar procedure exists for metadata schema updates. A new identifier must be generated for
        the new record. If the underlying data remains the same, it may be possible for repositories to
        have a new record of the same data or for the updated schema version to be referenced by a
        separate identifier from the older schema version, depending on system limitations.
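
        The back and forward links described for format upgrades can be sketched as a pair of
        cross-referencing metadata records. The record fields and the function name below are
        hypothetical; a real repository would use its own metadata schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MetadataRecord:
    identifier: str
    file_format: str
    previous_version: Optional[str] = None  # link back to the prior format
    next_version: Optional[str] = None      # link forward to the newer format


def register_format_upgrade(
    records: Dict[str, MetadataRecord],
    old_id: str,
    new_id: str,
    new_format: str,
) -> MetadataRecord:
    """Add a record for the migrated data and cross-link it with the old record."""
    if new_id in records:
        raise ValueError(f"identifier already in use: {new_id}")
    old = records[old_id]
    new = MetadataRecord(identifier=new_id, file_format=new_format, previous_version=old_id)
    old.next_version = new_id  # the old record is preserved; only the link is added
    records[new_id] = new
    return new
```

        The old record keeps its identifier and format, so existing citations to the previous format
        still resolve.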
        Data Retention and Maintenance Periods
        The participating institutions have requirements regarding data retention. Research data and
        records should be maintained for as long as they are of continuing value to project collaborators.
        The minimum retention period for this project data and records shall be <x> months.
        In retaining data, on occasion, repositories and data will need to be moved, either physically or to
        a new network location. Identifiers should be updated to reflect the current location of the data
        without interruption. If the data is changing ownership/stewardship, the metadata must reflect
        where the data came from (provenance) and whether the copy is authoritative. Ideally, all data
        element copies should have the same identifier, but because identifiers are assigned by
        institutions, this may not be possible. The Data Management Plan must be updated to reflect who
        is responsible for the new data ownership and who is responsible for keeping track of the various
        copies.
        Data Retention and Archiving
        Researchers are encouraged to deposit the data and any publications arising from a research
        project in an appropriate subject and/or institutional repository within <x> months of the end of
        the project. The
        future home of the project data will be considered <x> months prior to the end of the project by
        the chief investigators. This must also take into consideration the future home for the identifiers
        and who will be responsible for managing them over time, maintaining their viability. As source
        material and data collection during the project may have community or heritage value, this should
        be addressed with a view toward keeping project data permanently, preferably within a collection.
        A portion of project funding has to be set aside for transition of the project data to archival quality
        research records within the final <x> months of the project. This will include substantial project
        and key data element documentation as well as preservation data formats.


2.4 Decommissioning/Destruction/Sanitisation
        If, when, and how should the data be destroyed? Who does it, and who needs to approve it?
        How is it documented? It might be appropriate to ensure complete/partial destruction of
        information (sanitization). Paper — should it be shredded? Magnetic Media (audio-video tape,
        data tapes, floppies) — should these be “bulk erased”? Digital Data — should it be properly
        erased, reformatted, or rewritten (deleting may not offer necessary security)?
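
        To illustrate why rewriting differs from deleting, the sketch below overwrites a file's
        contents before removing it. It is only an illustration under stated assumptions: the
        function name is hypothetical, and in-place overwriting does not guarantee sanitization on
        solid-state drives, copy-on-write or journaling file systems, or where backups and other
        copies exist.

```python
import os
from pathlib import Path


def overwrite_and_delete(path: Path, passes: int = 1) -> None:
    """Overwrite a file's bytes in place with random data, then remove it."""
    size = path.stat().st_size
    with path.open("r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 65536)
                handle.write(os.urandom(chunk))
                remaining -= chunk
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the bytes to the device
    path.unlink()
```

        Whichever method is chosen, the destruction should still be approved and documented as
        described above.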

3 Technical Requirements
        These are the policies and plans for system developers, implementers, and administrators.
        Whatever is built or installed must conform to these requirements.

3.1 Current Infrastructure and Requirements
        Items to be considered: What servers, software, storage, and network bandwidth are needed? From
        where is access needed? From what is access needed (OS, software, Google, external)?

3.2 Future Infrastructure Requirements
        Plan ahead if possible. Do not worry if facts and figures are fuzzy; rough estimates are acceptable. State
        if/when infrastructure review is necessary. Hardware/software cannot be purchased and installed
        at a moment’s notice.
        For departments, groups, large-scale projects, or projects in the planning stages, it might be useful
        to calculate an IT budget including hardware, hosting, software, support, and other infrastructure
        costs.

3.3 Interoperability
        Are there people or communities with whom one will need to interoperate? This is especially
        important for collaborations. How are items tested for interoperability or compliance? Items in
        this section might include standards, schemas, vocabularies, community data conventions,
        services, and data and metadata formats (used for transport, exchange, archiving or used by
        programs). When choosing software, does it have to support export/import of certain data
        formats/standards?

3.4 Data Security
        How can one make sure that the data is safe from damage (unintentional and intentional)? What
        are the possible consequences? This section might include information on backups (data,
        software, system), how often they should be conducted (hourly, daily, weekly, monthly), where
        (onsite, offsite, both), disaster recovery, and hardware redundancy (multiple/spare servers, spare
        disk drives, system image backups).
        How is it ensured that the data is not misused (unintentionally or intentionally)? PCI DSS
        (Payment Card Industry Data Security Standard) may be a good starting point. It contains 12
        requirements that can be treated as good practice (read “cardholder data” as “your research
        data”).




3.5 Availability, Reliability, Support and Response
        What is an unacceptable data/server outage? Are there peak times during which no changes may
        be made to the system? What’s the expected response time for help, support, or fixes?



