DC3A: Melbourne Neuropsychiatry Centre (MNC)

Bioinformatics Development Project

1 Project Purpose and Activities

The MNC has one of the largest databases of brain scans and associated neuropsychiatric research data

in the world. It has National and International collaborators using and contributing to the database.

• Build workflow for automating documentation of dataset segments used in individual studies

and publications. This will include researchers, datasets, associated projects and publications.

• Build workflow for automating creation of citable persistent identifiers for unique studies and

linking with publications.

• Build software to automate capture of public facing metadata to University of Melbourne

Registry which will deliver collections metadata to the ARDC.

• MNC has 270+ publications resulting from datasets stored in the MNC database. Completing the

work above will result in ~100 dataset descriptions described in the ARDC by June 2011, with an

expected 25+ being registered each year after that.



2 Deliverables

Deliverable

D1 Project plan agreed by ANDS.



D2 Five sample Collection records in ARDC, with associated Party and Activity records,

of agreed standard.



D3 High level design documents:

a. Metadata mappings between each pair of metadata formats where

mappings required, including to RIF-CS.

b. Process descriptions for capturing MNC dataset metadata and storing it in

VITRO registry.

c. Process descriptions of integration between pre-print register, dataset

metadata and IP information.

d. Design document of overall system.



D4 Deployed system that:

a. Extracts metadata from datasets in the MNC database.

b. Automatically enriches dataset with pre-print register metadata, and

copyright/IP metadata, by connecting with those systems.

c. Allows extracted metadata to be enriched by user input.

d. Generates RIF-CS Collection, Party and Activity records from metadata.

e. Allows authorised external users to access datasets from MNC database,

using a query builder.

f. Stores datasets in VITRO registry.

g. Allows users to develop advanced queries to find datasets.

h. Deposits RIF-CS metadata in VITRO registry for ARDC harvest, including

Service descriptions.

i. Automatically assigns persistent identifiers to datasets where required.



D5 Collection descriptions for 100 datasets, with associated Party, Service and Activity

descriptions produced by deployed system and made visible to ARDC.

a. As many descriptions as possible should contain links to, or access

information for, immediately shareable data.



D6 Source code for all developed software, with developer’s manuals (to facilitate

reuse) deposited in agreed open-source repository.



D7 Deployed, permanent, operational feed of Collection, Party, Service and Activity

description records to ARDC, with output of agreed quality.
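
Deliverable D4(d) calls for RIF-CS Collection, Party and Activity records to be generated from extracted metadata. The following is a minimal sketch of what generating one Collection record might look like, not the project's implementation. The input dictionary, its field names, the registry keys and the `build_collection_record` helper are all hypothetical; the element and attribute names follow a reading of the RIF-CS schema and should be checked against the version ANDS specifies.

```python
import xml.etree.ElementTree as ET

# RIF-CS registry objects namespace.
RIFCS_NS = "http://ands.org.au/standards/rif-cs/registryObjects"


def q(tag: str) -> str:
    """Return the namespace-qualified form of a RIF-CS tag name."""
    return f"{{{RIFCS_NS}}}{tag}"


def build_collection_record(meta: dict) -> str:
    """Serialise one dataset's metadata as a RIF-CS collection record."""
    ET.register_namespace("", RIFCS_NS)
    root = ET.Element(q("registryObjects"))
    obj = ET.SubElement(root, q("registryObject"), {"group": meta["group"]})
    ET.SubElement(obj, q("key")).text = meta["key"]
    ET.SubElement(obj, q("originatingSource")).text = meta["source"]

    coll = ET.SubElement(obj, q("collection"), {"type": "dataset"})
    name = ET.SubElement(coll, q("name"), {"type": "primary"})
    ET.SubElement(name, q("namePart")).text = meta["title"]
    ET.SubElement(coll, q("description"), {"type": "brief"}).text = meta["description"]

    # Link the collection to its Party and Activity records by registry key.
    for related_key, relation in meta.get("related", []):
        rel = ET.SubElement(coll, q("relatedObject"))
        ET.SubElement(rel, q("key")).text = related_key
        ET.SubElement(rel, q("relation"), {"type": relation})
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample values, for illustration only.
SAMPLE = {
    "group": "Example University",
    "key": "example.org/mnc/collection/1",
    "source": "https://example.org/registry",
    "title": "Example brain scan dataset",
    "description": "Illustrative description of a dataset segment used in one study.",
    "related": [
        ("example.org/mnc/party/1", "isManagedBy"),
        ("example.org/mnc/activity/1", "isOutputOf"),
    ],
}

print(build_collection_record(SAMPLE))
```

Keeping the related Party and Activity keys in the same input structure means one extraction pass can feed all three record types.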

DC3B: Longitudinal qualitative and quantitative

survey data capture and re-use, Youth Research

Centre

1 Project Purpose and Activities

The Youth Research Centre’s Life Patterns Research Program maintains an extensive qualitative and

quantitative database on a cohort of 2000 young Australians who left secondary school in 1991, and on a

second cohort of 3000 who left school in 2005. With ARC funding through to 2014 for annual

quantitative and qualitative data capture for the second cohort (Gen Y) and biannual data capture for

the first cohort (Gen X), this activity aims to enable wider access and use of the data by developing the

infrastructure to:

a) make sets of the existing data available for re-use,

b) streamline capture of new data so that it is more readily available for re-use, and

c) build the capacity to efficiently respond to future requests for derived data sets.

Appropriate structures for the capture of relevant metadata (compliant with DDI2, DDI3 and RIF-CS

schemas) and tools to extract this metadata from workflows will be developed.



2 Deliverables

Deliverable

D1 Project plan agreed by ANDS, in ANDS standard template format.



D2 Statement of ethical issues policy, indicating how future datasets will be released.



D3 Five sample collection descriptions (with associated activity, service and party

descriptions) in ARDC, including one for the Life Patterns Research Program project.



D4 High level design descriptions:

a. Process descriptions for deriving, describing and publishing re-usable data

sets

b. High level software design document, showing data flows and links

between components.



D5 Deployed system that:

a. Automates the deriving and publishing of re-usable data sets

b. Automatically captures and extracts metadata in quantitative and

qualitative data capture workflows

c. Allows extracted metadata to be enriched by human input.

d. Allows authorised external users to access datasets.

e. Automatically assigns persistent identifiers where required.

D6 Twenty Collection descriptions, with linked Party, Service and Activity descriptions,

produced by deployed system and available for harvest by ARDC. These describe

linked, comparative longitudinal case studies on young people’s life trajectories

from each of cohort 1 and cohort 2, illustrative of the underlying quantitative

and qualitative data.

a. As many descriptions as possible should contain links to, or access

information for, immediately shareable data.



D7 Source code for all developed software deposited in agreed open-source repository,

with developer’s manuals to facilitate reuse.



D8 Deployed, permanent, operational feed of collection, party and activity records to

ARDC, with output of agreed quality.

DC3C: Optimising Metadata Capture, Data Sharing

Procedures and Long-term Re-use of Video data in

the Social Sciences

1 Project Purpose and Activities

The University of Melbourne has an especially rich humanities and social science research community

that utilises video as its primary form of data capture. The increasing use of video as a research tool

poses particular challenges for aggregated data storage initiatives. This project will integrate metadata

capture facilities at selected sites within the University of Melbourne as part of facilitating sharing and

re-use. The project will address current metadata issues associated with large-scale audio-visual

repositories and workflows to enable efficient generation of metadata, ensuring that stored video data

is accessible and searchable through the ARDC. The project will:

• Develop software to automate the capture of metadata from existing mature video storage

systems developed by the ICCR (International Centre for Classroom Research),

• Develop and, where possible, utilise existing infrastructure to identify generic workflow tools

that will enable rich knowledge of data sets, access services and parties to the research to be

systematically captured (as RIF-CS) from the researchers,

• Develop standards-compliant video data and metadata deposit services.

These are generic goals which are broadly applicable to activities elsewhere within the university, for

example in the Faculty of Architecture, Building and Planning and the Faculty of the VCA and Music.



2 Deliverables

Deliverable

D1 Project plan agreed by ANDS, in ANDS standard template format.



D2 High level design documents:

a. Business process description and high level design for publication of video

dataset descriptions, including case-specific protocols and ethics

considerations for data access locally, nationally and internationally.

b. Metadata schema for video dataset descriptions

c. Mapping of schema to RIF-CS

d. Web services design documents and validation against existing research

projects.



D3 Five sample Collection descriptions, with linked Activity, Party and Service

descriptions, representing the selected active video intensive projects across the

University in ARDC.



D4 Deployed system to be used by research staff and data librarians that:

a. Allows video data to be deposited.

b. Automatically extracts metadata from video data.

c. Allows extracted metadata to be enriched by user input.

d. Generates RIF-CS Collection, Party and Activity descriptions from metadata.

e. Ingests metadata into the University of Melbourne VITRO registry.

f. Automatically assigns persistent identifiers where required.



D5 Operational automatic feeds of Collection descriptions and associated Party and

Activity information to ARDC



D6 Agreed number (to be specified in project plan) of Collection descriptions, with

associated Party, Service and Activity descriptions, produced by deployed system

and available to ARDC.

a. As many descriptions as possible should contain links to, or access

information for, immediately shareable data.



D7 Deposit of all developed software in agreed open source repository, accompanied

by developer manuals to facilitate reuse.

DC3D: Human and mouse neuroimaging collections

in the national data commons

1 Project Purpose and Activities

DaRIS is a raw data management system based on the Mediaflux digital asset management platform and

has been in operation for the last 3 years at the Neuroimaging Computational and Data Management

Facility (CDMF). There it has been used to routinely receive MR images from researchers and organise

them into a subject-centric data model, ready for access by project members. It hosts over 70 mouse

and human projects, each with many tens of subjects and some with time-dependent data.

• Map DaRIS project metadata to the ANDS schema,

• Write a DaRIS service to populate ANDS-compliant metadata,

• Develop an adapter to harvest the ANDS-compliant metadata from DaRIS,

• Connect identifiers within DaRIS to ANDS persistent identifiers (PIDs).



Relationship to the National Imaging Facility (NIF) ANDS proposal

The DaRIS system has been selected by the NCRIS NIF to provide its data management capability. The

NIF will manage collections of data from a range of domains which are primarily but not only

neuroimaging (e.g. plant imaging, microscopy, etc.). This is possible because the general DaRIS

framework can be tailored to a number of domains. Nonetheless, each domain requires different

metadata definition design, data capture protocols and workflows; therefore the metadata capture

process is inherently different for each domain.

The University of Melbourne ANDS proposal focuses on neuroimaging metadata exposure (with

collections managed by DaRIS held at UoM) whereas the separate NIF ANDS proposal focuses on

operationalising the DaRIS system across multiple nodes of the NIF as well as exposing NIF collections with

DaRIS. There is thus no dependency between these two proposals.

2 Deliverables

Deliverable

D1 Project plan agreed by ANDS, in ANDS standard template format.



D2 Mapping of DaRIS project metadata to RIF-CS.



D3 High level design of:

a. DaRIS RIF-CS generation service.

b. DaRIS-ARDC OAI-PMH feed.

c. Integration with ANDS Persistent Identifier Service.

d. Automated extraction of metadata from datasets.



D4 Ethics policy for re-use of data collections. (If suitable number of collections cannot

be shared with other researchers on an ongoing basis, project cannot proceed.)



D5 Five sample Collection descriptions, accompanied by Party, Service and Activity

descriptions, representing a range of different dataset types, in ARDC.



D6 Deployed, tested, documented system that:

a. Extracts metadata from datasets.

b. Allows users to enhance metadata for datasets.

c. Generates ANDS-compliant RIF-CS from datasets.

d. Exposes RIF-CS as OAI-PMH feed, with controls to prevent harvesting of

non-shareable collections.

e. Provides direct download access to datasets with appropriate

authentication and authorisation controls.

f. Automatically assigns persistent identifiers where needed.



D7 Collection descriptions for 100 datasets, including Service, Activity and Party

description links, produced by deployed system and available for harvest by ARDC.

a. As many descriptions as possible should contain links to, or access

information for, immediately shareable data.



D8 Deposit of all developed software in agreed open source repository, accompanied

by developer manuals to facilitate reuse.
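
Deliverable D6(d) asks for an OAI-PMH feed of RIF-CS with controls that prevent harvesting of non-shareable collections. The sketch below illustrates that filtering step only, under stated assumptions: the `CollectionRecord` type, its `shareable` flag and the sample identifiers are hypothetical, and a complete OAI-PMH response would also carry `responseDate` and `request` elements and support resumption tokens, which are omitted here for brevity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# OAI-PMH 2.0 response namespace.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


@dataclass
class CollectionRecord:
    identifier: str   # OAI identifier for the record
    datestamp: str    # last-modified date, YYYY-MM-DD
    rifcs_xml: str    # RIF-CS payload, already serialised
    shareable: bool   # False keeps the record out of the feed


def oai(tag: str) -> str:
    """Return the namespace-qualified form of an OAI-PMH tag name."""
    return f"{{{OAI_NS}}}{tag}"


def list_records(records) -> str:
    """Build a simplified ListRecords body containing only shareable records."""
    ET.register_namespace("", OAI_NS)
    root = ET.Element(oai("OAI-PMH"))
    listing = ET.SubElement(root, oai("ListRecords"))
    for rec in records:
        if not rec.shareable:
            continue  # non-shareable collections are never exposed to harvesters
        record = ET.SubElement(listing, oai("record"))
        header = ET.SubElement(record, oai("header"))
        ET.SubElement(header, oai("identifier")).text = rec.identifier
        ET.SubElement(header, oai("datestamp")).text = rec.datestamp
        metadata = ET.SubElement(record, oai("metadata"))
        metadata.append(ET.fromstring(rec.rifcs_xml))
    return ET.tostring(root, encoding="unicode")


# Hypothetical records, for illustration only.
EMPTY_RIFCS = '<registryObjects xmlns="http://ands.org.au/standards/rif-cs/registryObjects"/>'
RECORDS = [
    CollectionRecord("oai:example.org:project-1", "2011-01-15", EMPTY_RIFCS, True),
    CollectionRecord("oai:example.org:project-2", "2011-02-01", EMPTY_RIFCS, False),
]

print(list_records(RECORDS))
```

Filtering at the point where the feed is assembled, rather than in the harvester, means a collection flagged as non-shareable cannot leak through a misconfigured client.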

DC3E: Humanities and Social Science Data at the

University of Melbourne

1 Project Purpose and Activities

The University of Melbourne has one of the richest and most diverse humanities and social science (HASS)

research communities in Australia and is well ranked internationally. HASS researchers at Melbourne

generate and hold valuable data sets and associated materials that are currently not easily discoverable,

accessible or configured for further research purposes. This project will build infrastructure (tools and

services) to connect this diverse community with the UoM Registry (Vitro) which will in turn

communicate the relevant metadata to the ARDC. The project will:

• Develop and utilise existing (OHRM-based) infrastructure to identify generic workflow tools that

will enable rich knowledge of data sets and related materials, access services and parties to the

research to be systematically captured (as RIF-CS) from the researchers.

• Develop a generic web services-based data capture tool to be used by research staff, data

librarians or other staff in the data management fabric. This will be based on the ‘pre-register’

work done for the Australian Women’s Register in 2009.

• Develop standards-compliant ‘access service’ descriptions.

• Ensure project, data, party and service descriptions conform to Data Documentation Initiative

(versions 2 and 3) requirements.

• Inform the development and utilisation of digital and analogue archival preservation,

curation and access systems for the University.

2 Deliverables

Deliverable

D1 Project plan agreed by ANDS, in ANDS standard template format.



D2 Sample descriptions in ARDC as follows:

a. Collection, Activity, Party and Service descriptions representing five

selected active HASS projects across the University in ARDC.



b. Five sample Collection descriptions, with associated Party, Service and

Activity records, drawn from those projects made available in the ARDC.



D3 Mapping of one (or more, if applicable) dataset formats held in OHRM, to RIF-CS.



D4 Design document for web services to be built.



D5 Design document for a generic web service-based data entry, ingest and metadata

management tool (henceforth “web data capture tool”) to be used by research

staff, data librarians or other staff.



D6 Deployed, tested, documented system that:

a. Allows data related to humanities projects to be input, managed, browsed,

searched.

b. Provides a RIF-CS feed into the University of Melbourne’s VITRO registry.

c. Is integrated with data from a number of existing OHRM databases, to be

specified in the project plan.

d. Can be controlled through web services, including the bulk retrieval of data.



D7 Agreed number of Collection descriptions (as determined and specified in project

plan), with associated Service, Party and Activity descriptions, produced by

deployed system and available for harvest by ARDC.

a. As many descriptions as possible should contain links to, or access

information for, immediately shareable data.



D8 Deposit of all developed software in agreed open source repository, accompanied

by developer manuals to facilitate reuse.



D9 All descriptions available for harvest from VITRO.

DC3F: Capturing multi-modal data to support

research in cardiovascular and neurological

medicine

1 Project Purpose and Activities



Complex physiological data is routinely collected on patients as part of clinical care (echocardiography,

intravascular ultrasound, x-ray angiography, optical computerised tomography, patient clinical data,

etc.). However, this rich multi-modal data is not usually subjected to subsequent analysis, nor is it made

available to researchers from other disciplines for novel analysis. Making this multi-modal data available

along with patient outcomes such as morbidities will provide the opportunity for collaborative groups to

employ novel strategies to develop assessments and models based on this data. This project will form

the necessary basis for making multi-modal data collections available, enabling the establishment of new

links between biomedical research groups in engineering, physics and bioinformatics. This project will

occur in collaboration with BioGrid Australia where it will use the access, de-identification and privacy

protection protocols already established there. The major activities will be:

• Map BioGrid metadata to the ANDS schema,

• Write a service to populate ANDS-compliant metadata,

• Develop a service to harvest ANDS-compliant metadata from multiple BioGrid data sets which

form a single study,

• Enable the assignment of globally unique identifiers that link to the source of multi-modal

datasets.
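
The last two activities, combining several data sets into one study-level record and giving that record a globally unique identifier, can be sketched as follows. This is an illustration under stated assumptions rather than the project's design: the dataset dictionaries, their field names and the `build_study_records` helper are hypothetical, and the name-based UUIDs are only a stand-in for identifiers that a persistent identifier service would mint.

```python
import uuid
from collections import defaultdict

# Hypothetical namespace used to derive repeatable, name-based identifiers.
STUDY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/studies")


def build_study_records(datasets):
    """Group dataset descriptions by study and attach one identifier per study."""
    by_study = defaultdict(list)
    for ds in datasets:
        by_study[ds["study"]].append(ds)

    records = []
    for study, members in sorted(by_study.items()):
        records.append({
            "study": study,
            # uuid5 is deterministic: the same study name always yields the same ID.
            "identifier": f"urn:uuid:{uuid.uuid5(STUDY_NAMESPACE, study)}",
            "modalities": sorted({m["modality"] for m in members}),
            # Keep a pointer back to each source data set.
            "sources": sorted(m["source"] for m in members),
        })
    return records


# Hypothetical sample data, for illustration only.
DATASETS = [
    {"study": "study-A", "modality": "echocardiography", "source": "site1/echo"},
    {"study": "study-A", "modality": "angiography", "source": "site1/angio"},
    {"study": "study-B", "modality": "echocardiography", "source": "site2/echo"},
]

for record in build_study_records(DATASETS):
    print(record["study"], record["identifier"], record["modalities"])
```

Deriving the identifier from the study name makes repeated harvests idempotent, so re-running the process does not create duplicate records for the same study.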

2 Deliverables

Deliverable

D1 Project plan agreed by ANDS, in ANDS standard template format.



D2 Ethics policy quantifying which datasets will be shareable and under which

conditions, and which will never be shareable.



D3 High level design documents:

a. Mapping of BioGrid datasets to RIF-CS.

b. Design of service to generate RIF-CS.

c. Process descriptions, including ethics approvals, ETL processes,

deidentification and metadata annotation.



D4 Five sample Collection descriptions, with associated Service, Party and Activity

descriptions, in ARDC.



D5 Deployed, tested, documented system that:

a. Extracts metadata from datasets in BioGrid.

b. Allocates persistent identifiers where needed.

c. Allows that metadata to be enriched by user input.

d. Provides access for authorised external users to the datasets.

e. Provides an OAI-PMH feed of ANDS-compliant RIF-CS.

f. Allows datasets of different levels of shareability to be managed.



D6 Collection descriptions with associated Service, Party and Activity descriptions for

ten multi-modal datasets, all produced by deployed system and available

for harvest by ARDC.

a. As many descriptions as possible should contain links to, or access

information for, immediately shareable data.



D7 Deposit of all developed software in agreed open source repository, accompanied

by developer manuals to facilitate reuse.

DC3G: Founders and Survivors Project

1 Project Purpose and Activities

The Founders and Survivors Project (http://www.foundersandsurvivors.org/) has brought together a

number of research data sets created from records relating to the 73,000 convicts transported to

Tasmania in the 19th century and their descendants to create a population database of national and

international significance for historical, demographic and population health researchers.



This project will:

• Develop a toolkit based around the project’s XML/TEI workflow for further relevant record sets

to be systematically ingested into the population database,

• Build the infrastructure to enable persistent identification and descriptions of derived data sets

produced on request from the population database to be made available to the ARDC.
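
As an illustration of the kind of step an XML/TEI ingest toolkit might include, the sketch below pulls person records out of a TEI fragment into plain dictionaries ready for loading. It is a sketch under stated assumptions: the sample document, the choice of TEI elements (`person`, `persName`, `birth`) and the `extract_people` helper are hypothetical and do not reflect the project's actual encoding.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree exposes xml:id
NS = {"tei": TEI_NS}

# Hypothetical TEI fragment, for illustration only.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <listPerson>
      <person xml:id="p001">
        <persName><forename>Jane</forename><surname>Example</surname></persName>
        <birth when="1810"/>
      </person>
      <person xml:id="p002">
        <persName><forename>John</forename><surname>Sample</surname></persName>
      </person>
    </listPerson>
  </body></text>
</TEI>"""


def extract_people(tei_xml: str) -> list:
    """Return one dictionary per TEI person element found in the document."""
    root = ET.fromstring(tei_xml)
    people = []
    for person in root.iterfind(".//tei:person", NS):
        birth = person.find("tei:birth", NS)
        people.append({
            "id": person.get(XML_ID),
            "forename": person.findtext("tei:persName/tei:forename", default="", namespaces=NS),
            "surname": person.findtext("tei:persName/tei:surname", default="", namespaces=NS),
            # Records without a birth element are kept, with the year left empty.
            "birth_year": birth.get("when") if birth is not None else None,
        })
    return people


for row in extract_people(SAMPLE_TEI):
    print(row)
```

Keeping incomplete records rather than dropping them lets later record sets fill in missing details during ingest.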



2 Deliverables

Deliverable

D1 Project plan agreed by ANDS, in ANDS standard template format.



D2 High level design documents:

a. Description of automated processes for deriving, describing and publishing

ingest data sets.

b. Description of automated processes for deriving, describing and publishing

derived data sets.

c. Mapping of data set metadata to RIF-CS.



D3 Five sample Collection descriptions, with associated Party and Activity descriptions,

representing both ingest and derived datasets in ARDC.



D4 Deployed system that:

a. Generates RIF-CS Collection, Party and Activity descriptions from derived

and ingest datasets.

b. Includes an extraction and ingestion toolkit for researchers to incorporate in

their data collection workflows to facilitate the production of ingest data

sets to the population database



D5 User documentation for all types of users.



D6 Collection, party and activity descriptions for 20 ingest data sets that meet ANDS

requirements, produced by the deployed system, and available for harvest by the

ARDC.

a. As many descriptions as possible should contain links to, or access

information for, immediately shareable data.

D7 Source code for all developed software published to open source repository, with

developer documentation to facilitate reuse.


