BBSRC E-science & Bioinformatics (with additional support from the DTI)


Grant Title:
A Distributed Pipeline for Structure-based Proteome Annotation using Grid
Technology

Project name
Name: e-Protein

Aims and Objectives
The aim is to provide a structure-based annotation of the proteins in the major genomes
by linking resources at three sites using Grid technology. The objectives are: (1) to
establish local databases with structural and functional annotations; (2) to disseminate
our proteome annotation to the biological community via a single web-based distributed
annotation system (DAS); (3) to share computing power transparently between sites using
Grid middleware such as the Globus Toolkit; (4) to use the developed system to compare
alternative approaches to annotation and thereby identify methodological improvements;
(5) to establish a pre-prototype at 6 months for demonstration purposes; (6) to provide
a working system after two years; (7) to link to relevant bioinformatics and Grid
resources that will be integrated into this project.

E-science issues involved
The project presents a complex Grid infrastructure of distributed computational resources
with different databases and specialised analysis software that may not be deployed on
every resource. It is therefore essential that the Grid infrastructure is able to accurately
represent the state of the software and hardware resources at each site. This
heterogeneous capability needs to be effectively exploited by the complex workflow
presented within the protein annotation pipeline. The capture of this workflow and the
mapping of its components to the distributed resources are the key e-science issues within
this project.

Brief overview of the system architecture
The proposed system builds on a service-oriented architecture based around the current
stable release of the Globus Toolkit (version 2), while research and development
activities are undertaken with version 3. The scientists will define the required
workflow within, for example, a graphical environment by ‘dragging and dropping’
database and processing components. This application specification will then be mapped
to the ‘best’ currently available resources through a scheduling infrastructure. The
scheduler will examine the currently available services (both software and data sources)
and evaluate the capability of the free resources to meet the requirements specified by
the user, e.g. the inter-operation dependencies. The workflow is then instantiated on
the selected resources using an X.509-based security infrastructure.
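
The mapping step described above can be illustrated with a minimal sketch. It is not the project's scheduler: the class names, the greedy "most free CPUs" rule, and the site and software names are all assumptions made for illustration. It orders the workflow steps by their dependencies and then places each step on a resource that has the required software and enough free capacity.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A compute resource at one site (hypothetical fields)."""
    name: str
    software: set          # software and databases deployed on this resource
    free_cpus: int


@dataclass
class Step:
    """One workflow component and the steps that must run before it."""
    name: str
    needs: set             # software or databases the step requires
    cpus: int = 1
    after: list = field(default_factory=list)


def order_steps(steps):
    """Return step names in dependency order (depth-first topological sort)."""
    by_name = {s.name: s for s in steps}
    ordered, state = [], {}

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"cyclic dependency at {name}")
        state[name] = "visiting"
        for dep in by_name[name].after:
            visit(dep)
        state[name] = "done"
        ordered.append(name)

    for s in steps:
        visit(s.name)
    return ordered


def map_workflow(steps, resources):
    """Greedily place each step on the capable resource with most free CPUs.

    This is a static allocation: CPUs are reserved and never released.
    """
    by_name = {s.name: s for s in steps}
    free = {r.name: r.free_cpus for r in resources}
    placement = {}
    for name in order_steps(steps):
        step = by_name[name]
        capable = [r for r in resources
                   if step.needs <= r.software and free[r.name] >= step.cpus]
        if not capable:
            raise RuntimeError(f"no resource can run {name}")
        best = max(capable, key=lambda r: free[r.name])
        free[best.name] -= step.cpus
        placement[name] = best.name
    return placement


# Hypothetical sites and software names, for illustration only.
resources = [
    Resource("site_a", {"blast", "sequence_db"}, free_cpus=4),
    Resource("site_b", {"blast", "threading", "structure_db"}, free_cpus=8),
]
steps = [
    Step("search", needs={"blast", "sequence_db"}),
    Step("fold", needs={"threading", "structure_db"}, cpus=4, after=["search"]),
]
placement = map_workflow(steps, resources)
```

Here "search" can only run where the sequence database is deployed and "fold" only where the threading software is, so the two steps land on different sites. A real scheduler would also weigh data movement, queue state and security constraints, which this sketch ignores.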

Use of metadata
The ICENI middleware used within the project has a rich metadata structure that allows
the current state of the resources to be captured, so that the different library
versions and application programs at each site can be accurately represented. This
information will be defined in an XML schema.
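
As an illustration of the kind of information such metadata could carry, the sketch below parses a small XML description of one resource. The element and attribute names are invented for this example and are not the ICENI schema; the library and application names are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource description; not the ICENI metadata format.
RESOURCE_XML = """
<resource name="site_a" cpus="16">
  <library name="libexample" version="1.2"/>
  <application name="annotator" version="0.9"/>
  <application name="aligner" version="2.1"/>
</resource>
"""


def parse_resource(xml_text):
    """Turn the XML description into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.get("name"),
        "cpus": int(root.get("cpus")),
        "libraries": {e.get("name"): e.get("version")
                      for e in root.findall("library")},
        "applications": {e.get("name"): e.get("version")
                         for e in root.findall("application")},
    }


def provides(resource, application, version):
    """Check whether a resource offers an exact application version."""
    return resource["applications"].get(application) == version


info = parse_resource(RESOURCE_XML)
```

A scheduler could call a check such as `provides(info, "aligner", "2.1")` before placing a workflow component on the resource.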

Use of and contribution to evolving standards
The computer science researchers within the project are active in developing
infrastructures using the Open Grid Services Architecture, an open standard defining the
next generation of Grid middleware infrastructures.

Current state of the project

The project has met its first milestone and demonstrated:
   inter-institution Grid computing
   integrated web-based access to databases at different institutions

Specifically:

    All staff have been recruited.

    The groups of Sternberg, Jones and Orengo each have a local pipeline for proteome
     annotation; the pipelines share common features but differ substantially.

    The group of Thornton is developing libraries for structure-based assignment of
     protein function.

    The groups of Darlington and Newhouse at Imperial and Sorensen at UCL have
     implemented Globus-based facilities that allow external sites to use their
     computing resources.

    Proteome annotation has been run by Imperial at UCL and by UCL at Imperial
     using the Globus Toolkit version 2 protocols.

    The group of Birney (EBI) has developed software for Protein DAS that serves as a
     front end to the proteome annotations at different sites.

    The DAS front end has successfully integrated access to databases at Imperial and
     UCL.


Institutions involved and their role

    Imperial College
    University College London
    European Bioinformatics Institute
   Each institution is involved in both the proteome annotation and the Grid
    computing.

Names of the team

   Prof M Sternberg, Prof J Darlington and Dr S Newhouse (Imperial College
    London)
   Prof D Jones, Prof C Orengo & Dr S Sorensen (University College London)
   Prof J Thornton, Dr E Birney & Dr A Robinson (European Bioinformatics Institute,
    Cambridge)



Resources
The project obtained support for six PDRAs, two at each site, together with consumables,
travel and local hardware. The BBSRC provided 36 months of support, and the DTI a
further 3 months to promote links with industry. In addition, the project will use
existing high-performance computing at the three sites, such as that purchased recently
under SRIF for e-science.

						