BBSRC E-science & Bioinformatics (with additional support from the DTI)
Grant Title:
A Distributed Pipeline for Structure-based Proteome Annotation using Grid
Technology
Project name
Name: e-Protein
Aims and Objectives
The aim is to provide a structure-based annotation of the proteins in the major genomes
by linking resources at three sites through Grid technology. The objectives are: (1) to
establish local databases with structural and functional annotations; (2) to disseminate
our proteome annotation to the biological community via a single web-based distributed
annotation system (DAS); (3) to share computing power transparently between sites using
Grid middleware such as the Globus Toolkit; (4) to use the developed system to compare
alternative approaches to annotation and thereby identify methodological improvements;
(5) to establish a pre-prototype at 6 months for demonstration purposes; (6) to provide
a working system after two years; and (7) to link to relevant bioinformatic and Grid
resources that will be integrated into this project.
E-science issues involved
The project presents a complex Grid infrastructure of distributed computational resources
with different databases and specialised analysis software that may not be deployed on
every resource. It is therefore essential that the Grid infrastructure is able to accurately
represent the state of the software and hardware resources at each site. This
heterogeneous capability needs to be effectively exploited by the complex workflow
presented within the protein annotation pipeline. The capture of this workflow and the
mapping of its components to the distributed resources are the key e-science issues within
this project.
Brief overview of the system architecture
The proposed system builds on a service-oriented architecture based around the
current stable release of the Globus Toolkit (version 2), while research and development
activities are undertaken with version 3. Scientists will define the required workflow
within, say, a graphical environment by ‘dragging and dropping’ database and processing
components. This application specification will then be
mapped to the ‘best’ currently available resources through a scheduling infrastructure.
The scheduler will examine the currently available services (both software and data
sources) and evaluate the capability of the free resources to meet the requirements
specified by the user, e.g. the inter-operation dependencies. Instantiation of the workflow
takes place on the resources using an X.509 based security infrastructure.
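
The mapping step described above can be illustrated with a minimal sketch. It is not the project's scheduler: the class names, site names, software labels and the "most free CPUs" selection rule are all assumptions made for illustration. It only shows the general idea of ordering workflow steps by their dependencies and assigning each step to a resource that offers the software and data it needs.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    # Hypothetical description of one site's capabilities.
    name: str
    software: frozenset
    databases: frozenset
    free_cpus: int


@dataclass
class Step:
    # Hypothetical workflow component and its requirements.
    name: str
    software: frozenset = frozenset()
    databases: frozenset = frozenset()
    depends_on: tuple = ()


def execution_order(steps):
    """Return step names ordered so every step follows its dependencies."""
    by_name = {s.name: s for s in steps}
    ordered, state = [], {}

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"dependency cycle at {name!r}")
        state[name] = "visiting"
        for dep in by_name[name].depends_on:
            visit(dep)
        state[name] = "done"
        ordered.append(name)

    for s in steps:
        visit(s.name)
    return ordered


def schedule(steps, resources):
    """Assign each step to the capable resource with the most free CPUs."""
    by_name = {s.name: s for s in steps}
    free = {r.name: r.free_cpus for r in resources}
    plan = {}
    for name in execution_order(steps):
        step = by_name[name]
        capable = [
            r for r in resources
            if step.software <= r.software      # required software is installed
            and step.databases <= r.databases   # required data is held locally
            and free[r.name] > 0                # at least one CPU is free
        ]
        if not capable:
            raise LookupError(f"no free resource can run {name!r}")
        best = max(capable, key=lambda r: free[r.name])
        free[best.name] -= 1
        plan[name] = best.name
    return plan


# Illustrative data only; the names are invented for this example.
resources = [
    Resource("site_a", frozenset({"fold_recognition"}), frozenset({"sequences"}), 4),
    Resource("site_b", frozenset({"domain_assignment"}), frozenset({"structures"}), 2),
]
steps = [
    Step("assign_domains", frozenset({"domain_assignment"}),
         frozenset({"structures"}), depends_on=("recognise_fold",)),
    Step("recognise_fold", frozenset({"fold_recognition"}), frozenset({"sequences"})),
]
plan = schedule(steps, resources)
```

Here the fold recognition step is placed at the only site that offers that software, and the dependent domain assignment step is placed afterwards at the site holding the structure data. A real scheduler would also weigh factors such as data transfer cost and queue times, which this sketch ignores.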
Use of metadata
The ICENI middleware used within the project has a rich metadata structure that will
allow the current state of the resources to be captured. This metadata will allow the
different library versions and application programs to be accurately represented. This
information will be defined in an XML schema.
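
As a rough illustration of what such a resource description might carry, the sketch below parses a small XML fragment into a Python dictionary. The element and attribute names, the site name and the version numbers are invented for this example and are not taken from the ICENI schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource description. Element and attribute names are
# illustrative assumptions, NOT the ICENI metadata schema.
RESOURCE_XML = """
<resource name="site_a">
  <hardware cpus="16" />
  <software name="fold_recognition" version="2.1" />
  <library name="libexample" version="1.0.3" />
</resource>
"""


def parse_resource(xml_text):
    """Extract the hardware, software and library state of one resource."""
    root = ET.fromstring(xml_text.strip())
    hardware = root.find("hardware")
    return {
        "name": root.get("name"),
        "cpus": int(hardware.get("cpus")),
        # Map each installed program or library to its version string.
        "software": {e.get("name"): e.get("version") for e in root.findall("software")},
        "libraries": {e.get("name"): e.get("version") for e in root.findall("library")},
    }


info = parse_resource(RESOURCE_XML)
```

Recording versions explicitly in this way is what would let a scheduler distinguish between sites that hold different releases of the same program or library.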
Use of and contribution to evolving standards
The computer science researchers within the project are active in developing
infrastructures using the Open Grid Services Architecture, an open standard defining
the next generation of Grid middleware infrastructures.
Current state of the project
The project has met its first milestone and demonstrated:
inter-institution Grid computing
integrated web-based access to databases at different institutions
Specifically:
All staff have been recruited
The groups of Sternberg, Jones and Orengo each have a local pipeline for proteome
annotation; these share common features but differ substantially.
The group of Thornton is developing libraries for structure-based assignment of
protein function.
The groups of Darlington and Newhouse at Imperial and of Sorensen at UCL have
implemented Globus-based facilities for external sites to utilize their computing
resources.
Proteome annotation has been run by Imperial at UCL and by UCL at Imperial
using the Globus Toolkit version 2 protocols.
The group of Birney (EBI) has developed software for Protein DAS that serves as a
front end to the proteome annotations at different sites.
The DAS front end has successfully integrated access to databases at Imperial and
UCL.
Institutions involved and their role
Imperial College
University College London
European Bioinformatics Institute
Each institution is involved in both the proteome annotation and the Grid
computing.
Names of the team
Prof M Sternberg, Prof J Darlington and Dr S Newhouse (Imperial College
London)
Prof D Jones, Prof C Orengo & Dr S Sorensen (University College London)
Prof J Thornton, Dr E Birney & Dr A Robinson (European Bioinformatics Institute,
Cambridge)
Resources
The project obtained support for six PDRAs, two at each site, together with
consumables, travel and local hardware. The BBSRC provided 36 months of support
and the DTI a further 3 months to promote links with industry. In addition, the
project will use existing high performance computing at the three sites such as that
purchased recently under SRIF for e-science.