					30/06/2004                                                     BRIDGES Work Plan v4

                                   Bridges Work Plan
                      (1st July 2004 – 30th September 2004)

1. Introduction
This document provides an updated outline of the Bridges work plan, extending the third
version (see
2. Progress to Date
The Bridges team has made the following progress to date:
      • A portal has been developed to host software/Grid services and provide a central
        information source for the CFG scientists collaborating with Bridges. This is hosted
        on the IBM WebSphere platform.
      • Initial prototypes of Grid services have been developed wrapping key bioinformatics
        tools (BLAST). These services are being extended with the help of the eDIKT team to
        include both ScotGrid and a Condor pool at NeSC Glasgow.
      • An initial DB2 data repository has been developed providing a central warehouse of
        the key public data sets identified by the CFG scientists – specifically some of those
        data sets that do not offer programmatic interfaces (OMIM, SWISS-PROT, HUGO,
        RGD). This repository is linked to remote data resources (currently Ensembl rat,
        mouse and human DBs and MGI in Jackson) via appropriate Information Integrator
      • Two key applications have been developed/extended in Bridges:
          o SyntenyVista which is used to visualise syntenic relationships between
            genomic data sets. A version of SyntenyVista has been extended to support
            remote visualisation (via OGSA-DAI) of remote genomic data sets.
          o An initial prototype of MagnaVista has been produced. MagnaVista enables
            single querying of multiple live genomic data sources, mining of online
            bioinformatics data, upload of local data to a central database and sharing of
            this data with multiple users in a secure manner. It acts as the front end to the
            Bridges repository.
      • Three papers (extended abstracts) were produced and accepted for the UK e-Science
        All Hands Meeting.
      • A paper was sent to the Life Science Grid conference in Kanazawa, Japan and
        accepted as an invited paper.
      • A presentation and demonstration of the Bridges work was made at the Life Science
        Grid meeting at the Global Grid Forum in Hawaii.
      • Initial roll-out to the CFG scientists is planned for 30th June 2004.

3. Work Plan
3.1 Scientific Engagement
A key requirement for progression of the work is more engagement with the CFG scientists.
This was highlighted through the sparsity of use cases in deliverable D2.2. (These use cases
will be refined in future deliverables D3.1 and D4.1.) Engagement issues are likely to be
resolved after the initial roll-out of the software on 30th June. A pre-ISMB CFG meeting is
also planned where CFG-Bridges interactions might take place.
Direct discussions were held and on-site visits made with the CFG scientists (Derek/Micha visited
Rick Dixon in Leicester) to address what data could be shared and how this might be accessed
3.2 Upcoming Deliverables
The work plan for the project's upcoming deliverables is as follows:
D3.1 Report on Updated List of Use Cases (Derek Houghton – editor)
           o Refine and update deliverable D2.1
                      Should include SyntenyVista usage, MagnaVista usage, and planned
                       BLAST usage, possibly BioBeans workflows, …
D3.2 Operational System giving Uniform Access across Sites (Magnus Ferrier – editor)
           o The plan with this deliverable was to replicate the basic infrastructure at each
             site in the CFG consortium. This should mean that the original technology,
             working practices, scripts, queries, of WP2 run at every site. However, given
             that the project has now realised a portal giving access to the services which
             give access to the data repository itself, this deliverable has now changed in
             scope. That is, this deliverable will now focus upon experiences of the CFG
             scientists in using the developing services and data sets.
           o In addition, this phase of the work should also include initial investigations in
             the usage of a Grid plus replication manager, (Replica Location Service - RLS)
             to maintain the illusion of selected data from each site being available to all
             other sites.
                      Given that a single repository is being constructed and that there is
                       minimal data being made available at remote CFG sites, this RLS
                       investigation will focus upon replication of the data repository at
                       Glasgow and Edinburgh.
D3.3 Report on Cycle 2: experience, lessons & issues
           o Done. (Documented as AHM 2004 paper.)
D3.4 Plan & Base System for UP3 (Richard Sinnott)
           o Scope and plan to be refined as work on SyntenyVista, MagnaVista, portal,
             BLAST and other visualisation and bioinformatics services evolve. Will be
             done once CFG users have given feedback.
D3.5 Initial Evaluation of Information Integrator (Derek Houghton)
           o Done. Document sent through to IBM. As a result, we have now agreed to be
             Beta testers for Masala (IBM’s latest extended version of Information
3.3 Detailed Work Plan and Responsibilities
In addition to the planned deliverables and engagement with the CFG scientists as a whole,
various other pieces of work need to be completed. The people responsible are listed
along with a description of the work to be completed and the priority associated with the
Person              Description                               Priority  Deadline

Neil                Extend SyntenyVista to work with local    M         To be determined
                    (CFG) data and Bridges repository (via              when Neil’s hols
                    OGSA-DAI) – even if this is just for                over
                    local scientists (Neil, Donald etc)
                    Neil/Ela - Can SV be extended to support
                    this? (Can it be easily extended to work
                    on non-Ensembl data and do the scientists
                    have data in the form which could be
                    imported into an “extended” version of
                    SV?)

Magnus              Implement automatic X.509 based           L         6-08-04
                    authentication in portal;

Magnus/Micha/Derek  Prototype web service based download of   L         To be determined
                    remote public data sets to get “up to
                    date” data from those sites without
                    programmatic interfaces
                    Derek look at and implement basic process
                    for automating these data sets to update
                    repository (note if data sets change
                    schemas etc then this will break the
                    update system but we cannot realistically
                    address this issue)

Magnus              Finalise MagnaVista version 1 –           H         Initial version done –
                    including data mining capabilities, QTL             but should add info on
                    upload functionality, linkage to Micha’s            when data last checked
                    BLAST service when this is ready, and               for freshness
                    using Derek’s stored procedures for
                    querying/bringing back data sets
                    Extend MV to also include information on
                    when returned data was last checked for
                    “freshness”

Micha/Neil          Investigate possible family of “open      M         Starts after 07-08-04
                    source” visualisation tools (JalView,
                    FeatureView, …) that could be integrated
                    into meaningful scientific Grid service
                    family deliverable from the portal;
                    Need to get CFG to provide software
                    FeatureView etc

Micha/Aileen/Davy   Complete GT3 based job scheduling /       H         07-08-04
                    BLAST service with eDIKT team – making
                    usage of ScotGrid, Condor pool in
                    training lab, and possibly (???) National
                    Grid Service
                    This release should have basic
                    functionality and GUI client tools with a
                    later release supporting high-throughput
                    BLAST’ing

Micha               Design and implement basic BioBeans       …         To be determined after
                    scenario using SyntenyVista, MagnaVista,            CFG experiences of
                    BLAST and data repository following                 existing
                    discussions with CFG scientists (Neil);

Derek               Continue refining existing version of     H         continuous
                    repository adding necessary stored
                    procedures to return data sets from other
                    remote resources
                    Major issue is CFG Microarray Data
                    warehouse at Imperial College – we should
                    try to support this / link in to it etc
                    if allowed. If not then consider linkage
                    to other resources such as ArrayExpress.

Derek/Kostas        Sketch out outline proposal for how       H         Starts 02-08-04
                    OGSA-DAI/DAIT/DQP can be applied for data
                    repository

Derek               Release extended version of repository    M         Depends on CFG input
                    including GO, PubMed

Neil                Map RAE230 chips and make this            M         To be determined once
                    information available to CFG consortia,             Neil’s holiday
                    either via data repository or via DB
                    linked to repository
                    Should ideally use advanced BLAST
                    facility being engineered by Micha

Aileen/Davy         Complete porting of Eldas to              L         Done

Aileen/Davy         Enhance Eldas to work with joins between  L         Expected 1-8-04 in
                    relational database queries

Aileen/Davy         Complete implementation of flat file      L         Starts after 1-8-04
                    querying capabilities

Rich                Investigate back-up solutions using       M
                    Edinburgh and/or Glasgow SAN
                    Glasgow SAN not available very soon.
                    Edinburgh SAN possible – who do we need
                    to contact to get access/use this?
                    Other GU HW being spec’d out and ordered

Rich/John/Derek/    Investigate design of PERMIS/SAML GT3     H         Starts 26-07-04
Magnus/Anthony      roles and

Rich/John/Derek     Finalise design and implement             M         06-08-04
                    authorisation infrastructure

Rich/John/Derek/    Complete report on PERMIS usage for Prof. L         asap
                    Chadwick

All                 Think about publishing papers             M         continuous

3.4 Open Issues
There are several open issues with the work and the next phase of development.
1. Lack of Grid
Currently much of the software development activity (MV, SV, portal, repository) has had
little direct “Grid” content. Since MV and SV are standalone Java applications, Grid-enabling them
has been considered to add unnecessary overhead. It is likely that the next phase of the
work on the repository will focus more on OGSA-DAI (and DQP) solutions so that
comparisons can be made between Grid data access and commercial solutions like IBM’s
Information Integrator Masala. In addition, the next phase of the work should be much more
focused on the security aspects of the Grid infrastructure (exploring GSI, PERMIS etc).
2. Lack of Testbed Infrastructure
Currently work on maintaining and updating the data repository is done on the “live”
resource. Ideally we should have a test bed for trialling our systems. This requires a suitable
server being available. Currently neither NeSC (Glasgow or Edinburgh) nor IBM has such a
machine available. It might be the case that DCS/BRC have funds available (~£6.5k) to
purchase a new machine.
3. The existing BLAST service assumes preformatted data exists on ScotGrid (the nucleotide and
non-redundant nt/nr DBs). The current solution is that we keep the ScotGrid data
(nr/nt/other data sets) regularly updated independently, rather than exporting the data directly
from the DB2 repository. (It is not possible to simply export the data onto ScotGrid from the DB2
repository since formatting of the data is necessary, and to do live formatting each time it is
needed would require significant computing, which would outweigh the benefits of using the
BLAST service in the first place.) The result is a danger of inconsistencies between the
data either in or linked via the repository and the data held on ScotGrid.
4. The SyntenyVista service should “ideally” be extended to support remote (i.e. at CFG sites)
data sets with syntenic regions and data available via the Bridges data repository. The
question remains whether the scientists have such data sets.
5. We will focus initially on OGSA-DAI/DAIT/DQP work and eventually, if time permits,
incorporate ELDAS/AVAKI solutions.
6. The deliverables list needs to be re-evaluated and names attributed to editorship. This can
be done at the next full Bridges meeting.
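
Open issue 3 above notes a danger of inconsistency between the data in (or linked via) the
repository and the preformatted copies held on ScotGrid. One way such drift might be detected
is to record a checksum of each source data set when it is loaded into the repository and
again when it is formatted for BLAST, and then to compare the two records. The following is a
minimal sketch only, in Python. The manifest structure, the function names and the sample
data are all assumptions made for illustration and are not part of the Bridges software.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a data set's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(datasets: dict[str, bytes]) -> dict[str, str]:
    """Map each data set name (e.g. 'nt', 'nr') to the checksum of its content."""
    return {name: checksum(content) for name, content in datasets.items()}


def find_inconsistencies(repository: dict[str, str],
                         scotgrid: dict[str, str]) -> dict[str, str]:
    """Compare two manifests and report every data set that is out of step.

    Each reported name maps to one of: 'missing on ScotGrid',
    'missing in repository' or 'content differs'.
    """
    report = {}
    for name in sorted(set(repository) | set(scotgrid)):
        if name not in scotgrid:
            report[name] = "missing on ScotGrid"
        elif name not in repository:
            report[name] = "missing in repository"
        elif repository[name] != scotgrid[name]:
            report[name] = "content differs"
    return report


# Hypothetical example: tiny byte strings stand in for real database dumps.
repository_manifest = build_manifest({"nt": b">seq1\nACGT\n", "nr": b">p1\nMKV\n"})
scotgrid_manifest = build_manifest({"nt": b">seq1\nACGT\n", "nr": b">p1\nMKL\n"})
```

Calling find_inconsistencies(repository_manifest, scotgrid_manifest) on the sample data flags
only the "nr" data set, since its two copies differ. A check of this kind would only detect
drift; it would not remove the need to reformat the data for BLAST after each update.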

The next review is informally scheduled for October. The precise date is not yet known.