The myGrid project: services, architecture and demonstrator



Professor Carole Goble,
Chris Wroe, Robert Stevens and the
myGrid consortium

http://www.mygrid.org.uk
myGrid




• EPSRC UK e-Science pilot project
• Open Source Upper Middleware for Bioinformatics
• (Web) Service-based architecture -> OGSA Grid services
• Prototype v1 Release Oct 2003, some services available now.
• Targeted at Tool Developers, Bioinformaticians and Service Providers
[Figure: partner sites Newcastle, Sheffield, Manchester, Nottingham, Hinxton, Southampton]
AHM meta-talk for a meta-paper
• Experiences with e-Science workflow specification and enactment in bioinformatics
• myGrid Notification Service
• Provenance of e-Science Experiments - experience from Bioinformatics
• The NEReSC Core Grid Middleware
• The myGrid project: services, architecture and demonstrator
• Performing in silico Experiments on the Grid: A Users Perspective
• Semantic and Personalised Service Discovery
• AMBIT: Acquiring Medical and Biological Information from Text
• Service-Based Distributed Query Processing on the Grid
• Soaplab – a Unified Sesame Door to Data Analysis Tools
Data-intensive bioinformatics [source: GlaxoSmithKline]




ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE    116    116       BINDS PEP (BY SIMILARITY).
FT   CONFLICT    374    374       S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA; 46016 MW; 02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
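The Swiss-Prot entry above is a line-oriented flat file: each line starts with a two-letter code (ID, DE, GN, OS, SQ and so on) and the sequence follows the SQ line. A minimal Python sketch of grouping such an entry by line code, assuming the classic layout (code in columns 1-2, data from column 6, indented sequence lines). The function name `parse_swissprot` is ours, and the sample reuses only the first lines of the entry shown; for real work an established parser such as Biopython's `Bio.SwissProt` module is the safer choice.

```python
from collections import defaultdict


def parse_swissprot(record: str) -> dict[str, list[str]]:
    """Group a Swiss-Prot flat-file entry by its two-letter line codes.

    Assumes the classic layout: code in columns 1-2, data from column 6,
    and sequence lines after SQ indented with spaces.
    """
    fields: dict[str, list[str]] = defaultdict(list)
    residues: list[str] = []
    in_sequence = False
    for line in record.splitlines():
        if not line.strip():
            continue                                # skip blank lines
        if line.startswith("//"):
            break                                   # end-of-entry marker
        if in_sequence and line.startswith(" "):
            residues.append("".join(line.split()))  # drop the 10-residue spacing
            continue
        code = line[:2]
        fields[code].append(line[5:].strip())
        in_sequence = code == "SQ"
    parsed = dict(fields)
    parsed["sequence"] = ["".join(residues)]
    return parsed


entry = """\
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
SQ   SEQUENCE   429 AA; 46016 MW; 02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
"""
fields = parse_swissprot(entry)
print(fields["GN"])                # ['MURA OR MURZ.']
print(len(fields["sequence"][0]))  # 120 (only two sequence lines are included)
```

Multi-line fields such as DE come back as a list of lines, so `" ".join(fields["DE"])` rebuilds the full description.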
Application Drivers: Graves’ disease
• Autoimmune disease of the thyroid in which the immune system of an
  individual attacks cells in the thyroid gland, resulting in
  hyperthyroidism
• Weight loss, trembling, muscle weakness, increased pulse rate,
  increased sweating and heat intolerance, goitre, exophthalmos
Biology working together with…
[Figure: pituitary gland; TSH; TSH receptor on a thyroid cell; thyroid hormones released; -ve feedback effect. Autoimmune antibodies attach to TSH receptors, competing with TSH.]
• Graves’ disease is caused by the stimulation of the thyrotrophin
  receptor by thyroid-stimulating autoantibodies secreted by
  lymphocytes of the immune system.
• What is the molecular basis for this autoimmune response?



What genes might be associated with Graves’ Disease? Affymetrix microarray studies
[Figure: wet-lab biology: lymphocyte mRNA extracted from 4 patients and 4 controls and run on U95A Affy chips, giving 8 datasets; an Affymetrix data mining tool maps probe IDs and ESTs to gene IDs (NCBI), producing a candidate gene pool.]
What genes are expressed in patient samples but not in controls, and vice versa?
Peter Li (1), Claire Jennings (2), Simon Pearce (2) and Anil Wipat (1), 2003.
(1) School of Computing Science and (2) Institute of Human Genetics, University of Newcastle-upon-Tyne.
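The question on this slide, which genes are expressed in patients but not in controls and vice versa, reduces to set operations once each chip has been turned into a set of genes called present. A minimal sketch, assuming that input format and a deliberately strict rule (present on every chip of one group, on no chip of the other); the gene labels are hypothetical and this is not the analysis used in the study cited above.

```python
def differential_presence(
    patients: list[set[str]], controls: list[set[str]]
) -> tuple[set[str], set[str]]:
    """Return (patient_only, control_only) gene sets.

    Each input set holds the gene IDs called 'present' on one chip. The strict
    rule used here (present on every chip of one group and on no chip of the
    other) is an illustrative assumption.
    """
    in_all_patients = set.intersection(*patients)
    in_any_patient = set.union(*patients)
    in_all_controls = set.intersection(*controls)
    in_any_control = set.union(*controls)
    return in_all_patients - in_any_control, in_all_controls - in_any_patient


# Hypothetical calls for 4 patients and 4 controls (8 datasets in total).
patients = [{"G1", "G2", "G3"}, {"G1", "G2"}, {"G1", "G2", "G4"}, {"G1", "G2"}]
controls = [{"G2", "G5"}, {"G2", "G5"}, {"G2", "G5"}, {"G2", "G3", "G5"}]
patient_only, control_only = differential_presence(patients, controls)
print(sorted(patient_only), sorted(control_only))  # ['G1'] ['G5']
```

A looser rule (for example, present in at least three of four patients) only changes the set expressions, not the structure.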
Bioinformatics, starting from the candidate gene pool:
• Annotation Pipeline: What is known about my candidate gene?
   – Medline, EMBL, GO, OMIM, BLAST, Query, DQP
• Genotype Assay Design System: Select a SNP from a candidate gene.
  Is this SNP associated with disease?
   – Primer Design from the Gene ID: Emboss Eprimer application in SoapLab
   – Use primers designed by myGrid to amplify the region flanking the SNP on the gene
   – Restriction Fragment Length Polymorphism experiment: selection of
     restriction enzyme with Emboss Restrict in SoapLab (Talisman)
• 3D Protein Structure: What is the structure of the protein product
  encoded by my candidate gene?
   – Query PDB & display protein structure using Rasmol
   – Obtain information about the protein & extract information about
     its active site (Interpro, Swiss-Prot, AMBIT)
   – Determine whether coding SNPs affect the active site of the protein
Workflows are in silico experiments:
http://cvs.mygrid.org.uk/scufl/NucleotideSeqAnnotationPipelineWithGoTerms/


[Figure: Annotation Pipeline, “What is known about my candidate gene?”: Medline, EMBL, GO, OMIM, BLAST, Query, DQP]
Experimental orchestration
• Exploratory
• Hypothesis driven
• Not prescriptive
• Methodology free
• Ad hoc
Experiment life cycle
• Forming experiments
   – Resource & service discovery
   – Repository creation
   – Workflow creation
   – Database query formation
• Executing experiments
   – Workflow enactment
   – Distributed query processing
   – Job execution
   – Provenance generation
   – Single sign-on authentication
   – Event notification
• Managing experiments
   – Information repository
   – Metadata management
   – Provenance management
   – Workflow evolution
   – Event notification
• Providing services & experiments
   – Service registration
   – Workflow deposition
   – Metadata annotation
   – Third party registration
• Discovering and reusing experiments and resources
   – Workflow discovery & refinement
   – Resource & service discovery
   – Repository creation
   – Provenance
• Personalisation
   – Personalised registries
   – Personalised workflows
   – Info repository views
   – Personalised annotations
   – Personalised metadata
   – Security
Bio in silico experiments service types
• Making in silico experiments
   – workflow
   – distributed database query processing.
• Managing experimental outcomes
   – information management
   – managing metadata
• Scientific method
   – provenance management
   – change notification
   – personalisation
• Sharing experiments
   – semantic services for discovering services and workflows, and
     managing metadata
   – third party service registries and federated personalised views
     over those registries,
   – ontologies and ontology management.
• The base services and tools that will constitute the experiments
   – third party services such as databases, computational analyses,
     simulations …
   – specialised services such as AMBIT text extraction.
Investigation = set of experiments + metadata

• Experimental design
  components

• Experimental instances
  that are records of
  enacted experiments

• Experimental glue that
  groups and links design
  and instance
  components

• Life Science IDs
• URIs
• RDF
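An investigation, as described above, is a set of design components, instance records and the glue between them, identified by LSIDs or URIs and described in RDF. The sketch below stands in for RDF with plain (subject, predicate, object) tuples so that it runs without extra libraries; the `example.org` LSID authority, the object names and the predicates `hasDesignComponent`, `hasInstance` and `enactmentOf` are all hypothetical, chosen only to show how the glue lets one ask which recorded runs enacted a given design.

```python
def lsid(namespace: str, obj: str) -> str:
    """Build an identifier of the form urn:lsid:<authority>:<namespace>:<object>.

    The authority 'example.org' is a placeholder, not a real myGrid authority.
    """
    return f"urn:lsid:example.org:{namespace}:{obj}"


INVESTIGATION = lsid("investigation", "graves-candidates")
DESIGN = lsid("workflow", "annotation-pipeline")   # design component
RUN_1 = lsid("run", "1")                           # instance records
RUN_2 = lsid("run", "2")

# (subject, predicate, object) statements; predicate names are made up.
TRIPLES = {
    (INVESTIGATION, "hasDesignComponent", DESIGN),
    (INVESTIGATION, "hasInstance", RUN_1),
    (INVESTIGATION, "hasInstance", RUN_2),
    (RUN_1, "enactmentOf", DESIGN),                # the "glue"
    (RUN_2, "enactmentOf", DESIGN),
    (RUN_1, "note", "first pass over new Affy data"),
}


def objects(subject: str, predicate: str) -> set[str]:
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate: str, obj: str) -> set[str]:
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


# Which recorded runs enacted this investigation's design components?
for design in sorted(objects(INVESTIGATION, "hasDesignComponent")):
    print(design, "->", sorted(subjects("enactmentOf", design)))
```

An RDF store would answer the same question with a graph query; the point here is only that design, instance and glue are all first-class, identifiable statements.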
myGrid in a nutshell
(e) External Applications: workbench, portal, Talisman, Taverna
(d) Semantic grid capabilities: knowledge-based technologies, semantic-based
    service, workflow & data discovery, match making. High level services
    for e-Science experimental management: provenance, change notification,
    personalisation
(c) Linking investigation components; investigation and experiment
    holdings management
(b) High level services for data intensive integration: workflow &
    distributed query processing
(a) External Services: AMBIT, SoapLab, EMBOSS…
Audiences: Bioinformaticians (Applications), Tool Providers (Core services),
Service Providers (External services)
A “second generation” open service-based Grid project, a test bed for the
OGSI, OGSA and OGSA-DAI base services
myGrid Service Stack
(e) Applications (Bioinformaticians): Work bench, Taverna workflow
    environment, Web Portal, Talisman application, Gateway
(d) Core services (Tool Providers): Service and Workflow Registries,
    Discovery, Personalisation, Provenance mgt, Event Notification,
    Ontologies, Ontology Mgt
(c) Metadata Mgt, myGrid Information Repository
(b) FreeFluo Workflow enactment engine, OGSA Distributed Query Processor
    Web Service & Grid communication fabric
(a) External services (Service Providers): Soaplab, SRS, EMBOSS,
    Bio Services, AMBIT Text Extraction Service
Deployment Architecture
[Figure: deployment across user machines, a Team Server and Organization Servers. User 1: Workbench, Enactor, Gateway. User 2: Browser, Portal, Gateway. Team Server: SemanticFind, Views (User 1 view, User 2 view), UserProxy (User 1 proxy, User 2 proxy), Enactor, DQPService, Notifications, MIR, BioService6. Organization Servers: Registry, Registry Provider, Service Provider 1, Service Provider 2, BioService1 to BioService5, TextService.]
Putting the services together
[Figure: services are published to a Registry by syntactic registration and to Knowledge Services by semantic registration; Registry Views and a Match maker support a Find Component; browsers for the mIR, for services & workflows, and for provenance; the FreeFluo Workflow enactment engine, Distributed Query Processor and Job Execution invoke services; Notification Service; User Proxy; AMBIT Information Extraction Service; several mIR instances.]
A work bench for demonstrating services
[Screenshot panels: myView on the mIR; Workflow; Metadata about workflow; note about workflow; NetBeans]
Notification service
• A new gene with changed
  expression in Graves’
  Disease added to mIR

• User registers interest in
  notification topics

• Informs the user via a
  notification client in the
  workbench that new data
  has been added to the mIR.

• Notifications presented to
  the user with a client in the
  workbench environment.
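The notification flow above (register interest in a topic, publish when new data arrives in the mIR, deliver to a client in the workbench) is a publish/subscribe pattern. A minimal in-process Python sketch, assuming callback delivery; the class, the topic strings and the message fields are hypothetical and say nothing about how the myGrid Notification Service is actually implemented or networked.

```python
from collections import defaultdict
from typing import Callable

Callback = Callable[[str, dict], None]


class NotificationService:
    """In-process topic-based publish/subscribe (stand-in for a networked service)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register interest in a topic; callback receives (topic, message)."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver message to every subscriber of topic; return how many were notified."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(topic, message)
        return len(callbacks)


# Usage: a workbench client registers interest, then the mIR announces new data.
inbox: list[tuple[str, dict]] = []
service = NotificationService()
service.subscribe("mir/new-data/graves-disease", lambda t, m: inbox.append((t, m)))
service.publish("mir/new-data/graves-disease", {"gene": "GENE_X", "change": "up"})
service.publish("mir/new-data/other-study", {"gene": "GENE_Y"})  # no subscribers
print(inbox)
```

Decoupling publishers from subscribers this way means the mIR does not need to know which workbenches are listening.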
Semantic discovery – services & workflows
• Services and workflows described using semantic web technologies and
  ontologies
• Selection by the types of inputs they use, outputs they produce, the
  bioinformatics tasks they perform…
• DAML+OIL -> OWL
• RDF-based UDDI registry
• Multiple & 3rd party registries
• Multiple & 3rd party metadata
[Screenshots: a registry browser; a workflow wizard]
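Selecting services by the semantic types of their inputs and outputs can be illustrated with a tiny is-a hierarchy. In this sketch a service is a candidate if the data you hold is the service's declared input type or a subtype of it and, optionally, if its output falls under the type you want. The ontology fragment, the service names and the `discover` helper are hypothetical; real descriptions would be DAML+OIL/OWL terms held in a registry, not a Python dict.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ontology fragment: concept -> parent concept.
IS_A = {
    "AffymetrixProbeSetId": "Identifier",
    "GenBankAccession": "Identifier",
    "NucleotideSequence": "BiologicalSequence",
    "ProteinSequence": "BiologicalSequence",
}


def subsumed_by(concept: str, ancestor: str) -> bool:
    """True if concept equals ancestor or lies below it in IS_A."""
    current: Optional[str] = concept
    while current is not None:
        if current == ancestor:
            return True
        current = IS_A.get(current)
    return False


@dataclass(frozen=True)
class ServiceDescription:
    name: str
    input_type: str
    output_type: str
    task: str


def discover(registry: list[ServiceDescription], have: str,
             want: Optional[str] = None, task: Optional[str] = None) -> list[str]:
    """Names of services that accept `have` and optionally produce `want` / do `task`."""
    return [
        s.name
        for s in registry
        if subsumed_by(have, s.input_type)
        and (want is None or subsumed_by(s.output_type, want))
        and (task is None or s.task == task)
    ]


REGISTRY = [
    ServiceDescription("probeset_to_accession", "AffymetrixProbeSetId", "GenBankAccession", "mapping"),
    ServiceDescription("fetch_nucleotide", "GenBankAccession", "NucleotideSequence", "retrieval"),
    ServiceDescription("lookup_any_identifier", "Identifier", "NucleotideSequence", "retrieval"),
]

print(discover(REGISTRY, "AffymetrixProbeSetId"))
# ['probeset_to_accession', 'lookup_any_identifier']
print(discover(REGISTRY, "AffymetrixProbeSetId", want="BiologicalSequence"))
# ['lookup_any_identifier']
```

Accepting a service whose declared input is more general than the data held is one possible matching rule; a stricter matcher would require an exact type.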
The mIR holds the experimental components
• We need to discover which workflows have been published that can
  operate on data of this specific semantic type (an Affymetrix probe
  set identifier)
• Some might be in the mIR, some might be in a global registry
• The mIR holds all experimental components
• Multiple mIRs
• Built on an RDBMS & OGSA-DAI
• Plans: federated architecture, LSIDs and RDF
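The slide notes that a matching workflow may sit in a local mIR or in a global registry, with several mIRs in play and a federated architecture planned. One minimal way to picture such a lookup is to query each registry in a fixed priority order and keep the first hit per workflow identifier, remembering where it came from. Everything here (the registry names, the `WorkflowEntry` fields, the LSIDs and titles) is hypothetical, and exact type matching stands in for semantic matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowEntry:
    lsid: str        # workflow identifier (hypothetical values below)
    title: str
    input_type: str  # semantic type of the workflow's main input


def find_workflows(registries: dict[str, list[WorkflowEntry]],
                   input_type: str) -> list[tuple[str, WorkflowEntry]]:
    """Search every registry; keep the first hit per workflow id with its source."""
    seen: dict[str, tuple[str, WorkflowEntry]] = {}
    for source, entries in registries.items():  # dict order = search priority
        for entry in entries:
            if entry.input_type == input_type and entry.lsid not in seen:
                seen[entry.lsid] = (source, entry)
    return list(seen.values())


local = [WorkflowEntry("urn:lsid:example.org:workflow:1", "Probe set annotation", "AffymetrixProbeSetId")]
global_ = [
    WorkflowEntry("urn:lsid:example.org:workflow:1", "Probe set annotation", "AffymetrixProbeSetId"),
    WorkflowEntry("urn:lsid:example.org:workflow:2", "Probe set to GO terms", "AffymetrixProbeSetId"),
    WorkflowEntry("urn:lsid:example.org:workflow:3", "SNP primer design", "GeneId"),
]
hits = find_workflows({"local mIR": local, "global registry": global_}, "AffymetrixProbeSetId")
for source, entry in hits:
    print(source, entry.title)
# local mIR Probe set annotation
# global registry Probe set to GO terms
```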
 Create and run a workflow

• If an appropriate workflow does not exist, a new one can be created
  in the Taverna editor
• Workflow & outputs stored in the mIR
• FreeFluo workflow enactment engine
• WSFL & Scufl
• Joint development with HGMP and EBI

            (http://sourceforge.net/projects/taverna)
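A workflow such as the annotation pipeline is a graph of processors, each of which can run once its upstream processors have produced output. A minimal Python sketch of dependency-ordered enactment using the standard library's `graphlib.TopologicalSorter` (Python 3.9+); it is not Scufl or FreeFluo, the processor names are hypothetical, and the lambdas are stand-ins for remote services. A real enactment engine must also handle remote calls, failures and iteration, none of which is shown.

```python
from graphlib import TopologicalSorter
from typing import Callable


def enact(processors: dict[str, Callable[..., object]],
          depends_on: dict[str, list[str]],
          workflow_input: object) -> dict[str, object]:
    """Run each processor after its upstream processors have produced output.

    A processor with no upstream links receives the workflow input; otherwise
    it receives the outputs of its upstream processors, in the order listed.
    """
    graph = {name: depends_on.get(name, []) for name in processors}
    results: dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():
        upstream = graph[name]
        args = [results[u] for u in upstream] if upstream else [workflow_input]
        results[name] = processors[name](*args)
    return results


# Hypothetical stand-ins for the services of an annotation pipeline.
processors = {
    "probe_to_gene": lambda probe: f"gene({probe})",
    "fetch_sequence": lambda gene: f"seq({gene})",
    "blast": lambda seq: f"hits({seq})",
    "go_terms": lambda gene: f"go({gene})",
    "report": lambda hits, go: {"similar": hits, "annotation": go},
}
depends_on = {
    "fetch_sequence": ["probe_to_gene"],
    "blast": ["fetch_sequence"],
    "go_terms": ["probe_to_gene"],
    "report": ["blast", "go_terms"],
}
out = enact(processors, depends_on, "probe-123")
print(out["report"])
# {'similar': 'hits(seq(gene(probe-123)))', 'annotation': 'go(gene(probe-123))'}
```

Keeping every intermediate result in `results` is also what makes it possible to store workflow outputs, not just the final answer.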
  Provenance logging and reusing
• FreeFluo provides a detailed provenance record, stored in the mIR,
  describing what was done, with what services and when
• Can be viewed within the workbench
• XML document
Provenance is not just workflow
• Every mIR object has (Dublin Core) provenance properties
• Derivation paths ~ workflows, queries
• Annotations ~ notes
• Evolution paths ~ workflow -> workflow
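A provenance record of the kind described above (what was done, with which services and when, kept as an XML document) can be sketched with Python's `xml.etree.ElementTree`. Only `dc:creator` and `dc:date` are real Dublin Core elements; the `provenance`, `invocation`, `input` and `output` element names, the run identifier and the timestamps are hypothetical, and this is not FreeFluo's actual provenance format.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
ET.register_namespace("dc", DC)


def provenance_record(run_id: str, creator: str, date: str, steps: list[dict]) -> str:
    """Serialise one workflow run as XML: who ran it, when, and each service call."""
    root = ET.Element("provenance", {"run": run_id})
    ET.SubElement(root, f"{{{DC}}}creator").text = creator
    ET.SubElement(root, f"{{{DC}}}date").text = date
    for step in steps:
        call = ET.SubElement(root, "invocation", {
            "service": step["service"],
            "started": step["started"],
            "ended": step["ended"],
        })
        for value in step["inputs"]:
            ET.SubElement(call, "input").text = value
        for value in step["outputs"]:
            ET.SubElement(call, "output").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = provenance_record(
    run_id="run-1",
    creator="a.scientist",
    date="2003-10-01T10:00:00Z",
    steps=[{"service": "blast", "started": "10:00:01", "ended": "10:00:09",
            "inputs": ["seq-1"], "outputs": ["hits-1"]}],
)
parsed = ET.fromstring(xml_text)
print(parsed.find(f"{{{DC}}}creator").text, len(parsed.findall("invocation")))
# a.scientist 1
```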
  Legacy Bio Services publication
• Wrap CORBA, Perl etc. to look like web services, to become Grid
  services (eventually)
• SoapLab
   – A SOAP-based programmatic interface to command-line applications
   – ~300 different classes of services
   – Swiss-Prot, EMBOSS, Medline…
• 3rd parties
   – JEMBOSS, PathPort, bioMoby
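Soaplab's idea, one programmatic interface over many command-line applications, rests on describing each tool (program and named parameters) so that every tool can be invoked the same way. The sketch below shows only that idea in local Python using `subprocess.run`; it does not speak SOAP and does not reproduce Soaplab's API. The `CommandLineService` class, the `seqwrap` tool and its `-width` flag are hypothetical, with a small Python script standing in for an EMBOSS-style program.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CommandLineService:
    """Describe a command-line tool so it can be called through one uniform method."""
    name: str
    program: list[str]      # executable plus fixed leading arguments
    options: dict[str, str] # logical parameter name -> command-line flag

    def run(self, stdin_data: str = "", **params: str) -> dict[str, object]:
        argv = list(self.program)
        for key, value in params.items():
            if key not in self.options:
                raise ValueError(f"{self.name}: unknown parameter {key!r}")
            argv += [self.options[key], value]
        proc = subprocess.run(argv, input=stdin_data, capture_output=True, text=True)
        return {"exit_code": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}


# Hypothetical stand-in tool: wraps a sequence read from stdin to a given width.
WRAP_SCRIPT = (
    "import sys, textwrap\n"
    "args = sys.argv[1:]\n"
    "width = int(args[args.index('-width') + 1]) if '-width' in args else 60\n"
    "print(textwrap.fill(sys.stdin.read().strip(), width))\n"
)
seqwrap = CommandLineService("seqwrap", [sys.executable, "-c", WRAP_SCRIPT], {"width": "-width"})
result = seqwrap.run("MEKLNIAGGDSLNGTVHISG", width="10")
print(result["stdout"].split())  # ['MEKLNIAGGD', 'SLNGTVHISG']
```

Validating parameter names against the description before launching the process is what lets a generic client (or a service layer in front of it) treat hundreds of tools uniformly.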
Talisman application: using individual services




http://www.ebi.ac.uk/collab/mygrid/service1/talisman/index.html
The annotation pipeline to identify Genes of Interest
[Figure: Annotation Pipeline, “What is known about my candidate gene?”: Medline, EMBL, GO, OMIM, BLAST, Query, DQP]
• Look at contents of work bench
• User notified of new Affy data
• Run a workflow over new Affy data
   – Launch workflow wizard
   – Discover appropriate workflow
   – Enact workflow
   – Monitor workflow
• Look at provenance
• Select and view results
Status and plans
• Reflecting on what we have
   – All the components have an implementation, in various states
     of maturity and functionality; some are already downloadable:
     FreeFluo, Taverna, Soaplab.
   – Field evaluations with Uni. Newcastle Graves’ Disease
     geneticists and GSK with seeded data
• Expanding the user base
   – Use cases in Sleeping Cow
• Each component has plans, e.g.
   – More sophisticated model of provenance and other
     experimental data holdings, to store much more heavily
     linked metadata about provenance that will enable us to
     create views of the mIR along many axes.
   – The myGrid Information Repository to be significantly
     revised.
   – Review & Systematisation of type management
• Migration strategy to OGSA
Summary

• myGrid offers service based middleware
  components
• Open source and freely downloadable
• Open Grid Services Architecture (OGSA)-compliant
• Allows the scientist to be at the centre of the
  Grid through personalisation
• Generic middleware that suits the creation of
  bioinformatics applications
• Inclusion of rich semantics to facilitate the
  scientific process
• Available from http://www.mygrid.org.uk
Our Biology colleagues




    Simon Pearce                Claire Jennings



           Institute of Human Genetics
         School of Clinical Medical Sciences
             University of Newcastle
                         UK
The techy dudes

 Matthew Addis, Nedim Alpdemir, Rich
 Cawley, [Vijay Dialani], Alvaro Fernandes,
 Justin Ferris, Rob Gaizauskas, Kevin Glover,
 Carole Goble (director), Chris Greenhalgh,
 Mark Greenwood, Ananth Krishna, Peter Li,
 Xiaojian Liu, Darren Marvin, Karon Mee,
 Simon Miles, Luc Moreau, Juri Papay,
 Norman Paton, Steve Pettifer, Milena
 Radenkovic, Peter Rice, [Angus Roberts],
 Alan Robinson, Martin Senger, Nick
 Sharman, Paul Watson, Anil Wipat & Chris
 Wroe.

								