The myGrid project: services, architecture and demonstrator
Professor Carole Goble, Chris Wroe, Robert Stevens and the myGrid consortium
http://www.mygrid.org.uk

myGrid
• EPSRC UK e-Science pilot project
• Open Source Upper Middleware for Bioinformatics
• (Web) Service-based architecture -> OGSA Grid services
• Prototype v1 Release Oct 2003, some services available now.
• Targeted at Tool Developers, Bioinformaticians and Service Providers
[Map: Newcastle, Sheffield, Manchester, Nottingham, Hinxton, Southampton]

AHM meta-talk for a meta-paper
• Experiences with e-Science workflow specification and enactment in bioinformatics
• Semantic and Personalised Service Discovery
• The NEReSC Core Grid Middleware
• AMBIT: Acquiring Medical and Biological Information from Text
• myGrid Notification Service
• The myGrid project: services, architecture and demonstrator
• Service-Based Distributed Query Processing on the Grid
• Provenance of e-Science Experiments - experience from Bioinformatics
• Performing in silico Experiments on the Grid: A Users Perspective
• Soaplab – a Unified Sesame Door to Analysis Tools

Data-intensive bioinformatics [source: GlaxoSmithKline]
ID   MURA_BACSU STANDARD; PRT; 429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE 116 116 BINDS PEP (BY SIMILARITY).
FT   CONFLICT 374 374 S -> A (IN REF. 3).
SQ   SEQUENCE 429 AA; 46016 MW; 02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI

Graves disease
Application Drivers
• Autoimmune disease of the thyroid in which the immune system of an individual attacks cells in the thyroid gland, resulting in hyperthyroidism
• Weight loss, trembling, muscle weakness, increased pulse rate, increased sweating and heat intolerance, goitre, exophthalmos

Biology working together with…
• Graves' Disease is caused by the stimulation of the thyrotrophin receptor by thyroid-stimulating autoantibodies secreted by lymphocytes of the immune system.
• What is the molecular basis for this autoimmune response?
[Figure labels: Pituitary Gland; TSH; TSH Receptor; Thyroid Cell; Autoimmune Antibodies attach to TSH receptors, competing with TSH; -ve feedback effect; Thyroid Hormones Released]

What genes might be associated with Graves' Disease?
• Wet-lab biology: 4 patients, 4 controls; extract lymphocyte mRNA; U95A Affy chips; 8 datasets
• Affymetrix data mining tool: Probe IDs, ESTs, Gene ID, NCBI Gene (labels P, M, A)
• What genes are expressed in patient samples but not in controls, and vice versa?
• Affymetrix microarray studies -> Candidate gene pool
Peter Li (1), Claire Jennings (2), Simon Pearce (2) and Anil Wipat (1), (2003). (1) School of Computing Science and (2) Institute of Human Genetics, University of Newcastle-upon-Tyne.

Bioinformatics
The candidate gene pool feeds three activities:
• Annotation Pipeline
  – What is known about my candidate gene?
  – Gene ID -> Query
  – Resources shown: Medline, EMBL, GO, OMIM, BLAST, DQP
• Genotype Assay Design System
  – Select a SNP from candidate gene. Is this SNP associated with Disease?
  – Primer Design: Emboss Eprimer application in SoapLab
  – Use primers designed by myGrid to amplify region flanking SNP on the gene
  – Restriction Fragment Length Polymorphism experiment
  – Selection of restriction enzyme: Emboss Restrict in SoapLab
• 3D Protein Structure
  – What is the structure of the protein product encoded by my candidate gene?
  – Query PDB & display protein structure using Rasmol
  – Obtain information about protein & extract information about active site
  – Determine whether coding SNPs affect the active site of the protein
  – Resources shown: PDB, Interpro, Swiss-Prot, AMBIT, Talisman

Workflows are in silico experiments
http://cvs.mygrid.org.uk/scufl/NucleotideSeqAnnotationPipelineWithGoTerms/
• Experimental orchestration
• Exploratory
• Hypothesis driven
• Not prescriptive
• Methodology free
• Ad hoc
[Figure: Annotation Pipeline. What is known about my candidate gene? Query: Medline, EMBL, GO, OMIM, BLAST, DQP]

Experiment life cycle
• Forming experiments
  – Resource & service discovery
  – Repository creation
  – Workflow creation
  – Database query formation
• Personalisation
  – Personalised registries
  – Personalised workflows
  – Info repository views
  – Personalised annotations
  – Personalised metadata
  – Security
• Discovering and reusing experiments and resources
  – Workflow discovery & refinement
  – Resource & service discovery
  – Repository creation
  – Provenance
• Executing experiments
  – Workflow enactment
  – Distributed Query processing
  – Job execution
  – Provenance generation
  – Single sign-on authentication
  – Event notification
• Providing services & experiments
  – Service registration
  – Workflow deposition
  – Metadata Annotation
  – Third party registration
• Managing experiments
  – Information repository
  – Metadata management
  – Provenance management
  – Workflow evolution
  – Event notification

Bio in silico experiments service types
• Making in silico experiments
  – workflow
  – distributed database query processing.
• Managing experimental outcomes
  – information management
  – managing metadata
• Scientific method tools
  – provenance management
  – change notification
  – personalisation
• Sharing experiments
  – semantic services for discovering services and workflows, and managing metadata
  – third party service registries and federated personalised views over those registries,
  – ontologies and ontology management.
• The base services that will constitute the experiments
  – third party services such as databases, computational analyses, simulations ….
  – specialised services such as AMBIT text extraction.

Investigation = set of experiments + metadata
• Experimental design components
• Experimental instances that are records of enacted experiments
• Experimental glue that groups and links design and instance components
• Life Science IDs
• URIs
• RDF

myGrid in a nutshell
• e: Applications (Bioinformaticians)
  – External Applications: workbench, portal, Talisman, Taverna
• d, c: Core services (Tool Providers)
  – Semantic grid capabilities: knowledge-based technologies, semantic-based service, workflow & data discovery, match making, linking investigation and experiment components.
  – High level services for e-Science experimental management: provenance, change notification, personalisation, investigation holdings management
• b: High level services for data intensive integration
  – workflow & distributed query processing
• a: External services (Service Providers)
  – External Services: AMBIT, SoapLab, EMBOSS…
A “second generation” open service-based Grid project, a test bed for the OGSI, OGSA and OGSA-DAI base services

myGrid Service Stack
• e: Applications (Bioinformaticians)
  – Work bench
  – Taverna workflow environment
  – Web Portal
  – Talisman application
• Gateway
• d, c: Core services (Tool Providers)
  – Service and Workflow Discovery
  – Personalisation
  – Registries
  – Provenance mgt
  – Event Notification
  – Ontologies
  – Ontology Mgt
  – Metadata Mgt
  – myGrid Information Repository
• b:
  – FreeFluo Workflow enactment engine
  – OGSA Distributed Query Processor
• Web Service & Grid communication fabric
• a: External services (Service Providers)
  – Soaplab
  – Bio Services
  – AMBIT Text Extraction Service
  – SRS
  – EMBOSS

Deployment Architecture
[Figure labels: User 1, Workbench, User 2, Enactor, Gateway, Browser, Team Server, Portal, Gateway, SemanticFind, UserProxy, Views, User 1 proxy, User 2 proxy, Enactor, User 1 view, User 2 view, DQPService, Notifications, MIR, BioService6, Organization Servers, Registry, Provider Registry, Service Provider 1, Service Provider 2, Registry, BioService1, BioService2, BioService3, TextService, BioService4, BioService5]

Putting the services together
[Figure labels: Services; syntactic registration; semantic registration; Service Publication; Registry; Registry View; Knowledge Service; Match maker; Find Component; Provenance browser; mIR browser; Service & workflow browser; Notification browser; FreeFluo Workflow enactment engine; Notification Service; User Proxy; Distributed Query Processor; Job Execution Service; AMBIT Information Extraction Service; mIR Service]

A work bench for demonstrating services
[Screenshot labels: myView on the mIR; Workflow; Metadata about workflow; note about workflow; NetBeans]

Notification service
• A new gene with changed expression in Graves’ Disease added to mIR
• User registers interest in notification topics
• Informs the user via a notification client in the workbench that new data has been added to the mIR.
• Notifications presented to the user with a client in the workbench environment.

Semantic discovery – services & workflows
• Services and workflows described using semantic web technologies and ontologies
• Selection by the types of inputs they use, outputs they produce, the bioinformatics tasks they perform…
• DAML+OIL, OWL
• RDF-based UDDI registry
• Multiple & 3rd party registries
• Multiple & 3rd party metadata
[Screenshots: A registry browser; A workflow wizard]

The mIR holds the experimental components
• We need to discover which workflows have been published that can operate on data of this specific semantic type (an Affymetrix probe set identifier)
• Some might be in mIR, some might be in global registry
• mIR holds all experimental components
• Multiple mIRs
• Built on RDBMS & OGSA-DAI
• Plans: Federated architecture, LSIDs and RDF

Create and run a workflow
• If an appropriate workflow does not exist, a new one can be created in the Taverna editor
• Workflow & outputs stored in mIR
• Freefluo workflow enactment engine
• WSFL & Scufl
• Joint development with HGMP and EBI (http://sourceforge.net/projects/taverna)

Provenance logging and reusing
• FreeFluo provides a detailed provenance record stored in the mIR describing what was done, with what services and when
• Can be viewed within the workbench
• XML document

Provenance is not just workflow
• Every mIR object has (Dublin Core) provenance properties
• Derivation paths ~ workflows, queries
• Annotations ~ notes
• Evolution paths ~ workflow -> workflow

Legacy Bio Services publication
• Wrap CORBA, Perl etc to look like web services, to become Grid services (eventually)
• SoapLab
  – A SOAP-based programmatic interface to command-line applications
  – ~300 different classes of services
  – Swiss-Prot, EMBOSS, Medline…
• 3rd parties
  – JEMBOSS, PathPort, bioMoby

Talisman application: using individual services
http://www.ebi.ac.uk/collab/mygrid/service1/talisman/index.html

The annotation pipeline to identify Genes of Interest
• Look at contents of work bench
• User notified of new Affy data
• Run a workflow over new Affy data
  – Launch workflow wizard
  – Discover appropriate workflow
  – Enact workflow
  – Monitor workflow
• Look at provenance
• Select and view results
[Figure: Annotation Pipeline. What is known about my candidate gene? Query: Medline, EMBL, GO, OMIM, BLAST, DQP]

Status and plans
• Reflecting on what we have
  – All the components have an implementation in various states of maturity and functionality, some of which are downloadable already: Freefluo, Taverna, Soaplab.
  – Field evaluations with Uni. Newcastle Graves' Disease geneticists and GSK with seeded data
• Expanding the user base
  – Use cases in Sleeping Cow
• Each component has plans, e.g.
  – More sophisticated model of provenance and other experimental data holdings, to store much more heavily linked metadata about provenance that will enable us to create views of the mIR along many axes.
  – The myGrid Information Repository to be significantly revised.
  – Review & Systematisation of type management
• Migration strategy to OGSA

Summary
• myGrid offers service based middleware components
• Open source and freely downloadable
• Open Grid Service Architecture-compliant
• Allows the scientist to be at the centre of the Grid: Personalisation
• Generic middleware that suits the creation of bioinformatics applications
• Inclusion of rich semantics to facilitate the scientific process
• Available from http://www.mygrid.org.uk

Our Biology colleagues
Simon Pearce, Claire Jennings
Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK

The techy dudes
Matthew Addis, Nedim Alpdemir, Rich Cawley, [Vijay Dialani], Alvaro Fernandes, Justin Ferris, Rob Gaizauskas, Kevin Glover, Carole Goble (director), Chris Greenhalgh, Mark Greenwood, Ananth Krishna, Peter Li, Xiaojian Liu, Darren Marvin, Karon Mee, Simon Miles, Luc Moreau, Juri Papay, Norman Paton, Steve Pettifer, Milena Radenkovic, Peter Rice, [Angus Roberts], Alan Robinson, Martin Senger, Nick Sharman, Paul Watson, Anil Wipat & Chris Wroe.
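
Illustration: semantic discovery by input type
The semantic discovery slides describe selecting workflows by the semantic type of the data they accept (for example an Affymetrix probe set identifier), looking both in the mIR and in a global registry. The following is a minimal sketch of that idea only, not the myGrid implementation: the type hierarchy, the workflow names and the `find_workflows` helper are all hypothetical, and a plain Python dictionary stands in for the DAML+OIL/OWL ontology and the RDF-based registry.

```python
from dataclasses import dataclass

# Hypothetical is-a hierarchy standing in for an ontology of semantic types.
# Each type maps to its parent type; None marks a root.
IS_A = {
    "identifier": None,
    "gene_identifier": "identifier",
    "affymetrix_probe_set_id": "identifier",
    "sequence": None,
    "nucleotide_sequence": "sequence",
}


def is_subtype(candidate, ancestor):
    """Return True if candidate is the same type as ancestor or descends from it."""
    current = candidate
    while current is not None:
        if current == ancestor:
            return True
        current = IS_A.get(current)
    return False


@dataclass(frozen=True)
class WorkflowDescription:
    """Semantic description of a published workflow (illustrative fields only)."""
    name: str
    input_type: str
    output_type: str
    task: str


def find_workflows(registries, data_type):
    """Search every registry for workflows whose input type accepts data_type."""
    matches = []
    for registry in registries:
        for workflow in registry:
            if is_subtype(data_type, workflow.input_type):
                matches.append(workflow)
    return matches


# Invented example entries: one personal repository and one global registry.
personal_mir = [
    WorkflowDescription("probe_to_gene", "affymetrix_probe_set_id",
                        "gene_identifier", "mapping"),
]
global_registry = [
    WorkflowDescription("annotate_sequence", "nucleotide_sequence",
                        "annotation_report", "annotation"),
    WorkflowDescription("generic_id_lookup", "identifier",
                        "database_record", "retrieval"),
]

found = find_workflows([personal_mir, global_registry], "affymetrix_probe_set_id")
names = [workflow.name for workflow in found]
print(names)
```

Matching on the type hierarchy rather than on exact names is what lets a workflow registered for a general type (here `identifier`) be offered for more specific data; a real registry would also filter on outputs and on the bioinformatics task performed.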