					Semantic Mediation in myGrid

          Chris Wroe
      Manchester University
• UK e-Science Pilot Project.
• Oct 2001 – April 2005.
• £3.4 million.
• £0.4 million studentships.
• Sites: Newcastle, Sheffield, Manchester, Nottingham, Hinxton, Southampton.
                Data-intensive bioinformatics




ID   MURA_BACSU      STANDARD;     PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE    116     116      BINDS PEP (BY SIMILARITY).
FT   CONFLICT    374     374      S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA; 46016 MW; 02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
                Service Stack

Audiences: Bioinformaticians, Tool Providers, Service Providers

Applications
  – Work bench, Taverna, Web Portal, Talisman
  – Gateway
Core services
  – Service and Workflow Discovery, Registries
  – Personalisation, Provenance, Event Notification
  – Ontologies, Ontology Mgt, Views, Metadata Mgt
  – myGrid Information Repository
  – FreeFluo Workflow Enactment Engine
  – OGSA-DQP Distributed Query Processor
External services
  – Web Service (Grid Service) communication fabric
  – SoapLab (legacy apps), GowLab (legacy apps)
  – Native Web Services
  – AMBIT Text Extraction Service
Workflow approach




         Graves’ disease
Workflow approach II
                    Issues
• Connecting web services together
  – Shim services
• Connecting data to web services
  – Data provenance delivered by LSIDs
• Connecting data to data
  – Distributed Query Processing
              Technology
– Resource Description Framework
  • Representing metadata about data and services
– Web Ontology Language (OWL)
  • Representing concepts and classifications
    myGrid & Bioinformatics world
•   Automating mainstream, well known tasks
•   Well known mature data formats
•   Often no formal description of formats
•   Lots of code to manipulate formats already
    exists (BioPerl, BioJava …)

• Semantic mediation: work in progress.
Williams-Beuren Syndrome Workflow
   [Workflow diagram: explore gap regions within the W-B Critical Region.
    Labels: Main Bioinformatics Services, Main Bioinformatics Applications, SHIM Services.]
        Williams Example (simple)
          Genbank                                Genscan
      retrieval service                   Gene prediction service




Semantic level
  Genbank record                       genomic sequence in
           has_part genomic sequence
Syntactic level
           Genbank record              FASTA sequence
                             Sample Genbank Record
LOCUS       AY214156                1065 bp    mRNA    linear   VRT 07-MAY-2004
DEFINITION  Oncorhynchus nerka RH1 opsin mRNA, complete cds.
ACCESSION   AY214156
VERSION     AY214156.1  GI:37787241
KEYWORDS    .
SOURCE      Oncorhynchus nerka (sockeye salmon)
  ORGANISM  Oncorhynchus nerka
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Actinopterygii; Neopterygii; Teleostei; Euteleostei;
            Protacanthopterygii; Salmoniformes; Salmonidae; Oncorhynchus.
REFERENCE   1  (bases 1 to 1065)
  AUTHORS   Dann,S.G., Allison,W.T., Levin,D.B., Taylor,J.S. and Hawryshyn,C.W.
  TITLE     Salmonid opsin sequences undergo positive selection and indicate an
            alternate evolutionary relationship in oncorhynchus
  JOURNAL   J. Mol. Evol. 58 (4), 400-412 (2004)
   PUBMED   15114419
REFERENCE   2  (bases 1 to 1065)
  AUTHORS   Dann,S.G., William,A.E., David,L.B. and Craig,H.W.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-JAN-2003) Biology, University of Victoria, PO Box
            3020 Stn CSC, Victoria, British Columbia V8W 3N5, Canada
FEATURES             Location/Qualifiers
     source          1..1065
                     /organism="Oncorhynchus nerka"
                     /mol_type="mRNA"
                     /db_xref="taxon:8023"
     CDS             1..1065
                     /codon_start=1
                     /product="RH1 opsin"
                     /protein_id="AAP58347.1"
                     /db_xref="GI:37787242"
                     /translation="MNGTEGPDFYVPMSNATGIVRNPYEYPQYYLVSPAAYSLMAAYM
                     FFLILTGFPINFLTLYVTIEHKKLRTALNYILLNLAVADLFMVIGGFTTTMYTSMHGY
                     FVFGRTGCNIEGFCATHGGEIALWSLVVLAIERWLVVCKPISNFRFSETHAIIGVAFT
                     WVMAAACSVPPLLGWSRYIPEGMQCSCGIDYYTRAPDINNESFVIHMFVVHFMIPLFI
                     ISFCYGNLLCAVKAAAAAQQESETTQRAEREVTRMVIMMVVSFLVCWVPYASVAWYIF
                     CNQGTEFGPVFMTIPAFFAKSSSLYNPLIYVLMNKQFRNCMITTLCCGKNPFEEEEGA
                     STTASKTEASSVSSSSVAPA"
ORIGIN
        1 atgaacggca cagagggacc agatttctac gtccctatgt ccaatgctac tggcattgtt
       61 aggaacccct atgaataccc ccagtactac cttgtcagcc cagcggcgta ctcactcatg
      121 gctgcctaca tgttcttcct catcctcacc ggcttcccca tcaacttcct cacactctat
      181 gtcaccatcg agcacaaaaa gctgaggacc gccctgaact acatcctgct gaacctggct
      241 gtggccgatc tcttcatggt aatcggaggc ttcaccacta cgatgtacac ctccatgcat
      301 ggctatttcg tctttggaag aacgggctgc aacatcgagg gattctgtgc tacccatggt
      361 ggtgagattg ccctatggtc cctggttgtc ctggctattg agaggtggtt ggtcgtctgc
      421 aaacctatta gcaacttccg cttcagtgag acccatgcca tcataggcgt ggcctttacc
      481 tgggtcatgg ctgctgcttg ctccgtcccc cctctgcttg ggtggtcccg ctatatcccc
      541 gaaggcatgc agtgctcatg tggaattgac tactacacgc gcgcccctga catcaacaat
      601 gagtcctttg tcatccacat gttcgttgtc cactttatga ttcccctgtt catcatctcc
      661 ttctgctacg gcaacctgct ctgcgctgtc aaggcagctg ccgccgccca gcaggagtct
      721 gagaccaccc agagggctga gagggaagtg acccgcatgg tcatcatgat ggtcgtctcc
      781 ttcctagtgt gctgggtgcc ctacgccagc gtggcctggt atatcttctg caaccaggga
      841 acagagttcg gccccgtctt catgacaatt ccggcattct ttgccaagag ttcgtccctg
      901 tacaaccctc tcatctacgt gttgatgaac aagcagttcc gcaactgcat gatcaccacc
      961 ctgtgctgtg ggaagaaccc cttcgaggag gaggagggag cctccaccac tgcctccaag
     1021 accgaggcct cctccgtgtc ctccagctcc gtggctcctg cataa
//
                              FASTA

>gi|37787241|gb|AY214156.1| Oncorhynchus nerka RH1 opsin mRNA, complete cds
ATGAACGGCACAGAGGGACCAGATTTCTACGTCCCTATGTCCAATGCTACTGGCATTGTTAGGAACCCCT
ATGAATACCCCCAGTACTACCTTGTCAGCCCAGCGGCGTACTCACTCATGGCTGCCTACATGTTCTTCCT
CATCCTCACCGGCTTCCCCATCAACTTCCTCACACTCTATGTCACCATCGAGCACAAAAAGCTGAGGACC
GCCCTGAACTACATCCTGCTGAACCTGGCTGTGGCCGATCTCTTCATGGTAATCGGAGGCTTCACCACTA
CGATGTACACCTCCATGCATGGCTATTTCGTCTTTGGAAGAACGGGCTGCAACATCGAGGGATTCTGTGC
TACCCATGGTGGTGAGATTGCCCTATGGTCCCTGGTTGTCCTGGCTATTGAGAGGTGGTTGGTCGTCTGC
AAACCTATTAGCAACTTCCGCTTCAGTGAGACCCATGCCATCATAGGCGTGGCCTTTACCTGGGTCATGG
CTGCTGCTTGCTCCGTCCCCCCTCTGCTTGGGTGGTCCCGCTATATCCCCGAAGGCATGCAGTGCTCATG
TGGAATTGACTACTACACGCGCGCCCCTGACATCAACAATGAGTCCTTTGTCATCCACATGTTCGTTGTC
CACTTTATGATTCCCCTGTTCATCATCTCCTTCTGCTACGGCAACCTGCTCTGCGCTGTCAAGGCAGCTG
CCGCCGCCCAGCAGGAGTCTGAGACCACCCAGAGGGCTGAGAGGGAAGTGACCCGCATGGTCATCATGAT
GGTCGTCTCCTTCCTAGTGTGCTGGGTGCCCTACGCCAGCGTGGCCTGGTATATCTTCTGCAACCAGGGA
ACAGAGTTCGGCCCCGTCTTCATGACAATTCCGGCATTCTTTGCCAAGAGTTCGTCCCTGTACAACCCTC
TCATCTACGTGTTGATGAACAAGCAGTTCCGCAACTGCATGATCACCACCCTGTGCTGTGGGAAGAACCC
CTTCGAGGAGGAGGAGGGAGCCTCCACCACTGCCTCCAAGACCGAGGCCTCCTCCGTGTCCTCCAGCTCC
GTGGCTCCTGCATAA
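
The GenBank record and the FASTA entry above carry the same sequence in two syntaxes, which is the gap a translator shim closes. The following is a minimal sketch of such a conversion, assuming a single well-formed record. The function name `genbank_to_fasta` is hypothetical, the header it writes (version plus definition) is a simplification of the NCBI-style header shown above, and it is not how any particular existing service does the job.

```python
import re


def genbank_to_fasta(record, width=70):
    """Convert one GenBank flat-file record to a FASTA entry (illustrative only)."""
    identifier = ""
    definition = []
    residues = []
    section = None
    for line in record.splitlines():
        if line.startswith("//"):            # end-of-record marker
            break
        if line[:1].strip():                 # top-level keywords start in column 1
            section, _, rest = line.partition(" ")
            rest = rest.strip()
        else:                                # indented line: continuation of the section
            rest = line.strip()
        if not rest:
            continue
        if section == "VERSION" and not identifier:
            identifier = rest.split()[0]
        elif section == "DEFINITION":
            definition.append(rest)
        elif section == "ORIGIN":
            # Drop the position numbers and spaces, keep only the residues.
            residues.append(re.sub(r"[^A-Za-z]", "", rest))
    sequence = "".join(residues).upper()
    header = ">" + " ".join([identifier] + definition).strip()
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + body) + "\n"
```

The default width of 70 matches the line length of the FASTA entry shown above.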
        Williams Example (simple)
          Genbank                                    Genscan
      retrieval service                       Gene prediction service




                                  Genbank service

                           EMBOSS seqret service
Semantic level
 Genbank record                            genomic sequence in
          has_part genomic sequence
Syntactic level
           Genbank record                   FASTA sequence
                   Graves disease

      Array Express                   Gene clustering service




Semantic level
           Microarray expression   Microarray expression
           data out                data in
Syntactic level
            Affymetrix CEL file    Treeview format
                                  Example data
        CEL format
CellHeader=X   Y   MEAN     STDV     NPIXELS
 0        0        112.0    24.4     25
 1        0        10699.0  1340.6   20
 2        0        147.0    42.4     25
 3        0        10602.0  2126.2   25
 4        0        100.8    29.9     20
 5        0        96.0     11.9     25
 6        0        9829.0   1983.4   25
 7        0        133.3    21.6     20
 8        0        9092.0   1470.7   25

        Template
Cell header    Probe ID
2 0            1000_at
5 0            1001_at
2 3            1002_at

        Treeview format
Probe_Id    Sample1236
1000_at     147
1001_at     96
1002_at     -59
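
The example data suggest how the three pieces relate: the template maps a cell position (X, Y) to a probe ID, and the Treeview row for that probe carries a value for the cell. A minimal sketch of that mapping follows. It assumes the reported value is the raw MEAN column, which fits the two probes whose cells are visible above (1000_at and 1001_at) but is not stated on the slide; the cell for 1002_at is not in the excerpt. The function name and the in-memory template dictionary are hypothetical.

```python
def cel_to_treeview(cel_text, template, sample_name):
    """Build a two-column, tab-separated table from CEL rows and a cell-to-probe template."""
    means = {}
    for line in cel_text.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not an X Y MEAN STDV NPIXELS row.
        if len(fields) != 5 or line.startswith("CellHeader"):
            continue
        x, y, mean = int(fields[0]), int(fields[1]), float(fields[2])
        means[(x, y)] = mean
    rows = ["Probe_Id\t" + sample_name]
    for cell, probe_id in template.items():
        if cell in means:  # probes whose cell is missing from the input are omitted
            rows.append(f"{probe_id}\t{means[cell]:g}")
    return "\n".join(rows)


cel = """CellHeader=X   Y   MEAN     STDV     NPIXELS
 2        0        147.0 42.4      25
 5        0        96.0    11.9    25
"""
template = {(2, 0): "1000_at", (5, 0): "1001_at"}
table = cel_to_treeview(cel, template, "Sample1236")
```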
                   Graves disease

      Array Express                                Gene clustering service



                                   Template file


                              AffyR service
Semantic level
           Microarray expression             Microarray expression
           data out                          data in
Syntactic level
            Affymetrix CEL file              Treeview format
          Classification of shims
Shim service
   Defn: experimentally neutral service used to connect
   domain services that don’t quite fit
   FILTER
       MAPPER
       DEREFERENCER
       TRANSLATOR
              syntax (e.g. GenBank to EMBL)
              data (e.g. DNA to protein)
   TRANSFORMER
       SIFTER (SQL SELECT type operation)
       PARSER (SQL PROJECT type operation),
              also known as SPLITTER or DECOMPOSER
   COMPARER
   SORTER
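
The SIFTER and PARSER entries are described by analogy with SQL selection and projection. A small sketch of that analogy, under the assumption that data items can be treated as records with named fields: the function names `sift` and `parse_fields`, the field names and the score threshold are all illustrative, and the values are borrowed from the BLAST report shown later in the deck.

```python
def sift(records, predicate):
    """SIFTER: keep whole records that satisfy a condition (row selection)."""
    return [record for record in records if predicate(record)]


def parse_fields(records, fields):
    """PARSER: keep only the named parts of each record (column projection)."""
    return [{name: record[name] for name in fields} for record in records]


hits = [
    {"accession": "AC005089.3", "score": 831.0},
    {"accession": "AC073846.6", "score": 815.0},
    {"accession": "AL365366.20", "score": 46.1},
]

strong_hits = sift(hits, lambda hit: hit["score"] > 100)   # threshold is arbitrary
accessions = parse_fields(strong_hits, ["accession"])
```

Neither function changes the scientific content of the data, which is what makes a shim "experimentally neutral" in the sense of the definition above.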
       Providing more assistance
   [Figure: a cycle of three steps linking the Taverna workbench and Pedro:
    1. Register, 2. Annotate, 3. Query.]
      myGrid’s model of services
service
   name, description
   author
   organisation

operation
   name, description
   input
   output
   task
   method
   resource
   application

parameter
   name, description
   semantic type
   format
   transport type
   collection type
   collection format

Kinds shown in the diagram: workflow, WSDL operation, Soaplab service,
bioMoby service; WSDL service.
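
One way to read the model is as three record types. The sketch below uses Python dataclasses with the attribute names listed on the slide; the class layout, the choice to treat input and output as lists of parameters, and the optional typing are assumptions for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    name: str
    description: str = ""
    semantic_type: Optional[str] = None
    format: Optional[str] = None
    transport_type: Optional[str] = None
    collection_type: Optional[str] = None
    collection_format: Optional[str] = None


@dataclass
class Operation:
    name: str
    description: str = ""
    inputs: List[Parameter] = field(default_factory=list)   # assumed: inputs are parameters
    outputs: List[Parameter] = field(default_factory=list)  # assumed: outputs are parameters
    task: Optional[str] = None
    method: Optional[str] = None
    resource: Optional[str] = None
    application: Optional[str] = None


@dataclass
class Service:
    name: str
    description: str = ""
    author: Optional[str] = None
    organisation: Optional[str] = None
    operations: List[Operation] = field(default_factory=list)


# The service name is made up; the other values appear in the XML example in this deck.
example = Service(
    name="example alignment service",
    organisation="http://genetics.man.ac.uk",
    operations=[Operation(
        name="execute",
        task="http://www.mygrid.org.uk/ontology#pairwise_local_aligning",
    )],
)
```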
          Service Description Flow
   [Diagram components: Pedro; XML document describing service; Discovery Client;
    Registry (Jena RDF repository); Semantic Indexing Component; Instance Store;
    FACT DL reasoner.
    Arrow label: extract service descriptions to reason over.]
Pedro




 <serviceDescription>                                            XML
    <organisation>http://genetics.man.ac.uk</organisation>
    <operation>
       <name>execute</name>
       <task>http://www.mygrid.org.uk/ontology#pairwise_local_aligning</task>
   …..
                                           RDF
   Graph shown on the slide:
        a1234  type          #service
        a1234  published_by  http://genetics.man.ac.uk
        a1234  hasOperation  a2
        a2     type          #operation
        a2     name          “execute”
        a2     task          a3
        a3     (edge label not shown)  #local_pairwise_aligning
        #local_pairwise_aligning  subclass  #aligning

   Queries possible within RDF repository:
               Find me an operation called “exec*”
               Find me a service provided by groups working on Williams disease
               Find me an operation which performs aligning?
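
Queries of the first kind are pattern matches over triples. The sketch below stores the slide's graph as plain Python tuples and answers "find me an operation called exec*". It is a toy in-memory stand-in, not the Jena API, and the `type` label on the edge from a3 is an assumption because that label is not legible in the slide.

```python
from fnmatch import fnmatchcase

TRIPLES = {
    ("a1234", "type", "#service"),
    ("a1234", "published_by", "http://genetics.man.ac.uk"),
    ("a1234", "hasOperation", "a2"),
    ("a2", "type", "#operation"),
    ("a2", "name", "execute"),
    ("a2", "task", "a3"),
    ("a3", "type", "#local_pairwise_aligning"),            # edge label assumed
    ("#local_pairwise_aligning", "subclass", "#aligning"),
}


def match(s=None, p=None, o=None):
    """Return the triples that match the given terms; None acts as a wildcard."""
    return [t for t in TRIPLES
            if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])]


def operations_named(pattern):
    """Find resources typed #operation whose name matches a shell-style pattern."""
    operations = {s for s, _, _ in match(p="type", o="#operation")}
    return sorted(s for s, _, name in match(p="name")
                  if s in operations and fnmatchcase(name, pattern))


found = operations_named("exec*")
```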
                                          RDF
   [Same RDF graph as on the previous slide.]


   Queries not possible:
           Find me an operation which performs aligning which is local?
           Where does this service fit into a classification?
                     OWL classes
   Most specific class expression extracted:
       #service
           OWL property restriction: hasOperation
               #operation
                   OWL property restriction: performsTask
                       #local_pairwise_aligning

   Definition: Service which has an operation which
   performs the task local pairwise aligning
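
In description-logic notation, the definition above can be written as nested existential restrictions. This is one possible rendering; the concept names are transliterated from the slide's labels, and whether the expression is asserted as a necessary or a necessary-and-sufficient condition is not stated in the source.

```latex
\[
\mathit{Service} \;\sqcap\; \exists\, \mathit{hasOperation}.\bigl(
  \mathit{Operation} \;\sqcap\; \exists\, \mathit{performsTask}.\mathit{LocalPairwiseAligning}
\bigr)
\]
```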
                            OWL classes
   Each service class has its own property based OWL definition.

                  service
                       aligning service
                            local aligning service
                                 pairwise local aligning service
                                      a1234

   Classification calculated by the FACT reasoner using property based definitions.
   Instance store indexes our service instance in the appropriate place.
          Query by navigation
Service
browser

                         Service classified by
                                 task
           Use of ontologies
• Property based classification requires
  property based modelling
• Advantages
  – Explicit, machine interpretable, easier to
    maintain large ontologies with polyhierarchies
• Disadvantages
  – Complex definitions take time and skill to author,
    require expert domain knowledge
  – Difficult to present back to the user
       Property based classification on steroids


    Data


nucleic acid sequence data
    RNA sequence data
    DNA sequence data
       Property based classification on steroids


    Data                        Feature


nucleic acid sequence data   nucleic acid sequence
    RNA sequence data            RNA sequence
    DNA sequence data            DNA sequence


                     encodes
       Property based classification on steroids


    Data                        Feature                   Biological Concept


nucleic acid sequence data   nucleic acid sequence   nucleic acid

    RNA sequence data            RNA sequence             RNA

    DNA sequence data            DNA sequence             DNA



                     encodes                sequence_of
       Property based classification on steroids


    Data                          Feature                  Biological Concept
    nucleic acid sequence data    nucleic acid sequence    nucleic acid    nucleotide
        RNA sequence data             RNA sequence             RNA             ribonucleotide
        DNA sequence data             DNA sequence             DNA             deoxyribonucleotide

    Properties linking adjacent columns, left to right: encodes, sequence_of, polymer_of
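
The point of the build-up is that only the rightmost hierarchy needs to be asserted by hand; the others follow from the property based definitions. The sketch below illustrates that idea with a deliberately tiny subsumption check. It assumes each class is defined by a single existential restriction read left to right (for example, DNA sequence data encodes some DNA sequence), and it is a toy, not a description-logic reasoner such as FaCT.

```python
# Asserted ("told") hierarchy: only the primitive concepts.
TOLD_PARENT = {
    "ribonucleotide": "nucleotide",
    "deoxyribonucleotide": "nucleotide",
}

# Property based definitions: class -> (property, filler class).
DEFINITIONS = {
    "nucleic acid": ("polymer_of", "nucleotide"),
    "RNA": ("polymer_of", "ribonucleotide"),
    "DNA": ("polymer_of", "deoxyribonucleotide"),
    "nucleic acid sequence": ("sequence_of", "nucleic acid"),
    "RNA sequence": ("sequence_of", "RNA"),
    "DNA sequence": ("sequence_of", "DNA"),
    "nucleic acid sequence data": ("encodes", "nucleic acid sequence"),
    "RNA sequence data": ("encodes", "RNA sequence"),
    "DNA sequence data": ("encodes", "DNA sequence"),
}


def subsumes(general, specific):
    """Return True if `general` is the same as, or an ancestor of, `specific`."""
    if general == specific:
        return True
    if general in DEFINITIONS and specific in DEFINITIONS:
        general_property, general_filler = DEFINITIONS[general]
        specific_property, specific_filler = DEFINITIONS[specific]
        # Same property: the question reduces to comparing the fillers.
        return (general_property == specific_property
                and subsumes(general_filler, specific_filler))
    parent = TOLD_PARENT.get(specific)
    return parent is not None and subsumes(general, parent)
```

With these definitions, `subsumes("nucleic acid sequence data", "DNA sequence data")` holds even though no link between those two classes was asserted directly.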
Human readable ontologies
   [Diagram components: GROWL parser; GROWL renderer; OWL API; Reasoner.]
          Only data to hand
• Metadata associated with data items.
• Life science identifier (LSID) protocol used
  to retrieve metadata.
• Metadata model similar to service
  parameter

Data item
   name, description
   semantic type
   format
   collection type
   collection format
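
An LSID is a URN of the form `urn:lsid:authority:namespace:object`, with an optional trailing revision. The sketch below only splits an identifier into those parts; it does not perform the metadata retrieval that the LSID protocol provides. The names `LSID` and `parse_lsid` are illustrative, and the sample identifier is the one that appears in the provenance slides of this deck.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text):
    """Split an LSID into its components, raising ValueError if it is malformed."""
    parts = text.split(":")
    has_prefix = [p.lower() for p in parts[:2]] == ["urn", "lsid"]
    if len(parts) not in (5, 6) or not has_prefix or not all(parts[2:5]):
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


item = parse_lsid("urn:lsid:taverna:datathing:13")
```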
Provenance (1)

Organisation level provenance
   Entities: Organisation, Person, Project, Experiment design, Workflow design
   Relations: partOf, instanceOf, run for

Process level provenance
   Entities: Workflow run, Process, Service, Event
   runBy: Service, e.g. BLAST @ NCBI
   componentProcess: e.g. web service invocation of BLAST @ NCBI
   componentEvent: e.g. completion of a web service invocation at 12.04pm

Data/knowledge level provenance
   Entities: Data item
   knowledge statements, e.g. similar protein sequence to
   data derivation, e.g. output data derived from input data

User can add templates to each workflow process to determine links between
data items.
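
Data derivation records of the kind described above ("output data derived from input data") can be followed backwards to find everything a result depends on. A minimal sketch, assuming each process run is logged with the identifiers of its inputs and outputs: the `ProcessRun` record, the service names and the LSIDs under `example.org` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProcessRun:
    service: str
    inputs: List[str]
    outputs: List[str]


def derived_from(data_item, runs):
    """Return every data item that `data_item` was directly or indirectly derived from."""
    ancestors = set()
    frontier = [data_item]
    while frontier:
        current = frontier.pop()
        for run in runs:
            if current in run.outputs:
                for source in run.inputs:
                    if source not in ancestors:   # also guards against cycles
                        ancestors.add(source)
                        frontier.append(source)
    return ancestors


runs = [
    ProcessRun("masking service", ["urn:lsid:example.org:data:1"],
               ["urn:lsid:example.org:data:2"]),
    ProcessRun("similarity search", ["urn:lsid:example.org:data:2"],
               ["urn:lsid:example.org:data:3"]),
]
sources = derived_from("urn:lsid:example.org:data:3", runs)
```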
Provenance tracking

[Graph figure. Nodes: project, organisation, experiment definition, group,
 workflow definition, person, workflow invocation, service description,
 service invocation.
 Edge labels: ..part_of, ..author, ..works_for, ..invocation_of, ..run_for,
 ..run_during, ..described_by, ..created_by, rdf:type.
 Data-level labels: ..masked_sequence_of, ..nucleotide_sequence,
 ..similar_sequences_to, ..BLAST_Report.]

urn:lsid:taverna:datathing:13
          >gi|19747251|gb|AC005089.3| Homo sapiens BAC clone CTA-315H11 from 7,
          complete sequence
          AAGCTTTTCTGGCACTGTTTCCTTCTT
          CCTGATAACCAGAGAAGGAAAAGATC
          TCCATTTTACAGATGAG
          GAAACAGGCTCAGAGAGGTCAAGGCT
          CTGGCTCAAGGTCACACAGCCTGGGA
          ACGGCAAAGCTGATATTC
          AAACCCAAGCATCTTGGCTCCAAAGC
          CCTGGTTTCTGTTCCCACTACTGTCAG
          TGACCTTGGCAAGCCCT
          GTCCTCCTCCGGGCTTCACTCTGCAC
          ACCTGTAACCTGGGGTTAAATGGGCT
          CACCTGGACTGTTGAGCG

urn:lsid:taverna:datathing:15
GI          Accession      Score   Description
19747251    AC005089.3     831     Homo sapiens BAC clone CTA-315H11 from 7, complete sequence
15145617    AC073846.6     815     Homo sapiens BAC clone RP11-622P13 from 7, complete sequence
15384807    AL365366.20    46.1    Human DNA sequence from clone RP11-553N16 on chromosome 1, complete sequence
7717376     AL163282.2     44.1    Homo sapiens chromosome 21 segment HS21C082
16304790    AL133523.5     44.1    Human chromosome 14 DNA sequence BAC R-775G15 of library RPCI-11 from chromosome 14 of Homo sapiens (Human), complete sequence
34367431    BX648272.1     44.1    Homo sapiens mRNA; cDNA DKFZp686G08119 (from clone DKFZp686G08119)
5629923     AC007298.17    44.1    Homo sapiens 12q22 BAC RPCI11-256L6 (Roswell Park Cancer Institute Human BAC Library) complete sequence
34533695    AK126986.1     44.1    Homo sapiens cDNA FLJ45040 fis, clone BRAWH3020486
20377057    AC069363.10    44.1    Homo sapiens chromosome 17, clone RP11-104J23, complete sequence
4191263     AL031674.1     44.1    Human DNA sequence from clone RP4-715N11 on chromosome 20q13.1-13.2 Contains two putative novel genes, ESTs, STSs and GSSs, complete sequence
17977487    AC093690.5     44.1    Homo sapiens BAC clone RP11-731I19 from 2, complete sequence
17048246    AC012568.7     44.1    Homo sapiens chromosome 15, clone RP11-342M21, complete sequence
14485328    AL355339.7     44.1    Human DNA sequence from clone RP11-461K13 on chromosome 10, complete sequence
                             5757554                                                                                                                      AC007074.2
                                                                                                                                                          44.1
                                                                                                                                                          Homo sapiens PAC
                             clone RP3-368G6 from X, complete sequence
                             4176355                                                                                                                      AC005509.1
                                                                                                                                                          44.1
                                                                                                                                                          Homo sapiens
                             chromosome 4 clone B200N5 map 4q25, complete sequence
                             2829108                                                                                                                      AF042090.1
                                                                                                                                                          44.1
                                                                                                                                                          Homo sapiens
                             chromosome 21q22.3 PAC 171F15, complete sequence




                                                      ..filtered_version_of


 Relationship BLAST report has with
                A                                                                                                                                                                                   Other classes of information related
                                                                                                                                                                                                              B
 other items in the repository                                                                                                                                                                      to BLAST report
Using IBM’s Haystack
[Figure: Haystack screenshots showing a GenBank record, a portion of the web of provenance, and management of a collection of sequences for review.]
                        Storage
• LSID has no protocol for storage
• Taverna/Freefluo implements its own
  data/metadata storage protocol
[Diagram: Taverna/Freefluo sends data and metadata through the Publish interface; data goes to the Data store and metadata to the Metadata Store.]
                Retrieval
• LSID protocol used to retrieve data and
  metadata
• Query handled separately
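[Diagram: an RDF-aware client reaches the Metadata Store through the Query interface; an LSID-aware client reaches the Metadata Store and the Data store through the LSID interface.]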
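The split between storage and retrieval can be sketched as follows. This is a minimal illustration, not the myGrid implementation: the `Repository` class, its `publish` and `query` methods, and the `example.org` identifiers are all hypothetical. Only the LSID layout `urn:lsid:authority:namespace:object[:revision]` and the idea of separate data and metadata retrieval come from the LSID scheme itself.

```python
from dataclasses import dataclass, field


def parse_lsid(lsid: str) -> dict:
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not a valid LSID: {lsid!r}")
    return {
        "authority": parts[2],
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) == 6 else None,
    }


@dataclass
class Repository:
    """Hypothetical stand-in for the separate data store and metadata store."""
    data_store: dict = field(default_factory=dict)
    metadata_store: dict = field(default_factory=dict)

    def publish(self, lsid: str, data: bytes, metadata: list) -> None:
        # Storage path: LSID defines no storage protocol, so this interface is invented here.
        parse_lsid(lsid)  # reject malformed identifiers
        self.data_store[lsid] = data
        self.metadata_store[lsid] = list(metadata)

    def get_data(self, lsid: str) -> bytes:
        # Retrieval path: data looked up by LSID.
        return self.data_store[lsid]

    def get_metadata(self, lsid: str) -> list:
        # Retrieval path: metadata looked up by LSID, as (predicate, object) pairs.
        return self.metadata_store[lsid]

    def query(self, predicate: str, obj: str) -> list:
        # Query is handled separately from LSID retrieval: it searches the metadata store.
        return sorted(
            lsid for lsid, pairs in self.metadata_store.items() if (predicate, obj) in pairs
        )


repo = Repository()
report = "urn:lsid:example.org:blastreport:1"      # hypothetical identifiers
filtered = "urn:lsid:example.org:blastreport:2"
repo.publish(report, b"raw BLAST output", [("created_by", "workflow-run-1")])
repo.publish(filtered, b"filtered BLAST output", [("filtered_version_of", report)])
print(repo.query("filtered_version_of", report))
```

An LSID-aware client would only need `get_data` and `get_metadata`; an RDF-aware client would go through `query`.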
        Queries within Workflows

             Semantic content of result depends
             on query and data source schema

         query                               query result
                         Grid Data Service


SELECT GO_ID
FROM GO                                           Gene Ontology
WHERE GO.term LIKE 'enzyme activity';             term ID

SELECT GO_Annotation_ID                           protein ID
FROM GOA
WHERE GO.term LIKE 'enzyme activity';
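One way to make that dependency explicit is to look up the semantic type of a result from the selected column and the data source. The sketch below is only an illustration: the `SEMANTIC_TYPES` table, the type names, and the `result_type` helper are hypothetical, and the regular expression handles only single-column, single-table queries like the two above.

```python
import re

# Hypothetical mapping from (data source, selected column) to a semantic type.
# The two entries mirror the annotations beside the two queries above.
SEMANTIC_TYPES = {
    ("GO", "GO_ID"): "gene_ontology_term_id",
    ("GOA", "GO_Annotation_ID"): "protein_id",
}

# Matches "SELECT <column> FROM <table>" across line breaks, ignoring case.
_SELECT_FROM = re.compile(r"select\s+(\w+)\s+from\s+(\w+)", re.IGNORECASE)


def result_type(sql: str) -> str:
    """Return the semantic type of a simple one-column query result."""
    match = _SELECT_FROM.search(sql)
    if match is None:
        raise ValueError("only simple SELECT <column> FROM <table> queries are supported")
    column, table = match.groups()
    return SEMANTIC_TYPES[(table, column)]


query_1 = "SELECT GO_ID\nFROM GO\nWHERE GO.term LIKE 'enzyme activity';"
query_2 = "SELECT GO_Annotation_ID\nFROM GOA\nWHERE GO.term LIKE 'enzyme activity';"
print(result_type(query_1))
print(result_type(query_2))
```

A workflow could use such a lookup to decide which downstream services can accept the output of a Grid Data Service.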
  Distributed Query Processing
• DQP linked with the OGSA-DAI activity
• Built within myGrid project
• Plans execution of a query over multiple
  Grid Data Services
• Each Grid Data Service provides schema
  metadata
• Currently no semantic mediation
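The planning step can be pictured as matching the tables named in a query against the schema metadata each Grid Data Service advertises. The following is a rough sketch under stated assumptions, not the DQP algorithm: the `plan` function, the service names, and the column lists are all hypothetical.

```python
def plan(tables, services):
    """Assign each table to the first service whose schema metadata lists it.

    `services` maps a service name to its advertised schema: a dict of
    table name -> list of column names. Matching is purely by name,
    which is exactly the "no semantic mediation" situation.
    """
    assignment = {}
    for table in tables:
        for service_name, schema in services.items():
            if table in schema:
                assignment[table] = service_name
                break
        else:
            raise LookupError(f"no Grid Data Service exposes table {table!r}")
    return assignment


# Hypothetical schema metadata published by two Grid Data Services.
services = {
    "go_gds": {"proteinTerm": ["proteinId", "termId"]},
    "swissprot_gds": {"protein": ["proteinId", "sequence"]},
}
print(plan(["protein", "proteinTerm"], services))
```

Because the match is on table names alone, two services that describe the same kind of data under different names would not be recognised as equivalent.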
                    Example query                         DQP Plan
• select p.proteinId, blast(p.sequence)
  from p in protein, t in proteinTerm
  where t.termId = 'GO:0008372' and
  p.proteinId = t.proteinId
• “Select proteins and homologous proteins from
  SWISS-PROT which have been annotated with
  GO:0008372”
[Diagram: the join t.proteinId = p.proteinId. Both sides hold data encoding the identity of a protein in the SWISS-PROT namespace; t comes from the Gene Ontology database and p from the SWISS-PROT protein database.]
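The example query can be mimicked over in-memory rows to show why the join only makes sense when both proteinId columns share a namespace. This is a sketch with made-up rows: the identifiers, sequences, and the `blast` placeholder are hypothetical, and the join is written as a semi-join, so a protein annotated twice with the same term appears once.

```python
def blast(sequence: str) -> str:
    """Placeholder for the BLAST service call; a real call would return homologous proteins."""
    return f"<blast hits for {sequence}>"


def run_query(protein, protein_term, term_id):
    """Return (proteinId, blast(sequence)) for proteins annotated with term_id."""
    # Rows from the Gene Ontology side, restricted to the requested term.
    wanted = {t["proteinId"] for t in protein_term if t["termId"] == term_id}
    # Join on proteinId: valid only because both sides use the same identifier namespace.
    return [(p["proteinId"], blast(p["sequence"])) for p in protein if p["proteinId"] in wanted]


# Hypothetical rows standing in for the two databases.
protein = [
    {"proteinId": "P1", "sequence": "MEKLNIAGGD"},
    {"proteinId": "P2", "sequence": "GGNVHFENGE"},
]
protein_term = [
    {"proteinId": "P1", "termId": "GO:0008372"},
    {"proteinId": "P2", "termId": "GO:0000001"},  # arbitrary placeholder term
]
print(run_query(protein, protein_term, "GO:0008372"))
```

If the Gene Ontology side identified proteins in a different namespace, the equality test would silently match nothing, which is the case semantic mediation would need to catch.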
                                TAMBIS I
Query 1: Select motifs for antigenic human proteins that participate in apoptosis and are homologous to the lymphocyte associated receptor of death (also known as lard).

Translation: Select patterns in the proteins that invoke an immunological response and participate in programmed cell death that are similar in their sequence of amino acids to the protein that is associated with triggering cell death in the white cells of the immune system.
(A) Ontology expression:
 Motif which
   <isComponentOf (Protein which
      <hasOrganismClassification Species
       functionsInProcess Apoptosis
       hasFunction Antigen
       isHomologousTo
         (Protein which <hasName ProteinName>)>)>

Species: Is instantiated by value “human”
ProteinName: Is instantiated by value “lard”
                                  TAMBIS II
• Informal query plan:
           • Select proteins with protein name “lard” from SWISS-PROT
           • Execute a BLAST sequence alignment process against
             SWISS-PROT results
           • Check the entries for apoptosis process and antigen function
           • Pass the resultant sequences to PROSITE to scan for their
             motifs
• CPL expression:
set-unique {(#motif1:motif1) |
\protein3 <- get-sp-entries-by-de("lard"), \protein2 <- do-blastp-by-sq-in-entry(protein3),
check-sp-entries-by-kwd("apoptosis",protein2), check-sp-entries-by-de("antigen",protein2),
check-sp-entry-for-species("human",protein2), \motif1 <- do-ps-scan-by-sq-in-entry(protein2)}
select p.proteinId, blast(p.sequence)
   from p in protein, t in proteinTerm
   where t.termId = 'GO:0008372' and
   p.proteinId = t.proteinId
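The TAMBIS CPL expression above reads as a set comprehension: generators bind variables, and the check functions act as filters. The sketch below transliterates it into Python under loud assumptions. The function names copy the CPL ones with underscores, but their bodies are stubs over a tiny made-up `ENTRIES` table rather than calls to SWISS-PROT, BLAST, or PROSITE, and the result holds bare motif values instead of CPL records.

```python
# Hypothetical stand-in for SWISS-PROT entries, BLAST results and PROSITE scans.
ENTRIES = {
    "E1": {"de": "lard antigen", "kw": {"apoptosis"}, "species": "human",
           "homologs": ["E1", "E2", "E3"], "motifs": ["M1"]},
    "E2": {"de": "antigen x", "kw": {"apoptosis"}, "species": "human",
           "homologs": [], "motifs": ["M1", "M2"]},
    "E3": {"de": "antigen y", "kw": {"apoptosis"}, "species": "mouse",
           "homologs": [], "motifs": ["M3"]},
}


def get_sp_entries_by_de(text):
    return [entry for entry, record in ENTRIES.items() if text in record["de"]]


def do_blastp_by_sq_in_entry(entry):
    return ENTRIES[entry]["homologs"]


def check_sp_entries_by_kwd(keyword, entry):
    return keyword in ENTRIES[entry]["kw"]


def check_sp_entries_by_de(text, entry):
    return text in ENTRIES[entry]["de"]


def check_sp_entry_for_species(species, entry):
    return ENTRIES[entry]["species"] == species


def do_ps_scan_by_sq_in_entry(entry):
    return ENTRIES[entry]["motifs"]


def motifs_for_lard_homologs():
    # The set comprehension plays the role of set-unique { ... | ... }.
    return {
        motif1
        for protein3 in get_sp_entries_by_de("lard")
        for protein2 in do_blastp_by_sq_in_entry(protein3)
        if check_sp_entries_by_kwd("apoptosis", protein2)
        if check_sp_entries_by_de("antigen", protein2)
        if check_sp_entry_for_species("human", protein2)
        for motif1 in do_ps_scan_by_sq_in_entry(protein2)
    }


print(sorted(motifs_for_lard_homologs()))
```

With the made-up table, the mouse entry is filtered out and the motifs of the two human entries are merged without duplicates.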
• How we did it in the past
  – Service type directory
• How we currently plan to do it
  – Shims, GenBank, microarray
• How we may want to do it in the future
  – DQP & TAMBIS
                 Overview
• We’re not attacking the same problem
• When would your problem become our
  problem?
• Common descriptions of the core entities
  involved.
  – Data items, Datasets, Services.

				