               From Web 1.0 to Web 3.0:
Is RDF access to RDB enough?

                   Vipul Kashyap
                vkashyap1@partners.org
Senior Medical Informatician, Clinical Informatics R&D
             Partners Healthcare System

                  Martin Flanagan,
          mflanagan@insilicodiscovery.com
              CTO, InSilico Discovery


W3C Workshop on RDF Access to Relational Databases
              October 26th, 2007
    Outline
•    Position
•    Use Case Scenario
•    Solution Approach
•    A Generalized Framework for RDF Access
•    Next Steps:
     — Proposed Roadmap
     — Research Topics
 Position
There is a need for a generalized framework (format,
    representation language, algebra?) for RDF access to:
(A) Relational Databases
(B) Tabular Data Sources, e.g., Excel Spreadsheets
(C) Web Services

Motivation:
(A) Large amounts of “tabular” data and increasing number of
    web services in the Healthcare and Life Sciences
(B) Learn from the relational database success story: Declarative
    query language + Algebra + Opportunities for optimization
(C) Potential for providing incremental value, increasing the
    adoption and acceptance of the Semantic Web.
Use Case Scenario:
Biological Explanations for Statistical Correlations
•   What is the location of a given gene, e.g., CPNE1, on the human genome?
    Data Repository: NCBI Entrez
    Access Mechanism: Web Services

•   For what gene(s) is a given SNP, e.g., rs6060535, in the upstream regulatory
    region?
    Data Repository: RDBMS containing dbSNP and regulatory region data,
    Access Mechanism: JDBC/SQL

•   What genes have been found to be "coexpressed" with CPNE1 and in what
    study?
    Data Repository: Excel Spreadsheet containing the co-expression patterns of
    various genes in various studies.
    Access Mechanism: .NET API, MS Office API
Solution Approach
•   Ontology based RDF query specification
•   Mapping Framework
    — Relational Databases
    — Excel Spreadsheets
    — Web Services
•   Query Translations and Execution

Illustrations of a working system based on the Semantic Discovery
     System by InSilico Discovery
     (http://www.insilicodiscovery.com)
Ontology based RDF Query Specification
                 SPARQL Query Generated:

                prefix example: <http://www.semanticdiscoverysystems.com/Example.owl#>
                prefix ns: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                select distinct ?v0 ?v1
                where
                {
                ?v0 ns:type example:gene .
                ?v0 example:has_gene_region ?v1 .
                ?v0 example:gname 'CPNE' .
                }
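To make the meaning of the generated query concrete, here is a minimal sketch of how its three triple patterns can be matched against a set of triples. It is not how the Semantic Discovery System evaluates queries. The subject identifiers (`gene:1`, `region:A`) and the sample triples are hypothetical, chosen only so the example runs.

```python
# Hypothetical in-memory triples; identifiers and values are illustrative only.
EX = "http://www.semanticdiscoverysystems.com/Example.owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

TRIPLES = [
    ("gene:1", RDF_TYPE, EX + "gene"),
    ("gene:1", EX + "gname", "CPNE"),
    ("gene:1", EX + "has_gene_region", "region:A"),
    ("gene:2", RDF_TYPE, EX + "gene"),
    ("gene:2", EX + "gname", "OTHER"),
    ("gene:2", EX + "has_gene_region", "region:B"),
]


def is_var(term):
    """Pattern terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against an earlier binding.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def evaluate(patterns, triples):
    """Evaluate a conjunction of triple patterns by nested-loop matching."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, b)) is not None
        ]
    return bindings


# The three patterns of the query above, in the same order.
QUERY = [
    ("?v0", RDF_TYPE, EX + "gene"),
    ("?v0", EX + "has_gene_region", "?v1"),
    ("?v0", EX + "gname", "CPNE"),
]

# "select distinct ?v0 ?v1": project two variables and drop duplicates.
rows = sorted({(b["?v0"], b["?v1"]) for b in evaluate(QUERY, TRIPLES)})
print(rows)
```

Only the gene whose `gname` is 'CPNE' survives all three patterns, so the result pairs that gene with its region.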
Mapping to Relational Databases

[Screenshot callouts: Mapping to Oracle Databases; Mapping to Gene Names Mediator Class]
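The kind of table-to-class mapping this slide illustrates can be sketched as follows. This is an assumption-laden illustration, not the SDS mapping format: SQLite stands in for Oracle, and the table name `gene_names`, its columns, the `MAPPING` dictionary layout, and the subject naming scheme are all hypothetical.

```python
import sqlite3

EX = "http://www.semanticdiscoverysystems.com/Example.owl#"

# Hypothetical mapping: one table maps to one ontology class,
# and each mapped column maps to one property.
MAPPING = {
    "table": "gene_names",
    "class": EX + "gene",
    "key": "gene_id",
    "columns": {"name": EX + "gname"},
}


def rows_to_triples(conn, mapping):
    """Expose every row of the mapped table as RDF-style triples."""
    columns = [mapping["key"], *mapping["columns"]]
    # Identifiers come from the trusted mapping above, never from user input.
    sql = f'SELECT {", ".join(columns)} FROM {mapping["table"]}'
    triples = []
    for row in conn.execute(sql):
        subject = f'{mapping["table"]}/{row[0]}'
        triples.append((subject, "rdf:type", mapping["class"]))
        for column, value in zip(mapping["columns"], row[1:]):
            if value is not None:  # a NULL column produces no triple
                triples.append((subject, mapping["columns"][column], value))
    return triples


# An in-memory database standing in for the Oracle source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene_names (gene_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO gene_names VALUES (1, 'CPNE1')")

triples = rows_to_triples(conn, MAPPING)
for triple in triples:
    print(triple)
```

Keeping the mapping as data rather than code is the design choice of interest here: the same function can serve any table for which a mapping is declared.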
Mapping to Web Services

[Screenshot callout: Mapping to GetGenomeLocations in gene_regions Mediator class]
Mapping to Excel Spreadsheets

[Screenshot callouts: Mapping to Spreadsheet Data; Mapping to Gene Names Mediator Class]
Query Translation and Execution




[Screenshot callout: Translators]

This one SPARQL statement 'joins' data from NCBI, Excel, Oracle:
"who did what assay matching this sequence data …"
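Once each translator has returned its rows, the cross-source 'join' reduces to an ordinary join on shared attributes. The sketch below assumes each source has already been queried and returns plain dictionaries; the row contents other than CPNE1 and rs6060535 are placeholders, and `natural_join` is a hypothetical helper, not part of the system shown.

```python
def natural_join(left, right):
    """Hash-join two lists of dict rows on every key they have in common.

    Assumes all rows within one list carry the same keys.
    """
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    # Build a hash index over the right-hand rows, keyed by the shared columns.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in shared), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in shared), []):
            joined.append({**row, **match})
    return joined


# Stand-ins for the three translated sub-queries (placeholder values).
web_service_rows = [{"gene": "CPNE1", "location": "LOCATION-PLACEHOLDER"}]
database_rows = [{"gene": "CPNE1", "snp": "rs6060535"}]
spreadsheet_rows = [
    {"gene": "CPNE1", "coexpressed_with": "GENE-X", "study": "STUDY-1"},
    {"gene": "GENE-Y", "coexpressed_with": "GENE-Z", "study": "STUDY-2"},
]

result = natural_join(natural_join(web_service_rows, database_rows), spreadsheet_rows)
for row in result:
    print(row)
```

All three row sets share the `gene` key, so only the CPNE1 rows combine into a single result row.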
A Generalized Framework for RDF Access

  Layers of the framework, from top to bottom:

  •   Ontology classes and properties:
      Gene, GeneRegion
      has_gene_region, gname

  •   Mediator framework classes:
      gene.mdl, gene_region.mdl, gene_names.mdl, …

  •   Source-specific classes:
      RDB: oracle.mdl
      Web service: ncbi.mdl, keg.mdl
      Excel: excel.mdl




  The SDS Platform is based on the Mediator Definition Language (MDL)
  work done by Val Tannen and his students at the University of Pennsylvania.

  That work was earlier implemented in the K3 system and was widely used in pharma.
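The layering can be illustrated with a small sketch in which source-specific classes hide the access mechanism and a class-level mediator presents one ontology class over several sources. This is not MDL syntax and not the SDS design; every class name, field name, and sample value below is a hypothetical stand-in, with CSV text standing in for a spreadsheet and a plain function standing in for a web service call.

```python
import csv
import io
from abc import ABC, abstractmethod


class SourceMediator(ABC):
    """Source-specific layer: yields plain dict rows from one data source."""

    @abstractmethod
    def rows(self):
        ...


class CsvMediator(SourceMediator):
    """Stand-in for a spreadsheet source, reading CSV text."""

    def __init__(self, text):
        self.text = text

    def rows(self):
        return list(csv.DictReader(io.StringIO(self.text)))


class ServiceMediator(SourceMediator):
    """Stand-in for a web service source, wrapping any callable."""

    def __init__(self, call):
        self.call = call

    def rows(self):
        return list(self.call())


class ClassMediator:
    """Middle layer: exposes one ontology class over any number of sources."""

    def __init__(self, class_name, key, property_map, sources):
        self.class_name = class_name
        self.key = key
        self.property_map = property_map
        self.sources = sources

    def triples(self):
        out = []
        for source in self.sources:
            for row in source.rows():
                subject = row[self.key]
                out.append((subject, "rdf:type", self.class_name))
                for field, prop in self.property_map.items():
                    if field in row:
                        out.append((subject, prop, row[field]))
        # Remove duplicates while keeping first-seen order.
        return list(dict.fromkeys(out))


sheet = CsvMediator("gene,gname\ng1,CPNE1\n")
service = ServiceMediator(lambda: [{"gene": "g1", "region": "r1"}])
gene = ClassMediator(
    "Gene",
    key="gene",
    property_map={"gname": "gname", "region": "has_gene_region"},
    sources=[sheet, service],
)
triples = gene.triples()
for triple in triples:
    print(triple)
```

Adding a new kind of source then means adding one `SourceMediator` subclass, leaving the ontology-facing layer unchanged.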
Conclusions

•   Need to think of various types of structured/semi-structured/tabular
    data sources in a holistic manner:
    — XML Documents (GRDDL Transforms)
    — Relational Databases
    — Web Services
    — Excel Spreadsheets
    — Other “Tabular” and “Tree” data sources
•   Potential for providing value beyond relational databases
•   Accelerate the transition to the Semantic Web
•   Increase Adoption and Acceptance
Next Steps: Proposed Roadmap

    [Diagram, top to bottom]

    Target:   RDF

    Middle:   Generalized Transformation Language
              (spanning GRDDL and Relational Algebra)

    Sources:  XML, Relational Databases, WSDL, Excel Spreadsheets
Next Steps: Research
•   Extension of Relational Algebra?
    — XQuery
    — RDF
    — GRDDL Transformations
    — WSDL
    — Read only Web Service Choreography/Composition
•   What aspects of the above can be “webified”?
    — Access Transformation Languages
    — Mapping Languages: Is XQuery or RDF enough?
•   Existing efforts in Mediator research
    — E.g., Mediator Definition Language (MDL)

				