rdf by huanghengdong


Authors: Akiyoshi Matono, Toshiyuki Amagasa, Masatoshi Yoshikawa,
 Shunsuke Uemura
  Semantic Web
• As the World Wide Web grows ever larger and more
  complex, the Semantic Web has emerged as a vision of
  the next generation of the Web. Compared with the
  current Web, the Semantic Web makes human-to-machine
  and machine-to-machine interactions more
  intelligent by relying on better and more plentiful
  metadata on Web resources.
• Resource Description Framework (RDF), the core of the
    Semantic Web, describes its metadata and semantics.
    As the Semantic Web becomes widely used, the
    storage and retrieval of RDF data have come into focus.
•   RDF is commonly used for large data, such as ontologies
    or dictionaries. If we use conventional RDF databases to
    process such large data, some problems may emerge.
 • RDF Schema is a specification for defining
   schematic information of RDF data. It lets
   developers define a particular vocabulary for RDF
   data and specify the kinds of objects.

 • RDF data can be decomposed into statements, so it
   can also be modeled as a directed graph, where
   nodes and arcs represent resources and
   relationships respectively. It is composed of
   RDF-meta schema data, RDF schema data and RDF data,
   and each group is an instance of the one before it.
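
As a minimal sketch of the graph view described above, the following Python snippet treats each statement as a (subject, predicate, object) triple and builds an adjacency list from it. The resource names (#picasso, #guernica and so on) are made up for illustration and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical statements: (subject, predicate, object).
statements = [
    ("#picasso", "#paints", "#guernica"),
    ("#guernica", "#title", "Guernica"),
    ("#picasso", "#name", "Pablo Picasso"),
]

def to_graph(triples):
    """Map each subject node to its outgoing (arc label, target node) pairs."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)

graph = to_graph(statements)
```

Each key of `graph` is a node, and each (label, target) pair is one labeled arc leaving it.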
The conventional approach

• Store the statements flatly

• Problems?

  Any query that involves RDF schema information cannot be handled
The conventional approach
• Creates relational tables for classes and properties,
      storing resources according to their classes.
• Problems?
   Does not make any distinction between schema
   and data, so problems arise when you perform a
   schema query rather than an RDF data query.
The conventional approach
   • Store the subject, predicate and object as
       keys in three tables. Using these keys,
       we can retrieve the corresponding statements.
   •   Problems?
        – Poor performance when processing
          path-based queries.
        – Join operation makes the query string
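
To illustrate why path-based queries are expensive in a statement-oriented layout, here is a sketch using SQLite from Python. It uses a single hypothetical `triple` table, which is a simplification and not the three-table key layout described on the slide; the data is invented. A path of two arcs already needs one self-join, and each extra arc adds another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("#picasso", "#paints", "#guernica"),
        ("#guernica", "#title", "Guernica"),
        ("#rodin", "#sculpts", "#thinker"),
        ("#thinker", "#title", "The Thinker"),
    ],
)

# Follow #paints and then #title: one self-join for a two-arc path.
titles = [
    row[0]
    for row in conn.execute(
        """
        SELECT t2.object
        FROM triple AS t1
        JOIN triple AS t2 ON t1.object = t2.subject
        WHERE t1.predicate = '#paints'
          AND t2.predicate = '#title'
        """
    )
]
```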
Sub graphs
  Graph CI, inheritance relationships between classes
  Graph PI, inheritance relationships between properties
  Graph T, a single-labeled directed acyclic graph
  Graph DR, domain (rdfs:domain) or range (rdfs:range) of
   each property
  Graph G, consists of all the remaining statements not
   included in the above sub graphs
  Separates RDF schema information from RDF instance data
  Simpler structures are easier to store
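
The split into sub graphs can be sketched as a routing step on the predicate of each statement. The mapping below is an assumption made for illustration: in particular, the slide does not say which statements go into graph T, and placing rdf:type statements there is a guess. The sample triples are invented.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Assumed predicate-to-sub-graph mapping; T = rdf:type is a guess.
PREDICATE_TO_GRAPH = {
    RDFS + "subClassOf": "CI",
    RDFS + "subPropertyOf": "PI",
    RDF + "type": "T",
    RDFS + "domain": "DR",
    RDFS + "range": "DR",
}

def split_into_subgraphs(triples):
    """Route each (subject, predicate, object) triple to one sub graph."""
    subgraphs = {"CI": [], "PI": [], "T": [], "DR": [], "G": []}
    for triple in triples:
        # Anything not matched by a schema predicate falls into G.
        name = PREDICATE_TO_GRAPH.get(triple[1], "G")
        subgraphs[name].append(triple)
    return subgraphs

sample = [
    ("#Painter", RDFS + "subClassOf", "#Artist"),
    ("#paints", RDFS + "domain", "#Painter"),
    ("#picasso", RDF + "type", "#Painter"),
    ("#picasso", "#paints", "#guernica"),
]
parts = split_into_subgraphs(sample)
```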
Path expression

     Store the arc paths of the graphs in a path table in the relational database
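
A rough sketch of how path-table rows could be derived from an acyclic graph is shown below. The expression format (most recent arc first, arcs joined with '<') is inferred from the example string '#title<#paints' used in the path query on the slides and may not match the paper's exact definition. The function name and the sample edges are hypothetical.

```python
def path_rows(edges, roots):
    """Return (node, path expression) rows for every non-root node.

    edges: list of (source, arc label, target) in an acyclic graph.
    A node reachable through several paths yields one row per path.
    The walk does not terminate on cyclic input.
    """
    children = {}
    for source, label, target in edges:
        children.setdefault(source, []).append((label, target))

    rows = []
    stack = [(root, "") for root in roots]
    while stack:
        node, path = stack.pop()
        if path:
            rows.append((node, path))
        for label, target in children.get(node, []):
            # Prepend the new arc so the latest arc comes first.
            stack.append((target, (label + "<" + path) if path else label))
    return rows

rows = path_rows(
    [
        ("#picasso", "#paints", "#guernica"),
        ("#guernica", "#title", "Guernica"),
    ],
    roots=["#picasso"],
)
```

With rows like these stored in a table, a path query becomes a string comparison on the path expression instead of a chain of joins.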
Extended interval numbering
   Add a virtual root if the graph has more than one root
   Add new node(s) for any node that is reachable
    through multiple paths
   Each node is assigned (preorder, postorder, depth)
   v is an ancestor of u: pre(v) < pre(u) and post(v) > post(u),
    where v and u are nodes in the graph.
   v is a parent of u: v is an ancestor of u, and
    depth(u) - depth(v) = 1
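
The numbering and the two tests above can be sketched in Python as follows. The sketch assumes the graph has already been turned into a tree (virtual root added, multiply reachable nodes duplicated) and does not perform those steps itself; the function names and the sample tree are hypothetical.

```python
def number_tree(children, root):
    """Assign (preorder, postorder, depth) to every node of a tree.

    children: dict mapping a node to the list of its child nodes.
    """
    numbers = {}
    counters = {"pre": 0, "post": 0}

    def visit(node, depth):
        counters["pre"] += 1
        pre = counters["pre"]          # numbered on the way down
        for child in children.get(node, []):
            visit(child, depth + 1)
        counters["post"] += 1          # numbered on the way back up
        numbers[node] = (pre, counters["post"], depth)

    visit(root, 0)
    return numbers

def is_ancestor(numbers, v, u):
    pre_v, post_v, _ = numbers[v]
    pre_u, post_u, _ = numbers[u]
    return pre_v < pre_u and post_v > post_u

def is_parent(numbers, v, u):
    return is_ancestor(numbers, v, u) and numbers[u][2] - numbers[v][2] == 1

numbers = number_tree({"root": ["a", "b"], "a": ["c"]}, "root")
```

Both tests only compare stored numbers, so no graph traversal is needed at query time.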
Relational database schema
Query processing
 Path query - Find the title of something painted by someone:
   SELECT r.resourceName
   FROM path AS p, resource AS r
   WHERE p.pathID = r.pathID
   AND p.pathexp = '#title<#paints'
 Schema query - Find the names of the classes that are the direct superclasses
  of http://www.w3.org/2000/01/rdf-schema#Resource:
   SELECT c1.className
   FROM class AS c, class AS c1
   WHERE c.pre < c1.pre
   AND c.post > c1.post
   AND c.depth = c1.depth - 1
   AND c.className =
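
A runnable sketch of the same parent test, using SQLite from Python, is given below. The `class` table layout follows the column names used in the query above, but the rows and the node names (#C0 to #C3) are invented, and the (pre, post, depth) values were assigned by hand. The sketch only demonstrates the interval test; whether a child node is a superclass or a subclass depends on the arc direction chosen when numbering the graph, which is not settled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE class (className TEXT, pre INTEGER, post INTEGER, depth INTEGER)"
)
# Hypothetical nodes: #C0 is the root, #C1 and #C3 are its children,
# and #C2 is a child of #C1.
conn.executemany(
    "INSERT INTO class VALUES (?, ?, ?, ?)",
    [("#C0", 1, 4, 0), ("#C1", 2, 2, 1), ("#C2", 3, 1, 2), ("#C3", 4, 3, 1)],
)

def children_of(name):
    """Names of the nodes whose parent is `name`, per the interval test."""
    rows = conn.execute(
        """
        SELECT c1.className
        FROM class AS c, class AS c1
        WHERE c.pre < c1.pre
          AND c.post > c1.post
          AND c.depth = c1.depth - 1
          AND c.className = ?
        ORDER BY c1.pre
        """,
        (name,),
    )
    return [row[0] for row in rows]
```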
Summary & Conclusion
• The main goal of the study is to improve
  performance when retrieving RDF data. Path-based
  querying of relational RDF data is efficient because
  it reduces the number of joins. The approach also
  works for both RDF without schema and RDF with
  schema data. The paper assumes that most RDF data
  is acyclic. The other thing to observe is the
  sub graph extraction into 5 sub
• Data is stored based on the 5 sub
    graphs. The extended interval numbering
    scheme is used to detect parent-child
    relationships, resulting in fast
    retrieval of super classes, sub
•   The paper notes that most queries
    on RDF data are queries that detect
    sub graphs matching a given graph.
    They are also, in general, queries
    that detect a set of nodes reachable
    via a given path expression. So RDF
    data can be handled more efficiently
    using path-based queries.
Why Relational RDF…
  • Because the Flat and Hash approaches do
      not make any distinction between
      schema information and resource
  •   The Schema approach is able to process
      RDF-based queries, but what about
      schema-less RDF data? There is also
      a big overhead in maintaining the
      schema as it evolves.
  •   Hence, use a relational DB and store the
      RDF data and schema in separate tables.
Conclusions:
As both RDF schema and RDF instance
  data are stored in distinct
  relational tables, we
1. Can handle schema-less RDF data.
2. Can process schema-based queries
  (using the extended interval
  numbering scheme).
3. Can process path-based expressions,
  as the RDF data is stored in the
  relational DB based on path
• Also, the performance improvement
  becomes dramatic as the length
  of the path expression
  increases. Refer to the graph
  on Page 6.

• Problems:
• Sub graphing, the assumption of
  acyclic data, no mention of
  ETL if we want to convert from a
  conventional approach. Not easy to
  query (compared with SQL).
