
RDF Aggregate Queries and Views
Edward Hung, Yu Deng, V.S. Subrahmanian
University of Maryland, College Park
Maintenance of RDF Aggregate Views
 Introduction of RDF and RDQL
 RDQL Extension for Aggregate Views
 Aggregate View Maintenance Algorithms
  AMX
 Implementation and Experiments
 Related Work
Publication
   Edward Hung, Yu Deng, V.S.
    Subrahmanian, "RDF Aggregate Queries
    and Views", to appear in the Proc. of the
    21st International Conference on Data
    Engineering (ICDE), Tokyo, Japan, 2005.
Introduction
   Resource Description Framework (RDF)
     W3C Recommendation
     Represents metadata about resources identifiable on the web
      (by Uniform Resource Identifier (URI))
     Triple: (Resource, Property, Value)
       (Artist, rdf:type, rdfs:Class)
       (Painter, rdf:type, rdfs:Class)
       (Painter, rdfs:subClassOf, Artist)
RDF Schema:
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xml:base="http://www.auctionschema.com/schema1#">
 <rdfs:Class rdf:ID="Artist"/>
 <rdfs:Class rdf:ID="Painter"><rdfs:subClassOf
   rdf:resource="#Artist"/></rdfs:Class>
 <rdfs:Datatype rdf:about="&xsd;string"/>
 <rdf:Property rdf:ID="fname">
  <rdfs:domain rdf:resource="#Artist"/>
  <rdfs:range rdf:resource="&xsd;string"/>
 </rdf:Property>
</rdf:RDF>

RDF Instance:
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ns1="http://www.auctionschema.com/schema1#">
 <rdf:Description rdf:about="http://www.artist.net#guyrose">
  <rdf:type rdf:resource="http://www.auctionschema.com/schema1#Painter"/>
  <ns1:fname rdf:datatype="&xsd;string">Guy</ns1:fname>
 </rdf:Description>
</rdf:RDF>
[Figure: graph view of the RDF schema and instance above. In the schema graph, Painter is linked to Artist by subClassOf, and the fname property links Artist to String. In the instance graph, resource &r1 has fname "Guy", where &r1 = http://www.artist.net#guyrose.]
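To make the triple model concrete, here is a minimal sketch that holds the triples from this slide as plain Python tuples. It is not how Jena or any RDF store represents data; the helper names `values_of` and `superclasses`, and the shortened resource names, are assumptions made for illustration.

```python
# Illustrative only: (Resource, Property, Value) triples as Python tuples.
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS = "rdfs:subClassOf"
GUY = "http://www.artist.net#guyrose"

triples = {
    ("Artist", RDF_TYPE, "rdfs:Class"),
    ("Painter", RDF_TYPE, "rdfs:Class"),
    ("Painter", RDFS_SUBCLASS, "Artist"),
    (GUY, RDF_TYPE, "Painter"),
    (GUY, "fname", "Guy"),
}


def values_of(store, resource, prop):
    """Return every value v such that (resource, prop, v) is in the store."""
    return {v for (r, p, v) in store if r == resource and p == prop}


def superclasses(store, cls):
    """Follow rdfs:subClassOf edges transitively, starting from cls."""
    seen, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for parent in values_of(store, current, RDFS_SUBCLASS):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen
```

For example, `values_of(triples, GUY, "fname")` returns the set containing "Guy", and `superclasses(triples, "Painter")` returns the set containing "Artist".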
RDQL: RDF Query Language




SELECT ?highprice
WHERE (?artist, <ns1:lname>, "Rose"),
(?artist, <ns1:fname>, "Guy"),
(?artist, <ns1:creates>, ?artifact),
(?artifact, <ns1:estimated>, ?price),
(?price, <ns1:high>, ?highprice),
(?artifact, <ns1:presented>, ?date)
AND 2004-04-01 <= ?date <= 2004-04-30
USING ns1 FOR <http://www.auctionschema.com/schema1#>
The triple patterns in the WHERE clause form the view pattern.
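The way such a query binds variables can be sketched with a small backtracking matcher over a list of triples. This is only an illustration of triple-pattern matching, not the RDQL engine's evaluation strategy; the resource names (a1, w1, p1, ...) and the data values are hypothetical, and dates are kept as ISO strings so that string comparison orders them correctly.

```python
def is_var(term):
    """Variables are strings that start with '?', as in RDQL."""
    return isinstance(term, str) and term.startswith("?")


def match(patterns, store, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in store:
        candidate = dict(binding)
        ok = True
        for p_term, t_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(p_term, t_term) != t_term:
                    ok = False
                    break
            elif p_term != t_term:
                ok = False
                break
        if ok:
            yield from match(rest, store, candidate)


# Hypothetical data: one artist, two works, only one presented in April 2004.
store = [
    ("a1", "lname", "Rose"), ("a1", "fname", "Guy"),
    ("a1", "creates", "w1"), ("w1", "estimated", "p1"),
    ("p1", "high", 800000), ("w1", "presented", "2004-04-15"),
    ("a1", "creates", "w2"), ("w2", "estimated", "p2"),
    ("p2", "high", 500000), ("w2", "presented", "2004-05-02"),
]

view_pattern = [
    ("?artist", "lname", "Rose"), ("?artist", "fname", "Guy"),
    ("?artist", "creates", "?artifact"),
    ("?artifact", "estimated", "?price"),
    ("?price", "high", "?highprice"),
    ("?artifact", "presented", "?date"),
]

high_prices = sorted(
    b["?highprice"]
    for b in match(view_pattern, store)
    if "2004-04-01" <= b["?date"] <= "2004-04-30"
)
```

With this data, only the work presented on 2004-04-15 passes the date constraint, so `high_prices` holds a single value.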
RDQL Extension for Aggregates and Views



CREATEVIEW AS
SELECT max(?highprice)
WHERE (?artist, <ns1:lname>, "Rose"),
(?artist, <ns1:fname>, "Guy"),
(?artist, <ns1:creates>, ?artifact),
(?artifact, <ns1:estimated>, ?price),
(?price, <ns1:high>, ?highprice),
(?artifact, <ns1:presented>, ?date)
AND 2004-04-01 <= ?date <= 2004-04-30
USING ns1 FOR <http://www.auctionschema.com/schema1#>
   We extend the syntax of RDQL to allow constants in the SELECT
    clause; the constants are used to create new resources and
    properties in the result.
   For example, the previous query can be modified as follows:
        CREATEVIEW AS
        SELECT <ns1:works_by_guyrose>,
         <ns1:maxprice>, max(?highprice)
        WHERE (?artist, <ns1:lname>, "Rose"),
        (?artist, <ns1:fname>, "Guy"),
        (?artist, <ns1:creates>, ?artifact),
        (?artifact, <ns1:estimated>, ?price),
        (?price, <ns1:high>, ?highprice),
        (?artifact, <ns1:presented>, ?date)
        AND 2004-04-01 <= ?date <= 2004-04-30
        USING ns1 FOR <http://www.auctionschema.com/schema1#>
   The result is a valid RDF statement:
    (<ns1:works_by_guyrose>, <ns1:maxprice>, "800000"^^ns1:USD)
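A minimal sketch of how the two constants and the aggregate combine into one statement, assuming the matching ?highprice values have already been collected into a bag. The function name `materialize_aggregate` is hypothetical, and the bag values are the ones used in the later slides.

```python
def materialize_aggregate(subject, prop, aggregate, bag):
    """Return one (resource, property, value) triple for an aggregate over a bag.

    subject and prop are the constants from the SELECT clause; aggregate is
    any function over a non-empty collection, such as max.
    """
    if not bag:
        return None  # no match for the view pattern, so nothing to assert
    return (subject, prop, aggregate(bag))


view_triple = materialize_aggregate(
    "ns1:works_by_guyrose", "ns1:maxprice", max, [800000, 500000]
)
```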
Aggregate View Maintenance
   Relational Approach
     Store all triples in a relational table with schema
      (Resource, Property, Value)
    OR
     Store resources and values of the same property in a
      separate relational table with schema (Resource,
      Value)
     #self-joins = (#triples in where-clause) – 1
     Large number of delta rules during relational view
      maintenance, which makes it expensive
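The self-join count can be seen in a small sketch using Python's built-in sqlite3 module and the single-table schema (Resource, Property, Value). The table name, the rows, and the three-pattern query are assumptions for illustration; three triple patterns need three aliases of the same table, hence two self-joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (resource TEXT, property TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("a1", "creates", "w1"),
        ("w1", "estimated", "p1"),
        ("p1", "high", "800000"),
        ("a1", "creates", "w2"),
        ("w2", "estimated", "p2"),
        ("p2", "high", "500000"),
    ],
)

# Three triple patterns -> three aliases of one table -> two self-joins.
sql = """
SELECT MAX(CAST(t3.value AS INTEGER))
FROM triples AS t1
JOIN triples AS t2 ON t2.resource = t1.value
JOIN triples AS t3 ON t3.resource = t2.value
WHERE t1.property = 'creates'
  AND t2.property = 'estimated'
  AND t3.property = 'high'
"""
max_high = conn.execute(sql).fetchone()[0]
```

The six-pattern query from the earlier slides would need five such joins under the same scheme.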
Aggregate View Maintenance
   Graph-structured DB (GSDB) [Zhuge,
    Garcia-Molina, ICDE 1998]
     GSDB assumes a rooted graph model, while
      RDF is a general graph
     A GSDB view contains a set of nodes, while
      our RDF views can contain nodes, edges, or
      any combination of the two.
Aggregate View Maintenance
   Our Approach
     Localized  search in RDF graphs
     breadth-first search starting at the
      inserted/deleted edge
     auxiliary data are needed for certain
      aggregate views
         min, max, avg
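The idea of a localized breadth-first search can be sketched as follows. This is not the AMX algorithm itself, whose details are not reproduced in these slides; it only shows a search that starts at the two endpoints of a changed edge and stops after a fixed number of hops. The function name `neighborhood`, the hop bound, and the choice to ignore edge direction are all assumptions.

```python
from collections import deque


def neighborhood(triples, edge, max_hops):
    """Return {node: distance} for nodes within max_hops of the changed edge."""
    # Build an undirected adjacency map from the (resource, property, value) triples.
    adjacency = {}
    for resource, _prop, value in triples:
        adjacency.setdefault(resource, set()).add(value)
        adjacency.setdefault(value, set()).add(resource)

    start_resource, _prop, start_value = edge
    depth = {start_resource: 0, start_value: 0}
    queue = deque([start_resource, start_value])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand beyond the hop bound
        for nxt in adjacency.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth


graph = [
    ("a1", "creates", "w1"),
    ("w1", "estimated", "p1"),
    ("p1", "high", "800000"),
    ("x", "y", "z"),  # unrelated part of the graph, never visited
]
touched = neighborhood(graph, ("a1", "creates", "w1"), max_hops=1)
```

Only the nodes near the changed edge are visited, which is the point of searching locally instead of re-evaluating the view over the whole graph.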
Compute Aggregates Algorithm
CAA
[Figure: the view pattern matched against the RDF graph, with BAG annotations 800000 and 800000, 500000; the BAG at SELECT max(?highprice) holds 800000, 500000.]
Aggregate View Maintenance
Algorithms AMX
 AMI – Insertion
 AMD – Deletion
 AMT – Triple Modification
 AMR – Resource Modification
Update: Insertion
[Figure: RDF graph with paints edges before and after an insertion; the BAG annotations read 800000, 500000, and the BAG at SELECT max(?highprice) reads 800000, 500000, 60000.]
AMI for Insertion
Distributive Aggregate Function
   An aggregate function f is distributive w.r.t. a source
    update operation if and only if, after such an operation,
    the updated value of the function can be computed
    from its old value and the value(s) of the source
    update, without reference to the source.
   More formally, f is distributive w.r.t. an update operation
    U if and only if there exists a function g such that f(I') =
    g(f(I), v), where f(I) is the old aggregate value, I' is the
    updated instance after the update operation U(I, v), and
    v is the value(s) used in the update (e.g., the new value
    to add, the old value to remove, etc.).
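As a worked instance of the definition, the function g for a few familiar cases might be written as follows. The symbols f, g, I, I' and v are those of the definition; the numbers reuse the 800000, 500000, 60000 values from the insertion example, and reading them as I = {800000, 500000} with v = 60000 is an assumption.

```latex
\begin{align*}
\text{sum, insertion of } v:\quad & f(I') = g(f(I), v) = f(I) + v \\
\text{sum, deletion of } v:\quad  & f(I') = g(f(I), v) = f(I) - v \\
\text{max, insertion of } v:\quad & f(I') = g(f(I), v) = \max(f(I), v) \\
I = \{800000, 500000\},\ v = 60000:\quad & f(I') = \max(800000, 60000) = 800000
\end{align*}
```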
Distributive Aggregate Function
   Examples of distributive aggregate functions:
       count, sum, average w.r.t. insertion, deletion and update
       For average, we will need an additional attribute size which
        stores the size of S (in line 3 of CAA) in order to compute the
        correct updated value (or, we can use sum, count to calculate it)
   max and min are distributive w.r.t. insertion, but not
    deletion and update
       Auxiliary data computed from the source (such as S) can help to
        maintain non-distributive aggregate functions to avoid the need
        to refer to the source.
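The two cases above can be sketched in a few lines: an average kept up to date from a stored sum and size, and a max kept up to date with an auxiliary bag so that deletions never touch the source. The class names and the use of `collections.Counter` as the bag are assumptions; this is not the TMaintainI or TMaintainD procedure from the paper.

```python
from collections import Counter


class AverageView:
    """Average maintained from (sum, size) alone, for insertion and deletion."""

    def __init__(self, values):
        self.total = sum(values)
        self.size = len(values)

    def insert(self, v):
        self.total += v
        self.size += 1

    def delete(self, v):
        self.total -= v
        self.size -= 1

    def value(self):
        return self.total / self.size if self.size else None


class MaxView:
    """Max maintained with an auxiliary bag of the contributing values."""

    def __init__(self, values):
        self.bag = Counter(values)
        self.current = max(self.bag) if self.bag else None

    def insert(self, v):
        # Insertion is distributive: the old max and v are enough.
        self.bag[v] += 1
        if self.current is None or v > self.current:
            self.current = v

    def delete(self, v):
        # Deletion is not distributive: fall back to the bag, not the source.
        if self.bag[v] == 0:
            raise KeyError(v)
        self.bag[v] -= 1
        if self.bag[v] == 0:
            del self.bag[v]
        if v == self.current and v not in self.bag:
            self.current = max(self.bag) if self.bag else None
```

Starting from the values 800000 and 500000, inserting 60000 leaves the max unchanged, and deleting 800000 afterwards drops it to 500000 using only the bag.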
TMaintainI
Update: Deletion
[Figure: RDF graph with paints edges before and after a deletion; the BAG annotations read 800000, 500000, 60000, and the BAG at SELECT max(?highprice) reads 500000, 60000.]
AMD for Deletion
TMaintainD
Implementation and Experiment
   Implemented in Java
   Jena (from HP) as the RDQL engine
   Comparison with Relational Approach (standard
    view maintenance algorithm on relational tables)
     Counting algorithm in Gupta et al., "Maintaining Views
      Incrementally", SIGMOD 1993
   Dataset: Chef Moz Project RDF dump
   Data stored in memory
Other Related Work
   Volz et al. [DBFUSION’02]
       the first to introduce a view mechanism for RDF data
       Their views require that
         1.   the results contain class instances (i.e., a subject or object
              variable), or
         2.   the result itself has the pattern of RDF statement (i.e., a triple
              containing subject, predicate and object).
   Magkanaraki et al. [ISWC’03]
       proposed RVL, a view definition language that can also create
        virtual RDF schemas and restructure class and property
        hierarchies such that new resources, property values, classes
        and property types can be created.
   None of these works specifically addresses (i) aggregates
    in RDF or (ii) the problem of maintaining aggregate RDF
    views.

								