Databases by ewghwehws

VIEWS: 11 PAGES: 48

									 Topics covered

                                Databases
                                QL
                                Query containment
                                An evaluation of QL




CS848: Topics in Databases: Information Integration
 A simple case of information integration

                                              SQL Server

                                                           Global Schema
           Subsystem1



           Subsystem2            SQL                        Open, Scan


           Subsystem3

                                                           A single table: T




            A “local as view” (LAV) integration schema: T ´ Q2.
            User submits Q1.
            Query optimizer must determine if a scan of T suffices.
            True iff Q1 is equivalent to Q2.

CS848: Topics in Databases: Information Integration
 In the beginning …

Infinite countable sets of each of the following kinds of
symbols:
     C = {C1, C2, … }                                 (primitive concepts)
     A = {A1, A2, …} [ {B1, B2, …}                    (attributes)
     R = {R1, R2, … }                                 (roles)

Conventions: Attributes (resp. primitive concepts and roles)
correspond to words in lower case or to positive integers
(resp. words in upper case and words in mixed case).

CS848: Topics in Databases: Information Integration
 For a particular database I

                                           hD, (¢)Ii

  where D is a countable possibly infinite domain, and where
  for each symbol
                                            (C)I µ D

                                          (A)I : D ! D

                                        (R)I µ (D £ D)



CS848: Topics in Databases: Information Integration
 Partial databases
                               e                      e2D

       e : {C1, … , Cn}                               e 2 (Ci)I
                       A
               e1             e2                      (A)I(e1) = e2
                       R
               e1             e2                      (e1, e2) 2 (R)I

               e1             e2                      e1 = e2

               e1             e2                      e1  e2
CS848: Topics in Databases: Information Integration
 Relational databases
 EMP
  name             age
                                                            name
                                                                   “John”
  “John”            33
  “Mary”            33                         e1 : {EMP}
                                                             age
                                                                   33

                                               e2 : {EMP}
                                                            name
                                                                   “Mary”



CS848: Topics in Databases: Information Integration
 Relational databases (cont’d)


                       name
                                   “John”
                                                      {e1, e2, “John”, 33, “Mary”} µ D
e1 : {EMP}                                            {e1, e2} µ (EMP)I
                          age                         (name)I(e1) = “John”
                                   33
                                                      (age)I(e1) = (age)I(e2) = 33
e2 : {EMP}                                            e1  e2
                       name                           “John” ? 33
                                   “Mary”



CS848: Topics in Databases: Information Integration
   Dialects of QL


                    Conjunctive QL               Positive QL          First order QL
                         with                        with                   with
(semantics)         bag semantics†              bag semantics‡        bag semantics


                    Conjunctive QL                Positive QL         First order QL


                                              (expressiveness)

                           †[Khizder    et al., 1999], ‡[Lui et al., 2002]

  CS848: Topics in Databases: Information Integration
 Conjunctive QL
           Q ::=       D as A                         (quantification)
               |       A1 = A2.R                      (unnest)
               |       A1.Pf1 = A2.Pf2                (selection)
               |       elim A1, … , An Q              (projection)
               |       true                           (null tuple)
               |       from Q1, Q2                    (natural join)
               |       (Q)
           D ::= THING | C                            (basic description)
           Pf ::= id | A.Pf                           (path function)

CS848: Topics in Databases: Information Integration
 Well formed queries: a(Q)

                     a(D as A)                   ´    {A}
                 a(A1 = A2.R)                    ´    {A1, A2}
             a(A1.Pf1 = A2.Pf2)                  ´    {A1, A2}
          a(elim A1, … , An Q)                   ´    {A1, … , An}
                        a(true)                  ´    ;
                a(from Q1, Q2)                   ´    a(Q1) [ a(Q2)

       Require {A1, … , An} µ a(Q) for projection operators.


CS848: Topics in Databases: Information Integration
 Tuples and bags

  A (duplicate) tuple t with attribute bindings for attributes {A1, … , An}
  over a database I = hD, (¢)Ii has the general form

                                hA1 : e1, … , An : en , cnt : ii,

  where {e1, … , en} µ D, “cnt” is a distinct symbol not used as an
  attribute, and i a positive integer.

  A set of duplicate tuples that contain the same attribute bindings is
  called a bag.




CS848: Topics in Databases: Information Integration
 Operations on tuples

                    a(t) ´ set of attributes occurring in t

                 t@cnt      ´ integer i such that “cnt : i” occurs in t

                   t@A ´ element e 2 D such that “A : e” occurs in t;
                          defined only when A 2 a(t)

    t[{A1, … , An}] ´ {A1 : t@A1, … , An : t@An};
                       defined only when {A1, … , An} µ a(t)

                       [t] ´ t[a(t)]



CS848: Topics in Databases: Information Integration
 Semantics
The meaning of a query Q, denoted «Q¬, is a function that maps databases
to bags. The behavior of this function on a particular database I = hD, (¢)Ii
is defined as follows.
    «THING as A¬(I) ´ {hA : e, cnt : 1i : e 2 D }
            «C as A¬(I) ´ {hA : e, cnt : 1i : e 2 (C)I}
        «A1 = A2.R¬(I) ´ {hA1 : e1, A2 : e2, cnt : 1i : (e2, e1) 2 RI}
 «A1.Pf1 = A2.Pf2¬(I) ´ {hA1 : e1, A2 : e2, cnt : 1i : (Pf1)I(e1) = (Pf2)I(e2)}


where
                     (id)I    ´ {(e, e) : e 2 D}
                 (A.Pf )I     ´ {(e1, e2) : (Pf )I((A)I(e1)) = e2}
CS848: Topics in Databases: Information Integration
  Semantics (cont’d)

«elim A1, … , An Q ¬(I) ´ ; , if not well formed; otherwise
                            {hA1 : t@A1, … , An : t@An , cnt : 1i : t 2 «Q¬(I)}

                   «true¬(I) ´ {hcnt : 1i}

        «from Q1, Q2¬(I) ´ {t : a(t) = a(Q1) [ a(Q2)
                                     Æ 9 t1 2 «Q1¬(I), t2 2 «Q2¬(I) :
                                               t@cnt = t1@cnt £ t2@cnt
                                               Æ t[a(t1)] = [t1]
                                               Æ t[a(t2)] = [t2]}




 CS848: Topics in Databases: Information Integration
 Syntactic sugar

                         A1. L .An.id             ´ A1. L .An
 select distinct A1, … , An Q                     ´ elim A1, … , An Q
                             select * Q           ´ Q
                         Q1 where Q2 ´ from Q1, Q2
                             Q1 and Q2 ´ from Q1, Q2
                                       from ´ true
              from Q1, Q2, … , Qn ´ from (from Q1, Q2, …) Qn

CS848: Topics in Databases: Information Integration
 Examples
The names of employees who have the same age as another employee with
a given name.

          select distinct :p, name
          from EMP as e, ( select distinct :p, e1
                              from EMP as e1, EMP as e2
                              where e1.age = e2.age and e2.name = :p )
          where e.name = name
          and e.id = e1.id




CS848: Topics in Databases: Information Integration
 Method calls (more syntactic sugar)

               A1.Pf1.C(A2.Pf2, … , An-1.Pfn-1) = An.Pfn
                           select distinct A1, … , An
                   ´       from C as A
                           where A.1 = A1.Pf1 and … and A.n = An.Pfn


               A1.Pf1.C(A2.Pf2, … , An-1.Pfn-1) as An
                   ´       A1.Pf1.C(A2.Pf2, … , An-1.Pfn-1) = An.id




CS848: Topics in Databases: Information Integration
 Examples (cont’d)
The names of employees who have an age double that of another employee.


               select distinct name
               from EMP as e, ( select distinct e1
                                  from EMP as e1, EMP as e2
                                  where e2.age.+(e2.age) = e1.age )
               where e.name = name
               and e.id = e1.id




CS848: Topics in Databases: Information Integration
 Conjunctive datalog (more syntactic sugar)


               C(A1, … , An)
                           select distinct A1, … , An
                   ´       from C as A
                           where A.1 = A1.id and … and A.n = An.id

               (A1, … , Am) :- Q1, … , Qn.
                   ´       select distinct A1, … , Am from Q1, … , Qn




CS848: Topics in Databases: Information Integration
 Positive QL

           Q ::= empty A1, … , An                     (empty set)
               | Q1 union Q2                          (union)


                         a(empty A1, …, An) ´ {A1, …, An}
                           a(Q1 union all Q2) ´ a(Q1)


            Require a(Q1) = a(Q2) in union operations.



CS848: Topics in Databases: Information Integration
 Semantics
«empty A1, … , An¬(I) ´ ;

      «Q1 union Q2¬(I)
                ´     {t : t@cnt = 1 Æ a(t) = a(Q1) Æ a(t) = a(Q2)
                            Æ(
           ( 9 t1 2 «Q1¬(I) : [t] = [t1] Æ :9 t2 2 «Q2¬(I) : [t] = [t2] )
         Ç ( 9 t2 2 «Q2¬(I) : [t] = [t2] Æ :9 t1 2 «Q1¬(I) : [t] = [t1] )
         Ç ( 9 t1 2 «Q1¬(I), t2 2 «Q2¬(I) : [t] = [t1] Æ [t] = [t2] ) )}




CS848: Topics in Databases: Information Integration
 First order QL

         Q ::= Q1 minus Q2                            (difference)


                 a(Q1 minus Q2) ´ a(Q1)

         Require a(Q1) = a(Q2) in difference operations.




CS848: Topics in Databases: Information Integration
 Semantics

          «Q1 minus Q2¬(I)
                   ´ {t : t@cnt = 1 Æ a(t) = a(Q1) Æ a(t) = a(Q2)
                           Æ ( 9t1 2 «Q1¬(I) : [t] = [t1] )
                           Æ ( :9t2 2 «Q2¬(I) : [t] = [t2] )}




CS848: Topics in Databases: Information Integration
 QL with duplicates


Q ::= select A1, … , An Q                       (duplicate preserving projection)
      |    Q1 union all Q2                      (bag union)
      |    Q1 minus all Q2                      (bag difference)




CS848: Topics in Databases: Information Integration
 Well formed queries (cont’d)

              a(select A1, … , An Q) ´ {A1, … , An}
                 a(Q1 union all Q2 ) ´ a(Q1)
                 a(Q1 minus all Q2) ´ a(Q1)

    Require a(Q1) = a(Q2) in bag union and bag difference
    operations, and that {A1, … , An} µ a(Q) in (duplicate
    preserving) projection operations.



CS848: Topics in Databases: Information Integration
 Semantics
«select A1, … , An Q ¬(I)
                  ´ ; , if not well formed and representable†; otherwise
                      {hA1 : t1@A1, … , An : t1@An , cnt : ni : t1 2 «Q¬(I)
                                    Æ n =                  (t2@cnt)                 }
                                                      t2 2 «Q¬(I) :
                                              t2[{A1, … , An}] = t1 [{A1, … , An}]



†The  selection operation is representable on database I iff, for every
t1 2 «Q¬(I),
                |{t2 2 «Q¬(I) : t2[{A1, … , An}] = t1 [{A1, … , An}]}|
is finite.

CS848: Topics in Databases: Information Integration
 Example

    A duplicate preserving projection operation that is not
    representable in any database with an infinite domain.

                 select e1
                 from THING as e1, THING as e2

    Observation: All well-formed duplicate preserving
    projection operations on databases with finite domains
    are representable.


CS848: Topics in Databases: Information Integration
 Semantics (cont’d)
 «Q1 union all Q2¬(I)
              ´ ; , if not well formed; otherwise
                      {t 2 «Q1¬(I) : :9 t2 2 «Q2¬(I) : [t] = [t2]}
                    [ {t 2 «Q2¬(I) : :9 t1 2 «Q1¬(I) : [t] = [t1]}
                    [ {t : 9 t1 2 «Q1¬(I), t2 2 «Q2¬(I) : [t] = [t1]} Æ [t] = [t2]
                                 Æ t@cnt = t1@cnt + t2@cnt}
 «Q1 minus all Q2¬(I)
              ´ ; , if not well formed; otherwise
                 {t 2 «Q1¬(I) : :9 t2 2 «Q2¬(I) : [t] = [t2]}
               [ {t : 9 t1 2 «Q1¬(I), t2 2 «Q2¬(I) : [t] = [t1]} Æ [t] = [t2]
                                Æ t@cnt = t1@cnt - t2@cnt
                                Æ t1@cnt > t2@cnt }
CS848: Topics in Databases: Information Integration
  Summary

                                                                         at, =, elim,
                                                    at, =, elim,
                           at, =, elim,                                  true, from,
(bag semantics)                                     true, from,
                           true, from,                                      select,
                                                       select,
                              select                                   empty, union all,
                                                  empty, union all
                                                                          minus all

                                                                         at, =, elim,
                                                        at, =, elim,
(set semantics)            at, =, elim,                                  true, from,
                                                        true, from,
                           true, from                                   empty, union,
                                                       empty, union
                                                                            minus

                         (conjunctive)                  (positive)       (first order)



 CS848: Topics in Databases: Information Integration
 Query contexts

 An expression Q[] in the language QL enriched by an additional terminal
 symbol [] is called a query context.

 For a query Q1 2 QL, the expression Q1[Q2] denotes the syntactical
 substitution of Q2 for [].

 Q2 is compatible with Q1 if Q1[Q2] 2 QL. For example, Q2 is compatible
 with Q1 in the following.
                             Q2: EMP as e where e.name = :p

          Q1: select distinct :p, d from DEPT as d, [] where d = e.dept



CS848: Topics in Databases: Information Integration
 The query equivalence problem


Q1 is equivalent to Q2 for database I, written I ² (Q1 ´ Q2 ), if «Q1¬(I) =
«Q2¬(I).

A query equivalence dependency E has the form (Q1 ´ Q2). E = (Q1 ´ Q2) is
an axiom if, for any database I, I ² (Q1 ´ Q2 ).

A query equivalence problem for a given set of query equivalence
dependencies is to determine if a given member of the set is an axiom.




CS848: Topics in Databases: Information Integration
 Some axioms

 Question: Is it true that any E with the following form is an axiom?
    (elim A1, … , Am Q1)[elim B1, … , Bn Q2] ´ elim A1, … , Am Q1 [Q2]


 Answer: No. However, any such E is an axiom if any attribute in
                                  (a(Q2) – {B1, … , Bn})
 does not occur in query context
                                 (elim A1, … , Am Q1 []).




CS848: Topics in Databases: Information Integration
 Excluding variable reuse in QL
Q has an occurrence of variable reuse if there is a query context Q1[] and
a query of the form
                                      elim A1, … , An Q2
or of the form
                                     select A1, … , An Q2
such that Q = Q1[Q2] and there exists A in (a(Q2) – {A1, … , An}) that
also occurs in Q1[].

Observation: For any Q1, there exists an equivalent class of query Q2
that has no occurrence of variable reuse.


CS848: Topics in Databases: Information Integration
 The query containment problem

Q1 is contained in Q2 for database I, written I ² (Q1 v Q2), if, for any tuple
t1 in «Q1¬(I), there exists t2 in «Q2¬(I) such that [t1] = [t2] and t1@cnt 
t2@cnt.

A query containment dependency C has the form (Q1 v Q2). C = (Q1 v Q2)
is an axiom if, for any database I, I ² (Q1 v Q2).

A query containment problem for a given set of query containment
dependencies is to determine if a given member of the set is an axiom.




CS848: Topics in Databases: Information Integration
 Equivalence and containment

   Observation: Equivalence reduces to containment.
                           Q1 ´ Q2 iff Q1 v Q2 and Q2 v Q1


    Observation: Containment reduces to equivalence in first order QL.
                   Q1 v Q2 iff (Q1 minus all Q2) ´ empty a(Q1)




CS848: Topics in Databases: Information Integration
 Some complexity results

 Theorem: The query equivalence and containment problems for
 conjunctive QL is NP-complete.†




 †Chandra, A. K. and P. M. Merlin. Optimal implementation of
 conjunctive queries in relational databases. Proc. Ninth Annual ACM
 Symposium on the Theory of Computing, pp. 77–90, 1977.




CS848: Topics in Databases: Information Integration
 A decision procedure
 Theorem: The following procedure decides if C = (Q1 v Q2) is an
 axiom for conjunctive QL.†

 1.    Freeze the body of Q1 by creating a partial database consisting of
       individuals that include its variables.
 2.    If the tuple
                        hA1 : A1 , … , An : An , cnt : 1i
       occurs in «Q2¬(I), where a(Q1) = {A1, … , An}, then return true;
       otherwise return false.‡

 †Derived from [Ullman, 1999].
 ‡Use forced semantics for selection operations.




CS848: Topics in Databases: Information Integration
 Obtaining a partial database from Q
    THING as A                        A                 A1 = A2.R

                                                                         R
                                                                    A2        A1
      C as A                  A : {C}


   A1.A2. L .Am = B1.B2. L .Bn

                                                       A2           … Am
                                                  A1

                                                       B2           …    Bn
                                                  B1

CS848: Topics in Databases: Information Integration
 Derived partial databases (cont’d)
                       A                                       A
           u : L1              v : L3                 u : L1       v : L3

                       A                                       A
           w : L2              x : L4                 w : L2       x : L4



                       A                                       A
           w:L                 u : L1                 w:L          u : L1

                       A                                       A
                               v : L2                              v : L2



CS848: Topics in Databases: Information Integration
 Deriving partial databases (cont’d)

 n 1 : L1             n 2 : L2                         n 1 : L1 [ L2              n 2 : L1 [ L2




  n 1 : L1             n 2 : L2             n 3 : L3

                                                  n 1 : L1             n 2 : L2        n 3 : L3




CS848: Topics in Databases: Information Integration
 Evaluating selections on partial databases

Note that selection conditions can navigate missing attribute values. In
such cases, assume a forced semantics. In particular, two nodes n1 and n2
satisfy a selection condition iff the condition has the form
                                     n1.Pf1.Pf = n2.Pf2.Pf
where (Pf1)I(n1) and (Pf2)I(n2) are defined and lead to nodes connected by
an equality arc.




CS848: Topics in Databases: Information Integration
 Some complexity results (cont’d)
 Theorem: The query equivalence problem for conjunctive QL with bag
 semantics is NP-complete.
 Observation: The complexity of the query containment problem for
 conjunctive QL with bag semantics remains open at this time.
 Example:† In conjunctive QL with bag semantics, the query containment
 dependency Q1 v Q2 is an axiom, where Q1 and Q2 have the respective
 definitions
        select x, z                                   select x, z
        from P as x, R as z                           from P as x, R as z
        where x = u.Q and z = v.Q                     where y = u.Q and y = v.Q

 † [Chaudhuri      and Vardi, 1993]

CS848: Topics in Databases: Information Integration
 The query membership problem

 A database schema, denoted T, consists of a finite set {C1, … , Cn} of
 query containment dependencies.

 C is an axiom relative to database schema T = {C1, … , Cn}, written
 T ² C, if, for any database I, I ² C if I ² Ci for each i.

 A query membership problem for a given set of query containment
 dependencies is to determine if a given member of the set is an axiom
 relative to a given database schema also consisting of members of the
 set.




CS848: Topics in Databases: Information Integration
 More complexity results

     Theorem: The query membership problem for conjunctive QL is
     undecidable.


     Theorem: The query membership problem for first order QL is
     equivalent to the query containment problem for first order QL.

     Proof: Assignment.




CS848: Topics in Databases: Information Integration
 Evaluating QL

                    Defining database schema
                    Expressing access plans
                    Fine grained APIs: record addresses
                    Protocols
                    Safety
                    Binding patterns
                    Adequacy: SQL, OQL, XQuery


CS848: Topics in Databases: Information Integration
  Modeling generalization taxonomies
Consider a simple object-oriented schema language consisting of sentences
of the following form.†
            class C {A1 : ref C1, … , Am : ref Cm} [isa C1 , … , Cn];
Assignment: Encode a fixed collection of such sentences as a database
schema in conjunctive QL. Your encoding should be as compact as possible
and should enable the following questions to be expressed as query
containment dependencies over your schema.
1. Is C a defined class?
2. Is attribute A defined on class C?
3. Can an object reside in both class C1 and class C2?
†Assume that any object in a database was created with respect to a single

class.

 CS848: Topics in Databases: Information Integration
    Modeling pipelined query access plans

                                (syntax)                                   (defn of b(¢))
(parameter)          Q ::=         (PARAM as A)                            {A}
(index scan)              |        (from C as A, A.1 = B1, … , A.n = Bn)   {A}
(nested loops)            |        (from Q1, Q2)                           b(Q1) [ b(Q2)
(noop)                    |        (select A1, … , An Q)                   b(Q) Å {A1, … , An}
(record field access)    |         (A1 = A2.B)                             {A1}
(comparison)              |        (A1 = A2)                               ;
(catenation)              |        (Q1 union all Q2)                       b(Q1) Å b(Q2)
(cut)                     |        (elim A1, … , An Q)                     ;
                            |      …

Require:
1. (a(Q2) – b(Q2)) µ a(Q1) for nested loops, and
2. a(Q) = b(Q) for top-level queries.

   CS848: Topics in Databases: Information Integration
 Alternative semantics

                       Require richer models theories for
                       1. sort operations, and
                       2. named cuts.




CS848: Topics in Databases: Information Integration

								
To top