


Local as View: Some refinements

        IM: Filtering irrelevant sources
       Views with restricted access patterns
        A summary of IM

2005                          lav-ii            1
               IM: Filtering irrelevant sources

When there are many sources, it is important to weed out those
 that are irrelevant to a query
Comparison constraints can help (e.g., qu >= w98)
What more can be done?

The IM system proposes introducing
                  classes with a class hierarchy
into source descriptions

Example:
    car
      carForSale
      usedCar
      newCar
      AmericanCar
      EuropeanCar
        GermanCar
        ItalianCar
        FrenchCar
      JapaneseCar

    Legend: -- marks disjoint classes
Additionally, the global schema contains a relation
details(car, year, mileage, price, sellerContact)
[        c,       y,       mi,      p,            s ]
(we will also abbreviate class names)

The views:
 v1(c, y, mi, p, s) :- details(c,y,mi,p,s), cFSale(c), uCar(c), y >= 1990
 v2(c, y, p, s) :- details(c,y,mi,p,s), cFSale(c), EurCar(c)
 v3(c, y, p, s) :- details(c,y,mi,p,s), cFSale(c), uCar(c), p >= $25000 // luxury cars
 v4(c, y, p) :- details(c,y,mi,p,s), cFSale(c), uCar(c), y <= 1980 // vintage cars

 v5(c, y, p, s) :- details(c,y,mi,p,s), cFSale(c), nCar(c), c = Toyota

Assume a query:
Q: q(c, mi, y, p, s) :- details(c, y, mi, p, s), cFSale(c), JCar(c),
                        y >= 1992, p <= $12000

Some candidate rewritings will be rejected, since they are
  inconsistent with Q

When a view is considered for consistency with Q,
•      v4 will be discarded – y<=1980, y>=1992 is inconsistent
•      v3 will be discarded – p>=$25000, p<=$12000 is inconsistent
•      v2 will be discarded – EurCar(c), JCar(c) is inconsistent
•      v5 – depends on what is known about the relationship between
       Toyota and the various car classes

Reasoning about disjointness of classes (given a hierarchy as
   above) is easy and efficient
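
The consistency test above can be sketched in a few lines of Python. This is only an illustration, not the IM implementation: the dictionary encoding of views, the `PARENT` table, and the `DISJOINT` set are assumptions made for this sketch (the disjointness of European, Japanese and American cars and of used and new cars is assumed, since the figure's markings are not reproduced here). v5 is left out because its status depends on what is known about Toyota.

```python
import math

# Assumed encoding of the class hierarchy: child -> parent.
PARENT = {
    "cFSale": "car", "uCar": "car", "nCar": "car",
    "AmCar": "car", "EurCar": "car", "JCar": "car",
    "GermanCar": "EurCar", "ItalianCar": "EurCar", "FrenchCar": "EurCar",
}
# Assumed disjointness axioms between classes.
DISJOINT = {frozenset(p) for p in [
    ("EurCar", "JCar"), ("EurCar", "AmCar"), ("AmCar", "JCar"), ("uCar", "nCar"),
]}

def ancestors(cls):
    """Return cls together with all of its superclasses."""
    out = [cls]
    while cls in PARENT:
        cls = PARENT[cls]
        out.append(cls)
    return out

def classes_disjoint(a, b):
    """a and b are disjoint if some pair of their ancestors is declared disjoint."""
    return any(frozenset((x, y)) in DISJOINT
               for x in ancestors(a) for y in ancestors(b))

def consistent(view, query):
    """A view is kept only if its constraints can hold together with the query's."""
    # Comparison constraints: intersect the [lo, hi] interval of each variable.
    for var, (qlo, qhi) in query["bounds"].items():
        vlo, vhi = view["bounds"].get(var, (-math.inf, math.inf))
        if max(qlo, vlo) > min(qhi, vhi):
            return False
    # Class constraints: no view class may be disjoint from a query class.
    return not any(classes_disjoint(a, b)
                   for a in view["classes"] for b in query["classes"])

VIEWS = {
    "v1": {"classes": ["cFSale", "uCar"], "bounds": {"y": (1990, math.inf)}},
    "v2": {"classes": ["cFSale", "EurCar"], "bounds": {}},
    "v3": {"classes": ["cFSale", "uCar"], "bounds": {"p": (25000, math.inf)}},
    "v4": {"classes": ["cFSale", "uCar"], "bounds": {"y": (-math.inf, 1980)}},
}
Q = {"classes": ["cFSale", "JCar"],
     "bounds": {"y": (1992, math.inf), "p": (-math.inf, 12000)}}

relevant = [name for name, v in VIEWS.items() if consistent(v, Q)]
print(relevant)
```

With these assumptions only v1 survives: v2 fails on class disjointness, v3 on price, and v4 on year.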

The true story (a side trip):
IM uses a (PTIME) Description Logic (DL) for source descriptions
A DL is a formalism that describes
  classes & binary relationships intensionally.
For example, a class can be given by a name (e.g. JCar) or by an
  expression that describes its properties:
  cheapJCar :- uCar and JCar and price < $9000
A DL also contains containment and disjointness axioms for
  class expressions (containment is called subsumption in DL jargon)

To be useful, a DL needs to support containment and disjointness
  queries on classes and membership queries on individuals
  (this is an inference problem)

Many DLs are known
Complexity (for subsumption) ranges from polynomial (rare), to
  NP-complete, to EXPTIME-complete, to undecidable

Recent interest focuses on using DLs for the Semantic Web
The W3C OWL standard is essentially a DL
(this use is essentially the same as in IM)

                              That is all on DLs

           Views with restricted access patterns

       Many sources do not support full SQL:
       • They are legacy systems, e.g.
          – finger on UNIX accepts email, returns other attributes
          – A bibliography source requires an author or a title as input, but does not
            accept a year
       • They do not want to disclose all their data, e.g.,
          – a carSale source will not present all the cars it has for sale
          – An airline requires from and destination as input for flight info
       The questions:
       • How do we describe such sources?
       • What are good rewritings, and how do we find them?

Restricted sources can be described by binding patterns
Two equivalent notations (more sophisticated schemes exist):
Example: assume global relations
       email(F, L, E), office(F, L, O), phone(O, P)
            (F-first, L-last, E-email, O-office, P-phone)
The views are finger and userId, described as follows:
• Adding $ to attributes that can be given as input
        finger(F, L, $E, O, P) :- email(F, L, E), office(F, L, O), phone(O, P)
        userId($O, E) :- office(F, L, O), email(F, L, E)

• Using b, f strings as superscripts on predicates, where b means bound (i.e., input) and f means free
        finger^ffbff(F, L, E, O, P) :- email(F, L, E), office(F, L, O), phone(O, P)
        userId^bf(O, E) :- office(F, L, O), email(F, L, E)

Example, cont’d :
Q: q^bf(O, F) :- office(F, L, O)   (or q($O, F) :- office(F, L, O))
• Cannot be answered by using finger – it requires E as input
• Cannot be answered by using userId – it does not return F

The following is a good rewriting:
   q’(O, F) :- userId(O, E), finger(F, L, E, O, P)

For two reasons:
• It is executable with respect to the sources: executing the body
  left-to-right respects the access restrictions
  O for userId comes from the query; E for finger comes from userId
• Its expansion is contained in the query (check!)
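
The containment check can be worked through mechanically. The sketch below assumes the standard containment-mapping test for conjunctive queries: the expansion is contained in the query if the query's variables can be mapped to the expansion's so that head maps to head and every query atom lands on an expansion atom. The expansion is written out by hand, with F1 and L1 as fresh names (introduced here) for the variables of userId that are not exported. The function name and the tuple encoding are illustrative choices.

```python
def find_containment_mapping(query_head, query_body, exp_head, exp_body):
    """Search for a containment mapping from the query to the expansion.

    Atoms are (predicate, args) pairs; all arguments are variables.
    Returns the mapping as a dict, or None if no mapping exists.
    """
    def extend(mapping, src_args, dst_args):
        # Try to map src_args onto dst_args without contradicting mapping.
        new = dict(mapping)
        for s, d in zip(src_args, dst_args):
            if new.setdefault(s, d) != d:
                return None
        return new

    def search(i, mapping):
        if i == len(query_body):
            return mapping
        pred, args = query_body[i]
        for pred2, args2 in exp_body:
            if pred2 == pred and len(args2) == len(args):
                new = extend(mapping, args, args2)
                if new is not None:
                    result = search(i + 1, new)
                    if result is not None:
                        return result
        return None

    start = extend({}, query_head, exp_head)  # head must map onto head
    return None if start is None else search(0, start)

# Query: q(O, F) :- office(F, L, O)
query_head = ("O", "F")
query_body = [("office", ("F", "L", "O"))]

# Expansion of q'(O, F) :- userId(O, E), finger(F, L, E, O, P)
exp_head = ("O", "F")
exp_body = [
    ("office", ("F1", "L1", "O")), ("email", ("F1", "L1", "E")),   # from userId
    ("email", ("F", "L", "E")), ("office", ("F", "L", "O")),       # from finger
    ("phone", ("O", "P")),
]

mapping = find_containment_mapping(query_head, query_body, exp_head, exp_body)
print(mapping)
```

The query atom office(F, L, O) cannot be mapped to office(F1, L1, O), because F must stay F, but it maps to the office atom contributed by finger, so a mapping exists and the expansion is contained in the query.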

2005                                   lav-ii                10
These two reasons are a characterization of a good rewriting:

• It is executable with respect to the sources: executing the body
  left-to-right respects the access restrictions
• Its expansion is contained in the query (check!)

• If it is not a contained rewriting, then being executable is no good
• Being contained but not executable is also no good

The IM approach:
After a rewriting is found to be consistent and contained, it is
  checked for executability: can the sub-goals in the body
  be ordered so that the input required by each is supplied by
  the query or by the sub-goals to its left?
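
One way to carry out this ordering check is a greedy pass, sketched below; it is not taken from the IM paper. It assumes binding patterns are given as b/f strings per view and that executing a sub-goal binds all of its variables. Because the set of bound variables only grows, picking any currently executable sub-goal never blocks a later one. The function name `executable_order` and the data encoding are illustrative.

```python
def executable_order(body, patterns, bound_inputs):
    """Order the sub-goals so that each one's bound ('b') arguments are
    supplied by the query inputs or by sub-goals placed to its left.

    body: list of (view, args) pairs; patterns: view -> b/f string.
    Returns the ordered list, or None if no executable order exists.
    """
    bound = set(bound_inputs)
    remaining = list(body)
    order = []
    while remaining:
        for atom in remaining:
            view, args = atom
            needed = {a for a, mode in zip(args, patterns[view]) if mode == "b"}
            if needed <= bound:
                order.append(atom)
                bound.update(args)      # outputs become available downstream
                remaining.remove(atom)
                break
        else:
            return None                 # no remaining sub-goal can be executed
    return order

PATTERNS = {"finger": "ffbff", "userId": "bf"}
body = [("finger", ("F", "L", "E", "O", "P")), ("userId", ("O", "E"))]

# The query q^bf(O, F) supplies O as input.
order = executable_order(body, PATTERNS, {"O"})
print(order)
```

For the finger/userId example the pass places userId first (O comes from the query) and finger second (E comes from userId); with finger alone it returns None, since nothing supplies E.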

                       A summary of IM

       • Introduced (with other concurrent systems) the notion of
         LAV and query rewriting using views
       • Also, detailed source descriptions using DLs
       • An efficient algorithm for finding contained and
         executable rewritings
       • Worked well, for about 100 sources

Here is a graph from the paper

       But :
       • The fact that a contained rewriting needs at most as many
         views as there are atoms in the query has been
         proved only for CQs without
          • comparisons,
          • access restrictions,
          • constraints on the global db
       Does it hold for these cases? (see the example on p. 10)
       For access restricted sources, it has been proved that for
         equivalent rewritings one needs at most n+m views, where
         n is the number of atoms in the query, m is the number of
         different variables in it
       The proof does not hold for contained rewritings

• Even for “pure” CQ’s, is the bucket algorithm guaranteed to
  find all rewritings?
The answers to all these questions are negative!
• The bucket algorithm does not find all rewritings
• For the more general cases, longer rewritings are needed;
  actually, there may be an infinite number of them, with no
  bound on length

There is a need for another approach
