     A Principled Approach to Data Integration and Reconciliation in Data Warehousing

                               Diego Calvanese
                          Giuseppe De Giacomo
                              Maurizio Lenzerini
                                  Daniele Nardi
                                Riccardo Rosati

                       Presented by Alan Wessman
                                   Introduction
   Problem: Acquire data from a set of
    sources for a particular application
       Typical architecture: wrappers and mediators
       Core problem: specify and implement
        mediators
   Paper focus: Data warehouses
            Data Warehouse Integration
   Most sources internal to organization
   Need global corporate view of data
   Sources and data warehouse are described as
    views over a conceptual model (local-as-view)
   Three levels of architecture
       Conceptual: Global model
       Logical: Query specifications for sources and
        warehouse
       Physical: Wrappers and mediators
        implementing query specifications
                              Architecture

   [Figure: the Conceptual Model at the top, linked by queries to the
    sources and the data warehouse below it]
       Source 1: q1, q2
       Source 2: q3, q4, q5
       Data Warehouse: q6, q7
              Specifying Logical Schemas
   For each table of source S, create an
    adorned query
       Head: Table name, # columns
       Body: Content of table (query over conceptual
        model)
       Adornment:
         • Domains (data types) of columns
         • Key attributes
                           Adorned Query: Example
Conceptual Model:  Menu(Date, Item, Price)                          (prices in Euro)
Source 1:          Halibut(Date, Price), Swordfish(Date, Price)     (prices in Lira)
Source 2:          SushiMenu(TunaPrice, SquidPrice, Date)           (prices in Yen)


Halibut(Date, Price) <- Menu(Date, 'Halibut', Price) | Price :: Lira, Date :: JulianDate
Swordfish(Date, Price) <- Menu(Date, 'Swordfish', Price) | Price :: Lira, Date :: JulianDate
SushiMenu(TunaPrice, SquidPrice, Date) <- Menu(Date, 'Tuna', TunaPrice),
    Menu(Date, 'Squid', SquidPrice) | TunaPrice :: Yen, SquidPrice :: Yen, Date :: JulianDate
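
The adorned queries of this example can also be written down as plain data. The Python sketch below is one possible encoding, not the notation used by the paper: the class and field names (Atom, AdornedQuery, domains, keys) are assumptions made for illustration, and the key attributes are left empty because the slide does not state them.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    """One atom of a query body, e.g. Menu(Date, 'Halibut', Price)."""
    relation: str
    args: tuple  # variable names, or quoted constants such as "'Halibut'"


@dataclass
class AdornedQuery:
    """Hypothetical encoding of: head <- body | adornment."""
    name: str        # head: table name
    columns: tuple   # head: column variables
    body: tuple      # atoms over the conceptual model
    domains: dict = field(default_factory=dict)  # adornment: column -> domain
    keys: tuple = ()  # adornment: key attributes (not given on the slide)

    def arity(self) -> int:
        return len(self.columns)

    def undeclared_columns(self) -> list:
        """Head columns that have no domain in the adornment."""
        return [c for c in self.columns if c not in self.domains]


halibut = AdornedQuery(
    name="Halibut",
    columns=("Date", "Price"),
    body=(Atom("Menu", ("Date", "'Halibut'", "Price")),),
    domains={"Price": "Lira", "Date": "JulianDate"},
)
sushi_menu = AdornedQuery(
    name="SushiMenu",
    columns=("TunaPrice", "SquidPrice", "Date"),
    body=(
        Atom("Menu", ("Date", "'Tuna'", "TunaPrice")),
        Atom("Menu", ("Date", "'Squid'", "SquidPrice")),
    ),
    domains={"TunaPrice": "Yen", "SquidPrice": "Yen", "Date": "JulianDate"},
)
```

Keeping head, body, and adornment as separate fields mirrors the three parts of an adorned query and makes simple checks, such as finding columns without a declared domain, easy to express.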
                   Query Consistency
Let Q be an adorned query and B its body.
Let M be the conceptual model.
   B is inconsistent with respect to M if, for every
    interpretation of M, the evaluation of B is
    empty
   Q is inconsistent with respect to M if either B is
    inconsistent or its annotations are
    inconsistent
   Inference techniques exist for checking
    query consistency
    Interschema Correspondences
   Specify how data in different
    schemas relates
   Non-materialized relational tables
    (computed on-demand)
   Like adorned query but annotations
    identify helper programs
   Reusable by other correspondences
         Interschema Correspondences
Three types of correspondence
   Conversion
       How data from one source is converted into
        data fitting a different schema
   Matching
       How data from different sources matches
   Reconciliation
       How data from different sources is reconciled
        to become data in the warehouse
              Conversion Correspondence
How data from one source is converted into data fitting a different schema


convert([x], [y]) <- conj(x, y, z)
through program(x, y, z)

   conj: Conjunctive query, specifies when conversion applies
   program: Program that performs the conversion
   x: Input tuple of values satisfying conditions for x in conj
   y: Output tuple of values satisfying conditions for y in conj
   z: Additional parameters required by program
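
A conversion correspondence can be pictured as a guard (conj) plus a helper program. The Python sketch below is only an illustration under stated assumptions: the function names, the dictionary-shaped tuples, and the choice of a lira-to-euro price conversion are hypothetical and do not come from the slides. The guard is simplified to check only the input tuple x and the parameters z.

```python
def conj(x: dict, z: dict) -> bool:
    """conj (simplified): conditions under which the conversion applies."""
    return x.get("Currency") == "Lira" and z["lira_per_euro"] > 0


def lira_to_euro(x: dict, z: dict) -> dict:
    """program: builds the output tuple y from input x and parameters z."""
    return {"Date": x["Date"], "Price": round(x["Price"] / z["lira_per_euro"], 2)}


def convert(x: dict, z: dict):
    """convert([x], [y]) <- conj(x, y, z) through program(x, y, z)."""
    if not conj(x, z):
        return None  # the correspondence does not apply to x
    return lira_to_euro(x, z)


# Usage: the exchange rate plays the role of the extra parameters z.
# 1936.27 is the fixed lira-per-euro conversion rate; Date is a Julian day number.
y = convert(
    {"Date": 2451545, "Price": 19362.7, "Currency": "Lira"},
    {"lira_per_euro": 1936.27},
)
```

Returning None when the guard fails keeps "the conversion does not apply" distinct from a conversion that yields an output tuple.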
                Matching Correspondence
How data from different sources matches


match([x1], …, [xk]) <- conj(x1, …, xk, z)
through program(x1, …, xk, z)

Differs from Conversion Correspondence in use of k tuples
   that may be matched
program returns true if the k tuples match
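
As a rough illustration, a matching correspondence can be sketched as a guard plus a boolean helper program over k tuples. Everything named below (the functions, the dictionary-shaped tuples, and the max_gap_days parameter standing in for z) is an assumption for the sake of the example, not part of the slides.

```python
def conj(tuples, z) -> bool:
    """conj (simplified): every tuple must carry a Date value."""
    return all("Date" in t for t in tuples)


def dates_close(tuples, z) -> bool:
    """program: True if the k tuples match (dates within the allowed gap)."""
    dates = [t["Date"] for t in tuples]
    return max(dates) - min(dates) <= z["max_gap_days"]


def match(*tuples, z) -> bool:
    """match([x1], ..., [xk]) <- conj(x1, ..., xk, z) through program(x1, ..., xk, z)."""
    return conj(tuples, z) and dates_close(tuples, z)


# Usage: one tuple from each of two sources, compared on Date only.
same_day = match(
    {"Date": 10, "Price": 5.0},
    {"Date": 10, "TunaPrice": 3.0},
    z={"max_gap_days": 0},
)
```

The variadic signature reflects the one difference from a conversion noted above: the program receives k tuples and answers only whether they match.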
       Reconciliation Correspondence
How data from different sources is reconciled to the warehouse


reconcile([x1], …, [xk], [z]) <- conj(x1, …, xk, z, w)
through program(x1, …, xk, z, w)

z: Data warehouse tuple; the result of the reconciliation
w: Additional parameters required by program (the role played by z in the
   conversion and matching correspondences)
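
A reconciliation correspondence can be sketched in the same style: a guard over the k source tuples and a program that builds the warehouse tuple z. The policy used here (trust the source named in w) and all names are hypothetical choices for illustration; the slides do not prescribe any particular reconciliation policy.

```python
def conj(xs, w) -> bool:
    """conj (simplified): the source tuples must describe the same date."""
    return len({x["Date"] for x in xs}) == 1


def pick_preferred(xs, w):
    """program: build warehouse tuple z from the source named in w."""
    chosen = next((x for x in xs if x["Source"] == w["preferred_source"]), None)
    if chosen is None:
        return None  # the preferred source contributed no tuple
    return {"Date": chosen["Date"], "Price": chosen["Price"]}


def reconcile(xs, w):
    """reconcile([x1], ..., [xk], [z]) <- conj(x1, ..., xk, z, w) through program(...)."""
    if not conj(xs, w):
        return None  # the correspondence does not apply
    return pick_preferred(xs, w)


# Usage: two sources disagree on the price; w says which one to trust.
z = reconcile(
    [
        {"Source": "S1", "Date": 5, "Price": 10.0},
        {"Source": "S2", "Date": 5, "Price": 10.4},
    ],
    {"preferred_source": "S2"},
)
```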
               Reusing Correspondences
   A correspondence can be reused only if it has already been defined
   Example 1
       match([x], [y]) <- convert1([x], [z]),
            convert2([y], [z]), conj(x, y, z, w)
             through none

   Example 2
       reconcile([x], [y], [z]) <- convert1([x], [w1]),
             convert2([y], [w2]), match1([w1], [w2]),
             convert3([w1], [z]), conj(x, y, z, w)
             through none
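
Example 1 can be read as: x and y match when both convert to the same z, so no helper program is needed. The following Python sketch illustrates that reading. The two conversions, the tuple layouts, and the field names are hypothetical, and the extra conj condition of the example is omitted for brevity.

```python
from datetime import date


def convert1(x: dict):
    """Hypothetical, previously defined: source-1 tuple -> common form z."""
    return (date.fromisoformat(x["day"]), x["item"].lower())


def convert2(y: dict):
    """Hypothetical, previously defined: source-2 tuple -> common form z."""
    return (date(y["year"], y["month"], y["dom"]), y["dish"].lower())


def match(x: dict, y: dict) -> bool:
    """match([x], [y]) <- convert1([x], [z]), convert2([y], [z]) through none.

    x and y match exactly when both conversions yield the same z.
    """
    return convert1(x) == convert2(y)


# Usage: the same menu entry, recorded differently by two sources.
matched = match(
    {"day": "2011-08-05", "item": "Tuna"},
    {"year": 2011, "month": 8, "dom": 5, "dish": "tuna"},
)
```

Building match purely from existing conversions is what "through none" expresses: the correspondence is defined by composition rather than by a new program.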
                           Specifying Mediators
Aim: Specify for each relation in warehouse how the tuples should
    be constructed from the sources

Task: Materialize a new relation T in the warehouse
Steps:
1.  Specify T as an adorned query
    q <- q’ | c1, …, cn
2.  Look for a rewriting of q in terms of queries q1, …, qs
    corresponding to materialized views in the warehouse
3.  Look for a rewriting of (what remains of q) in terms of queries
    corresponding to tables in the sources and the conversion,
    matching, and reconciliation correspondences
Resulting query is specification for the mediator for T
               Computing the Rewriting
   Rewriting typically needs to merge results
    of several queries
   Produce set of merging clauses
    Form:
    merging tuple-spec_1 and … and tuple-spec_n
    such that matching-condition
    into tuple-spec_t1 and … and tuple-spec_tm
   Generates template; designer specifies
    “such that” and “into” parts, or writes
    custom merging clauses
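
To make the shape of a merging clause concrete, here is a minimal Python sketch. It assumes a naive nested-loop evaluation over the query results, which the slides do not specify; the function name merge and the sample tuples are hypothetical. The such_that and into arguments stand for the two parts the designer fills in.

```python
from itertools import product


def merge(results, such_that, into):
    """merging t1 and ... and tn such that <condition> into <output tuples>."""
    merged = []
    for combo in product(*results):      # one tuple per query result
        if such_that(*combo):            # designer-supplied matching condition
            merged.extend(into(*combo))  # designer-supplied output tuples
    return merged


# Usage: merge the results of two queries on equal dates.
halibut = [{"Date": 1, "Price": 10.0}, {"Date": 2, "Price": 11.0}]
swordfish = [{"Date": 2, "Price": 14.0}]

rows = merge(
    [halibut, swordfish],
    such_that=lambda h, s: h["Date"] == s["Date"],
    into=lambda h, s: [
        {"Date": h["Date"], "Halibut": h["Price"], "Swordfish": s["Price"]}
    ],
)
```

Passing the condition and the output construction as functions mirrors the template idea: the generated skeleton stays fixed while the designer supplies the two open parts.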
                               Conclusion
   Start with conceptual model and several
    types of correspondences
   Query rewriting algorithm generates
    mediator specifications
   Designer fills in any remaining details
   No empirical results

				