; Architecture
Documents
Resources
Learning Center
Upload
Plans & pricing Sign in
Sign Out
Your Federal Quarterly Tax Payments are due April 15th Get Help Now >>

Architecture

VIEWS: 68 PAGES: 73

  • pg 1
									  From Enterprise Information Integration to
  Community-Based Mediation

                      A presentation by

                 Yannis Papakonstantinou
                      on joint works with

                      Alin Deutsch,
                      Yannis Katsis,
                   Michalis Petropoulos


CSE Department
  Data Integration Requirements & Desiderata
                  (high level)


• Provide application with integrated database
   – single point of (query/update) access to the
     data
• Provide distribution and heterogeneity
  transparency
   – heterogenous formats, heterogenous
     interfaces, different rates of change (static
     versus dynamic), autonomous sources
• Decouple application logic from integration
• Easily add/change sources
• Customize the delivery of content
Most-Generic Integration System Architecture


     Client         Client              Client
   Application    Application         Application




       Integration Software



                                ...
Information   Information               Information
  Source        Source                     Source
SIGMOD Community’s Architecture for
  Unified Access to Data & Services



  Integrated (XML) Global View / Ontology + Services

                                                        Cache &
                    Mediator
                                                       Replication

       Local Common          Local Common
      Model (XML) View      Model (XML) View
         + Services            + Services

          Wrapper               Wrapper

         Information           Information
          + Service             + Service
           Source                Source
           Approaches towards View-Based
                  Data Integration
            Integration
           Specification
              Method


                                            Info Model &
GLAV=GAV+LAV                               Query Language

 Local As View
    (LAV)
                                       XML (XQuery)
Global As View
    (GAV)                         Object-Oriented

                              Relational (SQL)


                                                                   Storage
                                                                   Method
                           Warehousing            On-Demand
                           (materialized         (virtual views)
                              views)
          Enterprise Information Integration
                   Reaches Maturity
Enterprise


     • Materialized View (Warehousing) approach well-adopted since
       mid/late 90s
        – GAV function role played by Extract-Transform-Load tools
        – Human Intervention Occasionally Needed in Cleaning Up
             • Concordance tables for Object Identification
     • Virtual View (Mediation) approach at early adoption
        – many years of research
            • Distributed db’s, federated db’s, mediators
        – moving well into mainstream
            • BEA AquaLogic (XML, Virtual, GAV view)
            • IBM DB2
               Current Enterprise Information
                 Integration Deployments
Enterprise

                •   Small Domain
                •   Mostly Vertical Partition of Sources
                •   Primarily Application-Driven View or Identity View
                •   Integration Administrator/Developer in charge


                             Integrated Global View V(M, S, E)

 Integration
 Admin                                 GAV View
                    View Builder                       Mediator Query
                    (design time)         V          Processor (run time)
               Schemas                                               Data

                    Local View M      Local View S       Local View E


                      Marketing           Sales             Service
        Opportunities and Needs Presented by
              “Motivated” Communities
Communities

      • Emerging Myriads of Internet Communities of
         – Myriads of sources and clients
         – Source owners motivated to participate
      • EII does not address needs
         – Expensive
         – Bottleneck of Single Integration Admin
      • Make building corresponding portals similar to starting
        and participating in newsgroups
      • Appropriate tools needed to enable source owner and
        client participation
              A Community-Based Information
                  Modeling Architecture

           Client Application 1       Client Application m

          Application View V1a (G)      Application View Vma (G)


                    GAV: V1a               GAV: Vma

Integrated View          Integrated XML View G
Owner’s Domain
                                                      Data Services


                   GLAV: V1               GLAV: Vn
Source                                                        Source
Owner’s      Local XML View S1       Local XML View Sn       Owner’s
Domain                                                       Domain
       Data          Information     Information        Data
     Services 1        Source 1        Source n      Services n
                 Visual Tools Matter!
        (example from the Enosys Query Builder)
     C:\Enosys\projects\allPONS.qpr* - Enosys Query Builder
                                                                                       TARGET SCHEMA
                                                                                         (XML VIEW)




1
    OPEN & VIEW SOURCE                             2
      SCHEMAS IN XML
                                                   DRAG & DROP TO CREATE TARGET XML VIEW




                                                                                     AUTOMATICALLY
                                                                                     GENERATED MAPS
C:\Enosys\projects\allPONS.qpr* - Enosys Query Builder
              3
                  RUN & TEST XQUERY

                                                                           XML RESULT
                                                         XQUERY BASED ON
                                                           DESIGN SPECS
                     Architecture for Large-Scale
              Data Integration System and Design Tools

               Web                                                 How can the user query and
               Domain        Web Forms
                             & Reports                             Browse the integrated data?

                                                                  QURSED
                                                                   What queries can my app issue?
Developer      Application
               Domain           
                             Application
                                               
                                            Application
                                                                   What integrated view services
                                                                   can I build?
                                                                   CLIDE


              Integration
               Domain                      Global View        Web
Integration
                                Mediator                                      Cache
 Engineer                                    Schema         Services
                                                                            (Metadata)

               Source
                                                                   How do I export my database
               Domain
 
 Source
                          Web
                         Service
                                       Web     Web
                                      Service Service   …
                                                                   services functionality?
                                                                   RIDE-Services
 Owner
                                                                   How do I export my data?
               Source                         Source
                         Data         Data                …        RIDE
               Schema                         Schema
                        Source       Source
       Dual Interactive Registration Problems

 Register Source Given Global            Register Client Given Sources
Schema, Constraints &Queries
                                         New App New Query          Guide the
Apps          Queries

                                                         ??
                                                                     client in

              ??                                                   query/form
                                                                     writing




Global View                                Global View


                        Guide the
                        source owner
                        in registering
                        a new source
                        and services


       New Source and Services                   Source Services
                   Source Data Registration

              Server Side      • How do my source
                                 attributes map to global
Apps          Queries
                                 attributes
              ??                  – mappers & automatic
                                    matchers
                               • How do my data relate to
                                 queries & other sources
Global View                       – Inconsistencies?
                                  – What takes to
                                    contribute to queries?

                                 – How much should I
                                    clean up?
                               • Multiple ways of dealing
                                 with redundancy
          New Source
                   How to achieve this Goal

               Before                                  Now

Apps          Queries                  Apps          Queries

              ??            
                        Look at all                  ??           
                                                                Follow the
                        sources &
                                                               suggestions
                         queries                                   of the
                                                                interface
                        Decide how
Global View                            Global View
                         to register
                        your source


                                                     
                                                           Source
                                                           Registration
                                                           Tool
                                                                          
          New Source                             New Source
Our Goal in Source Registration




    Guide the source owner visually through
    the registration of the source


    so as to avoid/warn about
    (potential) inconsistencies
    and contribute information to
    the answer of the queries
    while exposing the minimum information
    possible and/or minimizing effort
                               The Problem



              Client Queries

                  ?
              ?       ?
  Mediator
(Global DB)




  Sources (Actual Local DBs)

                                             17
                  The Contribution Problem



              Client Queries   • What is the contribution of source S

                  ?              to the result of the query Q?

              ?   Q   ?
  Mediator
(Global DB)




                          S

  Sources (Actual Local DBs)

                                                                        18
                               The Problem



              Client Queries      • What is the contribution of source S

                 ?                  to the result of the query Q?

                 Q
                                   Q: cars

                                         S is Self Sufficient w.r.t. Q
  Mediator
(Global DB)
                                   Q: cars JOIN reviews
              reviews cars               S is Now Complementary w.r.t. Q

                          S

  Sources (Actual Local DBs)

                                                                           19
Relational Schemas: Local and Global


       • Relational Schemas

       • Visual Representation                        relations

?
          S1                S2             G
            make              auto             car
             Carmake           Id               Model
             Origin            Model            Carmake
             Sales             Carmake          Doors
                              detail            Baseprice
                               Id              brand
                               Engine           Carmake
                               Baseprice        Origin
                                                     attributes
             Source 1         Source 2          Global
        Business Magazine   Car Magazine       Car Portal


                                                            20
Source Registration using GLAV Mappings



         • Source Registration:
           Correspondence between       a source schema and

 ?
                                       the global schema
                               =
               Set of Mapping Constraints of the form

                             (U  V)

                   CQ=                    CQ=
                   over source         over global
                   schema              schema


         • Open World

         • Global and Local As View (GLAV)

                                                              21
       Target Constraints



    • Constraints on the global schema
                            =

?
              Set of Constraints of the form

                         (U  V)

               CQ=                    CQ=
               over global         over global
               schema              schema


    • Also Expresses Dependencies (PKs, Ref Integrity, …)




                                                       22
Visual Representation of Mappings (1)


       • Visual Representation (IBM Clio)

           S1                         G
?            make
              Carmake
                         make
                          C
                                          car
                                           Model
              Origin      O                Carmake
              Sales       S                Doors
                                           Baseprice
                                          brand        brand
                                           Carmake       C
                                           Origin        O




           Business Magazine: Provides Carmake and Origin


                     U1(C, O) :- make(C, O, S)
        (U1  V1)
                     V1(C, O) :- brand(C, O)

                                                               23
Visual Representation of Mappings (2)


       • Visual Representation (IBM Clio)

           S2                         G
?            auto
              Id
                          auto
                            I
                                          car
                                           Model
                                                       car
                                                        M
              Model        M               Carmake      C
              Carmake      C               Doors        ?
             detail                        Baseprice    B
                          detail
              Id                          brand
                            I
              Engine        E
                                           Carmake
              Baseprice     B              Origin




           Car Magazine: Provides Model, Carmake and Baseprice




                                                                 24
        Query Semantics


    • Queries in UCQ=

    • Set of Possible Global Instances
?       Set of global instances that satisfy all constraints

    • Query Answers = Set of Certain Answers
        The tuples appearing in the answer to Q for any
        possible global instance



                            Q                 
                                 Answer to Q
                                                  Certain
          Possible global       for any of the
                                                  Answers
            instances           possible global
                                                    to Q
                                  instances
                                                               26
               Source Instance’s Contribution



• For given instances of the sources

 Contribution to Q of Source Instance
                                        Answer to
                                           Q        -   Answer to
                                                           Q




                  =

    The tuples in answer of Q not
    provided by the other sources




                                                               27
         Source Registration’s Contribution


• Source Registration: Source Mappings

• Degrees of Source Registration’s Contribution

          Self Sufficient                   More contribution

          Now Complementary

          Later Complementary

          Unusable                          Less contribution




                                                                 28
          Self Sufficient Registration: Example


Example
                                                      car
                                                    Model
                 Baseprices of Models              Carmake
                                                    Doors

     ?                                             Baseprice


                                  G
                                      car             car
                                       Model          M3
                                       Carmake       BMW
                                       Doors           ?
                                       Baseprice     45K
                                      brand
                                       Carmake
                                       Origin



                                      Green Registration is
                                        Self Sufficient
                                                               29
       Self Sufficient Registration: Definition


 Self Sufficient

      Source instance        Answer to
                                 Q        -   Answer to
                                                 Q
                                                          


           s.t.

The source has a non empty
contribution in the absence
    of the other sources
                              XX              XX

                                                           30
  Now Complementary Registration: Example


Example
                                              car
                                            Model        brand
           Baseprices of Models
                                           Carmake      Carmake
           by German manufacturers          Doors       Origin =

     ?                                     Baseprice   ‘Germany’



                          G
                              car             car
                               Model          M3
                               Carmake       BMW
                               Doors           ?
                               Baseprice     45K
                              brand
                                            brand
                               Carmake
                                             BMW
                               Origin      Germany



                            Green Registration is
                           Now Complementary
                                                              31
  Now Complementary Registration: Definition


 Now Complementary

    Not Self Sufficient

            &
                              Answer to
                                 Q        -   Answer to
                                                 Q
                                                          


     Source instances



           s.t.

The source has a non empty
contribution in combination
  with the other existing
          sources

                                                           32
 Later Complementary Registration: Example


Example
                                              car
                                            Model        brand
           Baseprices of Models
                                           Carmake      Carmake
           by German manufacturers          Doors       Origin =

     ?                                     Baseprice   ‘Germany’



                          G
                              car             car
                               Model          M3
                               Carmake       BMW
                               Doors           ?
                               Baseprice     45K
                              brand
                                            brand
                               Carmake
                                             BMW
                               Origin      Germany



                               Green Registration is
                              Later Complementary
                                                              33
  Later Complementary Registration: Definition


 Later Complementary


                                          -
   Not Self Sufficient &
                              Answer to       Answer to
 Not Now Complementary                                    
                                 Q               Q
             &
  Potential future sources



   & Source instances


            s.t.
The source has a non empty
contribution in combination
  with the future sources
                                                           34
          Unusable Registration: Example


Example
                                                            brand
              Origin of Carmakes                           Carmake
                                                            Origin

     ?
                              G
                                   car
                                    Model
                                    Carmake
                                    Doors
                                    Baseprice
                                   brand
                                    Carmake
                                    Origin



                                   Green Registration is
                                       Unusable
                                                                 35
          Unusable Registration: Definition


 Unusable

   Not Self Sufficient &
Not Now Complementary &
Not Later Complementary
                             Answer to
                                Q        -   Answer to
                                                Q
                                                         =   


           

 The source has a empty
contribution regardless of
 what sources enter the
         system




                                                             36
          Subtleties for Unusable Registrations


Example
                                                     car
                 Baseprices and Doors              Model
                                                  Carmake
                 of Models                         Doors

     ?                                            Baseprice


                                 G
                                     car             car       car
                                      Model          M3        M3
                                      Carmake       BMW       BMW
                                      Doors           ?         2
                                      Baseprice     45K         ?
                                     brand
                                      Carmake
                                      Origin



                                     Green Registration is
                                         Unusable
                                                                     37
In presence of PK Unusable Example becomes
            Later Complementary

Example
                                              car
          Baseprices and Doors              Model
                                           Carmake
          of Models                         Doors

     ?                                     Baseprice


                          G
                              car             car    car    car
                               Model          M3     M3     M3




                                       
                               Carmake       BMW    BMW    BMW
                               Doors           ?      2      2
                               Baseprice     45K    45K      ?
                              brand
                               Carmake
                               Origin



                               Green Registration is
                              Later Complementary
                                                                  38
                       Decidability Results


Overview: What is decidable


                                     Target constraints
                        None   Primary keys   Primary keys + Referential
                                                 Integrity Constraints

     Self Sufficient
                         Yes       Yes                    No
 D
 e
     Now
 g                       Yes       Yes                    No
     complementary
 r
 e   Later
 e   complementary       Yes       Yes                    ?
 s
     Unusable
                         Yes       Yes                    ?




                                                                           39
                                    Issues


Unique client query   Vs   Multiple client queries                             
                           Contribute to: -   all queries?
                                          -   one query?
                                          -   specific queries?
                                          -   some queries based on some ranking?



Data independence     Vs   Data dependence                                     
                           e.g. M1: cars, refPrices
                                M2: reviews
                                Q: cars JOIN reviews JOIN refPrices

                                DB1: cars, refPrices (Audis)
                                DB2: reviews (Hondas)

                                (M2, Q) now-complementary
                                but Certain Answers for Instances DB1, DB2 =   
                        Putting it all together


Architecture                                        Architecture

Query Answering / Mappings / Schemas
                                         Query                    ?   Q
Contribution
4 categories:
Self Sufficient / Now Complementary /    Global
Later Complementary / Unusable           Schema                   S’


Goal
                                         Mappings        M1   …   Mn       Mn+1
Guide the source owner visually
through the registration of the source
                                         Local
so as to raise contribution to the       Schemas    S1        …       Sn     Sn+1
answer of the queries

while exposing the minimum info                          Registered         New
possible and/or minimizing effort                         sources          source
                             Example 1

         Without primary keys in the target

Local Schemas     Global Schema                            Query

  AutoTrader        Community                            AppQuery
   car *             car *                                car *
    id                 model                                model
    cmodel             drive                                drive
   ad *              review *                             review *
    vin                model                                model
    carId              quality                              quality
    price
                     usedAd *                             usedAd *
                       vin                                  vin
                       model                                model
                       price                                price
                     refPrice *                           refPrice *
                       model                                model
                       condition                            condition
                                                            price
                       price



   Unusable       BLUE: Map at least one of the groups
                             Example 1

         Without primary keys in the target

Local Schemas     Global Schema                 Query

  AutoTrader        Community                 AppQuery
   car *             car *                     car *
    id                 model                     model
    cmodel             drive                     drive
   ad *              review *                  review *
    vin                model                     model
    carId              quality                   quality
    price
                     usedAd *                  usedAd *
                       vin                       vin
                       model       = cmo         model
                       price                     price
                     refPrice *                refPrice *
                       model                     model
                       condition                 condition
                                                 price
                       price



   Unusable
                             Example 1

         Without primary keys in the target

Local Schemas     Global Schema                 Query

  AutoTrader        Community                 AppQuery
   car *             car *                     car *
    id                 model                     model
    cmodel             drive                     drive
   ad *              review *                  review *
    vin                model                     model
    carId              quality                   quality
    price
                     usedAd *                  usedAd *
                       vin                       vin
                       model       = cmo         model
                       price       = price       price
                     refPrice *                refPrice *
                       model                     model
                       condition                 condition
                                                 price
                       price


    Later
Complementary
                             Example 2

         With primary keys in the target

Local Schemas     Global Schema              Query

  AutoTrader        Community              AppQuery
   car *             car *                  car *
    id                 model                  model
    cmodel             drive                  drive
   ad *              review *               review *
    vin                model                  model
    carId              quality                quality
    price
                     usedAd *               usedAd *
                       vin                    vin
                       model                  model
                       price                  price
                     refPrice *             refPrice *
                       model                  model
                       condition              condition
                                              price
                       price



   Unusable
                             Example 2

         With primary keys in the target

Local Schemas     Global Schema                Query

  AutoTrader        Community                AppQuery
   car *             car *                    car *
    id                 model                    model
    cmodel             drive                    drive
   ad *              review *                 review *
    vin                model
    carId                                       model
                       quality                  quality
    price
                     usedAd *                 usedAd *
                       vin         = vin        vin
                       model                    model
                       price       = price      price
                     refPrice *               refPrice *
                       model                    model
                       condition                condition
                                                price
                       price


    Later
Complementary
                           Lessons learned

Target constraints make a difference

   To merge data with that of other sources (become complementary):

        Pick a relation and provide…

              In absence of primary keys
              …all its attributes asked by the query

              In presence of primary keys
              …its primary key and one of its attributes asked by the query


    The number of choices increases in presence of primary keys


   Foreign keys on the target affect the suggestions
              Large-Scale Data Integration Systems

               Web                                            How can the user query and
               Domain        Web Forms
                             & Reports                        Browse the integrated data?

                                                             QURSED
Developer      Application
               Domain           
                             Application
                                               
                                            Application
                                                              What queries can the
                                                              mediator answer for me?
                                                              CLIDE


 
Integration
               Integration
               Domain                      Global View
                                Mediator     Schema
 Engineer




               Source
                                                              How do I export my database
               Domain
 
 Source
                          Web
                         Service
                                       Web     Web
                                      Service Service   …
                                                              services functionality?
                                                              RIDE-Services
 Owner
                                                              How do I export my data?
               Source                         Source
                         Data         Data                …   RIDE
               Schema                         Schema
                        Source       Source
                           Running Example

                         Parameterized Views

                 Dell                                       Cisco
Schema                                      Schema
   Computers(cid, cpu, ram, price)
   NetCards(cid, rate, standard,               Routers(rate, standard, price, type)
   interface)
                                            Views
Views                             Computers
                                              cpu
                                  for a given V3 RouByTypeW()  (Router)*
V1 ComByCpu(cpu)  (Computer)*
   SELECT DISTINCT Com1.*
                                                 SELECT DISTINCT Rou1.*   Wired
   FROM Computers Com1
                                                 FROM Routers Rou1        Routers
   WHERE Com1.cpu=cpu             Computers & NetCards
                                                 WHERE Rou1.type='Wired'
                                  for a given cpu & rate
V2 ComNetByCpuRate(cpu, rate) 
                                              V4 RouByTypeWL()  (Router)*
                 (Computer,
   NetCard)*                                                                Wireless
   SELECT DISTINCT Com1.*, Net1.*                SELECT DISTINCT Rou1.*     Routers
   FROM Computers Com1, Network Net1             FROM Routers Rou1
   WHERE Com1.cid=Net1.cid                       WHERE Rou1.type='Wireless'
   AND Com1.cpu=cpu
   AND Net1.rate=rate
                                  Running Example

                                   Global Schema



                        
                    Developer
Application



                    Global
                                     • Global schema puts together
         Mediator
                    Schema             the Dell and Cisco schemas
                                     • Resembles the schema of
                                       CNET.com portal
 V1          V2     V3       V4


                                     Column Associations
      Dell           Cisco
                                     • (Computers.cid, NetCards.cid)
                                     • (NetCards.rate, Routers.rate)
Sophisticated Mediators Make Feasibility Hard to Predict


 Feasible Queries FQ
 • Equivalent CQ query rewritings using the views
 • Might involve more than one views
 • Order might matter
 Query: Feasible                                                         Query:Infeasible
 Get all ‘P4’ Computers, together with their                             Get all Computers
   NetCards
 and their compatible ‘Wireless’ Routers
                  Computers.*     NetCards.*      Routers.*        E
                A12             A12    .11 US    .11     Wirele
                     P4 512 400     10        10     50
                3               3      b B       b       ss
                B12     102     B12    .11 US    .11     Wirele
                3
                     P4
                 Routers.* 550 3
                        4
                                    54
                                       g B
                                              54 Computers.*
                                                 g
                                                     120
                                                         ss         NetCards.*
         B      .11     Wirele                    A12              A12    .11 US   D
             10      50                               P4 512 400       10
                 b        ss                      3                3      b B
                .11     Wirele     Mediator       B12    102       B12    .11 US
             54     120                               P4     550       54
                 g        ss                      3      4         3      g B
                    A                               C
               RouByTypeW               ComNetByCpuRate(‘P4’,
                  L()                   ComNetByCpuRate(‘P4’,
                                                        ‘10’)
                                                        ‘54’)
          V4                                                        V2
                    Problem


1. Large number of sources
2. Large number of views
3. Mediator capabilities

Developer formulates an application query
 Is an application query feasible?
 If not, how do I know which ones are feasible?

Previous options:
   – The developer had to browse the view
     definitions and somehow formulate a feasible
     query
   – Or formulate queries until a feasible one is
     found
                              The CLIDE Solution

                             CLIDE


                        
                    Developer
Application

                                     A query formulation
         Mediator
                    Global            interface, which
                    Schema
                                      interactively guides the
                                      user toward feasible
 V1          V2     V3       V4
                                      queries by employing a
                                      coloring scheme

      Dell           Cisco
QBE-Like Interfaces

Microsoft SQL-Server
       CLIDE Interface



                        Feasibility Flag
          Table Alias
                         Selection Boxes




                     Table Boxes
            Projection Boxes




• Table, selection, projection and join
  actions
• Color-based suggestions
• Feasibility Flag
       CLIDE Interface

         Snapshot 1




Yellow  required action
  – All feasible queries require
     this action

White  optional action
  – Feasible queries can be
               CLIDE Interface

                     Snapshot 2




Blue  required choice of action
   – At least one feasible (next) query cannot
     be formulated unless this action is
     performed             C
                                 ram
                                     pric
                Mediator                                e
                               cp      pric        512 400
                          cid     ram             1024 550
                               u        e
     A ComByCpu(‘P        A123 P4 512 400     B
               4’)        B123 P4 1024 550

                     V1
       CLIDE Interface

          Snapshot 3




Join Lines:
• Only yellow and blue are
  displayed
• Must appear in Column
  Associations
CLIDE Interface

 Snapshot 4
         CLIDE Interface

           Snapshot 5




• *  any other constant
• Red  prohibited action
   – Does not appear in any
     feasible query
   – Lead to “Dead End” state
                                 CLIDE Interface

                                   Snapshot 6




                               pric      interfac pric
                           ram      rate                 F
                               e         e        e
                           512 400 10      USB     50
                           102
                               550 54      USB    120
                           4
                                   Mediator

    A              Routers.*
                                                     D            Computers.* NetCards.*
RouByTypeW        .11     Wirele         ComNetByCpuRate(‘P4’,    A12            A12    .11
        L()    10     512                                             P4 512 400     10     50
                   b        ss                          rate)     3              3      b
                  .11 102 Wirele                                  B12    102   EB12 54 .11 120
            V4 54 g B  4    ss                                 V2 3 P4 4     550
                                                                                 3      g
               CLIDE Facts




• Rapid Convergence
  – At every step, yellow and blue actions
    lead to a feasible query in a minimum
    number of steps
• Completeness of Suggestions
  – Every feasible query can be formulated
    by performing yellow and blue actions
    at every step
• Minimality of Suggestions
  – At every step, only a minimal number of
                              Interaction Graph

                                    Selection      Table      Join Action
                                     Action        Action
s   Com1   Com1.ram   Com1.price   Com1.cpu=‘P4’    Net1    Com1.cid=Net1.cid   Rou1   …
     …




            …




                        …




                                    …




                                                     …




                                                             …




                                                                                 …




                                                                                       …




                                                                                           …
    • Nodes are queries
       – One for each qCQ
    • Edges are actions
       – Table, selection, projection and join actions
    • Green nodes are feasible queries
    • Infinitely big structure
       – All CQ queries
                Interaction Graph: Colorable Actions




Current Node    Com1.cid
                              …
                Com1.cpu
                              …




               Com1.cid=*
                                  • Colorable actions AC label
                              …




               Com1.cpu=*

                                    outgoing edges of the current
                              …




               Com1.ram=*
                              …




               Com1.price=*
                                    node
                              …




                  Net1
                              …




                  Rou1
                              …




                  Com2
                              …
                         Interaction Graph: Colors


                                         • Yellow action 
                                            – Every path from current node
                                              n to a feasible node contains 
                                         • Blue action 
                                            – At least one feasible query
                                              cannot be formulated unless
Current
  Node
           Com1.cid
                                              this action is performed
                              …




                                              (minimality)
           Com1.cpu
                              …




          Com1.cid=*                     • Red action 
                                                             to a feasible
                                            – No path Net1.rate=’54Mbps’ node
                              …
                          …




          Com1.cpu=*              Net1     Com1.cid=Net1.cid
                                              contains 
                              …




                                             …




                                                                  …




                                                                                         …
          Com1.ram=*
                                                                 Com1.cid=Net1.cid
                              …




                                           Net1.rate=’54Mbps’
                          …




          Com1.price=*



                                                                   …
                              …
                          …




                              Com1.cpu=*   Rou1     Com1.cid=Net1.cid   Net1.rate=Rou1.rate     …
             Net1
                                                      …




                                                                          …




                                                                                               …




                                                                                                     …
                              …




             Rou1                          Com2
                              …




             Com2                           Com2.cid=Net1.cid   Com2.cpu=‘P4’   Net1.rate=‘54Mbps’
                              …




                                             …




                                                                  …




                                                                                     …




                                                                                                     …
Color Determined By a Finite Set of Feasible Queries

       Challenge: Infinitely Many Feasible Queries

                            …
                                 ?

                                     …
        Closest       n                          Infinitely
        Feasible                             …
        Queries FQC                      …         many
                                     …
                                                 feasible
                                                   queries
                            …




 • Start by considering the closest feasible
   queries FQC
 • FQC is sufficient to color actions in AC
 • Theorem: Set of Closest Feasible Queries is
   Finite

 • How far can closest feasible queries FQC be?
                            Color Algorithm

               Maximally Contained Queries FQMC

  Query: Q1
                                          Maximally Contained Query
  Get all Computers
                                          Query: Q2
                                          Get all Computers
                                          with a given cpu

Maximally Contained Query                 Not Maximally Contained

Query: Q4                                 Query: Q3
Get all Computers                         Get all Computers
with a given ram                          with a given cpu & ram



  • Assuming fixed SELECT clause (projection list)
  • Covered extensively in literature
     – MiniCon, Bucket, InverseRules
  • FQMC is finite
                      Color Algorithm

          Maximally Contained Queries FQMC

                            …
                                        Maximally
                                        Contained
                                   …
        Closest        n                Queries FQMC
        Feasible                              …
        Queries FQC                     …
                                   …
                            …




• Compute maximally contained queries FQMC
• The radius pL is the longest path to a node n’
  such that q(n’) in FQMC

All FQC queries are reachable via a path of
  length p  pL
                     Color Algorithm

          More on Closest Feasible Queries

                                …               Maximally
                                                Contained
                                                Feasible
                                           …
       Closest        n                         Queries FQMC
       Feasible                                       …
       Queries FQC                              …
                                           …
                                …


                          More feasible nodes




• Theorem: All queries in FQMC are in FQC
• But not all queries in FQC are in FQMC
                     Color Algorithm

          More on Closest Feasible Queries

                           …           Maximally
                                       Contained
                                       Feasible
                                   …
       Closest        n                Queries FQMC
       Feasible                              …
       Queries FQC                     …
                                   …
                           …




• Naïve Approach
  – Start from n and explore paths up to length pL
                    Color Algorithm

                    Collapse Aliases

                             …             Maximally
                                           Contained
                                           Feasible
                                       …
      Closest        n                     Queries FQMC
      Feasible                                   …
      Queries FQC                          …
                                       …
                             …




• Collapse Aliases to compute FQC \ FQMC
• Check satisfiability
                    Color Algorithm

Coloring Non-Projection Actions
• No interaction graph materialization
• Use of containment mapping from current query to the
  closest feasible ones
• An action  is colored
   – Yellow, if  is mapped into all queries in FQC
   – Red, if  is not mapped into any query in FQC
   – Blue, if  is mapped into at least one query qF in FQC, no
     other action in AP is mapped into qF, and  is neither
     yellow nor red

Coloring Projection Actions
• Never colored yellow
• Can be colored blue only if
   – the current query is feasible
   – it is not colored red
                          CLIDE Implementation

                 Other Back-End
                 Parameterized Views Back-End
 Developer
                                                Optimal
Action                    Maximally            Maximally              Closest
          Current         Contained           Contained               Feasible             Colored
          Query            Queries Containment Queries     Collapse   Queries     Color    Actions
Front-End         MiniCon
                                       Test                 Aliases              Actions

                                                                         Column
                  Views                                                               Schemas
                                                                       Associations




    MiniCon
    • Outputs redundant and non-minimal queries
    • Affects CLIDE’s rapid convergence and minimality
      properties
    Containment Test
    • Well-known NP-complete problem
    • Polynomial when query is acyclic
    Collapse Aliases / Color Actions
                             CLIDE Performance

                                  Chains of Stars



               14
               12   112   Views
               10   168   Views
  Time (sec)




                8   224   Views
                6   280   Views
                4
                2
                0




               (1 0)

               (1 1)

               (1 2)

               (1 3)

                       )
               (1 9)
                      )

                      )

                      )

                      )

                      )

                      )

                      )

                (1 )




                   14
                    ,1

                    ,2

                    ,3

                    ,4

                    ,5

                    ,6

                    ,7

                    ,8
                  0,

                   1

                   1

                   1

                   1
                 (2

                 (2

                 (4

                 (4

                 (6

                 (6

                 (8

                 (8



                 0,

                 2,

                 2,

                 4,

                 4,
                          (# of Joins, # of Selections) in Current Query

                                                                     C1
  Schem
  View
• Querie

                                                                    …
                                                                    … … …
  s
  a                                        B1                          C
                                           B1                        C1 i1
                                                                      Ci   A-span = 7
                                                                       CiM
                                          …
                                          … … …




                                           B2Bi1
                                                                        …
                    A
                                             Bi                       CL B-span = 4
                                                                     C1
                                             BiM                            Selections =
                                                                              4,6,8,10
                                              ……




                                              BK

								
To top