					  Semantic Matching
         Fausto Giunchiglia

work in collaboration with Pavel Shvaiko




        The Italian-Israeli Forum on Computer Science,
        Haifa, June 17-18, 2003

  Outline

Matching

Syntactic Matching

Semantic Matching

On Implementing Semantic Matching

Conclusions







                         MATCHING





   Application Domains

Generic Model Management
  Schema integration
  Data warehouses
  E-commerce
  Semantic query processing


Data Coordination in P2P systems





     Matching Problems

1.   RDB Schemas

2.   OODB Schemas

3.   XML Schemas

4.   Concept Hierarchies

5.   Ontologies



Example of Matching
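[Figure: the Arts subtree of the www.google.com directory (Arts, Art History, Music, Organizations, History, Baroque) beside the Arts&Humanities subtree of www.yahoo.com (Arts&Humanities, Art History, Design Art, Organizations, Architecture, History, Baroque), with links between corresponding nodes annotated with similarity coefficients (Sc) and semantic relations (Sr)]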

    Matching

Match is an operator that takes two graph-like
structures (e.g., database schemas or ontologies)
and produces a mapping between elements of the
two graphs that correspond semantically to each
other





   Matching

The problem of matching can be decomposed into
two steps:

    Extract graphs from the data and conceptual models

    Match the resulting graphs (generic matching)





    Matching
Mapping element is a 4-tuple $\langle mID, N_{i_1}, N_{j_2}, R \rangle$, $i = 1, \dots, h$; $j = 1, \dots, k$,
where
   mID is a unique identifier of the given mapping element;
   $N_{i_1}$ is the i-th node of the first graph,
     h is the number of nodes in the first graph;
   $N_{j_2}$ is the j-th node of the second graph,
     k is the number of nodes in the second graph;
   R specifies a similarity relation between the given nodes

Mapping is a set of mapping elements

Matching is the process of discovering mappings between
two graphs through the application of a matching algorithm

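These definitions can be sketched as Python data types. This is a minimal illustration under our own assumptions: nodes are identified by their index within each graph, R is left generic (a coefficient for syntactic matching, a relation symbol for semantic matching), and the names `MappingElement` and `make_mapping` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Generic, Iterable, Tuple, TypeVar

R = TypeVar("R")  # type of the relation: a float, a relation symbol, ...


@dataclass(frozen=True)
class MappingElement(Generic[R]):
    """The 4-tuple <mID, N_i1, N_j2, R>."""
    mid: str  # unique identifier of the mapping element
    i: int    # index of the node in the first graph, 1..h
    j: int    # index of the node in the second graph, 1..k
    r: R      # relation holding between the two nodes


def make_mapping(triples: Iterable[Tuple[int, int, R]], h: int, k: int) -> FrozenSet[MappingElement]:
    """A mapping is a set of mapping elements; ids m1, m2, ... are assigned in order."""
    elements = set()
    for n, (i, j, r) in enumerate(triples, start=1):
        if not (1 <= i <= h and 1 <= j <= k):
            raise ValueError(f"node pair ({i}, {j}) is out of range")
        elements.add(MappingElement(f"m{n}", i, j, r))
    return frozenset(elements)


mapping = make_mapping([(5, 2, "="), (1, 1, "=")], h=5, k=5)
print(len(mapping))  # 2
```

A frozen dataclass is used so that mapping elements are hashable and a mapping can literally be a set of them.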


Matching: Syntactic AND Semantic

Syntactic Matching
•R is computed between labels at nodes
•$R \in [0,1]$

Semantic Matching
•R is computed between concepts at nodes
•R = {set-theoretic relations, e.g., $=, \subseteq, \supseteq, \perp, \cap$}








              SYNTACTIC MATCHING





    Syntactic Matching
Mapping element is a 4-tuple $\langle mID, L_{i_1}, L_{j_2}, R \rangle$, where
   $L_{i_1}$ is the label at the i-th node of the first graph;
   $L_{j_2}$ is the label at the j-th node of the second graph;
   R specifies a similarity relation in the form of a
  coefficient, which measures the similarity between the
  labels of the given nodes

Example: with R a similarity coefficient in [0,1], a mapping element is
   <m21, telephone, phone, 0.7>

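As an illustration only, a label-level coefficient in [0,1] can be computed with Python's standard `difflib.SequenceMatcher`, whose `ratio()` is 2M/T (M matching characters, T total characters in both strings). This is not the measure used by Cupid or the other systems discussed here, and the helper names are hypothetical; the sketch just shows how a syntactic matcher turns two labels into a number.

```python
from difflib import SequenceMatcher


def label_similarity(label1: str, label2: str) -> float:
    """Syntactic similarity of two node labels as a coefficient in [0, 1]."""
    return SequenceMatcher(None, label1.lower(), label2.lower()).ratio()


def syntactic_element(mid: str, label1: str, label2: str) -> tuple:
    """Build a <mID, L_i1, L_j2, R> mapping element with R rounded to 2 digits."""
    return (mid, label1, label2, round(label_similarity(label1, label2), 2))


print(syntactic_element("m21", "telephone", "phone"))
# ('m21', 'telephone', 'phone', 0.71)
```

Here the five matching characters of "phone" give 2 * 5 / 14, about 0.71; any other string measure could be plugged into `label_similarity`.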




Example: Cupid (tentative links and final result)
[Figure: the www.google.com and www.yahoo.com Arts subtrees, with the links found by Cupid annotated with similarity coefficients Sc between 0.7 and 1.0]


   The State of the Art
Cupid
  … is a hybrid matching prototype. It exploits linguistic and structural
  schema matching heuristics, and computes similarity coefficients
  between nodes of the trees.

Similarity Flooding
  … is a hybrid matching prototype. It uses fix-point computation to
  determine correspondences between nodes of the graphs.

COMA
  …is a composite matching prototype. It provides an extensible library
  of different matchers which manipulate DAGs and supports various
  ways of combining final results.

     As far as we know, so far only syntactic matching…





               SEMANTIC MATCHING





    Semantic Matching

Mapping element is a 4-tuple $\langle mID, C_{i_1}, C_{j_2}, R \rangle$, where
   $C_{i_1}$ is the concept of the i-th node of the first graph;
   $C_{j_2}$ is the concept of the j-th node of the second graph;
   R specifies a similarity relation in the form of a semantic
  relation between the extensions of concepts at the given
  nodes
Possible R’s:
   equality $\{=\}$,
   overlapping $\{\cap\}$,
   mismatch $\{\perp\}$,
   more general/specific $\{\supseteq, \subseteq\}$
Example: a mapping element is <m21, telephone, phone, {=}>

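Because R relates the extensions of the concepts, the five relations can be illustrated on finite, explicitly listed extensions. This is a toy assumption (real concepts are not given by enumeration), and the order of the checks, from the most to the least informative relation, is our own choice.

```python
def semantic_relation(ext1: set, ext2: set) -> str:
    """Set-theoretic relation between two finite, explicitly listed extensions."""
    if ext1 == ext2:
        return "="   # equality
    if ext1 > ext2:
        return "⊇"   # proper superset: first concept is more general
    if ext1 < ext2:
        return "⊆"   # proper subset: first concept is less general
    if ext1 & ext2:
        return "∩"   # overlapping
    return "⊥"       # mismatch: disjoint extensions


# Toy extensions, assumed for illustration only.
telephone = {"desk phone", "mobile phone"}
phone = {"desk phone", "mobile phone"}
device = {"desk phone", "mobile phone", "printer"}
print(semantic_relation(telephone, phone))  # =
print(semantic_relation(device, phone))     # ⊇
```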



Examples: Analysis of Siblings
[Figure: two trees, each rooted at A (node 1) with is-a children B, C, D and E; the children are numbered B 2, D 3, E 4, C 5 on the left and C 2, D 3, E 4, B 5 on the right]

 • Suppose that we want to match nodes $5_1$ and $2_2$
 • Cupid: R = 0.8. This is because $A_1 = A_2$, $C_1 = C_2$ and we
have the same structures on both sides (the order of links does not matter)
 • A semantic matching approach compares the concepts
$A_1 \wedge C_1$ and $A_2 \wedge C_2$ and produces $C_{5_1} = C_{2_2}$





Examples: Analysis of Ancestors. Case 1
[Figure: left, a tree rooted at A (node 1) with is-a children B (2), D (3), E (4) and C (5); right, a tree rooted at C (node 1) with is-a children D (2), E (3) and A (4), where A has the is-a child B (5)]

 • Suppose that we want to match nodes $5_1$ and $1_2$
 • Cupid does not find a similarity coefficient between the
nodes under consideration, due to the significant
differences in structure of the given graphs
 • Semantic matching: the concept denoted by the label
at node $5_1$ is $C_1$, while the concept at node $5_1$ is $C_{5_1} = A_1 \wedge C_1$.
The concept at node $1_2$ is $C_{1_2} = C_2$. Thus, $C_{5_1} \subseteq C_{1_2}$





Examples: Analysis of Ancestors. Case 2
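[Figure: left, a tree rooted at A (node 1) with is-a children B (2), D (3), E (4) and C (5); right, a tree rooted at A (node 1) with is-a children D (2), E (3) and a node marked * (…), where * has the is-a child C (5)]

 • Suppose that we want to match nodes $5_1$ and $5_2$
 • Cupid: R = 0.86. This is because of the identity of labels
$A_1 = A_2$, $C_1 = C_2$
 • Semantic matching: the concept at node $5_1$ is $C_{5_1} = A_1 \wedge C_1$,
while the concept at node $5_2$ is $C_{5_2} = A_2 \wedge \ast \wedge C_2$.
Since $A_1 = A_2$ and $C_1 = C_2$, we have $C_{5_2} \subseteq C_{5_1}$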





Examples: Enriched Analysis of Siblings
[Figure: left, World (node 1) with is-a children Benelux (2), Netherlands (3) and Luxembourg (4); right, World (node 1) with the is-a child Belgium (2)]
 • Suppose that we want to match nodes $2_1$ and $2_2$
 • Cupid: R = 0.68. This is mainly because of the entry in
the thesaurus specifying Belgium as a part of Benelux,
and because the nodes with labels $\mathit{Benelux}_1$
and $\mathit{Belgium}_2$ are leaves
 • Semantic matching: we treat $C_{2_1}$ as
$\mathit{Benelux}_1 \wedge \neg\mathit{Netherlands}_1 \wedge \neg\mathit{Luxembourg}_1 = \mathit{Belgium}$.
Thus, $C_{2_1} = C_{2_2}$

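The set-theoretic reading of this step can be checked on toy extensions: assuming the label Benelux denotes the three countries, intersecting it with the complements of its siblings Netherlands and Luxembourg (a set difference) leaves exactly the extension of Belgium. The sets below are illustrative assumptions, and the shared root World is ignored because it is the same on both sides.

```python
# Toy extensions of the labels, assumed for illustration only.
BENELUX = {"Belgium", "Netherlands", "Luxembourg"}
NETHERLANDS = {"Netherlands"}
LUXEMBOURG = {"Luxembourg"}
BELGIUM = {"Belgium"}

# Benelux ∧ ¬Netherlands ∧ ¬Luxembourg, read on extensions:
# remove the extensions of the sibling nodes from that of Benelux.
node_2_1 = BENELUX - NETHERLANDS - LUXEMBOURG
node_2_2 = BELGIUM

print(node_2_1 == node_2_2)  # True
```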






                      ON
                 IMPLEMENTING
               SEMANTIC MATCHING





On Implementation

                     Semantic Matching




   Element-level                             Structure-level


                          Weak Semantics
                           Techniques

                         Strong Semantics
                            Techniques



    Element-level Semantic Matching

Weak Semantics Techniques
    Analysis of strings {=}
      <phone, telephone, {=}>
    Analysis of data types $\{=, \subseteq, \supseteq, \perp, \cap\}$
      <string, integer,{}>
      <integer, real, $\{\subseteq\}$>
    Analysis of soundex {=}
      <Fausto, Phausto, {=}>

Strong Semantics Techniques
    Precompiled thesaurus
      syn key <Discount, Rebate, {=}>
    WordNet
      <Art_#1, Humanities_#1,{}>, where #1 denotes sense number 1 of
     the word Art according to WordNet

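The soundex technique can be sketched with the classic American Soundex code. One assumption needs flagging: standard Soundex keeps the first letter, so "Fausto" and "Phausto" get F230 and P230. The `encode_first` flag (our own addition, not part of standard Soundex) also codes the first letter as a digit; since F and P share digit 1, the two names then receive the same code, which supports the relation {=} above. `soundex_relation` is a hypothetical helper name.

```python
from typing import Optional

# Soundex digit for each consonant; vowels, y, h and w get no digit.
CODES = {
    ch: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for ch in letters
}


def soundex(word: str, encode_first: bool = False) -> str:
    """American Soundex code; encode_first=True also codes the first letter as a digit."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    prev = CODES.get(word[0], "")
    head = prev if (encode_first and prev) else word[0].upper()
    tail = []
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            tail.append(code)
        prev = code  # a vowel resets prev, so a code repeated after it counts again
    return head + "".join(tail)[:3].ljust(3, "0")


def soundex_relation(label1: str, label2: str) -> Optional[str]:
    """Return '=' if the two labels get the same first-letter-encoded code."""
    if soundex(label1, encode_first=True) == soundex(label2, encode_first=True):
        return "="
    return None


print(soundex("Fausto"), soundex("Phausto"))  # F230 P230
print(soundex_relation("Fausto", "Phausto"))  # =
```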


  Element-level Semantic Matching (cont.)

Semantic Relations via WordNet
  Equality: one concept is equal to another if there is at least
  one sense of the first concept that is a synonym of the
  second
  Overlapping: one concept overlaps with the other if
  they have some senses in common
  Mismatch: two concepts are mismatched if they have no
  sense in common
  More general: one concept is more general than the other
  iff there exists at least one sense of the first concept that
  has a sense of the other as a hyponym or meronym
  Less general: one concept is less general than the other iff
  there exists at least one sense of the first concept that has
  a sense of the other concept as a hypernym or holonym

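These rules can be sketched over a hand-made sense inventory standing in for WordNet; the words, sense identifiers and links below are hypothetical and are not WordNet data. The sketch checks equality first, follows only direct hyponym/meronym links (no transitive closure), and does not model overlapping separately, since under this reading a shared sense already yields equality.

```python
from typing import Dict, Set

# Hypothetical sense inventory (not real WordNet data):
# each word maps to the identifiers of its senses.
SENSES: Dict[str, Set[str]] = {
    "phone": {"s_telephone"},
    "telephone": {"s_telephone"},
    "device": {"s_device"},
    "car": {"s_car"},
    "wheel": {"s_wheel"},
}
HYPONYMS: Dict[str, Set[str]] = {"s_device": {"s_telephone"}}  # sense -> more specific senses
MERONYMS: Dict[str, Set[str]] = {"s_car": {"s_wheel"}}        # whole -> parts


def narrower(sense: str) -> Set[str]:
    """Senses directly below `sense` via a hyponym or meronym link."""
    return HYPONYMS.get(sense, set()) | MERONYMS.get(sense, set())


def wordnet_relation(word1: str, word2: str) -> str:
    s1, s2 = SENSES[word1], SENSES[word2]
    if s1 & s2:
        return "="  # a sense of word1 is a synonym of word2
    if any(b in narrower(a) for a in s1 for b in s2):
        return "⊇"  # word1 has a sense of word2 as a hyponym or meronym
    if any(a in narrower(b) for a in s1 for b in s2):
        return "⊆"  # word1 has a sense of word2 as a hypernym or holonym
    return "⊥"      # no sense in common and no direct link


print(wordnet_relation("device", "phone"))  # ⊇
print(wordnet_relation("wheel", "car"))     # ⊆
```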


Structure-level Semantic Matching

 We translate the matching problem, namely the
 two graphs (in particular, the pair of nodes
 submitted to matching) into a propositional
 formula and then check for its validity

 We check for validity using SAT





     Semantic Matching Algorithm
1.   Extract the two graphs

2.   Compute element-level semantic matching

3.   Compute concepts at nodes

4.   Construct the propositional formula

5.   Run SAT

6.   Perform iterations


Semantic Matching Algorithm: Example – (1)

    Extract the two graphs

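[Figure: left, a tree rooted at A (node 1) with is-a children B (2), D (3), E (4) and C (5); right, a tree rooted at C (node 1) with is-a children D (2), E (3) and A (4), where A has the is-a child B (5)]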




•   In the case of RDB, XML and OODB schemas, it is
    necessary to extract useful semantic information, for
    instance in the form of ontologies


Semantic Matching Algorithm: Example – (2)
  Element-level semantic matching. For each node,
  compute semantic relations holding among all the
  concepts denoted by labels at nodes under
  consideration
[Figure: the same two trees as in step 1]

   $A_1 = A_2$
   $B_1 = B_2$
   $C_1 = C_2$
   $D_1 = D_2$
   $E_1 = E_2$

Semantic Matching Algorithm: Example – (3)
  Compute concepts at nodes. Suppose that we want to find
  a semantic relation between nodes $5_1$ and $1_2$

[Figure: the same two trees as in step 1, with a question mark on the link between nodes $5_1$ and $1_2$]

   $C_{1_1} = A_1$
   $C_{5_1} = A_1 \wedge C_1$
   $C_{1_2} = C_2$
   $C_{5_1}$ ? $C_{1_2}$

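Computing the concept at a node can be sketched as collecting the label concepts on the path from the root and reading them as a conjunction, as in the Analysis of Ancestors examples where the concept at node $5_1$ is $A_1 \wedge C_1$. This assumes tree-shaped graphs given as parent links; the trees are deliberately partial (only the nodes on the paths used here), and the names `Tree`, `TREE_1`, `TREE_2` and `concept_at_node` are our own.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Tree = Dict[int, Tuple[str, Optional[int]]]  # node -> (label, parent node or None)
Atom = Tuple[str, int]                        # (label, graph index): ("A", 1) stands for A_1

# Partial trees of the running example: only the nodes on the paths used below.
TREE_1: Tree = {1: ("A", None), 5: ("C", 1)}
TREE_2: Tree = {1: ("C", None)}


def concept_at_node(tree: Tree, node: int, graph: int) -> FrozenSet[Atom]:
    """Concept at a node: the conjunction (as a set of atoms) of the labels from the root."""
    atoms = set()
    current: Optional[int] = node
    while current is not None:
        label, parent = tree[current]
        atoms.add((label, graph))
        current = parent
    return frozenset(atoms)


print(sorted(concept_at_node(TREE_1, 5, 1)))  # [('A', 1), ('C', 1)]
print(sorted(concept_at_node(TREE_2, 1, 2)))  # [('C', 2)]
```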




Semantic Matching Algorithm: Example – (4)
  Construct the propositional formula. We translate all
  the semantic relations computed in step 2 into
  propositional formulas under the following rules:
        $A_1 \supseteq A_2$ is translated into $A_2 \rightarrow A_1$
        $A_1 \subseteq A_2$ is translated into $A_1 \rightarrow A_2$
        $A_1 = A_2$ is translated into $A_1 \leftrightarrow A_2$
        $A_1 \perp A_2$ is translated into $\neg(A_1 \wedge A_2)$

[Figure: the same two trees as in step 1, with a question mark on the link between nodes $5_1$ and $1_2$]

  From step 2 we have: $C_1 \leftrightarrow C_2$

  We want to prove that $C_{5_1} \subseteq C_{1_2}$ (at this stage we guess
  the relation between the nodes)

  $(A_1 \wedge C_1) \rightarrow C_2$

  $(C_1 \leftrightarrow C_2) \rightarrow ((A_1 \wedge C_1) \rightarrow C_2)$

Semantic Matching Algorithm: Example – (5)

  Run SAT

  In order to prove that $(C_1 \leftrightarrow C_2) \rightarrow ((A_1 \wedge C_1) \rightarrow C_2)$ is
  valid, we prove that its negation is unsatisfiable

  $(C_1 \leftrightarrow C_2) \wedge \neg((A_1 \wedge C_1) \rightarrow C_2)$

  SAT returns FALSE

  Thus, $C_{5_1} \subseteq C_{1_2}$

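The validity check can be reproduced with a brute-force satisfiability test that enumerates all truth assignments. This stands in for a real SAT solver and is only feasible for a handful of atoms (2^n assignments); the function names are hypothetical.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


def negation(v: dict) -> bool:
    """(C1 ↔ C2) ∧ ¬((A1 ∧ C1) → C2): the negated matching formula."""
    axiom = v["C1"] == v["C2"]
    goal = implies(v["A1"] and v["C1"], v["C2"])
    return axiom and not goal


def satisfiable(formula, atoms) -> bool:
    """Brute-force SAT: try every truth assignment to the given atoms."""
    return any(formula(dict(zip(atoms, values)))
               for values in product([False, True], repeat=len(atoms)))


ATOMS = ["A1", "C1", "C2"]
print(satisfiable(negation, ATOMS))  # False: the negation is unsatisfiable
```

Since no assignment satisfies the negation, the original formula is valid, which is what licenses the relation between the two nodes.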




Semantic Matching Algorithm: Example – (6.1)
  Iterations. Iterations are performed by re-running SAT
[Figure: left, A (node 1) with is-a children B (2), C (3) and D (4); right, F (node 1) with is-a children B (2) and C (3)]

 Suppose that $C_{2_1}$ ? $C_{2_2}$



 …an oracle tells us that A1 = F2  G2
[Figure: three trees rooted at A, F and G (each node 1), each with is-a children B (2), C (3) and D (4)]

 After this additional analysis we can infer $C_{2_1} = C_{2_2}$





Semantic Matching Algorithm: Example – (6.2)
 Iterations. …to use the result of a previous match
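[Figure: left, A (node 1) with is-a children F (2) and C (3), where F has the is-a children D (4) and E (5); right, A (node 1) with is-a children B (2) and C (3), where B has the is-a children D (4) and E (5)]

  Suppose that $F_1$ ? $B_2$
  Having found that $C_{4_1}$ ? $C_{4_2}$,
  we can automatically infer that $C_{5_1}$ ? $C_{5_2}$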





Example: Cupid vs. Semantic Matching
[Figure: the www.google.com and www.yahoo.com Arts subtrees again, with the links between them now labelled by the semantic relations computed by semantic matching]

   Conclusions

We have made a rational reconstruction of the major
matching problems and articulated them in terms of the
more generic problem of matching graphs

We have identified semantic matching as a new approach for
performing generic matching

We have proposed an implementation of semantic matching
using SAT





    Future Work

Extend to a full graph matcher

How to extract semantics from schemas

Study how to take into account attributes and instances

Develop an efficient implementation of the system

Test the system thoroughly





      References

Project website: http://www.dit.unitn.it/~p2p/

F. Giunchiglia, P. Shvaiko. “Semantic Matching”. Technical
Report #DIT-03-013, Trento, 2003. Also to appear in Proc. of
ODS at IJCAI-03.

F. Giunchiglia, I. Zaihrayeu. “Making peer databases interact –
a vision for an architecture supporting data coordination”. In
Proc. of the Workshop on Cooperative Information Agents (CIA 2002),
Madrid, 2002.





				