Semantic Matching
Fausto Giunchiglia
work in collaboration with Pavel Shvaiko
The Italian-Israeli Forum on Computer Science, Haifa, June 17-18, 2003

Outline
• Matching
• Syntactic Matching
• Semantic Matching
• On Implementing Semantic Matching
• Conclusions

MATCHING

Application Domains
• Generic Model Management
• Schema integration
• Data warehouses
• E-commerce
• Semantic query processing
• Data Coordination in P2P systems

Matching Problems
1. RDB Schemas
2. OODB Schemas
3. XML Schemas
4. Concept Hierarchies
5. Ontologies

Example of Matching
[Figure: two directory trees, www.google.com (root "Arts") and www.yahoo.com (root "Arts&Humanities"), with nodes including Art History, Music, Design Art, Organizations, Architecture, History and Baroque. Links between corresponding nodes are annotated with Sr and Sc values (e.g., Sc = 1.0).]

Matching
Match is an operator that takes two graph-like structures (e.g., database schemas or ontologies) and produces a mapping between the elements of the two graphs that correspond semantically to each other.

Matching
The problem of matching can be decomposed into two steps:
1. Extract graphs from the data and conceptual models
2. Match the resulting graphs (generic matching)

Matching
A mapping element is a 4-tuple < mID, $N_i^1$, $N_j^2$, R >, i = 1..h, j = 1..k, where
• mID is a unique identifier of the given mapping element;
• $N_i^1$ is the i-th node of the first graph, and h is the number of nodes in the first graph;
• $N_j^2$ is the j-th node of the second graph, and k is the number of nodes in the second graph;
• R specifies a similarity relation between the given nodes.
A mapping is a set of
mapping elements.
Matching is the process of discovering mappings between two graphs through the application of a matching algorithm.

Matching: Syntactic AND Semantic Matching
Syntactic Matching
• R is computed between labels at nodes
• $R \in [0,1]$
Semantic Matching
• R is computed between concepts at nodes
• $R \in \{$set-theoretic relations, e.g., $=, \cap, \perp, \subseteq, \supseteq\}$

SYNTACTIC MATCHING

Syntactic Matching
A mapping element is a 4-tuple < mID, $L_i^1$, $L_j^2$, R >, where
• $L_i^1$ is the label at the i-th node of the first graph;
• $L_j^2$ is the label at the j-th node of the second graph;
• R specifies a similarity relation in the form of a coefficient, which measures the similarity between the labels of the given nodes.
Example (R is a similarity coefficient in [0,1]): < m21, telephone, phone, 0.7 >

Example: Cupid
[Figure: the www.google.com and www.yahoo.com directory trees from the matching example, with Cupid's tentative links and final result between nodes, annotated with similarity coefficients Sc between 0.7 and 1.0.]

The State of the Art
• Cupid is a hybrid matching prototype. It exploits linguistic and structural schema matching heuristics, and computes similarity coefficients between nodes of the trees.
• Similarity Flooding is a hybrid matching prototype. It uses fix-point computation to determine correspondences between nodes of the graphs.
• COMA is a composite matching prototype. It provides an extensible library of different matchers which manipulate DAGs, and supports various ways of combining final results.
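The syntactic mapping element defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, and the string similarity is Python's `difflib` ratio, used as a generic stand-in rather than the measure computed by Cupid, Similarity Flooding or COMA.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class SyntacticMappingElement:
    """A 4-tuple <mID, L1, L2, R>, with R a coefficient in [0, 1]."""
    m_id: str
    label1: str  # label at a node of the first graph
    label2: str  # label at a node of the second graph
    r: float     # similarity coefficient in [0, 1]


def label_similarity(a: str, b: str) -> float:
    # Assumption: difflib's ratio serves as a generic string similarity.
    # It is not the coefficient used by any of the systems named above.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_labels(labels1, labels2, threshold=0.5):
    """Return mapping elements for all label pairs scoring >= threshold."""
    mapping = []
    for i, a in enumerate(labels1, start=1):
        for j, b in enumerate(labels2, start=1):
            r = label_similarity(a, b)
            if r >= threshold:
                mapping.append(SyntacticMappingElement(f"m{i}{j}", a, b, r))
    return mapping


if __name__ == "__main__":
    for element in match_labels(["fax", "telephone"], ["phone"]):
        print(element)
```

The threshold of 0.5 is an arbitrary choice for the sketch. A syntactic matcher of this kind only ever sees the labels, which is the limitation the semantic approach is meant to address.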
As far as we know, so far only syntactic matching…

SEMANTIC MATCHING

Semantic Matching
A mapping element is a 4-tuple < mID, $C_i^1$, $C_j^2$, R >, where
• $C_i^1$ is the concept of the i-th node of the first graph;
• $C_j^2$ is the concept of the j-th node of the second graph;
• R specifies a similarity relation in the form of a semantic relation between the extensions of the concepts at the given nodes.
Possible R's: equality {$=$}, overlapping {$\cap$}, mismatch {$\perp$}, more general/specific {$\supseteq$, $\subseteq$}
Example: < m21, telephone, phone, {=} >

Examples: Analysis of Siblings
[Figure: two trees with is-a links. Graph 1: root A (node 1) with children B (node 2), D (node 3), E (node 4), C (node 5). Graph 2: root A (node 1) with children C (node 2), D (node 3), E (node 4), B (node 5).]
Suppose that we want to match nodes $5_1$ and $2_2$.
Cupid: R = 0.8. This is because $A_1 = A_2$, $C_1 = C_2$ and we have the same structures on both sides (the order of links does not matter).
A semantic matching approach compares the concepts $A_1 \wedge C_1$ and $A_2 \wedge C_2$ and produces $C^1_5 = C^2_2$.

Examples: Analysis of Ancestors. Case 1
[Figure: two trees with is-a links. Graph 1: root A (node 1) with children B (node 2), D (node 3), E (node 4), C (node 5). Graph 2: root C (node 1) with children D (node 2), E (node 3), A (node 4); A has child B (node 5).]
Suppose that we want to match nodes $5_1$ and $1_2$.
Cupid does not find a similarity coefficient between the nodes under consideration, due to the significant differences in the structure of the given graphs.
Semantic matching: the concept denoted by the label at node $5_1$ is $C_1$, while the concept at node $5_1$ is $C^1_5 = A_1 \wedge C_1$. The concept at node $1_2$ is $C^2_1 = C_2$. Thus, $C^1_5 \subseteq C^2_1$.

Examples: Analysis of Ancestors. Case 2
[Figure: two trees with is-a links. Graph 1: root A (node 1) with children B (node 2), D (node 3), E (node 4), C (node 5). Graph 2: root A (node 1) with children D (node 2), E (node 3) and a node labeled *; the node * has child C (node 5).]
Suppose that we want to match nodes $5_1$ and $5_2$.
Cupid: R = 0.86.
This is because of the identity of labels $A_1 = A_2$, $C_1 = C_2$.
Semantic matching: the concept at node $5_1$ is $C^1_5 = A_1 \wedge C_1$, while the concept at node $5_2$ is $C^2_5 = A_2 \wedge * \wedge C_2$. Since we have that $A_1 = A_2$ and $C_1 = C_2$, then $C^2_5 \subseteq C^1_5$.

Examples: Enriched Analysis of Siblings
[Figure: two trees with is-a links. Graph 1: root World (node 1) with children Benelux (node 2), Netherlands (node 3), Luxembourg (node 4). Graph 2: root World (node 1) with child Belgium (node 2).]
Suppose that we want to match nodes $2_1$ and $2_2$.
Cupid: R = 0.68. This is mainly because of the entry in the thesaurus specifying Belgium as a part of Benelux, and due to the fact that the nodes with labels $Benelux_1$ and $Belgium_2$ are leaves.
Semantic matching: we treat $C^1_2$ as $Benelux_1 \wedge \neg Netherlands_1 \wedge \neg Luxembourg_1 = Belgium$. Thus, $C^1_2 = C^2_2$.

ON IMPLEMENTING SEMANTIC MATCHING

On Implementation
Semantic Matching
• Element-level
  • Weak Semantics Techniques
  • Strong Semantics Techniques
• Structure-level

Element-level Semantic Matching
Weak Semantics Techniques
• Analysis of strings {=}: < phone, telephone, {=} >
• Analysis of data types {$=, \cap, \perp, \subseteq, \supseteq$}: < string, integer, {$\perp$} >, < integer, real, {$\subseteq$} >
• Analysis of soundex {=}: < Fausto, Phausto, {=} >
Strong Semantics Techniques
• Precompiled thesaurus (syn key): < Discount, Rebate, {=} >
• WordNet: < Art_#1, Humanities_#1, {} >, where #1 is sense number 1 of the word Art according to WordNet

Element-level Semantic Matching (cont.)
Semantic Relations via WordNet
• Equality: one concept is equal to another if there is at least one sense of the first concept that is a synonym of the second
• Overlapping: one concept overlaps with another if they have some senses in common
• Mismatch: two concepts are mismatched if they have no sense in common
• More general: one concept is more general than the other iff there exists at least one sense of the first concept that has a sense of the other as a hyponym or as a meronym
• Less general: one concept is less general than the other iff there exists at least one sense of the first concept that has a sense of the other concept as a hypernym or as a holonym

Structure-level Semantic Matching
• We translate the matching problem, namely the two graphs (in particular, the pair of nodes submitted to matching), into a propositional formula and then check its validity
• We check validity using SAT

Semantic Matching Algorithm
1. Extract the two graphs
2. Compute element-level semantic matching
3. Compute concepts at nodes
4. Construct the propositional formula
5. Run SAT
6. Perform iterations

Semantic Matching Algorithm: Example (1) Extract the two graphs
[Figure: two trees with is-a links. Graph 1: root A (node 1) with children B (node 2), D (node 3), E (node 4), C (node 5). Graph 2: root C (node 1) with children D (node 2), E (node 3), A (node 4); A has child B (node 5).]
• In the case of RDB, XML and OODB schemas, it is necessary to extract useful semantic information, for instance in the form of ontologies

Semantic Matching Algorithm: Example (2) Element-level semantic matching
For each node, compute the semantic relations holding among all the concepts denoted by the labels at the nodes under consideration.
[Figure: two trees with is-a links. Graph 1: root A (node 1) with children B (node 2), D (node 3), E (node 4), C (node 5). Graph 2: root C (node 1) with children D (node 2), E (node 3), A (node 4); A has child B (node 5).]
$A_1 = A_2$
$B_1 = B_2$
$C_1 = C_2$
$D_1 = D_2$
$E_1 = E_2$

Semantic Matching Algorithm: Example (3) Compute concepts at nodes
Suppose we want to find a semantic relation between nodes $5_1$ and $1_2$.
[Figure: the same two graphs, with a link marked "?" between node 5 of the first graph and node 1 of the second.]
$C^1_1 = A_1$
$C^1_5 = A_1 \wedge C_1$
$C^2_1 = C_2$
$C^1_5$ ? $C^2_1$

Semantic Matching Algorithm: Example (4) Construct the propositional formula
We translate all the semantic relations computed in step 2 into propositional formulas under the following rules:
• $A_1 \supseteq A_2$ is translated into $A_2 \rightarrow A_1$
• $A_1 \subseteq A_2$ is translated into $A_1 \rightarrow A_2$
• $A_1 = A_2$ is translated into $A_1 \leftrightarrow A_2$
• $A_1 \perp A_2$ is translated into $\neg(A_1 \wedge A_2)$
From step 2 we have: $C_1 \leftrightarrow C_2$
We want to prove that $C^1_5 \subseteq C^2_1$ (we guess the relation between the nodes at this stage), that is, $(A_1 \wedge C_1) \rightarrow C_2$
The resulting formula is $(C_1 \leftrightarrow C_2) \rightarrow ((A_1 \wedge C_1) \rightarrow C_2)$

Semantic Matching Algorithm: Example (5) Run SAT
In order to prove that $(C_1 \leftrightarrow C_2) \rightarrow ((A_1 \wedge C_1) \rightarrow C_2)$ is valid, we prove that its negation is unsatisfiable:
$(C_1 \leftrightarrow C_2) \wedge \neg((A_1 \wedge C_1) \rightarrow C_2)$
SAT returns FALSE. Thus, $C^1_5 \subseteq C^2_1$.

Semantic Matching Algorithm: Example (6.1) Iterations
Iterations are performed by re-running SAT.
[Figure: two trees with is-a links. Graph 1: root A (node 1) with children B (node 2), C (node 3), D (node 4). Graph 2: root F (node 1) with children B (node 2), C (node 3).]
Suppose that a relation between $C^1_2$ and $C^2_2$ has been computed, and that an oracle then tells us how $A_1$ is expressed in terms of $F_2$ and $G_2$.
[Figure: three trees with is-a links, with roots A, F and G (node 1), each with children B (node 2), C (node 3), D (node 4).]
After this additional analysis we can infer $C^1_2 = C^2_2$.

Semantic Matching Algorithm: Example (6.2) Iterations
…to use the result of a previous match
[Figure: two trees with is-a links. Graph 1: root A (node 1) with children F (node 2), C (node 3), and nodes D (node 4), E (node 5) one level below. Graph 2: root A (node 1) with children B (node 2), C (node 3), and nodes D (node 4), E (node 5) one level below.]
Suppose that a relation between $F_1$ and $B_2$ is known. Having found the relation between $C^1_4$ and $C^2_4$, we can automatically infer the relation between $C^1_5$ and $C^2_5$.

Example: Cupid vs. Semantic Matching
[Figure: the www.google.com (root "Arts") and www.yahoo.com (root "Arts&Humanities") directory trees, with links between corresponding nodes annotated with semantic relations.]

Conclusions
• We have made a rational reconstruction of the major matching problems and articulated them in terms of the more generic problem of matching graphs
• We have identified semantic matching as a new approach for performing generic matching
• We have proposed an implementation of semantic matching using SAT

Future Work
• Extend to a full graph matcher
• Study how to extract semantics from schemas
• Study how to take into account attributes and instances
• Develop an efficient implementation of the system
• Test the system thoroughly

References
• Project website: http://www.dit.unitn.it/~p2p/
• F. Giunchiglia, P. Shvaiko. "Semantic Matching". Technical Report #DIT-03-013, Trento, 2003. Also to appear in Proc. of ODS at IJCAI-03.
• F. Giunchiglia, I. Zaihrayeu. "Making peer databases interact – a vision for an architecture supporting data coordination". In Proc. of the Conference of Information Agents (CIA 2002), Madrid, 2002.
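Steps 3 to 5 of the algorithm (computing concepts at nodes, constructing the propositional formula, and checking its validity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: all function names are hypothetical, the concept at a node is assumed to be the conjunction of the labels on the path from the root (as in the worked example), and a brute-force truth-table enumeration stands in for a real SAT solver.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


def is_valid(formula, variables) -> bool:
    """A formula is valid iff no assignment satisfies its negation.

    Assumption: exhaustive enumeration replaces the SAT solver; this is
    only practical for a handful of propositional variables.
    """
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if not formula(assignment):  # the negation is satisfied here
            return False
    return True


def concept_at_node(path_labels):
    """Concept at a node: conjunction of the labels from the root down."""
    return lambda v: all(v[label] for label in path_labels)


# Worked example: node 5 of graph 1 (path A, C) and node 1 of graph 2 (C).
c5_graph1 = concept_at_node(["A1", "C1"])
c1_graph2 = concept_at_node(["C2"])


def axioms(v) -> bool:
    # Result of step 2: C1 = C2, translated as C1 <-> C2.
    return v["C1"] == v["C2"]


def less_general(v) -> bool:
    # (C1 <-> C2) -> ((A1 and C1) -> C2)
    return implies(axioms(v), implies(c5_graph1(v), c1_graph2(v)))


def more_general(v) -> bool:
    # (C1 <-> C2) -> (C2 -> (A1 and C1))
    return implies(axioms(v), implies(c1_graph2(v), c5_graph1(v)))


VARIABLES = ["A1", "C1", "C2"]

if __name__ == "__main__":
    print(is_valid(less_general, VARIABLES))  # True
    print(is_valid(more_general, VARIABLES))  # False
```

The first check succeeds, matching the result of step 5. The second fails because the assignment with A1 false and C1, C2 true satisfies the negation, so the guessed "more general" relation does not hold.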