
             Mapping a Data Structure to the
             CIDOC Conceptual Reference Model

                                     Martin Doerr
                                (ICS-FORTH, Crete, Greece)




                           Heraklion, Crete, April 2, 2002


Heraklion, April 2, 2002                                     1
                           What Does It Mean to Map
                           One Schema to Another?

 - Defining an (automated) transformation of each instance of
   schema 1 into an instance of schema 2 with the same meaning.
 - CRM approach:
       Interpret schema 1 as a semantic model (nodes and links), and
       map each element of that model to an equivalent CIDOC CRM path,
       such that each instance of an element of semantic model 1 can
       be converted into a valid instance of the CIDOC CRM with the same
       meaning.

 - This is the simplest theory. It works for well-designed structures.
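The approach only works if every element of the interpreted source schema has an equivalent CRM path; otherwise some instances cannot be converted. The following is a minimal Python sketch of that completeness check, assuming the mapping is stored as a dictionary from source elements to CRM paths. The names `MAPPING`, `SOURCE_ELEMENTS` and `unmapped_elements` are hypothetical, and "Category" and "Description" are simply fields of the example record that are left unmapped here.

```python
# Hypothetical mapping table: source-schema element -> CIDOC CRM path,
# where a path is the list of CRM class/property labels to follow.
SOURCE_ELEMENTS = {"Whole Record", "ID", "Category", "Description"}

MAPPING = {
    "Whole Record": ["E22 Man-Made Object"],
    "ID": ["P47 is identified by", "E42 Object Identifier"],
}


def unmapped_elements(source_elements, mapping):
    """Return, sorted, the source elements that still lack a CRM path.

    Instances of these elements could not be converted, so the mapping
    is not yet complete for the source schema.
    """
    return sorted(e for e in source_elements if e not in mapping)


print(unmapped_elements(SOURCE_ELEMENTS, MAPPING))  # ['Category', 'Description']
```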
                           Interpreting a Schema
                             as Semantic Model

 1. Interpreting tables and columns as entities
 2. Interpreting records as entity instances
 3. Interpreting field names as relationships and entities
 4. Interpreting field contents as entity instances


 - Each field is interpreted as entity-relationship-entity (e-r-e).
 - The whole schema is decomposed into e-r-e's.
 - Each e-r-e is mapped individually to the CRM.
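For a flat record, steps 3 and 4 can be sketched in a few lines of Python, assuming the record is already available as a dictionary of field name to field contents. The function name `decompose` and the label "Object 1975-7309" for the record entity are illustrative choices: the record itself never names its object, so some label has to be supplied.

```python
# A few fields of the example record (Science Museum of London data),
# held as field name -> field contents.
record = {
    "ID": "1975-7309",
    "Category": "NRM - Railway furniture",
    "Item name(s)": "armchairs (AAT Hierarchy: Furnishings)",
}


def decompose(record, record_entity):
    """Decompose a flat record into entity-relationship-entity triples.

    The whole record stands for one entity (record_entity); each field name
    is read as the relationship and each field's contents as the range entity.
    """
    return [(record_entity, field, value) for field, value in record.items()]


for ere in decompose(record, "Object 1975-7309"):
    print(ere)
```

Each resulting triple is then mapped to the CRM on its own, independently of the others.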




                              Interpreting a Schema
                           as Semantic Model, Example
   The field name stands for a relationship and for the kind of contents;
   the field contents stand for an entity instance:

       Object 1975-7309  --has ID-->  1975-7309

   The whole record corresponds to one entity: it stands for one object,
   which is not referred to explicitly in the record.

   ID             1975-7309
   Category       NRM - Railway furniture
   Description    Armchair, Upholstered in blue moquette with curved,
                  buttoned back & scroll arms. Wooden legs
   Item name(s)   armchairs (AAT Hierarchy: Furnishings)

   Part      Aspect                Term            (AAT Hierarchy)
   overall   physical descriptor   upholstering    Processes & techniques
   overall   material              moquette        Materials
   overall   colour                blue            Color
   legs      material              wood            Materials
   back      physical descriptor   buttoning       Processes & techniques
   back      shape                 curved          Physical attributes
   arms      shape                 scrolled arms   Components

   (Data example from the Science Museum of London)

                  Mapping the First Element:
              Creating an Equivalent Proposition
   Source schema interpretation:
       Whole Record  --“has ID”-->  ID

   Instance, valid for both schemata:
       Object 1975-7309  --has ID-->  1975-7309

   maps to the CRM schema:
       Man-Made Object  --is identified by-->  Object Identifier


   Possible mapping annotation:
       Whole Record       = E22 Man-Made Object
       ID                 = E42 Object Identifier
       Whole Record->ID   = P47 is identified by

   Possible CRM instance annotation:
       Object 1975-7309 (E22: Man-Made_Object)
             is_identified_by 1975-7309 (E42: Object_Identifier)
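The mapping annotation is plain data: a class for each source node and a property for each source link. A minimal Python sketch, assuming the annotation is held in two dictionaries (`CLASS_OF`, `PROPERTY_OF` and `map_ere` are hypothetical names), shows how one source e-r-e instance is rewritten as the typed CRM proposition above.

```python
# Mapping annotation from the slide, held as plain data (hypothetical layout).
CLASS_OF = {"Whole Record": "E22 Man-Made Object", "ID": "E42 Object Identifier"}
PROPERTY_OF = {("Whole Record", "ID"): "P47 is identified by"}


def map_ere(domain_elem, range_elem, domain_inst, range_inst):
    """Rewrite one source e-r-e instance as a typed CRM proposition.

    The instances are kept unchanged ("valid for both schemata");
    only the classes and the linking property come from the annotation.
    """
    return (
        (domain_inst, CLASS_OF[domain_elem]),
        PROPERTY_OF[(domain_elem, range_elem)],
        (range_inst, CLASS_OF[range_elem]),
    )


print(map_ere("Whole Record", "ID", "Object 1975-7309", "1975-7309"))
```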

                           Mapping the Interpreted Schema
                                     to the CRM

 - Each entity-link-entity can be instantiated as a self-explanatory,
   context-independent proposition.
 - The mapping makes it possible to create, for each source document, a set
   of propositions with the same meaning, but expressed in terms of the
   CIDOC CRM.
 - As the CRM-compatible propositions are self-explanatory, they can be
   merged into huge knowledge pools, and the document boundaries can be
   ignored.
 - Buzzwords: data warehouses, Semantic Web.
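Because each proposition carries its own meaning, pooling propositions from several documents needs no document context. A minimal Python sketch, assuming both documents already use the same identifiers for the same things; the second document `doc_b` is invented for illustration.

```python
# Two source documents, each already mapped to CRM propositions
# (domain, property, range). doc_b is hypothetical.
doc_a = {
    ("Object 1975-7309", "P47 is identified by", "1975-7309"),
    ("Object 1975-7309", "P46 is composed of", "legs of 1975-7309"),
}
doc_b = {
    ("Object 1975-7309", "P47 is identified by", "1975-7309"),
    ("legs of 1975-7309", "P45 consists of", "wood"),
}

# Self-explanatory propositions can be pooled with a plain set union:
# document boundaries disappear and identical statements collapse into one.
pool = doc_a | doc_b
print(len(pool))  # 3
```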



                     Interpreting a Schema:
                Advanced Stuff: Value Dependency
   The first field name (“Part”) stands for a relationship and for the kind
   of contents; the field contents stand for an entity instance:

       Object 1975-7309  --has part-->  legs of obj. 1975-7309

   The whole row corresponds to one entity: it stands for one part.
   Mapping condition: if Part = “overall”, the row stands for the whole object.

   Record: as in the example above (ID 1975-7309); the relevant row is
   legs | material | wood.

                   Mapping under a Condition:
                Creating an Equivalent Statement

   Source schema interpretation:
       Whole Record  --“has Part”-->  Row “Part”

   Instance, valid for both schemata:
       Object 1975-7309  --has part-->  legs of obj. 1975-7309

   maps to the CRM schema, if Part /= “overall”:
       Man-Made Object  --is composed of-->  Man-Made Object


   Possible mapping annotation:
       Whole Record   = E22 Man-Made Object
       Row “Part”     = E22 Man-Made Object
       If (in Row “Part”, Part /= “overall”) then
       Whole Record -> Row “Part”   = P46 is composed of

   Possible CRM instance annotation:
       Object 1975-7309 (E22: Man-Made_Object)
             is_composed_of legs of 1975-7309 (E22: Man-Made_Object)
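The condition decides which entity a row stands for and whether a P46 link is emitted. A minimal Python sketch of that rule; the function name `map_part_row` and the naming scheme "<part> of <id>" for new part entities are assumptions modelled on the instance annotation above.

```python
def map_part_row(object_id, part):
    """Map the "Part" value of one row under the slide's condition.

    Part /= "overall": the row stands for a component, so a new part entity
    is created and linked from the whole object with P46 is composed of.
    Part == "overall": the row stands for the whole object; no link is needed.
    Returns (entity the row stands for, list of propositions).
    """
    whole = f"Object {object_id}"
    if part == "overall":
        return whole, []
    component = f"{part} of {object_id}"  # hypothetical naming scheme
    return component, [(whole, "P46 is composed of", component)]


print(map_part_row("1975-7309", "legs"))
print(map_part_row("1975-7309", "overall"))
```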


                     Interpreting a Schema:
               Advanced Stuff: Values as Properties
   The contents of the field “Aspect” state a relationship; the “Term”
   contents stand for an entity instance:

       Object 1975-7309  --has material-->  moquette

   Value-based mapping: if Part = “overall” AND Aspect = “material”.

   Record: as in the example above (ID 1975-7309); the relevant row is
   overall | material | moquette.
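Here the *value* of a field, not its name, selects the relationship. A minimal Python sketch of such a value-based rule; the lookup table `ASPECT_TO_CRM` and the function `map_overall_row` are hypothetical, and the choice of P45 consists of / E57 Material for "material" is borrowed from the nested-structure mapping in this document.

```python
# Hypothetical lookup: the value of "Aspect" selects the CRM property.
# Only "material" is filled in; other aspects need their own rules
# (e.g. "physical descriptor" maps to a path, not a single property).
ASPECT_TO_CRM = {"material": ("P45 consists of", "E57 Material")}


def map_overall_row(object_id, row):
    """Map a Part == "overall" row whose Aspect value names a direct CRM property.

    Returns (domain, property, (term, range class)), or None when the
    condition does not hold and another rule must handle the row.
    """
    if row["Part"] != "overall" or row["Aspect"] not in ASPECT_TO_CRM:
        return None
    prop, range_class = ASPECT_TO_CRM[row["Aspect"]]
    return (f"Object {object_id}", prop, (row["Term"], range_class))


row = {"Part": "overall", "Aspect": "material", "Term": "moquette"}
print(map_overall_row("1975-7309", row))
```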

                       Interpreting a Schema:
                   Advanced Stuff: Mapping to Paths
   The contents of the field “Aspect” state a relationship; the “Term”
   contents stand for an entity instance:

       Object 1975-7309  --has physical descriptor-->  upholstering

   Value-based mapping: if Part = “overall” AND Aspect = “physical descriptor”.

   Record: as in the example above (ID 1975-7309); the relevant row is
   overall | physical descriptor | upholstering.

                       Mapping to Paths:
                Introducing an Intermediate Node

   Source schema interpretation:
       Whole Record  --“has physical descriptor”-->  Term

   Instance of source, if Part = “overall” & Aspect = “physical descriptor”:
       Object 1975-7309  --has physical descriptor-->  upholstering

   maps to the instance of target:
       Object 1975-7309  --was produced by-->  Obj. 1975-7309 Production
       Obj. 1975-7309 Production  --used general technique-->  upholstering

   CRM schema:
       Man-Made Object  --was produced by-->  Production
       Production  --used general technique-->  Type


   Possible mapping annotation:
       Whole Record   = E22 Man-Made Object
       Term           = E55 Type
       If Part = “overall” & Aspect = “physical descriptor”
       Whole Record -> Term   = P108 was produced by - E12 Production
                                - P32 used general technique

   Possible CRM instance annotation:
       Object 1975-7309 (E22: Man-Made_Object)
             was_produced_by Obj. 1975-7309 Production (E12: Production)
                    used_general_technique upholstering (E55: Type)
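One source link becomes a two-step CRM path through a node that exists only in the target. A minimal Python sketch, assuming the intermediate Production node is labelled as in the instance annotation above; `map_physical_descriptor` is a hypothetical name.

```python
def map_physical_descriptor(object_id, term):
    """Expand one source link into a CRM path through a new intermediate node.

    Whole Record -> Term becomes
      E22 object  -P108 was produced by->  E12 Production
      E12 Production  -P32 used general technique->  E55 Type
    The Production node has no counterpart in the source record; its label
    follows the slide's instance annotation.
    """
    obj = f"Object {object_id}"
    production = f"Obj. {object_id} Production"
    return [
        (obj, "P108 was produced by", production),
        (production, "P32 used general technique", term),
    ]


for prop in map_physical_descriptor("1975-7309", "upholstering"):
    print(prop)
```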
                       Interpreting a Schema:
                  Advanced Stuff: Nested Structures
   The contents of the field “Aspect” state a relationship; the “Term”
   contents stand for an entity instance:

       legs of obj. 1975-7309  --has material-->  wood

   Value-based mapping: if Aspect = “material”.
   The whole row corresponds to one entity: if Part /= “overall”, it stands
   for one part.

   Record: as in the example above (ID 1975-7309); the relevant row is
   legs | material | wood.

                      Mapping Nested Structures:
                      Continuing on a Range Entity

   Source schema interpretation:
       Row “Part”  --“has material”-->  Term

   Instance, valid for both schemata, if Part /= “overall” & Aspect = “material”:
       Object 1975-7309  -->  legs of obj. 1975-7309  --has material-->  wood

   maps to the CRM schema:
       Man-Made Object  --consists of-->  Material


   Possible mapping annotation:
       Row “Part”           = E22 Man-Made Object
       If Aspect = “material”
       Term                 = E57 Material
       Row “Part” -> Term   = P45 consists of

   Possible CRM instance annotation:
       Object 1975-7309 (E22: Man-Made_Object)
             is_composed_of legs of 1975-7309 (E22: Man-Made_Object)
                   consists_of wood (E57: Material)
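The nested case chains the two previous rules: the part entity created by the first mapping becomes the domain of the second. A minimal Python sketch covering only the "material" aspect; `map_row` and the "<part> of <id>" naming are hypothetical. Rows that share a part (e.g. the two "back" rows) would emit the same P46 statement twice, which a set-based pool collapses.

```python
def map_row(object_id, row):
    """Map one Part/Aspect/Term row, continuing on the part entity if nested.

    Step 1 (Part /= "overall"): the row stands for a part, so create it and
    link it to the whole object with P46 is composed of.
    Step 2 (Aspect == "material"): attach P45 consists of to the entity the
    row stands for, which is the part, or the whole object for "overall".
    Other Aspect values are outside this sketch and produce no P45 link.
    """
    whole = f"Object {object_id}"
    props = []
    if row["Part"] == "overall":
        subject = whole
    else:
        subject = f"{row['Part']} of {object_id}"  # hypothetical naming scheme
        props.append((whole, "P46 is composed of", subject))
    if row["Aspect"] == "material":
        props.append((subject, "P45 consists of", row["Term"]))
    return props


legs = {"Part": "legs", "Aspect": "material", "Term": "wood"}
for prop in map_row("1975-7309", legs):
    print(prop)
```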

                               Other Forms of Maps:
                               Cases of Heterogeneity
   Parallel to nested:
       Source schema interpretation:   A --“a”--> B,   A --“b”--> C
       CRM schema:                     D --c--> E --d--> F

   Parallel to intermediate-parallel (frequent with events!):
       Source schema interpretation:   A --“a”--> B,   A --“b”--> C
       CRM schema:                     D --c--> E,   E --d--> F,   E --e--> G
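The "intermediate-parallel" case typically arises when several parallel fields all describe one event. A minimal Python sketch with an invented record: the fields "production date" and "maker", and their routing to P4 has time-span and P14 carried out by, are illustrative assumptions, not part of the slides. The point is that one shared event node is created, and every parallel field hangs off it.

```python
# Hypothetical record: two parallel fields, both describing the production.
record = {"ID": "obj-1", "production date": "1900", "maker": "Workshop W"}

# Assumed routing of each parallel field to a property of the shared event.
# (In the CRM, P4 points to an E52 Time-Span and P14 to an E39 Actor;
# plain strings stand in for those entities here.)
FIELD_TO_EVENT_PROPERTY = {
    "production date": "P4 has time-span",
    "maker": "P14 carried out by",
}


def map_via_shared_event(record):
    """Map parallel source fields onto ONE intermediate production event."""
    obj = f"Object {record['ID']}"
    event = f"Production of {record['ID']}"  # created once, shared by all fields
    props = [(obj, "P108 was produced by", event)]
    for field, prop in FIELD_TO_EVENT_PROPERTY.items():
        if field in record:
            props.append((event, prop, record[field]))
    return props


for prop in map_via_shared_event(record):
    print(prop)
```

Creating a separate event per field would wrongly assert several productions of the same object.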
                           Other Mapping Forms:
                           Cases of Heterogeneity


   Compound contraction (frequent with addresses, species names, etc.!):
       Source schema interpretation:   A --“a”--> B,   A --“b”--> C,   A --“c”--> D
                                       (B, C, D are parts of an identifier
                                        for one real-life thing)
       CRM schema:                     D --d--> E



                           Mapping to the CRM:
                              Conclusions

 - Mapping to the CRM can serve simply as a guide to good-practice
   data structures.
 - It can be used to create a Semantic Web of cultural knowledge.
 - It can be used to preserve data in a neutral form.
 - Even though mapping can become convoluted, good data structures
   transform easily, and commercial tools exist.
 - No tool can guess all of the experts' intentions in a data
   structure: domain experts must assist the mapping.
