Mapping a Data Structure to the
CIDOC Conceptual Reference Model
Martin Doerr
(ICS-FORTH, Crete, Greece)
Heraklion, Crete, April 2, 2002
Heraklion, April 2, 2002 1
What Does Mapping One Schema
to Another Mean?
Defining an (automated) transformation of each instance of
schema 1 into an instance of schema 2 with the same
meaning.
CRM Approach:
Interpretation of schema 1 as a semantic model (nodes and links),
mapping each element of it to an equivalent CIDOC CRM path,
such that each instance of an element of semantic model 1 can
be converted into a valid instance of the CIDOC CRM with the same
meaning.
This is the simplest theory. It works for well-designed data structures.
Interpreting a Schema
as Semantic Model
1. Interpreting tables and columns as entities
2. Interpreting records as entity instances
3. Interpreting field names as relationships and entities
4. Interpreting field contents as entity instances
Each field is interpreted as entity-relationship-entity (e-r-e)
The whole schema is decomposed into e-r-e’s
Each e-r-e is mapped individually to the CRM.
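The decomposition step can be sketched in a few lines of Python. This is only an illustration, not part of the CRM or of any tool: the `ERE` type, the `decompose` function and the "has <field name>" naming rule are assumptions, and the two field values are taken from the Science Museum record used in this presentation.

```python
from typing import NamedTuple


class ERE(NamedTuple):
    """One entity-relationship-entity statement read off a source field."""
    subject: str       # entity instance the whole record stands for
    relationship: str  # derived from the field name
    value: str         # entity instance given by the field contents


def decompose(record_id: str, record: dict[str, str]) -> list[ERE]:
    """Turn every field of a flat record into one e-r-e (hypothetical helper)."""
    subject = f"Object {record_id}"
    return [
        ERE(subject, f"has {field}", contents)
        for field, contents in record.items()
    ]


record = {"ID": "1975-7309", "Category": "NRM - Railway furniture"}
eres = decompose("1975-7309", record)
for ere in eres:
    print(ere)
```

Each resulting e-r-e can then be mapped to the CRM on its own, independently of the others.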
Interpreting a Schema
as Semantic Model, Example
The field name stands for a relationship and the kind of contents.
The field contents stand for an entity instance:
    Object 1975-7309 --has ID--> 1975-7309
The whole record corresponds to one entity:
it stands for one object, which is not referred to.

ID            1975-7309
Category      NRM - Railway furniture
Description   Armchair, Upholstered in blue moquette with curved,
              buttoned back & scroll arms. Wooden legs
Item name(s)  armchairs (AAT Hierarchy: Furnishings)

Part     Aspect               Term           (AAT Hierarchy)
overall  physical descriptor  upholstering   Processes & techniques
overall  material             moquette       Materials
overall  colour               blue           Color
legs     material             wood           Materials
back     physical descriptor  buttoning      Processes & techniques
back     shape                curved         Physical attributes
arms     shape                scrolled arms  Components

(data example from the Science Museum of London)
Mapping the First Element:
Creating an Equivalent Proposition
Source Schema interpretation:
    Whole Record --"has ID"--> ID
Instance, valid for both schemata:
    Object 1975-7309 --> 1975-7309
maps to:
CRM Schema:
    Man-Made Object --is identified by--> Object Identifier
Possible Mapping Annotation:
Whole Record = E22 Man-Made Object
ID = E42 Object Identifier
Whole Record->ID = P47 is identified by
Possible CRM instance Annotation:
Object 1975-7309 (E22: Man-Made_Object)
is_identified_by 1975-7309 (E42 Object_Identifier)
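As a rough sketch, the mapping annotation above could be held in a small lookup table and applied to a record. The `MAPPING` dictionary layout and the `map_id` function are hypothetical; only the class and property labels come from the annotation.

```python
# Hypothetical encoding of the mapping annotation: schema elements map to
# CRM classes, and a (domain, range) pair maps to a CRM property.
MAPPING = {
    "Whole Record": "E22 Man-Made Object",
    "ID": "E42 Object Identifier",
    ("Whole Record", "ID"): "P47 is identified by",
}


def map_id(record):
    """Build one CRM statement: (typed subject, property, typed object)."""
    subject = (f"Object {record['ID']}", MAPPING["Whole Record"])
    prop = MAPPING[("Whole Record", "ID")]
    obj = (record["ID"], MAPPING["ID"])
    return (subject, prop, obj)


statement = map_id({"ID": "1975-7309"})
print(statement)
```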
Mapping the Interpreted Schema
to the CRM
Each entity-link-entity can be instantiated as a self-explanatory,
context-independent proposition.
The mapping makes it possible to create sets of propositions equivalent to
the meaning of each source document, but in terms of the CIDOC CRM.
As the CRM-compatible propositions are self-explanatory, they can be
merged into huge knowledge pools and the document boundaries can be
ignored.
Buzzwords: data warehouses, Semantic Web
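To illustrate why document boundaries can be ignored, the sketch below treats each proposition as a plain tuple and merges two sets of propositions into one pool. The tuple representation, the `about` helper and the assumption that a second source document repeats the identifier statement are all hypothetical.

```python
# Propositions derived from one source document.
doc_a = {
    ("Object 1975-7309", "P47 is identified by", "1975-7309"),
}

# Propositions from a second, hypothetical source document about the same
# object; it repeats the identifier statement and adds a part statement.
doc_b = {
    ("Object 1975-7309", "P47 is identified by", "1975-7309"),
    ("Object 1975-7309", "P46 is composed of", "legs of 1975-7309"),
}

# Set union merges the documents; identical propositions collapse into one.
pool = doc_a | doc_b


def about(pool, subject):
    """Return all propositions in the pool about one subject, sorted."""
    return sorted(p for p in pool if p[0] == subject)


for proposition in about(pool, "Object 1975-7309"):
    print(proposition)
```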
Interpreting a Schema:
Advanced Stuff: Value Dependency
The first field name stands for a relationship and the kind of contents.
The field contents stand for an entity instance:
    Object 1975-7309 --has part--> legs of obj. 1975-7309
Mapping condition: if Part = overall, it stands for the whole.
The whole row corresponds to one entity: it stands for one part.

ID            1975-7309
Category      NRM - Railway furniture
Description   Armchair, Upholstered in blue moquette with curved,
              buttoned back & scroll arms. Wooden legs
Item name(s)  armchairs (AAT Hierarchy: Furnishings)

Part     Aspect               Term           (AAT Hierarchy)
overall  physical descriptor  upholstering   Processes & techniques
overall  material             moquette       Materials
overall  colour               blue           Color
legs     material             wood           Materials
back     physical descriptor  buttoning      Processes & techniques
back     shape                curved         Physical attributes
arms     shape                scrolled arms  Components
Mapping under condition:
Creating an equivalent statement
Source Schema interpretation:
    Whole Record --"has Part"--> Row "Part"
Instance, valid for both schemata:
    Object 1975-7309 --> legs of obj. 1975-7309
maps to (if Part /= "overall"):
CRM Schema:
    Man-Made Object --is composed of--> Man-Made Object
Possible Mapping Annotation:
Whole Record = E22 Man-Made Object
Row “Part” = E22 Man-Made Object
If (in Row “Part”, Part /= “overall”) then
Whole Record-> Row “Part” = P46 is composed of
Possible CRM instance Annotation:
Object 1975-7309 (E22: Man-Made_Object)
is_composed_of legs of 1975-7309 (E22: Man-Made_Object)
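The condition on the value of "Part" can be sketched as a simple filter over the rows of the record. The row dictionaries, the `map_parts` function and the choice to emit each part only once are assumptions made for this illustration; the rows are a subset of the example record.

```python
ROWS = [
    {"Part": "overall", "Aspect": "material", "Term": "moquette"},
    {"Part": "legs", "Aspect": "material", "Term": "wood"},
    {"Part": "back", "Aspect": "shape", "Term": "curved"},
]


def map_parts(record_id, rows):
    """Emit 'P46 is composed of' for every row whose Part is not 'overall'."""
    whole = f"Object {record_id}"
    statements = []
    seen = set()
    for row in rows:
        part = row["Part"]
        # "overall" stands for the whole object, so no part entity is created.
        if part == "overall" or part in seen:
            continue
        seen.add(part)
        statements.append((whole, "P46 is composed of", f"{part} of {record_id}"))
    return statements


part_statements = map_parts("1975-7309", ROWS)
for statement in part_statements:
    print(statement)
```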
Interpreting a Schema:
Advanced Stuff: Values as Properties
The contents of the field "Aspect" state a relationship.
The field contents stand for an entity instance:
    Object 1975-7309 --has material--> moquette
Value-based mapping: if Part = overall AND Aspect = material.

ID            1975-7309
Category      NRM - Railway furniture
Description   Armchair, Upholstered in blue moquette with curved,
              buttoned back & scroll arms. Wooden legs
Item name(s)  armchairs (AAT Hierarchy: Furnishings)

Part     Aspect               Term           (AAT Hierarchy)
overall  physical descriptor  upholstering   Processes & techniques
overall  material             moquette       Materials
overall  colour               blue           Color
legs     material             wood           Materials
back     physical descriptor  buttoning      Processes & techniques
back     shape                curved         Physical attributes
arms     shape                scrolled arms  Components
Interpreting a Schema:
Advanced Stuff: Mapping to Paths
The contents of the field "Aspect" state a relationship.
The field contents stand for an entity instance:
    Object 1975-7309 --has physical descriptor--> upholstering
Value-based mapping: if Part = overall AND Aspect = physical descriptor.

ID            1975-7309
Category      NRM - Railway furniture
Description   Armchair, Upholstered in blue moquette with curved,
              buttoned back & scroll arms. Wooden legs
Item name(s)  armchairs (AAT Hierarchy: Furnishings)

Part     Aspect               Term           (AAT Hierarchy)
overall  physical descriptor  upholstering   Processes & techniques
overall  material             moquette       Materials
overall  colour               blue           Color
legs     material             wood           Materials
back     physical descriptor  buttoning      Processes & techniques
back     shape                curved         Physical attributes
arms     shape                scrolled arms  Components
Mapping to Paths:
Introducing an intermediate node
Source Schema interpretation
(if Part = "overall" & Aspect = physical descriptor):
    Whole Record --"has physical descriptor"--> Term
Instance of source:
    Object 1975-7309 --> upholstering
maps to:
Instance of target:
    Object 1975-7309 --> Obj. 1975-7309 Production --> upholstering
CRM Schema:
    Man-Made Object --was produced by--> Production
    Production --used general technique--> Type
Possible Mapping Annotation:
Whole Record = E22 Man-Made Object
Term = E55 Type
If Part = “overall” & Aspect= physical descriptor
Whole Record-> Term = P108 was produced by – E12 Production
- P32 used general technique
Possible CRM instance Annotation:
Object 1975-7309 (E22: Man-Made_Object)
was_produced_by Obj. 1975-7309 Production (E12: Production)
used general technique upholstering (E55 Type)
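A minimal sketch of the path mapping: one source link becomes two CRM statements joined by a generated intermediate Production node. The `map_physical_descriptor` function and the way the intermediate node is named are assumptions that follow the instance annotation above.

```python
def map_physical_descriptor(record_id, row):
    """Map one source link to a two-step CRM path via an E12 Production node."""
    if row["Part"] != "overall" or row["Aspect"] != "physical descriptor":
        return []  # the mapping condition does not hold for this row
    obj = f"Object {record_id}"
    # Intermediate node that has no counterpart in the source schema.
    production = f"Obj. {record_id} Production"
    return [
        (obj, "P108 was produced by", production),
        (production, "P32 used general technique", row["Term"]),
    ]


row = {"Part": "overall", "Aspect": "physical descriptor", "Term": "upholstering"}
path = map_physical_descriptor("1975-7309", row)
for statement in path:
    print(statement)
```

The object of the first statement is the subject of the second, which is what makes the two statements a path rather than two unrelated propositions.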
Interpreting a Schema:
Advanced Stuff: Nested Structures
The contents of the field "Aspect" state a relationship.
The field contents stand for an entity instance:
    legs of obj. 1975-7309 --has material--> wood
Value-based mapping: if Aspect = material.
The whole row corresponds to one entity:
if Part /= overall, it stands for one part.

ID            1975-7309
Category      NRM - Railway furniture
Description   Armchair, Upholstered in blue moquette with curved,
              buttoned back & scroll arms. Wooden legs
Item name(s)  armchairs (AAT Hierarchy: Furnishings)

Part     Aspect               Term           (AAT Hierarchy)
overall  physical descriptor  upholstering   Processes & techniques
overall  material             moquette       Materials
overall  colour               blue           Color
legs     material             wood           Materials
back     physical descriptor  buttoning      Processes & techniques
back     shape                curved         Physical attributes
arms     shape                scrolled arms  Components
Mapping Nested Structures :
Continuing on a Range Entity
Source Schema interpretation
(if Part /= "overall" & Aspect = "material"):
    Row "Part" --"has material"--> Term
Instance, valid for both schemata:
    legs of obj. 1975-7309 --> wood
maps to:
CRM Schema:
    Man-Made Object --consists of--> Material
Possible Mapping Annotation:
Row “Part” = E22 Man-Made Object
If Aspect= “material”
Term = E57 Material
Row “Part” -> Term = P45 consists of
Possible CRM instance Annotation:
Object 1975-7309 (E22: Man-Made_Object)
is_composed_of legs of 1975-7309 (E22: Man-Made_Object)
consists_of wood (E57 Material)
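The nested case can be sketched by first creating the part entity and then continuing the mapping from it. The `map_material` function is hypothetical. Attaching the material directly to the whole object when Part = "overall" is an assumption based on the earlier "Object has material: moquette" example, not on an explicit mapping annotation.

```python
def map_material(record_id, row):
    """Map a 'material' row, continuing on the part entity when there is one."""
    if row["Aspect"] != "material":
        return []
    whole = f"Object {record_id}"
    if row["Part"] == "overall":
        # Assumption: the material describes the whole object directly.
        return [(whole, "P45 consists of", row["Term"])]
    part = f"{row['Part']} of {record_id}"
    return [
        (whole, "P46 is composed of", part),   # the row stands for one part
        (part, "P45 consists of", row["Term"]),  # the mapping continues on it
    ]


nested = map_material("1975-7309", {"Part": "legs", "Aspect": "material", "Term": "wood"})
for statement in nested:
    print(statement)
```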
Other Forms of Maps:
Cases of Heterogeneity
Parallel to nested:
    Source Schema interpretation:  A --"a"--> B
                                   A --"b"--> C
    CRM Schema:                    D --c--> E --d--> F

Parallel to intermediate-parallel (frequent with events!):
    Source Schema interpretation:  A --"a"--> B
                                   A --"b"--> C
    CRM Schema:                    D --c--> E --d--> F
                                            E --e--> G
Other Mapping Forms:
Cases of Heterogeneity
Compound contraction (frequent with addresses, species names etc!):
    Source Schema interpretation:  A --"a"--> B
                                   A --"b"--> C
                                   A --"c"--> D
    B, C, D are parts of an identifier for one real-life thing.
    CRM Schema:                    D --d--> E
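A small sketch of compound contraction, assuming the parts are simply joined in a fixed order. The `contract` helper, the field names "genus" and "species" and the sample values are hypothetical; they only illustrate the species-name case mentioned above.

```python
def contract(parts, order, separator=" "):
    """Join several source fields into the single identifier they spell out."""
    return separator.join(parts[name] for name in order if parts.get(name))


# Hypothetical source fields that together name one real-life thing.
fields = {"genus": "Quercus", "species": "robur"}
identifier = contract(fields, ["genus", "species"])
print(identifier)
```

The contracted value would then be mapped as one entity instance, rather than mapping each source field on its own.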
Mapping to the CRM:
Conclusions
Mapping to the CRM can serve simply as a guide to good-practice
data structures.
It can be used to create a Semantic Web of cultural
knowledge.
It can be used to preserve data in a neutral form.
Even though mappings can become complicated, good data
structures transform easily, and commercial tools are available.
No tool can guess all of the experts' intentions in a data
structure: domain experts must assist with the mapping.