Themes and Directions
Charlie Bachman, Dist. FBCS
Hosted by the
BCS Data Management Interest Group
and UK DAMA
London, England, October 21, 2008
Themes
1. Layers of Abstraction
2. Evolution of Network Data Models
3. Generalized Systems
Motivations for Abstraction
• To break large systems into a set of layers,
each offering a coordinated set of functions.
• To establish a discipline as to who may
request a given service, and who will carry
out the functions associated with that
service.
• To be able to replace one implementation
of a layer with another by matching its
external specifications.
Rules for
Layers of Abstraction
• Each layer offers services only to the
layer immediately above it.
• Each layer only calls on the services
of the layer immediately below it.
• The two adjacent layers must have
explicit agreement as to their planned
actions and responses.
• Communication within a layer is by a
well-defined protocol.
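A minimal Python sketch of the first two rules, under stated assumptions: the class name `Layer`, the method `request`, and the three layer names are hypothetical illustrations, not taken from any OSI or DBMS specification. Each layer holds a reference only to the layer immediately below it, so it cannot skip a layer.

```python
class Layer:
    """One layer of abstraction; it knows only the layer directly below it."""

    def __init__(self, name, below=None):
        self.name = name
        self._below = below  # the only layer this one may call (assumption)

    def request(self, message):
        """Serve a request from the layer above, delegating downward if possible."""
        trace = [self.name]
        if self._below is not None:
            trace += self._below.request(message)
        return trace


# Hypothetical three-layer stack, built bottom-up.
storage = Layer("storage device")
record = Layer("physical record", below=storage)
application = Layer("application", below=record)

print(application.request("read"))
```

Because a layer is known to its caller only through `request`, one implementation of a layer could be swapped for another that honors the same external specification.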
Theme: Layers of Abstraction
Directions:
• ISO/TC97/Open Systems Interconnection
“Seven Layer” communication model
• Database Management (DBMS) model
• The “Three Schema” approach to data
independence
Reviewing the
OSI Layers of Abstraction,
circa 1980
1. application layer
2. presentation layer
3. session layer
4. transport layer
5. network layer
6. data link layer
7. physical layer
Reviewing the
DBMS Layers of Abstraction,
circa 1964
1. application layer
2. presentation layer
3. physical record layer
4. virtual memory layer
5. storage device layer
Data Independence and the
“Three Schema” Approach
Direction:
• In the mid-1970s, an ANSI/SPARC
study group began studying the problem
of sharing file data and database data
between application programs with
different histories of development and
different formats.
Early Practice:
• To transfer data between a pair of programs,
the early practice was to examine their
formats and write a dedicated translation
program that mediated between the
differences in their data and relational
structures.
Two Schema Approach
[Diagram: direct maps between pairs of schemas]
Simplifying the Work
• They came up with the three schema approach:
i.e., the external schema, the conceptual
schema, and the internal schema.
• This required a map between each external
schema and the conceptual schema.
• This also required a map between each internal
schema and the conceptual schema.
• With these maps, taken pair-wise, they could
compute as many direct mappings as might
be required.
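The pair-wise computation can be sketched in Python as a composition of two maps through the conceptual schema. This is only an illustration under stated assumptions: the helper names `compose` and `invert`, the two example schemas, and every field and attribute name are hypothetical.

```python
def invert(mapping):
    """Turn a schema-to-conceptual map into a conceptual-to-schema map."""
    return {concept: field for field, concept in mapping.items()}


def compose(to_conceptual, from_conceptual):
    """Compute a direct map between two schemas via the conceptual schema."""
    return {
        field: from_conceptual[concept]
        for field, concept in to_conceptual.items()
        if concept in from_conceptual
    }


# Hypothetical maps from each schema's field names to conceptual attributes.
payroll_to_conceptual = {"EMP_NM": "employee.name", "EMP_NO": "employee.number"}
hr_to_conceptual = {"fullName": "employee.name", "staffId": "employee.number"}

# The direct payroll-to-HR map is computed, not written by hand.
payroll_to_hr = compose(payroll_to_conceptual, invert(hr_to_conceptual))
print(payroll_to_hr)
```

With n schemas, n maps to the conceptual schema suffice, whereas dedicated translators for every ordered pair would number n(n-1).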
Three Schema Approach
[Diagram label: Computed Maps]
The Three Schema Approach
Yields Two Layers of Abstraction,
when the “external” and “internal” schemas were
recognized as really being the same, i.e., a
“physical record” schema, as seen from
different points of view.
1. Conceptual Schema Layer
2. Physical Record Schema Layer
The Conceptual Schema
Needs to Understand:
1. all record, table, and other record-like
constructs, as entity types,
2. all field, column, and other item-like
constructs, as attributes,
3. all chains, sorted records, primary and
foreign keys, and other relationship-like
constructs, as sets of binary relationships,
4. and all other data modeling constructs
appropriately, at the conceptual level.
Reviewing the
OSI Layers of Abstraction,
circa 1980
1. application layer
2. presentation layer
3. session layer
4. transport layer
5. network layer
6. data link layer
7. physical layer
Reviewing the
DBMS Layers of Abstraction,
circa 1964
1. application layer
2. presentation layer
3. physical record layer
4. virtual memory layer
5. memory device layer
The Layers of Abstraction and
the Conceptual Schema provide
the means by which the
Presentation Layer could support:
1) data communication,
2) database management, and
3) middleware.
Themes
• Layers of Abstraction
• Evolution of Network Data Models
• Generalized Systems
IDS: Owner/Member Sets
(early 1960s)
• original network data model (1962)
• explicit O/M set declarations
• optional member associations
• ordered member sets
• multiple membership record types
• cascading deletes
(all, as seen in CA-IDMS today)
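A rough Python sketch of an owner/member set with ordered members, optional membership, and cascading deletes. It assumes a single hypothetical `Record` class with invented method names; it is not the IDS data manipulation language or the CA-IDMS API.

```python
class Record:
    """A record that may own an ordered set of members and may belong to one owner."""

    def __init__(self, name):
        self.name = name
        self.members = []    # ordered member set owned by this record
        self.owner = None    # optional association: a member may be unconnected
        self.deleted = False

    def connect(self, member):
        """Insert a member into this owner's set, keeping the set ordered by name."""
        member.owner = self
        self.members.append(member)
        self.members.sort(key=lambda r: r.name)

    def delete(self):
        """Cascading delete: deleting an owner also deletes its members."""
        for member in list(self.members):
            member.delete()
        if self.owner is not None:
            self.owner.members.remove(self)
            self.owner = None
        self.deleted = True


# Hypothetical example data.
department = Record("Sales")
bob, alice = Record("Bob"), Record("Alice")
department.connect(bob)
department.connect(alice)
print([m.name for m in department.members])

department.delete()
print(alice.deleted, bob.deleted)
```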
Role-Oriented Capabilities:
• Provides the ability to bind together two or more records
when records of differing types represent the same
business or real-world entity,
* if “Dr. Smith” and “Baby Joe”
represent the same real-world person,
then,
* when the record representing Dr.
Smith is retrieved, the existence of
the record representing Baby Joe is
known and directly accessible,
and vice versa.
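One way to picture this binding, as a minimal Python sketch: every role record points at a shared entity object, so any record can reach the others. The class names `Entity` and `RoleRecord`, the role names, and the method `other_roles` are assumptions made for illustration only.

```python
class Entity:
    """A real-world entity that binds together all records representing it."""

    def __init__(self):
        self.roles = {}  # role name -> record playing that role


class RoleRecord:
    """A record of some type that represents one role of an entity."""

    def __init__(self, role, data, entity):
        self.role = role
        self.data = data
        self.entity = entity
        entity.roles[role] = self

    def other_roles(self):
        """Records of other types that represent the same real-world entity."""
        return {
            role: rec for role, rec in self.entity.roles.items() if rec is not self
        }


# Hypothetical data: one person seen as a doctor and, earlier in life, as a patient.
person = Entity()
doctor = RoleRecord("doctor", {"name": "Dr. Smith"}, person)
patient = RoleRecord("patient", {"name": "Baby Joe"}, person)

print(doctor.other_roles()["patient"].data["name"])
```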
The Partnership Set
Concept
• Built on non-directed binary relationships.
• Terminates both ends of a relationship at a
partnership set.
• Each partnership set may have cardinality
constraints with regard to the numbers and
types of relationships that may be
concurrently terminated there.
• One-to-one, one-to-many, and many-to-many
cardinalities are supported.
Nouns, As Entities
• The boy hit the ball.
• The ball was hit by the boy.
The words, “boy” and “ball”,
are both nouns, being used as the
subject or the object of a sentence.
Verbs and Verb-Phrases
• The boy hits the ball.
• The ball was hit by the boy.
The words, “hits” and “was hit by,” can be used
as partnership set names, which apply to the
same relationship, based on the subject’s point
of view.
There are two entities, two partnership sets, and
a single binary relationship.
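The boy/ball example can be sketched in Python as one relationship object terminated at a named partnership set on each entity. The classes `Entity` and `Relationship` and their methods are hypothetical; cardinality constraints are left out to keep the sketch short.

```python
class Entity:
    """An entity whose relationships terminate at named partnership sets."""

    def __init__(self, name):
        self.name = name
        self.partnership_sets = {}  # set name -> relationships terminated there

    def partners(self, set_name):
        """Entities reached through the named partnership set."""
        return [
            rel.other_end(self) for rel in self.partnership_sets.get(set_name, [])
        ]


class Relationship:
    """A single non-directed binary relationship with a partnership set at each end."""

    def __init__(self, a, a_set, b, b_set):
        self.ends = (a, b)
        a.partnership_sets.setdefault(a_set, []).append(self)
        b.partnership_sets.setdefault(b_set, []).append(self)

    def other_end(self, entity):
        a, b = self.ends
        return b if entity is a else a


boy = Entity("boy")
ball = Entity("ball")
Relationship(boy, "hits", ball, "was hit by")

print([e.name for e in boy.partners("hits")])
print([e.name for e in ball.partners("was hit by")])
```

The same relationship object is read as “hits” from the boy and as “was hit by” from the ball.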
Dimensions and Domains
• Dimensions define information spaces used
to categorize compatible values or measures.
• Domains define the means for representing
comparable values, of the same dimension.
“Feet,” “inches,” “miles,” “millimeters,”
“meters,” and “kilometers” are names of
different domains, used to capture the
measure of a value, representing the
“distance” or “length” dimension.
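As an illustrative Python sketch, the domains of the “length” dimension can be held as conversion factors to one reference domain, so that values in any two domains become comparable. The table name `LENGTH_DOMAINS` and the function `convert` are assumptions; the factors are the standard definitions of each unit in meters.

```python
# Meters per unit for each domain of the "length" dimension.
LENGTH_DOMAINS = {
    "millimeters": 0.001,
    "inches": 0.0254,
    "feet": 0.3048,
    "meters": 1.0,
    "kilometers": 1000.0,
    "miles": 1609.344,
}


def convert(value, from_domain, to_domain, domains=LENGTH_DOMAINS):
    """Re-express a value in another domain of the same dimension."""
    if from_domain not in domains or to_domain not in domains:
        raise ValueError("both domains must belong to the same dimension")
    return value * domains[from_domain] / domains[to_domain]


print(round(convert(12, "inches", "feet"), 6))
print(convert(1, "kilometers", "meters"))
```

Asking for a domain outside the dimension (for example a weight unit) raises an error, which is the point of grouping domains under a dimension.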
Inheritance is the Means for
Factoring or Simplifying a Data
Structure
• This is the same thing that we do when we
factor or simplify an algebraic expression.
• This minimizes the number of data model
elements, reduces the chance of errors in
defining common constructs, and helps the
user to understand the total model.
• Inheritance and multiple inheritance offer
important data modeling opportunities.
The Partnership Set Oriented,
Network Data Model (today)
• started with the original IDS “owner/member set” data
model (Bachman Diagrams)
• added role-oriented capabilities
• added partnership set capabilities
• added one-to-one and many-to-many relationships
• added recursive relationships
• added dimensions and domains
• added inheritance
Themes
• Layers of Abstraction
• The Evolution of Network Data
Models
• Generalized Systems
What are Generalized Systems?
• They are systems designed to support the
needs of a large number of users.
• Typically, they are created by a vendor
with the intent to be sold or licensed.
• Compilers and operating systems were
early examples.
• Accounting, personnel, and resource
planning systems are newer candidates.
• Internet browsers are very new.
Arguments For:
• Generalized systems are quicker to install,
and should cost less than custom tailored
systems.
• Generalized systems have fewer startup
problems.
• Generalized systems have a broader set of
functions and options.
• Generalized system developers can afford
to invest the time required to build a
well-engineered solution.
Arguments Against:
• Generalized systems will not fit my
business requirements – “we are
different.”
• Generalized systems will be too slow,
trying to solve everybody’s problems.
• Generalized systems will be wasteful of
computer and communication resources.
• Generalized systems will force us to use
associated package systems, foreign to
our operations.
Cord Blood Registry (Cbr)
• Cbr is in the business of collecting,
processing, freezing, and storing blood
cells from newborn babies.
• These blood cells can provide stem cells
for that child, or a near relative, if a stem
cell transplant should ever be necessary.
• Capturing and preserving records of all
aspects of the business is essential to
the integrity of the business and the
storage of blood cells.
Cbr Enterprise Model - 2005
Cbr ERP Structure, circa 2005
Item Schema object and the
specialized sub schema objects
that inherit properties from it.
Enterprise Resource Planning:
Generalized Items:
• The focus is on the similarities and
differences in four kinds of “items.”
* material-oriented (one time use)
* resource-oriented (serial re-use)
* information-oriented (unlimited use)
* project-oriented (one-of-a-kind)
Inheritance: A Big Boost
• Inheritance, as a data modeling and
programming tool, is a powerful plus in
support of generalization.
• Inheritance permits factoring out common
data properties and functions for one-time
definition and the reduction of
programming errors.
• It greatly reduces the complexity of data
models by grouping similar objects and
showing how they are the same and how
they are different.
Generalized “NdbaEntityObject”
• Cbr created the “NdbaEntityObject” to
represent a super entity class, which could
support a number of specialized roles,
which all make heavy use of a set of
common data properties.
“Ndba” (New DataBase Architecture)
Some of the NdbaEntityObjects
• person
* caregiver
** doctor
** nurse
** other caregiver
* employee
* patient
* other person
• organization
* carrier
* insurance company
* bank
* supplier
* customer
* hospital
* other organization
• other NdbaEntityObject
NdbaEntityObjects:
Generalized Properties
• addresses, each with its purpose
• identifiers, each with its purpose
• phone numbers, each with its purpose
• names, each with its purpose, and
• internet addresses, each with its
purpose
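A speculative Python sketch of how a super entity class could carry these generalized properties once, with specialized roles inheriting them. Only the name `NdbaEntityObject` and the list of properties come from the slides; the attribute names, the subclass layout, and the purpose keys such as "office" are assumptions, and this is not Cbr's actual implementation.

```python
class NdbaEntityObject:
    """Super entity class: each generalized property is keyed by its purpose."""

    def __init__(self):
        self.addresses = {}
        self.identifiers = {}
        self.phone_numbers = {}
        self.names = {}
        self.internet_addresses = {}


class Person(NdbaEntityObject):
    pass


class Caregiver(Person):
    pass


class Doctor(Caregiver):
    pass


class Organization(NdbaEntityObject):
    pass


class Hospital(Organization):
    pass


# Hypothetical purposes and values; every subclass reuses the same definitions.
doctor = Doctor()
doctor.names["professional"] = "Dr. Smith"
doctor.phone_numbers["office"] = "555-0100"

hospital = Hospital()
hospital.addresses["billing"] = "1 Example Street"

print(isinstance(doctor, NdbaEntityObject), isinstance(hospital, NdbaEntityObject))
```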
Mapping Packaged Systems
[Diagram labels: Processes, Communications, People, Organizations]
The Enterprise Model is the
Gold Standard
• The packaged application systems provide a
practical realization.
• They must be evaluated to see if they will
provide an acceptable solution.
• Custom-coded applications fill the vacuum.
• Data exchange between packaged systems and
between packaged systems and custom
systems must be bridged:
1. by manual processes, and
2. by middleware.
In Summary:
Better Abstractions Lead To:
• More robust systems
• Partitioning of work effort
• Reduction of errors
• Interchangeable parts
• Reduction in complexity
Better Data Models Lead To:
• Greater semantic power
• Better understanding of our business
systems
• Greater support for data independence,
and therefore greater support for
interconnecting heterogeneous
business systems.
Better Generalizations Lead To
Better Packaged Systems
• Finance and Accounting
• Manufacturing
• Personnel
• Marketing
• Communication
• Industry-Specific Systems
Better Abstractions
Better Data Models
Better Generalizations