Conceptual database design Entity-relationship diagrams - PDF - PDF
Document Sample


Conceptual database design:
Entity-relationship diagrams
Alexis Dimitriadis
Databases for Language Typology
Contents First Last Prev Next Back Close Quit
1. The relational database model
• Data is formally represented as instances of one or more relations.
• A relation (also known as a file) is a table:
Language Name ISO Code Speakers Continent Source ID
English eng 341.000.000 Europe Grimes00
Italian ita 62.000.000 Europe Grimes00
Dyirbal dbl 40 Australia Grimes00
Mongol mgt 340 Asia Grimes00
Halh Mongolian khk 2.330.000 Asia Grimes00
• An attribute is a named column of a relation table, representing a
“property.”
• A tuple is a row of the table, representing a “record.”
Contents First Last Prev Next Back Close Quit
1.1. Keys
• A key is a set of attributes (usually just one) that will always uniquely identify
a record.
• A key consisting of more than one attribute is called a composite key.
• The primary key for a table is the key chosen to uniquely identify tuples
within this table. It is conventionally indicated by underlining.
• A foreign key is an attribute (or set of attributes) within one table, that
matches a key of some (other) table.
Language Name ISO Code Speakers Continent Source ID
English eng 341.000.000 Europe Grimes00
(etc.)
Contents First Last Prev Next Back Close Quit
Example 1
• The Source ID column in the Languages table is a foreign key matching the
Source ID column in the Sources table.
Language details
Language Name ISO Code Speakers Continent Source ID
English eng 341.000.000 Europe Grimes00
(etc.)
Bibliographic sources
Source ID Title Author Year Publisher Address ...
Grimes00 Ethnologue: Languages Grimes, B.F. 2000 SIL Dallas, Texas
of the world & J.E. Grimes
Ashton47 Swahili grammar Ashton, E. O. 1947 Longmans London
(including intonation)
Contents First Last Prev Next Back Close Quit
2. Designing a database
• The relational database model is a simple, but very powerful basis for
structuring our data. But what is the right structure?
• The answer always depends on the nature of the data, and on its intended uses.
• Database specialists have countless procedures, guidelines and rules of thumb
for designing and building a database.
• The guiding principle:
Think about the data first, then about the application program
Contents First Last Prev Next Back Close Quit
2.1. The design process
1. Work out your needs, in writing. If possible, examine real data and/or any
paper forms used in the past.
2. Carry out the conceptual design (abstract design) of the database. This
means thinking about how our data is organized without worrying about the
specifics of tables, keys and attributes.
3. In the logical database design stage, we define tables and relationships
that reflect the conceptual design.
4. The physical design stage involves the actual programming of the
database, taking into account the features and limitations of our specific
DBMS and client.
5. Build a prototype database.
6. Try it out.
7. Clarify your needs, and revise steps 1–6 as necessary.
Contents First Last Prev Next Back Close Quit
3. Preview: What a database should look like
• A correctly structured database should have the following properties:
1. Each cell in a table should contain only one unit of information (value).
2. Each unit of information should be stored only once in a database.
Basic word order
Language Name ISO Code Order Source ID Pages
German deu SOV, SVO Comrie89 89
(etc)
• The above table violates the first rule; we must modify its design:
i. We could add another column (e.g., “Secondary order”)
or. . .
Contents First Last Prev Next Back Close Quit
ii. We could allow a language to occupy multiple rows:
Language Name ISO Code Order Source ID Pages
German deu SOV Comrie89 89
German deu SVO Comrie89 90
• Note that this table has a composite primary key: The language part can
be repeated, as long as the entire key is not unique.
3.1. Avoiding redundancy
• This table violates the second of our rules: The language name is recorded
multiple times. (The primary key is enough to identify the language uniquely).
Contents First Last Prev Next Back Close Quit
• The solution: Language and Word Order must appear on separate tables.
Language details
Language Name ISO Code Speakers Continent Source ID
English eng 341.000.000 Europe Grimes00
German deu 95.392.978 Europe Eth15
(etc.)
Basic Word Order
Language ID Order Source ID Pages
eng SVO Chomsky74 21
deu SOV Comrie89 89
deu SVO Comrie89 90
• The Language ID field (a foreign key) links the two tables.
Contents First Last Prev Next Back Close Quit
4. Entities and relationships
• In the conceptual database design stage, we think about how our data is
organized without worrying about the specifics of tables, keys and attributes.
• We identify entities, or rather entity types, which are idealized objects that
our data is about; and the types of relationships between them.
• Entities don’t need to represent physical objects. Roughly, they correspond to
notions or concepts that we, our project, consider to have “an independent
existence”, and can ask questions about.
• For an application in typology, we can begin by treating languages and
bibliographic sources as entities, and identifying their attributes.
• We then identify a relationship between these entities: A bibliographic source is
cited as the source of the demographic data on each language.
Contents First Last Prev Next Back Close Quit
4.1. ER diagrams
• Database designers use Entity-Relationship diagrams to model conceptual
structure:
• An ER diagram shows entities, entity attributes, and relationships. It also
shows the cardinality of each relationship.
Contents First Last Prev Next Back Close Quit
4.2. Attributes
• Entity attributes are a generalization of the attributes of tables. They express
properties of the entity.
• An attribute should depend only on the entity: In a database containing
languages and books used as sources, page number is not a sensible attribute
for a book.
• An attribute can be simple or complex, if it consists of multiple parts.
Example: Author as an attribute of the Book entity (has first name, last
name, etc.).
• An attribute (simple or complex) can be single-valued or multi-valued.
Example: Author is a multi-valued attribute of a book. Telephone number is
multi-valued for people.
Contents First Last Prev Next Back Close Quit
4.3. Relationships
• We say that a relationship is total (for some type of entity) if every entity of
that type participates in the relationship. It is partial otherwise. Example: A
relationship between languages and grammars, called describes. For which
entities is it total? Partial?
• A relationship is one-to-one if only one of each entity can be related.
Example: A relationship is-the-capital-of, between cities and countries.
• A relationship is one-to-many if one entity type can be related to many
entities of the other type, but not vice versa. Example: A relationship location
between cities and countries.
• A relationship is many-to-many if both entity types can be related to many
entities of the other type. Example: The relationship flies-to between airlines
and cities.
• Relationships can have attributes, just like entities.
Contents First Last Prev Next Back Close Quit
Example 2: Reciprocals
• For a cross-linguistic study of reciprocals, we identify several different ones in
each language. E.g.:
1) They like each other
2) John and Mary argued on the way home
3) We looked at one another
4) They were at one another’s throats
5) They spread rumours about each other
6) Each of them likes the other
• We look for patterns: Our goal is not to study every sentence we come across,
but to identify the distinct kinds of reciprocals and to understand (describe)
each one of them.
Contents First Last Prev Next Back Close Quit
user’s computer
4.4. A common pattern: Language-Construction-Example
1:N
Language Construction
1:N N:N
Source
Example
Person
Contents First Last Prev Next Back Close Quit
Example 3: Languages and countries
• On the Ethnologue database, we find listings of the languages spoken in each
country, and of the countries where each language is spoken.
• What is the right ER diagram for this kind of information?
• What is the cardinality of the relationship?
• Is the relationship total or partial?
Contents First Last Prev Next Back Close Quit
5. Rules of thumb for logical design
• Every entity normally becomes a table.
• A simple one-to-one or one-to-many relationship can be represented with a
foreign key (as in example 1).
• For a many-to-many relationship, we need an additional table; this table
could contain just the primary keys for the two entities being related.
• If a relationship has its own attributes, it probably needs to be given its own
table.
• Multi-valued attributes also need a separate table.
• It is often tidier to place complex attributes in a separate table.
Contents First Last Prev Next Back Close Quit
Example 1: Languages and reciprocals
• There can be multiple reciprocals per language (many to one).
• The LangID column in the Reciprocal table is a foreign key matching the
ISO Code column of the Language table.
Language
Language Name ISO Code Speakers Continent ...
English eng 341.000.000 Europe
(etc.)
Reciprocal construction
ID Name Type Morph. Form LangID
101 Each-other Argument each other eng
102 Null Verbal eng
104 Egymas Argument a
egym´s hun
105 Verb morphology Verbal (various) hun
Contents First Last Prev Next Back Close Quit
Example 2: The Spoken-In relationship
• Each language is spoken in many countries, and in each country there are
many languages spoken. Both are entities.
• The relationship between them is many to many, and requires an additional
table (which can include properties of the relationship, e.g., Population).
Spoken-In
Language ID Country Population ...
eng England 58.100.000
eng Australia 17.400.000
eng U.S.A. 215.000.000
eng South Africa 3.670.000
zul South Africa 9.980.000
(etc.)
• The primary key for the relationship table consists of the keys for the two
related entities: neither of these keys by itself is unique!
Contents First Last Prev Next Back Close Quit
Example 3: Alternative names
• Alternative names are a multivalued attribute of the Language entity. They
must appear in a separate table, since a column can only take one value for
each record.
Alternative names
Language ID Name
zul Isizulu
zul Zunda
nld Nederlands
nld Hollands
(etc.)
• In this case, no additional attributes are needed: Each row of the table contains
just the Language ID (a foreign key) and an alternative name.
Contents First Last Prev Next Back Close Quit
5.1. Values from a fixed set
• In many cases we want the possible values of a field to be restricted to a fixed
set: For example a field Continent, that tells us where a language is spoken,
should take values from just the six continents.
• In this case we also use a table to store the values, possibly with nothing else:
Continents
Continent
Africa
Europe
Asia
(etc.)
Contents First Last Prev Next Back Close Quit
• It is common to use small integers in place of text strings:
Continents
Continent ID Name
1 Africa
2 Europe
3 Asia
(etc.)
• In the Languages table, we define Continent (or Continent ID) as a foreign
key into the Continents table.
Language details
Language ISO Code Speakers Continent ID
German deu 100.000.000 2
(etc.)
Contents First Last Prev Next Back Close Quit
6. Modeling typological properties
• How do we model typological properties? One possibility is to treat them as
attributes of a language:
Language details and properties
Language ISO Code Speakers Continent Demographics Basic WO WO
source WO source pages
German deu 100.000.000 Europe Grimes00 SVO Comrie89 89
(etc.)
• Additional columns need to be created for each typological property we wish to
maintain.
• It can be seen that each property needs a column for its value, and additional
columns for the source citation.
Contents First Last Prev Next Back Close Quit
• Another possibility is to treat typological parameters as entities, and define
them in a separate table (one parameter per row).
• We then create a table that relates them to languages:
Typological properties
Language ID Parameter Value Source ID Source page
swa Basic word order SVO Ashton47 39
lez Basic word order SOV Haspelmath93 5
lez Affixation suffixes Haspelmath93 4
spa Basic word order SVO Butt94 464
(etc.)
• This is a many-to-many relationship between languages and typological
parameters.
• It is still impossible for one language to be paired to the same parameter (e.g.,
“affixation”) twice: this would duplicate a primary key.
Contents First Last Prev Next Back Close Quit
6.1. A flexible database template
• We have seen that the Language-Construction-Example is broadly applicable
to cross-linguistic surveys.
• Storing typological parameter definitions in a table, as we just did, gives us a
generic database design that can be filled in with whatever questions we are
interested in.
• We have created a web database based on this design:
http://languagelink.let.uu.nl/burs/
• The software is configurable, and is already in use by a number of other
projects.
Contents First Last Prev Next Back Close Quit
Contents First Last Prev Next Back Close Quit
Contents First Last Prev Next Back Close Quit
Contents First Last Prev Next Back Close Quit
Contents
1 The relational database model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Designing a database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3 Preview: What a database should look like . . . . . . . . . . . . . . . . . . 7
4 Entities and relationships . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5 Rules of thumb for logical design . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6 Modeling typological properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Contents First Last Prev Next Back Close Quit
Related docs
Get documents about "