Conceptual database design Entity-relationship diagrams - PDF - PDF

Document Sample
scope of work template
							Conceptual database design:
Entity-relationship diagrams

    Alexis Dimitriadis
 Databases for Language Typology




      Contents   First   Last   Prev   Next   Back   Close   Quit
1.    The relational database model
• Data is formally represented as instances of one or more relations.

• A relation (also known as a file) is a table:

     Language Name      ISO Code        Speakers      Continent     Source ID
     English                eng       341.000.000       Europe       Grimes00
     Italian                ita        62.000.000       Europe       Grimes00
     Dyirbal                dbl                40      Australia     Grimes00
     Mongol                 mgt               340        Asia        Grimes00
     Halh Mongolian         khk         2.330.000        Asia        Grimes00


• An attribute is a named column of a relation table, representing a
  “property.”

• A tuple is a row of the table, representing a “record.”
                           Contents   First   Last   Prev   Next   Back   Close   Quit
1.1.    Keys

• A key is a set of attributes (usually just one) that will always uniquely identify
  a record.

• A key consisting of more than one attribute is called a composite key.

• The primary key for a table is the key chosen to uniquely identify tuples
  within this table. It is conventionally indicated by underlining.

• A foreign key is an attribute (or set of attributes) within one table, that
  matches a key of some (other) table.

    Language Name ISO Code                Speakers      Continent     Source ID
    English                  eng        341.000.000       Europe       Grimes00
    (etc.)


                             Contents   First   Last   Prev   Next   Back   Close   Quit
Example 1

• The Source ID column in the Languages table is a foreign key matching the
  Source ID column in the Sources table.

                       Language details
    Language Name ISO Code   Speakers Continent                               Source ID
    English                   eng          341.000.000           Europe        Grimes00
    (etc.)

                              Bibliographic sources
 Source ID    Title                      Author           Year     Publisher     Address          ...
 Grimes00     Ethnologue: Languages      Grimes, B.F.     2000     SIL           Dallas, Texas
              of the world               & J.E. Grimes
 Ashton47     Swahili grammar            Ashton, E. O.    1947     Longmans      London
              (including intonation)



                              Contents     First   Last   Prev     Next   Back     Close   Quit
2.    Designing a database
• The relational database model is a simple, but very powerful basis for
  structuring our data. But what is the right structure?

• The answer always depends on the nature of the data, and on its intended uses.

• Database specialists have countless procedures, guidelines and rules of thumb
  for designing and building a database.

• The guiding principle:

        Think about the data first, then about the application program




                            Contents   First   Last   Prev   Next   Back   Close   Quit
2.1.   The design process
  1. Work out your needs, in writing. If possible, examine real data and/or any
     paper forms used in the past.
  2. Carry out the conceptual design (abstract design) of the database. This
     means thinking about how our data is organized without worrying about the
     specifics of tables, keys and attributes.
  3. In the logical database design stage, we define tables and relationships
     that reflect the conceptual design.
  4. The physical design stage involves the actual programming of the
     database, taking into account the features and limitations of our specific
     DBMS and client.
  5. Build a prototype database.
  6. Try it out.
  7. Clarify your needs, and revise steps 1–6 as necessary.

                            Contents   First   Last   Prev   Next   Back   Close   Quit
3.     Preview: What a database should look like
• A correctly structured database should have the following properties:

     1. Each cell in a table should contain only one unit of information (value).
     2. Each unit of information should be stored only once in a database.

                               Basic word order
       Language Name         ISO Code    Order            Source ID       Pages
             German              deu       SOV, SVO           Comrie89        89
              (etc)


• The above table violates the first rule; we must modify its design:

 i. We could add another column (e.g., “Secondary order”)
                                    or. . .
                             Contents   First   Last   Prev     Next   Back   Close   Quit
 ii. We could allow a language to occupy multiple rows:

         Language Name ISO Code                Order    Source ID     Pages
              German             deu           SOV       Comrie89       89
              German             deu           SVO       Comrie89       90


• Note that this table has a composite primary key: The language part can
  be repeated, as long as the entire key is not unique.


3.1.   Avoiding redundancy

• This table violates the second of our rules: The language name is recorded
  multiple times. (The primary key is enough to identify the language uniquely).




                           Contents    First    Last   Prev   Next   Back    Close   Quit
• The solution: Language and Word Order must appear on separate tables.

                       Language details
    Language Name ISO Code   Speakers Continent                        Source ID
    English                 eng         341.000.000       Europe        Grimes00
    German                  deu          95.392.978       Europe         Eth15
    (etc.)

                           Basic Word Order
                 Language ID Order Source ID                  Pages
                      eng              SVO      Chomsky74      21
                      deu              SOV       Comrie89      89
                      deu              SVO       Comrie89      90


• The Language ID field (a foreign key) links the two tables.

                            Contents    First   Last   Prev   Next    Back   Close   Quit
4.    Entities and relationships
• In the conceptual database design stage, we think about how our data is
  organized without worrying about the specifics of tables, keys and attributes.

• We identify entities, or rather entity types, which are idealized objects that
  our data is about; and the types of relationships between them.

• Entities don’t need to represent physical objects. Roughly, they correspond to
  notions or concepts that we, our project, consider to have “an independent
  existence”, and can ask questions about.

• For an application in typology, we can begin by treating languages and
  bibliographic sources as entities, and identifying their attributes.

• We then identify a relationship between these entities: A bibliographic source is
  cited as the source of the demographic data on each language.


                            Contents   First   Last   Prev   Next   Back   Close   Quit
4.1.   ER diagrams

• Database designers use Entity-Relationship diagrams to model conceptual
  structure:




• An ER diagram shows entities, entity attributes, and relationships. It also
  shows the cardinality of each relationship.



                          Contents   First   Last   Prev   Next   Back   Close   Quit
4.2.    Attributes

• Entity attributes are a generalization of the attributes of tables. They express
  properties of the entity.

• An attribute should depend only on the entity: In a database containing
  languages and books used as sources, page number is not a sensible attribute
  for a book.

• An attribute can be simple or complex, if it consists of multiple parts.
  Example: Author as an attribute of the Book entity (has first name, last
  name, etc.).

• An attribute (simple or complex) can be single-valued or multi-valued.
  Example: Author is a multi-valued attribute of a book. Telephone number is
  multi-valued for people.



                            Contents   First   Last   Prev   Next   Back   Close   Quit
4.3.    Relationships

• We say that a relationship is total (for some type of entity) if every entity of
  that type participates in the relationship. It is partial otherwise. Example: A
  relationship between languages and grammars, called describes. For which
  entities is it total? Partial?

• A relationship is one-to-one if only one of each entity can be related.
  Example: A relationship is-the-capital-of, between cities and countries.

• A relationship is one-to-many if one entity type can be related to many
  entities of the other type, but not vice versa. Example: A relationship location
  between cities and countries.

• A relationship is many-to-many if both entity types can be related to many
  entities of the other type. Example: The relationship flies-to between airlines
  and cities.

• Relationships can have attributes, just like entities.

                             Contents   First   Last   Prev   Next   Back   Close   Quit
Example 2: Reciprocals

• For a cross-linguistic study of reciprocals, we identify several different ones in
  each language. E.g.:

     1)   They like each other
     2)   John and Mary argued on the way home
     3)   We looked at one another
     4)   They were at one another’s throats
     5)   They spread rumours about each other
     6)   Each of them likes the other


• We look for patterns: Our goal is not to study every sentence we come across,
  but to identify the distinct kinds of reciprocals and to understand (describe)
  each one of them.


                             Contents   First   Last   Prev   Next   Back   Close   Quit
            user’s computer
4.4.   A common pattern: Language-Construction-Example


                                   1:N
        Language                                         Construction


                                   1:N                              N:N
          Source
                                                           Example
          Person




                        Contents   First   Last   Prev    Next   Back   Close   Quit
Example 3: Languages and countries

• On the Ethnologue database, we find listings of the languages spoken in each
  country, and of the countries where each language is spoken.

• What is the right ER diagram for this kind of information?

• What is the cardinality of the relationship?

• Is the relationship total or partial?




                             Contents     First   Last   Prev   Next   Back   Close   Quit
5.    Rules of thumb for logical design
• Every entity normally becomes a table.

• A simple one-to-one or one-to-many relationship can be represented with a
  foreign key (as in example 1).

• For a many-to-many relationship, we need an additional table; this table
  could contain just the primary keys for the two entities being related.

• If a relationship has its own attributes, it probably needs to be given its own
  table.

• Multi-valued attributes also need a separate table.

• It is often tidier to place complex attributes in a separate table.




                            Contents   First   Last   Prev   Next   Back   Close   Quit
Example 1: Languages and reciprocals

• There can be multiple reciprocals per language (many to one).

• The LangID column in the Reciprocal table is a foreign key matching the
  ISO Code column of the Language table.

                          Language
        Language Name ISO Code   Speakers                          Continent         ...
        English                     eng           341.000.000          Europe
        (etc.)
                            Reciprocal construction
          ID      Name                  Type        Morph. Form              LangID
          101     Each-other        Argument           each other             eng
          102     Null               Verbal                                   eng
          104     Egymas            Argument                    a
                                                           egym´s             hun
          105     Verb morphology    Verbal               (various)           hun


                             Contents     First    Last     Prev      Next    Back    Close   Quit
Example 2: The Spoken-In relationship

• Each language is spoken in many countries, and in each country there are
  many languages spoken. Both are entities.

• The relationship between them is many to many, and requires an additional
  table (which can include properties of the relationship, e.g., Population).
                                   Spoken-In
                  Language ID     Country    Population               ...
                  eng             England              58.100.000
                  eng             Australia            17.400.000
                  eng             U.S.A.              215.000.000
                  eng             South Africa          3.670.000
                  zul             South Africa          9.980.000
                  (etc.)

• The primary key for the relationship table consists of the keys for the two
  related entities: neither of these keys by itself is unique!

                            Contents   First   Last     Prev   Next         Back   Close   Quit
Example 3: Alternative names

• Alternative names are a multivalued attribute of the Language entity. They
  must appear in a separate table, since a column can only take one value for
  each record.

                               Alternative names
                            Language ID Name
                            zul                Isizulu
                            zul                Zunda
                            nld                Nederlands
                            nld                Hollands
                            (etc.)


• In this case, no additional attributes are needed: Each row of the table contains
  just the Language ID (a foreign key) and an alternative name.


                            Contents   First    Last   Prev   Next   Back   Close   Quit
5.1.    Values from a fixed set

• In many cases we want the possible values of a field to be restricted to a fixed
  set: For example a field Continent, that tells us where a language is spoken,
  should take values from just the six continents.

• In this case we also use a table to store the values, possibly with nothing else:

                                   Continents
                                   Continent
                                   Africa
                                   Europe
                                   Asia
                                   (etc.)




                             Contents   First   Last   Prev   Next   Back   Close   Quit
• It is common to use small integers in place of text strings:

                                   Continents
                            Continent ID       Name
                                   1           Africa
                                   2           Europe
                                   3           Asia
                                               (etc.)


• In the Languages table, we define Continent (or Continent ID) as a foreign
  key into the Continents table.

                                Language details
                 Language    ISO Code Speakers           Continent ID
                 German         deu        100.000.000          2
                 (etc.)



                            Contents   First   Last      Prev   Next   Back   Close   Quit
6.    Modeling typological properties
• How do we model typological properties? One possibility is to treat them as
  attributes of a language:

                         Language details and properties
 Language   ISO Code    Speakers Continent Demographics                Basic    WO         WO
                                                  source               WO      source     pages
 German        deu       100.000.000    Europe          Grimes00       SVO     Comrie89        89
 (etc.)


• Additional columns need to be created for each typological property we wish to
  maintain.

• It can be seen that each property needs a column for its value, and additional
  columns for the source citation.


                             Contents    First   Last    Prev   Next    Back   Close    Quit
• Another possibility is to treat typological parameters as entities, and define
  them in a separate table (one parameter per row).

• We then create a table that relates them to languages:

                            Typological properties
     Language ID     Parameter           Value     Source ID         Source page
           swa       Basic word order    SVO       Ashton47                    39
            lez      Basic word order    SOV       Haspelmath93                 5
            lez      Affixation            suffixes    Haspelmath93                 4
           spa       Basic word order    SVO       Butt94                     464
          (etc.)


• This is a many-to-many relationship between languages and typological
  parameters.

• It is still impossible for one language to be paired to the same parameter (e.g.,
  “affixation”) twice: this would duplicate a primary key.

                            Contents    First   Last   Prev   Next   Back   Close   Quit
6.1.   A flexible database template

• We have seen that the Language-Construction-Example is broadly applicable
  to cross-linguistic surveys.

• Storing typological parameter definitions in a table, as we just did, gives us a
  generic database design that can be filled in with whatever questions we are
  interested in.

• We have created a web database based on this design:
                  http://languagelink.let.uu.nl/burs/

• The software is configurable, and is already in use by a number of other
  projects.




                            Contents   First   Last   Prev   Next   Back   Close   Quit
Contents   First   Last   Prev   Next   Back   Close   Quit
Contents   First   Last   Prev   Next   Back   Close   Quit
Contents   First   Last   Prev   Next   Back   Close   Quit
                                     Contents

1   The relational database model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2   Designing a database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3   Preview: What a database should look like . . . . . . . . . . . . . . . . . . 7
4   Entities and relationships . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5   Rules of thumb for logical design . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6   Modeling typological properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23




                          Contents        First     Last      Prev      Next      Back      Close       Quit

						
Related docs