Normalization Example (PowerPoint)

Document Sample
Normalization Example (PowerPoint) Powered By Docstoc
Data inconsistency, incorrectness and
incompleteness is (partially) caused by

Normalization is the process of eliminating
redundancy as much as possible, without
losing data.
      Tools of Normalization
Normalization is achieved by
• lossless decomposition
• functional dependency
      Functional dependency
A functional dependency is a 1-n relation of a set
of attributes within a relation to another set of
attributes in the same relation.
Example: anr  last_name
       Functional dependency
  IATA                     Airport
 AMS       Amsterdam Schiphol
 LHR       London Heathrow
 JFK       John F. Kennedy International Airport
           New York City
 KWH       Kiapupe Airport Zaire

 KWH       Khwahan Airport Afghanistan

 BRU       Brussel Zaventem International Luchthaven

Airport is functionally dependent on IATA. For each airport,
exactly one IATA exists. Notation: IATA  Airport
            Case Description
Airline Reservation System (assignment 1999)
• Flights are operated by Carriers
  – flight number, origin, destination, ETD, ETA
  – destination is coded in IATA codes
• Passengers make a Reservation
  – last name, initials, passport, nationality
  – booked, standby, checked in, boarded
Last    initials nationality passport Flight no carrier    dest      orgin
Leune   CJ      NL          1234567 KL123       KLM -      JFK       AMS
                            890                 Royal

Smith   JV      UK          0987654 BA0444 BA –    LHR               ORD
                            32             British

                   Departure              Arrival                 Status

                   29-jun-2000 09:10      29-jun-2000 11:25 Booked
                   05-nov-1999 13:25      05-nov-1999 16:30 Booked
         UNF (continued)
  IATA     Town          Country             Airport
AMS      Amsterdam   Netherlands      Schiphol
JFK      New York    United States    John F. Kennedy Intl.
ORD      Chicago     United States    O’Hare Intl.
LHR      London      United Kingdom   Heathrow
              First Normal Form
A relation is in first normal form if, and only if
all domains contain atomic values only
Problems with current tables:
   • Carrier contains both ID and full name
   • Departure and Arrival contains both date and time
   • Split up columns
                              1 NF

   •Flights without reservations do not exist
   •Carriers without flights do not exist
        Second Normal Form
A relation is in second normal form if, and only if
the relation is in first normal form and if all non
key-attributes are fully functionally dependent
on the primary key.

Full functional dependency A  B is a functional
dependency on B of A, so that B is not dependant on a part
of A.
     anr + last_name  address,
1NF Dependency Diagram
               2NF Violations
• FlightNumber is a determinant for Destination, Origin,
  CarrierID and CarrierName
• Passport is a determinant for Initials, LastName and
• FlightNumber and DepartureDate together determine
  DepartureTime, ArrivalDate and ArrivalTime.
• FlightNumber, Passport and DepartureDate together
  determine Status.
Splitting up the tables...

                  Carriers without flights
                  cannot be stored in the
             Third Normal Form
   A relation is in third normal form if, and only if the
   relation is in second normal form and if all non
   key-attributes are non-transitively dependent
   on the primary key.

A transitive dependency is a situation where two functional
dependencies are related in the sense that
        A  B and B  C
In other words; C is transitively dependent on A
2NF Dependency Diagram

              3NF violation:
              FlightNo  CarrierId
              CarrierId  CarrierName
Splitting up the tables...
      Boyce-Codd Normal Form
Relations in 3NF can still contain redundancy when:

1. A relation has two or more candidate keys, and
2. Candidate keys are composite keys, and
3. Candidate keys overlap.

Boyce-Codd Normal form:
A relation is in BCNF if all determinants are candidate keys.

Usually, a relation that is in 3NF is in BCNF too.
      Conclusion Normalization
1NF: a relation is in 1NF iff all cells contain atomic values
2NF: a relation is in 2NF iff it is in 1NF and all non-key
attributes are fully functionally dependent on the primary key
3NF: a relation is in 3NF iff it is in 2NF and all non-key
attributes are non-transitively dependent on the primary key
BCNF: a relation is in BCNF iff all determinants are
candidate keys.

Tools of normalization are functional dependency and
lossless decomposition.
 Normalisation assignment 2000
Case Study: Digital Library

Database components:
 Document Records
A thesaurus consists of Terms and Relationships

  information retrieval - narrower term - text retrieval
  TERM                    Relationship TERM

A document is classified into a (set of) thesaurus entries.
    Linking records with thesauri
  “Conceptualising information spaces in federated digital
  libraries” (Papazoglou M, Hoppenbrouwers J) 1999
  SIGMOD Record
  has keywords: meta data; semantics; information retrieval;
  query optimisation; federated data base; electronic library
                    UNF digital library
Bibliographic Records Table:
Title Authors Source Year Keywords Summary Category Location Language Type
….   ….       …./…. ….     ….         ….     ….          ….        ….     ….
     ….                    ….
     ….                    ….

Thesaurus Table:
 Term        UF          USE BT            NT                 RT

 Information CD-Rom          Information   - Browsing         Searching
 retrieval   searching       science       - Knowledge

                           NOT A GOOD IDEA!

Shared By: