Document Sample
is464t22 Powered By Docstoc
					         Chapter 5:
  Logical Database Design                          Introduction to Normalization
  and the Relational Model
          Part 2: Normalization                    Normalization: The process of transforming
                                                   relations into forms that make them easier
     Modern Database Management                    to manage in certain situations.
             6th Edition                           Objective of normalization: To eliminate
 Jeffrey A. Hoffer, Mary B. Prescott, Fred R.      modification anomalies from a relation,
                   McFadden                        which are problems that can occur when a
           Robert C. Nickerson                     relation is modified (that is, when data is
         ISYS 464 – Spring 2003                    added, deleted, or changed).
                Topic 22

           Normal Forms                                       Normal Forms
Normal forms: Forms that a relation can take
during normalization.                              Any relation in a higher normal form is also
First normal form – 1NF                   Lower    in all lower normal forms.
Second normal form – 2NF                           Each higher normal form results in fewer
Third normal form – 3NF                            modification anomalies.
Boyce-Codd normal form - BCNF
                                                   A relation in DKNF has no modification
Fourth normal form –4NF
Fifth normal form –5NF
Domain/key normal form – DKNF             Higher

         Practical View of                                  Practical View of
          Normalization                                      Normalization
Only up through 4NF is practical.                  Higher normal forms make it easier to update a
                                                   database. In lower normal forms, modification
We don't even know fully the consequences
                                                   anomalies have to be handled through complex
of a relation not being in 5NF.                    programming that takes longer to execute.
It is very difficult to determine if a relation    But, higher normal forms can make it harder to
is in DKNF.                                        query a database. In higher normal forms, some
Not all relations can be put in DKNF.              queries require complex programming and take
                                                   longer to execute.

                                                                      Place of Normalization in
                                                                      Database Development
 De-normalization: The process of changing                          Some people consider normalization to be a
 a relation into a lower normal form to make                        critical step in logical design. They recommend
 the database easier to query even though                           that all relations be put in as high a normal form
                                                                    as possible (fully normalize all relations)
 there will be more modification anomalies
                                                                    My view:
 in it.
                                                                    Normalization is related to performance tradeoff:
                                                                        What trade off are we willing to make between ease of
                                                                        modifying the database and ease of querying the

    Place of Normalization in
                                                                            Data Normalization
    Database Development
 Good conceptual design usually results in                         Primarily a tool to validate and improve a
 relations in a relatively high normal form.                       logical design so that it satisfies certain
 Normal forms should be checked after                              constraints that avoid unnecessary
 completing the logical design (which is                           duplication of data
 based on the conceptual design) to                                The process of decomposing relations with
 determine if the relations are in an                              anomalies to produce smaller, well-
 acceptable normal form given the
                                                                   structured relations
 modification/query tradeoff.

  Well-Structured Relations                                                  Example – Figure 5.2b
A relation that contains minimal data redundancy
and allows users to insert, delete, and update rows
without causing data inconsistencies
Goal is to avoid modification anomalies
 – Insertion Anomaly – adding new rows forces user to
   create duplicate data; adding data about more than one
 – Deletion Anomaly – deleting rows may cause a loss of
   data that would be needed for other future rows;
   deleting data about more than one entity
 – Modification (update) Anomaly – changing data in a       Question – Is this a relation?   Answer – Yes: unique rows and no multivalued
   row forces changes to other rows because of                                               attributes
   duplication; changing data about entity in more than                                            Answer – Composite: Emp_ID,
                                                            Question – What’s the primary key?
   one tuple                                                                                       Course_Title

General rule of thumb: a table should not pertain to
             more than one entity type

       Anomalies in this Table
                                                                       Functional Dependencies
Insertion – can’t enter a new employee without
having the employee take a class                                     1NF though BCNF deal with functional
Deletion – if we remove employee 140, we lose                        dependencies
information about the existence of a Tax Acc class                   Functional dependency (FD): Given a relation
Modification (update) – giving a salary increase                     R(X,Y,…), Y is functionally dependent on X (or
                                                                     X functionally determines Y) if and only if every
to employee 100 forces us to update multiple                         value of X has associated with it only one value of
records                                                              Y at any one time. (Given a value of X, there is
  Why do these anomalies exist?                                      only one value of Y associated with it at any one
    Because we’ve combined two themes (entity types)                 time.)
    into one relation. This results in duplication, and an           Notation: X     Y (functional dependency formula)
    unnecessary dependency between the entities

     Functional Dependencies                                           Functional Dependencies
    Functionally             Not functionally                        X and Y in definition of functional
    Dependent                Dependent                               dependency can be composite
    X Y                      X Y                                     A, B C
     1A                        1A                                    D E, F
     1A                        1B                                    Determinant: The attribute or composite
     2B                        2B
                                                                     attribute on the left of a functional
                                                                     dependency formula

   Functional Dependencies and Keys                             5.22 -Steps in
    Functional Dependency: The value of one                     normalization
    attribute (the determinant) determines the
    value of another attribute
    Candidate Key:
    – A unique identifier. One of the candidate keys
      will become the primary key
          E.g. perhaps there is both credit card number and
          SS# in a table…in this case both are candidate keys
    – Each non-key field is functionally dependent on
      every candidate key

               First Normal Form                                                 Second Normal Form
    No multivalued attributes                                              1NF plus every non-key attribute is fully
    Every attribute value is atomic                                        functionally dependent on the ENTIRE
    Fig. 5-2a on page 168 is not in 1st Normal                             primary key
    Form (multivalued attributes)      it is not a                         – Every non-key attribute must be defined by the
    relation                                                                 entire key, not by only part of the key
    Emp(Emp_ID, Name, Dept_Name, Salary, (Course_Title, Date_Completed))   – No partial functional dependencies
    Fig. 5-2b is in 1st Normal form                                        Fig. 5-2b is NOT in 2nd Normal Form (see
    Emp(Emp_ID, Course_Title, Name, Dept_Name, Salary, Date_Completed)
                                                                           fig 5-23b)
    All relations are in 1st Normal Form

     Fig 5.23(b) – Functional
  Dependencies in EMPLOYEE2
                                                                                Modification Anomalies
                     Dependency on entire primary key
                                                                            What are the modification anomalies in this
 EmpID     CourseTitle Name DeptName Salary DateCompleted                   example?
                                                                            Insertion anomaly: Adding an employee requires
                                                                            adding course data
             Dependency on only part of the key                             Deletion anomaly: Deleting an employee involves
                                                                            deleting course data
EmpID, CourseTitle            DateCompleted
                                                                            Update anomaly: Updating an employees salary
EmpID         Name, DeptName, Salary
                                                                            may involve updating it in several tuples
         Therefore, NOT in 2nd Normal Form!!

    Getting it into 2nd Normal Form
                                                                                Transitive Dependency
    See p193 – decomposed into two separate
                                                                            Transitive dependency: Given a relation
                                                                            R(A,B,C,…), C is transitively dependent on
                                                          Both are full     A if A B and B C. (That is, one attribute
                                                                            functionally determines a second, which
            EmpID     Name DeptName Salary
                                                                            functionally determines a third.)

                      EmpID      CourseTitle DateCompleted

 Figure 5-24 -- Relation with transitive dependency       Figure 5-24(b) Relation with transitive dependency
     (a) SALES relation with simple data

                                                      CustID     Name
                                                      CustID     Salesperson                      BUT
                                                      CustID     Region
                                                                                    CustID      Salesperson Region
                                                      All this is OK                         Transitive dependency
                                                      (2nd NF)                               (not 3rd NF)

   Modification Anomalies                                              Third Normal Form
What are the modification anomalies in this               2NF PLUS no transitive dependencies
                                                          involving non-key attributes
Insertion anomaly: Inserting a customer requires
inserting data (region) about a salesperson
Deletion anomaly: Deleting a customer deletes
data (region) about a salesperson
Update anomaly: Changing a salesperson’s region
requires changing data in more than one tuple

Figure 5.25 -- Removing a transitive dependency                         Figure 5.25(b) Relations in 3NF
     (a) Decomposing the SALES relation

                                                                                              Salesperson   Region

                                                                                       CustID     Name
                                                                                       CustID     Salesperson

                                                          Now, there are no transitive dependencies…
                                                          Both relations are in 3rd NF

                                                                       Other Normal Forms
      Boyce-Codd Normal Form
                                                                        (from Appendix B)
           (Appendix B)
                                                              4th NF
    All determinants are candidate keys
                                                              – No multivalued dependencies
                                                              5th NF
    3NF relations can sometimes have
                                                              – No “lossless joins”
    anomalies related to functional
    dependencies                                              Domain-key NF
                                                              – The “ultimate” NF…perfect elimination of all
    BCNF relations eliminate these anomalies                    possible anomalies

  Is a higher normal form always
                                                                       Integrity Constraints
      better than a lower one?
Emp_address (ID, Name, Street, City, State, Zip)              Domain Constraints
ID Name, Street, City, State, Zip                              – Allowable values for an attribute.
Street, City, State Zip                                       Entity Integrity
In 2NF, but not in 3NF because of transitive dependency        – No primary key attribute may be null. All primary key
To put in 3NF, put Zip into separate relation:                   fields MUST have data
Emp_address (ID, Name, Street, City, State)                   Referential Integrity
Zip (Street, City, State, Zip)                                 – Any foreign key value (on the relation of the many side)
                                                                 MUST match a primary key value in the relation of the
But then join required each time address is needed
                                                                 one side. (Or the foreign key can be null)
Better to keep in 2NF because Zips rarely change

  Figure 5-5:
  Referential integrity constraints (Pine Valley Furniture)

                                       constraints are
                                      drawn via arrows
                                     from dependent to
                                         parent table