Docstoc

Normalization

Document Sample
Normalization Powered By Docstoc
					                            R. J. Daigle




Normalization Concepts


            CIS 507
     Database Programming
                                                                                                R. J. Daigle
 Introduction
• Two types of Anomalies associated with
                                                   EmployeeID            Last Name          First Name
  databases.                                       111222                Jones              Samual

• Modification Anomaly                             111333                Smith              Sandra

                                                   222444                Edwards            Edwin

   – an unexpected consequence of changing
     the actual data in a database
• Design Anomaly                              EmployeeID                                  EmployeeID


   – a flaw in the logical design of the
     database itself                                        111222               111222



• Basic Principles:                                                  EmployeeID

   – For each modification anomaly there is
     a design anomaly
   – For each design anomaly there are
     associated modification anomalies
                                                                            R. J. Daigle
 Types of modification anomalies:

STUDENT-ID     STUDENT-NAME       COURSE            COURSE NAME             GRADE
112211         John Smith         CIS 501     Accelerated Programming       B
112244         Tonya Tucker       CIS 501     Accelerated Programming       B
112244         Tonya Tucker       CIS 502     Architecture and OS           A
112233         Michael Boyd       CIS 503     Data and File Structure       A
112255         George Jones       CIS 502     Architecture and OS           B
112244         Tonya Tucker       CIS 504     Networks & Communications     C

•   Insertion:
     – add a new student—since the key is STUDENT-ID + COURSE, a student
        can only be added when the course has been completed.
•   Deletion:
     – of student 112233 results in the loss of information about course CIS 503.
•   Update.
     – Student 112244 to Tonya Marshall requires the change to take place in
        several places.
                                                      R. J. Daigle
Normalization


• Design anomalies have been classified and criteria for
  removal of the anomalies have been developed.
• The process of removing design anomalies is called
  Normalization.
• A Normal Form is associated with the removal of a
  specific type of anomaly.
• The known normal forms from lowest to highest are: 1NF,
  2NF, 3NF, BCNF, 4NF, 5NF, DKNF
• Any design which is evaluated as a higher form
  automatically satisfies the lower forms.
                                                       R. J. Daigle
Normalization


• Theory of Normalization Contributors
   – Dr. E. F. Codd introduced the first three normal forms
     in the same paper in which the Relational Model was
     introduced ( A Relational Model of Data for Large
     Shared Databanks, CACM, Vol 13, No 6, June, 1970.)
   – Dr. R. F. Boyce extended Codd's original three forms.
   – Dr. R. Fagin extended the theory as proposed by Codd
     and introduced another way of evaluating a design.
   – Dr. D. M. Kroenke has been instrumental in clarifying
     the theory of normal forms in his role as educator.
                                                              R. J. Daigle
 Known Normal Forms.

• The normal forms in order from lowest to highest are
   – First Normal Form (1NF). Elimination of repeating field types
   – Second Normal Form (2NF). Elimination of partial key
     dependencies.
   – Third Normal Form (3NF). Elimination of transitive key
     dependencies among non-key attributes.
   – Boyce-Codd Normal Form (BCNF). Elimination of partial key
     dependencies upon non-key attributes.
   – Fourth Normal Form (4NF). Elimination of multi-valued
     dependencies.
   – Fifth Normal Form (5NF). Elimination of join anomalies.
   – Domain Key Normal Form. Elimination of all modification
     anomalies.
                                                R. J. Daigle
Basic Definitions.


  • Assumptions:
    – e is an entity type
    – ε is the set of attributes for e
    – A, B, C, ... are non-empty subsets of ε
                                                                             R. J. Daigle
Basic Definitions.


  • Functional Dependence: B is Functionally Dependent on A
    if for each value of A there is exactly one value of B.
       –   A is said to Functionally Determine B
       –   A is called a Determinant.
       –   The relationship between A and B is represented as A --> B
       –   If A --> ε, A is said to be an Identifier for the entity type e
  • Example
                 e = STUDENT-DORM-FEE
                   ε = {STUD-ID, STUD-NAME, DORM, DORM-FEE}
                   A = {STUD-ID}
                   B = {DORM}
                   C = {DORM-FEE}

                     A --> ε, B --> C
                                                            R. J. Daigle
Key


• K is a Key for the entity e if and only if
  1. K --> ε       and
  2. no non-empty subset of K determines ε.
• Example (From last example)
      – A = {STUD-ID} is a key for e = STUDENT-DORM-
          FEE
      – {STUD-ID, DORM-FEE} is not a key for e.
•     An attribute which belongs to the selected key A is called
      a Key Attribute; all other attributes are called Non-Key
      Attributes.
                                                                        R. J. Daigle

First Normal Form (1NF) (Repeating Attribute Types)

 •   A is a repeating attribute type or repeating field type if for each
     occurrence of e there may be 0, 1, or more occurrences of values for A.
 •   The data structure used for a repeating attribute types gives rise to
     maintenance difficulties.
      – Static approach: embedding repeating field type is within the entity
          type (implemented as an array)
           • allocation for maximal perceived use results in unused space or
           • insufficient storage for some entities if the maximal perceived
               use is underestimated
      – Dynamic approach: data structure which requires more complex
          functions for management.
 •   Example.
      – e = {STUD-ID, STUD-NAME, COURSE, COURSE-GRADE}
          COURSE and COURSE-GRADE form a repeating type pair of
          attributes since any student will have completed 0, 1, or more courses.
                                                                           R. J. Daigle
First Normal Form (1NF) (Atomic Attributes)

 • An attribute is Atomic if the attribute defines the lowest level of usage of
   data collected for the attribute.
 • Example.
    e = {STUD-ID, STUD-NAME, COURSE, COURSE-GRADE}
     – No guarantee that LAST-NAME will available since the design does
       not define it as an attribute.
     – Examples for Johnson Albert Gilbert:
        JOHNSON GILBERT, GILBERT JOHNSON, J. A. GILBERT, J. GILBERT, GILBERT J.,
        GILBERT J. A., JOHNSON A. GILBERT, J. ALBERT GILBERT, GILBERT JOHNSON A.,
        GILBERT JOHNSON ALBERT,JOHNSON ALBERT GILBERT, GILBERT J. ALBERT, etc.
     – The forms shown above might all be present for different entities in
       the data collection.
     – Any algorithm designed to extract the desired information would
       have to consider all possibilities.
     – Avoided by properly identifying the lowest level of use during the
       design phase rather than relying on an application to obtain the
       desired data.
                                                    R. J. Daigle
First Normal Form (1NF) (Definition)


•   An entity type e is in FIRST NORMAL FORM (1NF) if
    and only if
    1. e has no repeating attribute types AND
    2. all attribute types of e are atomic.
Token Diagram (Abstraction).           R. J. Daigle




                                C


                A
                                E

                B

                               D1 D2
Symbolic Table.                                                      R. J. Daigle



      A              B            C                   D              GRADE
 112211       John Smith       CIS 501   Accelerated Programming     B
 112244       Tonya Tucker     CIS 501   Accelerated Programming     B
 112244       Tonya Tucker     CIS 502   Architecture and OS         A
 112233       Michael Boyd     CIS 503   Data and File Structure     A
 112255       George Jones     CIS 502   Architecture and OS         B
 112244       Tonya Tucker     CIS 504   Networks & Communications   C




  Insertion Anomaly.

      b.   Deletion Anomaly.

      c.   Update Anomaly.
Second Normal Form (Partial Key Dependency)                 R. J. Daigle




  An entity type e with key, K, has a Partial Key Dependency if
    and only if a collection of non-key attributes is determined
    by (or functionally dependent on) a non-empty proper
    subset of K.
Second Normal Form (Definition)                  R. J. Daigle




  An entity type e is in SECOND NORMAL FORM (2NF) if
    and only if
    a. e is in 1NF AND
    b. e has no partial key dependencies.
Example.                  R. J. Daigle




  Difficulties.
                      C


                  A



                  B   D
Token Diagram (Abstraction).               R. J. Daigle




                    C              A
                                       C

  A
                               A

  B                 D                  D
                               B
Symbolic Table.           R. J. Daigle




  a. Insertion Anomaly.
  b. Deletion Anomaly.
  c. Update Anomaly.
Third Normal Form (3NF).                                       R. J. Daigle




  For A and C, attribute collections for an entity type e, there is
    a Transitive Dependency of C upon A if there is an
    attribute collection, B, of e for which
    a. A --> B and
    b. B --> C.
Third Normal Form (Definition)                       R. J. Daigle




  An entity type e is in THIRD NORMAL FORM (3NF) if and
    only if
    a. e is in 2NF AND
    b. e has no transitive dependencies of one non-key
    attribute collection upon another non-key attribute
    collection
Example           R. J. Daigle




  Difficulties.
Token Diagram (Abstraction)   R. J. Daigle
Symbolic Table.         R. J. Daigle




  Insertion Anomaly.
    Deletion Anomaly.
    Update Anomaly.
Boyce-Codd Normal Form (BCNF).                            R. J. Daigle




  Attribute collections A and B of an entity type e are
    Candidate Keys for e if and only if
    a. A is a key for e and
    b. B is a key for e and
    c. A is not equal to B.
Boyce-Codd Normal Form (BCNF) (Definition)         R. J. Daigle




  An entity type e is in BOYCE-CODD NORMAL FORM
    (BCNF) if and only if
    a. e is in 3NF AND
    b. all determinants of e are candidate keys.
Example.          R. J. Daigle




  Difficulties.
Token Diagram (Abstraction) Second Normal Form   R. J. Daigle
(Partial Key Dependency)
Symbolic Table.               R. J. Daigle




  a. Insertion Anomaly.

     b.   Deletion Anomaly.

     c.   Update Anomaly.

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:6
posted:11/7/2011
language:English
pages:29