Normalization by shuifanglj

VIEWS: 39 PAGES: 22

									             a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc




Lecture Normalization
(I modified the Course Module on Normalization)

______________________________________________________

Outline

Normalization of Database Tables

      Functional Dependency

      Data Redundancy

Normal Forms

      First Normal Form (1NF)

      Second Normal Form (2NF)

      Third Normal Form (3NF)

Invoice Database Normalization (3NF) Example

Database Design

Example Homework Problems

Student Homework
               a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc



Normalization of Database Tables

Good database design must be matched to good table structures. Good table
structures are evaluated and designed to control data redundancies, thereby
avoiding data anomalies. The process that yields such desirable results is known
as normalization.

Functional Dependency


A primary key uniquely identifies one and only one row in a table. Functional
dependency exists between a primary key and a unique row in a table because
the primary key guarantees uniqueness.
A more formal definition of functional dependency is that A      B. B is
functionally dependent on A if A determines B. In plain English, this means A
determines one and only one value of B.
We make a dependency diagram to show all functional dependencies within a
table. Examine the following dependency diagram for the table in figure 2.6,
where C1 and C3 constitute the composite primary key because it uniquely
identifies the entire tuple (remember formal term for row or record)—that is, all
five attributes.


                      Figure 2.6 Dependency Diagram (1NF)




Table 1:

       Primary key: C1, C3

       Foreign key: None
               a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

       Normal form: 1NF


C1, C3       C2, C4, C5 represents a functional dependency because C2, C4, and
C5 depend on the primary key composed of C1 and C3.
C1       C2 is a special case of a functional dependency referred to as a partial
dependency because C2 depends only on C1 rather than on the entire primary
key composed of C1 and C3.
C4       C5 is a special case of a functional dependency referred to as a transitive
dependency because C5 depends on an attribute (C4) that is not part of the
primary key.



Data Redundancy


Redundant data occur in more than one places, creating a strong probability of
inconsistency (anomalies) in updates, additions, and deletions of the redundant
data. For example, redundant data may be updated in one place but overlooked
in another place. The most common anomalies discussed in the text when data
redundancy exists are update anomalies, addition anomalies, and deletion
anomalies.
In addition, data redundancy causes data integrity problems because data entry
failed to conform to the rule that all copies of redundant data are to be equal.
We can avoid all these difficulties through normalization.

Normal Forms

Normalization is a technique to design tables in which data redundancies are
minimized by assigning attributes to entities. If the normalization process works
properly, it eliminates uncontrolled data redundancies, getting rid of both the data
anomalies and the data integrity problems. It's important to realize that
normalization does not eliminate data redundancy - it produces carefully
controlled redundancies used to link database tables to form relationships.
              a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

The first three normal forms (1NF, 2NF, and 3NF) are most commonly
encountered. From a structural point of view, higher normal forms are better than
lower ones because higher normal forms yield fewer data redundancies - 3NF is
better than 2NF, which is better than 1NF.
Almost all business designs use the 3NF as the ideal normal form. A special,
more restricted, 3NF is known as Boyce-Codd normal form (BCNF).
Let's look at each of the normal forms in turn.

First Normal Form (1NF)


A table is in 1NF when all the key attributes are defined and all remaining
attributes are dependent on the primary key. When there are repeating groups,
the primary key generally needs to be expanded to reach 1NF.
However, a table in 1NF can still contain both partial and transitive
dependencies. A partial dependency is one in which an attribute is functionally
dependent on only a part of a multiattribute primary key. A transitive dependency
is one in which one attribute is functionally dependent on another non-key
attribute. Naturally, a table with a single-attribute primary key can't exhibit partial
dependencies.
The table in the dependency diagram of figure 2.6 is in 1NF.
              a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc




Second Normal Form (2NF)


A table is in 2NF when it's in 1NF and contains no partial dependencies.
Therefore, a 1NF table is automatically in 2NF if its primary key is based on only
a single attribute.
A table in 2NF may still contain transitive dependencies. Look at the dependency
diagram for a database the tables of which are at least 2NF as shown in figure
2.7.

                      Figure 2.7 Dependency Diagram (2NF)




Remove partial dependencies by creating new tables. To do this, write each
primary key component on a separate line, followed by a line containing the
original primary key (C1,C3). Each of these keys potentially starts a new table.
Write the dependent attributes after each new key. Because no attribute is
dependent on C3, a table never materializes for the primary key C3.
Table 1 is in 3NF because it's in 2NF (no partial dependencies) and contains no
transitive dependencies. Table 2 is in 2NF because it contains a transitive
dependency, C4        C5.

Third Normal Form (3NF)
              a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

A table is in 3NF if it's in 2NF and contains no transitive dependencies. Given this
definition of 3NF, the Boyce-Codd normal form (BCNF) is merely a special 3NF
case, in which all the determinant keys are candidate keys. So, if a table has only
a single candidate key, a 3NF table is automatically in BCNF.
Split a table that is not in 3NF into new tables until all the tables meet the 3NF
requirements. Look at the dependency diagram for a database with tables at
least 3NF in figure 2.8 below.
              a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc



                     Figure 2.8 Dependency Diagram (3NF)




Get rid of transitive dependencies by decomposing the table containing the
transitive dependency. Here's how:

      Place the attributes that create the transitive dependency in a separate
       table, C4      C5.
      Keep C4 in the original table 2 to create a foreign-key link to the new table
       3.
      Make sure that the primary key attribute C4 for the new table 3 is the
       foreign key in the original table 2.

After doing this, tables 1, 2, and 3 are all in 3NF because neither partial nor
transitive dependencies exist.
Pause for a moment to look back at normalization from a nontechnical
perspective. When normalization begins, our starting point is a "conglomerate"
table—one that has lots of themes. We then decompose it to a number of single-
theme tables. When you have arrived at a collection of tables each having one
theme, you probably will have achieved 3NF.
                   a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc




     Invoice Database Normalization (3NF) Example

     Here's an INVOICE database - we'll use this data and decompose it to at least
     3NF:


Attribute Name                             Sample Data

INV_NUM                                    211347

PROD_NUM                                   AA_E3422QW

SALE_DATE                                  3/25/96

PROD_DESCRIPTION                           D & B rotary sander, 6-in. disk

VEND_CODE                                  211

VEND_NAME                                  Never Fail, Inc.


NUM_SOLD                                   2

PROD_PRICE                                 $49.95




     We know that a table is in 1NF when all the key attributes are defined and all
     remaining attributes are dependent on the primary key, and that when there are
     repeating groups, the primary key generally needs to be expanded to reach 1NF.
     To eliminate repeating groups in this case, we have INV_NUM, PROD_NUM as
     the composite primary key.
     However, a table in 1NF can still contain both partial and transitive
     dependencies. In this case, the partial dependencies are INV_NUM
     SALE_DATE and PROD_NUM            PROD_PRICE. The transitive dependency is
     VEND_CODE          VEND_NAME.
     Look at the INVOICE database table (1NF) in figure 2.9.


                            Figure 2.9 Invoice Database (1NF)
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc
              a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

Table 1:

       Primary key: INV_NUM, PROD_NUM

       Foreign key: None

       Normal form: 1NF


We know that a table is in 2NF when it's in 1NF and contains no partial
dependencies. So we create three tables from the individual components of the
primary key and the composite primary key. These keys start the new tables as
primary keys with all their dependent attributes listed in their respective tables.
Transitive dependencies are allowed in 2NF. In this case, a table (2NF) has the
transitive dependency VEND_CODE             VEND_NAME. The other tables are
already in 3NF (no partial or transitive dependencies).
Look at the INVOICE database tables (2NF) in figure 2.10.


                           Figure 2.10 Invoice Database(2NF)




Table 1:

       Primary key: INV_NUM, PROD_NUM

       Foreign keys: INV_NUM (to table 2)

       PROD_NUM (to table 3)

       Normal form: 3NF
              a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc




Table 2:

       Primary key: INV_NUM

       Foreign key: None

       Normal form: 3NF


Table 3:

       Primary key: PROD_NUM

       Foreign key: None

       Normal form: 2NF


We know that a table is in 3NF if it's in 2NF and contains no transitive
dependencies. So we place the attributes that create the transitive dependency
in a separate table, VEND_CODE           VEND_NAME. We keep VEND_CODE in
the original table to create a foreign key link to the new table.
The other tables are already in 3NF (no partial or transitive dependencies). Once
we normalize to 3NF, we can give the INVOICE database tables meaningful
names such as LINE, INVOICE, PRODUCT, and VENDOR.
Look at the INVOICE database tables (3NF) in figure 2.11.


                          Figure 2.11 Invoice Database (3NF)
              a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

LINE table:

      Primary key: INV_NUM, PROD_NUM

      Foreign keys: INV_NUM (to table INVOICE)

              PROD_NUM (to table PROD_NUM)

      Normal Form: 3NF




INVOICE table:

      Primary key: INV_NUM

      Foreign key: None

      Normal form: 3NF




PRODUCT table:

      Primary Key: PROD_NUM

      Foreign Key: VEND_CODE (to table VENDOR)

      Normal Form: 3NF




VENDOR table:

      Primary Key: VEND_CODE

      Foreign key: None

      Normal form: 3NF
             a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc




Database Design

Dependency diagrams don't show the nature of the relationships (1:1, 1:M, M:N).
The E-R diagrams remain crucial to our design effort. We can't successfully
produce complex design without some form of modeling. Yet, as we've seen in
the preceding examples, dependency diagrams are a valuable addition to our
designer's tool box because:

      Normalization is likely to suggest the existence of entities we may not
       have considered in the modeling process.
      If transaction management issues require the existence of attributes that
       create other than 3NF or BCNF conditions, the proper dependency
       diagrams will at least force us to be aware of these conditions.
      A relational schema is used in design documentation which depicts
       connecting fields and relationship types.

Figure 2.12 shows the complete design documentation that would accompany
the previous INVOICE database (3NF) example.
              a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc



           Figure 2.12 Invoice E-R Diagram and Relational Schema




Normalization is part of the design process. As we define entities and attributes
during the E-R modeling process, we do normalization checks on each entity (or
entity sets) and form new ones as required. We incorporate the normalized
entities into the E-R diagram and continue the iterative E-R process until all
entities and their attributes are defined and all equivalent tables are in 3NF.
The more tables we have, the more additional disk I/O to join them and the more
processing logic we need. That's why we sometimes denormalize tables to yield
less I/O and thus increase processing speed. Is that a good idea? Unfortunately,
we pay for the increased processing speed because updates to a larger table are
             a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

inefficient, and data redundancies occur that are likely to yield data anomalies.
So we should use denormalization sparingly in the design process.
Normalization offers us evaluation standards for producing good table structures
that can also be passed on to the next generation of database designers.
        a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

Example Homework Problems
              a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc




Student Homework

:
Example Problem 1) The following report is how an inexperienced database
developer might create a table. Your mission is to get into 3rd Normal Form. The
key field is Customer_ID and Movie_ID, Vendor_ID and Check_Out_Date..

It would be helpful to do this in stages like the book so you can get partial credit if
something goes wrong.




                                   Normalize_ME
 Customer    Last       Movie                    Vendor        Check Out      Return
                                    Title               Type
     ID     Name          ID                       ID            Date          Date
1001      Barns        101    Title of Movie    ACM     ACT        1/1/2002    1/2/2002
                              1
1001        Barns      102    Title of Movie    ACM     COM        1/1/2002    1/2/2002
                              2
1001        Barns      103    Title of Movie    ACM     DRA        1/1/2002    1/5/2002
                              3
1001        Barns      104    Title of Movie    ACM     DRA        1/1/2002    1/6/2002
                              4
1001        Barns      105    Title of Movie    ACM     DRA        1/1/2002
                              5
1001        Barns      106    Title of Movie6   BB      DRA        1/1/2002
1001        Barns      107     Title of Movie7 BB       COM        1/1/2002
1001        Barns      108     Title of Movie8 ACM      COM        1/1/2002
1001        Barns      109     Title of Movie9 ACM      COM
              a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc



Solution:
Step 1) list the key fields separately and on the last line write the original
(composite) key


MOVIE_ID


CUSTOMER_ID

CHECK_OUT_DATE

MOVIE_ID, CUSTOMER_ID, CHECK_OUT_DATE


Step 2) Add the dependent attributes next to the each line you made on step 1:


MOVIE_ID ---- MOVIE_TITLE, TYPE, VENDOR_ID


CUSTOMER_ID ---- CUSTOMER_LNAME

CHECK_OUT_DATE            none!!!! No Dependent Attributes!! So it does not
make a NEW TABLE

MOVIE_ID, CUSTOMER_ID--- CHECK_OUT_DATE, RETURN_DATE


Step 3) Make new Tables for all the transitive dependencies.

Notice that MOVIE_ID gives TYPE and VENDOR_ID. So, I add two new tables
for each of these AND KEEP the linking attributes in the orginal table:

MOVIE_ID ---- MOVIE_TITLE, TYPE, VENDOR_ID

CUSTOMER_ID ---- CUSTOMER_LNAME

MOVIE_ID, CUSTOMER_ID--- CHECK_OUT_DATE, RETURN_DATE

TYPE -- TYPE, TYPE_DESCRIPTION (new)

VENDOR --- VENDOR_ID, VENDOR_NAME (new)
             a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc


Step 4) Give appropriate Names to the tables, and underline the PK of each
Entity:

MOVIE: (MOVIE_ID, MOVIE_TITLE, TYPE, VENDOR_ID)

CUSTOMER: (CUSTOMER_ID, CUSTOMER_LNAME)

RENTAL: (MOVIE_ID, CUSTOMER_ID, CHECK_OUT_DATE,
RETURN_DATE)

TYPE: (TYPE, TYPE_DESCRIPTION)

VENDOR: (VENDOR_ID, VENDOR_NAME)

These tables are now in 3rd Normal Form.


Notice how I had to deal with CHECK_OUT_DATE. It is possible (but unlikely)
that a customer could rent the same video (i.e. same VIDEO_ID, MOVIE_TITLE,
VENDOR_ID) on different days! So I had to keep CHECK_OUT_DATE as part of
the PK.



Example Problem 2).

a) To keep track of office furniture, computers, printers, and so on, the
   FOUNDIT company uses the following table structure:

      Attribute name             Sample value

      ITEM_ID                    2311345-678
      ITEM_DESCRIPTION           HP DeskJet 660C printer
      BLDG_ROOM                  325
      BLDG_CODE                  DEL
      BLDG_NAME                  Dawn's Early Light
      BLDG_MANAGER               E. R. Rightonit

   Given this information, draw the dependency diagram. Make sure you
   label the transitive and/or partial dependencies.
             a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

b) Starting with the dependency diagram drawn for problem 10, create a set
   of dependency diagrams that meet 3NF requirements. Rename
   attributes to meet the naming conventions and create new entities and
   attributes as necessary.

Note that the dependency diagram reflect the notion that each building is
managed by one employee.

SOLUTION:
    Problem 10 Solution



    ITEM_ID ITEM_DESCRIPTION BLDG_ROOM BLDG_CODE BLDG_NAME BLDG_MANAGER


                                            Transitive Dependencies


          Problem 11 Solution: All tables in 3NF




            ITEM_ID ITEM_DESCRIPTION ITEM_ROOM BLDG_CODE




            BLDG_CODE BLDG_NAME         EMP_CODE




            EMP_CODE      EMP_LNAME EMP_FNAME          EMP_INITIAL
            a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc



STUDENT HOMEWORK


1 Using the following INVOICE table structure, draw its dependency
diagram and identify all dependencies (including all partial and transitive
dependencies). You can assume that the table does not contain repeating
groups and that any invoice number may reference more than one product.
(Hint: This table uses a composite primary key.)

      Attribute name           Sample value

      INV_NUM                  211347
      PROD_NUM                 AA_E3422QW
      SALE_DATE                06/25/1999
      PROD_DESCRIPTION         B&D Rotary sander, 6 in. disk
      VEND_CODE                211
      VEND_NAME                NeverFail, Inc.
      NUMBER_SOLD              2
      PROD_PRICE               $49.95




2 Using the initial dependency diagram drawn in problem 1, remove all
partial dependencies, draw the new dependency diagrams, and identify the
normal forms for each table structure you created.

      Note: You can assume that any given product is supplied by a
      single vendor, but a vendor can supply many products.
      Therefore, it is proper to conclude that the following
      dependency exists:
      PROD_NUM               PROD_DESCRIPTION, PROD_PRICE,
                       VEND_CODE, VEND_NAME
      (Hint: Your actions should produce three new dependency
      diagrams.)
                 a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc




3) The following report is how an inexperienced database developer might create
a table. Your mission is to normalize it. The key fields are INV_NUM, S_ID,
CAR_NUM and CUS_ID.

The fields SER_CODE is multi-valued which implies a TABLE called SERVICE
that holds all the descriptions of the SERVICE.

The Field S_ID refers to the Salesperson table.


                                         Normalize ME
INV_NUM   S_ID   SER_CODE   CAR_NUM      CUS_ID   Last Name    CAR_MAKE   CAR_MODEL     CAR_YEAR
5001      1001   NCS        00yo5mn832   101      Adams        Mazada     Tribute       2002
5029      1005   NCS        86gf2de356   123      Palaisa      Mazada     Miata         2001
5030      1005   NCS        87mn5sw754   128      Sullivan     Lincoln    TownCar       2000
5031      1006   NCS        87vb2sn427   125      Petty        Volvo      C70 Coupe     2000
5032      1007   NCS        88za9qr821   127      Scott        Mazada     Protégé       2001
5033      1007   NCS        90as2sw987   126      Regan        Lincoln    Contiential   1997
5034      1003   NCS        90gh1gf546   129      Washington   Lincoln    LS            1999
5035             TUP        A1           103      Armstrong    BMW        326           2002
5036             INS        A2           104      Bell         BMW        327           2002
5037             TUP        18ms3sw145 107        Cliburn      Volvo      Cross Country 2002
5038             TUP        A3           109      Edision      BMW        328           2002
5043             PRE        85de4hg678 122        Nixon        Ford       Windstar      2002
5044             PRE        55ar6xv772   123      Palaisa      Ford       Windstar      2001

								
To top