# Normalization by shuifanglj

VIEWS: 39 PAGES: 22

• pg 1
```									             a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

Lecture Normalization
(I modified the Course Module on Normalization)

______________________________________________________

Outline

Normalization of Database Tables

Functional Dependency

Data Redundancy

Normal Forms

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Invoice Database Normalization (3NF) Example

Database Design

Example Homework Problems

Student Homework
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

Normalization of Database Tables

Good database design must be matched to good table structures. Good table
structures are evaluated and designed to control data redundancies, thereby
avoiding data anomalies. The process that yields such desirable results is known
as normalization.

Functional Dependency

A primary key uniquely identifies one and only one row in a table. Functional
dependency exists between a primary key and a unique row in a table because
the primary key guarantees uniqueness.
A more formal definition of functional dependency is that A      B. B is
functionally dependent on A if A determines B. In plain English, this means A
determines one and only one value of B.
We make a dependency diagram to show all functional dependencies within a
table. Examine the following dependency diagram for the table in figure 2.6,
where C1 and C3 constitute the composite primary key because it uniquely
identifies the entire tuple (remember formal term for row or record)—that is, all
five attributes.

Figure 2.6 Dependency Diagram (1NF)

Table 1:

Primary key: C1, C3

Foreign key: None
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

Normal form: 1NF

C1, C3       C2, C4, C5 represents a functional dependency because C2, C4, and
C5 depend on the primary key composed of C1 and C3.
C1       C2 is a special case of a functional dependency referred to as a partial
dependency because C2 depends only on C1 rather than on the entire primary
key composed of C1 and C3.
C4       C5 is a special case of a functional dependency referred to as a transitive
dependency because C5 depends on an attribute (C4) that is not part of the
primary key.

Data Redundancy

Redundant data occur in more than one places, creating a strong probability of
data. For example, redundant data may be updated in one place but overlooked
in another place. The most common anomalies discussed in the text when data
redundancy exists are update anomalies, addition anomalies, and deletion
anomalies.
In addition, data redundancy causes data integrity problems because data entry
failed to conform to the rule that all copies of redundant data are to be equal.
We can avoid all these difficulties through normalization.

Normal Forms

Normalization is a technique to design tables in which data redundancies are
minimized by assigning attributes to entities. If the normalization process works
properly, it eliminates uncontrolled data redundancies, getting rid of both the data
anomalies and the data integrity problems. It's important to realize that
normalization does not eliminate data redundancy - it produces carefully
controlled redundancies used to link database tables to form relationships.
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

The first three normal forms (1NF, 2NF, and 3NF) are most commonly
encountered. From a structural point of view, higher normal forms are better than
lower ones because higher normal forms yield fewer data redundancies - 3NF is
better than 2NF, which is better than 1NF.
Almost all business designs use the 3NF as the ideal normal form. A special,
more restricted, 3NF is known as Boyce-Codd normal form (BCNF).
Let's look at each of the normal forms in turn.

First Normal Form (1NF)

A table is in 1NF when all the key attributes are defined and all remaining
attributes are dependent on the primary key. When there are repeating groups,
the primary key generally needs to be expanded to reach 1NF.
However, a table in 1NF can still contain both partial and transitive
dependencies. A partial dependency is one in which an attribute is functionally
dependent on only a part of a multiattribute primary key. A transitive dependency
is one in which one attribute is functionally dependent on another non-key
attribute. Naturally, a table with a single-attribute primary key can't exhibit partial
dependencies.
The table in the dependency diagram of figure 2.6 is in 1NF.
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

Second Normal Form (2NF)

A table is in 2NF when it's in 1NF and contains no partial dependencies.
Therefore, a 1NF table is automatically in 2NF if its primary key is based on only
a single attribute.
A table in 2NF may still contain transitive dependencies. Look at the dependency
diagram for a database the tables of which are at least 2NF as shown in figure
2.7.

Figure 2.7 Dependency Diagram (2NF)

Remove partial dependencies by creating new tables. To do this, write each
primary key component on a separate line, followed by a line containing the
original primary key (C1,C3). Each of these keys potentially starts a new table.
Write the dependent attributes after each new key. Because no attribute is
dependent on C3, a table never materializes for the primary key C3.
Table 1 is in 3NF because it's in 2NF (no partial dependencies) and contains no
transitive dependencies. Table 2 is in 2NF because it contains a transitive
dependency, C4        C5.

Third Normal Form (3NF)
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

A table is in 3NF if it's in 2NF and contains no transitive dependencies. Given this
definition of 3NF, the Boyce-Codd normal form (BCNF) is merely a special 3NF
case, in which all the determinant keys are candidate keys. So, if a table has only
a single candidate key, a 3NF table is automatically in BCNF.
Split a table that is not in 3NF into new tables until all the tables meet the 3NF
requirements. Look at the dependency diagram for a database with tables at
least 3NF in figure 2.8 below.
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

Figure 2.8 Dependency Diagram (3NF)

Get rid of transitive dependencies by decomposing the table containing the
transitive dependency. Here's how:

   Place the attributes that create the transitive dependency in a separate
table, C4      C5.
   Keep C4 in the original table 2 to create a foreign-key link to the new table
3.
   Make sure that the primary key attribute C4 for the new table 3 is the
foreign key in the original table 2.

After doing this, tables 1, 2, and 3 are all in 3NF because neither partial nor
transitive dependencies exist.
Pause for a moment to look back at normalization from a nontechnical
perspective. When normalization begins, our starting point is a "conglomerate"
table—one that has lots of themes. We then decompose it to a number of single-
theme tables. When you have arrived at a collection of tables each having one
theme, you probably will have achieved 3NF.
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

Invoice Database Normalization (3NF) Example

Here's an INVOICE database - we'll use this data and decompose it to at least
3NF:

Attribute Name                             Sample Data

INV_NUM                                    211347

PROD_NUM                                   AA_E3422QW

SALE_DATE                                  3/25/96

PROD_DESCRIPTION                           D & B rotary sander, 6-in. disk

VEND_CODE                                  211

VEND_NAME                                  Never Fail, Inc.

NUM_SOLD                                   2

PROD_PRICE                                 \$49.95

We know that a table is in 1NF when all the key attributes are defined and all
remaining attributes are dependent on the primary key, and that when there are
repeating groups, the primary key generally needs to be expanded to reach 1NF.
To eliminate repeating groups in this case, we have INV_NUM, PROD_NUM as
the composite primary key.
However, a table in 1NF can still contain both partial and transitive
dependencies. In this case, the partial dependencies are INV_NUM
SALE_DATE and PROD_NUM            PROD_PRICE. The transitive dependency is
VEND_CODE          VEND_NAME.
Look at the INVOICE database table (1NF) in figure 2.9.

Figure 2.9 Invoice Database (1NF)
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

Table 1:

Primary key: INV_NUM, PROD_NUM

Foreign key: None

Normal form: 1NF

We know that a table is in 2NF when it's in 1NF and contains no partial
dependencies. So we create three tables from the individual components of the
primary key and the composite primary key. These keys start the new tables as
primary keys with all their dependent attributes listed in their respective tables.
Transitive dependencies are allowed in 2NF. In this case, a table (2NF) has the
transitive dependency VEND_CODE             VEND_NAME. The other tables are
already in 3NF (no partial or transitive dependencies).
Look at the INVOICE database tables (2NF) in figure 2.10.

Figure 2.10 Invoice Database(2NF)

Table 1:

Primary key: INV_NUM, PROD_NUM

Foreign keys: INV_NUM (to table 2)

PROD_NUM (to table 3)

Normal form: 3NF
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

Table 2:

Primary key: INV_NUM

Foreign key: None

Normal form: 3NF

Table 3:

Primary key: PROD_NUM

Foreign key: None

Normal form: 2NF

We know that a table is in 3NF if it's in 2NF and contains no transitive
dependencies. So we place the attributes that create the transitive dependency
in a separate table, VEND_CODE           VEND_NAME. We keep VEND_CODE in
the original table to create a foreign key link to the new table.
The other tables are already in 3NF (no partial or transitive dependencies). Once
we normalize to 3NF, we can give the INVOICE database tables meaningful
names such as LINE, INVOICE, PRODUCT, and VENDOR.
Look at the INVOICE database tables (3NF) in figure 2.11.

Figure 2.11 Invoice Database (3NF)
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

LINE table:

Primary key: INV_NUM, PROD_NUM

Foreign keys: INV_NUM (to table INVOICE)

PROD_NUM (to table PROD_NUM)

Normal Form: 3NF

INVOICE table:

Primary key: INV_NUM

Foreign key: None

Normal form: 3NF

PRODUCT table:

Primary Key: PROD_NUM

Foreign Key: VEND_CODE (to table VENDOR)

Normal Form: 3NF

VENDOR table:

Primary Key: VEND_CODE

Foreign key: None

Normal form: 3NF
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

Database Design

Dependency diagrams don't show the nature of the relationships (1:1, 1:M, M:N).
The E-R diagrams remain crucial to our design effort. We can't successfully
produce complex design without some form of modeling. Yet, as we've seen in
the preceding examples, dependency diagrams are a valuable addition to our
designer's tool box because:

   Normalization is likely to suggest the existence of entities we may not
have considered in the modeling process.
   If transaction management issues require the existence of attributes that
create other than 3NF or BCNF conditions, the proper dependency
diagrams will at least force us to be aware of these conditions.
   A relational schema is used in design documentation which depicts
connecting fields and relationship types.

Figure 2.12 shows the complete design documentation that would accompany
the previous INVOICE database (3NF) example.
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

Figure 2.12 Invoice E-R Diagram and Relational Schema

Normalization is part of the design process. As we define entities and attributes
during the E-R modeling process, we do normalization checks on each entity (or
entity sets) and form new ones as required. We incorporate the normalized
entities into the E-R diagram and continue the iterative E-R process until all
entities and their attributes are defined and all equivalent tables are in 3NF.
The more tables we have, the more additional disk I/O to join them and the more
processing logic we need. That's why we sometimes denormalize tables to yield
less I/O and thus increase processing speed. Is that a good idea? Unfortunately,
we pay for the increased processing speed because updates to a larger table are
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

inefficient, and data redundancies occur that are likely to yield data anomalies.
So we should use denormalization sparingly in the design process.
Normalization offers us evaluation standards for producing good table structures
that can also be passed on to the next generation of database designers.
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

Example Homework Problems
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

Student Homework

:
Example Problem 1) The following report is how an inexperienced database
developer might create a table. Your mission is to get into 3rd Normal Form. The
key field is Customer_ID and Movie_ID, Vendor_ID and Check_Out_Date..

It would be helpful to do this in stages like the book so you can get partial credit if
something goes wrong.

Normalize_ME
Customer    Last       Movie                    Vendor        Check Out      Return
Title               Type
ID     Name          ID                       ID            Date          Date
1001      Barns        101    Title of Movie    ACM     ACT        1/1/2002    1/2/2002
1
1001        Barns      102    Title of Movie    ACM     COM        1/1/2002    1/2/2002
2
1001        Barns      103    Title of Movie    ACM     DRA        1/1/2002    1/5/2002
3
1001        Barns      104    Title of Movie    ACM     DRA        1/1/2002    1/6/2002
4
1001        Barns      105    Title of Movie    ACM     DRA        1/1/2002
5
1001        Barns      106    Title of Movie6   BB      DRA        1/1/2002
1001        Barns      107     Title of Movie7 BB       COM        1/1/2002
1001        Barns      108     Title of Movie8 ACM      COM        1/1/2002
1001        Barns      109     Title of Movie9 ACM      COM
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

Solution:
Step 1) list the key fields separately and on the last line write the original
(composite) key

MOVIE_ID

CUSTOMER_ID

CHECK_OUT_DATE

MOVIE_ID, CUSTOMER_ID, CHECK_OUT_DATE

Step 2) Add the dependent attributes next to the each line you made on step 1:

MOVIE_ID ---- MOVIE_TITLE, TYPE, VENDOR_ID

CUSTOMER_ID ---- CUSTOMER_LNAME

CHECK_OUT_DATE            none!!!! No Dependent Attributes!! So it does not
make a NEW TABLE

MOVIE_ID, CUSTOMER_ID--- CHECK_OUT_DATE, RETURN_DATE

Step 3) Make new Tables for all the transitive dependencies.

Notice that MOVIE_ID gives TYPE and VENDOR_ID. So, I add two new tables
for each of these AND KEEP the linking attributes in the orginal table:

MOVIE_ID ---- MOVIE_TITLE, TYPE, VENDOR_ID

CUSTOMER_ID ---- CUSTOMER_LNAME

MOVIE_ID, CUSTOMER_ID--- CHECK_OUT_DATE, RETURN_DATE

TYPE -- TYPE, TYPE_DESCRIPTION (new)

VENDOR --- VENDOR_ID, VENDOR_NAME (new)
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

Step 4) Give appropriate Names to the tables, and underline the PK of each
Entity:

MOVIE: (MOVIE_ID, MOVIE_TITLE, TYPE, VENDOR_ID)

CUSTOMER: (CUSTOMER_ID, CUSTOMER_LNAME)

RENTAL: (MOVIE_ID, CUSTOMER_ID, CHECK_OUT_DATE,
RETURN_DATE)

TYPE: (TYPE, TYPE_DESCRIPTION)

VENDOR: (VENDOR_ID, VENDOR_NAME)

These tables are now in 3rd Normal Form.

Notice how I had to deal with CHECK_OUT_DATE. It is possible (but unlikely)
that a customer could rent the same video (i.e. same VIDEO_ID, MOVIE_TITLE,
VENDOR_ID) on different days! So I had to keep CHECK_OUT_DATE as part of
the PK.

Example Problem 2).

a) To keep track of office furniture, computers, printers, and so on, the
FOUNDIT company uses the following table structure:

Attribute name             Sample value

ITEM_ID                    2311345-678
ITEM_DESCRIPTION           HP DeskJet 660C printer
BLDG_ROOM                  325
BLDG_CODE                  DEL
BLDG_NAME                  Dawn's Early Light
BLDG_MANAGER               E. R. Rightonit

Given this information, draw the dependency diagram. Make sure you
label the transitive and/or partial dependencies.
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

b) Starting with the dependency diagram drawn for problem 10, create a set
of dependency diagrams that meet 3NF requirements. Rename
attributes to meet the naming conventions and create new entities and
attributes as necessary.

Note that the dependency diagram reflect the notion that each building is
managed by one employee.

SOLUTION:
Problem 10 Solution

ITEM_ID ITEM_DESCRIPTION BLDG_ROOM BLDG_CODE BLDG_NAME BLDG_MANAGER

Transitive Dependencies

Problem 11 Solution: All tables in 3NF

ITEM_ID ITEM_DESCRIPTION ITEM_ROOM BLDG_CODE

BLDG_CODE BLDG_NAME         EMP_CODE

EMP_CODE      EMP_LNAME EMP_FNAME          EMP_INITIAL
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

STUDENT HOMEWORK

1 Using the following INVOICE table structure, draw its dependency
diagram and identify all dependencies (including all partial and transitive
dependencies). You can assume that the table does not contain repeating
groups and that any invoice number may reference more than one product.
(Hint: This table uses a composite primary key.)

Attribute name           Sample value

INV_NUM                  211347
PROD_NUM                 AA_E3422QW
SALE_DATE                06/25/1999
PROD_DESCRIPTION         B&D Rotary sander, 6 in. disk
VEND_CODE                211
VEND_NAME                NeverFail, Inc.
NUMBER_SOLD              2
PROD_PRICE               \$49.95

2 Using the initial dependency diagram drawn in problem 1, remove all
partial dependencies, draw the new dependency diagrams, and identify the
normal forms for each table structure you created.

Note: You can assume that any given product is supplied by a
single vendor, but a vendor can supply many products.
Therefore, it is proper to conclude that the following
dependency exists:
PROD_NUM               PROD_DESCRIPTION, PROD_PRICE,
VEND_CODE, VEND_NAME
(Hint: Your actions should produce three new dependency
diagrams.)
a1b2cfa4-e611-4131-8011-35ccfe44fc2e.doc

3) The following report is how an inexperienced database developer might create
a table. Your mission is to normalize it. The key fields are INV_NUM, S_ID,
CAR_NUM and CUS_ID.

The fields SER_CODE is multi-valued which implies a TABLE called SERVICE
that holds all the descriptions of the SERVICE.

The Field S_ID refers to the Salesperson table.

Normalize ME
INV_NUM   S_ID   SER_CODE   CAR_NUM      CUS_ID   Last Name    CAR_MAKE   CAR_MODEL     CAR_YEAR
5029      1005   NCS        86gf2de356   123      Palaisa      Mazada     Miata         2001
5030      1005   NCS        87mn5sw754   128      Sullivan     Lincoln    TownCar       2000
5031      1006   NCS        87vb2sn427   125      Petty        Volvo      C70 Coupe     2000
5032      1007   NCS        88za9qr821   127      Scott        Mazada     Protégé       2001
5033      1007   NCS        90as2sw987   126      Regan        Lincoln    Contiential   1997
5034      1003   NCS        90gh1gf546   129      Washington   Lincoln    LS            1999
5035             TUP        A1           103      Armstrong    BMW        326           2002
5036             INS        A2           104      Bell         BMW        327           2002
5037             TUP        18ms3sw145 107        Cliburn      Volvo      Cross Country 2002
5038             TUP        A3           109      Edision      BMW        328           2002
5043             PRE        85de4hg678 122        Nixon        Ford       Windstar      2002
5044             PRE        55ar6xv772   123      Palaisa      Ford       Windstar      2001

```
To top