# 4 TH NORMAL FORM Lossless Decomposition by nym11541

VIEWS: 117 PAGES: 30

• pg 1
```									  4TH NORMAL FORM
&
Lossless Decomposition

By: Karen McVay
CS 157B
REVIEW OF NFs

   1NF  All values of the columns are
atomic. That is, they contain no
repeating values.

   2NF  it is in 1NF and every non-key
column is fully dependent upon the
primary key (avoid partial
dependencies)
REVIEW OF NF Cont…
   3NF  it is in 2NF and every non-key column is non
transitively dependent upon its primary key. In other
words, all non-key attributes are functionally
dependent only upon the primary key.

 BCNF A relation is in BCNF if every determinant is
a candidate key. This is an improved form of third
normal form.
Determinant: an attribute on which some other
attribute is fully functionally dependent
4NF and Multivalued Dependencies

   Some relations can exist that are in
BCNF but they have redundant data
and have update anomalies

   The next highest normal form is 4NF

   4NF is based on multivalued
dependencies
Multivalued Dependencies

Consider a relation R with attributes X, Y, Z where X,
Y, Z are sets of attributes

   The multivalued dependency, X             Y,
exists if
when two tuples exist having the same X
values:
T1(x, y1, z1) and T2(x, y2, z2),
   implies the two tuples
– T4(x, y2, z1) and T3(x, y1, z2) also exist
Example
Suppose we have two one-to-many relationships:

   Each employee may have many dependants
   Each employee may work on many projects
   For any employee, the dependents are completely
independent of the projects
– For a given value of ename, the values of pname are
only determined by ename and not dname
– For a given value of ename, the values of dname are
only determined by ename and not pname
– So, each dname is repeated for each pname, and
viceversa
Consider the relation EMP

EMP ename pname         dname

Note that EMP is BCNF, and there is a lot of redundancy in EMP

If (Smith, X, John) and (Smith, Y, Anna) exist, then
(Smith, Y, John) and (Smith, X, Anna) exist
The MVD ename          pname | dname exists in EMP
We might have liked to have:

EMP ename pname           dname
Smith    X, Y     John, Anna

But 1NF does not permit multivalued attributes

EMP ename pname              dname
Smith   X, Y     John, Anna

We have:

EMP ename pname           dname
Smith   X              John
Smith   Y              Anna
Smith       Y      John
Smith       X      Anna
Decomposing a MVD without loss of information

R       X         Y         Z

Note that if X        Y | Z exists, then R can be decomposed into
(X,Y) and (R-Y)

Ra         X         Y

Rb         X         Z

And this is a lossless decomposition
EMP ename pname   dname

As ename   pname | dname exists, EMP can be decomposed
into

EMPa ename pname
This is a lossless
decomposition
EMPb ename dname
4th Normal Form

A Boyce Codd normal form relation is in
fourth normal form if

(a)   there is no multi value dependency in
the relation or

(b)   there are multi value dependency but
the attributes, which are multi value
dependent on a specific attribute, are
dependent between themselves.
4th Normal Form Cont…
This is best discussed through mathematical notation.

Assume the following relation

R(a:pk1, b:pk2, c:pk3)

Recall that a relation is in BCNF if all its determinant
are candidate keys, in other words each
determinant can be used as a primary key.
Because relation R has only one determinant (a, b,
c), which is the composite primary key and since
the primary is a candidate key therefore R is in
BCNF.
4th Normal Form Cont…
Now R may or may not be in fourth normal form.

1. If R contains no multi value dependency then R will be
in Fourth normal form.

2. Assume R has the following two-multi value dependencies:

a --->> b     and           a --->> c

In this case R will be in the fourth normal form if b and c
dependent on each other.
However if b and c are independent of each other then R
is not in fourth normal form and the relation has to be
projected to two non-loss projections.
Example
Consider a case of class enrollment. Each
student can be enrolled in one or more
classes and each class can contain one or
more students.
Clearly, there is a many-to-many
relationship between classes and
students. This relationship can be
represented by a Student/Class cross-
reference table:
{StudentID, ClassID}
Example Cont…
   The key for this table is the combination of
StudentID and ClassID. To avoid violation of
2NF, all other information about each student
and each class is stored in separate Student
and Class tables, respectively.

   Note that each StudentID determines not a
unique ClassID, but a well-defined, finite set of
values. This kind of behavior is referred to as
multi-valued dependency of ClassID on
StudentID.
Example 2
   Consider another example with two many-to-many
relationships, between students and classes and
between classes and teachers.

*      *
Students              Classes

*      *
Classes               Teachers

Also,a many-to-many relationship between
students and teachers is implied.
Example 2 Cont…

   The combination of StudentID and TeacherID
does not contain any additional information
beyond the information implied by the
student/class and class/teacher relationships.

   Consequentially, the student/class and
class/teacher relationships are independent of
each other—these relationships have no additional
constraints. The following table is, then, in
violation of 4NF:

{StudentID, ClassID, TeacherID}
4th NF and Anomalies

   As an example of the anomalies that
can occur, realize that it is not possible
to add a new class taught by some
teacher without adding at least one
student who is enrolled in this class.
4th Normal Form and anomalies
Cont…

Case 1:
Assume the following relation:
Employee (Eid:pk1, Language:pk2, Skill:pk3)

No multi value dependency,
therefore R is in fourth normal
form.
4th Normal Form and
anomalies Cont…
case 2:
Assume the following relation with multi-value
dependency:

Employee (Eid:pk1, Languages:pk2, Skills:pk3)
Eid --->> Languages          Eid --->> Skills

Languages and Skills are dependent.
This says an employee speak several languages and
has several skills. However for each skill a specific
language is used when that skill is practiced.
Thus employee 100 when he/she teaches
speaks English but when he cooks speaks
French. This relation is in fourth normal form
and does not suffer from any anomalies.

Eid             Language         Skill
100             English          Teaching
100             Kurdish          Politic
100             French           Cooking
200             English          Cooking
200             Arabic           Singing
4th Normal Form and
anomalies Cont…
case 3:
Assume the following relation with multi-
value dependency:

Employee (Eid:pk1, Languages:pk2, Skills:pk3)

Eid --->> Languages             Eid --->> Skills

Languages and Skills are independent.
4th Normal Form and
anomalies Cont…
This relation is not in fourth normal form and suffers
from all three types of anomalies.

Eid                Language           Skill
100                English            Teaching
100                Kurdish            Politic
100                English            Politic
100                Kurdish            Teaching
200                Arabic             Singing
Insertion anomaly: To insert row (200 English Cooking) we
have to insert two extra rows (200 Arabic cooking), and
(200 English Singing) otherwise the database will be
inconsistent. Note the table will be as follow:

Eid          Language     Skill
100          English      Teaching
100          Kurdish      Politics
100          English      Politics
100          Kurdish      Teaching
200          Arabic       Singing
200          English      Cooking
200          Arabic       Cooking
200          English      Singing
   Deletion anomaly: If employee 100 discontinue politic
skill we have to delete two rows:

(100 Kurdish Politic), and (100 English Politic) otherwise
the database will be inconsistent.

Eid                  Language           Skill
100                  English            Teaching
100                  Kurdish            Politics
100                  English            Politics
100                  Kurdish            Teaching
200                  Arabic             Singing
200                  English            Cooking
200                  Arabic             Cooking
200                  English            Singing
More anomalies

   Update anomaly: If employee 200
changes his skill from singing to
dancing we have to make changes
in more than one place.
The relation is projected to the following two non-loss
projections which are in forth normal form

Emplyee_Language(Eid:pk1, Languages:pk2)

Eid                         Language
100                         English
100                         Kurdish
200                         Arabic
Cont…

Emplyee_skill(Eid:pk1, Skills:pk2)

Eid                          Skill
100                          Teaching
100                          Politic
200                          Singing
References

Functional Dependency (Normalization)
http://www.emunix.emich.edu/~khaila
ny/files/Normalization.htm

Multivalued Dependencies (Ozmar Zaine):
http://www.cs.sfu.ca/CC/354/zaiane/materi
al/notes/Chapter7/node13.html

```
To top