Normal
Normal Forms
“Why be normal?”
- Author unknown
Copyright, Harris Corporation & Ophir Frieder, 1998 1
Objectives
• To define first, second, third, and Boyce-Codd normal forms.
• To discuss the motivation for normal forms, and the implications for
database design.
Copyright, Harris Corporation & Ophir Frieder, 1998 2
Normal Forms
• A relational scheme is said to be in first normal form (1NF) if and
only if each of it’s domains contains only scalar values.
– No “repeating groups” or unbounded lists.
– A field cannot itself be a table, as in Oracle8!
Question: Do CITY and STATE violate 1NF in relation scheme
R=(CITY,STATE,POPULATION)?
Copyright, Harris Corporation & Ophir Frieder, 1998 3
Motivation For 1NF
• Representation - with a repeating group, some method must be
devised for specifying the end or length of the list.
• Space Allocation - how is space allocated on a per record basis if
repeating groups are allowed?
SS# NAME HOBBIES
032446254 Mary Smith jogging, wrestling
045242453 Bob Jones cooking, cycling,
gardening
932415223 Sue Clark quilting,
hiking,
...
Copyright, Harris Corporation & Ophir Frieder, 1998 4
Motivation For 1NF
• Operations - without 1NF, all operations become more complex, and
this propagates throughout the database management system.
• Theory - 1NF simplifies the theoretical basis of the relational model
(e.g., proof of algorithmic correctness).
SS# NAME HOBBIES
032446254 Mary Smith jogging, wrestling
045242453 Bob Jones cooking, cycling,
gardening
932415223 Sue Clark quilting,
hiking,
...
Copyright, Harris Corporation & Ophir Frieder, 1998 5
Converting A Relational
Scheme To 1NF
• A repeating group is typically eliminated by “flattening” the table.
This makes most things simpler.
SS# NAME HOBBIES
032446254 Mary Smith jogging
032446254 Mary Smith wrestling
045242453 Bob Jones cooking
045242453 Bob Jones cycling
045242453 Bob Jones gardening
:
Copyright, Harris Corporation & Ophir Frieder, 1998 6
Single v.s. Multiple Keys
• For the sake of simplicity we will assume initially that each relational
scheme has exactly one key.
• Multiple keys do occur, but less frequently.
• In what follows, the formal definitions do not change in the case of
multiple keys.
• Given a relational scheme R, it will be helpful throughout the
following to divide the attributes up into two sets, those that are part of
the key, and those that are not.
Copyright, Harris Corporation & Ophir Frieder, 1998 7
Example #1:
Key And Non-Key Attributes
• Consider the following relational scheme for a department store chain
(e.g., Walmart):
– Attributes:
STORE_ID# - A store identification number.
CITY - The city in which the store is located.
STATE - The state in which the store is located.
ITEM - An item sold by the store.
PRICE - The price of the item.
– Functional Dependencies:
STORE_ID# => CITY
STORE_ID# => STATE
STORE_ID#, ITEM => PRICE
Copyright, Harris Corporation & Ophir Frieder, 1998 8
Example #1, Cont.
STORE_ID# CITY STATE ITEM PRICE
W001 Orlando FL Duck Tape 3.95
W001 Orlando FL Rope 3.75
W002 Savannah GA Paint 8.75
W002 Savannah GA Plywood 12.50
W002 Savannah GA Insulation 9.95
Copyright, Harris Corporation & Ophir Frieder, 1998 9
Example #1, Cont.
• The only key is: STORE_ID#,ITEM
Key Attributes Non-Key Attributes
STORE_ID# CITY
ITEM STATE
PRICE
Copyright, Harris Corporation & Ophir Frieder, 1998 10
Partial Dependency
• A functional dependency X=>A is a partial dependency if:
– X is a proper subset of the key attributes, and
– A is a non-key attribute.
Copyright, Harris Corporation & Ophir Frieder, 1998 11
Example #1, Cont.
Key Attributes Non-Key Attributes
STORE_ID# CITY
ITEM STATE
PRICE
• STORE_ID# => CITY is a partial dependency.
• Similarly, STORE_ID# => STATE is a partial dependency.
Copyright, Harris Corporation & Ophir Frieder, 1998 12
Normal Forms - 2NF
• A relational scheme is said to be in second normal form (2NF) if and
only if it is in 1NF and contains no partial dependencies.
Question: Why eliminate partial dependencies?
Copyright, Harris Corporation & Ophir Frieder, 1998 13
Example #1, Cont.
• Note the redundancy in the following legal relation:
STORE_ID# CITY STATE ITEM PRICE
W001 Orlando FL Duck Tape 3.95
W002 Charleston SC Duck Tape 3.75
W003 Savannah GA Paint 8.75
W003 Savannah GA Plywood 12.50
W003 Savannah GA Paint 9.95
Copyright, Harris Corporation & Ophir Frieder, 1998 14
Motivation for 2NF
Example #2
• Consider the following relational scheme for maintaining information
associated with students at a university:
– Attributes:
STUDENT_ID# - The social security number of a student.
NAME - The students last name.
COURSE_ID# - The ID # of a course the student is registered in.
DEPT_ID# - The ID # of the department that offers the course.
– Functional Dependencies:
STUDENT_ID# => NAME
COURSE_ID# => DEPT_ID#
Copyright, Harris Corporation & Ophir Frieder, 1998 15
Example #2, Cont.
STUDENT_ID# NAME COURSE_ID# DEPT_ID#
032-34-3455 Smith CS1001 8353
032-34-3455 Smith MS4025 9255
892-64-8372 Jones MS4025 9255
892-64-8372 Jones MS4145 9255
892-64-8372 Jones MS5023 9255
Copyright, Harris Corporation & Ophir Frieder, 1998 16
Example #2, Cont.
• The only key is: STUDENT_ID#, COURSE_ID#
Key Attributes Non-Key Attributes
STUDENT_ID# NAME
COURSE_ID# DEPT_ID#
• STUDENT_ID# => NAME is a partial dependency, so it is not in 2NF.
• Similarly, COURSE_ID# => DEPT_ID# is a partial dependency.
Copyright, Harris Corporation & Ophir Frieder, 1998 17
Example #2, Cont.
• Note the redundancy in the following legal relation:
STUDENT_ID# NAME COURSE_ID# DEPT_ID#
032-34-3455 Smith CS1001 8353
032-34-3455 Smith MS4025 9255
892-64-8372 Jones MS4025 9255
892-64-8372 Jones MS4145 9255
892-64-8372 Jones MS5023 9255
Copyright, Harris Corporation & Ophir Frieder, 1998 18
Anomalies Resulting From
Partial Dependencies
• Insertion Anomalies - A new student cannot be added unless they are
currently registered for at least one course.
• Deletion Anomalies - If a student drops their last, or only course, then
there is no record left of the student.
• Update Anomalies - Changing a students’ name requires all their
records to be updated. Similarly, for changing a course ID #, or for
assigning a course to a different department.
Note that the first two assume null values are not desirable, which is of
considerable debate in the database community.
Copyright, Harris Corporation & Ophir Frieder, 1998 19
Transitive Dependency
• A functional dependency X=>A is a transitive dependency if:
– X is a proper subset of the non-key attributes, and
– A is a non-key attribute.
Copyright, Harris Corporation & Ophir Frieder, 1998 20
Motivation For 3NF
Example #1
• Consider the following relational scheme for National Football League
(NFL) athletes:
– Attributes:
PLAYER_ID# - The social security number for an NFL athlete.
TEAM - The name of the team the athlete plays for.
STATE - The state in which the team is located.
– Functional Dependencies:
PLAYER_ID# => TEAM
TEAM => STATE
Copyright, Harris Corporation & Ophir Frieder, 1998 21
Example #1, Cont.
PLAYER_ID# TEAM STATE
024-34-3455 Bills NY
195-63-6775 Dolphins FL
892-64-8372 Vikings MN
934-84-7271 Vikings MN
783-83-9879 Vikings MN
Copyright, Harris Corporation & Ophir Frieder, 1998 22
Example #1, Cont.
Key Attributes Non-Key Attributes
PLAYER_ID# TEAM STATE
• TEAM => STATE is a transitive dependency.
Copyright, Harris Corporation & Ophir Frieder, 1998 23
Normal Forms - 3NF
• A relational scheme is said to be in third normal form (3NF) if and
only if it is in 2NF and contains no transitive dependencies.
Every non-key attribute depends on the key, the whole key and nothing
but the key.
Question: Why eliminate transitive dependencies?
Copyright, Harris Corporation & Ophir Frieder, 1998 24
Motivation For 3NF, Cont.
Key Attributes Non-Key Attributes
PLAYER_ID# TEAM
STATE
• There are no partial dependencies, so it is in 2NF.
• However, it is not in 3NF because of the transitive dependency
TEAM=>STATE.
Copyright, Harris Corporation & Ophir Frieder, 1998 25
Example #1, Cont.
• Note the redundancy in the following legal relation:
PLAYER_ID# TEAM STATE
024-34-3455 Bills NY
195-63-6775 Dolphins FL
892-64-8372 Vikings MN
934-84-7271 Vikings MN
783-83-9879 Vikings MN
Copyright, Harris Corporation & Ophir Frieder, 1998 26
Motivation for 3NF
Example #2
• Consider the following relational scheme for a university student
database:
– Attributes:
STUDENT_ID# - A students’ social security number.
CITY - The city of the students’ home address.
STATE - The state of the students’ home address.
ZIP - The zip code of the students’ home address.
– Functional Dependencies:
STUDENT_ID# => CITY
STUDENT_ID# => STATE
STUDENT_ID# => ZIP
ZIP => STATE
ZIP => CITY
Copyright, Harris Corporation & Ophir Frieder, 1998 27
Example #2, Cont.
STUDENT_ID# CITY STATE ZIP
024-34-3455 Melbourne FL 32934
195-63-6775 Melbourne FL 32934
892-64-8372 Orlando FL 32816
934-84-7271 Orlando FL 32816
783-83-9879 Oswego NY 13126
Copyright, Harris Corporation & Ophir Frieder, 1998 28
Example #2, Cont.
Key Attributes Non-Key Attributes
STUDENT_ID CITY
STATE
ZIP
• There are no partial dependencies, so it is in 2NF.
• ZIP => STATE and ZIP=>CITY are transitive dependencies, so it is
not in 3NF.
Copyright, Harris Corporation & Ophir Frieder, 1998 29
Example #2, Cont.
• Note the redundancy in the following legal relation:
STUDENT_ID# CITY STATE ZIP
024-34-3455 Melbourne FL 32934
195-63-6775 Melbourne FL 32934
892-64-8372 Orlando FL 32816
934-84-7271 Orlando FL 32816
783-83-9879 Oswego NY 13126
Copyright, Harris Corporation & Ophir Frieder, 1998 30
Motivation for 3NF
Example #3
• Consider a relational scheme for tracking software licenses:
– Attributes:
LICENSE_ID# - The license ID number for a piece of software.
MACHINE_ID# - The ID number of the machine on which the
software is installed.
EMPLOYEE_ID# - The social security number of the employee to
which the machine is assigned.
LOCATION - The location of the employee’s office.
– Functional Dependencies:
LICENSE_ID# => MACHINE_ID#
MACHINE_ID# => EMPLOYEE_ID#
EMPLOYEE_ID# => LOCATION
Copyright, Harris Corporation & Ophir Frieder, 1998 31
Example #3, Cont.
LICENSE_ID# MACHINE_ID# EMPLOYEE_ID# LOCATION
3243452 3357 023-45-7886 B1-234
4432451 5342 094-34-0033 B3-9867
4452312 4256 143-95-7453 B2-6774
5343231 4256 143-95-7453 B2-6774
5534242 1124 143-95-7453 B2-6774
Copyright, Harris Corporation & Ophir Frieder, 1998 32
Example #3, Cont.
Key Attributes Non-Key Attributes
LICENSE_ID# MACHINE_ID#
EMPLOYEE_ID#
LOCATION
• There are no partial dependencies, so the relational scheme is in 2NF.
• MACHINE_ID# => EMPLOYEE_ID# and EMPLOYEE_ID# =>
LOCATION are both transitive dependencies, so it is not in 3NF.
Copyright, Harris Corporation & Ophir Frieder, 1998 33
Example #3, Cont.
• Note the redundancy in the following legal relation:
LICENSE_ID# MACHINE_ID# EMPLOYEE_ID# LOCATION
3243452 3357 023-45-7886 B1-234
4432451 5342 094-34-0033 B3-9867
4452312 4256 143-95-7453 B2-6774
5343231 4256 143-95-7453 B2-6774
6634324 1124 143-95-7453 B2-6774
7534242 1124 143-95-7453 B2-6774
Copyright, Harris Corporation & Ophir Frieder, 1998 34
Anomalies Resulting From
Transitive Dependencies
• Insertion Anomalies - A license cannot be added until it is installed
on a machine, and until that machine is assigned to an employee.
• Deletion Anomalies - Deleting all of the records for a particular
employee would delete any record of the machines or licenses assigned
to that employee.
• Update Anomalies - Changing the employee assigned to a particular
machine requires multiple record updates. Similarly for changing an
employees’ location.
As with partial dependencies, the first two assume null values are not
desirable.
Copyright, Harris Corporation & Ophir Frieder, 1998 35
Normal Forms - BCNF
• A relation is said to be in Boyce/Codd normal form (BCNF) if every
attribute depends on the key, the whole key, and nothing but the key.
Copyright, Harris Corporation & Ophir Frieder, 1998 36
BCNF
Example #1
• Consider a relational scheme for tracking employee salary
adjustments:
– Attributes:
EMPLOYEE_ID# - An employee identification number
DATE - A date on which the employee’s salary was adjusted
AMOUNT - The amount of the salary adjustment
EXPLANATION - An explanation for the adjustment
– Functional Dependencies:
EMPLOYEE_ID#,DATE => AMOUNT
EMPLOYEE_ID#,DATE => EXPLANATION
Copyright, Harris Corporation & Ophir Frieder, 1998 37
Example #3, Cont.
Key Attributes Non-Key Attributes
EMPLOYEE_ID# AMOUNT
DATE EXPLANATION
• There are no partial dependencies, so the relational scheme is in 2NF.
• There are no transitive dependencies, so the relational scheme is in
3NF.
• Each of the non-key attributes depends on both of the key attributes
(the key, the whole key, and nothing but the key), so the relational
scheme is in BCNF.
Copyright, Harris Corporation & Ophir Frieder, 1998 38
Normal Forms - BCNF, Cont.
• Note that the definition of BCNF does not reference that for 3NF.
• This raises a couple of questions:
– If a relational scheme is in BCNF, is it also in 3NF?
– If a relational scheme is in 3NF, is it also in BCNF?
• The answer to the first is yes (proof is left as an exercise).
• The answer to the second question depends...
Copyright, Harris Corporation & Ophir Frieder, 1998 39
Normal Forms - BCNF, Cont.
• Relational Scheme Has Only One Key:
– A relational scheme is in 3NF if and only if it is in BCNF.
• Relational Scheme Has Multiple Keys:
– If the relational scheme is in BCNF, then it is in 3NF (already stated).
– If the relational scheme is in 3NF, however, it is not necessarily in BCNF.
• If the case of multiple keys, 1NF, 2NF, and 3NF definitions are still
the same.
Copyright, Harris Corporation & Ophir Frieder, 1998 40
Motivation For BCNF
Example #1
• Consider the following relational scheme:
– Attributes:
STUDENT_ID# - A student ID number.
COURSE_ID# - The ID# of a course being taken by the student.
FACULTY_ID# - The ID# of the faculty member who
teaches the course taken by the student.
– Functional Dependencies:
STUDENT_ID#,COURSE_ID# => FACULTY_ID#
FACULTY_ID# => COURSE_ID#
Copyright, Harris Corporation & Ophir Frieder, 1998 41
Example #1, Cont.
STUDENT_ID# COURSE_ID# FACULTY_ID#
023-45-7886 CSE1001 998-32-4124
094-34-0033 CSE4257 087-32-8797
143-95-7453 CSE2130 774-92-3889
034-42-8009 CSE2130 774-92-3889
Copyright, Harris Corporation & Ophir Frieder, 1998 42
Example #1, Cont.
• There are two keys:
STUDENT_ID#,COURSE_ID#
STUDENT_ID#,FACULTY_ID#
• The relation is in 1NF, 2NF, and 3NF (why?)
• The relation is not in BCNF because of the dependency:
FACULTY_ID# => COURSE_ID#
Copyright, Harris Corporation & Ophir Frieder, 1998 43
Example #1, Cont.
• Note the redundancy in the following legal relation:
STUDENT_ID# COURSE_ID# FACULTY_ID#
023-45-7886 CSE1001 998-32-4124
094-34-0033 CSE4257 087-32-8797
143-95-7453 CSE2130 774-92-3889
034-42-8009 CSE2130 774-92-3889
Copyright, Harris Corporation & Ophir Frieder, 1998 44
Motivation For BCNF
Example #2
• Consider the following relational scheme:
– Attributes:
LICENSE_ID# - A Florida state driver’s license number.
SS# - The social security number of the person holding the
license.
CODE - A traffic violation code.
QTY - The number of times the person has been issued the
violation within the past year.
– Functional Dependencies:
LICENSE_ID# => SS# LICENSE_ID#,CODE => QTY
SS# => LICENSE_ID# SS#,CODE => QTY
Copyright, Harris Corporation & Ophir Frieder, 1998 45
Example #2, Cont.
LICENSE_ID# SS# CODE QTY
B424-341-52-415 023-45-7886 Speeding 3
C893-402-52-435 094-34-0033 Tail-Gating 4
M345-310-61-875 143-95-7453 Speeding 1
M345-310-61-875 143-95-7453 Double-Parking 6
Copyright, Harris Corporation & Ophir Frieder, 1998 46
Example #2, Cont.
• There are two keys:
LICENSE_ID#,CODE
SS#,CODE
• The relation is in 1NF, 2NF, and 3NF (why?)
• The relation is not in BCNF because of the dependencies:
LICENSE_ID# => SS#
SS# => LICENSE_ID#
Copyright, Harris Corporation & Ophir Frieder, 1998 47
Example #2, Cont.
• Note the redundancy in the following legal relation:
LICENSE_ID# SS# CODE QTY
B424-341-52-415 023-45-7886 Speeding 3
C893-402-52-435 094-34-0033 Tail-Gating 4
M345-310-61-875 143-95-7453 Speeding 1
M345-310-61-875 143-95-7453 Double-Parking 6
Copyright, Harris Corporation & Ophir Frieder, 1998 48
Update Anomolies
• Insertion Anomalies - The fact that a license ID# has been assigned to
a particular person cannot be recorded unless they have at least one
violation.
• Deletion Anomalies - Deleting all of the violations for a particular
driver would delete any record of the license ID# for that person.
• Update Anomalies - Changing a driver’s name requires changing all
the records for each type of violation the driver has committed.
As with partial dependencies, the first two assume null values are not
desirable.
Copyright, Harris Corporation & Ophir Frieder, 1998 49
Normal Forms Summary
• A relational scheme is said to be in first normal form (1NF) if and
only if each of it’s domains contains only scalar values.
• A relational scheme is said to be in second normal form (2NF) if and
only if it is in 1NF and contains no partial dependencies.
• A relational scheme is said to be in third normal form (3NF) if and
only if it is in 2NF and contains no transitive dependencies.
• A relational scheme is said to be in Boyce/Codd normal form
(BCNF) if and only if the only nontrivial dependencies for the
relational scheme are those in which a key functionally determines one
or more attributes (“every attribute depends on the key, the whole key,
and nothing but the key”).
Copyright, Harris Corporation & Ophir Frieder, 1998 50
Normal Forms - 4NF & 5NF
Currently Beyond The Scope Of This Course:
• A relational scheme R is said to be in fourth normal form (4NF) if
and only if whenever there is a multivalued dependency X=>>Y,
where Y is not empty or a subset of X, and XY does not include all the
attributes of R, then X is a superkey of R.
• A relational scheme R is said to be in fifth normal form (5NF) - also
called projection-join normal form (PJ/NF) - if and only if every join
dependency in R is implied by the candidate keys of R.
Copyright, Harris Corporation & Ophir Frieder, 1998 51