Learning Center
Plans & pricing Sign in
Sign Out

Normalizacija DB


									First normal form

First normal form (1NF or Minimal Form) is a normal form used in database
normalization. A relational database table that adheres to 1NF is one that meets a certain
minimum set of criteria. These criteria are basically concerned with ensuring that the table is a
faithful representation of a relation[1] and that it is free of repeating groups[2].

The concept of a "repeating group" is, however, understood in different ways by different
theorists. As a consequence, there is not universal agreement as to which features would
disqualify a table from being in 1NF. Most notably, 1NF as defined by some authors (for
example, Ramez Elmasri and Shamkant B. Navathe[3], following the precedent established by
E.F. Codd) excludes relation-valued attributes (tables within tables); whereas 1NF as
defined by other authors (for example, Chris Date) permits them.

1NF tables as representations of relations

According to Date's definition of 1NF, a table is in 1NF if and only if it is "isomorphic to
some relation", which means, specifically, that it satisfies the following five conditions:

       1. There's no top-to-bottom ordering to the rows.
       2. There's no left-to-right ordering to the columns.
       3. There are no duplicate rows.
       4. Every row-and-column intersection contains exactly one value from the applicable
       domain (and nothing else).
       5. All columns are regular [i.e. rows have no hidden components such as row IDs,
       object IDs, or hidden timestamps].
       —Chris Date, "What First Normal Form Really Means", pp. 127-8[4]

Violation of any of these conditions would mean that the table is not strictly relational, and
therefore that it is not in 1NF.

Examples of tables (or views) that would not meet this definition of 1NF are:

      A table that lacks a unique key. Such a table would be able to accommodate duplicate
       rows, in violation of condition 3.
      A view whose definition mandates that results be returned in a particular order, so that
       the row-ordering is an intrinsic and meaningful aspect of the view.[5] This violates
       condition 1. The tuples in true relations are not ordered with respect to each other.
      A table with at least one nullable attribute. A nullable attribute would be in violation
       of condition 4, which requires every field to contain exactly one value from its
       column's domain. It should be noted, however, that this aspect of condition 4 is
       controversial. It marks an important departure from Codd's original vision of the
       relational model, which made explicit provision for nulls.[6]

Repeating groups

Date's fourth condition, which expresses "what most people think of as the defining feature of
1NF"[7], is concerned with repeating groups. The following example illustrates how a database
design might incorporate repeating groups, in violation of 1NF.
Example 1: Domains and values

Suppose a novice designer wishes to record the names and telephone numbers of customers.
He defines a customer table which looks like this:


Customer ID First Name Surname Telephone Number
123         Rachel     Ingram    555-861-2025
456         James      Wright    555-403-1659

789         Maria      Fernandez 555-808-9633

The designer then becomes aware of a requirement to record multiple telephone numbers for
some customers. He reasons that the simplest way of doing this is to allow the "Telephone
Number" field in any given record to contain more than one value:

Customer ID First Name Surname Telephone Number
123         Rachel     Ingram    555-861-2025
456         James      Wright
789         Maria      Fernandez 555-808-9633

Assuming, however, that the Telephone Number column is defined on some Telephone
Number-like domain (e.g. the domain of strings 12 characters in length), the representation
above is not in 1NF. 1NF (and, for that matter, the RDBMS) prohibits a field from containing
more than one value from its column's domain.

Example 2: Repeating groups across columns

The designer might attempt to get around this restriction by defining multiple Telephone
Number columns:

Customer ID First Name Surname     Tel. No. 1   Tel. No. 2   Tel. No. 3
123         Rachel     Ingram    555-861-2025
456         James      Wright    555-403-1659 555-776-4100
789         Maria      Fernandez 555-808-9633

This representation, however, makes use of nullable columns, and therefore does not conform
to Date's definition of 1NF. Even if the view is taken that nullable columns are allowed, the
design is not in keeping with the spirit of 1NF. Tel. No. 1, Tel. No. 2., and Tel. No. 3. share
exactly the same domain and exactly the same meaning; the splitting of Telephone Number
into three headings is artificial and causes logical problems. These problems include:
         Difficulty in querying the table. Answering such questions as "Which customers have
          telephone number X?" and "Which pairs of customers share a telephone number?" is
         Inability to enforce uniqueness of Customer-to-Telephone Number links through the
          RDBMS. Customer 789 might mistakenly be given a Tel. No. 2 value that is exactly
          the same as her Tel. No. 1 value.
         Restriction of the number of telephone numbers per customer to three. If a customer
          with four telephone numbers comes along, we are constrained to record only three and
          leave the fourth unrecorded. This means that the database design is imposing
          constraints on the business process, rather than (as should ideally be the case) vise-

Example 3: Repeating groups within columns

The designer might, alternatively, retain the single Telephone Number column but alter its
domain, making it a string of sufficient length to accommodate multiple telephone numbers:

Customer ID First Name Surname           Telephone Number
123            Rachel     Ingram      555-861-2025
456            James      Wright      555-403-1659, 555-776-4100
789            Maria      Fernandez 555-808-9633

This design is not consistent with 1NF, and presents several design issues. The Telephone
Number heading becomes semantically woolly, as it can now represent either a telephone
number, a list of telephone numbers, or indeed anything at all. A query such as "Which pairs
of customers share a telephone number?" is more difficult to formulate, given the necessity to
cater for lists of telephone numbers as well as individual telephone numbers. Meaningful
constraints on telephone numbers are also very difficult define in the RDBMS with this

A design that complies with 1NF

A design that is unambiguously in 1NF makes use of two tables: a Customer table and a
Customer Telephone Number table.

Customer ID First Name Surname
123            Rachel     Ingram
456            James      Wright
789            Maria      Fernandez
 Customer Telephone Number
Customer ID Telephone Number
123            555-861-2025
456            555-403-1659
456            555-776-4100
789            555-808-9633
Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-
to-Telephone Number link appears on its own record.


Some definitions of 1NF, most notably that of E.F. Codd, make reference to the concept of
atomicity. Codd states that the "values in the domains on which each relation is defined are
required to be atomic with respect to the DBMS."[8] Codd defines an atomic value as one that
"cannot be decomposed into smaller pieces by the DBMS (excluding certain special

Hugh Darwen and Chris Date have suggested that Codd's concept of an "atomic value" is
ambiguous, and that this ambiguity has led to widespread confusion about how 1NF should be
understood.[10][11] In particular, the notion of a "value that cannot be decomposed" is
problematic, as it would seem to imply that few, if any, data types are atomic:

      A character string would seem not be atomic, as the RDBMS typically provides
       operators to decompose it into substrings.
      A date would seem not to be atomic, as the RDBMS typically provides operators to
       decompose it into day, month, and year components.
      A fixed-point number would seem not to be atomic, as the RDBMS typically provides
       operators to decompose it into integer and fractional components.

Date suggests that "the notion of atomicity has no absolute meaning"[12]: a value may be
considered atomic for some purposes, but may be considered an assemblage of more basic
elements for other purposes. If this position is accepted, 1NF cannot be defined with reference
to atomicity. Columns of any conceivable data type (from string types and numeric types to
array types and table types) are then acceptable in a 1NF table—although perhaps not always
desirable. Date argues that relation-valued attributes, by means of which a field within a table
can contain a table, are useful in rare cases.[13]

Second normal form

Second normal form (2NF) is a normal form used in database normalization. 2NF was
originally defined by E.F. Codd[1] in 1971. A table that is in first normal form (1NF) must
meet additional criteria if it is to qualify for second normal form. Specifically: a 1NF table is
in 2NF if and only if, given any candidate key and any attribute that is not a constituent of a
candidate key, the non-key attribute depends upon the whole of the candidate key rather than
just a part of it.

In slightly more formal terms: a 1NF table is in 2NF if and only if none of its non-prime
attributes are functionally dependent on a part (proper subset) of a candidate key. (A non-
prime attribute is one that does not belong to any candidate key.)

Note that when a 1NF table has no composite candidate keys (candidate keys consisting of
more than one attribute), the table is automatically in 2NF.


Consider a table describing employees' skills:
                Employees' Skills
Employee       Skill         Current Work Location
Jones      Typing            114 Main Street
Jones      Shorthand         114 Main Street
Jones      Whittling         114 Main Street
Roberts    Light Cleaning 73 Industrial Way
Ellis      Alchemy           73 Industrial Way
Ellis      Juggling          73 Industrial Way
Harrison   Light Cleaning 73 Industrial Way

The table's only candidate key is {Employee, Skill}.

The remaining attribute, Current Work Location, is dependent on only part of the candidate
key, namely Employee. Therefore the table is not in 2NF. Note the redundancy in the way
Current Work Locations are represented: we are told three times that Jones works at 114 Main
Street, and twice that Ellis works at 73 Industrial Way. This redundancy makes the table
vulnerable to update anomalies: it is, for example, possible to update Jones' work location on
his "Typing" and "Shorthand" records and not update his "Whittling" record. The resulting
data would imply contradictory answers to the question "What is Jones' current work

A 2NF alternative to this design would represent the same information in two tables:

Employee Current Work Location
Jones      114 Main Street
Roberts    73 Industrial Way
Ellis      73 Industrial Way
Harrison   73 Industrial Way
    Employees' Skills
Employee       Skill
Jones      Typing
Jones      Shorthand
Jones      Whittling
Roberts    Light Cleaning
Ellis      Alchemy
Ellis      Juggling
Harrison   Light Cleaning

Update anomalies cannot occur in these tables, which are both in 2NF.

Not all 2NF tables are free from update anomalies, however. An example of a 2NF table
which suffers from update anomalies is:

                       Tournament Winners
   Tournament         Year     Winner         Winner Date of Birth
Des Moines Masters 1998 Chip Masterson 14 March 1977
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open        1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977

Even though Winner and Winner Date of Birth are determined by the whole key
{Tournament, Year} and not part of it, particular Winner / Winner Date of Birth combinations
are shown redundantly on multiple records. This problem is addressed by third normal form

2NF and candidate keys

A table for which there are no partial functional dependencies on the primary key is typically,
but not always, in 2NF. In addition to the primary key, the table may contain other candidate
keys; it is necessary to establish that no non-prime attributes have part-key dependencies on
any of these candidate keys.

Multiple candidate keys occur in the following table:

                         Electric Toothbrush Models
Manufacturer       Model        Model Full Name      Manufacturer Country
Forte            X-Prime      Forte X-Prime          Italy
Forte            Ultraclean   Forte Ultraclean       Italy
Dent-o-Fresh     EZBrush      Dent-o-Fresh EZBrush USA
Kobayashi        ST-60        Kobayashi ST-60        Japan
Hoch             Toothmaster Hoch Toothmaster        Germany
Hoch             Contender    Hoch Contender         Germany

Even if the designer has specified the primary key as {Model Full Name}, the table is not in
2NF. {Manufacturer, Model} is also a candidate key, and Manufacturer Country is dependent
on a proper subset of it: Manufacturer.

Third normal form

The third normal form (3NF) is a normal form used in database normalization. 3NF was
originally defined by E.F. Codd[1] in 1971. Codd's definition states that a table is in 3NF if and
only if both of the following conditions hold:

       The relation R (table) is in second normal form (2NF)
       Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent)
        on every key of R.

A non-prime attribute of R is an attribute that does not belong to any candidate key of R.[2]
A transitive dependency is a functional dependency in which X → Z (X determines Z) by
virtue of X → Y and Y → Z; i.e. X determines Z only indirectly.
An alternative formulation of Codd's definition, given by Carlo Zaniolo[3] in 1982, is this: a
table is in 3NF if and only if, for each of its functional dependencies X → A, at least one of
the following conditions holds:

        X contains A, or
        X is a superkey, or
        A is a prime attribute (i.e., A is contained within a candidate key)

Zaniolo's definition has the advantage of giving a clear sense of the difference between 3NF
and the more stringent Boyce-Codd normal form (BCNF). BCNF simply eliminates the third
alternative ("A is a prime attribute").


An example of a 2NF table that fails to meet the requirements of 3NF is:

                          Tournament Winners
   Tournament            Year       Winner     Winner Date of Birth
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open           1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977

The only candidate key is {Tournament, Year}.

The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively
dependent on {Tournament, Year} via the non-prime attribute Winner. The fact that Winner
Date of Birth is functionally dependent on Winner makes the table vulnerable to logical
inconsistencies, as there is nothing to stop the same person from being shown with different
dates of birth on different records.

In order to express the same facts without violating 3NF, it is necessary to split the table into

           Tournament Winners
   Tournament        Year       Winner
Indiana Invitational 1998 Al Fredrickson
Cleveland Open       1999 Bob Albertson
Des Moines Masters 1999 Al Fredrickson
Indiana Invitational 1999 Chip Masterson
        Player Dates of Birth
    Player         Date of Birth
Chip Masterson 14 March 1977
Al Fredrickson 21 July 1975
Bob Albertson 28 September 1968
Update anomalies cannot occur in these tables, which are both in 3NF.

Normalization beyond 3NF

Most 3NF tables are free of update, insertion, and deletion anomalies. Certain types of 3NF
tables, rarely met with in practice, are affected by such anomalies; these are tables which
either fall short of Boyce-Codd normal form (BCNF) or, if they meet BCNF, fall short of the
higher normal forms 4NF or 5NF.

Boyce-Codd normal form

Boyce-Codd normal form (or BCNF) is a normal form used in database normalization. It is
a slightly stronger version of the third normal form (3NF). A table is in Boyce-Codd normal
form if and only if, for every one of its non-trivial functional dependencies X → Y, X is a
superkey—that is, X is either a candidate key or a superset thereof.

BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd[1] to address certain
types of anomaly not dealt with by 3NF as originally defined. Chris Date has pointed out that
a definition of what we now know as BCNF appeared in a paper[2] by Ian Heath in 1971.
"Since that definition predated Boyce and Codd's own definition by some three years," writes
Date, "it seems to me that BCNF ought by rights to be called Heath normal form. But it


Only in rare cases does a 3NF table not meet the requirements of BCNF. An example of such
a table is:

        Tutor/Student Cross-Reference
Tutor ID Tutor Soc. Security Num. Student ID
1078     088-51-0074               31850
1078     088-51-0074               37921
1293     096-77-4146               46224
1480     072-21-2223               31850

The purpose of the table is to show which tutors are assigned to which students. The table's
candidate keys are:

       {Tutor ID, Student ID}
       {Tutor Social Security Number, Student ID}

Therefore all three attributes of the table are prime attributes: that is, all three attributes
belong to candidate keys.

Recall that 2NF prohibits partial functional dependencies of non-prime attributes on
candidate keys, and that 3NF prohibits transitive functional dependencies of non-prime
attributes on candidate keys. Since the table above lacks any non-prime attributes, it adheres
to both 2NF and 3NF.
BCNF is more stringent than 3NF in that it does not permit any functional dependency in
which the determining set of attributes is not a candidate key (or superset thereof). The
dependency of Tutor ID on Tutor Social Security Number is such a dependency. Accordingly,
the table above is not in BCNF.

Any table that falls short of BCNF will be vulnerable to logical inconsistencies. In the table
above, there is nothing to prevent two different Tutor IDs from being shown, illegitimately, as
corresponding to the same Tutor Social Security Number.

Correcting the problem in this case would be a simple matter of using only one scheme of
identifiers for Tutors: either IDs or Social Security Numbers, but not both.

Domain/key normal form

Domain/key normal form (DKNF) is a normal form used in database normalization which
requires that the database contains no constraints other than domain constraints and key

A domain constraint specifies the permissible values for a given attribute, while a key
constraint specifies the attributes that uniquely identify a row in a given table.

The domain/key normal form is achieved when every constraint on the relation is a logical
consequence of the definition of keys and domains, and enforcing key and domain restraints
and conditions causes all constraints to be met. Thus, it avoids all non-temporal anomalies.

It's much easier to build a database in domain/key normal form than it is to convert lesser
databases which may contain numerous anomalies. However, successfully building a
domain/key normal form database remains a difficult task, even for experienced database
programmers. Thus, while the domain/key normal form eliminates the problems found in
most databases, it tends to be the most costly normal form to achieve. However, failing to
achieve the domain/key normal form may carry long-term, hidden costs due to anomalies
which appear in databases adhering only to lower normal forms over time.

A violation of DKNF occurs in the following table:

                      Wealthy Person
Wealthy Person Wealthy Person Type Net Worth in Dollars
Steve           Eccentric Millionaire   124,543,621
Roderick        Evil Billionaire        6,553,228,893
Katrina         Eccentric Billionaire   8,829,462,998
Gary            Evil Millionaire        495,565,211

(Assume that the domain for Wealthy Person consists of the names of all wealthy people in a
pre-defined sample of wealthy people; the domain for Wealthy Person Type consists of the
values 'Eccentric Millionaire', 'Eccentric Billionaire', 'Evil Millionaire', and 'Evil Billionaire';
and the domain for Net Worth in Dollars consists of all integers greater than or equal to
There is a constraint linking Wealthy Person Type to Net Worth in Dollars, even though we
cannot deduce one from the other. The constraint dictates that an Eccentric Millionaire or Evil
Millionaire will have a net worth of 1,000,000 to 999,999,999 inclusive, while an Eccentric
Billionaire or Evil Billionaire will have a net worth of 1,000,000,000 or higher. This
constraint is neither a domain constraint nor a key constraint; therefore we cannot rely on
domain constraints and key constraints to guarantee that an inconsistent Wealthy Person Type
/ Net Worth in Dollars combination does not make its way into the database.

The DKNF violation could be eliminated by altering the Wealthy Person Type domain to
make it consist of just two values, 'Evil' and 'Eccentric' (the wealthy person's status as a
millionaire or billionaire is implicit in their Net Worth in Dollars, so no useful information is

DKNF is frequently difficult to achieve in practice

Fourth normal form

Fourth normal form (4NF) is a normal form used in database normalization. 4NF ensures
that independent multivalued facts are correctly and efficiently represented in a database
design. 4NF is the next level of normalization after Boyce-Codd normal form (BCNF).

The definition of 4NF relies on the notion of a multivalued dependency. A table with a
multivalued dependency is one where the existence of two or more independent many-to-
many relationships in a table causes redundancy; and it is this redundancy which is removed
by fourth normal form.

Consider the following example:

           Pizza Delivery Permutations
  Restaurant       Pizza Variety Delivery Area
Vincenzo's Pizza Thick Crust     Springfield
Vincenzo's Pizza Thick Crust     Shelbyville
Vincenzo's Pizza Thin Crust      Springfield
Vincenzo's Pizza Thin Crust      Shelbyville
Elite Pizza        Thin Crust    Capital City
Elite Pizza        Stuffed Crust Capital City
A1 Pizza           Thick Crust   Springfield
A1 Pizza           Thick Crust   Shelbyville
A1 Pizza           Thick Crust   Capital City
A1 Pizza           Stuffed Crust Springfield
A1 Pizza           Stuffed Crust Shelbyville
A1 Pizza           Stuffed Crust Capital City

Each row indicates that a given restaurant can deliver a given variety of pizza to a given area.

Notice that because the table has a unique key and no non-key attributes, it does not violate
any normal form up to BCNF. But because the varieties of pizza a restaurant offers are
independent from the areas to which the restaurant delivers, there is redundancy in the table:
for example, we are told three times that A1 Pizza offers Stuffed Crust, and if A1 Pizza start
producing Cheese Crust pizzas then we will need to add multiple records, one for each of A1
Pizza's delivery areas. In formal terms, this is described as Pizza Variety having a multivalued
dependency on Restaurant.

To satisfy 4NF, we must place the facts about varieties offered into a different table from the
facts about delivery areas:

   Varieties By Restaurant
  Restaurant     Pizza Variety
Vincenzo's Pizza Thick Crust
Vincenzo's Pizza Thin Crust
Elite Pizza      Thin Crust
Elite Pizza      Stuffed Crust
A1 Pizza         Thick Crust
A1 Pizza         Stuffed Crust
Delivery Areas By Restaurant
   Restaurant       Delivery Area
Vincenzo's Pizza Springfield
Vincenzo's Pizza Shelbyville
Elite Pizza         Capital City
A1 Pizza            Springfield
A1 Pizza            Shelbyville
A1 Pizza            Capital City

In contrast, if the pizza varieties offered by a restaurant sometimes did vary from one delivery
area to another, the original three-column table would satisfy 4NF.

Ronald Fagin demonstrated that it is always possible to achieve 4NF (but not always
desirable). Rissanen's theorem is also applicable on multivalued dependencies.

Fifth normal form

Fifth normal form (5NF), also known as Project-join normal form (PJ/NF) is a level of
database normalisation, designed to reduce redundancy in relational databases recording
multi-valued facts by isolating semantically related multiple relationships. A table is said to
be in the 5NF if and only if it is in 4NF and every join dependency in it is implied by the
candidate keys.


Consider the following example:

 Psychiatrist       Insurer          Condition
Dr. James         Healthco       Anxiety
Dr. James         Healthco       Depression
Dr. Kendrick      FriendlyCare OCD
Dr. Kendrick      FriendlyCare Anxiety
Dr. Kendrick      FriendlyCare Depression
Dr. Kendrick      FriendlyCare Mood Disorder
Dr. Lowenstein FriendlyCare Anxiety
Dr. Lowenstein FriendlyCare Schizophrenia
Dr. Lowenstein Healthco          Anxiety
Dr. Lowenstein Healthco          Dementia
Dr. Lowenstein Victorian Life Conversion Disorder

The psychiatrist is able to offer reimbursable treatment to patients who suffer from the given
condition and who are insured by the given insurer. In the absence of any rules restricting the
valid possible combinations of psychiatrist, insurer, and condition, the three-attribute table
Psychiatrist-to-Insurer-to-Condition is necessary in order to model the situation correctly.

Suppose, however, that the following rule applies: When a psychiatrist is authorized to offer
reimbursable treatment to patients insured by Insurer P, and the psychiatrist is able to treat
condition C, then – in the event that the Insurer P covers condition C – it must be true that the
psychiatrist is able to offer reimbursable treatment to patients who suffer from condition C
and are insured by Insurer P.

With these constraints it is possible to split the relation into three parts.

   Psychiatrist         Condition
                                                                            Insurer          Condition
  Dr. James         Anxiety
                                              Psychiatrist-to-Insurer     Healthco      Anxiety
  Dr. James         Depression
                                            Psychiatrist     Insurer      Healthco      Depression
  Dr. Kendrick      OCD
                                           Dr. James       Healthco       Healthco      Dementia
  Dr. Kendrick      Anxiety
                                           Dr. Kendrick    FriendlyCare   FriendlyCare OCD
  Dr. Kendrick      Depression
                                           Dr. Lowenstein FriendlyCare    FriendlyCare Anxiety
  Dr. Kendrick      Mood Disorder
                                           Dr. Lowenstein Healthco        FriendlyCare Depression
  Dr. Lowenstein Schizophrenia
                                           Dr. Lowenstein Victorian Life FriendlyCare Mood Disorder
  Dr. Lowenstein Anxiety
                                                                          FriendlyCare Schizophrenia
  Dr. Lowenstein Dementia
                                                                          Victorian Life Conversion Disorder
  Dr. Lowenstein Conversion Disorder

Note how this setup helps to remove redundancy. Suppose that Dr. James becomes a
treatment provider for FriendlyCare. In the previous setup we would have to add two new
entries since Dr. James is able to treat two conditions covered by FriendlyCare: anxiety and
depression. With the new setup we need only add a single entry (in Psychiatrist-to-Insurer).

Only in rare situations does a 4NF table not conform to 5NF. These are situations in which a
complex real-world constraint governing the valid combinations of attribute values in the 4NF
table is not implicit in the structure of that table. If such a table is not normalized to 5NF, the
burden of maintaining the logical consistency of the data within the table must be carried
partly by the application responsible for insertions, deletions, and updates to it; and there is a
heightened risk that the data within the table will become inconsistent. In contrast, the 5NF
design excludes the possibility of such inconsistencies.

Usage Caution:

"Spurious rows" may appear if you only join two of these three tables in this example.
You must re-join ALL tables in the collection to get the correct answer set.

To top