Document Sample
Normalization Powered By Docstoc
      (Codd, 1972)
     Practical Information
For Real World Database Design
 Requirements for Relational DB
• Table format
• Supports Boolean Algebra
   – Selects, joins, projects + 5 other operations to define
• Supports mathematical, relational, and logical
  operators (And-Or-Not)
• Codd’s Twelve Rules (abstracted)
   –   Null values can be present except in primary key
   –   All data represented in tables
   –   Must be able to update views
   –   Access data with table name, field name, or value
   –   Data and programs should be independent
   – Should enforce integrity and validity constraints
        Normalization Defined
• Normalization
  – Purpose is to avoid potential update problems called
  – Assigned attributes to entities using 1NF, 2NF, 3NF,
    4NF, 5NF
• Denormalization
  – Moving back a level to gain better performance in the
    real-world database; in practice 3NF is most common,
    however, to gain efficiency and speed, minor changes
    may need to be made
Why Normalization Is Important
• If not done, updates are less efficient (larger
  tables, possibly more than one update per
  data item change)
• If not done, indexing is more cumbersome –
  impractical to build large databases
• If not done, no simple strategies for creating
  views required by users
                 Design Rules
• Determine Business Rules
   – A company manages many different projects
   – Each project requires the services of many employees
   – Employees may be assigned to work on more than one
   – Each employee has a job classification
   – Many employees have the same job classification
• Translate business rules to validity constraints and
      Design Rules, Continued
• Analyze documents, interview key users,
  etc. to develop a field list
• Determine entities to be used (see next slide
  for definition)
• Determine relationships between entities
• Assign the attributes to the entities
• Identify primary and foreign keys
• Check for 1NF, 2NF, and 3 NF
• Entity – the subject to be modeled by the database
  file (table or relation)
• Primary Key – the field value that uniquely
  identifies the entity entry (row, tuple, record); all
  other attributes are functionally dependent on it;
  can’t be null
• Foreign Key – the field (attribute, column) that
  relates the table to a pre-existing table
• Functional Dependence - determines or depends
  on, e.g. advisor name depends on advisor ID
       Definitions, Continued
• Views
  – Selected group of records (select)
  – Selected group of fields (project)
  – Selected group of records and fields from two
    or more tables (join)
  – A query
  – A report
  – A set of labels
         Definitions, Continued
• Determinant
   – Determines value of another attribute; e.g. primary key
• Indexes
   – Tables that contain record numbers only arranged in an
     order based on some field value
• Entity Integrity
   – Every table must have a field to uniquely identity each
     record and there must be a field value for every record
• Referential Integrity
   – If a record has a value in a foreign key field, it must
     match an exiting value in the original table to which it
     is linked
                 1 Normal Form
• Table does not contain repeating groups
   – To put it another way, each record has at least one field
     that differentiates it from every other record in the file;
     e.g. a unique primary key
        Faculty ID is primary key and the same faculty id is associated
        with two or more courses
               Solve by creating a course file
        Faculty ID is primary key and the same faculty id is associated
        with two or more offices
               Solve by redesigning database to include offices as a
        separate table
              2 Normal Form
• Table must be in 1 Normal Form
• No non-key attribute is dependent on only
  part of the concatenated key
  – Concatenated key (two or more fields taken
    together represent primary key)
     • In course table, concatenated key is faculty ID and
       Catalog No – every field in table must be dependent
       on both faculty ID and catalog number
   Anomalies Avoided By 2 NF
• Only have one data item to change when update is
• Avoids “loose” data when deletes are made
   – When a part number is deleted, could lose reference to
   – How do you add a new course when there is no
     associated faculty ID?
   – A new office with no assigned faculty?
• Avoids inconsistent data
             3 Normal Form
• The only determinants are candidate keys
  – Candidate keys in student file are social
    security number and patron ID (both are
• To put it another way, there are no transitive
  – If student file contains Dept ID (foreign key)
    and department name, this is a transitive
        4 and 5 Normal Form
• 4 Normal Form
  – There are no multivalued dependencies – is like
    Boyce Codd
• 5 Normal Form
  – Holds only theoretical interest

Shared By: