Normalization

Document Sample
Normalization Powered By Docstoc
					Normalization
By Albert Lin
Basics
 Process of efficiently organizing data
  in a database.
 Goals
     Eliminate redundant data
     Ensure data dependency sensibility




                                           2
Benefits
 Faster sorting and index creation
 Larger clustered indexes
 Narrow and compact indexes
 Fewer indexes per table.
 Fewer null values
 Less inconsistencies




                                      3
Brief Description of Normal
Forms
   1st Normal Form
         No repeating groups
   2nd   Normal Form
         Each column depends entirely on the primary key.
   3rd   Normal Form
         Each column depends directly on the primary key.
   4th   Normal Form
         Isolate independent multiple relationships.
   5th   Normal Form
         Isolate Semantically related multiple relationships.
   Boyce-Codd Normal Form
         Non Trivial dependencies between candidate keys.
   Optimal Normal Form
         Limited to simple Facts
   Domain-Key Normal Form
         No modification anomolies

                                                                 4
Additional Information
   1st Normal Form
       Addresses the structure of an isolated
        table.
   2nd, 3rd and Boyce Codd Normal Form
       Addresses 1-1 and 1-many relationships.
   4th and 5th Normal Form
       Many-Many relationships.
   These forms are cumulative.

                                                 5
Anomalies
   Insertion Anomalies
       Duplication of data enforced
       impossible due to entity integrity
   Deletion Anomalies
       Leads to undesired loss of data.
   Modification Anomalies
       Modification of multiple rows can lead to
        inconsistencies.
   Databases that are sufficiently normalized
    can reduce anomalies from occurring.
                                                    6
Tips
 Make a table for each list.
 Use non-meaningful primary keys
 Eliminate Repeating Groups
 Eliminate Columns not dependent on
  Primary Key
 Each table should have an identifier
 Should only store data for a single
  type of entity.
 Avoid null columns
                                     7
Additional Tips
 Records are free, new fields are
  expensive.
 Know when data requires duplication
 Use referential integrity
 The higher normal form generally
  results in faster data retrieval.



                                    8
Questions to ask
 What data do you need?
 What are you going to do with the
  data?
 How are the data related?
 What is the future of the data?




                                      9
Examples of Bad Databases
SID    Class   Time    Location Prof_ID
0015   CS166   10:30   MH226    34215
1205   CS146   12:00   MH225    25123
0025   CS166   1:30    MH222    34215
0015   CS151   8:00    MH222    24215
2531   CS156   9:00    MH422    25124
1205   CS120   10:30   DH251    12412
2522   CS046   8:00    MH224    34215
1523   CS140   5:00    MH422    25123
0015   CS140   3:00    MH422    34215 10
Problems with the database
 In a large scale database, information
  would be repeated numerous times,
  resulting in redundant data
 Retrieval of data would be difficult
  and long
 Index creation would be difficult.




                                      11
In Practice
   Many databases are “de-normalized” to
    some degree.
   This is due to performance issues.
   It may require fewer joins and result in
    faster retrievals.
   However, before doing “de-normalization”
    performance issues must exist and de-
    normalization must dramatically improve it
    before introducing a suboptimal design
   A de-normalized table can be harder to
    update.                                  12
Overall topics
 Normalization helps organization and
  speed of organizing a database,
  which can help a company produce a
  database system that is quick and
  easily accessible.
 The higher the normal form, the less
  chance anomalies will arise.
 In case of performance issues, de-
  normalization can be done in order to
  increase performance.              13
Final Motto:
Strive for Single Themed Tables.
References
   Litt, Steve. “Normalization”. 1996
    http://www.troubleshooters.com/littstip/ltnorm.html
   “Rules of Data Normalization”. 2005.
    http://www.datamodel.org/NormalizationRules.html
   “Normalization”
    http://msdn.microsoft.com/library/default.asp?url=/library/en-
    us/createdb/cm_8_des_02_2oby.asp
   Chung,Luke. “Database Normalization Tips”. FMS. 2001.
    http://www.fmsinc.com/tpapers/genaccess/databasenorm.html
   Janert, Phillip. “Practical Database”. IEEE. 2004.
    http://www.devx.com/ibm/Article/20859
   Reus, Bernhard. “Databases”. University of Sussex. 2004.
    http://www.informatics.sussex.ac.uk/users/bernhard/db2005/Slides
    /dbXIII.pdf
   Wyllys, R.E. “Steps in Normalization” University of Texas at Austin.
    2003.
    http://www.gslis.utexas.edu/~wyllys/DMPAMaterials/normstep.html
    #Section%206.%20Anomalies%20and%20Normalization


                                                                    15

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:18
posted:12/10/2011
language:
pages:15