NORMALIZATION: by mubashir2012

VIEWS: 106 PAGES: 13

									    NORMALIZATION:


    Definition:
       o It is a process with the help of which we can divide complex data structure into simplex data
           structure.
       o The process of decomposing relations with anomalies to produce smaller, well structure
           relations.
    A step by step process to produce more efficient and accurate database design
    Purpose is to produce an anomaly free design
    A strongly recommended step
    Normalization applied on each table of a DB design
    Performed after the logical database design
    Informally also performed during conceptual DB design

    Advantage:
      o On the basis of normalized design processing and manipulation of data become more easy and
         efficient
      o Normalized design makes the maintenance of database easier

    Anomalies:
      o An inconsistent, incomplete or incorrect state of database
      o State means extension of the data
      o Four types of anomalies are of concern here;
            a) Redundancy,
            b) Insertion,
            c) Deletion and
            d) Updation

    Normalization Process
      o Different forms or levels of normalization
      o Called first, second, third and so on forms
      o Each form has got certain conditions
      o If a table fulfils the condition(s) for a normal form then the table is in that normal form

    Normalized DB Design
      o Process is applied on each table
      o The minimum form in which all tables are in is called the normal form of the entire database
      o Objective is to place the DB in highest form of normalization

    Normal Forms:
      o A normal form is a state of relation that is achieved after applying certain rules of normalization
      o A state of a relation that result from applying simple rules regarding functional dependencies (or
         relationship between attributes) to that relation.

    Functional Dependency:
    Normalization is based on the concept of Functional Dependency
    A type of relationship between attributes of a relation
   FD Definition:
     o If A and B are attributes of a relation R, then B is functionally dependent on A if each value of
        A in R is associated with exactly one value of B; written as A  B
     o Explanation:
     o If it is clear to you that, with the help of A’s value you must get a value.
     o If we know StdId (A’s value) then it is 100% sure that we get a B’s value, only one value.
     o For example NIC_No, this attribute or value will return a unique person name not two or three
        person names. It will give only one unique name.
     o See if we say that what the father name of Imran is? Then we could not say that imran’s father
        name is Jamal. Because there are many imran in our database. So it means that value “imran”
        could not give a unique name.
     o If we get a unique value on the basis of another value then we can say that B is dependent on A.
        means with the help of A’s value we could fine B’s value.
     o A B. A is determinants and B is dependent
     o A Functionally determines B and B functionally dependent A
     o It does not mean that A derives B, although it may be the case sometime
     o Means that if we know value of A then we can precisely determine a unique value of B
     o Attribute of set of attributes on the left side are called determinant and on the right are called
        dependents
     o Like R (a, b, c, d, e)
     o a  b, c, d
     o d  d, e
     o FD Example
     o STD(stId, stName, stAdr, prgName, prgCredits)
     o StId  stName, stAdr, prName, credits
     o prgName  prgCredits

   Inference Rules
      o Called inference axioms (statement) or armstrong axioms
      o These are rules that establish certain FDs from a given set of FDs
      o These rules are sound / proved / accepted / valid / true.
      o Such a rules, such a statements, if we have some FDs of a table, then this Inference Rules
         exclude or deducted or assumed some FDs from give FDs
      o If we applied inference rules on give FDs we assumed some other FDs
      o The assumed FDs are sound means they are accepted, authentic as well as valid.
      1) Reflexivity
         If B is a subset of A then A  B, it also implies that A  A always hold, that is
         stName, stAdr  stName Or stName  stName

     2) Augmentation
         If we have A  B then AC  BC that is
        If stId  stName then stId, stAdr  stName, stAdr

     3) Transitivity
        If A  B and B  C then A  C that is
        If stId  prName and prName  credits then stId  credits

     4) Additivity or Union
        If A  B and A  C then A  BC
        If empId  eName and empId  qual then we can write it as empId  eName, qual
     5) Projectivity or Decomposition
        If A  BC then A  B and A  C
        If empId  eName, qual then we can write it as         empId  eName and empId  qual

     6) Pseudotransitivity
        If A  B and CB  D then AC  D
        If stId  stName and stName, fName  stAdr then we can write it as stId, fName  stAdr

   First Normal Form (1NF):
      o A relation is in first normal form iff every attribute in every tuple contains an atomic value
      o There is no multivaued (repeating group) in the relation

     o Example 1:
     CLASS (crsId, stId, stName, fId, room, grade, prgName, prgCrdits) not in 1NF

       Not in 1NF
     stId crsId         stName     fid               room          grade     prgName      prgCrdits
     1      C++ ,       Ali        Nadeem ,          B101,         A,        MCS          64
            DBMS                   Iftikhar          B102          B+
     2      VC++,       Khan       Nadeem,           B103,         B,        BCS          132
            DLD                    Iftikhar          B104          C+
     3      C++         Hassan     Nadeem            B101          B         MCS          64
     4      VC++        Omer       Shehzad           B103          A         MCS          64
     5      DLD         Saeed      Iftikhar          B104          A         BCS          132

   Solution:
     o Put all data in a separate row mean each cell must contain an atomic value and make it as a
         composite PK.
     o In above table only stId was a simple PK but in 1st NF crsId append with stId PK and now It
         becomes a composite PK. See in below table.
     o Now it is in 1NF each cell contain an atomic value.

     CLASS (crsId, stId, stName, fId, room, grade, prgName, prgCrdits) now in 1NF

         In 1NF
     stId crsId        stName fid              room grade prgName prgCrdits
     //One course is being registered by 3 student so you have to enter fid and room
     3 times
     1       C++       Ali        Nadeem       B101 A           MCS        64
     2       C++       Khan       Nadeem       B101 B           BCS        132
     3       C++       Hassan     Nadeem       B101 A+          MCS        64
     // three course are being registered by one students so you have to enter stName
     for 3 times
     1       DBMS Ali             Iftikhar     B102 C+          MCS         64
     1       VC++ Ali             Shehzad      B103 B+          MCS         64
     1       DLD       Ali        Iftikhar     B104 C           MCS         64
o Example 2 :




o Multiple values create problems in performing different operations, like, select or join
o Remember the treatment of multivalued attributes in the mapping process
   Second Normal Form
     o A relation that is in 1NF and has every nonkey attribute fully functionally depenedent on the
         primary key, that is, there is no partial dependency
     o If it has a composite PK.

     o Full functional dependency: an attribute B is fully functionally dependent on A if the B can be
       determined by whole of A not by any proper subset of A

     o Partial functional dependency: A functional dependency in which one or more nonkey
       attribute are functionally dependent on part of the primary key.


     o Consider the relation
       CLASS (crsId, stId, stName, fId, room, grade, prgName, prgCrdits)
       crsId, stId  stName, fId, room, grade

         stId  stName, prgName, prgCrdits……………….(PFD) (stName depend on part of PK)
         crsId  fId, room…………………………………..(PFD) (fId & room depend on part of PK)
         crsId, stId  grade…………………………………(FFD (grade fully depend on PK not depend
                                                  on part of PK)

     o stId, stName, fid, and room are partial functional dependent
     o because stName could be returned by using stId it means that we can get stName by using part or
       subset of pk and that is stId
     o and fId, room could be returned by using crsId it means that we can get fId and room by using
       part or subset of PK and that is crsId
     o While grade is Fully Functional Dependent on crsId and stId. It means that we could not get
       grade by using either crsId or stId, we have to use both crsId and stId then we are able to get
       grade.

     o Consider the relation
       CLASS (crsId, stId, stName, fId, room, grade)

     o Note: Keep in mind that when we enter crsId then we have to enter both fid and room
              (crsId  fId, room)
              Also keep in mind that when we enter stId then we have to enter student name
              (stId  stName)
     o The above relation is in 1st NF but not in 2nd NF.
     o If a relation is not in 2nd NF then there are 4 errors occurred.

     o Anomaly:
            It is an error or inconsistency that may arise when user tries to insert, delete, or modify
            data.

     o Types of Anomalies:
           b) Redundancy
           c) Insertion Anomaly
           d) Deletion Anomaly
           e) Updation Anomaly
a) Redundancy:-
      Duplication / Repletion
      Unnecessary duplication
      Wastage of storage

       We know that (crsId  fId, room)
       For one course ID, fid and room are fixed means if you enter crsId then you must enter
       fid and room. In other word we can say that one course would be taught by only on
       faculty member and such course would be taught in one room
       Now see the problem of Redundancy, when a course is being register by 50 students then
       you will have to enter fid and room upto fifty times.

       We know that (stId  stName, prgName, prgCrdits)
       For one student ID, stName, prgName and prgCrdits are fixed means if you enter stId
       then you must enter stName, prgName, prgCrdits
       Now see another problem of redundancy, when a student register 6 courses then we have
       to enter his name, program and program credits for 6 times
       Below table is in 1st NF because each cell contain an atomic value but not in 2nd NF

      In 1NF But not in 2NF
      stId crsId       stName fid              room grade prgName prgCrdits
      //One course is being registered by 3 student so you have to enter fid and room 3
      times
      1       C++      Ali         Nadeem      B101 A            MCS           64
      2       C++      Khan        Nadeem      B101 B            BCS           132
      3       C++      Hassan      Nadeem      B101 C+           MCS           64
      // three course are being registered by one students so you have to enter stName
      for 3 times
      1       DBMS Ali             Iftikhar    B102 A            MCS          64
      1       VC++ Ali             Shehzad B103 B+               MCS          64
      1       DLD      Ali         Iftikhar    B104 C            MCS          64

b) Insertion Anomaly
      A state / situation that is incorrect, that does not reflect the real world properly.
      If at insertion time if you face some problem means insertion problem then it is called
      insertion anomaly.
      We will not be able to insert data in such table in that form. If I enter crsId then it is clear
      that I will have to enter fid and room but see if I enter crsId means that I have offered a
      course but still there are no student who register this course. Means when I offer a course
      I will have to enter a stId., I will have to associate crsId to stId. Suppose I offer course
      and enter its crsId and leave stId NULL but see stId is a part of PK means crsId and stId
      is a composite PK and I could not give NULL in stId because it is a part of PK. So it
      means that now I am not able to enter data in stId and this is insertion anomaly.
      Now see from other side if a new student comes then definitely I can enter his name
      means stId. But when I am entering his name at that time he could not register a course
      then I will have to enter crsId NULL but again see crsId is a part of pk and I could not
      enter null. So it is insertion anomaly means that I am not able to enter data into a table
    c) Deletion Anomaly
          If you have course that is being registered by only one student and if you delete the
          student record then indirectly course data is also deleted. Now you have not such course.
          if we delete Ali’s data from above table then we will also loss course data and that is
          DBMS, Now we are in such situation in which we have not DBMS course. Now if
          another student wants to register DBMS course then we have not such course and in this
          way we loss a very important data
          If we delete DBMS course from above table then we will also loss student data that we
          don’t want to loss. Our intention is to just delete course not a student but when I delete
          course DBMS then Ali’s data will also lost and again in this way we loss a very
          important data of a student.

    d) Updation Anomaly
         If a course is being registered by 50 students and now if I wish to shift a course from one
         to another room means from room B101 to B109. and we know that (crsId  fId, room).
         Since such course is being registered by 50 students so you have to update room column
         against to all students
         See in above table a C++ course is being registered by 3 students and now If you change
         room B101 to B109 then you have to update 3 column.
         I think it is not a hard job to update just 3 records. But if we have more than thousands
         records then how you would do. I think it is not again a hard job to do update all
         thousands records but the problem is that in a huge amount of data that you want to
         update, there are many chances that some records will not be updated or you may leaved
         unintentionally and that is the problem, so it means that updation problem occur.

o Now we will have to transform the 1st NF table in to 2nd NF.

o Relation is decomposed based on the FDs

o CLASS(crId, stId, stName, fId, room, grade, prgName, prgCrdits)

o   crId, stId  stName, fId, room, grade
o   stId  stName, prgName, prgCrdits
o   crId  fId, room
o   crsId, stId  grade

o STD(stId, stName, prgName, prgCrdits)
o COURSE(crId, fId, room)
o CLASS(crId, stId, grade)
    In 1NF But not in 2NF
    stId crsId        stName fid             room grade          prgName prgCrdits
    //One course is being registered by 3 student so you have to enter fid and room 3
    times
    1       C++       Ali        Nadeem      B101     A          MCS        64
    2       C++       Khan       Nadeem      B101     B+         BCS        132
    3       C++       Hassan Nadeem          B101     C          MCS        64
    // three course are being registered by one students so you have to enter stName
    for 3 times
    1       DBMS Ali             Iftikhar    B102     B+         MCS          64
    1       VC++ Ali             Shehzad     B103     C+         MCS          64
    1       DLD       Ali        Iftikhar    B104     B          MCS          64

       o Below all three tables are now in 1NF as well as in 2NF
       o Each of these tables is in second normal form
       o Free of anomalies due to partial dependency

Now In 1NF & in 2NF
(1) STUDENT_TABLE                          (2) COURSE_TABLE               (3) CLASS_TABLE

stId    stName    prgName     prgCrdits     crsId    fid           room   stId    crsId    grade
1       Ali       MCS         64            C++      Nadeem        B101   1       C++      A
2       Khan      BCS         132           DBMS     Iftikhar      B102   1       DBMS     B+
3       Hassan    MCS         64            VC++     Shehzad       B103   1       VC++     C+
                                            DLD      Iftikhar      B104   1       DLD      B
                                                                          2       C++      B+
                                                                          3       C++      C

       o From above three tables I can draw ERD
       o 1st see the pk of all tables stId is a pk of STUDENT_TABLE, crsId is a PK of
         COURSE_TABLE and stId and crsId both make a composite PK for CLASS_TABLE.
       o A student can register many course and a course is being registered by many course which
         indicates that there is a many to many relationship between student and course.
    Third Normal Form
      o A table is in third normal form (3NF) iff it is in 2NF and there is no transitive dependency, that
          is, no non-key attribute is dependent on another non-key attribute

       o Transitive Dependency
             There is a transitive dependency in STUDENT_TABLE

                STD(stId, stName, stAdr, prgName, prgCrdts)
                stId  stName, stAdr, prName, prCrdts
                prgName  prgCrdts

       o Now there are anomalies but not due to PFD but due to Transitive dependency

       o When we applied normalization rules on a table upto 2NF we decompose our one table into 3
         simple tables, but there is one table (STUDENT_TABLE) that is in 1NF and 2NF but it is not in
         3NF and the rest of two tables are already in 1NF and 2NF and by default it is also in 3NF. So
         there is only one table that create some problem means to must apply 3rd normalization rule and
         that is 3NF (To remove transitive dependency)

      In 1NF But not in 2NF & 3NF
      stId crsId         stName fid                room       grade        prgName prgCrdts
      //One course is being registered by 3 student so you have to enter fid and room 3 times
      1        C++       Ali        Nadeem         B101       A            MCS        64
      2        C++       Khan       Nadeem         B101       B+           BCS        132
      3        C++       Hassan Nadeem             B101       C            MCS        64
      // three course are being registered by one students so you have to enter stName for 3
      times
      1        DBMS Ali             Iftikhar       B102       B+           MCS        64
      1        VC++      Ali        Shehzad        B103       C+           MCS        64
      1        DLD       Ali        Iftikhar       B104       B            MCS        64


(1) STUDENT_TABLE                            (2) COURSE_TABLE              (3) CLASS_TABLE

    In 1NF, 2NF But not in 3NF                In 1NF, 2NF & 3NF              In 1NF, 2NF & 3NF
    stId stName prgName prgCrdits             crsId   fid      room          stId crsId    grade
    1    Ali      MCS        64               C++     Nadeem B101            1     C++     A
    2    Khan     BCS        132              DBMS Iftikhar B102             1     DBMS B+
    3    Hassan MCS          64               VC++ Shehzad B103              1     VC++ C+
                                              DLD     Iftikhar B104          1     DLD     B
                                                                             2     C++     B+
                                                                             3     C++     C
(1) STUDENT_TABLE

In 1NF, 2NF But not in 3NF
stId stName prgName prgCrdits
1    Ali      MCS        64
2    Khan     BCS        132
3    Hassan MCS          64
4    Jamal    MCS        64
5    Aslam    MCS        64
6    Omer     MCS        64

    o Only this table is not in 3NF because there is a transitive dependency

             STD(stId, stName, stAdr, prgName, prgCrdts)
             stId  stName, stAdr, prName, prCrdts
             prgName  prgCrdts

    o prgName is a non-key and it can determine prgCrdits and this is Transitive Dependency.

       a) Redundancy:-
             When we enter student then we must enter prgName and prgCrdits. If there are 50
             students get admission in MCS, so when we enter student data then we must enter 50
             times both prgName and prgCrdits which is redundancy

       b) Insertion Anomaly:
             If we enter or introduce a new program like Ph.D. and when I enter Ph.D the see the table
             I will have to enter a student means a program must be associated with a student
             according to Table point of view.
             Here we just want to offer a new program so I will enter the name of new program but
             since stId is a PK so I could not leave it as null, pk never null. So here I am not able to
             enter a program name and this is insertion anomaly

       c) Deletion Anomaly:
             If you have program that is being registered by only one student (like Khan in BCS) and
             if you delete the student record then indirectly program data is also deleted. Now you
             have not such program. if we delete Khan’s data from above table then we will also loss
             program data and that is BCS, Now we are in such situation in which we have not BCS
             program. Now if another student wants to apply for BCS program then we have not such
             program and in this way we loss a very important data
             If we delete BCS program from above table then we will also loss student data that we
             don’t want to loss. Our intention is to just delete program not a student but when I delete
             program BCS then Khan’s data will also lost and again in this way we loss a very
             important data of a student.
   d) Updation Anomaly:
        If there are 50 students in MCS program and now if I wish to change either increase of
        decrease MCS program credits, then 50 times I will change MCS program credits against
        to those students that have been register in such program and we know that
        (prgName  prgCrdits). Since in such program there are 50 students so you have to
        update program credits against to all students that are in MCS program
        See in above table there 5 students in MCS program and now If you change program
        credits from 64 to 86 then I will have to update 5 column.
        I think it is not a hard job to update just 5 records. But if we have more than thousands
        records then how you would do. I think it is not again a hard job to do update all
        thousands records but the problem is that in a huge amount of data that you want to
        update, there are many chances that some records will not be updated or you may leaved
        unintentionally and that is the problem, so it means that updation problem occur.

o Now we will have to transform the 2nd NF table in to 3rd NF.

o Relation is decomposed based on the Transitive Dependency

         STD(stId, stName, stAdr, prgName, prgCrdts)
         stId  stName, stAdr, prName, prCrdts
         prgName  prgCrdts

         In 1NF, 2NF But not in 3NF
         stId stName prgName prgCrdits
         1    Ali      MCS        64
         2    Khan     BCS        132
         3    Hassan MCS          64
         4    Jamal    MCS        64
         5    Aslam    MCS        64
         6    Omer     MCS        64

         STD(stId, stName, stAdr, prgName)
         PROG(prName, prCrdts)

Now due to transitive dependency the student table decomposes into two tables and now it is in 1NF,
2NF and also in 3NF.

        Now In 1NF, 2NF & in 3NF
        STUDENT_TABLE            PROGRAM_TABLE

         stId   stName    prgName       prgName prgCrdits
         1      Ali       MCS           MCS     64
         2      Khan      BCS           BCS     132
         3      Hassan    MCS
         4      Jamal     MCS
         5      Aslam     MCS
         6      Omer      MCS
   Boyce-Codd Normal Form (BCNF)
     o A general form of 3NF, you can’t say that it is a 4NF, but it is a generalize form of 3NF.
     o There are many similarities and some differences between BCNF and 3NF
     o Every relation in BCNF is in 3NF vice-versa is not always true
     o 90% guarantee that if a relation is in 3NF then it will be in BCNF
     o 100% guarantee that if a relation is in BCNF then it must be in 3NF
     o 3NF is checked in steps, BCNF checked directly
     o A table is in BCNF if every determinant is a candidate key
     o Situation when table in 3NF is not in BCNF
     o A non-key determines a part of the composite primary key
          o   FACULTY(fId, deptt, office_No, BPS, dateHired)
          o   fId, dept  office, PBS, dateHired
          o   office_No  deptt
          o   Table is in 3NF, not in BCNF since the office is not a candidate key

                   FACULT_TABLE

                   In 1NF, 2NF & in 3NF But Not in BCNF
                   fId deptt     office_No BPS         Date_Hired
                   1    DCS      B004       17         20/08/2000
                   2    DCS      B003       19         21/08/2001
                   1    FMS      E102       18         22/09/2009
                   3    DCS      B004       18         12/03/2001
                   3    English F107        19         09/08/2007
                   4    Math     B101       17         05/03/2005

          o Here a non-key determines a part of Composite PK,
          o Office_No  deptt
          o When I enter office_No then it returns a unique Department Name.

     BCNF
       o We decompose the table again to bring it into BCNF

              FACULTY (fId, dept, office, rank, dateHired)

              FACULTY (fId, office, rank, dateHred)
              OFFICE (office, dept)

In 1NF, 2NF, 3NF & in BCNF

FACULT_TABLE                                               OFFICE_TABLE

    fId   office_No    deptt     BPS    Date_Hired           office_No   deptt
    1     B004         DCS       17     20/08/2000           B004        DCS
    2     B003         DCS       19     21/08/2001           B003        DCS
    1     E102         FMS       18     22/09/2009           E102        FMS
    3     B004         DCS       18     12/03/2001           B004        DCS
    3     F107         English   19     09/08/2007           F107        English
    4     B101         Math      17     05/03/2005           B101        Math


Summary
Normalization is the process of structuring relational database schema such that most ambiguity is
removed. The stages of normalization are referred to as normal forms and progress from the least
restrictive (First Normal Form) through the most restrictive (Fifth Normal Form). Generally, most
database designers do not attempt to implement anything higher than Third Normal Form or Boyce-Codd
Normal Form. We have started the process of normalization in this lecture. We will cover this topic in
detail in the coming lectures.

								
To top