Docstoc

29SpCS157BL174Normal_Form.ppt - Lecture 4 Normal Forms-BCNF

Document Sample
29SpCS157BL174Normal_Form.ppt - Lecture 4 Normal Forms-BCNF Powered By Docstoc
					Lecture 17    4th   Normal Form

        Prof. Sin-Min Lee
  Department of Computer Science
            Functional Dependencies
                                      o Dependencies for
R      A     B    C    D    E    F      this relation:
       a1    b1   c1   d1   e1   f1
                                         AB
       a1    b1   c2   d1   e2   f3
                                         AD
       a2    b1   c2   d3   e2   f3
       a3    b2   c3   d4   e3   f2
                                         BC  EF
       a2    b1   c3   d3   e4   f4   o Do they all hold in
       a4    b1   c1   d5   e1   f1     this instance of
                                        the relation R?

    • Functional dependencies are specified by the database
      programmer based on the intended meaning of the
      attributes.
              Armstrong’s Axioms
• Armstrong’s Axioms: Let X, Y be sets of attributes from a
  relation T.
   [1] Inclusion rule:      If Y  X, then X  Y.
   [2] Transitivity rule:   If X  Y, and Y  Z, then X  Z.
   [3] Augmentation rule:   If X  Y, then XZ  YZ.
• Other derived rules:
   [1] Union rule:         If X  Y and X  Z, then X  YZ
   [2] Decomposition rule: If X  YZ, then X  Y and X  Z
   [3] Pseudotransitivity:   If X  Y and WY  Z, then XW  Z
   [4] Accumulation rule: If X  YZ and Z  BW,
                                    then X  YZB
                           Closure
• Let F be a set of functional dependencies.
• We say F implies a functional dependency g if g can be
  derived from the FDs in F using the axioms.
• The closure of F, written as F+, is the set of all FDs that
  are implied by F.
• Example: Given FDs { A  BC, C  D, AD  B },
  what is the closure?
• The closure is a potentially exponential set:
    Trivial dependencies: {A A, B B,…,ABC ABC,…}, other
       dependencies obtained by augmentation {AB ABC, BC BD,…},
       dependencies obtained by other rules (or multiple rules), {A  BC, C
        D, AD  B, A  D }
                    Closure
• Given a set F of functional dependencies. A
  functional dependency X  Y is said to be
  entailed (implied) by F, if X  Y is in F+.
• To sets of functional dependencies F1, F2 are said
  to be equivalent, if their closures are equivalent,
  I.e. F1+ = F2+.
          Checking Entailment
• To find whether a functional dependency X  Y
  is implied by a set of functional dependencies F
   – We can apply all the rules in Armstrong’s Axioms to
     find whether we can obtain this dependency from the
     dependencies in F
   OR
   – We can determine the closure of attributes X, denoted
     by X+ with respect to F, and check whether Y  X+
 Closure of a set of attributes
• Given a set of F of functional dependencies, the
  closure of a set of attributes X, denoted by X+ is the
  largest set of attributes Y such that X  Y.
  Algorithm Closure(X,F)
  X[0] = X; I = 0;
  repeat
        I = I + 1;
        X[I] = X[I-1];
        FOR ALL Z  W in F
                   IF Z  X[I] THEN X[I] = X[I]  W
        END FOR
  until X[I] = X[I-1];
  RETURN X+ = X[I]
                    Entailment
• Given F = { C  DE, AB  CE, EB  CF, G
  A}
• Find: GB+
  –   Initialize: GB+ = {G,B}
  –   Use G  A , add A, GB+ = {A,B,G}
  –   Use AB  CE, add C,E, GB+ = {A, B, C, E, G}
  –   Use C  DE, add D, GB+ = {A, B, C, D, E, G}
  –   Use EB  CF, add F, GB+ = {A, B, C, D, E, F, G}
  –   Incidentally, GB is a superkey. Is it also a key?
   Closure of a set of attributes
• Given a set of functional dependencies F for
  a relation R, X is said to be a superkey, if
  X+ contains all the attributes in R.
  – In other words, X implies all other attributes.
  – Alternatively, if two tuples are the same with
    respect to X then they should be the same with
    respect to all other attributes.
      Boyce-Codd Normal Form
• A table T is said to be in Boyce-Codd Normal Form
  (BCNF) with respect to a given set of functional
  dependencies F if for all functional dependencies of the
  form X  A entailed by F the following is true:
   – If A is not a subset of X then X is a superkey, or
   – If A is not contained in X then, X contains all the
     attributes in a key.
• Given {AB  C, AB  D, AE  D, C  F} with Key:
  {A,B,E}
   – not in BCNF since C is a single attribute not in AB, but AB is not a
     superkey.
       Boyce-Codd Normal Form
Given head(T)={A,B,C,D,E,F} with functional dependencies
{AC  D, AC  E, AF  B, AD  F, BC  A, ABC  F } and
  keys: {A, C}, {B, C}, is this relation T in BCNF?


        No. It is sufficient to find one violation!
            – AF  B violates BCNF since B is not in AF and AF
              is not a superkey.
            – AD  F violates BCNF since F is not in AD and AD
              is not a superkey.
        Note: ABC  F does not violate BCNF since ABC is a
        superkey.
                   Decomposition
• A decomposition of a relation R with functional
  dependency set F is a sequence of pairs of the form
   – (R1,F1), …, (Rn,Fn) such that
   – The union of attributes in R1,…,Rn is equivalent to the attributes
     in R
   – All functional dependencies in F1,…,Fn are entailed by F.
• A decomposition is obtained by a simple projection of R
  onto the attributes in the decomposed relations.
   – For example, given R1 with schema R1(A1) , then
                R1 = A1 (R)
      Lossless Decompositions
• A decomposition of R to (R1, F1) and (R2,F2) is
  said to be lossless, if we join R1 and R2 on the
  common attributes, we are guaranteed to get R.
• In other words, given R1(A1) and R2(A2), we
  need to make sure that it is always the case that
      R=R1 join(A1A2) R2

where join(A1A2) means equi-join on the common
 attributes in A1 and A2.
R   A        B     C     D          R1   A    B    R2   B     C    D
    a1       b1    c1    d1              a1   b1        b1    c1   d1
    a1       b1    c2    d1              a2   b1        b1    c2   d1
    a2       b1    c2    d3              a3   b2        b1    c2   d3
    a3       b2    c3    d4              a4   b1        b2    c3   d4
    a4       b1    c1    d1


R1 join R2    A     B     C    D
              a1    b1    c1   d1                 {R1, R2} is not a
              a1    b1    c2   d1
                                               lossless decompostion
              a1    b1    c2   d3
              a2    b1    c1   d1
                                                    of relation R.
              a2    b1    c2   d1
              a2    b1    c2   d3             The join with respect to
              a3    b2    c3   d4             R1.B=R2.B is not equal
              a4    b1    c1   d1                      to R.
              a4    b1    c2   d1
              a4    b1    c2   d3
      Lossless Decomposition
Given a relation R with a set F of functional
 dependencies, and a decomposition of to
 (R1, F1) and (R2,F2) such that R1(A1) and
 R2(A2) is said to be lossless iff either

  A1A2  A1 or A1A2  A2

is entailed by F.
     Lossless Decompositions
Let F={AB  C, CD  E, AB  E, DE  AF}
and
  R1(A,B,C,D) F1={AB C}
  R2(C,D,E,F) F2={CD E,DE F}

 is this lossless?
 ABCD  CDEF = CD
 is CD CDEF entailed by F?
       Dependency preservation
•    A decomposition of relation (R,F) into (R1(A1),
     F1) and (R2(A2), F2) is said to be dependency
     preserving iff F1F2 is equivalent to F.
•    To check whether F1F2 is equivalent to F, we
     need to check
    1. Whether all functional dependencies in F1F2 are
       entailed by F, and
    2. Whether all functional dependencies in F are entailed
       by F1F2 .
      Dependency preservation
• Given a decomposition R1(A1) of a relation R
  with functional dependency set F, the only
  dependencies that can be preserved in R1 are all
  dependencies in F+ of the form B  C such that
  BC  A1.
Let F={AB  C, CD  E, AB  E, DE  AF}
Find all the functional dependencies that can be
  preserved in:
  R1(A,B,C,D)
  R2(C,D,E,F)
      Dependency preservation
• When a decomposition is lossy, then information
  connecting tuples is lost. We get errorneous
  information.
• When dependencies are lost in a decomposition,
  they cannot be enforced as table constraints. They
  have to be enforced as additional constraints.
• It is vital that decompositions are lossless. It is
  important but not vital that decompositions are
  dependency preserving.
                 Normal Forms
• If a relation is not in BNCF normal form, then it
  can be decomposed by lossless decompositions
  into smaller relations that are in BCNF.
• BCNF decomposition:
   – Until all relations are in BCNF
      • Find a dependency X  Y in R(A) that violated BCNF
      • Replace R, with R1(XY), and R2(A-Y)X
• This algorithm may cause many dependencies to
  be lost.
                3NF Conversion

• Given R(A,B,C,D,E,F) and
    F= { ABCD, ABCE, BDE, EC, EF}
• Keys: ABC and ABE.
• Not in BNCF (violations: BDE, EC, EF),
• Not in 3NF (violations: EF).
• Convert to 3NF using the algorithm:
  – First compact functional dependencies with common left side, to get
    F= { ABCDE, BDE, ECF}
  – Create relations R1(A,B,C,D,E), R2(B,D,E), R3(E,C,F)
  – Since there exists relations that contain ABC, and ABE, we are done.
  – Incidentally, all relations are also in BCNF.
 Multivalued Dependencies
         (MVDs)
• Let R be a relation schema and let   R
  and   R. The multivalued dependency
                        
  holds on R if in any legal relation r(R), for
  all pairs for tuples t1 and t2 in r such that
  t1[] = t2 [], there exist tuples t3 and t4 in r
  such that:
               t1[] = t2 [] = t3 [] = t4 []
               t3[]       = t1 []
               t3[R – ] = t2[R – ]
               t4 []      = t2[]
               t [R – ] = t [R – ]
                      Motivation
  • There are schemas that are in BCNF that do not
    seem to be sufficiently normalized
Stars
  name       street         city             title          year
 C. Fisher 123 Maple Str. Hollywood       Star Wars         1977
 C. Fisher 5 Locust Ln.   Malibu          Star Wars         1977
 C. Fisher 123 Maple Str. Hollywood Empire Strikes Back     1980
 C. Fisher 5 Locust Ln.   Malibu      Empire Strikes Back   1980
 C. Fisher 123 Maple Str. Hollywood    Return of the Jedi   1983
 C. Fisher 5 Locust Ln.   Malibu       Return of the Jedi   1983
         Attribute Independence

• No reason to associate address with one movie
  and not another
• When we repeat address and movie facts in all
  combinations, there is obvious redundancy
• However, NO BCNF violation in Stars
  relation
   – There are no non-trivial FD’s at all, all five attributes
     form the only superkey
   – Why?
      Multi-valued Dependency
Definition: Multivalued dependency (MVD):
 A1A2…An  B1B2…Bm holds for relation R if:
 For all tuples t, u in R
 If t[A1A2...An] = u[A1A2...An], then there exists a v in R
 such that:
     (1) v[A1A2...An] = t[A1A2...An] = u[A1A2...An]
     (2) v[B1B2…Bm] = t[B1B2…Bm]
     (3) v[C1C2…Ck] = u[C1C2…Ck], where C1C2…Ck is all
        attributes in R except (A1A2...An  B1B2…Bm)
       Example: name  street city

    Stars
      name       street         city             title          year
t    C. Fisher 123 Maple Str. Hollywood       Star Wars         1977
     C. Fisher 5 Locust Ln.   Malibu          Star Wars         1977
v    C. Fisher 123 Maple Str. Hollywood Empire Strikes Back     1980
u    C. Fisher 5 Locust Ln.   Malibu      Empire Strikes Back   1980
     C. Fisher 123 Maple Str. Hollywood    Return of the Jedi   1983
     C. Fisher 5 Locust Ln.   Malibu       Return of the Jedi   1983
       Example: name  street city
    Stars

      name       street         city             title          year
u    C. Fisher 123 Maple Str. Hollywood       Star Wars         1977
w    C. Fisher 5 Locust Ln.   Malibu          Star Wars         1977
v    C. Fisher 123 Maple Str. Hollywood Empire Strikes Back     1980
t    C. Fisher 5 Locust Ln.   Malibu      Empire Strikes Back   1980
     C. Fisher 123 Maple Str. Hollywood    Return of the Jedi   1983
     C. Fisher 5 Locust Ln.   Malibu       Return of the Jedi   1983
                    More on MVDs
• Intuitively, A1A2…An  B1B2…Bm says that the
  relationship between A1A2…An and B1B2…Bm is independent
  of the relationship between A1A2…An and R -{B1B2…Bm}
   – MVD's uncover situations where independent facts related to a certain
     object are being squished together in one relation
• Functional dependencies rule out certain tuples from being in
  a relation
   – How?
• Multivalued dependencies require that other tuples of a certain
  form be present in the relation
   – a.k.a. tuple-generating dependencies
                     Let’s Illustrate
• In Stars, we must repeat the movie (title, year) once for
  each address (street, city) a movie star has
   – Alternatively, we must repeat the address for each movie a star has
     made
• Example: Stars with name  street city
   name         street          city               title            year
  C. Fisher 123 Maple Str. Hollywood            Star Wars           1977
  C. Fisher 5 Locust Ln.       Malibu     Empire Strikes Back       1980
  C. Fisher 123 Maple Str. Hollywood        Return of the Jedi      1983

  • Is an incomplete extent of Stars
      – Infer the existence of a fourth tuple under the given MVD
           Trivial MVDs

• Trivial MVD
  A1A2…An  B1B2…Bm where
  B1B2…Bm is a subset of A1A2…An or
  (A1A2…An  B1B2…Bm ) contains all
  attributes of R
     Reasoning About MVDs
• FD-IS-AN-MVD Rule (Replication)
  If A1A2…An  B1B2…Bm then
         A1A2…An  B1B2…Bm holds
      Reasoning About MVDs
• COMPLEMENTATION Rule
 If A1A2…An  B1B2…Bm then A1A2…An 
 C1C2…Ck where C1C2…Ck is all attributes in R except
 (A1A2…An  B1B2…Bm )
• AUGMENTATION Rule
 If XY and WZ then WX YZ
• TRANSITIVITY Rule
 If XY and YZ then X  (Z-Y)
              Coalescence Rule for MVD

    X  Y
If:                     Then: X  Z
  W:W  Z


Remark: Y and W have to be disjoint and Z has to be a subset of or
equal to Y
             Definition 4NF
• Given: relation R and set of MVD's for R
• Definition: R is in 4NF with respect to its
  MVD's if for every non-trivial MVD
  A1A2…AnB1B2…Bm , A1A2…An is a
  superkey

• Note: Since every FD is also an MVD, 4NF
  implies BCNF
• Example: Stars is not in 4NF
        Decomposition Algorithm
(1) apply closure to the user-specified FD's and MVD's**:
(2) repeat until no more 4NF violations:
      if R with AA ->> BB violates 4NF then:
         (2a) decompose R into R1(AA,BB) and
                                 R2(AA,CC), where CC is all
              attributes in R except (AA  BB)
         (2b) assign FD's and MVD's to the new relations**

                         ** MVD's: hard problem!
• No simple test analogous to computing the attribute closure for FD’s
  exists for MVD’s. You are stuck to have to use the 5 inference rules
  for MVD’s when computing the closure!
               Exercise
• Decompose Stars into a set of relations
  that are in 4NF.
• namestreet city is a 4NF
  violation
• Apply decomposition:
   R(name, street, city)
   S(name, title, year)
• What about namestreet city in R
  and nametitle year in S?
             MVD (Cont.)
• Tabular representation of   
X ->> Y is trivial if
(a) Y  X or
(b) Y U X = R
        Multivalued Dependencies

• There are database schemas in BCNF that do not
  seem to be sufficiently normalized
• Consider a database
      classes(course, teacher, book)
  such that (c,t,b)  classes means that t is qualified
  to teach c, and b is a required textbook for c
• The database is supposed to list for each course
  the set of teachers any one of which can be the
  course’s instructor, and the set of books, all of
  which are required for the course (no matter who
  teaches it).
     Multivalued Dependencies
            course            teacher            book
      database            Avi              DB Concepts
      database            Avi              Ullman
      database            Hank             DB Concepts
      database            Hank             Ullman
      database            Sudarshan        DB Concepts
      database            Sudarshan        Ullman
      operating systems   Avi              OS Concepts
      operating systems   Avi              Shaw
      operating systems   Jim              OS Concepts
      operating systems   Jim              Shaw
                               classes
• There are no non-trivial functional dependencies and
  therefore the relation is in BCNF
• Insertion anomalies – i.e., if Sara is a new teacher that can
  teach database, two tuples need to be inserted
                (database, Sara, DB Concepts)
                (database, Sara, Ullman)
       Multivalued Dependencies
• Therefore, it is better to decompose classes
  into:
            course                 teacher
      database               Avi
      database               Hank
      database               Sudarshan
      operating systems      Avi
      operating systems      Jim
                      teaches
             course                  book
      database                   DB Concepts
      database                   Ullman
      operating systems          OS Concepts
      operating systems          Shaw
                          text
We shall see that these two relations are in Fourth Normal
Form (4NF)
 Multivalued Dependencies
         (MVDs)
• Let R be a relation schema and let   R
  and   R. The multivalued dependency
                        
  holds on R if in any legal relation r(R), for
  all pairs for tuples t1 and t2 in r such that
  t1[] = t2 [], there exist tuples t3 and t4 in r
  such that:
               t1[] = t2 [] = t3 [] = t4 []
               t3[]       = t1 []
               t3[R – ] = t2[R – ]
               t4 []      = t2[]
               t [R – ] = t [R – ]
             MVD (Cont.)
• Tabular representation of   
  4th Normal Form

No multi-valued dependencies
          4th Normal Form
Note: 4th Normal Form violations occur
 when a triple (or higher) concatenated key
 represents a pair of double keys
4th Normal Form
         4th Normal Form
        Multuvalued dependencies
Instructor     Book             Class
Price          Inro Comp        MIS 2003
Parker         Intro Comp       MIS 2003
Kemp           Data in Action   MIS 4533
Kemp           ORACLE Tricks MIS 4533
Warner         Data in Action   MIS 4533
Warner         ORACLE Tricks MIS 4533
         4th Normal Form
INSTR-BOOK-COURSE(InstrID, Book,
  CourseID)



COURSE-BOOK(CourseID, Book)
COURSE-INSTR(CourseID, InstrID)
                         4NF
        (No multivalued dependencies)
Independent repeating groups have been treated as a
complex relationship.
   TABLE                                    TABLE




                       TABLE
    TABLE                                   TABLE


                       TABLE
                    Example
• Let R be a relation schema with a set of attributes
  that are partitioned into 3 nonempty subsets.
                     Y, Z, W
• We say that Y  Z (Y multidetermines Z)
  if and only if for all possible relations r(R)
        < y1, z1, w1 >  r and < y2, z2, w2 >  r
  then
        < y1, z1, w2 >  r and < y2, z2, w1 >  r
• Note that since the behavior of Z and W are
  identical it follows that Y  Z if Y  W
•
               Theory of MVDs
    From the definition of multivalued dependency,
    we can derive the following rule:
    – If   , then   
  That is, every functional dependency is also a
  multivalued dependency
• The closure D+ of D is the set of all functional and
  multivalued dependencies logically implied by D.
    – We can compute D+ from D, using the formal
      definitions of functional dependencies and multivalued
      dependencies.
    – We can manage with such reasoning for very simple
      multivalued dependencies, which seem to be most
      common in practice
    – For complex dependencies, it is better to reason about
      sets of dependencies using a system of inference
      rules
          Fourth Normal Form
• A relation schema R is in 4NF with respect
  to a set D of functional and multivalued
  dependencies if for all multivalued
  dependencies in D+ of the form   ,
  where   R and   R, at least one of the
  following hold:
  –    is trivial (i.e.,    or    = R)
  –  is a superkey for schema R
• If a relation is in 4NF it is in BCNF
     Restriction of Multivalued
           Dependencies
• The restriction of D to Ri is the set Di
  consisting of
  – All functional dependencies in D+ that include
    only attributes of Ri
  – All multivalued dependencies of the form
        (  Ri)
    where   Ri and    is in D+
     4NF Decomposition Algorithm
   result: = {R};
  done := false;
  compute D+;
  Let Di denote the restriction of D+ to Ri
  while (not done)
     if (there is a schema Ri in result that is not in 4NF) then
        begin
         let    be a nontrivial multivalued dependency
  that holds
            on Ri such that   Ri is not in Di, and ;
          result := (result - Ri)  (Ri - )  (, );
        end
     else done:= true;
Note: each Ri is in 4NF, and decomposition is lossless-join
                        Example
• R =(A, B, C, G, H, I)
  F ={ A  B
       B  HI
       CG  H }
• R is not in 4NF since A  B and A is not a superkey for
  R
• Decomposition
  a) R1 = (A, B)                    (R1 is in 4NF)
  b) R2 = (A, C, G, H, I)           (R2 is not in 4NF)
  c) R3 = (C, G, H)        (R3 is in 4NF)
  d) R4 = (A, C, G, I)              (R4 is not in 4NF)
• Since A  B and B  HI, A  HI, A  I
  e) R5 = (A, I)                    (R5 is in 4NF)
  f)R6 = (A, C, G)         (R6 is in 4NF)

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:12
posted:2/20/2011
language:English
pages:64