C363C607 Introduction to Database Systems

Document Sample
C363C607 Introduction to Database Systems Powered By Docstoc
					Introduction to Database Systems


            CS363/607*

            Lecture #9
     Third Normal Form - Motivation
   There is one structure of FD’s that causes
    trouble when we decompose.
   AB ->C and C ->B.
       Example: A = street address, B = city,   C=
        zip code.
   There are two keys, {A,B } and {A,C }.
   C ->B is a BCNF violation, so we must
    decompose into AC, BC.
We Cannot Enforce FD’s
   The problem is that if we use AC and BC as
    our database schema, we cannot enforce the
    FD AB ->C by checking FD’s in these
    decomposed relations.
Example
   Booking: {title, theater, city}
   FD’s: theater  city
          title city  theater
Third Normal Form (3NF)
   Relaxed condition:
    A relation R is in third normal form (3NF) if:
    whenever A1A2…An  B is a nontrivial FD,
    either {A1, A2, …, An} is a superkey, or B is a
    member of some key (prime).
    3NF Let’s Us Avoid This Problem
   3rd Normal Form (3NF) modifies the BCNF
    condition so we do not have to decompose
    in this problem situation.
   An attribute is prime if it is a member of
    any key.
   X ->A violates 3NF if and only if X is not a
    superkey, and also A is not prime.
Other normal forms
   First normal form (1NF):
    every component of every tuple is an atomic
    value.
   Second normal form (2NF):
    a nontrivial FD with a left side that is a proper
    subset of a key is not allowed.
    but, a transitive FD is ok.
   Fourth normal form (4NF)
Decomposition Algorithm (Synthesis)
   Input: R, and a set of FD’s F
   Output: a collection of relations in 3NF
   Method:
       Find a minimal basis for F, say G
       For each XA in G, use XA as the schema of one
        of the relations in the decomposition
       If none of the relations is a superkey for R, add
        another relation whose schema is a key for R.
Example
   R(A, B, C, D, E), FD’s: AB  C, C  B, and A
     D.
       First, verify if the FD’s the minimal set
       Second, verify can not eliminate any attributes
        from a left side
       Third, taking the attributes of each FD to create
        new relations
       Forth, drop subset relations
       Fifth, add new relation whose schema is a key
Exercise 3.5.1
   a) R (A, B, C, D) with FD’s AB  C, C  D,
    and D A
   b) R (A, B, C, D) with FD’s B  C and B  D
   f) R (A, B, C, D, E) with FD’s AB  C, C  D,
    D  B and D E.
Multivalued dependencies
   A multivalued dependency is that two
    attributes or sets of attributes are
    independent of one another.
   Multivalued dependencies (MVD’s) express a
    condition among tuples of a relation that
    exists when the relation is trying to represent
    more than one set-valued properties in a
    single relation.
   A relation could be in BCNF, but still have a
    kind of redundancy.
    Example
Name      Street         city       Title                 Year

C. Fisher 123 Maple St. Hollywood   Star Wars             1977

C. Fisher 5 Locust Ln.   Malibu     Star Wars             1977

C. Fisher 123 Maple St. Hollywood   Empire Strikes Back   1980

C. Fisher 5 Locust Ln.   Malibu     Empire Strikes Back   1980

C. Fisher 123 Maple St. Hollywood   Return of the Jedi    1985

C. Fisher 5 Locust Ln.   Malibu     Return of the Jedi    1985
Definition of MVD
   A multivalued dependency (MVD) is that: for
    a relation R, if the values for one set of
    attributes are fixed, the values in certain
    other attributes are independent of the values
    of all the other attributes in the relation.
   A1A2…An  B1B2…Bm: we hold particular
    values for A’s, the values for B’s are
    independent of the values for the attributes
    which are not A’s or B’s.
Another definition
   For each pair of tuples t and u of relation R
    that agree on all the A’s, we can find in R
    some tuple v that agrees:
       With both t and u on the A’s
       With t on the B’s
       With u on all attributes of R that are not among
        the A’s or B’s.
   Note: t and u can be interchanged
     Example
          name  street city

Name        Street          city        Title                 Year

C. Fisher   123 Maple St.   Hollywood   Star Wars             1977

C. Fisher   5 Locust Ln.    Malibu      Empire Strikes Back   1980
MVD rules
   Trivial MVD’s:
    in a relation R, A1A2…An  B1B2…Bm
    holds if B1B2…Bm ⊆ A1A2…An
   Transitive rule:
    if A1A2…An  B1B2…Bm and B1B2…Bm
     C1C2…Ck, then we have A1A2…An 
    C1C2…Ck.
Attention: any C’s are also A’s may be deleted from the
   right side.
Splitting Doesn’t Hold
   Like FD’s, we cannot generally split the left
    side of an MVD.
   But unlike FD’s, we cannot split the right side
    either --- sometimes you have to leave
    several attributes on the right side.
Example
   name  street city
    we can not expect name  street
    or name  city
   Every FD is an MVD.
    if A1A2…An  B1B2…Bm, then
    A1A2…An  B1B2…Bm.
Complementation rule
   If A1A2…An  B1B2…Bm is a MVD for
    relation R, then R also satisfies A1A2…An
     C1C2…Ck, where the C’s are all
    attributes of R not among the A’s and B’s.
   Example:
    name  street city,
    then name  title year as well.
Fourth Normal Form
   The redundancy that comes from MVD’s is
    not removable by putting the database
    schema in BCNF.
   There is a stronger normal form, called 4NF,
    that (intuitively) treats MVD’s as FD’s when it
    comes to decomposition, all “nontrivial”
    MVD’s are eliminated.
Nontrivial MVD
   A MVD A1A2…An  B1B2…Bm for a
    relation R is nontrivial if
       None of the B’s is among A’s.
       Not all the attributes of R are among the A’s and
        B’s.
4NF definition
   A 4NF condition is essentially the BCNF
    condition, but apply to MVD’s instead of FD’s.
   A relation R is in 4NF if whenever:
    A1A2…An  B1B2…Bm
    is a nontrivial MVD, {A1, A2, …, An} is a
    superkey.
       Note that the definition of “superkey” still depends
        on FD’s only.
    BCNF Versus 4NF
   Remember that every FD X ->Y is also an
    MVD, X ->->Y.
   Thus, if R is in 4NF, it is certainly in BCNF.
       Because any BCNF violation is a 4NF violation.
   But R could be in BCNF and not 4NF,
    because MVD’s are “invisible” to BCNF.
   Example
Name        Street          city        Title                 Year

C. Fisher   123 Maple St.   Hollywood   Star Wars             1977
C. Fisher   5 Locust Ln.    Malibu      Star Wars             1977
C. Fisher   123 Maple St.   Hollywood   Empire Strikes Back   1980
C. Fisher   5 Locust Ln.    Malibu      Empire Strikes Back   1980
C. Fisher   123 Maple St.   Hollywood   Return of the Jedi    1985
C. Fisher   5 Locust Ln.    Malibu      Return of the Jedi    1985

                     name  street city
Decomposition to 4NF
   Similar to BCNF decomposition
   If we find a 4NF violation, A1A2…An 
    B1B2…Bm where {A1, A2, …, An} is not a
    superkey. Then, we can break the schema of
    R into two schemas:
       The A’s and the B’s
       The A’s and all attributes of R that are not among
        the A’s or B’s.
Example
   R(name, street, city, title, year)
    name  street city is a nontrivial MVD
    which is a 4NF violation.
    decompose R into two:
    {name, street, city} and
    {name, title, year}
Relationships among normal forms


            3NF
            BCNF

            4NF
    Properties of normal forms
Property                            3NF   BCNF   4NF

Eliminates redundancy due to FD’s   No    Yes    Yes

Eliminates redundancy due to        No    No     Yes
MVD’s
Preserves FD’s                      Yes   No     No

Preserves MVD’s                     No    No     No
Exercise
   R(A, B, C) with a MVD A  B. We currently
    have (a, b1, c1), (a, b2, c2) and (a, b3, c3).
    What other tuples must also be in R?
   Answer:
    Each of the tuples (a,b1,c2), (a,b1,c3),
    (a,b2,c1), (a,b2,c3), (a,b3,c1), and (a,b3,c2)
    are also in R.
Exercise
   Record for each person the name, SSN, and
    birthday. For each child of the person, the
    name, SSN and birthday, and for each auto
    the person owns, its serial number and
    maker. Thus, the relation has:
    (n, s, b, cn, cs, cb, as, am).
    a)   FD’s and MVD’s
    b)   Suggest a decomposition into 4NF.
Answer
   a) The dependencies are summarized below:
    ssNo -> name birthdate
    childSSNo -> childName childBirthdate
    autoSerialNo -> autoMake
    ssNo ->-> childSSNo childName childBirthdate
    ssNo ->-> autoSerialNo autoMake
Answer (Cont.)
   b)
    {ssNo, name, birthdate}
    {ssNo, childSSNo}
    {childSSNo, childName childBirthdate}
    {ssNo, autoSerialNo}
    {autoSerialNo, autoMake}
Exercise
   R(A, B, C, D) with MVD’s A  B and A 
    C.
       Find all the 4NF violations
       Decompose to 4NF.
   Answer:
    The final set of relations are AB, AC, and AD
Exercise
   R(A, B, C, D) with MVD’s AB  C and FD B
     D.
       Find all the 4NF violations
       Decompose to 4NF.
   Answer:
    The final set of relations are ABC, and BD
     Exercise
   Give example relations that the following MVD’s do not
    hold.
    a) if A  BC, then A  B.
   Answer:
    Consider a relation R with schema ABCD and the instance
    with four tuples abcd, abcd', ab'c'd, and ab'c'd'. This
    instance satisfies the MVD A ->-> BC. However, it does
    not satisfy A ->-> B. For example, if it did satisfy A ->->
    B, then because the instance contains the tuples abcd
    and ab'c'd, we would expect it to contain abc'd and ab'cd,
    neither of which is in the instance.