Docstoc

Lecture07

Document Sample
Lecture07 Powered By Docstoc
					C20.0046: Database
Management Systems
Lecture #7
   Matthew P. Johnson
   Stern School of Business, NYU
   Spring, 2004


          M.P. Johnson, DBMS, Stern/NYU, Sp2004   1
Agenda
    Last time: Normalization
    Homework 1 due now
    Project part 2 is up, due on the 19th (Thurs.)
    This time:
1.   Finish BCNF
2.   3NF
3.   4NF
4.   Relational Algebra…

                   M.P. Johnson, DBMS, Stern/NYU, Sp2004   2
BCNF Review
   Q: What’s required for BCNF?

   Q: What’s the slogan for BCNF?

   Q: Who are B & C?

   Q: What are the two types of violations?



                  M.P. Johnson, DBMS, Stern/NYU, Sp2004   3
BCNF Review
   Q: How do we fix a non-BCNF relation?

   Q: If AsBs violates BCNF, what do we do?
       Q: In this case, could the decomposition be lossy?


   Q: Under what circumstances could a
    decomposition be lossy?

   Q: How do we combine two relations?

                      M.P. Johnson, DBMS, Stern/NYU, Sp2004   4
Decomposition algorithm example
   R(N,O,R,P)        F = {N  O, O  R, R  N}
                    Name                 Office                Residence   Phone
                    George               Pres.                 WH          202-…
                    George               Pres.                 WH          486-…
                    Dick                 VP                    NO          202-…
   Key: N,P      Dick    VP     NO                                        307-…

   Violations of BCNF: N  O, OR, N OR
       which kinds of violations are these?
   Pick N  OR          (on board)
   Can we rejoin? (on board)
   What happens if we pick N  O instead?
   Can we rejoin? (on board)
                           M.P. Johnson, DBMS, Stern/NYU, Sp2004                   5
Lossless BCNF decomposition
   Consider simple relation: R(A,B,C)
   Only FD: A  B (assume C!A) Key: A,C
       Diff vars from text! Also goes through if assumption is false
   BCNF violation (which kind?): no key on the left
   Thus: Decomposition to BCNF:
   Create R1(A,B) and R2(A,C)
   Could this be lossy?
   We will join R1 and R2 on A to find out

    Q: Since C ! A, what kind                    Q: If C  A, then what
    of bad FD do we have?                         kind do we have?


                         M.P. Johnson, DBMS, Stern/NYU, Sp2004             6
Lossless BCNF decomposition
   Suppose R contains (b,a,c) and (b’,a,c’)
   In projection onto (B,A):
       (b,a,c)  (b,a), (b’,a,c’)  (b’,a)
   In projection onto (A,C):
       (b,a,c)  (a,c), (b’,a,c’)  (a,c’)
   In joining, (b’,a), (a,c)  (b’,a,c)
   Q: Is/must/can this be correct?
   A: Yes! A  B, so b = b’
   So this was lossless
   We assumed C!A, but argument also goes
    through when CA
   Moral: BCNF decomp alg is always lossless
                          M.P. Johnson, DBMS, Stern/NYU, Sp2004   7
BCNF summary
   BCNF decomposition is lossless
       Can reproduce original by joining
   Saw last time: Every 2-attribute relation is in
    BCNF
   Final set of decomposed relations might be
    different depending on
       Order of bad FDs chosen
   Saw last time: But all results will be in BCNF



                     M.P. Johnson, DBMS, Stern/NYU, Sp2004   8
A problem with BCNF
   Relation: R(Title, Theater, Neighboorhood)
   FDs:
     Title,N’hood  Theater

           Assume movie can’t play twice in same neighborhood
     Theater  N’hood
   Keys:
     {Title, N’hood}
                       Title                             Theater   N’hood
     {Theater, Title}
                                 City of God Angelica              Village
                                 Fog of War Angelica               Village



                         M.P. Johnson, DBMS, Stern/NYU, Sp2004               9
A problem with BCNF
   BCNF violation: Theater  N’hood
   Decompose:
       {Theater, N’Hood}
       {Theater, Title}
   Resulting relations:
R1                                       R2
Theater         N’hood                     Theater           Title
Angelica        Village                    Angelica          City of God
                                           Angelica          Fog of War


                     M.P. Johnson, DBMS, Stern/NYU, Sp2004                 10
Problem - continued
   Suppose we add new rows to R1 and R2:
R1                                           R2
 Their
Theater     join:   N’hood                     Theater                Title
Angelica            Village                    Angelica               City of God
Film Forum          Village                    Angelica               Fog of War
                                               Film Forum             City of God
     (R’)
      Theater            N’hood                          Title
      Angelica           Village                         City of God
      Angelica           Village                         Fog of War
      Film Forum         Village                         City of God

A and B could not enforce FD Title,N’hood  Theater
                         M.P. Johnson, DBMS, Stern/NYU, Sp2004                      11
Third normal form: motivation
   There are some situations in which
       BCNF is not dependency-preserving, and
       Efficient checking for FD violation on updates is
        important
   In these cases BCNF is too severe a req.
   Solution: define a weaker normal form, called
    Third Normal Form
       in which FDs can be checked on individual relations
        without performing a join (no inter-relational FDs)
       to which relations can be converted, preserving both
        data and FDs

                       M.P. Johnson, DBMS, Stern/NYU, Sp2004   12
Third Normal Form
   BCNF decomposition is not dependency-preserving!
   We now define the (weaker) Third Normal Form
       Turns out: this example was already in 3NF

        A relation R is in 3rd normal form if :

        For every nontrivial dependency A1, A2, ..., An  B
        for R, {A1, A2, ..., An } is a super-key for R,
        or B is part of a key, i.e., B is prime

Tradeoff:
       BCNF = no FD anomalies, but may lose some FDs
       3NF = keeps all FDs, but may have some anomalies

                         M.P. Johnson, DBMS, Stern/NYU, Sp2004   13
BCNF: vices and virtues
   Be clear on the problem just described v. the
    arg. that BCNF decomp is lossless
   BCNF decomp does not lose data
       Resulting relations can be rejoined to obtain the
        original
   But: it can can lose dependencies
       After decomp, possible to add rows whose
        corresponding rows would be illegal in (rejoined)
        original



                      M.P. Johnson, DBMS, Stern/NYU, Sp2004   14
Recap: goals of normalization
    When we decompose a relation R with FDs F into
     R1..Rn we want:
1.   lossless-join decomposition – no data lost
2.   no/little redundancy: the relations Ri should be in
     either BCNF or at least 3NF
3.   Dependency preservation: if Fi be the set of
     dependencies in F+ that include only attributes in Ri:
        F is the “sum” of the FDs of the new relations
        (F1  F2  F3  …  Fn)+ = F+
        Otherwise checking updates for violation of FDs
         may require computing joins, which is expensive
                     M.P. Johnson, DBMS, Stern/NYU, Sp2004   15
Dependency preservation
   Saw that last req. didn’t hold in move-theater
    example
   Did it hold in R(N,O,R,P) example?
    (on board)




                  M.P. Johnson, DBMS, Stern/NYU, Sp2004   16
Testing for 3NF
   For each dependency X  Y, use attribute closure
    to check if X is a superkey
   If X is not a superkey, verify that each attribute in Y
    is prime
       This test is rather more expensive, since it involves finding
        candidate keys
       Testing for 3NF is NP-complete (in what?)
       Interestingly, decomposition into 3NF can be done in
        polynomial time
        Testing for 3NF is harder than decomposing into 3NF!
   Optimization: need to check only FDs in F, need not
    check all FDs in F+ (why?)

                         M.P. Johnson, DBMS, Stern/NYU, Sp2004          17
3NF Example
   R = (J, K, L)
   F = (JK  L, L  K)
   Two candidate keys: JK and JL
   R is in 3NF
       JK  L        JK is a superkey
       LK           K is prime
   BCNF decomposition yields
   R1 = (L,K), R2 = (L,J)
       testing for JK  L requires a join
   There is some redundancy in R

                      M.P. Johnson, DBMS, Stern/NYU, Sp2004   18
BCNF and 3NF Comparison
   Example of problems due to redundancy in 3NF
       R = (J, K, L)                     J                 K    L
       F = (JK  L, L  K)               j1                k1   l1
                                          j2                k1   l1
                                          j3                k1   l1
                                          NULL              k2   l2
   A schema that is in 3NF but not BCNF has the
    problems of:
       redundancy (e.g., the relationship between l1 and k1)
       need to use null values (if allowed!), e.g. to represent the
        relationship between l2 and k2 when there is no
        corresponding value for attribute J


                         M.P. Johnson, DBMS, Stern/NYU, Sp2004         19
Comparison of BCNF and 3NF
   It is always possible to decompose a relation
    into relations in 3NF such that:
       the decomposition is lossless
       the dependencies are preserved

   It is always possible to decompose a relation
    into relations in BCNF such that:
       the decomposition is lossless
       but it may not be possible to preserve
        dependencies
       But may eliminate more redundancy

                      M.P. Johnson, DBMS, Stern/NYU, Sp2004   20
The Normal Forms (so far)
   1NF: every attribute has an atomic value
   2NF: 1NF and no partial dependencies
   3NF: for each FD X  Y either
        it is trivial, or
        X is a superkey, or
        Y is a part of some key
   BCNF:
        3NF and third 3NF option disallowed
        I.e, 2NF and no transitive dependencies

                    M.P. Johnson, DBMS, Stern/NYU, Sp2004   21
Distinguishing examples
   1NF but not 2NF: R(Name, SSN ,Mailing-
       address,Phone)
       Key: SSN,Phone
       Partial: ssn  name, address
   2NF but not 3NF: R(Title,Year,Studio,Pres,Pres-Addr)
       Key: Title,Year
       Transitive: studio  president
   3NF but not BCNF: R(Title, Theater, N’hood)
       Title,N’hood  Theater
       Prime-on-right: Theater  N’hood




                         M.P. Johnson, DBMS, Stern/NYU, Sp2004   22
Design Goals
   Goal for a relational database design is:
       No redundancy
       Lossless Join
       Dependency Preservation
   If we cannot achieve this, we accept one of
       dependency loss
       use of more expensive inter-relational methods to preserve
        dependencies
       data redundancy due to use of 3NF
   Interesting: SQL does not provide a direct way of
    specifying FDs other than superkeys
       can specify FDs using assertions, but they are expensive to test


                          M.P. Johnson, DBMS, Stern/NYU, Sp2004            23
3NF
   3NF means we may have anomalies
   Example: TEACH(student, teacher, subject)
       student, subject  teacher (students not allowed in the
        same subject with two teachers)
       teacher  subject (each teacher teaches one subject)
       Subject is prime, so this is 3NF
   But we have anomalies:
       Insertion: cannot insert a teacher until we have a
        student taking his subject
   If we convert to BCNF, we lost student,
    subject  teacher
                        M.P. Johnson, DBMS, Stern/NYU, Sp2004     24
BCNF and over-normalization
   What is the problem?
   Schema overload – trying to capture two meanings:
       1) subject X can be taught by teacher Y
       2) student Z takes subject W from teacher V
   What to do?
   3NF has anomalies, normalizing to BCNF loses FDs
   One soln: keep the 3NF TEACH and another
    (BCNF) relation SUBJECT-TAUGHT (teacher,
    subject)
   Still (more!) redundancy, but no more insert and
    delete anomalies

                        M.P. Johnson, DBMS, Stern/NYU, Sp2004   25
New topic: MVDs (3.7)
   Consider this relation
        People ~ their jobs ~ their residences
        Person-address/city: many-many
        Person-job: many-many
        Address/city-job: independent
Name        SSN   Jobs                         Streets                 Citys

Michael     123   Mayor                        111 East 60th Street    New York
Michael     123   Mayor                        222 Brompton Road       London
Michael     123   CEO                          111 East 60th Street    New York
Michael     123   CEO                          222 Brompton Road       London
Hilary      456   Senator                      333 Some Street         Chappaqua
Hilary      456   Senator                      444 Embassy Row         Washington
Hilary      456   First Lady                   333 Some Street         Chappaqua
Hilary      456   First Lady                   444 Embassy Row         Washington
Hilary      789   Lawyer                       333 Some Street         Chappaqua
Hilary      789   Lawyer                       444 Embassy Row
                               M.P. Johnson, DBMS, Stern/NYU, Sp2004
                                                                       Washington   26
Redundancy in BCNF
        Name      Streets                       Citys                       Jobs
        Michael   111 East 60th Street          New York                    Mayor
        Michael   222 Brompton Road             London                      Mayor
        Michael   111 East 60th Street          New York                    CEO
        Michael   222 Brompton Road             London                      CEO
        Hilary    333 Some Street               Chappaqua                   Senator
        Hilary    444 Embassy Row               Washington                  Senator
        Hilary    333 Some Street               Chappaqua                   First Lady
        Hilary    444 Embassy Row               Washington                  First Lady
        Hilary    333 Some Street               Chappaqua                   Lawyer
        Hilary    444 Embassy Row               Washington                  Lawyer

   Lots of redundancy!                                        Now what?
   Key? All fields                                            New concept, leading
         None determined by others!                            to another normal form:
   Non-trivial FDs? None!                                     Multivalued
    In BCNF? Yes!                                             dependencies
                                    M.P. Johnson, DBMS, Stern/NYU, Sp2004                27

				
DOCUMENT INFO