# PPT by garrickWilliams

VIEWS: 2,413 PAGES: 39

• pg 1
```									C20.0046: Database
Management Systems
Lecture #5
M.P. Johnson
Spring, 2008

M.P. Johnson, DBMS, Stern/NYU, Spring 2008   1
Agenda
    Last time: FDs
    This time:
1.   Anomalies
2.   Normalization: BCNF & 3NF
    Next time: SQL

M.P. Johnson, DBMS, Stern/NYU, Spring 2008   2
Review: FDs
   Definition:
If two tuples agree on the attributes
A1, A2, …, An
then they must also agree on the attributes
B1, B2, …, Bm

   Notation: A1, A2, …, An  B1, B2, …, Bm
   Read as: Ai functionally determines Bj

M.P. Johnson, DBMS, Stern/NYU, Spring 2008   3
Review: Combining FDs
If some FDs are satisfied, then
others are satisfied too

name  color
If all these FDs are true:                category  department
color, category  price

Then this FD also holds: name, category  price

Why?

M.P. Johnson, DBMS, Stern/NYU, Spring 2008            4
Problem: find all FDs
   Given a relation instance and set of given FDs
   Find all FD’s satisfied by that instance

   Useful if we don’t get enough information from our
users: need to reverse engineer a data instance

   Q: How long does this take?
   A: Some time for each subset of atts
   Q: How many subsets?
   powerset
    exponential time in worst-case
   But can often be smarter…
M.P. Johnson, DBMS, Stern/NYU, Spring 2008   5
Closure Algorithm

Repeat:                                                       name  color
category  department
if  B1, …, Bn  C is a FD and                               color, category  price
B1, …, Bn are all in X
{name, category}+ =
{name, category, color,
department, price}
until X didn’t change

M.P. Johnson, DBMS, Stern/NYU, Spring 2008                 6
Closure alg e.g.

Example:            A, B  C
A, D  B
B     D

Compute X+, for every set X (AB is shorthand for {A,B}):

A+ = A, B+ = BD, C+ = C, D+ = D
AB+ = ABCD, AC+ = AC, AD+ = ABCD, BC+ = BC, BD+ = BD, CD+ = CD
ABC+ = ABD+ = ACD+ = ABCD (no need to compute–why?)
BCD+ = BCD, ABCD+ = ABCD

What are the keys?
M.P. Johnson, DBMS, Stern/NYU, Spring 2008   7
Closure alg e.g.

In class:

R(A,B,C,D,E,F)                         A, B             C
A, D             E
B                D
A, F             B

Compute {A,B}+       X = {A, B,                                }

Compute {A, F}+      X = {A, F,                                }

What are the keys?

M.P. Johnson, DBMS, Stern/NYU, Spring 2008       8
Closure alg e.g.
   Product(name, price, category, color)
name, category  price
category  color

FDs are:
Keys are:   {name, category}

   Enrollment(student, address, course, room, time)
room, time  course
student, course  room, time

FDs are:
Keys are:

M.P. Johnson, DBMS, Stern/NYU, Spring 2008   9
Next topic: Anomalies
   Identify anomalies in existing schemata

   Decomposition by projection

   BCNF

   Lossy v. lossless

   Third Normal Form
M.P. Johnson, DBMS, Stern/NYU, Spring 2008   10
Types of anomalies
   Redundancy
   Repeat info unnecessarily in several tuples

   Update anomalies:
   Change info in one tuple but not in another

   Deletion anomalies:
   Delete some values & lose other values too

   Insert anomalies:
   Inserting row means having to insert other, separate info /
null-ing it out

M.P. Johnson, DBMS, Stern/NYU, Spring 2008     11
Examples of anomalies
Michael           123         NY                                     212-111-1111
Michael           123         NY                                     917-111-1111
Hilary            456         DC                                     202-222-2222
Hilary            456         DC                                     914-222-2222
Bill              789         Chappaqua                              914-222-2222
Bill              789         Chappaqua                              212-333-3333

SSN  Name, Mailing-address                                     SSN  Phone
     Update anomaly: Bill moves
     Delete anom.: Bill doesn’t pay bills, lose phones  lose Bill!
     Insert anom: can’t insert someone without a (non-null) phone
     Underlying cause: SSN-phone is many-many
     Effect: partial dependency ssn  name, maddress,
    Whereas key = {ssn,phone}
M.P. Johnson, DBMS, Stern/NYU, Spring 2008                  12
Decomposition by projection
   Soln: replace anomalous R with projections of R onto
two subsets of attributes
   Projection: an operation in Relational Algebra
   Corresponds to SELECT command in SQL

   Projecting R onto attributes (A1,…,An) means
removing all other attributes
   Result of projection is another relation
   Yields tuples whose fields are A1,…,An
   Resulting duplicates ignored

M.P. Johnson, DBMS, Stern/NYU, Spring 2008   13
Projection for decomposition
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

R1(A1, ..., An, B1, ..., Bm)                   R2(A1, ..., An, C1, ..., Cp)

R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp
A1, ..., An  B1, ..., Bm  C1, ..., Cp = all attributes
R1 and R2 may (/not) be reassembled to produce original R

M.P. Johnson, DBMS, Stern/NYU, Spring 2008                 14
Decomposition example
Break           Michael     123        NY                                 212-111-1111
the             Michael     123        NY                                 917-111-1111
relation        Hilary      456        DC                                 202-222-2222

into            Hilary      456        DC                                 914-222-2222
Bill        789        Chappaqua                          914-222-2222
two:
Bill        789        Chappaqua                          212-333-3333

Michael            123         NY                                  123              212-111-1111
Hilary             456         DC                                  123              917-111-1111
Bill               789         Chappaqua                           456              202-222-2222

      The anomalies are gone                                    456              914-222-2222

   No more redundant data                                789              914-222-2222

   Easy to for Bill to move                              789              212-333-3333

   Okay for Bill to lose all phones
M.P. Johnson, DBMS, Stern/NYU, Spring 2008                          15
Thus: high-level strategy
name

price                               name        ssn

Relational Model:
plus FD’s

Normalization:
Eliminates anomalies

M.P. Johnson, DBMS, Stern/NYU, Spring 2008                     16
Using FDs to produce good schemas
2.       Define FDs (and keys) for them based on real
world
3.       Transform your relations to “normal form”
(normalize them)
    Do this using “decomposition”

        Intuitively, good design means
    No anomalies
    Can reconstruct all (and only the) original information

M.P. Johnson, DBMS, Stern/NYU, Spring 2008   17
Decomposition terminology
   Projection: eliminating certain atts from relation
   Decomposition: separating a relation into two by
projection
   Join: (re)assembling two relations
   Whenever a row from R1 and a row from R2 have the same
value for some atts A, join together to form a row of R3

   If exactly the original rows are reproduced by joining
the relations, then the decomposition was lossless
   We join on the attributes R1 and R2 have in common (As)
   If it can’t, the decomposition was lossy

M.P. Johnson, DBMS, Stern/NYU, Spring 2008   18
Lossless Decompositions
A decomposition is lossless if we can recover:
R(A,B,C)
Decompose

R1(A,B)          R2(A,C)

Recover

R’(A,B,C) should be the same as
R(A,B,C)
R’ is in general larger than R. Must ensure R’ = R
M.P. Johnson, DBMS, Stern/NYU, Spring 2008   19
Lossless decomposition
   Sometimes the same set of data is reproduced:
Name             Price         Category
Word             100              WP
Oracle            1000             DB
Access            100              DB

Name      Price                                     Name     Category
Word      100                                      Word       WP
Oracle     1000                                     Oracle     DB
Access     100                                      Access     DB

   (Word, 100) + (Word, WP)  (Word, 100, WP)
   (Oracle, 1000) + (Oracle, DB)  (Oracle, 1000, DB)
   (Access, 100) + (Access, DB)  (Access, 100, DB)

M.P. Johnson, DBMS, Stern/NYU, Spring 2008                  20
Lossy decomposition
   Sometimes it’s not:
Name            Price         Category
Word             100             WP                  What’s
Oracle          1000              DB
wrong?
Access           100              DB

Category     Name                                            Category   Price

WP        Word                                               WP      100

DB        Oracle                                             DB      1000

DB       Access                                              DB      100

   (Word, WP) + (100, WP)  (Word, 100, WP)
   (Oracle, DB) + (1000, DB)  (Oracle, 1000, DB)
   (Oracle, DB) + (100, DB)  (Oracle, 100, DB)
   (Access, DB) + (1000, DB)  (Access, 1000, DB)
   (Access, DB) + (100, DB)  (Access, 100, DB)

M.P. Johnson, DBMS, Stern/NYU, Spring 2008                   21
Ensuring lossless decomposition
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

R1(A1, ..., An, B1, ..., Bm)                   R2(A1, ..., An, C1, ..., Cp)

If A1, ..., An  B1, ..., Bm or A1, ..., An  C1, ..., Cp
Then the decomposition is lossless

Note: don’t need both
   Examples:
   name  price, so first decomposition was lossless
   category  name and category  price, and so second
decomposition was lossy

M.P. Johnson, DBMS, Stern/NYU, Spring 2008                 22
Quick lossless/lossy example

X           Y          Z
1           2          3
4           2          5

   At a glance: can we decompose into R1(Y,X), R2(Y,Z)?
   At a glance: can we decompose into R1(X,Y), R2(X,Z)?

M.P. Johnson, DBMS, Stern/NYU, Spring 2008   23
Next topic: Normal Forms
   First Normal Form = all attributes are atomic
   As opposed to set-valued
   Assumed all along

   Second Normal Form (2NF)

   Third Normal Form (3NF)

   Boyce Codd Normal Form (BCNF)

   Fourth Normal Form (4NF)

   Fifth Normal Form (5NF)
M.P. Johnson, DBMS, Stern/NYU, Spring 2008   24
BCNF definition
   A simple condition for removing anomalies from
relations:
A relation R is in BCNF if:
If As  Bs is a non-trivial dependency
in R , then As is a superkey for R

   I.e.: The left side must always contain a key
   I.e: If a set of attributes determines other attributes,
it must determine all the attributes

   Slogan: “In every FD, the left side is a superkey.”
M.P. Johnson, DBMS, Stern/NYU, Spring 2008   25
BCNF decomposition algorithm
Repeat
choose A1, …, Am  B1, …, Bn that violates the BNCF condition
//Heuristic: choose Bs as large as possible
split R into R1(A1, …, Am , B1, …, Bn) and R2(A1, …, Am , [others])
continue with both R1 and R2
Until no more violations

B’s             A’s          Others

R1                      R2
M.P. Johnson, DBMS, Stern/NYU, Spring 2008      26
Boyce-Codd Normal Form
   Name/phone example is not BCNF:
Michael     123    NY                                   212-111-1111
Michael     123    NY                                   917-111-1111

    {ssn,phone} is key
    FD: ssn  name,mailing-address holds
     Violates BCNF: ssn is not a superkey
   Its decomposition is BCNF
    Only superkeys  anything else
Michael            123          NY                          123                212-111-1111
123                917-111-1111

M.P. Johnson, DBMS, Stern/NYU, Spring 2008                         27
Design/BCNF example
   Consider situation:
   Entities: Emp(ssn,name,lot), Dept(id,name,budg)
   Relship: Works(E,D,since)
   Draw E/R

   New rule: in each dept, everyone parks in
same lot
   Translate to FD
   Normalize

M.P. Johnson, DBMS, Stern/NYU, Spring 2008   28
BCNF Decomposition
   Larger example: multiple decompositions
   {Title, Year, Studio, President, Pres-Address}
   FDs:
   Title Year  Studio
   Studio  President
    Studio  President, Pres-Address
   No many-many this time
   Problem cause: transitive FDs:
   Title,year  studio  president

M.P. Johnson, DBMS, Stern/NYU, Spring 2008   29
BCNF Decomposition
        Illegal: As  Bs, where As not a superkey
        Decompose: Studio  President, Pres-Address
     As = {studio}
     Cs = {title, year}
        Result:
2.    Movies(studio, title, year)
        Is (2) in BCNF? Is in (1) BCNF?
     Key: Studio
     Q: Does president  studio? If so, president is a key
     But if not, it violates BCNF

M.P. Johnson, DBMS, Stern/NYU, Spring 2008   30
BCNF Decomposition
   Illegal: As  Bs, where As not a superkey
    Decompose: President  Pres-Address
   As = {president}
   Cs = {studio}
   {Studio, President}

M.P. Johnson, DBMS, Stern/NYU, Spring 2008   31
Decomposition algorithm example
   R(N,O,R,P)    F = {N  O, O  R, R  N}
Name               Office                 Residence   Phone
George             Pres.                  WH          202-…
George             Pres.                  WH          486-…
Dick               VP                     NO          202-…
   Key: N,P      Dick    VP     NO                                    307-…

   Violations of BCNF: N  O, OR, N OR

   Pick N  OR (on board)
   Can we rejoin? (on board)
   What happens if we pick N  O instead?
   Can we rejoin? (on board)
M.P. Johnson, DBMS, Stern/NYU, Spring 2008                 32
An issue with BCNF
   We could lose FDs

   Relation: R(Title, Theater, Neighboorhood)
   FDs:
 Title,N’hood  Theater

   Assume a movie shouldn’t play twice in same n’hood
 Theater  N’hood
   Keys:              Title        Theater                          N’hood
 {Title, N’hood}  Aviator      Angelica                         Village
 {Theater, Title} Life Aquatic Angelica                         Village

M.P. Johnson, DBMS, Stern/NYU, Spring 2008             33
Losing FDs
   BCNF violation: Theater  N’hood
   Decompose:
   {Theater, N’Hood}
   {Theater, Title}

   Resulting relations:
R1                                           R2
Theater          N’hood                       Theater              Title
Angelica         Village                      Angelica             Aviator
Angelica             Life Aquatic

M.P. Johnson, DBMS, Stern/NYU, Spring 2008              34
Losing FDs
   Suppose we add new rows to R1 and R2:

R1 Theater            N’hood                R2 Theater                     Title
Angelica           Village                         Angelica             Life Aquatic
Film Forum         Village                         Angelica             Aviator
Film Forum           Life Aquatic
R’
Theater            N’hood                          Title
Angelica           Village                         Life Aquatic
Angelica           Village                         Aviator
Film Forum         Village                         Life Aquatic
   Neither R1 nor R2 enforces FD Title,N’hood  Theater
M.P. Johnson, DBMS, Stern/NYU, Spring 2008                  35
Third normal form: motivation
   Sometimes
   BCNF is not dependency-preserving, and
   Efficient checking for FD violation on updates is important

   In these cases BCNF is too severe a req.
   “over-normalization”

   Solution: define a weaker normal form, 3NF
   FDs can be checked on individual relations without
performing a join (no inter-relational FDs)
   relations can be converted, preserving both data and FDs

M.P. Johnson, DBMS, Stern/NYU, Spring 2008    36
BCNF lossiness
   NB: BCNF decomp. is not data-lossy
   Results can be rejoined to obtain the exact original

   But: it can lose dependencies
   After decomp, now legal to add rows whose corresponding
rows would be illegal in (rejoined) original

   Data-lossy v. FD-lossy

M.P. Johnson, DBMS, Stern/NYU, Spring 2008   37
Third Normal Form
   Now define the (weaker) Third Normal Form
   Turns out: this example was already in 3NF

A relation R is in 3rd normal form if :

For every nontrivial dependency A1, A2, ..., An  B
for R, {A1, A2, ..., An } is a super-key for R,
or B is part of a key, i.e., B is prime

BCNF = no FD anomalies, but may lose some FDs
3NF = keeps all FDs, but may have some anomalies

M.P. Johnson, DBMS, Stern/NYU, Spring 2008   38
Next week
   For Monday: read ch.5.1-2 (SQL)

   Proj1 due next Monday

M.P. Johnson, DBMS, Stern/NYU, Spring 2008   39

```
To top