Design Theory of Relational Databases
Announcement (reminder): The 2nd class test questions will be given out (and will appear on the web) before Thursday this week. The solution will be due Monday 14th, 5pm. [There will also be an additional bonus question for extra credit. Solving the bonus will require some self-study.]
This lecture:
     –   Functional Dependencies
Next lectures:
     –   Normalisation
     –   Various properties of decompositions
Link from previous lecture(s)
We looked at conceptual design
   – via ER diagrams
   – and their translation into actual tables.
The result of this is a very good starting point! :-)
But:
   – The resulting tables will most likely not yet be in the optimal shape to represent real-world data.
   – The remaining problem: redundancies.
        • These can lead to inefficiency and/or potential errors.
Technique to solve the problem of redundancies
- Check each table for Functional Dependencies [we learn how to do this today].
- Once these are found, there is a fairly mechanical procedure (called normalisation) to optimise the tables [in the next lecture].
Functional Dependencies (FDs)
Definition:
Let S=(A,B,C,D,...) be the relational schema of a table T.
Let U be a sub-collection of attributes of S.
Let X be a single attribute of S.
We say that U → X holds in table T if the following is true for all pairs of tuples t1 and t2 of T:
  If t1 and t2 have the same value for every attribute in U,
  then they have the same value for the attribute X.
U → X means:
   'the attribute set U determines the attribute X'
 or
   'the attribute X depends on the attribute set U'
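To make the definition concrete, here is a minimal Python sketch (not part of the lecture; the name fd_holds and the row representation are made up for illustration) that checks whether U → X holds in a table given as a list of rows:

    def fd_holds(rows, U, X):
        """Return True iff the FD U -> X holds in the given table.

        rows: list of dicts mapping attribute name -> value
        U:    collection of attribute names (the left-hand side)
        X:    a single attribute name (the right-hand side)
        """
        seen = {}  # U-projection of a row -> the X value it must determine
        for row in rows:
            key = tuple(row[a] for a in U)
            if key in seen and seen[key] != row[X]:
                return False  # two rows agree on every attribute of U but differ on X
            seen[key] = row[X]
        return True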
Example
From the 'lecturing' table, FDs are quite visible. Can you spot them?

    cid | sid | year | numbers
  ------+-----+------+---------
   6995 |  22 | 2002 |      87
   6995 |  41 | 2002 |      87
   6995 |  57 | 2002 |      87
   6995 |  76 | 2002 |      87
   6995 |  22 | 2003 |      80
   6995 |  89 | 2003 |      80
   6995 |  57 | 2003 |      80
   6995 |  76 | 2003 |      80
   8762 |  38 | 2002 |     251
   8762 |  81 | 2002 |     251
   8762 |  85 | 2002 |     251
  11580 |  14 | 2002 |     182
  11580 |  83 | 2002 |     182
  11580 |  86 | 2002 |     182
  (...)

  cid,year → numbers
  cid → numbers ???
  year → numbers ???
  sid → numbers ???
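As a quick check, the fd_holds sketch from above can be run on a handful of these rows (the data is copied from the table; the snippet is illustrative only):

    lecturing = [
        {"cid": 6995,  "sid": 22, "year": 2002, "numbers": 87},
        {"cid": 6995,  "sid": 41, "year": 2002, "numbers": 87},
        {"cid": 6995,  "sid": 22, "year": 2003, "numbers": 80},
        {"cid": 8762,  "sid": 38, "year": 2002, "numbers": 251},
        {"cid": 11580, "sid": 14, "year": 2002, "numbers": 182},
    ]

    fd_holds(lecturing, ["cid", "year"], "numbers")   # True on this sample
    fd_holds(lecturing, ["cid"], "numbers")           # False: course 6995 has both 87 and 80
    fd_holds(lecturing, ["year"], "numbers")          # False: 2002 has 87, 251 and 182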
FDs are constraints
From the real-world meaning of the 'lecturing' table, the FD that we found (i.e. cid,year → numbers) is not accidental.
Any collection of rows that will ever make up the 'lecturing' table will necessarily satisfy this dependency. (Why? A given course in a given year will always have a fixed number of students, and this number does not depend on who teaches the course.)
So, if the lecturing table is ever modified such that the constraint is not satisfied, then the integrity of the table is violated.
Setting up this constraint at the creation of the database is a bit tricky. We will see that a better solution is to decompose such tables into smaller ones.
Another example of FD: funny but true
Primary keys give FDs:
  Every attribute of the table is functionally dependent on the primary key!
  How so?
        • If two rows were to have the same value for the primary key, then they would have to have the same value for all other attributes.
              – But no two distinct rows can ever have the same value for the primary key ==> so the FD is automatically (vacuously) satisfied.
Armstrong's axioms
Once some FDs have been found to hold true, other FDs can be inferred from them by the following rules (called Armstrong's axioms):
1. Trivial dependencies (also called Reflexivity): If X is a subset of U, then U → X.
2. Augmentation: If U → X, then UZ → XZ.
3. Transitivity: If U → X and X → Y, then U → Y.
It can be shown that these rules are sufficient to derive all FDs that hold as a consequence of postulating some.
Checking if a FD holds
It can be shown that Armstrong's axioms are sufficient to derive all FDs that hold as a consequence of postulating some.
But finding FDs this way requires a time-consuming algorithm (it runs in exponential time).
There is a simpler method for the following special case:
   Problem: Given a set F of FDs, check whether a particular FD, say U → X, follows from it.
   Solution: we successively augment U with attributes that functionally depend on U. (We call the fully augmented version of U the closure of U.) The new U will contain all attributes which depend on the original U. So if X is among them, then U → X holds; otherwise it does not.
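A minimal Python sketch of this closure method (the names closure and fd_follows are invented; FDs are assumed to be given as (left-hand-side set, right-hand-side attribute) pairs):

    def closure(U, fds):
        """Compute the closure of the attribute set U under the FDs in fds."""
        result = set(U)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                # If everything on the left-hand side is already determined by U,
                # then the right-hand side is determined too.
                if set(lhs) <= result and rhs not in result:
                    result.add(rhs)
                    changed = True
        return result

    def fd_follows(fds, U, X):
        """U -> X follows from fds exactly when X lies in the closure of U."""
        return X in closure(U, fds)

    # E.g. with F = {cid,year -> numbers; numbers -> rooms_needed} (the second FD is invented),
    # cid,year -> rooms_needed follows by transitivity:
    fds = [({"cid", "year"}, "numbers"), ({"numbers"}, "rooms_needed")]
    fd_follows(fds, {"cid", "year"}, "rooms_needed")   # True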
The problem with unnormalised tables
Suppose:
S(A,B,C,D,...)
B → C        // whenever the B values are the same, the C values are too

What's wrong with this?
Case 1: no two records will ever agree on B
==> then nothing's wrong. [e.g. when B is a primary key]
Case 2: several problems:
   * aggregation anomaly: if C is numeric
   * insertion anomaly
   * update anomaly
   * deletion anomaly
  * Aggregation anomaly: if C is numeric (e.g. numbers in lecturing),
  then a GROUP BY on it can give wrong results.
  → Try to compute the total number of course registrations for a given year.
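For instance, naively summing numbers over the unnormalised rows counts each course once per lecturer. A hypothetical snippet using the 2002 rows of the example table:

    # The 2002 rows of 'lecturing' (one row per lecturer of each course).
    rows_2002 = [
        {"cid": 6995,  "sid": 22, "numbers": 87},
        {"cid": 6995,  "sid": 41, "numbers": 87},
        {"cid": 6995,  "sid": 57, "numbers": 87},
        {"cid": 6995,  "sid": 76, "numbers": 87},
        {"cid": 8762,  "sid": 38, "numbers": 251},
        {"cid": 8762,  "sid": 81, "numbers": 251},
        {"cid": 8762,  "sid": 85, "numbers": 251},
        {"cid": 11580, "sid": 14, "numbers": 182},
        {"cid": 11580, "sid": 83, "numbers": 182},
        {"cid": 11580, "sid": 86, "numbers": 182},
    ]

    # Naive aggregation repeats 'numbers' once per lecturer of a course:
    wrong_total = sum(r["numbers"] for r in rows_2002)                        # 87*4 + 251*3 + 182*3 = 1647

    # The intended answer counts each course only once:
    right_total = sum({r["cid"]: r["numbers"] for r in rows_2002}.values())   # 87 + 251 + 182 = 520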
  * Insertion anomaly:
  Whenever we enter a new record, we need to check that it does not violate the FD. That is, if another record exists with the same B value, we must check that the new record agrees with it on its C value.
  * update anomaly:
  Whenever we need to update the C info in the table, we
   must replace all occurrences of that particular value.


  * deletion anomaly:
  Deleting the last record with a specific B value causes the
   corresponding C value to be lost too.
Example: S = items held in a shop
supplier_name → suppliers_phone_number
* Insertion anomaly: when we enter a new record that has a new phone number for the supplier (all old items still hold the old number!).
* Update anomaly: when we need to change the supplier's phone number.
* Deletion anomaly: when we delete the last item ordered from a certain supplier, the info about their contact number is lost too.
Normalisation
Solution: split the table.
S=(A,B,C,D,...), which has the FD U → X.
Then:
* V := all other attributes outside U that depend on U [found using the closure algorithm].
* Split S into: UV and notV (i.e. U together with V, and all attributes not in V).
In the previous example:
   {supplier_name, suppl_phone, and all other details of the supplier} and
   {the rest of the attributes, plus supplier_name}
* Continue with the new tables. [NB: these may still have FDs that are not a consequence of the U → X that we just dealt with!]
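A sketch of one such splitting step in Python, reusing the closure() function from the earlier sketch (decompose is an invented name; it only computes the two attribute sets, not the new tables themselves):

    def decompose(attributes, fds, U):
        """One normalisation step: split the schema on the FDs with left-hand side U."""
        V = closure(U, fds) - set(U)     # all attributes outside U that depend on U
        UV = set(U) | V                  # first new schema: U together with its dependants
        notV = set(attributes) - V       # second new schema: everything else, keeping U as the link
        return UV, notV

    # The lecturing example, with cid,year -> numbers:
    attrs = {"cid", "sid", "year", "numbers"}
    fds = [({"cid", "year"}, "numbers")]
    decompose(attrs, fds, {"cid", "year"})
    # -> ({'cid', 'year', 'numbers'}, {'cid', 'sid', 'year'})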
    Properties of decompositions
- Losslessness property
- Redundancy reducing property
- Dependency preserving property


A case study.
Lossless decompositions
= no loss of information
Proposition. The decomposition made in the process of normalisation is always lossless.
Why?
 - It is easy to see that we don't lose any row; but, in general, joining the component tables might give us more rows than we had originally. To see why this will not happen in the case of decompositions involved in normalisation, use:
Lemma. Suppose we have a schema S=(A,B,C,D,...) grouped into 3 non-empty subsets V, U, W. The decomposition of S into S1=VU and S2=UW is lossless provided that at least one of U → V or U → W holds on (S1 join S2).
Using this lemma, and noting that we must have had U → V or U → W on S (which was exactly the reason to decompose it), the proposition follows.
Proof. Consider (S1 join S2).
If (x(V), x(U)) in S1 joins with (y(U), y(W)) in S2, then x(U) = y(U).
We have a problem only if the joined tuple (x(V), x(U), y(W)) is not in S.
But S1 and S2 are a decomposition of S, so the row (x(V), x(U)) of S1 must have another match in S2 with which the join gives a row of S. And likewise the row (y(U), y(W)) of S2 must have another match in S1 with which the join gives a row of S.
Denote the match of (x(V), x(U)) in S2 by (x(U), x(W)), with x(W) ≠ y(W);
   and the match of (y(U), y(W)) in S1 by (y(V), y(U)), with y(V) ≠ x(V).
But then (S1 join S2) gives us not only (x(V), x(U), y(W)), but 3 more tuples:
  (x(V), x(U), x(W)), (y(V), y(U), y(W)), and (y(V), y(U), x(W)).
==> neither U → V nor U → W holds on (S1 join S2), contradicting the assumption of the lemma. So the problematic tuple cannot exist, and the decomposition is lossless.
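A tiny illustration of the lemma's conclusion on toy data (invented values): when neither U → V nor U → W holds, projecting and re-joining creates spurious tuples.

    # Rows are (V, U, W); the two rows agree on U but on nothing else,
    # so neither U -> V nor U -> W holds.
    S  = {("a1", "b", "c1"), ("a2", "b", "c2")}
    S1 = {(v, u) for (v, u, w) in S}     # projection onto VU
    S2 = {(u, w) for (v, u, w) in S}     # projection onto UW

    rejoined = {(v, u, w) for (v, u) in S1 for (u2, w) in S2 if u == u2}
    print(rejoined - S)   # {('a1', 'b', 'c2'), ('a2', 'b', 'c1')} -- two spurious rows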
Redundancy reducing property
Redundancy = repeated information.
We want to remove repeated info which arises from a non-trivial FD.
Definition: The decomposition of S into S1 and S2 is redundancy reducing if either S1 or S2 is free of any key of the original S, and they have fewer rows than S.
NB. Since we only consider non-trivial dependencies, if it were the case that both U → V and U → W held on S, then U would be a key for S and so S wouldn't need fixing.
Hence, the redundancy reducing property is also guaranteed for decompositions that arise in correctly carried out normalisation processes.
Dependency preserving property
Remember, FDs = constraints that we observe during analysis of the real-world situation.
We want to guarantee and check FDs, but we want to do this via the keys (of the decomposed schemas).
Usually the decomposition done in the normalisation process achieves this, except in one problem case [which fortunately arises rarely!]:
   supplier → supp_address
   contact_person → supp_address
   Here, if we take {contact_person, supp_address} into a separate table, the first FD is lost. Likewise, if we take {supplier, supp_address} into a separate table, the second FD is lost. There is no satisfactory solution in this case.
Case study: Achim's example tables
So how do we deal with the FD
   cid,year → numbers ?
Applying what we have learned, we decompose the lecturing(cid,sid,year,numbers) schema into:
 course_instances(cid,year,numbers)
 taught_by(cid,year,sid)
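A quick sanity check (hypothetical snippet) that this particular decomposition is lossless: project the lecturing rows onto the two new schemas and natural-join them back on (cid, year).

    lecturing = {   # a few (cid, sid, year, numbers) rows from the example table
        (6995, 22, 2002, 87), (6995, 41, 2002, 87),
        (6995, 22, 2003, 80), (8762, 38, 2002, 251),
    }

    course_instances = {(cid, year, numbers) for (cid, sid, year, numbers) in lecturing}
    taught_by        = {(cid, year, sid)     for (cid, sid, year, numbers) in lecturing}

    rejoined = {
        (cid, sid, year, numbers)
        for (cid, year, numbers) in course_instances
        for (cid2, year2, sid) in taught_by
        if (cid, year) == (cid2, year2)
    }
    assert rejoined == lecturing   # nothing lost, nothing spurious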
Stare at this long enough to work out what ER diagram corresponds to it...
...finally, it is good practice to revise the ER accordingly, so that whenever the db needs extending you start from the best ER and do not have to normalise again for the same FDs.

				