# FD


## Design Theory of Relational Databases
Announcement (reminder): the questions for the 2nd class test will be given out (and appear on the web) before Thursday this week. The solution will be due Monday 14th, 5pm. [There will be an additional bonus question too, for extra credit. Solving the bonus will require some self-study.]
This lecture:

- Functional Dependencies

Next lectures:

- Normalisation
- Various properties of decompositions
## Link from previous lecture(s)

We looked at conceptual design via ER diagrams and their translation into actual tables. The result of this is a very good starting point! :-)

But: the resulting tables will most likely not yet be in the optimal shape to represent real-world data. The remaining problem is redundancy, which can lead to inefficiency and/or potential errors.
## Technique to solve the problem of redundancies

- Check each table for Functional Dependencies [we learn how to do this today].
- Once these are found, there is a fairly mechanical procedure (called normalisation) to optimise the tables [in the next lecture].
## Functional Dependencies (FDs)

Definition:

Let S=(A,B,C,D,...) be the relational schema of a table T.
Let U be a sub-collection of the attributes of S.
Let X be a single attribute of S.

We say that U → X holds in table T if the following is true for all pairs of tuples t1 and t2 of T: if t1 and t2 have the same value for every attribute in U, then they have the same value for the attribute X.

U → X means 'the attribute set U determines the attribute X', or equivalently 'the attribute X depends on the attribute set U'.
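The definition can be checked mechanically by comparing every pair of tuples. A minimal sketch in Python (the table rows and attribute names are illustrative, taken from the lecturing example below):

```python
from itertools import combinations

def fd_holds(rows, u, x):
    """Check whether the FD U -> X holds in a table.

    rows: list of dicts mapping attribute name -> value
    u:    collection of attribute names (the left-hand side U)
    x:    a single attribute name (the right-hand side X)
    """
    for t1, t2 in combinations(rows, 2):
        # If t1 and t2 agree on every attribute in U ...
        if all(t1[a] == t2[a] for a in u):
            # ... they must also agree on X.
            if t1[x] != t2[x]:
                return False
    return True

lecturing = [
    {"cid": 6995, "sid": 22, "year": 2002, "numbers": 87},
    {"cid": 6995, "sid": 41, "year": 2002, "numbers": 87},
    {"cid": 6995, "sid": 22, "year": 2003, "numbers": 80},
]

print(fd_holds(lecturing, ["cid", "year"], "numbers"))  # True
print(fd_holds(lecturing, ["cid"], "numbers"))          # False
```

Note this only checks the FD against the rows currently present; as discussed below, an FD is really a constraint on all rows the table could ever contain.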
## Example

From the 'lecturing' table, FDs are quite visible. Can you spot them?

  cid | sid | year | numbers
------+-----+------+---------
 6995 |  22 | 2002 |      87
 6995 |  41 | 2002 |      87
 6995 |  57 | 2002 |      87
 6995 |  76 | 2002 |      87
 6995 |  22 | 2003 |      80
 6995 |  89 | 2003 |      80
 6995 |  57 | 2003 |      80
 6995 |  76 | 2003 |      80
 8762 |  38 | 2002 |     251
 8762 |  81 | 2002 |     251
 8762 |  85 | 2002 |     251
11580 |  14 | 2002 |     182
11580 |  83 | 2002 |     182
11580 |  86 | 2002 |     182
(...)
In the table above, the FD cid,year → numbers holds: any two rows that agree on both cid and year also agree on numbers.

What about cid → numbers, year → numbers, or sid → numbers? None of them holds: cid 6995 occurs with numbers 87 and 80, year 2002 occurs with numbers 87, 251 and 182, and sid 22 occurs with numbers 87 and 80.
## FDs are constraints

From the real-world meaning of the 'lecturing' table, the FD that we found (i.e. cid,year → numbers) is not accidental. Any collection of rows that will ever make up the 'lecturing' table will necessarily satisfy this dependency. (Why? A given course in a given year will always have a fixed number of students, and this number does not depend on who teaches the course.)

So, if the lecturing table is ever modified such that the constraint is not satisfied, then the integrity of the table is violated. Setting up this constraint at the creation of the database is a bit tricky; we will see that a better solution is to decompose such tables into smaller ones.
## Another example of FD: funny but true

Primary keys give FDs: every attribute of the table is functionally dependent on the primary key!

How so?

- By definition, if two rows were to have the same value for the primary key, then they would have the same value for all other attributes.
- But no two distinct rows can ever have the same value for the primary key ==> so the FD is automatically satisfied.
## Armstrong's axioms

Once some FDs have been found to hold, other FDs can be inferred from them by the following rules (called Armstrong's axioms):

1. Trivial dependencies (also called Reflexivity): if X is a subset of U, then U → X.
2. Augmentation: if U → X, then UZ → XZ.
3. Transitivity: if U → X and X → Y, then U → Y.

It can be shown that these rules are sufficient to derive all FDs that hold as a consequence of postulating some.
## Checking if an FD holds

Finding FDs by applying Armstrong's axioms directly requires a time-consuming algorithm (it runs in exponential time). There is a simpler method for the following special case.

Problem: given a set F of FDs, check if a particular FD, say U → X, follows from it.

Solution: we successively augment U with attributes that functionally depend on U. (We call the fully augmented version of U the closure of U.) The new U will contain all attributes which depend on the original U. So if X is among them, then U → X follows; otherwise it does not.
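The closure computation described above can be sketched as follows; FDs are represented as (lhs, rhs) pairs, and the attribute names are illustrative:

```python
def closure(attrs, fds):
    """Compute the closure of an attribute set under a set of FDs.

    attrs: iterable of attribute names (the set U)
    fds:   list of (lhs, rhs) pairs, where lhs is a set of attributes
           and rhs is a single attribute
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left-hand side is already in the closure,
            # its right-hand side depends on the original attrs too.
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result

def follows(fds, u, x):
    """Does U -> X follow from the given FDs?"""
    return x in closure(u, fds)

fds = [({"A"}, "B"), ({"B"}, "C")]
print(sorted(closure({"A"}, fds)))  # ['A', 'B', 'C']
print(follows(fds, {"A"}, "C"))     # True (by transitivity)
```

Each pass adds at least one attribute or stops, so the loop runs in polynomial time, in contrast to blind application of the axioms.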
## The problem with unnormalised tables

Suppose: S(A,B,C,D,...); B → C   (whenever two rows agree on B, they agree on C too)

What's wrong with this?

Case 1: no two records will ever agree on B ==> then, nothing is wrong. [E.g. when B is the primary key.]

Case 2: otherwise, several problems:

- aggregation anomaly: if C is numeric
- insertion anomaly
- update anomaly
- deletion anomaly
Suppose: S(A,B,C,D,...); B → C

- Aggregation anomaly: if C is numeric (e.g. numbers in lecturing), then a GROUP BY over it can give wrong results. → Try to compute the total number of course registrations for a given year.
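The aggregation anomaly can be reproduced directly on a subset of the lecturing rows shown earlier: summing numbers over the raw table counts each course's enrolment once per lecturer.

```python
lecturing = [
    # (cid, sid, year, numbers)
    (6995, 22, 2002, 87),
    (6995, 41, 2002, 87),
    (6995, 57, 2002, 87),
    (8762, 38, 2002, 251),
    (8762, 81, 2002, 251),
]

# Naive "GROUP BY year" over the unnormalised table:
naive = sum(numbers for cid, sid, year, numbers in lecturing if year == 2002)
print(naive)  # 763 -- each course counted once per lecturer!

# Correct total: deduplicate per (cid, year) first,
# which is exactly what the FD cid,year -> numbers allows.
per_course = {(cid, year): numbers for cid, sid, year, numbers in lecturing}
correct = sum(n for (cid, year), n in per_course.items() if year == 2002)
print(correct)  # 338 = 87 + 251
```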
Suppose: S(A,B,C,D,...); B → C

- Insertion anomaly: whenever we enter a new record, we need to check that it does not violate the FD. That is, we must check whether another record exists with the same B value and, if so, that the new record agrees with it on its C value.
Suppose: S(A,B,C,D,...); B → C

- Update anomaly: whenever we need to update the C info in the table, we must replace all occurrences of that particular value.

- Deletion anomaly: deleting the last record with a specific B value causes the corresponding C value to be lost too.
## Example: S = items held in a shop

supplier_name → suppliers_phone_number

- Insertion anomaly: when we enter a new record that has a new phone number for the supplier (all old items still hold the old number!).
- Update anomaly: when we need to change the supplier's phone number.
- Deletion anomaly: when we delete the last item that was usually ordered from a certain supplier, the info about their contact number is lost too.
## Normalisation

Solution: split the table S=(A,B,C,D,...), which has U → X. Then:

- V := all other attributes outside U that depend on U [found using the closure algorithm].
- Split S into: UV and (S minus V).
- In the previous example: {supplier_name, suppl_phone, and all details of the supplier} and {supplier_name, the rest of the attributes}.
- Continue with the new tables. [NB these may still have FDs that are not a consequence of the U → X that we just dealt with!]
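The splitting step can be sketched as follows; `decompose` is a hypothetical helper, the closure routine is as described earlier (inlined here for self-containment), and the sample schema and rows are illustrative:

```python
def closure(attrs, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result

def decompose(schema, rows, u, fds):
    """Split a table on the FDs with left-hand side U.

    Returns two (schema, rows) pairs: U together with everything that
    depends on it, and the remaining attributes (keeping U as the link).
    """
    v = closure(u, fds) - set(u)      # attributes that depend on U
    s1 = set(u) | v                   # UV
    s2 = set(schema) - v              # "not V": the rest, including U
    project = lambda attrs: {tuple(r[a] for a in sorted(attrs)) for r in rows}
    return (sorted(s1), project(s1)), (sorted(s2), project(s2))

schema = ["cid", "sid", "year", "numbers"]
rows = [
    {"cid": 6995, "sid": 22, "year": 2002, "numbers": 87},
    {"cid": 6995, "sid": 41, "year": 2002, "numbers": 87},
]
fds = [(("cid", "year"), "numbers")]
(s1, r1), (s2, r2) = decompose(schema, rows, ("cid", "year"), fds)
print(s1, r1)          # ['cid', 'numbers', 'year'] {(6995, 87, 2002)}
print(s2, sorted(r2))  # ['cid', 'sid', 'year'] [(6995, 22, 2002), (6995, 41, 2002)]
```

Note how the repeated numbers value collapses to a single row in the first component: the redundancy has moved out of the data and into the schema.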
## Properties of decompositions

- Losslessness property
- Redundancy reducing property
- Dependency preserving property

A case study.
## Lossless decompositions

= no loss of information.

Proposition: the decomposition made in the process of normalisation is always lossless.

Why? It is easy to see that we don't lose any rows, but in general joining the component tables might give us more rows than we had originally. To see why this cannot happen for the decompositions involved in normalisation, use:

Lemma: suppose we have a schema S=(A,B,C,D,...) grouped into 3 non-empty subsets V, U, W. The decomposition of S into S1=VU and S2=UW is lossless provided that at least one of U → V or U → W holds on (S1 join S2).

Using this, note that we must have had U → V or U → W on S, which was exactly the reason to decompose it.
Proof of the Lemma. Consider (S1 join S2). If (x(V), x(U)) in S1 joins with (y(U), y(W)) in S2, then x(U) = y(U). We have a problem only if the joined tuple (x(V), x(U), y(W)) is not in S.

But S1 and S2 form a decomposition of S, so the row (x(V), x(U)) of S1 must have another match in S2 with which the join gives a row of S; likewise the row (y(U), y(W)) of S2 must have another match in S1 with which the join gives a row of S. Denote the match of (x(V), x(U)) by (x(U), x(W)) in S2, with x(W) ≠ y(W), and the match of (y(U), y(W)) by (y(V), y(U)) in S1, with y(V) ≠ x(V).

But then (S1 join S2) gives us not only (x(V), x(U), y(W)) but 3 more tuples: (x(V), x(U), x(W)), (y(V), y(U), y(W)), and (y(V), y(U), x(W)). These all agree on U but differ on V and on W, ==> neither U → V nor U → W holds on (S1 join S2). So if at least one of them does hold on the join, no such problem tuple can exist, and the decomposition is lossless.
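The bad case in the proof can be seen on a tiny instance: when neither U → V nor U → W holds, the join brings in spurious tuples. A minimal sketch, with V, U, W as in the lemma and made-up values:

```python
# S has attributes (V, U, W): two rows that agree on U but on nothing
# else, so neither U -> V nor U -> W holds on S.
S = {("v1", "u", "w1"), ("v2", "u", "w2")}

S1 = {(v, u) for v, u, w in S}  # projection of S onto VU
S2 = {(u, w) for v, u, w in S}  # projection of S onto UW

# Natural join of S1 and S2 on the shared attribute U:
joined = {(v, u1, w) for v, u1 in S1 for u2, w in S2 if u1 == u2}

print(len(S), len(joined))  # 2 4 -- two spurious tuples appeared
print(joined - S)           # the spurious tuples
```

This is exactly the four-tuple pattern from the proof: the decomposition lost information because no FD tied U to either side.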
## Redundancy reducing property

Redundancy = repeated information. We want to remove repeated info which arises from a non-trivial FD.

Definition: the decomposition of S into S1 and S2 is redundancy reducing if either S1 or S2 is free of any key of the original S, and they have fewer rows than S.

NB: since we only consider non-trivial dependencies, if both U → V and U → W held on S then U would be a key for S, and so S wouldn't need fixing. Hence the redundancy reducing property is also guaranteed for decompositions that arise in correctly carried out normalisation processes.
## Dependency preserving property

Remember, FDs = constraints that we observe during analysis of the real-world situation. We want to guarantee and check FDs, but we want to do this via the keys (of the decomposed schemas). Usually the decomposition done in the normalisation process achieves this, except in one problem case [which fortunately arises rarely!]:

supplier → supp_address
contact_person → supp_address

Here, if we take {contact_person, supp_address} into a separate table, the first FD is lost. Likewise, if we take {supplier, supp_address} into a separate table, the second FD is lost. There is no fully satisfactory solution in this case.
## Case study: Achim's example tables

So how do we deal with the FD cid,year → numbers?

Applying what we learned, we decompose the lecturing(cid,sid,year,numbers) schema into:

course_instances(cid,year,numbers)
taught_by(cid,year,sid)

Stare at this long enough to work out what ER diagram corresponds to it...

...Finally, it is good practice to revise the ER diagram, so that whenever the db needs extending you start from the best ER and do not have to normalise again for the same FDs.
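We can check on a small illustrative subset of the lecturing rows that this decomposition is lossless: joining the two components on (cid, year) reconstructs exactly the original rows.

```python
# Illustrative rows matching the earlier 'lecturing' examples.
course_instances = {(6995, 2002, 87), (6995, 2003, 80), (8762, 2002, 251)}
taught_by = {(6995, 2002, 22), (6995, 2002, 41), (6995, 2003, 22), (8762, 2002, 38)}

# Natural join on (cid, year): because cid,year -> numbers holds,
# the join introduces no spurious tuples.
lecturing = {(cid, sid, year, numbers)
             for (cid, year, numbers) in course_instances
             for (cid2, year2, sid) in taught_by
             if cid == cid2 and year == year2}

for row in sorted(lecturing):
    print(row)
```

Each taught_by row matches exactly one course instance, so the join has exactly one output row per original lecturing row, as the losslessness lemma predicts.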
