Maier_Chapter 4 Funtional Dependencies by nyut545e2

Chapter 4

FUNCTIONAL DEPENDENCIES

Two primary purposes of databases are to attenuate data redundancy and
enhance data reliability. Any a priori knowledge of restrictions or constraints
on permissible sets of data has considerable usefulness in reaching these
goals, as we shall see. Data dependencies are one way to formulate such ad-
vance knowledge. In this chapter we shall cover one type of data dependency,
the functional dependency. In Chapter 7 we cover two other types of data
dependencies, the multivalued and join dependencies. Other general classes
of data dependencies are treated in Chapter 14.


4.1    DEFINITIONS

We discussed keys in Chapter 1. Functional dependencies are a generalization.
Table 4.1 depicts the relation assign(PILOT FLIGHT DATE DEPARTS). Assign
tells which pilot flies a given flight on a given day, and what time the
flight leaves. Not every combination of pilots, flights, dates, and times is
allowable in assign. The following restrictions apply, among others.

  1.    For each flight there is exactly one time.
  2.    For any given pilot, date, and time, there is only one flight.
  3.    For a given flight and date, there is only one pilot.

These restrictions are examples of functional dependencies. Informally, a
functional dependency occurs when the values of a tuple on one set of at-
tributes uniquely determine the values on another set of attributes. Our
restrictions can be phrased as

  1.    TIME functionally depends on FLIGHT,
  2.    FLIGHT functionally depends on {PILOT, DATE, TIME}, and
  3.    PILOT functionally depends on {FLIGHT, DATE}.


   Table 4.1   The relation assign(PILOT FLIGHT DATE DEPARTS).

          assign (PILOT       FLIGHT         DATE        DEPARTS)
                  Cushing      83             9 Aug      10:15a
                  Cushing     116            10 Aug       1:25p
                  Clark       281             8 Aug       5:50a
                  Clark       301            12 Aug       6:35p
                  Clark        83            11 Aug      10:15a
                  Chin         83            13 Aug      10:15a
                  Chin        116            12 Aug       1:25p
                  Copely      281             9 Aug       5:50a
                  Copely      281            13 Aug       5:50a
                  Copely      412            15 Aug       1:25p




We generally reverse the order of the two sets and write FLIGHT, DATE
functionally determines PILOT, or {FLIGHT, DATE} → PILOT. (Recall that
we let a single attribute A stand for {A}.)
   We now state the notion formally using our relational operators. Let r be a
relation on scheme R, with X and Y subsets of R. Relation r satisfies the
functional dependency (FD) X → Y if for every X-value x, π_Y(σ_{X=x}(r)) has
at most one tuple. One way to interpret this expression is to look at two
tuples, t_1 and t_2, in r. If t_1(X) = t_2(X), then t_1(Y) = t_2(Y). In the
FD X → Y, X is called the left side and Y is called the right side.
   This interpretation of functional dependency is the basis for the algorithm
SATISFIES given below.

Algorithm 4.1 SATISFIES
Input: A relation r and an FD X → Y.
Output: true if r satisfies X → Y, false otherwise.
SATISFIES(r, X → Y);
   1. Sort the relation r on its X columns to bring tuples with equal X-values
        together.
   2. If each set of tuples with equal X-values has equal Y-values, return
        true. Otherwise, return false.
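
The two steps of SATISFIES can be sketched in Python as follows. This is a minimal illustration, not part of the original text: it assumes a relation is represented as a list of dicts keyed by attribute name, and the names satisfies, ATTRS, ROWS and assign are chosen here for the example. The data is the relation assign of Table 4.1.

```python
from itertools import groupby

def satisfies(r, X, Y):
    """Return True if relation r (a list of dicts) satisfies the FD X -> Y."""
    def x_value(t):
        return tuple(t[a] for a in X)

    def y_value(t):
        return tuple(t[a] for a in Y)

    # Step 1: sort on the X columns so tuples with equal X-values are adjacent.
    ordered = sorted(r, key=x_value)
    # Step 2: each group of equal X-values must carry exactly one Y-value.
    for _, group in groupby(ordered, key=x_value):
        if len({y_value(t) for t in group}) > 1:
            return False
    return True

# The relation assign of Table 4.1 (representation assumed for this sketch).
ATTRS = ("PILOT", "FLIGHT", "DATE", "DEPARTS")
ROWS = [
    ("Cushing", 83, "9 Aug", "10:15a"),
    ("Cushing", 116, "10 Aug", "1:25p"),
    ("Clark", 281, "8 Aug", "5:50a"),
    ("Clark", 301, "12 Aug", "6:35p"),
    ("Clark", 83, "11 Aug", "10:15a"),
    ("Chin", 83, "13 Aug", "10:15a"),
    ("Chin", 116, "12 Aug", "1:25p"),
    ("Copely", 281, "9 Aug", "5:50a"),
    ("Copely", 281, "13 Aug", "5:50a"),
    ("Copely", 412, "15 Aug", "1:25p"),
]
assign = [dict(zip(ATTRS, row)) for row in ROWS]

print(satisfies(assign, ["FLIGHT"], ["DEPARTS"]))   # True
print(satisfies(assign, ["DEPARTS"], ["FLIGHT"]))   # False: 1:25p has flights 116 and 412
```

Sorting followed by grouping mirrors the algorithm as stated; a hash-based grouping would give the same answer without the sort.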

SATISFIES tests if a relation r satisfies an FD X → Y. Table 4.2 shows the
result of running SATISFIES(assign, FLIGHT → DEPARTS) on the relation
assign from Table 4.1. The dashed lines mark off sets of tuples with
equal FLIGHT-values. The DEPARTS-values for each set are the same, so
the FD is satisfied.

Table 4.2    The result of running the algorithm SATISFIES on the relation
                            assign from Table 4.1.

          assign (PILOT        FLIGHT           DATE              DEPARTS)
                  Cushing       83               9 Aug            10:15a
                  Clark         83              11 Aug            10:15a
                  Chin          83              13 Aug            10:15a
                  ---------------------------------------------------------------
                  Cushing      116              10 Aug             1:25p
                  Chin         116              12 Aug             1:25p
                  ---------------------------------------------------------------
                  Clark        281               8 Aug             5:50a
                  Copely       281               9 Aug             5:50a
                  Copely       281              13 Aug             5:50a
                  ---------------------------------------------------------------
                  Clark        301              12 Aug             6:35p
                  ---------------------------------------------------------------
                  Copely       412              15 Aug             1:25p




Table 4.3 shows the result of running SATISFIES(assign, DEPARTS →
FLIGHT). There is a set of tuples with equal DEPARTS-values that does not
have equal FLIGHT-values, so the FD is not satisfied by assign.

            Table 4.3    The result of running SATISFIES(assign,
                           DEPARTS → FLIGHT)

          assign (PILOT        FLIGHT           DATE              DEPARTS)
                  Clark        281               8 Aug             5:50a
                  Copely       281               9 Aug             5:50a
                  Copely       281              13 Aug             5:50a
                  ---------------------------------------------------------------
                  Cushing       83               9 Aug            10:15a
                  Clark         83              11 Aug            10:15a
                  Chin          83              13 Aug            10:15a
                  ---------------------------------------------------------------
                  Cushing      116              10 Aug             1:25p
                  Chin         116              12 Aug             1:25p
                  Copely       412              15 Aug             1:25p
                  ---------------------------------------------------------------
                  Clark        301              12 Aug             6:35p

   There are two extreme cases to consider, namely X → ∅ and ∅ → Y. The
FD X → ∅ is trivially satisfied by any relation. The FD ∅ → Y is satisfied by
those relations in which every tuple has the same Y-value. In the sequel, we
shall usually ignore FDs of these forms.


4.2   INFERENCE AXIOMS

For a relation r(R), at any given moment there is some family of FDs F that r
satisfies. We encounter the same problem we had with keys. One state of a
relation may satisfy a certain FD, while another state does not. We want the
family of FDs F that all permissible states of r satisfy. Finding F requires
semantic knowledge of the relation r. We can also consider a family of FDs F
applying to the relation scheme R. In this case, any relation r(R) must satisfy
all the FDs of F. It is not always clear which begets the other, the set of
permissible states of a relation or the FDs on the relation scheme.
   The number of FDs that can apply to a relation r(R) is finite, since there is
only a finite number of subsets of R. Thus it is always possible to find all
the FDs that r satisfies, by trying all possibilities using the algorithm
SATISFIES. This approach is time-consuming. Knowing some members of
F, it is often possible to infer other members of F. A set F of FDs implies the
FD X → Y, written F ⊨ X → Y, if every relation that satisfies all the FDs in
F also satisfies X → Y. An inference axiom is a rule that states if a relation
satisfies certain FDs, it must satisfy certain other FDs.
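
The brute-force approach mentioned above, trying every candidate FD with SATISFIES, can be sketched as follows. This is an illustration under assumptions made here: relations are lists of dicts, empty left and right sides are skipped, and the helper names and the two-tuple sample relation are hypothetical. With n attributes there are (2^n - 1)^2 candidates, which is why the text calls the approach time-consuming.

```python
from itertools import combinations

def satisfies(r, X, Y):
    """True if tuples of r that agree on X also agree on Y."""
    seen = {}
    for t in r:
        x = tuple(t[a] for a in X)
        y = tuple(t[a] for a in Y)
        if seen.setdefault(x, y) != y:
            return False
    return True

def nonempty_subsets(attrs):
    """All nonempty subsets of attrs, as tuples."""
    for k in range(1, len(attrs) + 1):
        yield from combinations(attrs, k)

def satisfied_fds(r, attrs):
    """Every FD X -> Y (X, Y nonempty) that the relation r satisfies."""
    return [(X, Y)
            for X in nonempty_subsets(attrs)
            for Y in nonempty_subsets(attrs)
            if satisfies(r, X, Y)]

# A hypothetical two-tuple relation on scheme AB.
sample = [{"A": 1, "B": "x"}, {"A": 2, "B": "x"}]
found = satisfied_fds(sample, ("A", "B"))
for X, Y in found:
    print("".join(X), "->", "".join(Y))
```

For the sample relation the search reports A → B but not B → A, since the two tuples agree on B and differ on A.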
   We now introduce six inference axioms for FDs. In the statement of the
rules, r is a relation on R and W, X, Y, and Z are subsets of R.

F1. Reflexivity
The relation π_X(σ_{X=x}(r)) always has at most one tuple, so X → X always
holds in r.

F2. Augmentation
This axiom deals with augmenting the left side of an FD. If r satisfies X → Y,
then π_Y(σ_{X=x}(r)) has at most one tuple for any X-value x. If Z is any subset
of R, then σ_{XZ=xz}(r) ⊆ σ_{X=x}(r) and hence

                 π_Y(σ_{XZ=xz}(r)) ⊆ π_Y(σ_{X=x}(r)).

Thus π_Y(σ_{XZ=xz}(r)) has at most one tuple and r must satisfy XZ → Y.

Example 4.1   Consider relation r below. Relation r satisfies the FD A → B,

                               r(A     B      C      D)
                                 a1     b1    c1     d1
                                 a2     b2    c1     d1
                                 a1     b1    c1     d2
                                 a3     b3    c2     d3

and hence the FDs AB → B, AC → B, AD → B, ABC → B, ABD → B,
ACD → B, and ABCD → B, by axiom F2.

F3. Additivity
This axiom allows us to combine two FDs with the same left sides. If r
satisfies X → Y and X → Z, then π_Y(σ_{X=x}(r)) and π_Z(σ_{X=x}(r)) both have
at most one tuple for any X-value x. If π_{YZ}(σ_{X=x}(r)) had more than one
tuple, then at least one of π_Y(σ_{X=x}(r)) and π_Z(σ_{X=x}(r)) would have more
than one tuple. Thus, r satisfies X → YZ.

Example 4.2 In the relation of Example 4.1, r satisfies A → B and A → C.
By axiom F3, r must also satisfy A → BC.


F4. Projectivity
This axiom is more or less the reverse of additivity. If r satisfies X → YZ,
then π_{YZ}(σ_{X=x}(r)) has at most one tuple for any X-value x. Since
π_Y(π_{YZ}(σ_{X=x}(r))) = π_Y(σ_{X=x}(r)), π_Y(σ_{X=x}(r)) can have at most one
tuple. Hence r satisfies X → Y.

Example 4.3 In the relation of Example 4.1, r satisfies A → BC. By axiom
F4, r must also satisfy A → B and A → C.

F5. Transitivity
This axiom and the next are the most powerful of the inference axioms. Let r
satisfy X → Y and Y → Z. Consider tuples t_1 and t_2 in r. We know that if
t_1(X) = t_2(X), then t_1(Y) = t_2(Y) and also if t_1(Y) = t_2(Y), then
t_1(Z) = t_2(Z). Therefore, if t_1(X) = t_2(X), then t_1(Z) = t_2(Z), so r
satisfies X → Z.

Example 4.4 Relation r shown below satisfies the FDs A → B and B → C. By
axiom F5, r satisfies A → C.

                           r(A      B      C         D)
                             a1     b1     c2        d1
                             a2     b2     c1        d2
                             a3     b1     c2        d1
                             a4     b1     c2        d3

F6. Pseudotransitivity
Let r satisfy the FDs X → Y and YZ → W and let t_1 and t_2 be tuples in r. We
know if t_1(X) = t_2(X), then t_1(Y) = t_2(Y) and also if t_1(YZ) = t_2(YZ),
then t_1(W) = t_2(W). From t_1(XZ) = t_2(XZ) we can deduce that t_1(X) =
t_2(X) and so t_1(Y) = t_2(Y) and further t_1(YZ) = t_2(YZ), which implies
t_1(W) = t_2(W). Thus r satisfies XZ → W.

  To summarize, if W, X, Y, and Z are subsets of R, for any relation r on R:

F1.   Reflexivity: X → X.
F2.   Augmentation: X → Y implies XZ → Y.
F3.   Additivity: X → Y and X → Z imply X → YZ.
F4.   Projectivity: X → YZ implies X → Y.
F5.   Transitivity: X → Y and Y → Z imply X → Z.
F6.   Pseudotransitivity: X → Y and YZ → W imply XZ → W.



4.3    APPLYING THE INFERENCE AXIOMS

Using the axioms F1 to F6 it is possible to derive other inference rules for FDs.

Example 4.5 Let r be a relation on R with X and Y subsets of R. Axiom F1
says that r satisfies Y → Y. Applying axiom F2 we get r satisfies XY → Y.
Another way to state this rule is that for Y ⊆ X ⊆ R, r satisfies X → Y.

Example 4.6 Let r be a relation on R with X, Y, and Z subsets of R. Suppose
r satisfies XY → Z and X → Y. By axiom F6 we get r satisfies XX → Z, which
simplifies to X → Z.

  To disprove a conjecture about FDs, all we need to do is exhibit a relation
where the conjecture does not hold.

Example 4.7 We want to disprove the conjecture XY → ZW implies X → Z.
The relation r below satisfies AB → CD, but not A → C.

                             r(A      B     C      D)
                                  a   b     c      d
                                  a   b'    c'     d
    Some of the inference axioms can be derived from the others. For example,
F5, transitivity, is a special case of F6, pseudotransitivity, where Z = ∅. F6
follows from F1, F2, F3, and F5: if X → Y and YZ → W, then by F1, Z → Z.
By F2, XZ → Y, and XZ → Z. Using F3, we get XZ → YZ. Finally, applying
F5 we get XZ → W.
    We shall see in the next section that axioms F1 to F6 are complete; that is,
every FD that is implied by a set F of FDs can be derived from the FDs in F by
one or more applications of these axioms. We have shown that each axiom is
correct, so applying the axioms to FDs in a set F can only yield FDs that are
implied by F.
    Given axioms F1, F2, and F6, we can prove the rest. We have just seen that
F5 is a special case of F6. Given X → Y and X → Z, we use F1 to get YZ →
YZ and apply F6 twice, first to get XZ → YZ and then to get X → YZ.
Therefore, F3 follows from F1, F2, and F6. To prove F4, suppose X → YZ. By
F1, Y → Y, and by F2, YZ → Y. Applying F6 yields X → Y. Thus axioms F1,
F2, and F6 are a complete subset of F1 to F6. Axioms F1, F2, and F6 are also
independent: no one of the axioms can be proved from the other two (see
Exercise 4.5). These three axioms are sometimes called Armstrong's axioms,
although they are not very similar to Armstrong's original axioms (but the
name has a nice ring to it).
    Let F be a set of FDs for a relation r(R). The closure of F, written F+, is the
smallest set containing F such that Armstrong's axioms cannot be applied to
the set to yield an FD not in the set. Since F+ must be finite, we can compute
it by starting with F, applying F1, F2, and F6, and adding the derived FDs to
F until no new FDs can be derived. The closure of F depends on the scheme R.
If R = AB, then F+ will always contain B → B, but if R = AC, F+ never
contains B → B. When R is not explicitly defined, it is assumed to be the set of
all attribute symbols used in the FDs of F.
   The set F derives an FD X → Y if X → Y is in F+. Since our inference
axioms are correct, if F derives X → Y, then F implies X → Y. In the next
section we prove the converse. Note that F+ = (F+)+ (see Exercise 4.6).

Example 4.8 Let F = {AB → C, C → B} be a set of FDs on r(ABC). F+
= {A → A, AB → A, AC → A, ABC → A, B → B, AB → B, BC → B,
ABC → B, C → C, AC → C, BC → C, ABC → C, AB → AB, ABC → AB,
AC → AC, ABC → AC, BC → BC, ABC → BC, ABC → ABC,
AB → C, AB → AC, AB → BC, AB → ABC, C → B, C → BC, AC → B,
AC → AB}.

  In Chapter 5 we shall see more succinct ways to express F+.
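
The fixpoint computation of F+ described above (start with F, apply F1, F2, and F6 until nothing new appears) can be sketched directly. The representation is an assumption of this sketch: an FD is a pair of frozensets, and FDs with an empty left or right side are left out, in line with the convention of Section 4.1. The function names are illustrative only.

```python
from itertools import combinations

def subsets(R):
    """All subsets of the attribute set R, including the empty set."""
    items = sorted(R)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

def fd_closure(F, R):
    """Close F under F1, F2 and F6 over scheme R (naive fixpoint)."""
    all_sets = subsets(R)
    closure = set(F)
    closure |= {(X, X) for X in all_sets if X}          # F1: X -> X
    changed = True
    while changed:
        new = set()
        for X, Y in closure:
            for Z in all_sets:
                new.add((X | Z, Y))                     # F2: X -> Y gives XZ -> Y
                for V, W in closure:
                    if Y | Z == V:                      # F6: X -> Y, YZ -> W
                        new.add((X | Z, W))             #     give XZ -> W
        changed = not new <= closure
        closure |= new
    return closure

def fmt(fd):
    return "".join(sorted(fd[0])) + " -> " + "".join(sorted(fd[1]))

# F = {AB -> C, C -> B} on scheme ABC, as in Example 4.8.
F = {(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("B"))}
f_plus = fd_closure(F, "ABC")
print(len(f_plus))
print(sorted(fmt(fd) for fd in f_plus))
```

For this F the enumeration yields 29 FDs with nonempty sides. Among them are AC → BC and AC → ABC, which follow from AC → B and AC → C by additivity. The loop is exponential in the size of R and is meant only for small schemes.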


4.4   COMPLETENESS OF THE INFERENCE AXIOMS

We wish to show that axioms F1 to F6 allow us to infer all the FDs implied by a
set F of FDs.* That is, if F implies X → Y, then F derives X → Y. To prove
this result, we shall show how to construct, for any F, a relation r that satisfies
every FD in F+ but no others.

Definition 4.1 X → Y is an FD over scheme R if X and Y are both subsets of
R. F is a set of FDs over R if every FD in F is an FD over R.

Definition 4.2 If F is a set of FDs over R and G is the set of all possible FDs
over R, then F- = G − F+. F- is the exterior of F.

Definition 4.3 An FD X → Y is trivial if X ⊇ Y. If X → Y is a trivial FD over
R, then any relation r(R) satisfies X → Y.

   If F is a set of FDs over R and X is a subset of R, then there is an FD X → Y
in F+ such that Y is maximal: for any other FD X → Z in F+, Y ⊇ Z. This
result follows from additivity. The right side Y is called the closure of X and is
denoted by X+. The closure of X always contains X, by reflexivity.


*For the results of this section we must assume all domains are infinite in order to avoid unwanted
combinatorial effects.

Example 4.9 Let F = {A → D, AB → DE, CE → G, E → H}. Then
(AB)+ = ABDEH.
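
The closure X+ can be computed without enumerating F+. The sketch below uses the usual fixpoint idea (keep adding right sides of FDs whose left sides are already covered); it is an illustration, not an algorithm taken from this section. FDs are written as pairs of strings of one-letter attributes, and the FD set is the one from Example 4.8, F = {AB → C, C → B}.

```python
def attribute_closure(X, F):
    """Return X+ under F, where F is a list of (left, right) attribute strings."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for left, right in F:
            # If the left side is already inside the closure, the right
            # side is functionally determined by X as well.
            if set(left) <= closure and not set(right) <= closure:
                closure |= set(right)
                changed = True
    return closure

F = [("AB", "C"), ("C", "B")]
for X in ("A", "C", "AB", "AC"):
    print(X, "->", "".join(sorted(attribute_closure(X, F))))
```

With X+ in hand, X → Y is in F+ exactly when Y ⊆ X+, which is the maximality property stated above.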

Theorem 4.1   Inference axioms F1 to F6 are complete.

Proof Given a set F of FDs over scheme R, for any FD X → Y in F- we shall
exhibit a relation r(R) that satisfies F+ but not X → Y. Hence we will know
that there are no FDs implied by F that are not derived by F. Relation r will
satisfy most of the FDs in F+ vacuously: for an FD W → Z in F+, there will be
no distinct tuples in r with equal W-values.
   Let R = A_1 A_2 ... A_n and let a_i and b_i be distinct elements of dom(A_i),
1 ≤ i ≤ n. There will be only two tuples in r, t and t'. Tuple t will be
⟨a_1 a_2 ... a_n⟩. Tuple t' is defined as

                 t'(A_i) = a_i   if A_i ∈ X+,
                           b_i   otherwise.

   First we show that r does not satisfy X → Y. From the definition of t', t(X)
= t'(X). Suppose t(Y) = t'(Y). Then t'(Y) must be all a's, and hence Y ⊆
X+. But since X → X+ ∈ F+, by projectivity, X → Y is in F+, a contradiction
to X → Y ∈ F-.
   Now we show that r satisfies all the FDs in F+. The only FDs we need worry
about are those of the form W → Z, where W ⊆ X+. If W ⊄ X+, then t(W)
≠ t'(W). Since W ⊆ X+, by reflexivity and projectivity, X+ → W is in F+,
and by two applications of transitivity, so is X → Z. Hence Z ⊆ X+ and t(Z)
= t'(Z). So r satisfies W → Z.
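
The two-tuple relation used in the proof can be built and checked for a small case. The sketch below is illustrative: it reuses the FD set F = {AB → C, C → B} of Example 4.8, picks X = C and Y = A (so that C → A lies in F-, since C+ = BC), and invents the entry names a_A, b_A and so on. The helpers are written here for the example.

```python
def attribute_closure(X, F):
    """X+ under F, with FDs given as (left, right) attribute strings."""
    closure, changed = set(X), True
    while changed:
        changed = False
        for left, right in F:
            if set(left) <= closure and not set(right) <= closure:
                closure |= set(right)
                changed = True
    return closure

def satisfies(r, X, Y):
    """True if tuples of r that agree on X also agree on Y."""
    seen = {}
    for t in r:
        x = tuple(t[a] for a in X)
        y = tuple(t[a] for a in Y)
        if seen.setdefault(x, y) != y:
            return False
    return True

def two_tuple_relation(R, X, F):
    """Tuples t and t' of the proof: t' agrees with t exactly on X+."""
    x_plus = attribute_closure(X, F)
    t = {A: f"a_{A}" for A in R}
    t_prime = {A: (f"a_{A}" if A in x_plus else f"b_{A}") for A in R}
    return [t, t_prime]

R = "ABC"
F = [("AB", "C"), ("C", "B")]
r = two_tuple_relation(R, "C", F)

print(all(satisfies(r, left, right) for left, right in F))  # r satisfies every FD in F
print(satisfies(r, "C", "A"))                               # but violates C -> A
```

Here t and t' agree on B and C and differ on A, so every FD of F holds while C → A fails, as the proof predicts.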

Corollary For any set of FDs F over scheme R, there is a relation r(R)
satisfying F+ and violating every FD in F-. (Such an r is called an Armstrong
relation.)

Proof For each FD X → Y in F-, use Theorem 4.1 to construct a relation
r_{X,Y}(R) that satisfies F+ but violates X → Y. Rename the entries in each such
relation so that no pair of relations share a common entry. Let

                 r = ⋃ r_{X,Y}, the union taken over all X → Y ∈ F-.

It is clear that r violates every FD in F-. It is left to the reader to show that r
satisfies F+.
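
The construction in the corollary can be tried out on a small scheme. The sketch below makes two simplifications that are assumptions of this example: the relation built in Theorem 4.1 depends only on X, so one two-tuple relation per left side X with X+ ≠ R is used, and only FDs with nonempty sides are considered. Entry names such as a3_B are invented so that no two sub-relations share an entry.

```python
from itertools import combinations

def attribute_closure(X, F):
    closure, changed = set(X), True
    while changed:
        changed = False
        for left, right in F:
            if set(left) <= closure and not set(right) <= closure:
                closure |= set(right)
                changed = True
    return closure

def satisfies(r, X, Y):
    seen = {}
    for t in r:
        x = tuple(t[a] for a in X)
        y = tuple(t[a] for a in Y)
        if seen.setdefault(x, y) != y:
            return False
    return True

def nonempty_subsets(R):
    return [c for k in range(1, len(R) + 1) for c in combinations(R, k)]

def armstrong_relation(R, F):
    """Union of renamed two-tuple relations, one per X with X+ != R."""
    r = []
    for n, X in enumerate(nonempty_subsets(R)):
        x_plus = attribute_closure(X, F)
        if x_plus == set(R):
            continue            # no FD with left side X lies in F-
        r.append({A: f"a{n}_{A}" for A in R})
        r.append({A: (f"a{n}_{A}" if A in x_plus else f"b{n}_{A}") for A in R})
    return r

R = "ABC"
F = [("AB", "C"), ("C", "B")]
r = armstrong_relation(R, F)

# r should satisfy W -> Z exactly when Z is contained in W+.
exact = all(satisfies(r, W, Z) == (set(Z) <= attribute_closure(W, F))
            for W in nonempty_subsets(R) for Z in nonempty_subsets(R))
print(len(r), exact)
```

For F = {AB → C, C → B} the left sides A, B, C, and BC have closures smaller than R, so the relation has eight tuples, and the final check confirms that it satisfies exactly the FDs of F+.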

  We see now that inference axioms F1 to F6 are consistent and complete.
Thus F ⊨ X → Y if and only if X → Y ∈ F+. From now on we use the terms
implies and derives interchangeably when discussing FDs. We generally shall
use only Armstrong's axioms or some other complete set of axioms for
computing F+.


4.5     DERIVATIONS AND DERIVATION DAGs

If F ⊨ X → Y, then either X → Y is in F, or a series of applications of the
inference axioms to F will yield X → Y. This sequence of axiom applications
and resulting FDs is a derivation of X → Y from F. More formally, let F be a
set of FDs over scheme R. A sequence P of FDs over R is a derivation sequence
on F if every FD in P either

  1.    is a member of F, or
  2.    follows from previous FDs in P by an application of one of the
        inference axioms F1 to F6.

P is a derivation sequence for X → Y if X → Y is one of the FDs in P.

Example 4.10 Let F = {AB → E, AG → J, BE → I, E → G, GI → H}.
The following sequence is a derivation sequence for AB → GH.

   1.    AB → E            (given)
   2.    AB → AB           (reflexivity)
   3.    AB → B            (projectivity from 2)
   4.    AB → BE           (additivity from 1 and 3)
   5.    BE → I            (given)
   6.    AB → I            (transitivity from 4 and 5)
   7.    E → G             (given)
   8.    AB → G            (transitivity from 1 and 7)
   9.    AB → GI           (additivity from 6 and 8)
  10.    GI → H            (given)
  11.    AB → H            (transitivity from 9 and 10)
  12.    GI → GI           (reflexivity)
  13.    GI → I            (projectivity from 12)
  14.    AB → GH           (additivity from 8 and 11)

This sequence contains unneeded FDs, such as 12 and 13, and is also a
derivation sequence for other FDs, such as AB → GI.

Definition 4.4 Let P be a derivation sequence on F. The use set of P is the set
of all FDs in F that appear in P.

   We have seen that some subsets of the axioms F1 to F6 are complete. Their
completeness implies that if there is a derivation sequence P for X → Y
using all the axioms F1 to F6, there is a derivation sequence P' for X → Y
using only the axioms in the complete subset (see Exercise 4.10).
   We shall be using a complete set of inference axioms that are not a subset
of F1 to F6, called B-axioms. For a relation r(R), with W, X, Y, and Z subsets
of R, and C an attribute in R:

     B1.      Reflexivity: X → X.
     B2.      Accumulation: X → YZ and Z → CW imply X → YZC.
     B3.      Projectivity: X → YZ implies X → Y.

The B-axioms are easily shown correct (see Exercise 4.11). We show the
B-axioms derive Armstrong's axioms and hence are complete.

     F1.     Reflexivity: same as B1.
     F2.     Augmentation: if X → Y, then by B1, XZ → XZ for any subset Z in
             R. By repeated application of B2, we get XZ → XYZ, and B3 gives
             XZ → Y.
     F6.     Pseudotransitivity: Let r satisfy X → Y and YZ → W. By B1, XZ →
             XZ. By repeated application of B2, XZ → XYZ and XZ → WXYZ.
             One application of B3 yields XZ → W.

   Since the B-axioms are complete, we can always find a derivation sequence
using the B-axioms if F ⊨ X → Y.

Example 4.11         Let F be the set of FDs in Example 4.10. Then

      1.     EI → EI                    (reflexivity)
      2.     E → G                      (given)
      3.     EI → EGI                   (accumulation from 1 and 2)
      4.     EI → GI                    (projectivity from 3)
      5.     GI → H                     (given)
      6.     EI → GHI                   (accumulation from 4 and 5)
      7.     EI → GH                    (projectivity from 6)
      8.     AB → AB                    (reflexivity)
      9.     AB → E                     (given)
     10.     AB → ABE                   (accumulation from 8 and 9)
     11.     BE → I                     (given)
     12.     AB → ABEI                  (accumulation from 10 and 11)
     13.     AB → ABEGI                 (accumulation from 4 and 12)
     14.     AB → ABEGHI                (accumulation from 7 and 13)
     15.     AB → GH                    (projectivity from 14)

is a derivation sequence for AB → GH using only the B-axioms.
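
A derivation sequence that uses only the B-axioms can be checked step by step. The sketch below is a hypothetical checker written for this example: each step is a triple (FD, rule, referenced step numbers), FDs are written as strings such as "EI->EGI", and F lists only the four FDs that the derivation of Example 4.11 cites as given. For B2 the two referenced steps are tried in either order, since the annotations in the example do not fix one.

```python
def fd(s):
    """Parse 'AB->E' into a pair of frozensets of one-letter attributes."""
    left, right = s.split("->")
    return frozenset(left.strip()), frozenset(right.strip())

def check_step(step, earlier, F):
    (lhs, rhs), rule, refs = step
    if any(i < 1 or i > len(earlier) for i in refs):
        return False                      # may only cite earlier steps
    prem = [earlier[i - 1] for i in refs]
    if rule == "given":
        return (lhs, rhs) in F
    if rule == "B1":                      # X -> X
        return lhs == rhs
    if rule == "B3" and len(prem) == 1:   # X -> YZ gives X -> Y
        (l, r), = prem
        return lhs == l and rhs <= r
    if rule == "B2" and len(prem) == 2:   # X -> YZ, Z -> CW give X -> YZC
        for (x, yz), (z, cw) in (prem, prem[::-1]):
            if lhs == x and z <= yz and any(rhs == yz | {c} for c in cw):
                return True
    return False

def check_derivation(steps, F):
    fds = [fd(s) for s, _, _ in steps]
    return all(check_step((fds[k], rule, refs), fds[:k], F)
               for k, (_, rule, refs) in enumerate(steps))

F = {fd(s) for s in ("AB->E", "BE->I", "E->G", "GI->H")}
STEPS = [
    ("EI->EI", "B1", ()), ("E->G", "given", ()), ("EI->EGI", "B2", (1, 2)),
    ("EI->GI", "B3", (3,)), ("GI->H", "given", ()), ("EI->GHI", "B2", (4, 5)),
    ("EI->GH", "B3", (6,)), ("AB->AB", "B1", ()), ("AB->E", "given", ()),
    ("AB->ABE", "B2", (8, 9)), ("BE->I", "given", ()),
    ("AB->ABEI", "B2", (10, 11)), ("AB->ABEGI", "B2", (4, 12)),
    ("AB->ABEGHI", "B2", (7, 13)), ("AB->GH", "B3", (14,)),
]
print(check_derivation(STEPS, F))   # True
```

The checker accepts all fifteen steps of the example; changing any derived FD, for instance step 3 to EI → EGH, makes it fail.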



4.5.1    RAP-Derivation Sequences

Consider derivation sequences for X → Y on a set of FDs F using the B-axioms
that satisfy the following constraints:

  1.     The first FD is X → X.
  2.     The last FD is X → Y.
  3.     Every FD other than the first and last is either an FD in F or an FD of
         the form X → Z that was derived using axiom B2.


Such a derivation sequence is called a RAP-derivation sequence, for the order
in which the axioms are used.

Example 4.12      Let F be the set of FDs in Example 4.10. Then

    1.    AB → AB                 (B1)
    2.    AB → E                  (given)
    3.    AB → ABE                (B2)
    4.    BE → I                  (given)
    5.    AB → ABEI               (B2)
    6.    E → G                   (given)
    7.    AB → ABEGI              (B2)
    8.    GI → H                  (given)
    9.    AB → ABEGHI             (B2)
  10.     AB → GH                 (B3)

is a RAP-derivation sequence on F for AB → GH.
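
A RAP-derivation sequence can be generated greedily: start with X → X, repeatedly pick an FD of F whose left side lies inside the current right side, and accumulate its new attributes one at a time with B2. The sketch below is one possible way to do this, not a procedure from the text; FDs are pairs of frozensets, and F holds only the four FDs that Example 4.12 cites as given.

```python
def rap_derivation(X, Y, F):
    """Return a RAP-derivation sequence for X -> Y on F, or None if none exists.

    Each step is a triple (left side, right side, justification).
    """
    X, Y = frozenset(X), frozenset(Y)
    steps = [(X, X, "B1")]                 # constraint 1: first FD is X -> X
    current = X
    progress = True
    while progress and not Y <= current:
        progress = False
        for left, right in F:
            new = right - current
            if left <= current and new:
                steps.append((left, right, "given"))
                for c in sorted(new):      # B2 adds one attribute at a time
                    current = current | {c}
                    steps.append((X, current, "B2"))
                progress = True
    if not Y <= current:
        return None                        # Y is not contained in X+
    steps.append((X, Y, "B3"))             # constraint 2: last FD is X -> Y
    return steps

F = [(frozenset("AB"), frozenset("E")), (frozenset("BE"), frozenset("I")),
     (frozenset("E"), frozenset("G")), (frozenset("GI"), frozenset("H"))]

steps = rap_derivation("AB", "GH", F)
for n, (left, right, why) in enumerate(steps, 1):
    print(n, "".join(sorted(left)), "->", "".join(sorted(right)), f"({why})")
```

With the FDs listed in this order the generated sequence has the same ten steps as Example 4.12. A different ordering of F may add attributes in a different order but still yields a sequence that meets the three constraints.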

Theorem 4.2 Let F be a set of FDs. If there is a derivation sequence on F for
X → Y, then there is a RAP-derivation sequence on F for X → Y.

Proof Let P be a derivation sequence on F for X → Y using the B-axioms,
which must exist by our earlier remarks. Remove all the FDs in P past the first
occurrence of X → Y. P is still a derivation sequence for X → Y. Insert X → X
at the head of the sequence, if it is not already there.
   We next show that we are able to get by without the FDs in P generated by
B3, except for perhaps X → Y. Let Z → W be an FD in P (other than the last)
that was derived from Z → VW by B3. If Z → W is not used to derive any FD
further along P, then simply remove Z → W from P.
   If Z → W is used to derive an FD further on, it must be by an application
of B2 or B3. If Z → W is used by B3, it must be to generate an FD Z → W'
where W' ⊆ W. But Z → W' can be derived from Z → VW by B3, so Z → W
can be removed from P. If Z → W is used by B2, it must be in one of two ways:

     1.   with an FD W' → CU to derive Z → CW, where W' ⊆ W, or
     2.   with an FD U → Z' to derive U → BZ', where Z' ⊇ Z, and B is an
          attribute in W.

In case 1, use Z → VW in place of Z → W to derive Z → CVW instead of Z
→ CW. In case 2, Z → VW can be used in place of Z → W to derive U →
BZ'. Remove Z → W from P in either case.
   We have just shown that we can substitute an FD with a larger right side in
a derivation using the B-axioms. The only effect is possibly to generate an FD
with a larger right side than the FD originally generated, such as in case 1
above, where Z → CVW replaced Z → CW. This change is just another
substitution of an FD with a larger right side. Thus, the substitution of FDs
with larger right sides can propagate down the derivation sequence.
   The only problem that may arise from such substitutions is that X → Y',
Y' ⊇ Y, might be generated as the last FD in P instead of X → Y, if X → Y was
derived using B2. In this case, add X → Y as the new last FD in P. X → Y can
be derived from X → Y' using B3.
   We now have P to the point where it starts with X → X, ends with X → Y,
and has no FDs derived by B3 except possibly the last. The next step is to show
that X → Y can be derived using only B2 (except for the first and last FDs in
P) applied to FDs of the form X → ZW and W → CV, where W → CV is in
F. Thus any FDs in P derived by reflexivity are superfluous (except the first)
and can be removed.
   This portion of the proof is left to the reader and is illustrated only by
example here. The gist of the proof is that if a new attribute is introduced into
the right side of X → Z by B2, it can be introduced directly from some FD in F.
However, it may first be necessary to add other attributes to the right side.
Consider the following piece of a derivation sequence, where A is introduced
into the right side of X → VV':

          ...
  10.    X → VV'
  11.    Z → AW            (given)
  12.    V → UZ            (given, by B1 or by B2)
  13.    V → AUZ           (from 11 and 12 by B2)
  14.    X → AVV'          (from 10 and 13 by B2)

We want to get rid of V → AUZ and instead introduce A into the right side of
some FD with left side X, using Z → AW directly. Let Z = B_1 B_2 ... B_k. We
replace FDs 13 and 14 by

  13.1    X → VV'B_1                     (from 10 and 12 by B2)
  13.2    X → VV'B_1B_2                  (from 10 and 12 by B2)
  13.3    X → VV'B_1B_2B_3               (from 10 and 12 by B2)
          ...
  13.k    X → VV'B_1B_2 ... B_k          (from 10 and 12 by B2)
          (= X → VV'Z)
  14.     X → AVV'Z                      (from 11 and 13.k by B2)

This change gives us X → AVV'Z instead of X → AVV', but we have
already seen that substitution of FDs with larger right sides poses no problems.
   The basic idea of this part of the proof is to work backwards through P
removing applications of B2 that yield FDs where the left side is not X, as
shown in the example. Once this transformation is made, all applications of B1
(except the first) become superfluous and can be removed (see Exercise 4.14).

  In the next section we shall introduce a pictorial means, a labeled DAG,
to depict RAP-derivation sequences. We shall also show that every
such graph models a derivation sequence.

4.5.2        Derivation DAGs

A directed acyclic graph (DAG) is a directed graph with no directed paths
from any node to itself. A labeled DAG is a DAG with an element from some
labeling set L associated with each node.

Definition 4.5 Let F be a set of FDs over scheme R. An F-based derivation
DAG is a DAG labeled with attribute symbols from R constructed according
to the following rules.
     R1.     Any set of unconnected nodes with labels from R is an F-based
             derivation DAG.
     R2.     Let H be an F-based derivation DAG that includes nodes v_1, v_2, ...,
             v_k with labels A_1, A_2, ..., A_k and let A_1 A_2 ... A_k → CZ be an
             FD in F. Form H' by adding a node u labeled C and edges (v_1, u),
             (v_2, u), ..., (v_k, u) to H. H' is an F-based derivation DAG.
     R3.     Nothing else is an F-based derivation DAG.

We abbreviate F-based derivation DAG to F-based DDAG.

Example 4.13 Let F be the set of FDs in Example 4.10, namely {AB → E,
AG → J, BE → I, E → G, GI → H}. Figure 4.1 shows various stages in the
construction of an F-based DDAG.

   Any F-based DDAG is built by one application of rule R1 and any number
of applications of rule R2. R2 ensures that the graph constructed is actually a
DAG.

Definition 4.6 If H is an F-based DDAG, a node v in H is an initial node if v
has no incoming edges. Any initial nodes must have been added to H by rule
R1.

Definition 4.7 Let H be an F-based DDAG. H is a DDAG for X → Y if

     D1.      X is the set of labels of initial nodes.
     D2.      Every attribute in Y labels some node in H.

Definition 4.8 The use set of an F-based DDAG H, denoted U(H), is the set
of all FDs in F used in the application of rule R2 during the construction of the
DDAG.*

*Use set is not quite well-defined, since for some sets F, there may be more than one way to
construct H. We should really write a use set of H, but we won't.

[Figure 4.1: stages in the construction of an F-based DDAG. (a) Rule R1:
unconnected nodes labeled A and B. (b) Rule R2 using AB → E. (c) Rule R2
using E → G. (d) Rule R2 using BE → I and GI → H.]




Example 4.14 The graph in Figure 4.1(d) is an F-based DDAG for AB →
GH. Its use set is {AB → E, E → G, BE → I, GI → H}. The initial nodes
are the ones labeled A and B.
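
Rules R1 and R2 and conditions D1 and D2 translate into a small data structure. The class below is a sketch written for this example, not a definition from the text: nodes are numbered in creation order, labels and edges are kept in lists, and F contains only the four FDs in the use set of Example 4.14. When several nodes carry the same label, the first one is used as the edge source, which is one of several valid choices.

```python
class DDAG:
    """An F-based derivation DAG built by rule R1 followed by rule R2 steps."""

    def __init__(self, F, initial_labels):
        self.F = [(frozenset(l), frozenset(r)) for l, r in F]
        self.labels = list(initial_labels)   # rule R1: unconnected nodes
        self.edges = []                      # pairs (v, u) of node indices
        self.use_set = []                    # FDs of F used by rule R2

    def apply_r2(self, left, right, c):
        """Rule R2 with the FD left -> right of F, adding a node labeled c."""
        fd = (frozenset(left), frozenset(right))
        if fd not in self.F or c not in fd[1]:
            raise ValueError("FD not in F, or c not on its right side")
        # labels.index raises ValueError if a left-side label is missing.
        sources = [self.labels.index(a) for a in sorted(fd[0])]
        u = len(self.labels)
        self.labels.append(c)
        # Edges always point to the newest node, so no cycle can arise.
        self.edges.extend((v, u) for v in sources)
        if fd not in self.use_set:
            self.use_set.append(fd)
        return u

    def initial_labels(self):
        targets = {u for _, u in self.edges}
        return {self.labels[v] for v in range(len(self.labels))
                if v not in targets}

    def is_ddag_for(self, X, Y):
        """Conditions D1 and D2 of Definition 4.7."""
        return self.initial_labels() == set(X) and set(Y) <= set(self.labels)

F = [("AB", "E"), ("BE", "I"), ("E", "G"), ("GI", "H")]
H = DDAG(F, "AB")                 # Figure 4.1(a)
H.apply_r2("AB", "E", "E")        # Figure 4.1(b)
H.apply_r2("E", "G", "G")         # Figure 4.1(c)
H.apply_r2("BE", "I", "I")        # Figure 4.1(d)
H.apply_r2("GI", "H", "H")
print(H.is_ddag_for("AB", "GH"), sorted(H.initial_labels()))
```

The resulting graph has initial nodes A and B, contains nodes labeled G and H, and records the four FDs named in Example 4.14 as its use set.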

Example 4.15 Figure 4.2 shows a DDAG for ABC → ABC for any set of
FDs over a scheme R containing A, B, and C. Its use set is ∅.

[Figure 4.2: three unconnected nodes labeled A, B, and C.]



Observation Let H be an F-based DDAG with initial nodes labeled with
exactly the attributes in some set X and all nodes in the graph labeled with
exactly the attributes in some other set Y. If Y' is a subset of Y, then H is a
DDAG for X → Y'.

Theorem 4.3 Given a set of FDs F over R and an FD X → Y, the following
are equivalent.

     1. F ⊨ X → Y.
     2. There is a derivation sequence on F for X → Y.
     3. There is an F-based DDAG for X → Y.

Proof We have already observed the equivalence of 1 and 2 from Theorem
4.1. Theorem 4.2 states that condition 2 is the same as there being a
RAP-derivation sequence for X → Y on F. We shall show that we can construct an
F-based DDAG for X → Y given a RAP-derivation sequence for X → Y, and
vice versa.
     There is a natural correspondence between the B-axioms and the rules and
conditions for an F-based DDAG for X → Y. Axiom B1 corresponds to rule R1
for constructing DDAGs. Axiom B2 corresponds to rule R2. Axiom B3 is
embodied in condition D2 of the definition of a DDAG for X → Y.
     Let P be a RAP-derivation sequence for X → Y on F. Let X → Z_1, X → Z_2,
..., X → Z_k be all the FDs in P, in order, that have X as the left side. We shall
show inductively that we can construct a sequence of F-based DDAGs H_1, H_2,
..., H_k such that H_i is obtained from H_{i-1} by the rules for constructing
DDAGs, and H_i is a DDAG for X → Z_i.

  We know that X → Z_1 must be X → X. We use rule R1 to construct DDAG
H_1 that consists of unconnected nodes labeled with the attributes from X.
Suppose H_1, H_2, ..., H_{i-1} are DDAGs for X → Z_1, X → Z_2, ..., X → Z_{i-1}.
Consider X → Z_i. This FD could have come from one of three places:

  1.   From F.
  2.   From FDs X → Z_j and Z → C W by axiom B2. In this case j < i, Z →
       C W is in F, Z_j contains Z, and Z_i = C Z_j.
  3.   From an FD X → Z_j by axiom B3. In this case j < i, Z_j contains Z_i, and
       Z_i = Y.

In case 1, let Z_i = B_1 B_2 ... B_m. DDAG H_{i-1} contains DDAG H_1. Apply rule
R2 once for each attribute in Z_i (m times) to H_{i-1} to add nodes labeled B_1, B_2,
..., B_m, and edges to these nodes from nodes labeled with the attributes of X.
The result is H_i. In case 2, we know H_{i-1} contains H_j and H_j contains nodes
labeled with the attributes in Z_j. Use rule R2 to add a node labeled C to H_{i-1} to
form H_i. In case 3, H_j is already a DDAG for X → Z_i and so is H_{i-1}, since it
contains H_j. Let H_i = H_{i-1}.
    When the process of constructing the H_i's is completed, H_k will be an
F-based DDAG for X → Y.
     Now let H be an F-based DDAG for X → Y. We construct a RAP-derivation
sequence from H. Let H_1, H_2, ..., H_k be a sequence of F-based DDAGs such
that H_i is constructed from H_{i-1} by rule R2, 2 ≤ i ≤ k, and H_k = H. Let Z_i
be the set of node labels in H_i. We shall construct a RAP-derivation sequence
P with X → Z_1, X → Z_2, ..., X → Z_k as a subsequence.
    Z_1 must be X and H_1 must be the DDAG with unconnected nodes labeled
with the attributes in X. Let P begin with X → X = X → Z_1. Now look at H_i,
i ≥ 2. H_i comes from H_{i-1} by rule R2, using an FD Z → C W in F, where C is the
label of the node added to H_{i-1} and Z_{i-1} contains Z. Thus Z_i = C Z_{i-1}. If
Z → C W is not in P, add it to the end of P. Then add X → Z_i to the end of P.
X → Z_i can be obtained by axiom B2, using X → Z_{i-1} and Z → C W.
     When this process terminates, we have a RAP-derivation sequence for X →
Z_k, where Z_k contains Y. Add X → Y to the end of P using axiom B3. P is now
a RAP-derivation sequence for X → Y.

Corollary There is an F-based DDAG H for X → Y with U(H) = G only if
there is a RAP-derivation sequence on F for X → Y with use set G.

Proof Immediate from the proof of Theorem 4.3. (Why is this corollary not if
and only if?)

Example 4.16 The F-based DDAG in Figure 4.1(d) can be constructed from
the RAP-derivation sequence in Example 4.12. The sequence of DDAGs in
Figure 4.1(a)-(d) yields the RAP-derivation sequence

      1.     A B → A B
      2.     A B → E
      3.     A B → A B E
      4.     E → G
      5.     A B → A B E G
      6.     B E → I
      7.     A B → A B E G I
      8.     G I → H
      9.     A B → A B E G H I
     10.     A B → G H.
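
The second half of the proof of Theorem 4.3 is mechanical enough to sketch in code. The Python fragment below is an illustration only and not part of the original text: the names `fmt` and `rap_from_ddag_steps`, and the choice to describe a DDAG by its list of rule R2 applications (edges are not recorded), are assumptions made for the example. Attributes are assumed to be single letters.

```python
def fmt(attrs):
    """Render a set of single-letter attributes in sorted order, e.g. 'A B E'."""
    return " ".join(sorted(attrs))


def rap_from_ddag_steps(x, steps, y):
    """Build a RAP-derivation sequence for X -> Y from a DDAG construction.

    x     -- labels of the initial nodes (rule R1)
    steps -- (lhs, rhs, added) triples: rule R2 applied with the FD lhs -> rhs
             from F, adding one node labeled `added`
    y     -- right side of the target FD
    Returns a list of (left side, right side) strings.
    """
    labels = set(x)                        # Z_1 = X
    seq = [(fmt(x), fmt(labels))]          # axiom B1: X -> X
    for lhs, rhs, added in steps:
        if not set(lhs) <= labels or added not in rhs:
            raise ValueError(f"rule R2 does not apply with {lhs} -> {rhs}")
        fd = (fmt(lhs), fmt(rhs))
        if fd not in seq:
            seq.append(fd)                 # FD from F, entered only once
        labels.add(added)                  # Z_i = C Z_{i-1}
        seq.append((fmt(x), fmt(labels)))  # axiom B2
    if not set(y) <= labels:
        raise ValueError("condition D2 fails: an attribute of Y labels no node")
    target = (fmt(x), fmt(y))
    if seq[-1] != target:
        seq.append(target)                 # axiom B3
    return seq


# The construction in Figure 4.1(a)-(d), for A B -> G H.
steps = [("AB", "E", "E"), ("E", "G", "G"), ("BE", "I", "I"), ("GI", "H", "H")]
for n, (left, right) in enumerate(rap_from_ddag_steps("AB", steps, "GH"), 1):
    print(f"{n:2}. {left} -> {right}")
```

Run on the four rule R2 steps of Figure 4.1, the sketch prints the same ten FDs, in the same order, as the sequence in Example 4.16.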


4.5.3 More about Derivation DAGs

Axiom B2 and rule R2 can both be strengthened in a similar manner. B2 can
be strengthened to the following form, where V is also a subset of R.

     B2'.      X → Y Z and Z → V W imply X → V Y Z.

The corresponding change in rule R2 is left to the reader (see Exercise 4.18).
   Although the definition of DDAG allows multiple nodes with the same
label, the freedom is not needed.

Lemma 4.1 Let H be an F-based DDAG for X → Y. There is an F-based
DDAG for X → Y wherein every node has a distinct label.

Proof     Suppose H has two nodes with the same label, say v1 and v2 are both
labeled C. In the construction of H, either v1 and v2 were added at the same
time with rule R1, or one was added later than the other using rule R2.
Assume v2 was added to H at the same time as or later than v1. There can be
no directed path from v2 to v1 in H.
   In the construction of H, any time rule R2 was applied using v2, v1 could
have been used instead, as shown in Figure 4.3. Thus there is an F-based
DDAG H' for X → Y that has the same nodes and labels as H, as well as the
same set of initial nodes, but v2 has no outgoing edges in H'. H' is still a
DDAG for X → Y when v2 and its incoming edges are removed, since the set of
attributes labeling nodes does not change, and if v2 is an initial node, so is v1.

[Figure 4.3: (a) H originally. (b) Using v1 for v2.]




  This transformation can be applied iteratively to all pairs of nodes with
equal labels to remove all duplicate labels.
  We have observed that if H is an F-based DDAG for X → Y, then it is also
an F-based DDAG for X → Y', where Y ⊇ Y'. Similarly, if X ⊆ X', H is
almost a DDAG for X' → Y. The only problem is that not all the attributes in
X' label some initial node in H. This problem can be solved by adding
unconnected nodes to H with labels in X' − X.

Lemma 4.2 Let H and J be F-based DDAGs for X → Y and Y → Z,
respectively. There is an F-based DDAG K for X → Z with U(K) ⊆ U(H) ∪ U(J).

Proof We splice H and J together by overlapping the initial nodes of J with
the same-labeled nodes of H. Figure 4.4 gives an example of the overlapping
process where F = {A → E, A B → C, A C → D, C D → E, E → I}. Notice
that U(H) = {A → E, A B → C, A C → D}, U(J) = {C D → E, E → I}, and
U(K) = F.

Lemma 4.3 If H is an F-based DDAG for X → Y, and V → W is in U(H),
then F ⊨ X → V.

[Figure 4.4: DDAG K = H overlapped with J.]



Proof For V → W to be used in constructing H, H must contain nodes with
labels for every attribute in V. Hence H is an F-based DDAG for X → V.

Corollary If H is an F-based DDAG for X → Y and V → W is in U(H), there
is an F-based DDAG for X → V that does not use V → W.

   Lemma 4.3 does not hold for derivation sequences, since V → W could be in
the use set of a sequence without it being necessary to derive X → Y.

4.6   TESTING MEMBERSHIP IN F+

To determine if a set of FDs F ⊨ X → Y, we need only test if X → Y ∈ F+.
However, as we saw in Example 4.8, F+ can be considerably larger than F. We
would like to find a means to test if X → Y is in F+ without generating all of
F+. In this section we present such a membership algorithm. The core of the
algorithm is a procedure that generates the closure of X under F. Once we
have found X+, we can test if F implies X → Y.
   We seek an algorithm for testing membership that is more efficient than
generating all of F+. One way to compare algorithms is to examine the
maximum amount of time they consume for an input of a given size. The
(worst-case) time-complexity of an algorithm is a function T(n) that gives the
maximum number of steps the algorithm will take on an input of size n. Naturally,
T(n) depends on what is counted as one step of computation. We shall use the
RAM (random access machine) model as presented in Aho, Hopcroft, and
Ullman as our model of computation. A RAM is basically a model of a simple
digital computer with random access memory.
   For a particular algorithm, T(n) can be messy and complex, but often there
is some “nice” function that approximates the behavior of T(n). We write
T(n) = O(f(n)) (read “T(n) has order f(n)”) if there are constants c > 0 and
n_1 ≥ 0 such that T(n) ≤ c f(n) for all n ≥ n_1.

Example 4.17 3n^2 + 2 log_2 log_2 n = O(n^2), since 3n^2 + 2 log_2 log_2 n ≤ 4n^2
for n ≥ 1. Of course, 3n^2 + 2 log_2 log_2 n = O(n^3) as well, but we are more
interested in the slower growing function, since it is a better approximation of
3n^2 + 2 log_2 log_2 n.
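
The inequality in Example 4.17 can be spot-checked numerically. The short Python sketch below is only an illustration: the function name `t` and the sample range are arbitrary choices, and a finite check is of course not a proof. The check starts at n = 2, the first integer at which log_2 log_2 n is a real number.

```python
import math


def t(n):
    """T(n) = 3n^2 + 2 log2 log2 n, evaluated here for integers n >= 2."""
    return 3 * n**2 + 2 * math.log2(math.log2(n))


# c = 4 is the constant used in Example 4.17; the range is an arbitrary sample.
holds = all(t(n) <= 4 * n**2 for n in range(2, 100_001))
print(holds)
```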

   For most algorithms, the time complexity T(n) is at least O(n), since most
algorithms read all their input, which takes n steps. We first present a
membership algorithm for FDs that is not O(n), but is easy to understand. We
then present a version of the algorithm that is more complex, but has O(n)
time complexity.
   We start with the function CLOSURE given below. CLOSURE(X, F)
returns X+ under F, where X is a set of attributes and F is a set of FDs.
OLDDEP and NEWDEP are variables for sets of attributes.

Algorithm 4.2 CLOSURE
Input: A set of attributes X and a set of FDs F.
Output: The closure of X under F.

  CLOSURE(X, F)
     begin
     OLDDEP := ∅; NEWDEP := X;
     while NEWDEP ≠ OLDDEP do begin
           OLDDEP := NEWDEP;
           for every FD W → Z in F do
               if NEWDEP ⊇ W then
                   NEWDEP := NEWDEP ∪ Z
           end;
     return(NEWDEP)
     end

Example 4.18 Let F = {A → D, A B → E, B I → E, C D → I, E → C}.
CLOSURE(A E, F) begins with NEWDEP = A E. On the first pass through
F, A → D is used to add D to NEWDEP, and E → C is used to add C to
NEWDEP, so NEWDEP = A C D E at the end of the for loop. The second
time through F, C D → I is used to add I to NEWDEP, so NEWDEP =
A C D E I at the end of the for loop. The next pass through F causes no
changes in NEWDEP, so A C D E I is returned as (A E)+.
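
CLOSURE translates almost line for line into a modern language. The following Python sketch is one possible rendering, not the book's own code: the function name `closure`, the use of Python sets, and the representation of an FD W → Z as a pair of strings of single-letter attributes are assumptions made for the example.

```python
def closure(x, fds):
    """Return the closure of attribute set x under fds, as in Algorithm 4.2.

    x   -- iterable of single-letter attributes, e.g. "AE"
    fds -- list of (W, Z) pairs standing for the FDs W -> Z
    """
    olddep, newdep = set(), set(x)
    while newdep != olddep:
        olddep = set(newdep)          # remember the state before this pass
        for w, z in fds:
            if set(w) <= newdep:      # NEWDEP contains W
                newdep |= set(z)      # so add Z
    return newdep


# F from Example 4.18.
F = [("A", "D"), ("AB", "E"), ("BI", "E"), ("CD", "I"), ("E", "C")]
print(" ".join(sorted(closure("AE", F))))  # A C D E I
```

As in the example, the first pass adds D and C, the second adds I, and the third changes nothing.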

    The algorithm essentially constructs an F-based DDAG for X → X+, using
a modified version of rule R2 in the definition of DDAG where more than one
node is added at a time (see Exercise 4.18). We start with initial nodes labeled
X and keep adding nodes to the DDAG until no new labels can be added. It is
not necessary to record the edges of the DDAG, however, since whether or not
we can use an FD W → Z depends only on there being nodes in the DDAG
with labels for all the attributes in W. The value of X+ does not depend on the
edges in the DDAG either, just on the final set of node labels. Thus, it suffices
to keep track of only the set of node labels in the DDAG during its
construction. We keep track of the labels in OLDDEP and NEWDEP.
    Since we are constructing an F-based DDAG with initial nodes labeled X, it
follows that at any point in the execution of CLOSURE, NEWDEP ⊆ X+.
For any attribute A in X+, A will eventually be added to NEWDEP. Since A is
in X+, F ⊨ X → A, and there must be an F-based DDAG H for X → A. Any
FD W → Z used in constructing H can eventually be used in the construction
of the DDAG for the algorithm, and the DDAG in the algorithm will contain
labels for every attribute in Z. Therefore A will be added to NEWDEP and we
conclude that CLOSURE correctly computes X+.
    Using CLOSURE, it is simple to devise an algorithm to test membership in
F+. Algorithm 4.3 MEMBER performs this test.

Algorithm 4.3 MEMBER
Input: A set of FDs F and an FD X → Y.
Output: true if F ⊨ X → Y, false otherwise.
MEMBER(F, X → Y)
   begin
   if Y ⊆ CLOSURE(X, F) then return(true) else return(false)
   end.
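
MEMBER is a single subset test on top of CLOSURE. The Python sketch below repeats a compact closure routine so that it runs on its own; the names `closure` and `member` and the string representation of attribute sets are assumptions for illustration, not the book's notation.

```python
def closure(x, fds):
    """Closure of attribute set x under fds, following Algorithm 4.2."""
    olddep, newdep = set(), set(x)
    while newdep != olddep:
        olddep = set(newdep)
        for w, z in fds:
            if set(w) <= newdep:
                newdep |= set(z)
    return newdep


def member(fds, x, y):
    """Return True if fds implies X -> Y, that is, if Y is a subset of X+."""
    return set(y) <= closure(x, fds)


# F from Example 4.18, where (A E)+ = A C D E I and A+ = A D.
F = [("A", "D"), ("AB", "E"), ("BI", "E"), ("CD", "I"), ("E", "C")]
print(member(F, "AE", "I"))  # True
print(member(F, "A", "E"))   # False
```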

    The time complexity for MEMBER is the same as the time complexity for
CLOSURE, since CLOSURE makes up the body of MEMBER. The worst
case for CLOSURE occurs when only one new right side of an FD is added to
NEWDEP for each execution of the for loop. If F = {A_2 → A_1, A_3 → A_2,
A_4 → A_3, ..., A_m → A_{m-1}}, in the computation for CLOSURE(A_m, F), only
attribute A_{m-i} is added to NEWDEP on the i-th execution of the for loop. If a is
the number of different attribute symbols in F and p is the number of FDs in
F, then each execution of the for loop takes O(ap) time, since a steps are
required to test containment of two sets over a elements. The while loop can be
executed p times before no changes occur to NEWDEP. Therefore, the time
complexity of CLOSURE, and hence of MEMBER, is O(ap^2). Note that the
length of the input, n, is O(ap) (see Exercise 4.21).
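
The worst case described above is easy to reproduce. The Python sketch below instruments a closure routine with a pass counter and runs it on the chain A_2 → A_1, ..., A_m → A_{m-1}, with the FDs scanned in that order. The names `closure_passes` and `chain`, and the encoding of attributes as strings such as "A3", are assumptions for the example.

```python
def closure_passes(x, fds):
    """Algorithm 4.2 with a counter for executions of the inner for loop."""
    olddep, newdep, passes = set(), set(x), 0
    while newdep != olddep:
        olddep = set(newdep)
        passes += 1
        for w, z in fds:
            if w <= newdep:
                newdep |= z
    return newdep, passes


def chain(m):
    """F = {A2 -> A1, A3 -> A2, ..., Am -> A(m-1)}, listed in that order."""
    return [({f"A{i}"}, {f"A{i - 1}"}) for i in range(2, m + 1)]


for m in (5, 10, 20):
    dep, passes = closure_passes({f"A{m}"}, chain(m))
    # p = m - 1 FDs; each of the first p passes adds exactly one attribute,
    # and one final pass detects that nothing changed.
    print(m, len(dep), passes)
```

With p = m − 1 FDs, the loop body runs p + 1 times in this sketch, so the number of FD tests grows quadratically in p, in line with the O(ap^2) bound.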
    To see how the time complexity of CLOSURE can be improved, observe
that during the execution of CLOSURE, if for some FD W → Z, W is
contained in NEWDEP, then Z is added to NEWDEP and W → Z is of no further
use. At this point we could exclude W → Z from F and still compute the
correct closure. By excluding FDs from F after their right sides are added to
NEWDEP, we can reduce the number of FDs scanned during each execution
of the for loop. We can save even more time if we also know which FDs in F
currently have their left sides contained in NEWDEP. If such information is
available we can consider an FD W → Z in F only when its left side is
contained in NEWDEP and then remove it from subsequent consideration. Thus
every FD in F would be considered only once. If each FD in F can be processed
in time proportional to its length in attribute symbols, we would have an O(n)
membership algorithm, where n is the number of symbols required to
represent F and X → Y.
    We accomplish these ends as follows. For each FD W → Z in F we shall
keep track of the number of attributes in W that are not in NEWDEP. When
this count becomes zero, it will be time to consider W → Z. To decrement the
count properly for each FD when a new attribute A is added to NEWDEP, it is
necessary to access all FDs with attribute A on their left sides. We therefore
maintain a series of lists, one for each attribute, consisting of all FDs in F with
that attribute on the left side. Whenever an attribute is added to NEWDEP,
the list for that attribute is traversed, and all FDs on the list have their counts
decremented by 1. If some FD W → Z on the list has its count decremented to
zero, then W is a subset of NEWDEP and Z is added to NEWDEP.
    We must be careful when adding Z to NEWDEP. Suppose there is an
attribute A in Z that is already in NEWDEP. If we traverse the list for A a
second time, we get erroneous values for the counts of FDs on the list. To prevent
this problem, we keep a set of attributes called UPDATE that is the subset of
NEWDEP consisting of attributes that have not yet had their lists traversed.
When an attribute is added to NEWDEP for the first time, it is also added to
UPDATE until its list can be traversed. UPDATE allows us to do away with
OLDDEP, since when UPDATE = ∅, there are no more FDs that can be
used to add new attributes to NEWDEP.
    In the algorithm LINCLOSURE, below, there is an array COUNT of
integers containing the counts for each FD in F, and an array LIST of lists of
FDs for each attribute symbol in F. While an FD may seem to occur in lists for
more than one attribute, we actually store only one copy of the FD and have
the various lists point to this copy.

Algorithm 4.4 The function LINCLOSURE
Input and Output: identical to CLOSURE in Algorithm 4.2.
LINCLOSURE(X, F)
1. Initialization
     for each FD W → Z in F do begin
        COUNT[W → Z] := |W|;
        for each attribute A in W do add W → Z to LIST[A]
        end;
     NEWDEP := X; UPDATE := X.
2. Computation
     while UPDATE ≠ ∅ do begin
        choose an A in UPDATE;
        UPDATE := UPDATE − A;
        for each FD W → Z in LIST[A] do begin
           COUNT[W → Z] := COUNT[W → Z] − 1;
           if COUNT[W → Z] = 0 then begin
              ADD := Z − NEWDEP;
              NEWDEP := NEWDEP ∪ ADD;
              UPDATE := UPDATE ∪ ADD
              end
           end
        end.
3. return(NEWDEP).

Example 4.19 Let F be as in Example 4.18. LINCLOSURE(A E, F)
initializes NEWDEP, UPDATE, COUNT, and LIST as follows:

  NEWDEP = A E                       UPDATE = A E
  LIST[A] = A → D, A B → E           COUNT[A → D] = 1
  LIST[B] = B I → E, A B → E         COUNT[A B → E] = 2
  LIST[C] = C D → I                  COUNT[B I → E] = 2
  LIST[D] = C D → I                  COUNT[C D → I] = 2
  LIST[E] = E → C                    COUNT[E → C] = 1
  LIST[I] = B I → E

   We select the A in UPDATE and traverse LIST[A]. COUNT[A → D] goes
to 0 and D is added to NEWDEP and UPDATE. COUNT[A B → E] goes to
1. If we next select E from UPDATE the result is

  NEWDEP = A C D E                   UPDATE = C D
  COUNT[A → D] = 0
  COUNT[A B → E] = 1
  COUNT[B I → E] = 2
  COUNT[C D → I] = 2
  COUNT[E → C] = 0.

Traversing the lists for C and D leaves us with

  NEWDEP = A C D E I                 UPDATE = I
  COUNT[A → D] = 0
  COUNT[A B → E] = 1
  COUNT[B I → E] = 2
  COUNT[C D → I] = 0
  COUNT[E → C] = 0.

Traversing the list for I fails to reduce any counts to 0, so the algorithm returns
A C D E I.
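
A compact Python rendering of LINCLOSURE may help in following the bookkeeping. It is a sketch under stated assumptions rather than the authors' implementation: the name `linclosure` is hypothetical, FDs are stored in a Python list and referred to by index (which plays the role of the single shared copy), LIST is a dictionary from attribute to FD indices, and UPDATE is kept as a stack.

```python
def linclosure(x, fds):
    """Closure of attribute set x under fds, following Algorithm 4.4.

    x   -- iterable of single-letter attributes, e.g. "AE"
    fds -- list of (W, Z) pairs standing for the FDs W -> Z
    """
    # 1. Initialization
    fds = [(frozenset(w), frozenset(z)) for w, z in fds]
    count = [len(w) for w, _ in fds]       # attributes of W not yet in NEWDEP
    lists = {}                             # attribute -> indices of FDs with it on the left
    for i, (w, _) in enumerate(fds):
        for a in w:
            lists.setdefault(a, []).append(i)
    newdep = set(x)
    update = list(newdep)                  # attributes whose lists are untraversed

    # 2. Computation
    while update:
        a = update.pop()
        for i in lists.get(a, ()):
            count[i] -= 1
            if count[i] == 0:              # all of W is now in NEWDEP
                add = fds[i][1] - newdep   # only attributes that are new
                newdep |= add
                update.extend(add)

    # 3. Result
    return newdep


# F from Example 4.18, as used in Example 4.19.
F = [("A", "D"), ("AB", "E"), ("BI", "E"), ("CD", "I"), ("E", "C")]
print(" ".join(sorted(linclosure("AE", F))))  # A C D E I
```

Because only attributes new to NEWDEP are pushed onto the stack, each attribute's list is traversed at most once, which is the property the linear-time bound relies on. The order in which attributes are chosen from UPDATE does not affect the result.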

    Let us review the workings of the computation step of LINCLOSURE. The
while loop continues to execute while there are attributes in NEWDEP whose
lists have not been traversed. We choose one such attribute and traverse its
list. For each FD W → Z in the list, we reduce COUNT[W → Z]. If the count
goes to 0, it is time to consider W → Z. We compute the set of attributes in Z
that are not already in NEWDEP and add these attributes to both NEWDEP
and UPDATE. The while loop stops executing when there are no more FDs
whose counts can be reduced.

Theorem 4.4 LINCLOSURE has time complexity O(n) for input of length n.

Proof Computing COUNT[W → Z] in the initialization step takes time
proportional to |W| if W is represented as a list of attributes. Computing all the
initial values for COUNT therefore takes O(n) time. Each FD W → Z in F
gets inserted into |W| lists in LIST. For an appropriate list representation,
adding one FD to one list takes a constant amount of time. Thus, filling in
LIST takes O(n) time. NEWDEP and UPDATE can also be initialized in
O(n) time.
    In the computation step, each attribute is added to UPDATE once, at most.
For each attribute A added to UPDATE, one pass of the while loop is
performed. For each pass of the while loop, an operation (decrement COUNT)
is performed for each FD in LIST[A]. Since any FD W → Z appears in
|W| lists, the decrement operation is performed at most

                                Σ_{W → Z in F} |W|

times. Thus O(n) time is spent decrementing COUNT.
   For any FD W → Z in F, the predicate COUNT[W → Z] = 0 evaluates to
true at most once, since once COUNT[W → Z] reaches 0, all the attributes in
W have been added to NEWDEP and removed from UPDATE. Thus, W →
Z does not appear in any attribute list remaining to be traversed. The
computation involving ADD takes time proportional to |Z| if NEWDEP is
represented as a bit vector. The total time spent with the statements involving
ADD is proportional to

                                Σ_{W → Z in F} |Z|,

which is O(n). Since no step of the algorithm takes more than O(n) time,
LINCLOSURE has time complexity O(n).

Corollary Membership in F+ can be tested in O(n) time for inputs of length n.

Proof Substitute LINCLOSURE for CLOSURE in Algorithm 4.3.
Henceforth we shall assume MEMBER uses LINCLOSURE.

4.7    EXERCISES

4.1    Consider the relation r below.
                            r(A     B      C      D      E)
                              a1    b1     c1     d1     e1
                              a1    b2     c2     4      di
                              a2    b1     c3     4      e1
                              a2    b1     c4     d3     e1
                              a3    b2     c5     4      e1
       Which of the following FDs does r satisfy?
       A → D, A B → D, C → B D E, E--, A → E
4.2    Prove that r satisfies X → Y if and only if X is a key of π_XY(r).
4.3    Let r be a relation on R, with X a subset of R. Show that if π_X(r) has the
       same number of tuples as r, then r satisfies X → Y for any subset Y
       of R.
4.4    Prove or disprove the following inference rules for a relation r(R) with
       W, X, Y, and Z subsets of R.
       (a) X → Y and Z → W imply X Z → Y W.
       (b) X Y → Z and Z → X imply Z → Y.
       (c) X → Y and Y → Z imply X → Y Z.
       (d) X → Y, W → Z, and Y ⊇ W imply X → Z.
4.5    Prove that inference axioms F1, F2, and F6 are independent. That is,
       no one of them can be proved from the other two.
4.6    Show that for any set of FDs F, F+ = (F+)+.
4.7    Suppose F is a set of FDs over scheme R. If F = ∅, what does F+ look
       like?
4.8    For a set of FDs F, show that there is no relation satisfying all the FDs in
       F- and no others.
4.9*   Show inference axioms F1, F3, F4, and F5 are complete. Is this set of
       axioms independent?
4.10   Show that if there is a derivation sequence for X → Y using inference
       axioms F1 to F6, then there is a derivation sequence for X → Y using
       only Armstrong’s axioms.
4.11   Prove the B-axioms are correct.
4.12   Find a set of two inference rules that is complete. The rules need not be
       a subset of axioms F1 to F6.
4.13   Let F = {A B → C, B → D, C D → E, C E → G H, G → A}.
       (a) Give a derivation sequence on F for A B → E.
       (b) Give a derivation sequence on F for B G → C using only Armstrong’s
            axioms.
       (c) Give a RAP-derivation sequence on F for A B → G.

4.14*  Complete the proof of Theorem 4.2.
4.15   Let F be as in Exercise 4.13. Construct an F-based DDAG for A B → G.
4.16   Prove that for sets F and G of FDs, if F contains G, then F+ contains
       G+.
4.17   Prove that an invalid inference rule can always be disproved with a
       two-tuple relation.
4.18   Modify rule R2 in the definition of DDAG to reflect the change from
       axiom B2 to B2'.
4.19   Let F and G be sets of FDs. Suppose for every FD Z → W in F there is a
       G-based DDAG for Z → W. Prove that if X → Y has an F-based
       DDAG, it has a G-based DDAG.
4.20   Let F be a set of FDs over R. Find a bound on the size of F+ in FDs, in
       terms of the number of attributes in R.
4.21   Let F be a set of FDs where a is the number of distinct attributes in F, p
       is the number of FDs in F, and n is the number of symbols required to
       write F. Compare ap^2 and n.
4.22   The algorithm MEMBER (Algorithm 4.3) computes more information
       than is necessary to ascertain if F ⊨ X → Y. Once Y is found to be in
       X+, the rest of X+ is immaterial. Modify MEMBER and LINCLOSURE
       to remove this unnecessary computation.


4.8   BIBLIOGRAPHY AND COMMENTS

FDs were present when Codd [1970] first introduced the relational model, in
the form of keys. Codd [1972a] later introduced FDs that do not follow from
keys, for the purpose of normalization (see Chapter 6). Delobel and Casey
[1973] gave a set of inference axioms, which Armstrong [1974] showed were
complete and correct. He also gave a method for constructing an Armstrong
relation for a set of FDs. Beeri, Dowd, et al. [1980] explore the structure of
Armstrong relations.
   The LINCLOSURE algorithm is from Beeri and Bernstein [1979]. They
used derivation trees in their proofs. Derivation trees were the precursor of
DDAGs, introduced by Maier [1980b]. An exposition of the RAM model of
computation is given by Aho, Hopcroft, and Ullman [1974].
   Much work on the implication of FDs has focused on the discovery of keys
and the structure of sets of keys; see the papers by Békéssy and Demetrovics
[1979]; Békéssy, Demetrovics, et al. [1980]; Demetrovics [1978, 1979]; Forsyth
and Fadous [1975]; and Lucchesi and Osborn [1978].

								