CPSC 504_ Data Management Review of Relational Model

Document Sample
CPSC 504_ Data Management Review of Relational Model Powered By Docstoc
					 CPSC 504: Data Management
Review of Relational Model 1/2
      Laks V.S. Lakshmanan
           Dept. of CS
           The Basic Model
• A DB = a (finite) set of (finite) relations.
• A relation has a well-defined schema – its set
  of attributes, each with its domain: e.g.,
  dom(songName), dom(Artist), dom(AlbumNo),
  dom(fileFormat), …
• A relation – a subset of the Cartesian (aka
  cross) product of its attribute domains:
• e.g., myfavs Í dom(AlbumNo) x
  dom(songName) x dom(Artist) x
     How do you get there?
• In practice, you often use the ER (for
  entity-relationship) model as an informal
  spec of a design of a relational database
• Schema = structure or table template.
• Relation = table instance.
            Example ER schema

       A schema in the ER model is specified using an ER diagram.

          Actor                                      Movie
                               Stars in

Name       DOB         Addr
                                            Title        Year       Dir

          Some constraints
• Each relation has a key: candidate keys,
  primary keys, superkeys, …?
• Functional dependency (FD): formalization and
  generalization of key.
• r(R), X, Y Í R. r |= XàY iff any pair of tuples
  in r that agree on X, necessarily agree on Y.
• e.g.: AlbumNoàArtist,
AlbumNo, songNameàAlbumNo, …, fileFormat.
• {AlbumNo, songName} – a key, but not
• Note: ER diagrams can express a rich class of
  (but not all) constraints.
                More on FDs
• FDs help determine keys: there are efficient
  algorithms as well as an assortment of
  (computationally) hard problems.
• FDs are pointers to redundancies in data: e.g., same
  Artist (value) will appear many times, once for each
  song in an album of that artist.
• Why is redundancy (in representation) bad?
• $efficient algorithms for redundancy removal – the so
  -called normal forms.
• There is a full assortment of them – 1NF, 2NF, 3NF,
  BCNF, 4NF, …
• FDs/keys help with indexing, with query optimization,
  with data exchange, etc.
     More on Normal Forms
• 1NF really says tuples must be “flat”.
• Other NFs really seek to capture different
  degrees of redundancy (and eliminate them).
• 3NF, BCNF – some of the most important ones
  in practice.
• Normalization algorithms: start from U, the
  set of all attributes, and figure out which
  attr sets should stay together, which ones
  should split up.
        Some more on FDs
• Constraints such as FDs are called
  integrity constraints. Why?
• ICs are statements in predicate logic.
• What does an FD really say?
• Need to reason with ICs, so:
  – Can determine keys,
  – Can check if certain ICs are enforced.
  – Can understand interaction of ICs.
           Last bits on FDs
• FDs admit a sound and complete axiom
  – Reflexivity: if X Í Y, then YàX.
  – Augmentation: if WàZ and X Í Y, then WYàZX.
  – Transitivity: if XàY and YàZ, then XàZ.
• Not all classes of ICs admit a S&C axiom
• Some consequences of the above axiom
  system for FDs.
       ICs, Tableaux, and Queries
    • An FD can also be stated in a “pictorial”
Schema/column   A    B              C        D

                x1   x2             _        _
                x1   x3             _        _

                          x2 = x3

                              Summary row/
  ICs, Tableuax, and Queries
• Tableaux are general; can express what
  conditions must hold when certain
  patterns are present in data.
• Can express equality generating
  dependencies (egd’s) and tuple
  generating dependencies (tgd’s).
• Can also express queries (by using
  summary row for saying what you want
  in the output).
  ICs, Tableaux, and Queries
• Tableaux can also be expressed in the
  form of rules, also known as “Horn
• Examples:
X2 = X3 ß r(X1,X2,_,_), r(X1,X3,_,_).
  Which really is saying …
X2 = X3 ß r(X1,X2,Y3,Y4),
Logically speaking, what are these rules
p(X,Y) ß a(X,Z), b(Z,Y), a(Y,W), W>100.
    Some more on NFs – 3NF
• Def. of 3NF: r(R), given set of
  applicable FDs F, is in 3NF iff for every
  XàA that holds for r, X is a superkey
  for r according to F OR A belongs to
  some key.
• e.g.: r(AlbumNo, Artist, songName,
  fileFormat) is not in 3NF, while
  r1(AlbumNo, Artist), r2(AlbumNo,
  songName, fileFormat) are. Why?
   Some more on NFs – BCNF
• Def. of BCNF: r(R), given applicable FDs F,
  is in BCNF iff for every XàA that holds for
  r, X is a superkey for r according to F.
• Clearly stronger than 3NF.
• e.g.: {S(treet), (s)T(ate), Z(ip)}, with STàZ,
  ZàT. – it’s in 3NF.
• Not in BCNF. What’s the redundancy here?
• {ZT, ZS} is in BCNF.
• Can you anticipate any problems with the
  design {ZT, ZS}?
• Majority of 3NF designs in practice tend to
  be in BCNF.
            More on ICs
• Inclusion depencency (IND): r1(Emp,
  Dept, Sal), r2(Mgr, Dept, Budget).
  r2[Mgr] Í r1[Emp]: every manager is an
• INDs are a generalization of referential
  integrity constraints.
• FDs, INDs. – most fundamental.
• What can we say about reasoning with
  INDs, or with INDs + FDs?
             More on INDs
• Good news -- $a complete axiomatization.
• Reflexivity: r[X]Ír[X].
• Projection+permutation: if r[A1,…,Am]
  Ís[B1…Bm], then r[Ai1…Aik] Ís[Bi1…Bik].
• Transitivity: if r[X] Ís[Y] and        s[Y]
  Ít[Z], then r[X] Ít[Z].
• But, while testing whether F |= f can be done
  in linear time, testing I |= i is PSPACE-
       Story of FDs+INDs
• Recall: FDs admit complete
  axiomatization, as do INDs.
• But, put together, there is no complete
  (k-ary) axiomatization for INDs+FDs!
• We have seen some (very) positive
  results and some negative results.
• What approaches can you think of for
  tackling the negative situations?
            Last bits on ICs
• There is a rich landscape of lot more classes
  of ICs.
• Fundamentally split into two syntactic classes:
  – Equality generating dependencies (egds), e.g., FDs.
  – Tuple generating dependencies (tgds), e.g., INDs.
• The value is NOT that these fancier ICs arise
  as is in practice.
• Rather, they arise in solving various problems:
  e.g., integrating data from diverse sources.
• E.g. tgd: (\forall M,D,B) [r2(M, D, B) à
  (\exists S) r1(M, D, S)].
• What is it saying, logically speaking?
           More Reading
• J.D. Ullman: Principles of Database and
  Knowledge-Base Systems, vol. I and II.
  CS Press, 1988.
• R. Ramakrishnan & J. Gehrke. Database
  Management Sytems. Mc-Graw Hill,
  latest edition.
• S. Abiteboul, R. Hull, and V. Vianu.
  Foundations of Databases. 1995.
  Available online from DBLP.
    Some Resources for Research
•   DBLP
•   Citeseer
•   ACM digital library
•   If you aren’t already a member,
    consider becoming ACM student
    member: really a great deal!
    Some top conferences/journals
         for DMM Research
•   CIKM
    JCSS, I&C, …

Shared By: