Document Sample
3-alg-calc Powered By Docstoc
					       Relational Algebra & Calculus

                                                                                         Zachary G. Ives
                                                                                    University of Pennsylvania
                                                                     CIS 550 – Database & Information Systems

                                                                                      September 15, 2009
Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan
Codd’s Relational Algebra
 A set of mathematical operators that compose,
  modify, and combine tuples within different relations

 Relational algebra operations operate on relations
  and produce relations (“closure”)
   f: Relation  Relation   f: Relation x Relation  Relation

Codd’s Logical Operations: The
Relational Algebra
 Six basic operations:
      Projection      (R)
      Selection       (R)
      Union          R1 [ R2
      Difference     R1 – R2
      Product        R1 £ R2
      (Rename)       b (R)
 And some other useful ones:
      Join           R1 ⋈  R2
      Semijoin       R1 ⋉  R2
      Intersection   R1 Å R2
      Division       R1 ¥ R2

 Data Instance for Operator Examples
STUDENT       Takes                               COURSE
sid   name    sid   exp-grade      cid               cid           subj       sem
  1   Jill    1         A       550-0109          550-0109         DB         F09
  2   Qun     1         A       520-1009          520-1009         AI         S09
  3   Nitin   3         A       520-1009          501-0109         Arch       F09
  4   Marty   3         C       501-0109
              4         C       501-0109

                                  PROFESSOR                  Teaches
                                   fid     name              fid        cid
                                   1       Ives              1          550-0109
                                   2       Taskar            2          520-1009
                                   8       Martin            8          501-0109
Last Time…
 We discussed:
   Projection, (R), specified a set of attributes to include
    in a new relation

Selection, 

Product X

Join, ⋈: A Combination of Product
and Selection

Union 

Difference –

Rename, b
 The rename operator can be expressed several
   The book has a very odd definition that’s not algebraic
   An alternate definition:
     b(x)           Takes the relation with schema 
                       Returns a relation with the attribute list b

   Rename isn’t all that useful, except if you join a relation
    with itself
     Why would it be useful here?

 This completes the basic operations of the relational
  algebra. We shall soon find out in what sense this is
  an adequate set of operations. Try writing queries
  for these:
      The names of students named “Bob”
      The names of students expecting an “A”
      The names of students in Milo Martin’s 501 class
      The sids and names of students not enrolled

Deriving Intersection
Intersection: as with set operations, derivable from

                   AÅ B
                      ≡ (A [ B) – (A – B) – (B – A)
    A-B     B-A       ≡ A – (A – B)

     A      B

The Big Picture: SQL to Algebra to
Query Plan to Web Page
                                                           Web Server /
                                                             UI / etc
      Query Plan – an
      operator tree

                    STUDENT               Merge             Execution

                           Takes          COURSE
          Optimizer           by cid              by cid
   WHERE STUDENT.sid = Takes.sID
   AND Takes.cID = cid
Hint of Future Things: Optimization
Is Based on Algebraic Equivalences
 Relational algebra has laws of commutativity, associativity,
  etc. that imply certain expressions are equivalent in semantics
 They may be different in cost of evaluation!

              c Ç d(R) ´ c(R) [ d(R)

             c (R1 £ R2) ´ R1 ⋈c R2

              c Ç d (R) ´ c (d (R))

 Query optimization finds the most efficient representation to
  evaluate (or one that’s not bad)

Switching Gears: An Equivalent, But
Very Different, Formalism
 Codd invented a relational calculus that he proved
  was equivalent in expressiveness
    Based on a subset of first-order logic – declarative,
     without an implicit order of evaluation
       Tuple relational calculus
       Domain relational calculus
    More convenient for describing certain things, and for
     certain kinds of manipulations
 The database uses the relational algebra internally
 But query languages (e.g., SQL) are mostly based on
  the relational calculus

Domain Relational Calculus
 Queries have form:
     domain variables
    {<x1,x2, …, xn>| p}
 Predicate: boolean expression over x1,x2, …, xn
     Precise operations depend on the domain and query
      language – may include special functions, etc.
     Assume the following at minimum:
       <xi,xj,…>  R   X op Y    X op const   const op X

       where op is , , , , , 
              xi,xj,… are domain variables
More Complex Predicates
 Starting with these atomic predicates, build up new
 predicates by the following rules:
  Logical connectives: If p and q are predicates, then so are
   p  q, p  q, p, and p  q
         (x>2)  (x<4)
         (x>2)  (x>0)
  Existential quantification: If p is a predicate, then so is
         x. (x>2) (x<4)
  Universal quantification: If p is a predicate, then so is x.p
         x.x>2
         x. y.y>x
Some Examples
 Faculty ids
 Subjects for courses with students expecting a “C”
 All course numbers for which there exists a smaller
  course number

Logical Equivalences
 There are two logical equivalences that will be
  heavily used:
    p  q  p  q
     (Whenever p is true, q must also be true.)
    x. p(x)  x. p(x)
     (p is true for all x)
 The second can be a lot easier to check!

 Example:
    The highest course number offered

Free and Bound Variables
 A variable v is bound in a predicate p when p is of
  the form v… or v…
 A variable occurs free in p if it occurs in a position
  where it is not bound by an enclosing  or 
 Examples:
    x is free in x > 2
    x is bound in x. x > y

Can Rename Bound Variables Only
  When a variable is bound one can replace it with
   some other variable without altering the
   meaning of the expression, providing there are
   no name clashes
  Example: x. x > 2 is equivalent to y. y > 2

  Otherwise, the variable is defined outside our

  Pitfall in what we have done so far – how do we interpret:
   {<sid,name>| <sid,name>  STUDENT}

     Set of all binary tuples that are not students: an infinite set (and
      unsafe query)

  A query is safe if no matter how we instantiate the
   relations, it always produces a finite answer
     Domain independent: answer is the same regardless of the
      domain in which it is evaluated
     Unfortunately, both this definition of safety and domain
      independence are semantic conditions, and are undecidable

Safety and Termination Guarantees

 There are syntactic conditions that are used to
  guarantee “safe” formulas
    The definition is complicated, and we won’t discuss it; you
     can find it in Ullman’s Principles of Database and Knowledge-
     Base Systems
    The formulas that are expressible in real query languages
     based on relational calculus are all “safe”
 Many DB languages include additional features, like
  recursion, that must be restricted in certain ways to
  guarantee termination and consistent answers

How do you write:
   Which students have taken more than one course from
    the same professor?

Translating from RA to DRC
 Core of relational algebra: , , , x, -
 We need to work our way through the structure of
  an RA expression, translating each possible form.
   Let TR[e] be the translation of RA expression e into

 Relation names: For the RA expression R, the DRC
  expression is
    {<x1,x2, …, xn>| <x1,x2, …, xn>  R}

Selection: TR[ R]
 Suppose we have (e’), where e’ is another RA
  expression that translates as:
  TR[e’]= {<x1,x2, …, xn>| p}
 Then the translation of c(e’) is
  {<x1,x2, …, xn>| p’}
  where ’ is obtained from  by replacing each
  attribute with the corresponding variable
 Example: TR[#1=#2 #4>2.5R] (if R has arity 4) is
  {<x1,x2, x3, x4>|
       < x1,x2, x3, x4>  R  x1=x2  x4>2.5}

Projection: TR[i1,…,im(e)]
 If TR[e]= {<x1,x2, …, xn>| p} then
      {<x i1,x i2, …, x im >|  xj1,xj2, …, xjk.p},
  where xj1,xj2, …, xjk are variables in x1,x2, …, xn
  that are not in x i1,x i2, …, x im

 Example: With R as before,
  #1,#3 (R)={<x1,x3>| x2,x4. <x1,x2, x3,x4> R}

Union: TR[R1  R2]
 R1 and R2 must have the same arity
 For e1  e2, where e1, e2 are algebra expressions
   TR[e1]={<x1,…,xn>|p} and TR[e2]={<y1,…yn>|q}
 Relabel the variables in the second:
   TR[e2]={< x1,…,xn>|q’}
 This may involve relabeling bound variables in q to
  avoid clashes
 Example: TR[R1  R2] = {< x1,x2, x3,x4>|
     <x1,x2, x3,x4>R1  <x1,x2, x3,x4>R2

Other Binary Operators
  Difference: The same conditions hold as for union
    If TR[e1]={<x1,…,xn>|p} and TR[e2]={< x1,…,xn>|q}
    Then TR[e1- e2]= {<x1,…,xn>|pq}
  Product:
    If TR[e1]={<x1,…,xn>|p} and TR[e2]={< y1,…,ym>|q}
    Then TR[e1 e2]= {<x1,…,xn, y1,…,ym >| pq}

  Example:
   TR[RS]= {<x1,…,xn, y1,…,ym >|
       <x1,…,xn> R  <y1,…,ym > S }

What about the Tuple Relational

 We’ve been looking at the Domain Relational
 The Tuple Relational Calculus is nearly the same, but
  variables are at the level of a tuple, not an attribute

    {Q | 9 S  COURSES, 9 T 2 Takes (S.cid = T.cid Æ Q.cid
     = S.cid Æ Q.exp-grade = T.exp-grade)}

Limitations of the
Relational Algebra / Calculus
Can’t do:
    Aggregate operations
    Recursive queries
    Complex (non-tabular) structures

 Most of these are expressible in SQL, OQL, XQuery
  – using other special operators
 Sometimes we even need the power of a Turing-
  complete programming language

 Can translate relational algebra into relational calculus
    DRC and TRC are slightly different syntaxes but equivalent
 Given syntactic restrictions that guarantee safety of DRC
  query, can translate back to relational algebra

 These are the principles behind initial development of
  relational databases
    SQL is close to calculus; query plan is close to algebra
    Great example of theory leading to practice!