# 3-alg-calc

Document Sample

```					       Relational Algebra & Calculus

Zachary G. Ives
University of Pennsylvania
CIS 550 – Database & Information Systems

September 15, 2009
Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan
Codd’s Relational Algebra
 A set of mathematical operators that compose,
modify, and combine tuples within different relations

 Relational algebra operations operate on relations
and produce relations (“closure”)
f: Relation  Relation   f: Relation x Relation  Relation

2
Codd’s Logical Operations: The
Relational Algebra
 Six basic operations:
   Projection      (R)
   Selection       (R)
   Union          R1 [ R2
   Difference     R1 – R2
   Product        R1 £ R2
   (Rename)       b (R)
 And some other useful ones:
   Join           R1 ⋈  R2
   Semijoin       R1 ⋉  R2
   Intersection   R1 Å R2
   Division       R1 ¥ R2

3
Data Instance for Operator Examples
STUDENT       Takes                               COURSE
sid   name    sid   exp-grade      cid               cid           subj       sem
1   Jill    1         A       550-0109          550-0109         DB         F09
2   Qun     1         A       520-1009          520-1009         AI         S09
3   Nitin   3         A       520-1009          501-0109         Arch       F09
4   Marty   3         C       501-0109
4         C       501-0109

PROFESSOR                  Teaches
fid     name              fid        cid
1       Ives              1          550-0109
8       Martin            8          501-0109
4
Last Time…
 We discussed:
 Projection, (R), specified a set of attributes to include
in a new relation

5
Selection, 

6
Product X

7
Join, ⋈: A Combination of Product
and Selection

8
Union 

9
Difference –

10
Rename, b
 The rename operator can be expressed several
ways:
 The book has a very odd definition that’s not algebraic
 An alternate definition:
b(x)           Takes the relation with schema 
Returns a relation with the attribute list b

 Rename isn’t all that useful, except if you join a relation
with itself
Why would it be useful here?

11
Mini-Quiz
 This completes the basic operations of the relational
algebra. We shall soon find out in what sense this is
an adequate set of operations. Try writing queries
for these:
   The names of students named “Bob”
   The names of students expecting an “A”
   The names of students in Milo Martin’s 501 class
   The sids and names of students not enrolled

12
Deriving Intersection
Intersection: as with set operations, derivable from
difference

AÅ B
≡ (A [ B) – (A – B) – (B – A)
A-B     B-A       ≡ A – (A – B)

A      B

13
The Big Picture: SQL to Algebra to
Query Plan to Web Page
Web Server /
UI / etc
Query Plan – an
Hash
operator tree

STUDENT               Merge             Execution
Engine

Takes          COURSE
Optimizer           by cid              by cid
Storage
Subsystem
SELECT *
FROM STUDENT, Takes, COURSE
WHERE STUDENT.sid = Takes.sID
AND Takes.cID = cid
14
Hint of Future Things: Optimization
Is Based on Algebraic Equivalences
 Relational algebra has laws of commutativity, associativity,
etc. that imply certain expressions are equivalent in semantics
 They may be different in cost of evaluation!

c Ç d(R) ´ c(R) [ d(R)

c (R1 £ R2) ´ R1 ⋈c R2

c Ç d (R) ´ c (d (R))

 Query optimization finds the most efficient representation to
evaluate (or one that’s not bad)

15
Switching Gears: An Equivalent, But
Very Different, Formalism
 Codd invented a relational calculus that he proved
was equivalent in expressiveness
 Based on a subset of first-order logic – declarative,
without an implicit order of evaluation
 Tuple relational calculus
 Domain relational calculus
 More convenient for describing certain things, and for
certain kinds of manipulations
 The database uses the relational algebra internally
 But query languages (e.g., SQL) are mostly based on
the relational calculus

16
Domain Relational Calculus
Queries have form:
domain variables
{<x1,x2, …, xn>| p}
predicate
Predicate: boolean expression over x1,x2, …, xn
 Precise operations depend on the domain and query
language – may include special functions, etc.
 Assume the following at minimum:
<xi,xj,…>  R   X op Y    X op const   const op X

where op is , , , , , 
xi,xj,… are domain variables
17
More Complex Predicates
Starting with these atomic predicates, build up new
predicates by the following rules:
 Logical connectives: If p and q are predicates, then so are
p  q, p  q, p, and p  q
 (x>2)  (x<4)
 (x>2)  (x>0)
 Existential quantification: If p is a predicate, then so is
x.p
 x. (x>2) (x<4)
 Universal quantification: If p is a predicate, then so is x.p
 x.x>2
 x. y.y>x
18
Some Examples
 Faculty ids
 Subjects for courses with students expecting a “C”
 All course numbers for which there exists a smaller
course number

19
Logical Equivalences
 There are two logical equivalences that will be
heavily used:
 p  q  p  q
(Whenever p is true, q must also be true.)
 x. p(x)  x. p(x)
(p is true for all x)
 The second can be a lot easier to check!

 Example:
 The highest course number offered

20
Free and Bound Variables
 A variable v is bound in a predicate p when p is of
the form v… or v…
 A variable occurs free in p if it occurs in a position
where it is not bound by an enclosing  or 
 Examples:
 x is free in x > 2
 x is bound in x. x > y

21
Can Rename Bound Variables Only
 When a variable is bound one can replace it with
some other variable without altering the
meaning of the expression, providing there are
no name clashes
 Example: x. x > 2 is equivalent to y. y > 2

 Otherwise, the variable is defined outside our
“scope”…

22
Safety
 Pitfall in what we have done so far – how do we interpret:
{<sid,name>| <sid,name>  STUDENT}

 Set of all binary tuples that are not students: an infinite set (and
unsafe query)

 A query is safe if no matter how we instantiate the
relations, it always produces a finite answer
 Domain independent: answer is the same regardless of the
domain in which it is evaluated
 Unfortunately, both this definition of safety and domain
independence are semantic conditions, and are undecidable

23
Safety and Termination Guarantees

 There are syntactic conditions that are used to
guarantee “safe” formulas
 The definition is complicated, and we won’t discuss it; you
can find it in Ullman’s Principles of Database and Knowledge-
Base Systems
 The formulas that are expressible in real query languages
based on relational calculus are all “safe”
 Many DB languages include additional features, like
recursion, that must be restricted in certain ways to

24
Mini-Quiz
How do you write:
 Which students have taken more than one course from
the same professor?

25
Translating from RA to DRC
 Core of relational algebra: , , , x, -
 We need to work our way through the structure of
an RA expression, translating each possible form.
 Let TR[e] be the translation of RA expression e into
DRC.

 Relation names: For the RA expression R, the DRC
expression is
{<x1,x2, …, xn>| <x1,x2, …, xn>  R}

26
Selection: TR[ R]
 Suppose we have (e’), where e’ is another RA
expression that translates as:
TR[e’]= {<x1,x2, …, xn>| p}
 Then the translation of c(e’) is
{<x1,x2, …, xn>| p’}
where ’ is obtained from  by replacing each
attribute with the corresponding variable
 Example: TR[#1=#2 #4>2.5R] (if R has arity 4) is
{<x1,x2, x3, x4>|
< x1,x2, x3, x4>  R  x1=x2  x4>2.5}

27
Projection: TR[i1,…,im(e)]
 If TR[e]= {<x1,x2, …, xn>| p} then
TR[i1,i2,…,im(e)]=
{<x i1,x i2, …, x im >|  xj1,xj2, …, xjk.p},
where xj1,xj2, …, xjk are variables in x1,x2, …, xn
that are not in x i1,x i2, …, x im

 Example: With R as before,
#1,#3 (R)={<x1,x3>| x2,x4. <x1,x2, x3,x4> R}

28
Union: TR[R1  R2]
 R1 and R2 must have the same arity
 For e1  e2, where e1, e2 are algebra expressions
TR[e1]={<x1,…,xn>|p} and TR[e2]={<y1,…yn>|q}
 Relabel the variables in the second:
TR[e2]={< x1,…,xn>|q’}
 This may involve relabeling bound variables in q to
avoid clashes
TR[e1e2]={<x1,…,xn>|pq’}.
 Example: TR[R1  R2] = {< x1,x2, x3,x4>|
<x1,x2, x3,x4>R1  <x1,x2, x3,x4>R2

29
Other Binary Operators
 Difference: The same conditions hold as for union
If TR[e1]={<x1,…,xn>|p} and TR[e2]={< x1,…,xn>|q}
Then TR[e1- e2]= {<x1,…,xn>|pq}
 Product:
If TR[e1]={<x1,…,xn>|p} and TR[e2]={< y1,…,ym>|q}
Then TR[e1 e2]= {<x1,…,xn, y1,…,ym >| pq}

 Example:
TR[RS]= {<x1,…,xn, y1,…,ym >|
<x1,…,xn> R  <y1,…,ym > S }

30
Calculus?

 We’ve been looking at the Domain Relational
Calculus
 The Tuple Relational Calculus is nearly the same, but
variables are at the level of a tuple, not an attribute

 {Q | 9 S  COURSES, 9 T 2 Takes (S.cid = T.cid Æ Q.cid

31
Limitations of the
Relational Algebra / Calculus
Can’t do:
 Aggregate operations
 Recursive queries
 Complex (non-tabular) structures

 Most of these are expressible in SQL, OQL, XQuery
– using other special operators
 Sometimes we even need the power of a Turing-
complete programming language

32
Summary
 Can translate relational algebra into relational calculus
 DRC and TRC are slightly different syntaxes but equivalent
 Given syntactic restrictions that guarantee safety of DRC
query, can translate back to relational algebra

 These are the principles behind initial development of
relational databases
 SQL is close to calculus; query plan is close to algebra
 Great example of theory leading to practice!

33

```
DOCUMENT INFO
Shared By:
Categories:
Stats:
 views: 9 posted: 12/23/2010 language: English pages: 33