Outline
Conceptual Design: ER model Logical Design: ER to relational model Querying and manipulating data
Practical language: SQL
• Declarative: say “what you want”, not “how to get it”
Relational Algebra
Yanlei Diao UMass Amherst 9/22/09
Formal language: Relational Algebra
• Mathematical foundation for query processing
Slides Courtesy of R. Ramakrishnan and J. Gehrke
1
2
What is an “Algebra”?
What is “Relational Algebra”
A mathematical system consisting of:
Operands: variables or values from which new values can be constructed. Operators: symbols denoting procedures that construct new values from given values.
Relational Algebra:
Operands are relations. Each operator takes 1 or 2 relations and produces a relation.
Closure property: relational algebra is closed under the relational model.
Relational operators can be arbitrarily composed!
3
4
Relational Algebra
Example Instances
Sailors
Basic operations: Selection ( σ ) Selects a subset of rows from a relation.
Projection ( π ) Deletes unwanted columns from a relation. Cross-product ( × ) Allows us to combine two relations. Set-difference ( − ) Tuples in reln. 1, but not in reln. 2. Union ( ∪ ) Tuples in reln. 1 and in reln. 2.
S1 sid 22 31 58 S2 sid
sname rating age dustin 7 45.0 lubber 8 55.5 rusty 10 35.0
sname rating age yuppy 9 35.0 lubber 8 55.5 guppy 5 35.0 rusty 10 35.0
Reserves
R1 sid
Additional operations:
22 58
bid day 101 10/10/96 103 11/12/96
Join ( ), Intersection ( ∩ ), Division ( / ), Renaming ( ρ ) Can be derived from basic operators. Not essential, but useful for writing simpler queries.
5
28 31 44 58
6
Projection
Projection (contd.)
S2
Retain only attributes in the projection list; delete others. Schema of result contains exactly the fields in projection list.
S2 sid
28 31 44 58
sname rating age yuppy 9 35.0 lubber 8 55.5 guppy 5 35.0 rusty 10 35.0
sname yuppy lubber guppy rusty
rating 9 8 5 10
sid 28 31 44 58
sname rating age yuppy 9 35.0 lubber 8 55.5 guppy 5 35.0 rusty 10 35.0
age 35.0 55.5
! age(S2)
Projection operator has to eliminate duplicates!
! sname,rating(S2)
7
Real systems typically don’t do duplicate elimination unless the user explicitly asks for it. (Why not?)
8
Selection
S2
Selection (contd.)
sid 28 31 44 58 sname rating age yuppy 9 35.0 lubber 8 55.5 guppy 5 35.0 rusty 10 35.0
S2
Select rows that satisfy the selection condition; discard others. Schema of result identical to schema of input.
sid 28 31 44 58
sname rating age yuppy 9 35.0 lubber 8 55.5 guppy 5 35.0 rusty 10 35.0
sid sname rating age 28 yuppy 9 35.0 58 rusty 10 35.0
"
(rating > 5 # rating < 5) $ age < 50
(S2)
! rating >8(S2)
9 10
!
A Simple Example of Composition
Union, Intersection, Set-Difference
Composition: result relation of an operator can be the input to another operator.
Set operations: Union ( ∪ ) Intersection ( ∩ )
Set difference ( − )
sid 28 58
sname yuppy rusty
rating 9 10
age 35.0 35.0
! rating >8(S2)
sname rating yuppy 9 rusty 10
Take two input relations, which must be union-compatible:
S1 sid 22 31 58 S2 sid 28 31 44 58
sname rating age dustin 7 45.0 lubber 8 55.5 rusty 10 35.0
Same number of fields. Corresponding fields have the the same type.
What is the schema of result?
! sname,rating(" rating >8(S2))
11
sname rating age yuppy 9 35.0 lubber 8 55.5 guppy 5 35.0 rusty 10 35.0
12
Example Set Operations
S1 sid 22 31 58 S2 sid 28 31 44 58
Example Set Operations
7 8 10 5 9 45.0 55.5 35.0 35.0 35.0
S1 sid 22 31 58 S2 sid 28 31 44 58
13
sname rating age dustin 7 45.0 lubber 8 55.5 rusty 10 35.0
sid sname rating age 22 31 58 44 28 dustin lubber rusty guppy yuppy
sname rating age dustin 7 45.0 lubber 8 55.5 rusty 10 35.0
sid sname 22 dustin
rating age 7 45.0
S1! S2
sid sname rating age 31 lubber 8 55.5 58 rusty 10 35.0
sname rating age yuppy 9 35.0 lubber 8 55.5 guppy 5 35.0 rusty 10 35.0
S1! S2
Duplicate elimination: remove tuples that have same values in all attributes.
sname rating age yuppy 9 35.0 lubber 8 55.5 guppy 5 35.0 rusty 10 35.0
S1! S2
Duplicates?
14
Cross (Cartesian) Product
Cross-Product (contd.)
(sid) sname rating age 22 22 31 31 58 58
S1 × R1: Each row of S1 is paired with each row of R1.
22 31 58 dustin lubber rusty 7 8 10 45.0 55.5 35.0
7 7 8 8 10 10
(sid) bid day 22 58 22 58 22 58 101 10/10/96 103 11/12/96 101 10/10/96 103 11/12/96 101 10/10/96 103 11/12/96
dustin dustin lubber lubber rusty rusty
7 7 8 8 10 10
45.0 45.0 55.5 55.5 35.0 35.0
S1 sid sname rating age
R1
sid 22 58
bid 101 103
day 10/10/96 11/12/96
S1 × R1
(sid) sname rating age 22 22 31 31 58 58 dustin dustin lubber lubber rusty rusty 45.0 45.0 55.5 55.5 35.0 35.0
(sid) bid day 22 58 22 58 22 58 101 10/10/96 103 11/12/96 101 10/10/96 103 11/12/96 101 10/10/96 103 11/12/96
15
Result schema inherits all fields of S1 and R1.
Conflict: Both S1 and R1 have a field called sid.
•
Renaming operator: ! (C(1" sid1, 5 " sid 2), S1# R1) What if we want to find pairs s.t. S1.sid = S2.sid?
16
Join
Join
R> S = # (R$S) < " "
(sid) bid 58 103 58 103 day 11/12/96 11/12/96
Condition (theta) Join:
Equi-Join: A special case of condition join where the condition θ contains only equalities.
(sid) 22 31
sname rating age ! dustin 7 45.0 lubber 8 55.5
S1 > <
S1. sid < R1. sid
R1
sid 22 58
sname dustin rusty
rating age 7 45.0 10 35.0
bid 101 103
day 10/10/96 11/12/96
S1> <
sid
R1
Result schema same as that of cross-product. But often fewer tuples, more efficient for computation.
17
Result schema contains only one copy of fields for which equality is specified.
Natural Join ( R S ): equi-join on all common fields.
18
Example Schema
Find names of sailors who’ve reserved boat #103
Sailors(sid: integer, sname: string, rating: integer, age: integer) Boats(bid: integer, color: string) Reserves(sid: integer, bid: integer, day: date)
Sailors(sid: integer, sname: string, rating: integer, age: integer) Boats(bid: integer, color: string) Reserves(sid: integer, bid: integer, day: date) How many relations do we need to access? If more than 1, use cross-products or joins? Filtering condition? Use selection Attributes to return? Use projection
19
20
Find names of sailors who’ve reserved boat #103
Find names of sailors who’ve reserved a red boat
Solution 1:
! sname(("
bid =103
Reserves) > Sailors) <
Re serves)
Solution 2:
! (Temp1, "
bid = 103
Sailors(sid: integer, sname: string, rating: integer, age: integer) Boats(bid: integer, color: string) Reserves(sid: integer, bid: integer, day: date)
! ( Temp2, Temp1 > Sailors) <
! sname (Temp2)
Boat color is only available in Boats; so need an extra join:
Solution 3:
! sname ("
bid =103
(Re serves > Sailors)) <
21
! sname ((" Boats) > Re serves > Sailors) < < color =' red '
22
Algebraic equivalence!
Find names of sailors who’ve reserved a red or a green boat
Find names of sailors who’ve reserved a red and a green boat
Can identify all red or green boats, then find sailors who’ve reserved one of these boats:
A single selection won’t work. Why? Instead, intersect sailors who’ve reserved red boats and sailors who’ve reserved green boats. sid is a key for Sailors
! (Tempboats, ("
color =' red ' # color =' green '
Boats))
! sname(Tempboats > Re serves > Sailors) < <
! (Tempred, "
sid
((#
color =' red '
Boats) > Re serves)) <
! (Tempgreen, "
Can also define Tempboats using union. How?
sid
((#
color =' green'
Boats) > Re serves)) <
! sname((Tempred " Tempgreen) > Sailors) <
Can we project onto name and then do the intersection?
23 24