Relational Algebra What is the (core) relational algebra?
How can we write queries using the relational algebra?
Floris Geerts How powerful is the relational algebra?
14 Oct, 2009
Relational query languages Formal relational query languages
Query languages allow the manipulation and retrieval of data Relational Algebra
from a database. Simple “operational” model, useful for expressing execution plans.
The relational model supports simple, powerful query
languages: Relational Calculus
strong formal foundation; and
Logical model (declarative), useful for theoretical results.
allows for much (provably correct) optimisation.
Both languages were introduced by Codd in a series of papers.
NOTE: Query languages are not (necessarily) programming They have equivalent expressive power.
languages. They are the key to understanding SQL query processing!
Preliminaries Relational algebra
A query is applied to relation instances, and the result of a Selection (σ): Selects a subset of rows from relation.
query is also a relation instance. Projection (π): Deletes unwanted columns from relation.
Cross-product (×): Allows us to combine two relations.
input query output
Set-diﬀerence (−): Allows us to subtract relations.
instance instance Union (∪): Allows us to union relations.
Renaming (ρ): Allows to rename relation and ﬁeld names.
For a given query, the schema of input relations are ﬁxed. Intersection, join, division,
Not essential, but (very!) useful (especially join).
The query will then execute over any valid instance.
The schema of the result can also be determined (and is ﬁxed Closure
for the given query). Since each operation returns a relation, operations can be
composed! (One says that the algebra is closed.)
Projection Projection – continued
Choose a set of ﬁeld names A and a table R
πA (R) extracts the columns in A from the table. Suppose the result of a projection has a repeated value, how
do we treat it?
Example, given Munros =
MId MName Lat Long Height Rating πRating (Munros) is Rating or Rating ?
1 The Saddle 57.167 5.384 1010 4 4 4
2 Ladhar Bheinn 57.067 5.750 1020 4 4 2.5
3 Schiehallion 56.667 4.098 1083 2.5 2.5 1.5
4 Ben Nevis 56.780 5.002 1343 1.5
πMId,Rating (Munros) is
In “pure” relational algebra the answer is always a set (recall
MId Rating that we deﬁned a relation instance as a set).
2 4 However, SQL and some other languages return a multiset for
3 2.5 some operations from which duplicates may be eliminated by
a further operation. (Why? Eliminating duplicates is
expensive in practice).
Provides the user with a view by hiding some attributes.
Selection Selection - continued
Chooses tuples that satisfy some condition. Conditions are built up from:
Selection σC (R) takes a table R and extracts those rows from Comparisons on attributes: R.A = R.A , R.A = R.A
it that satisfy the condition C . Comparisons on values. E.g., Height > 1000, MName =
For example, "Ben Nevis".
σHeight > 1050 (Munros) = Predicates constructed from these using ∨ (or), ∧ (and), ¬
E.g. (Lat > 57 ∧ Height > 1000) ∨ (Height=Lat) .
MId MName Lat Long Height Rating
3 Schiehallion 56.667 4.098 1083 2.5
4 Ben Nevis 56.780 5.002 1343 1.5
A selection provides the user with a view of data by hiding tuples
What can go into a condition C ? that do not satisfy the condition the user wants.
Combining selection and projection Combining selection and projection
Find all names and age of climbers of age > 30.
Relational algebra query Are Q1 and Q2 the same?
Q1 = πHName,Age (σAge>30 (Hikers)) They are semantically, as they produce the same result.
But they diﬀer in terms of eﬃciency:
Q1 scans Hikers, selects some tuples, and the only scans
An equivalent relational algebra query selected tuples.
Q2 scans Hikers, projects out two attributes and then scans
Q2 = σAge>30 (πHName,Age (Hikers)) the result again.
Q1 is likely to be more eﬃcient than Q2 .
Procedural languages can be optimized....
The same declarative query can be translated into more than one
Set operations – union Set operations – set diﬀerence
If two tables have the same structure, we can perform set
Same structure means union-compatible:
Same number of ﬁelds; and We can also take the diﬀerence of two union-compatible
Corresponding ﬁelds (taken from left to right) have the same
Example: Hikers − Climbers = HId HName Skill Age
123 Edmund EXP 80
Hikers = HId
Climbers = HId
25 313 Bridget EXP 33
898 Jane MED 39 212 James MED 27
212 James MED 27
Again, output schema is that of the ﬁrst relation.
Hikers ∪ Climbers = HId HName Skill Age
123 Edmund EXP 80
214 Arnold BEG 25
313 Bridget EXP 33
212 James MED 27
898 Jane MED 39
Output schema is that of the ﬁrst relation (Hikers in the
Set operations – intersection Cross product (Cartesian product)
The basic operation is the Cartesian product, R × S, which
It turns out we can implement intersection in terms of other concatenates every tuple in R with every tuple in S.
R ∩ S = R − (R − S)
A B C D
a1 b1 c1 d1
Although it is mathematically nice to have fewer operators, A B a1 b1 c2 d2
this may not be an eﬃcient way to implement intersection. a1 b1 × = a1 b1 c3 d3
a2 b2 a2 b2 c1 d1
Intersection is also a special case of a join, which we’ll shortly c3 d3
a2 b2 c2 d2
discuss. a2 b2 c3 d3
Cartesian product – continued Cartesian product – continued
What happens when we form a product of two tables with
columns with the same name?
If R1 has n tuples and R2 has m tuples then R1 × R2 has
Recall the schemas: Hikers(HId, HName, Skill, Age)
n × m tuples.
and Climbs(HId, MId, Date,Time). What is the schema
of Hikers × Climbs? This is an expensive operation: if R1 and R2 have both 1 000
Various possibilities including: tuples (small relation) then R1 × R2 has 1 000 000 tuples
Forget the conﬂicting name (as in R&G) ((Hid), (large relation).
HName,Skill, Age, (HId), MId, Date, Time). Allow
positional references (by number) to columns. Query processors try to avoid building products - instead they
Label the conﬂicting colums with 1,2... (HId.1, attempt to build only subsets which contain relevant
HName,Skill, Age, HId.2, MId, Date, Time). information.
Neither of these is satisfactory. The product operation is no
longer commutative (a property that is useful in optimization.)
Renaming Natural join
To avoid confusion about attribute names, one can use the For obvious reasons of eﬃciency we rarely use unconstrained
renaming operator ρ: cross products in practice.
A natural join ( ) produces the set of all merges of tuples
ρ(C (1 → sid1, 5 → sid2), Hikers × Climbs)
that agree on their commonly named ﬁelds.
This operator Example:
names result relation C ; and HId MId Date Time HId HName Skill Age
123 1 10/10/88 5
explicitly names ﬁelds on positions 1 and 5 into sid1 and sid2. 123 3 11/08/87 2.5 123
313 1 12/08/89 4 313 Bridget EXP 33
In general, 214
5 212 James MED 27
HId MId Date Time HName Skill Age
123 1 10/10/88 5 Edmund EXP 80
ρ(R(oldname → newname, . . . , position → newname, E ), 123
214 2 08/07/92 7 Arnold BEG 25
313 2 06/07/94 5 Bridget EXP 33
Where E is a relational algebra expression.
Natural Join – cont. Conditional Join
Natural join has interesting relationships with other
operations. What is R S when
R=S Extension of natural join in which a join condition is speciﬁed:
R and S have no column names in common R C S for σC (R C S)
R and S have all column names in common, i.e., they are Special case in which join condition consists of equality
conditions is called the equijoin.
Natural join has nice properties (assuming ﬁelds are identiﬁed A natural join is an equijoin in which equalities are speciﬁed
by names): on all common ﬁelds.
Commutative: R S =S R
Associative: R (S T ) = (R S) T
Hence we can always simply write R1 R2 ··· Rk .
Interaction of the relational algebra operators Division
Suppose we have two tables with schemas R(A, B) and S(B).
R/S is deﬁned to be the set of A values in R which are paired
πA (R ∪ S) = πA (R) ∪ πA (S) (in R) with all B values in S.
σC (R ∪ S) = σC (R) ∪ σC (S) That is the set of all x for which πB (S) ⊆ πB (σA=x (R)).
(R ∪ S) T =R T ∪S T
A/B = πA R − πA (πA (R) πB (S) − R)
T (R ∪ S) = T R ∪T S.
The general deﬁnition of division extends this idea to more
than one attribute.
Examples Examples – cont
The highest Munro(s)
This is more tricky. We ﬁrst ﬁnd the peaks (their MIds) that
The names of people who have climbed The Saddle. are lower than some other peak. LowerIds =
πMId (σHeight<Height’ (Munros πHeight’ (
πHName (σMName="The Saddle" (Munros Hikers Climbs))
ρ(Height → Height’, Munros))))
(we could have used × instead of here)
Note the optimization to: Now we ﬁnd the MIds of peaks that are not in this set (they
must be the peaks with maximum height)
πHName (σMName="The Saddle" (Munros) Hikers Climbs)
MaxIds = πMId (Munros) − LowerIds
In what order would you perform the joins?
Finally we get the names:
πMName (MaxIds Munros)
Examples – cont What we cannot compute with relational algebra
The names of hikers who have climbed all Munros
We start by ﬁnding the set of HId,MId pairs for which the Aggregate operations. E.g. “The number of hikers who have
hiker has not climbed that peak. climbed Schiehallion” or “The average age of hikers”. These
We do this by subtracting part of the Climbs table from the are possible in SQL which has numerous extensions to the
set of all HId,MId pairs. NotClimbed= relational algebra.
Recursive queries. Given a table Parent(Parent, Child)
πHId (Hikers) πMId (Munros) − πHId,MId (Climbs) compute the Ancestor table. This appears to call for an
arbitrary number of joins.
(we could have used × instead of here) Non-relational data. For example, lists, arrays, multisets
The HIds in this table identify the hikers who have not climed (bags); or relations that are nested. These are ruled out by
some peak. By subtraction we get the HIds of hikers who the relational data model, but they are important and are the
have climbed all peaks: province of object-oriented databases and
“complex-object”/XML query languages.
ClimbedAll = πHId (Hikers) − πHId (NotClimbed)
Of course, we can always compute such things if we can talk
A join gets us the desired information: to a database from a full-blown (Turing complete)
πHName (Hikers ClimbedAll)
Relational calculus Relational algebra: Summary
Relational algebra: the 6 basic operations.
Declarative way of writing queries; Using labels vs. positions.
Query rewriting for optimization.
Ignorant of how things are computed;
Practice with relational algebra.
Equivalent to relational algebra: Every query that can be
expressed in the relational algebra can be expressed in the Limitations of relational algebra.
calculus, and vice versa.
Chapter 4 (except 4.2.5 about division and 4.3 about relational