VIEWS: 21 PAGES: 20 POSTED ON: 9/23/2011
The Relational Model: Relational Calculus Fall 2002 CSE330/CIS550 Handout 2 1 Relational Calculus • First-order logic (FOL) can also be thought of as a query language, and can be used in two ways: – Tuple relational calculus – Domain relational calculus • The difference is the level at which variables are used: for attributes (domains) or for tuples. • The calculus is non-procedural (declarative) as compared to the algebra. Fall 2002 CSE330/CIS550 Handout 2 2 Domain relational calculus • Queries have form: {<x1,x2, …, xn>|p} where x1,x2, …, xn are domain variables and p is a predicate which may mention the variables x1,x2, …, xn • Example: simple projection {<RN,H>|RI,G,R. <RI,RN,G,R,H>Routes} • Example: selection and projection: {<RN,H>|RI,G,R. <RI,RN,G,R,H>Routes G >15} Fall 2002 CSE330/CIS550 Handout 2 3 DRC examples, cont • Join: {<CI,R>|RI,RN,G,H,RI‟,Da,Du. <RI,RN,G,R,H>Routes <CI,RI‟,Da,Du>Climbs RI=RI‟} We could also have written the above as: {<CI,R>|RI,RN,G,H,Da,Du. <RI,RN,G,R,H>Routes <CI,RI,Da,Du>Climbs} Fall 2002 CSE330/CIS550 Handout 2 4 Predicate Logic - a quick review • The syntax of predicate logic starts with variables, constants and predicates that can be built using a collection of boolean-valued operators (boolean expressions) • Examples: 1=2, x y, prime(x), contains(t,”Joe”). Precisely what operations are available depends on the domain and on the query language. • For now we will assume the following boolean expressions: – <X,Y,…> Rel, X op Y, X op constant, or constant op X, where op is , , , , , and X,Y,… are domain variables Fall 2002 CSE330/CIS550 Handout 2 5 Predicate Logic, cont. • Starting with these basic predicates (also called atomic) , we can build up new predicates by the following rules: – Logical connectives: If p and q are predicates, then so are pq, pq, p, and pq • (x>2) (x<4) • (x>2) (x>0) – Existential quantification: If p is a predicate, then so is x.p • x. (x>2) (x<4) – Universal quantification: If p is a predicate, then so is x.p • x.x>2 • x. y.y>x Fall 2002 CSE330/CIS550 Handout 2 6 Logical Equivalences • There are two logical equivalences that will be heavily used: – pq p q (Whenever p is true, q must also be true.) – x. p(x) x. p(x) (p is true for all x) • The second will be especially important when we study SQL. Fall 2002 CSE330/CIS550 Handout 2 7 Free and bound variables • A variable v is bound in a predicate p when p is of the form v… or v… • A variable occurs free in p if it occurs in a position where it is not bound by an enclosing or • Examples: – x is free in x>2 – x is bound in x.x>y – x is free in (x>17) (x.x>2) • Note that there are two occurrences of x in the last example. Fall 2002 CSE330/CIS550 Handout 2 8 Renaming variables • When a variable is bound one can replace it with some other variable without altering the meaning of the expression, providing there are no name clashes • Example: x.x>2 is equivalent to y.y>2 Fall 2002 CSE330/CIS550 Handout 2 9 Some queries… • Try the following examples: – The names and ages of climbers – The names and ages of climbers who have climbed route 214 – The names of climbers who have climbed “Last Tango” – The names of climbers who have climbed all routes with rating greater than 15 – The names of climbers who have climbed the same route twice Fall 2002 CSE330/CIS550 Handout 2 10 Safety • There is a problem with what we have done so far. How should we treat a query like: {<CI,CN,S,A>| <CI,CN,S,A> Climbers>} • This presumably means the set of all four-tuples that are not climbers, which is presumably an infinite set (and unsafe query). • A query is safe if no matter how we instantiate the relations, it always produces a finite answer. • In particular, the query should be domain independent, meaning that the answer is the same regardless of the domain in which it is evaluated. • Unfortunately, both this definition of safety and domain independence are semantic conditions, and are undecidable. Fall 2002 CSE330/CIS550 Handout 2 11 Syntactic Safety • There are syntactic conditions that are used to define “safe” formulas. In particular: – Every “safe” formula is domain independent. – It is implementable. – The formulas that are expressible in real query languages based on relational calculus are all “safe” • The definition is complicated, and is not in the text book. It can be found in Ullman‟s book on databases (Principles of Database and Knowledge- Base Systems). Fall 2002 CSE330/CIS550 Handout 2 12 Translating from RA to DRC • Recall that the relational algebra consists of , , , x, -. We need to work our way through the structure of an RA expression, translating each possible form. • Let TR[e] be the translation of RA expression e into DRC. • Relation names: For the RA expression R, the DRC expression is {<x1,x2, …, xn>| <x1,x2, …, xn> R} Fall 2002 CSE330/CIS550 Handout 2 13 Selection • Suppose the RA expression is c(e‟), where e‟ is another RA expression with TR[e‟]= {<x1,x2, …, xn>| p} Then the translation of c(e‟) is {<x1,x2, …, xn>| pC‟}, where C‟ is the condition obtained from C by replacing each attribute with the corresponding variable. • Example: TR[#1=#2 #4>2.5R] (where R has arity 4) is {<x1,x2, x3, x4>|< x1,x2, x3, x4> R x1=x2 x4>2.5} Fall 2002 CSE330/CIS550 Handout 2 14 Projection • If TR[e]= {<x1,x2, …, xn>| p} then TR[i1,i2,…,im(e)]= {<x i1,x i2, …, x im >| xj1,xj2, …, xjk.p}, where xj1,xj2, …, xjk are variables in x1,x2, …, xn that are not in x i1,x i2, …, x im • Example: With R as before, #1,#3 (R)={<x1,x3>| x2,x4. <x1,x2, x3,x4> R} Fall 2002 CSE330/CIS550 Handout 2 15 Union • We know that R and S in RS must be union compatible, so they must have the same arity. Therefore we can assume that for e1e2, where e1, e2are algebra expressions, TR[e1]={<x1,…,xn>|p} and TR[e2]={<y1,…yn>|q}. Relabel the variables in the second so that TR[e2]={< x1,…,xn>|q‟}. This may involve relabeling bound variables in q to avoid clashes. Then TR[e1e2]={<x1,…,xn>|pq‟}. • Example: TR[RS]= {< x1,x2, x3,x4>| <x1,x2, x3,x4>R <x1,x2, x3,x4>S Fall 2002 CSE330/CIS550 Handout 2 16 Other binary operators • Difference: The same conditions hold as for union. So TR[e1]={<x1,…,xn>|p} and TR[e2]={< x1,…,xn>|q}. Then TR[e1- e2]= {<x1,…,xn>|pq} • Product: If TR[e1]={<x1,…,xn>|p} and TR[e2]={< y1,…,ym>|q}, then TR[e1 e2]= {<x1,…,xn, y1,…,ym >| pq} • Example: TR[RS]= {<x1,…,xn, y1,…,ym >| <x1,…,xn> R <y1,…,ym > S } Fall 2002 CSE330/CIS550 Handout 2 17 Summary • We‟ve seen how to translate relational algebra into (domain) relational calculus. • There are various syntactic restrictions for guaranteeing the safety of a DRC query. From any of these we can translate back into relational algebra • It was this correspondence between an (implementable and optimizable) algebra and first-order logic that was responsible for the initial development of relational databases – a prime example of some theory leading to highly successful practical developments! Fall 2002 CSE330/CIS550 Handout 2 18 What we cannot compute with relational algebra • Aggregate operations, e.g. “The number of climbers who have climbed „Last Tango‟” or “The average age of climbers.” These are possible in SQL. • Recursive queries. Given a relation Parent(Parent, Child) compute the ancestor relation. This appears to call for an arbitrary number of joins. It is known that it cannot be expressed in first-order logic, hence it cannot be expressed in relational algebra. Fall 2002 CSE330/CIS550 Handout 2 19 What we cannot compute with relational algebra, cont • Computing with complex structures that are not (1NF) relations, e.g. lists, arrays, multisets. • Of course, we can always compute such things if we can “talk to” a database from a full-blown (Turing complete) programming language, and we‟ll see how to do this later. However, communicating with a database in this way may well be inefficient, and adding computational power to a query language remains an important research topic. Fall 2002 CSE330/CIS550 Handout 2 20