VIEWS: 26 PAGES: 77 POSTED ON: 6/30/2010 Public Domain
The Relational Model Relations S.S.N street A relation is a more name city concrete construction, of something we have seen students before, the ER diagram. A relation is (just!) a table! name S.S.N street city We will use table and Lisa 1272 Blaine Riverside relation interchangeably, Bart 5592 Apple Irvine except where there is a Lisa 7552 11th Riverside possibility of confusion. Sue 5555 Main Oceanside The students relation A relation consists of a relational schema and a relational instance. A relation schema is essentially a list of column names with their data types. In this case… students(name : string, S.S.N : string, street : string, city : string) • An relation instance is made up of zero of name S.S.N street city more tuples (rows, Lisa 1272 Blaine Riverside records) Bart 5592 Apple Irvine Lisa 7552 11th Riverside Sue 5555 Main Oceanside A schema specifies a relation’s name. students(name : string, S.S.N : string, street : string, city : string) A schema also specifies the name of each field, and its domain. Fields are often referred to as columns, attributes, dimensions A minor, but important point about relations, they are unordered. name S.S.N street city name S.S.N city street Lisa 1272 Blaine Riverside Lisa 1272 Riverside Blaine Bart 5592 Apple Irvine Bart 5592 Irvine Apple Lisa 7552 11th Riverside Lisa 7552 Riverside 11th Sue 5555 Main Oceanside Sue 5555 Oceanside Main This is not a problem, since we refer to fields by name. However sometimes, we refer to the fields by their column number, in which case the ordering becomes important. I will point this out when we get there. Also, the tuples are unordered too! Note that every tuple in our instance is unique. This is not a coincidence. The definition of relation demands it. Later we will see how we can represent weak entities in relations. name S.S.N street city Lisa 1272 Blaine Riverside Bart 5592 Apple Irvine Lisa 7552 11th Riverside Sue 5592 Main Oceanside The number of fields is called the degree (or arity, or dimensionality of the relation). Below we have a table of degree 4. The number of tuples cardinality of the relation name S.S.N street city Of course, we don’t count the Lisa 1272 Blaine Riverside row that has the labels! Bart 5592 Apple Irvine To the right we have a table Lisa 7552 11th Riverside of cardinality 3. students(name : string, S.S.N : string, street : string, city : string) Note that relations have primary keys, just like ER diagrams. Remember that the primary key might not be one field, it may be a combination of two or more fields. name S.S.N street city Lisa 1272 Blaine Riverside Bart 5592 Apple Irvine Lisa 7552 11th Riverside Sue 5555 Main Oceanside Translating ER diagrams into Relations We need to figure out how to translate ER diagrams into relations. There are only three cases to worry about. • Strong entity sets • Weak entity sets • Relationship sets days Number Name PID Name Course Teaches Professor • Strong entity sets days Number Name PID Name Course Teaches Professor professor(PID : string, name : string) PID name This is trivial, the primary key of the ER diagram becomes 1234 Keogh the primary key of the 3421 Lee relation. All other fields are 2342 Smyth copied in (in any order) 4531 Lee • Weak entity sets days Number Name PID Name Course Teaches Professor course(PID : string, number : string, name : string) PID number name The primary key of the relation consists of the union of the primary 1234 CS12 C++ key of the strong entity set and the 3421 CS11 Java discriminator of the weak entity set. The “imported” key from the strong 2342 CS12 C++ entity set is called the foreign key. 4531 CS15 LISP All other fields are copied in (in any order) • Relationship entity sets days Number Name PID Name Course Teaches Professor teaches(PID : string, days : string ) For one-to-one relationship sets, the relation’s primary key can be that of either entity set. PID days • For many-to-many relationship 1234 mwf sets, the union of the primary keys becomes the relation’s primary key 3421 wed •For the other cases, the the relation’s primary key is taken from 2342 tue the strong entity set. 4531 sat So, this ER Model… days Number Name PID Name Course Teaches Professor … maps to this database schema professor(PID : string, name : string) course(PID : string, number : string, name : string) teaches(PID : string, days : string) We have seen how to create a database schema, how do we create an actual database on our computers? professor(PID : string, name : string) course(PID : string, number : string, name : string) teaches(PID : string, days : string) …how do we create an actual database on our computers? We use SQL, a language that allows us to build, modify and query databases. professor(PID : string, name : string) SQL (Structured Query Language) • SQL is a language that allows us to build, modify and query databases. • SQL is an ANSI standard language. American National Standards Institute • SQL is the “engine” behind Oracle, Sybase, Microsoft SQL Server, Informix,Access, Ingres, etc. • Most of these systems have build GUIs on top of the command line interface, so you don’t normally write statements directly in SQL (although you can). Important Note • In our textbook, the authors introduce SQL at the same time as they introduce the relational model (Chapter 3). • My plan is a little different. I plan to discuss operations on databases (using relational algebra) in a more abstract way, and revisit SQL later in the course. • I encourage you to glance at the SQL material as you read about the relational model in Chapter 3, but don’t worry about the details of SQL just yet. Relational Algebra • Procedural language SQL is closely based • Five basic operators on relational algebra. • selection select • projection project • union (why no intersection?) • set difference difference • Cross product Cartesian product • The are some other operators which are composed of the above operators. These show up so often that we give them special names. • The operators take one or two relations as inputs and give a new relation as a result. Select Operation – Example • Relation r A B C D Intuition: The select operation 1 7 allows us to retrieve some rows 5 7 of a relation (by “some” I mean anywhere from none of them to 12 3 all of them) 23 10 Here I have retrieved all the rows of the relation r where • A=B ^ D > 5 (r) either the value in field A A B C D equals the value in field B, or 1 7 the value in field D is greater lowercase 23 10 than 5. Greek sigma Select Operation • Notation: p(r) lowercase Greek sigma • p is called the selection predicate • Defined as: p(r) = {t | t r and p(t)} Where p is a formula in propositional calculus consisting of terms connected by : (and), (or), (not) Each term is one of: <attribute> op <attribute> or <constant> where op is one of: =, , >, . <. • Example of selection: name=“Keogh(professor) Project Operation – Example I A B C 10 7 • Relation r: 20 1 Intuition: The project operation allows us to retrieve some 30 1 columns of a relation (by 40 2 “some” I mean anywhere from none of them to all of them) • A,C (r) A C Here I have retrieved columns 7 A and C. 1 Greek capital 1 letter pi 2 Project Operation – Example II A B C 10 1 Intuition: The project • Relation r: 20 1 operation removes 30 1 duplicate rows, since 40 2 relations are sets. • A,C (r) A C A C Here there are two rows 1 1 with A = and C = 1. So 1 = 1 one was discarded. 1 2 2 Project Operation • Notation: A1, A2, …, Ak (r) Greek capital letter pi where A1, A2 are attribute names and r is a relation name. • The result is defined as the relation of k columns obtained by erasing the columns that are not listed • Duplicate rows removed from result, since relations are sets. Union Operation – Example A B A B 1 2 Relations r, s: 2 3 1 s r Intuition: The union operation concatenates two relations, and removes A B duplicate rows (since 1 relations are sets). r s: 2 Here there are two rows 1 with A = and B = 2. So 3 one was discarded. Union Operation • Notation: r s • Defined as: r s = {t | t r or t s} For r s to be valid. 1. r, s must have the same arity (same number of attributes) 2. The attribute domains must be compatible (e.g., 2nd column of r deals with the same type of values as does the 2nd column of s). Although the field types must be the same, the names can be different. For example I can union professor and lecturer where: professor(PID : string, name : string) lecturer(LID : string, first_name : string) Set Difference Operation – Example A B A B 1 2 Relations r, s: 2 3 1 s r Intuition: The set difference operation r – s: A B returns all the rows that are in r but not in s. 1 1 Set Difference Operation • Notation r – s • Defined as: r – s = {t | t r and t s} • Set differences must be taken between compatible relations. – r and s must have the same arity – attribute domains of r and s must be compatible • Note that in general r – s s – r Cross-Product Operation -Example Relations r, s: A B C D E 1 10 a 10 a 2 20 b 10 b r s Intuition: The cross r x s: product operation A B C D E returns all possible 1 10 a combinations of rows in 1 19 a r with rows in s. 1 20 b 1 10 b 2 10 a In other words the result 2 10 a 2 20 b is every possible pairing 2 10 b of the rows of r and s. Cross-Product Operation-Example Relations r, s: A B C D E 1 10 a 10 a 2 20 b 10 b r s Intuition: The cross r x s: product operation A B C D E returns all possible 1 10 a combinations of rows in 1 19 a r with rows in s. 1 20 b 1 10 b 2 10 a In other words the result 2 10 a 2 20 b is every possible pairing 2 10 b of the rows of r and s. Cross-Product Operation • Notation r x s • Defined as: r x s = {t q | t r and q s} • Assume that attributes of r(R) and s(S) are disjoint. (That is, R S = ). • If attributes names of r(R) and s(S) are not disjoint, then renaming must be used. Composition of Operations • We can build expressions using r x s: multiple operations • Example: A= C(r x s) A B C D E 1 10 a A B C D E 1 10 a 1 20 b 1 10 a 1 10 b 10 a 2 10 a 2 20 b 2 10 a 10 b 2 20 b r 2 10 b s A B C D E “take the cross product of r 1 10 a and s, then return only the A=C(r x s) 2 20 a rows where A equals B” 2 20 b Rename Operation • Allows us to name, and therefore to refer to, the results of A B A B relational-algebra expressions. 1 2 Example: 2 3 myRelation (r – s) 1 s r Take the set difference of r and s, A B and call the result myRelation 1 Renaming in relational algebra is 1 essentiality the same as assignment in a programming language myRelation Rename Operation If a relational-algebra expression E A B A B has arity n, then 1 2 x (A1, A2, …, An) (E) 2 3 1 r s returns the result of expression E under the name X, and with the attributes renamed to A1, A2, …., An. E K 1 Take the set difference of r and s, and call the 1 Example result myRelation, myRelation myRelation(E,K) (r – s) while renaming the first field E and the second field K. Banking Examples branch (branch-name, branch-city, assets) customer (customer-name, customer-street, customer-only) account (account-number, branch-name, balance) loan (loan-number, branch-name, amount) depositor (customer-name, account-number) borrower (customer-name, loan-number) Note that I have not indicated primary keys here for simplicity. Quick note on notation good_customers bad_customers customer-name loan-number customer-name loan-number Patty 1234 Seymour 3432 Apu 3421 Marge 3467 Selma 2342 Selma 7625 Ned 4531 Abraham 3597 If we have two or more relations which feature the same attribute names, we could confuse them. To prevent this we can use dot notation. For example good_customers.loan-number Example Queries • Find all loans of over $1200 “select from the relation loan, only the rows which have a amount > 1200 (loan) amount greater than 1200” loan-number branch-name amount 1234 Riverside 1,923.03 loan 3421 Irvine 123.00 2342 Dublin 56.25 4531 Prague 120.03 amount > 1200 (loan) 1234 Riverside 1,923.03 Example Queries • Find the loan number for each loan of an amount greater than $1200 loan-number (amount > 1200 (loan)) “select from the relation loan, only the rows which have a amount greater than 1200, then loan-number branch-name amount project out just the 1234 Riverside 1,923.03 loan_number” 3421 Irvine 123.00 loan 2342 Dublin 56.25 4531 Prague 120.03 amount > 1200 (loan) 1234 Riverside 1,923.03 loan-number (amount > 1200 (loan)) 1234 Example Queries • Find all loans greater than $1200 or less than $75 amount > 1000 or amount < 75(loan) “select from the relation loan, only the rows which have a amount greater than 1000 or an amount less loan-number branch-name amount than 75 1234 Riverside 1,923.03 3421 Irvine 123.00 loan 2342 Dublin 56.25 4531 Prague 120.03 1234 Riverside 1,923.03 amount > 1000 or amount < 75(loan) 2342 Dublin 56.25 Example Queries • Find the names of all customers who have a loan, an account, or both, from the bank customer-name (borrower) customer-name (depositor) borrower depositor customer-name loan-number customer-name account-number Patty 1234 Moe 3467 Apu 3421 Apu 2312 Selma 2342 Patty 9999 Ned 4531 Krusty 3423 Moe customer-name (borrower) Apu customer-name (depositor) Patty Patty Moe Apu Krusty Apu Selma Selma Patty Ned Ned Krusty Note this example is Example Queries split over two slides! Find the names of all customers who have a loan at the Riverside branch. customer-name (branch-name=“Riverside ” (borrower.loan-number = loan.loan-number(borrower x loan))) borrower loan customer-name loan-number loan-number branch-name amount We retrieve Patty 1234 1234 Riverside 1,923.03 borrower and loan… Apu 3421 3421 Irvine 123.00 customer-name borrower.loan loan.loan- branch-name amount -number number Patty 1234 1234 Riverside 1,923.03 …we calculate Patty 1234 3421 Irvine 123.00 their cross product… Apu 3421 1234 Riverside 1,923.03 Apu 3421 3421 Irvine 123.00 customer-name (branch-name=“Riverside ” (borrower.loan-number = loan.loan-number(borrower x loan))) …we calculate customer-name borrower.loan loan.loan- branch-name amount their cross -number number product… Patty 1234 1234 Riverside 1,923.03 Patty 1234 3421 Irvine 123.00 …we select the Apu 3421 1234 Riverside 1,923.03 rows where borrower.loan- Apu 3421 3421 Irvine 123.00 number is equal to loan.loan-number… customer-name borrower.loan loan.loan- branch-name amount -number number Patty 1234 1234 Riverside 1,923.03 …we select the Apu 3421 3421 Irvine 123.00 rows where branch-name is equal to customer-name borrower.loan loan.loan- branch-name amount -number number “Riverside” Patty 1234 1234 Riverside 1,923.03 …we project out the customer-name. Patty Note this example is Example Queries split over three slides! Find the largest account balance ...we will need to rename account relation as d... balance(account) - account.balance(account.balance < d.balance (account x d (account))) account d account- balance account- balance We do a rename to number number get a “copy” of Apu 100.30 Apu 100.30 account which we Patty 12.34 Patty 12.34 call d… Lenny 45.34 Lenny 45.34 … next we will do a cross product… balance(account) - account.balance(account.balance < d.balance (account x d (account))) account.account- account. d.account- d.balance number balance number Apu 100.30 Apu 100.30 … do a cross Apu 100.30 Patty 12.34 product… Apu 100.30 Lenny 45.34 Patty 12.34 Apu 100.30 Patty 12.34 Patty 12.34 Patty 12.34 Lenny 45.34 Lenny 45.34 Apu 100.30 …select out all rows Lenny 45.34 Patty 12.34 Lenny 45.34 Lenny 45.34 where account.balance is less than account.account- account. d.account- d.balance d.balance… number balance number Patty 12.34 Apu 100.30 Patty 12.34 Lenny 45.34 Lenny 45.34 Apu 100.30 .. next we project… balance(account) - account.balance(account.balance < d.balance (account x d (account))) account.account- account. d.account- d.balance number balance number Patty 12.34 Apu 100.30 .. next we project out Patty 12.34 Lenny 45.34 account.balance… Lenny 45.34 Apu 100.30 account. …then we do a set balance difference between it 12.34 an the original account 12.34 account.balance from account- balance 45.34 the account relation… number Apu 100.30 Patty 12.34 … the set difference Lenny 45.34 leaves us with one number, the largest value! 100.30 Formal Definition • A basic expression in the relational algebra consists of either one of the following: – A relation in the database – A constant relation • Let E1 and E2 be relational-algebra expressions; the following are all relational-algebra expressions: – E1 E2 – E1 - E2 – E1 x E2 – p (E1), P is a predicate on attributes in E1 – s(E1), S is a list consisting of some of the attributes in E1 – x (E1), x is the new name for the result of E1 Additional Operations We define additional operations that do not add any power to the relational algebra, but that simplify common queries. – Natural join All joins are really – Conditional Join special cases of – Equi join conditional join – Division – Set intersection Natural-Join Operation: Motivation Very often we have a query and the borrower loan answer is not contained in a single cust-name l-number l-number branch relation. For example, I might wish to Patty 1234 1234 Dublin know where Apu banks. Apu 3421 3421 Irvine The classic relational algebra way to do such queries is a cross product, followed cust-name borrower.l-number loan.l-number branch by a selection which tests for equality on some pair of fields. Patty 1234 1234 Dublin Patty 1234 3421 Irvine borrower.l-number = loan.l-number(borrower x Apu 3421 1234 Dublin loan))) Apu 3421 3421 Irvine While this works… • it is unintuitive cust-name borrower.l-number loan.l-number branch • it requires a lot of memory Patty 1234 1234 Dublin • the notation is cumbersome Apu 3421 3421 Irvine Note that is this example the two relations are the same size (2 by 2), this does not have to be the case. So we have a more intuitive way of achieving the same effect, the natural join, denoted by the symbol Natural-Join Operation: Intuition Natural join combines a cross product and a selection into one operation. It performs a selection forcing equality on those attributes that appear in both relation schemes. Duplicates are removed as in all relation operations. So if the relations have one attribute in common, as in the last slide (“l-number”), for example, we have… borrower loan = borrower.l-number = loan.l-number(borrower x loan))) There are two special cases: • If the two relations have no attributes in common, then their natural join is simply their cross product. • If the two relations have more than one attribute in common, then the natural join selects only the rows where all pairs of matching attributes match. (lets see an example on the next slide). l-name f-name age l-name f-name ID A Bouvier Selma 40 B Bouvier Selma 1232 Bouvier Patty 40 Smith Selma 4423 Smith Maggie 2 l-name f-name age l-name f-name ID Both the l-name and the Bouvier Selma 40 Bouvier Selma 1232 f-name match, so select. Bouvier Patty 40 Smith Selma 4423 Only the f-names match, Smith Maggie 2 Bouvier Selma 1232 so don’t select. Bouvier Selma 40 Smith Selma 4423 Bouvier Patty 40 Bouvier Selma 1232 Only the l-names match, Smith Maggie 2 Smith Selma 4423 so don’t select. We remove duplicate l-name f-name age l-name f-name ID attributes… Bouvier Selma 40 Bouvier Selma 1232 The natural join of A and B l-name f-name age ID Note that this is just a way to visualize the natural join, we A B= Bouvier Selma 40 1232 don’t really have to do the cross product as in this example Natural-Join Operation • Notation: r s • Let r and s be relations on schemas R and S respectively.The result is a relation on schema R S which is obtained by considering each pair of tuples tr from r and ts from s. • If tr and ts have the same value on each of the attributes in R S, a tuple t is added to the result, where – t has the same value as tr on r – t has the same value as ts on s • Example: R = (A, B, C, D) S = (E, B, D) • Result schema = (A, B, C, D, E) • r s is defined as: r.A, r.B, r.C, r.D, s.E (r.B = s.B r.D = s.D (r x s)) Natural Join Operation – Example • Relations r, s: A B C D B D E 1 a 1 a 2 a 3 a 4 b 1 a 1 a 2 b 2 b 3 b r s r s How did we get here? A B C D E 1 a Lets do a trace over the 1 a next few slides… 1 a 1 a Warning! Example spread 2 b over many slides, you may wish to edit before printing. A B C D B D E 1 a 1 a 2 a 3 a 4 b 1 a 1 a 2 b 2 b 3 b r s First we note which attributes the two relations have in common… A B C D B D E 1 a 1 a 2 a 3 a 4 b 1 a 1 a 2 b 2 b 3 b r s A B C D E 1 a 1 a There are two rows in s that match our first row in r, (in the relevant attributes) so both are joined to our first row… A B C D B D E 1 a 1 a 2 a 3 a 4 b 1 a 1 a 2 b 2 b 3 b r s A B C D E 1 a 1 a …there are no rows in s that match our second row in r, so do nothing… A B C D B D E 1 a 1 a 2 a 3 a 4 b 1 a 1 a 2 b 2 b 3 b r s A B C D E 1 a 1 a …there are no rows in s that match our third row in r, so do nothing… A B C D B D E 1 a 1 a 2 a 3 a 4 b 1 a 1 a 2 b 2 b 3 b r s A B C D E 1 a 1 a 1 a 1 a There are two rows in s that match our fourth row in r, so both are joined to our fourth row… A B C D B D E 1 a 1 a 2 a 3 a 4 b 1 a 1 a 2 b 2 b 3 b r s A B C D E 1 a 1 a 1 a 1 a 2 b There is one row that matches our fifth row in r,.. so it is joined to our fifth row and we are done! Conditional-Join Operation: The conditional join is actually the most general type of join. I introduced the natural join first only because it is more intuitive and.. natural! Just like natural join, conditional join combines a cross product and a selection into one operation. However instead of only selecting rows that have equality on those attributes that appear in both relation schemes, we allow selection based on any predicate. r c s = c(r x s) Where c is any predicate the attributes of r and/or s Duplicate rows are removed as always, but duplicate columns are not removed! Conditional-Join Example: We want to find all women that are older than their husbands… l-name f-name marr-Lic age l-name f-name marr-Lic age Simpson Marge 777 35 Simpson Homer 777 36 r Lovejoy Helen 234 38 s Lovejoy Timothy 234 36 Flanders Maude 555 24 Simpson Bart null 9 Krabappel Edna 978 40 r r.age > s.age AND r.Marr-Lic = r.Marr-Lic s r.l-name r.f-name r.Marr-Lic r.age s.l-name s.f-name s.marr-Lic s.age Lovejoy Helen 234 38 Lovejoy Timothy 234 36 Note we have removed ambiguity of attribute names by using “dot” notation Also note the redundant information in the marr-lic attributes Set-Intersection Operation - Example A B Relation r, s: A B 1 2 2 3 1 r s Intuition: The intersection operation returns all the rows that are in both r and s. rs A B 2 Set-Intersection Operation • Notation: r s • Defined as: • r s ={ t | t r and t s } • Assume: – r, s have the same arity – attributes of r and s are compatible • Note: r s = r - (r - s) r /s Division Operation • Suited to queries that include the phrase “for all”. • Let r and s be relations on schemas R and S respectively where – R = (A1, …, Am, B1, …, Bn) – S = (B1, …, Bn) The result of r / s is a relation on schema R – S = (A1, …, Am) r / s = { t | t R-S(r) u s ( tu r ) } Division Operation – Example Relations r, s: A B B 1 1 2 2 3 1 s 1 1 3 4 A 6 1 2 r / s: r occurs in the presence of both 1 and 2, so it is returned. occurs in the presence of both 1 and 2, so it is returned. does not occur in the presence of both 1 and 2, so is ignored. ... Another Division Example Relations r, s: A B C D E D E a a 1 a 1 a a 1 b 1 a b 1 s a a 1 a b 3 a a 1 a b 1 A B C a b 1 r /s: a r a <, a , > occurs in the presence of both <a,1> and <b,1>, so it is returned. < , a , > occurs in the presence of both <a,1> and <b,1>, so it is returned. <, a , > does not occur in the presence of both <a,1> and <b,1>, so it is ignored. Assignment Operation • The assignment operation () provides a convenient way to express complex queries, write query as a sequential program consisting of a series of assignments followed by an expression whose value is displayed as a result of the query. • Assignment must always be made to a temporary relation variable. • Example: Write r s as temp1 R-S (r) temp2 R-S ((temp1 x s) – R-S,S (r)) result = temp1 – temp2 – The result to the right of the is assigned to the relation variable on the left of the . – May use variable in subsequent expressions. Extended Relational-Algebra-Operations • Generalized Projection • Outer Join • Aggregate Functions Generalized Projection • Extends the projection operation by allowing arithmetic functions to be used in the projection list. F1, F2, …, Fn(E) • E is any relational-algebra expression • Each of F1, F2, …, Fn are are arithmetic expressions involving constants and attributes in the schema of E. • Given relation credit-info(customer-name, limit, credit- balance), find how much more each person can spend: customer-name, limit – credit-balance (credit-info) Generalized Projection Given relation credit-info(customer-name, limit, credit- balance), find how much more each person can spend: customer-name, limit – credit-balance (credit-info) customer-name limit credit-balance Simpson, Marge 500 400 Lovejoy, Helen 2000 1500 credit-info Flanders, Maude 0 0 Krabappel, Edna 50 11 100 500 0 39 Aggregate Functions and Operations • Aggregation function takes a collection of values and returns a single value as a result. avg: average value min: minimum value max: maximum value sum: sum of values count: number of values • Aggregate operation in relational algebra G1, G2, …, Gn g F1( A1), F2( A2),…, Fn( An) (E) – E is any relational-algebra expression – G1, G2 …, Gn is a list of attributes on which to group (can be empty) – Each Fi is an aggregate function (i.e avg, min, max etc) – Each Ai is an attribute name Aggregate Operation – Example A B C • Relation r: 7 7 3 10 sum-C g sum(c) (r) 27 i.e we want to find the sum of all the numbers in attribute C Aggregate Operation – Example Relation account grouped by last-name: last-name account-number balance account Simpson A-102 400 Simpson A-201 900 Flanders A-217 750 i.e calculate the Flanders A-215 750 total balances, Nahasapeemapetilon A-222 11700 grouped by last- name. last-name g sum(balance) (account) Yes yes, I make last-name balance good money, but I was shot 14 Simpson 1300 times last year Flanders 1500 Nahasapeemapetilon 11700 Outer Join • An extension of the join operation that avoids loss of information. • Computes the join and then adds tuples from one relation that does not match tuples in the other relation to the result of the join. • Uses null values: – null signifies that the value is unknown or does not exist – All comparisons involving null are (roughly speaking) false by definition. • Will study precise meaning of comparisons with nulls later Outer Join – Example loan-number branch-name amount • Relation loan L-170 Springfield 3000 L-230 Shelbyville 4000 L-260 Dublin 1700 Relation borrower customer-name loan-number Simpson L-170 Wiggum L-230 Flanders L-155 Outer Join – Example loan-number branch-name amount customer-name L-170 Springfield 3000 Simpson • Inner Join L-230 Shelbyville 4000 Wiggum loan Borrower • Left Outer Join loan borrower loan-number branch-name amount customer-name L-170 Springfield 3000 Simpson L-230 Shelbyville 4000 Wiggum L-260 Dublin 1700 null Outer Join – Example loan-number branch-name amount customer-name L-170 Springfield 3000 Simpson Right Outer Join L-230 Shelbyville 4000 Wiggum L-155 null null Flanders loan borrower Full Outer Join loan-number branch-name amount customer-name L-170 Springfield 3000 Simpson L-230 Shelbyville 4000 Wiggum loan borrower L-260 Dublin 1700 null L-155 null null Flanders Null Values • It is possible for tuples to have a null value, denoted by null, for some of their attributes • null signifies an unknown value or that a value does not exist. • The result of any arithmetic expression involving null is null. • Aggregate functions simply ignore null values – Is an arbitrary decision. Could have returned null as result instead. – We follow the semantics of SQL in its handling of null values • For duplicate elimination and grouping, null is treated like any other value, and two nulls are assumed to be the same – Alternative: assume each null is different from each other – Both are arbitrary decisions, so we simply follow SQL Null Values • Comparisons with null values return the special truth value unknown – If false was used instead of unknown, then not (A < 5) would not be equivalent to A >= 5 • Three-valued logic using the truth value unknown: – OR: (unknown or true) = true, (unknown or false) = unknown (unknown or unknown) = unknown – AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown – NOT: (not unknown) = unknown – In SQL “P is unknown” evaluates to true if predicate P evaluates to unknown • Result of select predicate is treated as false if it evaluates to unknown