VIEWS: 84 PAGES: 54 POSTED ON: 12/3/2010 Public Domain
ISOM MIS415 Module 2 Query Languages –SQL Arijit Sengupta Structure of this quarter ISOM MIS415 0. Intro 1. Design 2. Querying 3. Applications 4. Advanced Topics Database Conceptual Query Java DB Fundamentals Modeling Transaction Languages Applications – Management JDBC Relational Model Advanced Data SQL Mining Normalization Newbie Users Designers Developers Professionals Today‟s Buzzwords ISOM • Query Languages • Formal Query Languages • Procedural and Declarative Languages • Relational Algebra • Relational Calculus • SQL • Aggregate Functions • Nested Queries Objectives ISOM At the end of the lecture, you should • Get a formal as well as practical perspective on query languages • Have a background on query language basics (how they came about) • Be able to write simple SQL queries from the specification • Be able to look at SQL queries and understand what it is supposed to do • Be able to write complex SQL queries involving nesting • Execute queries on a database system Set Theory Basics ISOM • A set: a collection of distinct items with no particular order • Set description: { b | b is a Database Book} {c | c is a city with a population of over a million} {x | 1 < x < 10 and x is a natural number} • Most basic set operation: Membership: x S (read as x belongs to S if x is in the set S) Other Set Operations ISOM • Addition, deletion (note that adding an existing item in the set does not change it) • Set mathematics: Union R S = { x | x R or x S} Intersection R S = { x | x R and x S} Set Difference R – S = { x | x R and x S} Cross-product R x S = { <x,y> | x R and y S} • You can combine set operations much like arithmetic operations: R – (S T) • Usually no well-defined precedence Relational Query Languages ISOM • Query languages: Allow manipulation and retrieval of data from a database. • Relational model supports simple, powerful QLs: Strong formal foundation based on logic. Allows for much optimization. • Query Languages != programming languages! QLs not expected to be “Turing complete”. QLs not intended to be used for complex calculations. QLs support easy, efficient access to large data sets. Formal Relational Query Languages ISOM Two mathematical Query Languages form the basis for “real” languages (e.g. SQL), and for implementation: Relational Algebra: More operational, very useful for representing execution plans. Relational Calculus: Lets users describe what they want, rather than how to compute it. (Non-operational, declarative.) Understanding Algebra & Calculus is key to understanding SQL, query processing! Structured Query Language ISOM • Need for SQL Operations on Data Types Definition Manipulation Operations on Sets Declarative (calculus) vs. Procedural (algebra) • Evolution of SQL SEQUEL ..SQL_92 .. SQL_93 SQL Dialects Does SQL treat Relations as „Sets‟? Preliminaries ISOM • A query is applied to relation instances, and the result of a query is also a relation instance. Schemas of input relations for a query are fixed (but query will run regardless of instance!) The schema for the result of a given query is also fixed! Determined by definition of query language constructs. • Positional vs. named-field notation: Positional notation easier for formal definitions, named-field notation more readable. Both used in SQL Example Instances ISOM • Students, Registers, Courses relations for our examples. cid cname dept R1 sid cid semester C1 22 101 Fall 99 101 Database CIS 58 103 Spring 99 103 Internet ECI S1 sid sname GPA age S2 sid sname GPA age 28 yuppy 3.9 24.0 22 dustin 3.5 25.0 31 lubber 3.8 25.5 31 lubber 3.8 25.5 44 guppy 3.5 25.5 58 rusty 4.0 23.0 58 rusty 4.0 23.0 Relational Algebra ISOM • Basic operations: Selection ( ) Selects a subset of rows from relation. Projection ( ) Deletes unwanted columns from relation. Cross-product ( ) Allows us to combine two relations. Set-difference ( ) Tuples in reln. 1, but not in reln. 2. Union ( ) Tuples in reln. 1 and in reln. 2. • Additional operations: Intersection, join, division, renaming: Not essential, but (very!) useful. • Since each operation returns a relation, operations can be composed! (Algebra is “closed”.) Projection ISOM • Deletes attributes that are not in sname GPA projection list. yuppy 3.9 • Schema of result contains lubber 3.8 exactly the fields in the guppy 3.5 projection list, with the same rusty 4.0 sname, gpa(S 2) names that they had in the (only) input relation. • Projection operator has to age eliminate duplicates! (Why??) 24.0 Note: real systems typically 25.5 don‟t do duplicate elimination 23.0 unless the user explicitly asks for it. (Why not?) age(S 2) Vertical Slices ISOM • Projection Algebra: projection Specifying Elements <A1,A2,...Am> (R) No Specification Conditional List all information about List IDs, names, and addresses of Students all students select * select StudentID, name, address from STUDENT; from STUDENT; (Student) StudentID, name, address (Student) Does SQL treat Relations as „Sets‟? ISOM What are the different salaries we pay to our employees? select salary from EMPLOYEE; OR is the following better? select DISTINCT salary from EMPLOYEE; Selection ISOM sid sname GPA age • Selects rows that satisfy 28 yuppy 3.9 35.0 selection condition. 58 rusty 4.0 35.0 • No duplicates in result! (Why?) gpa3.8(S 2) • Schema of result identical to schema of (only) input sname GPA relation. yuppy 3.9 rusty 4.0 • Result relation can be the input for another relational sname,gpa( (S 2)) algebra operation! gpa3.8 (Operator composition.) Horizontal Slices ISOM • Restriction Algebra: selection Specifying Conditions or restriction (R) Unconditional Conditional List all students List all students with GPA > 3.0 select * select * from STUDENT from STUDENT; where GPA > 3.0; (Student) GPA > 3.0 (Student) Specifying Conditions ISOM List all students in ... select * from STUDENT where city in (‘Boston’,’Atlanta’); List all students in ... select * from STUDENT where zip not between 60115 and 60123; Pattern Matching ISOM ‘%’ any string with n characters, n>=0 ‘_’ any single character. x exact sequence of string x. List all CIS courses. select * from COURSE where course# like ‘CIS%’; List all CIS 3200 level courses. select * from COURSE where course# like ? ; Missing or Incomplete Information ISOM •List all students whose address or telephone number is missing: select * from STUDENT where Address is null or GPA is null; Horizontal and Vertical ISOM Query: List all student ID, names and addresses who have GPA > 3.0 and date of birth before Jan 1, 1980. select StudentID, Name, Address from STUDENT where GPA > 3.0 and DOB < „1-Jan-80‟ order by Name DESC; Algebra: StudentID,name, address ( GPA > 3.0 and DOB < „1-Jan-80‟ (STUDENT)) Calculus: {t.StudentID, t.name, t.address | t Student t.GPA > 3.0 t.DOB < „1-Jan-80‟} Order by sorts result in descending (DESC) order. Note: The default order is ascending (ASC) as in: order by Name; Union, Intersection, Set-Difference ISOM sid sname gpa age 22 dustin 3.5 25.0 • All of these operations take 31 lubber 3.8 25.5 two input relations, which 58 rusty 4.0 23.0 44 guppy 3.5 25.5 must be union-compatible: 28 yuppy 3.9 24.0 Same number of fields. S1 S2 `Corresponding‟ fields sid sname gpa age have the same type. 31 lubber 3.8 25.5 58 rusty 4.0 23.0 • What is the schema of result? S1 S2 sid sname gpa age 22 dustin 3.5 25.0 S1 S2 Union ISOM Liststudents who live in Atlanta or GPA > 3.0 select StudentID, Name, DOB, Address from STUDENT where Address = ‘Atlanta’ union select StudentID, Name, DOB, Address from STUDENT where GPA > 3.0; Can we perform a Union on any two Relations ? Union Compatibility ISOM Two relations, A and B, are union-compatible if A and B contain a same number of attributes, and The corresponding attributes of the two have the same domains Examples CIS=Student (ID: Did; Name: Dname; Address: Daddr; Grade: Dgrade); Senior-Student (SName: Dname; S#: Did; Home: Daddr; Grade: Dgrade); Course (C#: Dnumber; Title: Dstr; Credits: Dnumber) Are CIS-Student and Senior-Student union compatible? Are CIS-Student and Course union compatible? What happens if we have duplicate tuples? What will be the column names in the resulting Relation? Union, Intersect, Minus ISOM select CUSTNAME, ZIP select CUSTNAME, ZIP from CUSTOMER from CUSTOMER where STATE = ‘MA’ where STATE = ‘MA’ UNION INTERSECT select SUPNAME, ZIP select SUPNAME, ZIP from SUPPLIER from SUPPLIER where STATE = ‘MA’ where STATE = ‘MA’ ORDER BY 2; ORDER BY 2; select CUSTNAME, ZIP B B from CUSTOMER A where STATE = ‘MA’ A MINUS select SUPNAME, ZIP B from SUPPLIER where STATE = ‘MA’ A ORDER BY 2; Cross-Product ISOM • Each row of S1 is paired with each row of R1. • Result schema has one field per field of S1 and R1, with field names `inherited‟ if possible. Conflict: Both S1 and R1 have a field called sid. (sid) sname GPA Age (sid) cid semester 22 dustin 3.5 25.0 22 101 Fall 99 22 dustin 3.5 25.0 58 103 Spring 99 31 lubber 3.8 25.5 22 101 Fall 99 31 lubber 3.8 25.5 58 103 Spring 99 58 rusty 4.0 23.0 22 101 Fall 99 58 rusty 4.0 23.0 58 103 Spring 99 Renaming operator: (C(1 sid1, 5 sid 2), S1 R1) Joins ISOM • Condition Join: R c S c ( R S) (sid) sname GPA age (sid) cid Semester 22 dustin 3.5 25.0 58 103 Spring 99 31 lubber 3.8 25.5 58 103 Spring 99 S1 R1 S1. sid R1. sid • Result schema same as that of cross- product. • Fewer tuples than cross-product, might be able to compute more efficiently • Sometimes called a theta-join. Joins ISOM • Equi-Join: A special case of condition join where the condition c contains only equalities. sid sname GPA age cid semester 22 dustin 3.5 25.0 101 Fall 99 58 rusty 4.0 23.0 103 Spring 99 S1 R1 sid • Result schema similar to cross-product, but only one copy of fields for which equality is specified. • Natural Join: Equijoin on all common fields. Find names of students who have taken course #103 ISOM Solution 1: sname(( Re gisters) Students) cid 103 Solution 2: (Temp1, Re gisters) cid 103 (Temp2,Temp1 Students) sname (Temp2) Solution 3: sname( (Re gisters Students)) cid 103 Connecting/Linking Relations ISOM List information about all students and the classes they are taking Student ID Name *** s1 Jose *** s2 Alice *** Class s3 Tome *** *** *** *** Emp# ID C# *** e1 s1 BA 201 *** e3 s2 CIS 300 *** e2 s3 CIS 304 *** *** *** *** What can we use to connect/link Relations? Join: Connecting relations so that relevant tuples can be retrieved. Join ISOM R1 R2 Cartesian Product Student: 30 tuples Class: 4 tuples Total Number of Tuples in the Cartesian Product. ? (match each tuple of student to every tuple of class) Select tuples having identical Student Ids. Expected number of such Tuples: Join Selectivity R1 R2 Join Forms ISOM • General Join Forms select s.*, c.* Equijoin from STUDENT s, CLASS c where s.StudentID = c. SID; Operator Dependent • Natural Join R1 R2 • Outer Join = Left x > y Right <> Full ... select s.*, c.* from STUDENT s, CLASS c where s.StudentID = c.SID (+); Find names of students who have taken a CIS course ISOM • Information about departments only available in Courses; so need an extra join: sname(( Courses) Re gisters Students) dept 'CIS ' A more efficient solution: sname( (( Courses) Re gis) Students) sid cid dept 'CIS ' A query optimizer can find this given the first solution! Find students who have taken an MIS or a CS course ISOM • Can identify all MIS or CS courses, then find students who have taken one of these courses: (Temp1,( Courses)) dept 'MIS 'dept 'CS ' sname(Temp1 Re gis Students) Can also define Temp1 using union! (How?) What happens if is replaced by in this query? Find students who have taken a CIS and an ECI Course ISOM • Previous approach won‟t work! Must identify students who have taken CIS courses, students who have taken ECI courses, then find the intersection (note that sid is a key for Students): (Temp1, (( Courses) Re gis)) sid dept 'CIS ' (Temp2, (( Courses) Re gis)) sid dept 'ECI ' sname((Temp1Temp2) Students) Relational Calculus ISOM • Comes in two flavours: Tuple relational calculus (TRC) and Domain relational calculus (DRC). • Calculus has variables, constants, comparison ops, logical connectives and quantifiers. TRC: Variables range over (i.e., get bound to) tuples. DRC: Variables range over domain elements (= field values). Both TRC and DRC are simple subsets of first-order logic. • Expressions in the calculus are called formulas. An answer tuple is essentially an assignment of constants to variables that make the formula evaluate to true. Find students with GPA > 3.7 who have taken a CIS Course ISOM TRC: t |tStudents t.GPA 3.7 r rRe gisr.sid t.sid c cCoursesc.cid r.cid c.dept 'CIS ' DRC: I , N ,G, A | I , N ,G, A Students G 3.7 Ir,Cr,S Ir,Cr,S Re gisIr I C,CN , D C,CN , D Courses C Cr D 'CIS ' Find students who have taken all CIS courses ISOM TRC: t|tStudents c cCourses ^c.dept 'CIS ' r rRe gisr.sid t.sid r.cid c.cid DRC: I , N ,G, A | I , N ,G, A Students C,CN ,D C,CN , D Courses ^ D 'CIS ' Ir,Cr,S Ir,Cr,S Re gisI Ir Cr C How will you do this with Relational Algebra? Monotonic and Non-Monotonic Queries ISOM • Monotonic queries: queries for which the size of the results either increase or stay the same as the size of the inputs increase. The result size never decreases • Non-monotonic queries: queries for which it is possible that the size of the result will DECREASE when the size of the input increases • Examples of each? • Which of the algebra operations is non-monotonic? • What does this signify? Summaries and Aggregates ISOM Calculate the average GPA select avg. (GPA) from STUDENT, Find the lowest GPA select min (GPA) as minGPA from STUDENT, How many CIS majors? select count (StudentId) from STUDENT where major=‘CIS’; Discarding duplicates select avg (distinct GPA) STUDENT where major=‘CIS’ (is this above query correct?) Aggregate Functions ISOM COUNT (attr) - a simple count of values in attr SUM (attr) - sum of values in attr AVG (attr) - average of values in attr MAX (attr) - maximum value in attr MIN (attr) - minimum value in attr Take effect after all the data is retrieved from the database Applied to either the entire resulting relation or groups Can’t be involved in any query qualifications (where clause) Would the following query be permitted? select StudentId from STUDENT where GPA = max (GPA); Grouping Results Obtained ISOM Show all students enrolled in each course. select cno, StudentID from REGISTRATION group by cno; Is this grouping OK? Calculate the average GPA of students by county. select county, avg (GPA) as CountyGPA from STUDENT group by county; Calculate the enrollment of each class. select cno, year , term, count (StudentID) as enroll from REGISTRATION group by cno, year, term; Selections on Groups ISOM Show all CIS courses that are full. select cno, count (StudentID) from REGISTRATION group by cno having count (StudentID) > 29; Grouping Results after Join ISOM Calculate the average GPA of each class select course#, avg (GPA) from STUDENT S, CLASS C where S.StudentID = C.SID group by course#, Nesting Queries ISOM SELECT attribute(s) FROM relation(S) WHERE attr [not] {in | comparison operator | exists } ( query statement(s) ); List names of students who are taking “BA201” select Name from Student where StudentID in ( select StudentID from REGISTRATION where course#=„BA201‟); Sub Queries ISOM List all students enrolled in CIS courses select name from STUDENT where StudentId in (select StudentId from REGISTRATION where cno like ‘CIS%’); List all courses taken by Student (Id 1011) select cname from COURSE where cnum = any (select cno from REGISTRATION where StudentId = 1011); Sub Queries ISOM Who received the highest grade in CIS 8140 select StudentId from REGISTRATION where cnum = ‘CIS 8140’ and grade >=all (select grade from REGISTRATION where cno = ‘CIS 8140’); List all students enrolled in CIS courses. select name from STUDENT S where exists (select * from REGISTRATION where StudentId = S.StudentId and cno like ‘CIS%’); Relational Views ISOM • Relations derived from other relations. • Views have no stored tuples. • Are useful to provide multiple user views. View 1 View 2 View N Base Base Relation 1 Relation 2 •What level in the three layer model do views belong? •Which kind of independence do they support? View Creation ISOM Create View view-name [ ( attr [ , attr ] ...) ] AS subquery [ with check option ] ; DROP VIEW view-name; Create a view containing the student ID, Name, Age and GPA for those who are qualified to take 300 level courses, i.e., GPA >=2.0. View Options ISOM • With Check Option enforces the query condition for insertion or update To enforce the GPA >=2.0 condition on all new student tuples inserted into the view • A view may be derived from multiple base relations Create a view that includes student IDs, student names and their instructors‟ names for all CIS 300 students. View Retrieval ISOM Queries on views are the same as that on base relations. Queries on views are expanded into queries on their base relations. select Name, Instructor-Name from CIS300-Student where Name = Instructor-Name; View: Update ISOM Update on a view actually changes its base relation(s)! update Qualified-Student set GPA = GPA-0.1 where StudentID = „s3‟; insert into Qualified-Student values ( „s9‟, „Lisa‟, 4.0 ) insert into Qualified-Student values ( „s10‟, „Peter‟, 1.7 ) Why are some views not updateable? What type of views are updateable? Non-monotonic queries – again! ISOM • Need to use either MINUS or NOT EXISTS! • Find courses where no student has gpa over 3.5 • Find students who have taken all courses that Joe has taken • How would you solve these? Summary ISOM • SQL is a low-complexity, declarative query language • The good thing about being declarative is that internally the query can be changed automatically for optimization • Good thing about being low-complexity? No SQL query ever goes into an infinite loop No SQL query will ever take indefinite amount of space to get the solution • Can be used for highly complex problems!