# SQL by wuyunyi

VIEWS: 84 PAGES: 54

• pg 1
```									ISOM

MIS415 Module 2
Query Languages –SQL

Arijit Sengupta
Structure of this quarter
ISOM

MIS415

0. Intro           1. Design        2. Querying    3. Applications     4. Advanced
Topics
Database          Conceptual
Query          Java DB
Fundamentals         Modeling                                           Transaction
Languages     Applications –      Management
JDBC
Relational
Data
SQL
Mining
Normalization

Newbie          Users   Designers          Developers                    Professionals
Today‟s Buzzwords
ISOM

•   Query Languages
•   Formal Query Languages
•   Procedural and Declarative Languages
•   Relational Algebra
•   Relational Calculus
•   SQL
•   Aggregate Functions
•   Nested Queries
Objectives
ISOM

At the end of the lecture, you should
• Get a formal as well as practical perspective on query
languages
• Have a background on query language basics (how they
• Be able to write simple SQL queries from the specification
• Be able to look at SQL queries and understand what it is
supposed to do
• Be able to write complex SQL queries involving nesting
• Execute queries on a database system
Set Theory Basics
ISOM

• A set: a collection of distinct items with
no particular order
• Set description:
 { b | b is a Database Book}
 {c | c is a city with a population of over a
million}
 {x | 1 < x < 10 and x is a natural number}
• Most basic set operation:
 Membership: x  S (read as x belongs to S if
x is in the set S)
Other Set Operations
ISOM

• Addition, deletion (note that adding an existing
item in the set does not change it)
• Set mathematics:
 Union R  S = { x | x  R or x  S}
 Intersection R  S = { x | x R and x  S}
 Set Difference R – S = { x | x  R and x  S}
 Cross-product R x S = { <x,y> | x  R and y  S}
• You can combine set operations much like
arithmetic operations: R – (S  T)
• Usually no well-defined precedence
Relational Query Languages
ISOM

• Query languages: Allow manipulation and retrieval
of data from a database.
• Relational model supports simple, powerful QLs:
   Strong formal foundation based on logic.
   Allows for much optimization.
• Query Languages != programming languages!
   QLs not expected to be “Turing complete”.
   QLs not intended to be used for complex calculations.
   QLs support easy, efficient access to large data sets.
Formal Relational Query
Languages
ISOM

Two mathematical Query Languages form the
basis for “real” languages (e.g. SQL), and for
implementation:
Relational Algebra: More operational, very
useful for representing execution plans.
Relational Calculus: Lets users describe
what they want, rather than how to compute
it. (Non-operational, declarative.)

 Understanding Algebra & Calculus is key to
understanding SQL, query processing!
Structured Query Language
ISOM

• Need for SQL
 Operations on Data Types
 Definition Manipulation
 Operations on Sets
 Declarative (calculus) vs. Procedural (algebra)
• Evolution of SQL
 SEQUEL ..SQL_92 .. SQL_93
 SQL Dialects

Does SQL treat Relations as „Sets‟?
Preliminaries
ISOM

• A query is applied to relation instances, and
the result of a query is also a relation
instance.
   Schemas of input relations for a query are fixed
(but query will run regardless of instance!)
   The schema for the result of a given query is also
fixed! Determined by definition of query language
constructs.
• Positional vs. named-field notation:
   Positional notation easier for formal definitions,
named-field notation more readable.
   Both used in SQL
Example Instances
ISOM

• Students, Registers, Courses relations for our examples.

cid cname    dept
R1    sid cid   semester        C1
22 101    Fall 99               101 Database CIS
58 103    Spring 99             103 Internet ECI

S1   sid   sname GPA    age         S2   sid   sname GPA    age
28    yuppy  3.9   24.0
22    dustin 3.5   25.0             31    lubber 3.8   25.5
31    lubber 3.8   25.5             44    guppy  3.5   25.5
58    rusty  4.0   23.0             58    rusty  4.0   23.0
Relational Algebra
ISOM

• Basic operations:

Selection ( ) Selects a subset of rows from relation.


   Projection ( ) Deletes unwanted columns from relation.


   Cross-product ( ) Allows us to combine two relations.
   Set-difference ( ) Tuples in reln. 1, but not in reln. 2.
   Union (  ) Tuples in reln. 1 and in reln. 2.
   Intersection, join, division, renaming: Not essential, but
(very!) useful.
• Since each operation returns a relation, operations
can be composed! (Algebra is “closed”.)
Projection
ISOM

• Deletes attributes that are not in
sname     GPA
projection list.
yuppy     3.9
• Schema of result contains
lubber    3.8
exactly the fields in the               guppy     3.5
projection list, with the same          rusty     4.0
 sname, gpa(S 2)
names that they had in the
(only) input relation.
• Projection operator has to                       age
eliminate duplicates! (Why??)                    24.0
 Note: real systems typically                  25.5
don‟t do duplicate elimination                23.0
unless the user explicitly asks
for it. (Why not?)
 age(S 2)
Vertical Slices
ISOM

• Projection                         Algebra: projection
 Specifying Elements
<A1,A2,...Am> (R)

 No Specification            Conditional
List all information about   List IDs, names, and addresses of
Students                     all students

select    *              select StudentID, name, address
from      STUDENT;       from STUDENT;

(Student)
 StudentID, name, address (Student)
Does SQL treat Relations as
„Sets‟?
ISOM

What are the different salaries we pay to our
employees?

select     salary
from       EMPLOYEE;

OR is the following better?

select     DISTINCT salary
from       EMPLOYEE;
Selection
ISOM
sid sname GPA          age
• Selects rows that satisfy      28 yuppy 3.9           35.0
selection condition.           58 rusty 4.0           35.0
• No duplicates in result!
(Why?)                             gpa3.8(S 2)
• Schema of result identical
to schema of (only) input         sname    GPA
relation.                         yuppy    3.9
rusty    4.0
• Result relation can be the
input for another relational    sname,gpa(          (S 2))
algebra operation!                             gpa3.8
(Operator composition.)
Horizontal Slices
ISOM

• Restriction                    Algebra: selection
 Specifying Conditions         or restriction
(R)

 Unconditional
 Conditional
List all students
List all students with GPA > 3.0
select *
select   *
from STUDENT
from     STUDENT;
where GPA > 3.0;

(Student)
   GPA > 3.0   (Student)
Specifying Conditions
ISOM

List all students in ...
select *
from     STUDENT
where city in (‘Boston’,’Atlanta’);

List all students in ...
select *
from     STUDENT
where zip not between 60115 and 60123;
Pattern Matching
ISOM

‘%’   any string with n characters, n>=0
‘_’   any single character.
x    exact sequence of string x.

List all CIS courses.
select *
from    COURSE
where course# like ‘CIS%’;

List all CIS 3200 level courses.
select *
from    COURSE
where course# like ? ;
Missing or Incomplete Information
ISOM

•List all students whose address or telephone number is missing:

select   *
from     STUDENT
where    Address is null or GPA is null;
Horizontal and Vertical
ISOM

Query:
List all student ID, names and addresses who have
GPA > 3.0 and date of birth before Jan 1, 1980.
select               StudentID, Name, Address
from                 STUDENT
where                GPA > 3.0 and DOB < „1-Jan-80‟
order by             Name DESC;

Algebra:  StudentID,name, address ( GPA > 3.0 and DOB < „1-Jan-80‟ (STUDENT))
Calculus: {t.StudentID, t.name, t.address | t  Student t.GPA > 3.0 
t.DOB < „1-Jan-80‟}
Order by sorts result in descending (DESC) order.
Note: The default order is ascending (ASC) as in:
order by Name;
Union, Intersection, Set-Difference
ISOM
sid    sname     gpa   age
22     dustin    3.5   25.0
• All of these operations take     31     lubber    3.8   25.5
two input relations, which       58     rusty     4.0   23.0
44     guppy     3.5   25.5
must be union-compatible:        28     yuppy     3.9   24.0

 Same number of fields.                   S1 S2
 `Corresponding‟ fields            sid sname gpa      age
have the same type.              31 lubber 3.8      25.5
58 rusty 4.0       23.0
• What is the schema of
result?                                    S1 S2
sid     sname     gpa    age
22      dustin    3.5    25.0

S1 S2
Union
ISOM

Liststudents who live in Atlanta or GPA > 3.0
select    StudentID, Name, DOB, Address
from      STUDENT
where     Address = ‘Atlanta’
union
select    StudentID, Name, DOB, Address
from      STUDENT
where     GPA > 3.0;

Can we perform a Union on any two Relations ?
Union Compatibility
ISOM

Two   relations, A and B, are union-compatible
if
A and B contain a same number of attributes, and
The corresponding attributes of the two have the same domains
Examples
Course (C#: Dnumber; Title: Dstr; Credits: Dnumber)

Are CIS-Student and Senior-Student union compatible?
Are CIS-Student and Course union compatible?

What happens if we have duplicate tuples?
What will be the column names in the resulting Relation?
Union, Intersect, Minus
ISOM

select  CUSTNAME, ZIP             select  CUSTNAME, ZIP
from    CUSTOMER                  from    CUSTOMER
where   STATE = ‘MA’              where   STATE = ‘MA’
UNION                             INTERSECT
select  SUPNAME, ZIP              select  SUPNAME, ZIP
from    SUPPLIER                  from    SUPPLIER
where   STATE = ‘MA’              where   STATE = ‘MA’
ORDER BY 2;                       ORDER BY 2;

select  CUSTNAME, ZIP                    B
B
from    CUSTOMER
A             where   STATE = ‘MA’                         A
MINUS
select  SUPNAME, ZIP
B
from    SUPPLIER
where   STATE = ‘MA’              A
ORDER BY 2;
Cross-Product
ISOM
• Each row of S1 is paired with each row of R1.
• Result schema has one field per field of S1 and R1,
with field names `inherited‟ if possible.
 Conflict: Both S1 and R1 have a field called sid.

(sid) sname GPA      Age (sid) cid     semester
22   dustin   3.5   25.0   22   101   Fall 99
22   dustin   3.5   25.0   58   103   Spring 99
31   lubber   3.8   25.5   22   101   Fall 99
31   lubber   3.8   25.5   58   103   Spring 99
58   rusty    4.0   23.0   22   101   Fall 99
58   rusty    4.0   23.0   58   103   Spring 99

 Renaming operator:  (C(1 sid1, 5  sid 2), S1  R1)
Joins
ISOM

• Condition Join:         R  c S   c ( R  S)


(sid)   sname    GPA age (sid) cid Semester
22      dustin   3.5 25.0 58   103 Spring 99
31      lubber   3.8 25.5 58   103 Spring 99
S1                        R1
S1. sid  R1. sid
• Result schema same as that of cross-
product.
• Fewer tuples than cross-product, might be
able to compute more efficiently
• Sometimes called a theta-join.
Joins
ISOM

• Equi-Join: A special case of condition join where the
condition c contains only equalities.
sid    sname     GPA      age      cid   semester
22     dustin    3.5      25.0     101   Fall 99
58     rusty     4.0      23.0     103   Spring 99
S1          R1
sid
• Result schema similar to cross-product, but only one
copy of fields for which equality is specified.
• Natural Join: Equijoin on all common fields.
Find names of students who have taken
course #103
ISOM

   Solution 1:
 sname((                        
Re gisters) Students)
cid 103
       Solution 2:
 (Temp1,          Re gisters)
cid 103
 (Temp2,Temp1 Students)

 sname (Temp2)
   Solution 3:
 sname(                        
(Re gisters  Students))
cid 103
ISOM

List   information about all students and the classes they are taking
Student
ID       Name        ***
s1       Jose        ***
s2       Alice       ***
Class
s3       Tome        ***
***       ***        ***          Emp#        ID       C#          ***
e1          s1      BA 201       ***
e3          s2     CIS 300       ***
e2          s3     CIS 304       ***
***         ***                  ***
What can we use to connect/link Relations?
Join: Connecting relations so that relevant tuples can be retrieved.
Join
ISOM

R1               R2
Cartesian
Product

Student: 30 tuples          Class: 4 tuples
Total Number of Tuples in the Cartesian Product. ?
(match each tuple of student to every tuple of class)

Select tuples having identical Student Ids.
Expected number of such Tuples:            Join Selectivity
R1                 R2
Join Forms
ISOM

• General Join Forms            select   s.*, c.*
 Equijoin                 from     STUDENT s, CLASS c
where    s.StudentID = c. SID;
 Operator Dependent
• Natural Join
R1       R2
• Outer Join                                             =
 Left                                            x > y
 Right                                            <>
 Full                                              ...
select   s.*, c.*
from     STUDENT s, CLASS c
where    s.StudentID = c.SID (+);
Find names of students who have taken a
CIS course
ISOM

• Information about departments only available in
Courses; so need an extra join:

 sname((                                   
Courses) Re gisters Students)
dept 'CIS '
   A more efficient solution:
 sname(         ((                                    
Courses) Re gis)  Students)
sid         cid dept 'CIS '

 A query optimizer can find this given the first solution!
Find students who have taken an MIS or a CS
course
ISOM

• Can identify all MIS or CS courses, then find students
who have taken one of these courses:
 (Temp1,(                               Courses))
dept 'MIS 'dept 'CS '
 sname(Temp1 Re gis  Students)
        

   Can also define Temp1 using union! (How?)

   What happens if  is replaced by  in this query?
Find students who have taken a CIS and an ECI
Course
ISOM

• Previous approach won‟t work! Must identify
students who have taken CIS courses, students who
have taken ECI courses, then find the intersection
(note that sid is a key for Students):

 (Temp1,           ((                        
Courses) Re gis))
sid         dept 'CIS '
 (Temp2,           ((                        
Courses) Re gis))
sid         dept 'ECI '
 sname((Temp1Temp2) Students)

Relational Calculus
ISOM

• Comes in two flavours: Tuple relational calculus
(TRC) and Domain relational calculus (DRC).
• Calculus has variables, constants, comparison ops,
logical connectives and quantifiers.
   TRC: Variables range over (i.e., get bound to) tuples.
   DRC: Variables range over domain elements (= field values).
   Both TRC and DRC are simple subsets of first-order logic.
• Expressions in the calculus are called formulas. An
answer tuple is essentially an assignment of
constants to variables that make the formula evaluate
to true.
Find students with GPA > 3.7 who have
taken a CIS Course
ISOM

TRC:         



t |tStudents t.GPA  3.7


r rRe gisr.sid  t.sid 






c cCoursesc.cid  r.cid c.dept 'CIS '




 
 
 
 
                                       

DRC:




I , N ,G, A | I , N ,G, A Students G  3.7


Ir,Cr,S Ir,Cr,S Re gisIr  I 






C,CN , D C,CN , D Courses C  Cr  D 'CIS '
                                       
                                       
                                       
                                       
                                       
Find students who have taken all
CIS courses
ISOM

TRC:       t|tStudents 






c cCourses ^c.dept 'CIS '








r rRe gisr.sid  t.sid r.cid  c.cid




  
  
  
  
                                               

DRC:




I , N ,G, A | I , N ,G, A Students


C,CN ,D C,CN , D Courses ^ D 'CIS '






Ir,Cr,S Ir,Cr,S Re gisI  Ir Cr C
                                                     
                                                     
                                                     
                                                     
                                                     

How will you do this with Relational Algebra?
Monotonic and Non-Monotonic
Queries
ISOM

• Monotonic queries: queries for which the size of
the results either increase or stay the same as the
size of the inputs increase. The result size never
decreases
• Non-monotonic queries: queries for which it is
possible that the size of the result will DECREASE
when the size of the input increases
• Examples of each?

• Which of the algebra operations is non-monotonic?
• What does this signify?
Summaries and Aggregates
ISOM

Calculate the average GPA   select avg. (GPA)
from STUDENT,

Find the lowest GPA         select min (GPA) as minGPA
from STUDENT,

How many CIS majors?        select    count (StudentId)
from      STUDENT
where     major=‘CIS’;

Discarding duplicates       select avg (distinct GPA)
STUDENT
where major=‘CIS’
(is this above query correct?)
Aggregate Functions
ISOM

COUNT (attr)                           - a simple count of values in
attr
SUM (attr)                             -   sum of values in attr
AVG (attr)                             -   average of values in attr
MAX (attr)                             -   maximum value in attr
MIN (attr)                             -   minimum value in attr

Take effect after all the data is retrieved from the database
Applied to either the entire resulting relation or groups
Can’t be involved in any query qualifications (where clause)

Would   the following query be permitted?
select    StudentId
from      STUDENT
where    GPA = max (GPA);
Grouping Results Obtained
ISOM

Show all students enrolled in each course.
select    cno, StudentID
from      REGISTRATION
group by cno;                           Is this grouping OK?
Calculate the average GPA of students by county.
select    county, avg (GPA) as CountyGPA
from      STUDENT
group by county;
Calculate the enrollment of each class.
select    cno, year , term, count (StudentID) as enroll
from      REGISTRATION
group by cno, year, term;
Selections on Groups
ISOM

   Show all CIS courses that are full.
select         cno, count (StudentID)
from           REGISTRATION
group by       cno
having         count (StudentID) > 29;
Grouping Results after Join
ISOM

Calculate   the average GPA of each class

select          course#, avg (GPA)
from            STUDENT S, CLASS C
where           S.StudentID = C.SID
group by        course#,
Nesting Queries
ISOM

SELECT      attribute(s)
FROM        relation(S)
WHERE       attr [not] {in | comparison operator | exists
}
( query statement(s) );

List names of students who are taking “BA201”
select       Name
from                  Student
where        StudentID in
( select         StudentID
from           REGISTRATION
where course#=„BA201‟);
Sub Queries
ISOM

List all students enrolled in CIS courses
select name
from    STUDENT
where StudentId in
(select           StudentId
from              REGISTRATION
where             cno like ‘CIS%’);

List all courses taken by Student (Id 1011)
select cname
from     COURSE
where cnum = any
(select           cno
from              REGISTRATION
where             StudentId = 1011);
Sub Queries
ISOM

Who received the highest grade in CIS 8140
select    StudentId
from      REGISTRATION
where     cnum = ‘CIS 8140’ and
from      REGISTRATION
where     cno = ‘CIS 8140’);
List all students enrolled in CIS courses.
select   name
from     STUDENT S
where    exists
(select *
from       REGISTRATION
where      StudentId = S.StudentId
and cno like ‘CIS%’);
Relational Views
ISOM

• Relations derived from other relations.
• Views have no stored tuples.
• Are useful to provide multiple user views.

View 1                View 2                 View N

Base                   Base
Relation 1             Relation 2

•What level in the three layer model do views belong?
•Which kind of independence do they support?
View Creation
ISOM

Create View view-name [ ( attr [ , attr ] ...) ]
AS subquery
[ with check option ] ;

DROP VIEW view-name;

 Create a view containing the student ID,
Name, Age and GPA for those who are
qualified to take 300 level courses, i.e., GPA
>=2.0.
View Options
ISOM

• With Check Option enforces the query
condition for insertion or update
To enforce the GPA >=2.0 condition on
all new student tuples inserted into the
view

• A view may be derived from multiple
base relations
Create a view that includes student IDs, student names
and their instructors‟ names for all CIS 300 students.
View Retrieval
ISOM

Queries on views are the same as that on base
relations.
Queries on views are expanded into queries on
their base relations.

select      Name, Instructor-Name
from        CIS300-Student
where       Name = Instructor-Name;
View: Update
ISOM

Update on a view actually changes its base relation(s)!

update Qualified-Student
set            GPA = GPA-0.1
where          StudentID = „s3‟;
insert into    Qualified-Student
values         ( „s9‟, „Lisa‟, 4.0 )
insert into    Qualified-Student
values         ( „s10‟, „Peter‟, 1.7 )

Why are some views not updateable?
What type of views are updateable?
Non-monotonic queries – again!
ISOM

• Need to use either MINUS or NOT
EXISTS!
• Find courses where no student has
gpa over 3.5
• Find students who have taken all
courses that Joe has taken
• How would you solve these?
Summary
ISOM

• SQL is a low-complexity, declarative query
language
• The good thing about being declarative is
that internally the query can be changed
automatically for optimization
• Good thing about being low-complexity?
 No SQL query ever goes into an infinite loop
 No SQL query will ever take indefinite amount of
space to get the solution
• Can be used for highly complex problems!

```
To top