Docstoc

SQL

Document Sample
SQL Powered By Docstoc
					ISOM




         MIS415 Module 2
       Query Languages –SQL

          Arijit Sengupta
                       Structure of this quarter
ISOM

                                           MIS415


        0. Intro           1. Design        2. Querying    3. Applications     4. Advanced
                                                                                  Topics
         Database          Conceptual
                                              Query          Java DB
       Fundamentals         Modeling                                           Transaction
                                             Languages     Applications –      Management
                                                               JDBC
                           Relational
                            Model            Advanced
                                                                                   Data
                                               SQL
                                                                                  Mining
                           Normalization



   Newbie          Users   Designers          Developers                    Professionals
                 Today‟s Buzzwords
ISOM



       •   Query Languages
       •   Formal Query Languages
       •   Procedural and Declarative Languages
       •   Relational Algebra
       •   Relational Calculus
       •   SQL
       •   Aggregate Functions
       •   Nested Queries
                              Objectives
ISOM

       At the end of the lecture, you should
       • Get a formal as well as practical perspective on query
          languages
       • Have a background on query language basics (how they
          came about)
       • Be able to write simple SQL queries from the specification
       • Be able to look at SQL queries and understand what it is
          supposed to do
       • Be able to write complex SQL queries involving nesting
       • Execute queries on a database system
                   Set Theory Basics
ISOM

       • A set: a collection of distinct items with
         no particular order
       • Set description:
           { b | b is a Database Book}
           {c | c is a city with a population of over a
            million}
           {x | 1 < x < 10 and x is a natural number}
       • Most basic set operation:
           Membership: x  S (read as x belongs to S if
            x is in the set S)
                 Other Set Operations
ISOM

       • Addition, deletion (note that adding an existing
         item in the set does not change it)
       • Set mathematics:
          Union R  S = { x | x  R or x  S}
          Intersection R  S = { x | x R and x  S}
          Set Difference R – S = { x | x  R and x  S}
          Cross-product R x S = { <x,y> | x  R and y  S}
       • You can combine set operations much like
         arithmetic operations: R – (S  T)
       • Usually no well-defined precedence
              Relational Query Languages
ISOM

       • Query languages: Allow manipulation and retrieval
         of data from a database.
       • Relational model supports simple, powerful QLs:
             Strong formal foundation based on logic.
             Allows for much optimization.
       • Query Languages != programming languages!
             QLs not expected to be “Turing complete”.
             QLs not intended to be used for complex calculations.
             QLs support easy, efficient access to large data sets.
               Formal Relational Query
                    Languages
ISOM

       Two mathematical Query Languages form the
        basis for “real” languages (e.g. SQL), and for
        implementation:
       Relational Algebra: More operational, very
        useful for representing execution plans.
       Relational Calculus: Lets users describe
        what they want, rather than how to compute
        it. (Non-operational, declarative.)

        Understanding Algebra & Calculus is key to
         understanding SQL, query processing!
            Structured Query Language
ISOM

       • Need for SQL
          Operations on Data Types
          Definition Manipulation
          Operations on Sets
          Declarative (calculus) vs. Procedural (algebra)
       • Evolution of SQL
          SEQUEL ..SQL_92 .. SQL_93
          SQL Dialects


            Does SQL treat Relations as „Sets‟?
                           Preliminaries
ISOM

       • A query is applied to relation instances, and
         the result of a query is also a relation
         instance.
            Schemas of input relations for a query are fixed
             (but query will run regardless of instance!)
            The schema for the result of a given query is also
             fixed! Determined by definition of query language
             constructs.
       • Positional vs. named-field notation:
            Positional notation easier for formal definitions,
             named-field notation more readable.
            Both used in SQL
                       Example Instances
ISOM

       • Students, Registers, Courses relations for our examples.

                                             cid cname    dept
       R1    sid cid   semester        C1
             22 101    Fall 99               101 Database CIS
             58 103    Spring 99             103 Internet ECI




        S1   sid   sname GPA    age         S2   sid   sname GPA    age
                                                 28    yuppy  3.9   24.0
             22    dustin 3.5   25.0             31    lubber 3.8   25.5
             31    lubber 3.8   25.5             44    guppy  3.5   25.5
             58    rusty  4.0   23.0             58    rusty  4.0   23.0
                         Relational Algebra
ISOM


       • Basic operations:
                         
              Selection ( ) Selects a subset of rows from relation.
                          
          
             Projection ( ) Deletes unwanted columns from relation.
                              
                               
             Cross-product ( ) Allows us to combine two relations.
             Set-difference ( ) Tuples in reln. 1, but not in reln. 2.
             Union (  ) Tuples in reln. 1 and in reln. 2.
       • Additional operations:
             Intersection, join, division, renaming: Not essential, but
              (very!) useful.
       • Since each operation returns a relation, operations
         can be composed! (Algebra is “closed”.)
                            Projection
ISOM

       • Deletes attributes that are not in
                                                 sname     GPA
         projection list.
                                                 yuppy     3.9
       • Schema of result contains
                                                 lubber    3.8
         exactly the fields in the               guppy     3.5
         projection list, with the same          rusty     4.0
                                               sname, gpa(S 2)
         names that they had in the
         (only) input relation.
       • Projection operator has to                       age
         eliminate duplicates! (Why??)                    24.0
           Note: real systems typically                  25.5
            don‟t do duplicate elimination                23.0
            unless the user explicitly asks
            for it. (Why not?)
                                                     age(S 2)
                        Vertical Slices
ISOM

       • Projection                         Algebra: projection
           Specifying Elements
                                                <A1,A2,...Am> (R)

            No Specification            Conditional
           List all information about   List IDs, names, and addresses of
           Students                     all students

               select    *              select StudentID, name, address
               from      STUDENT;       from STUDENT;

                  (Student)
                                         StudentID, name, address (Student)
             Does SQL treat Relations as
                      „Sets‟?
ISOM


       What are the different salaries we pay to our
        employees?

         select     salary
         from       EMPLOYEE;

         OR is the following better?

         select     DISTINCT salary
         from       EMPLOYEE;
                          Selection
ISOM
                                    sid sname GPA          age
   • Selects rows that satisfy      28 yuppy 3.9           35.0
     selection condition.           58 rusty 4.0           35.0
   • No duplicates in result!
     (Why?)                             gpa3.8(S 2)
   • Schema of result identical
     to schema of (only) input         sname    GPA
     relation.                         yuppy    3.9
                                       rusty    4.0
   • Result relation can be the
     input for another relational    sname,gpa(          (S 2))
     algebra operation!                             gpa3.8
     (Operator composition.)
              Horizontal Slices
ISOM


       • Restriction                    Algebra: selection
           Specifying Conditions         or restriction
                                                (R)

           Unconditional
                                     Conditional
          List all students
                                    List all students with GPA > 3.0
                                         select *
              select   *
                                         from STUDENT
              from     STUDENT;
                                         where GPA > 3.0;

               (Student)
                                            GPA > 3.0   (Student)
                      Specifying Conditions
ISOM


       List all students in ...
                 select *
                 from     STUDENT
                 where city in (‘Boston’,’Atlanta’);



                    List all students in ...
                              select *
                              from     STUDENT
                              where zip not between 60115 and 60123;
                       Pattern Matching
ISOM


       ‘%’   any string with n characters, n>=0
       ‘_’   any single character.
        x    exact sequence of string x.

             List all CIS courses.
                       select *
                       from    COURSE
                       where course# like ‘CIS%’;

                              List all CIS 3200 level courses.
                                        select *
                                        from    COURSE
                                        where course# like ? ;
              Missing or Incomplete Information
ISOM


       •List all students whose address or telephone number is missing:

               select   *
               from     STUDENT
               where    Address is null or GPA is null;
                        Horizontal and Vertical
ISOM

       Query:
         List all student ID, names and addresses who have
         GPA > 3.0 and date of birth before Jan 1, 1980.
                 select               StudentID, Name, Address
                 from                 STUDENT
                 where                GPA > 3.0 and DOB < „1-Jan-80‟
                 order by             Name DESC;

       Algebra:  StudentID,name, address ( GPA > 3.0 and DOB < „1-Jan-80‟ (STUDENT))
       Calculus: {t.StudentID, t.name, t.address | t  Student t.GPA > 3.0 
                                                           t.DOB < „1-Jan-80‟}
              Order by sorts result in descending (DESC) order.
                Note: The default order is ascending (ASC) as in:
                        order by Name;
       Union, Intersection, Set-Difference
ISOM
                                      sid    sname     gpa   age
                                      22     dustin    3.5   25.0
   • All of these operations take     31     lubber    3.8   25.5
     two input relations, which       58     rusty     4.0   23.0
                                      44     guppy     3.5   25.5
     must be union-compatible:        28     yuppy     3.9   24.0

       Same number of fields.                   S1 S2
       `Corresponding‟ fields            sid sname gpa      age
         have the same type.              31 lubber 3.8      25.5
                                          58 rusty 4.0       23.0
   • What is the schema of
     result?                                    S1 S2
                                    sid     sname     gpa    age
                                    22      dustin    3.5    25.0

                                               S1 S2
                                       Union
ISOM


       Liststudents who live in Atlanta or GPA > 3.0
           select    StudentID, Name, DOB, Address
           from      STUDENT
           where     Address = ‘Atlanta’
           union
           select    StudentID, Name, DOB, Address
           from      STUDENT
           where     GPA > 3.0;




           Can we perform a Union on any two Relations ?
                        Union Compatibility
ISOM


       Two   relations, A and B, are union-compatible
          if
          A and B contain a same number of attributes, and
          The corresponding attributes of the two have the same domains
          Examples
             CIS=Student (ID: Did; Name: Dname; Address: Daddr; Grade: Dgrade);
             Senior-Student (SName: Dname; S#: Did; Home: Daddr; Grade: Dgrade);
             Course (C#: Dnumber; Title: Dstr; Credits: Dnumber)

          Are CIS-Student and Senior-Student union compatible?
          Are CIS-Student and Course union compatible?

          What happens if we have duplicate tuples?
          What will be the column names in the resulting Relation?
                     Union, Intersect, Minus
ISOM

               select  CUSTNAME, ZIP             select  CUSTNAME, ZIP
               from    CUSTOMER                  from    CUSTOMER
               where   STATE = ‘MA’              where   STATE = ‘MA’
                  UNION                             INTERSECT
               select  SUPNAME, ZIP              select  SUPNAME, ZIP
               from    SUPPLIER                  from    SUPPLIER
               where   STATE = ‘MA’              where   STATE = ‘MA’
                  ORDER BY 2;                       ORDER BY 2;

                         select  CUSTNAME, ZIP                    B
       B
                         from    CUSTOMER
           A             where   STATE = ‘MA’                         A
                            MINUS
                         select  SUPNAME, ZIP
                                                       B
                         from    SUPPLIER
                         where   STATE = ‘MA’              A
                            ORDER BY 2;
                    Cross-Product
ISOM
       • Each row of S1 is paired with each row of R1.
       • Result schema has one field per field of S1 and R1,
         with field names `inherited‟ if possible.
           Conflict: Both S1 and R1 have a field called sid.

            (sid) sname GPA      Age (sid) cid     semester
             22   dustin   3.5   25.0   22   101   Fall 99
             22   dustin   3.5   25.0   58   103   Spring 99
             31   lubber   3.8   25.5   22   101   Fall 99
             31   lubber   3.8   25.5   58   103   Spring 99
             58   rusty    4.0   23.0   22   101   Fall 99
             58   rusty    4.0   23.0   58   103   Spring 99

  Renaming operator:  (C(1 sid1, 5  sid 2), S1  R1)
                               Joins
ISOM

   • Condition Join:         R  c S   c ( R  S)
                                

       (sid)   sname    GPA age (sid) cid Semester
       22      dustin   3.5 25.0 58   103 Spring 99
       31      lubber   3.8 25.5 58   103 Spring 99
                   S1                        R1
                           S1. sid  R1. sid
   • Result schema same as that of cross-
     product.
   • Fewer tuples than cross-product, might be
     able to compute more efficiently
   • Sometimes called a theta-join.
                                   Joins
ISOM

       • Equi-Join: A special case of condition join where the
         condition c contains only equalities.
         sid    sname     GPA      age      cid   semester
         22     dustin    3.5      25.0     101   Fall 99
         58     rusty     4.0      23.0     103   Spring 99
                         S1          R1
                                 sid
       • Result schema similar to cross-product, but only one
         copy of fields for which equality is specified.
       • Natural Join: Equijoin on all common fields.
   Find names of students who have taken
                course #103
ISOM


          Solution 1:
                 sname((                        
                                      Re gisters) Students)
                             cid 103
          Solution 2:
                          (Temp1,          Re gisters)
                                      cid 103
                          (Temp2,Temp1 Students)
                                        
                          sname (Temp2)
      Solution 3:
                sname(                        
                                   (Re gisters  Students))
                           cid 103
                Connecting/Linking Relations
ISOM

       List   information about all students and the classes they are taking
                        Student
         ID       Name        ***
         s1       Jose        ***
         s2       Alice       ***
                                                                          Class
         s3       Tome        ***
         ***       ***        ***          Emp#        ID       C#          ***
                                           e1          s1      BA 201       ***
                                           e3          s2     CIS 300       ***
                                           e2          s3     CIS 304       ***
                                           ***         ***                  ***
       What can we use to connect/link Relations?
       Join: Connecting relations so that relevant tuples can be retrieved.
                                    Join
ISOM

                            R1               R2
       Cartesian
       Product


               Student: 30 tuples          Class: 4 tuples
         Total Number of Tuples in the Cartesian Product. ?
         (match each tuple of student to every tuple of class)

         Select tuples having identical Student Ids.
         Expected number of such Tuples:            Join Selectivity
                                            R1                 R2
                  Join Forms
ISOM

  • General Join Forms            select   s.*, c.*
        Equijoin                 from     STUDENT s, CLASS c
                                  where    s.StudentID = c. SID;
        Operator Dependent
  • Natural Join
                                                 R1       R2
  • Outer Join                                             =
        Left                                            x > y
        Right                                            <>
        Full                                              ...
                     select   s.*, c.*
                     from     STUDENT s, CLASS c
                     where    s.StudentID = c.SID (+);
       Find names of students who have taken a
                     CIS course
ISOM


        • Information about departments only available in
          Courses; so need an extra join:

        sname((                                   
                                Courses) Re gisters Students)
                    dept 'CIS '
      A more efficient solution:
        sname(         ((                                    
                                               Courses) Re gis)  Students)
                   sid         cid dept 'CIS '



        A query optimizer can find this given the first solution!
       Find students who have taken an MIS or a CS
                          course
ISOM

       • Can identify all MIS or CS courses, then find students
         who have taken one of these courses:
        (Temp1,(                               Courses))
                      dept 'MIS 'dept 'CS '
        sname(Temp1 Re gis  Students)
                             

          Can also define Temp1 using union! (How?)

          What happens if  is replaced by  in this query?
       Find students who have taken a CIS and an ECI
                          Course
ISOM

       • Previous approach won‟t work! Must identify
         students who have taken CIS courses, students who
         have taken ECI courses, then find the intersection
         (note that sid is a key for Students):


    (Temp1,           ((                        
                                          Courses) Re gis))
                  sid         dept 'CIS '
    (Temp2,           ((                        
                                          Courses) Re gis))
                  sid         dept 'ECI '
             sname((Temp1Temp2) Students)
                                  
                       Relational Calculus
ISOM


       • Comes in two flavours: Tuple relational calculus
         (TRC) and Domain relational calculus (DRC).
       • Calculus has variables, constants, comparison ops,
         logical connectives and quantifiers.
             TRC: Variables range over (i.e., get bound to) tuples.
             DRC: Variables range over domain elements (= field values).
             Both TRC and DRC are simple subsets of first-order logic.
       • Expressions in the calculus are called formulas. An
         answer tuple is essentially an assignment of
         constants to variables that make the formula evaluate
         to true.
           Find students with GPA > 3.7 who have
                     taken a CIS Course
ISOM


   TRC:         
                
                
                
                 t |tStudents t.GPA  3.7
                

                    r rRe gisr.sid  t.sid 
                      
                      
                      
                      
                      


                    c cCoursesc.cid  r.cid c.dept 'CIS '
                      
                      
                      
                      
                                                              
                                                              
                                                              
                                                              
                                                             

       DRC:
       
       
       
       
           I , N ,G, A | I , N ,G, A Students G  3.7
       


              Ir,Cr,S Ir,Cr,S Re gisIr  I 
                              
                              
                              
                              
                              

           C,CN , D C,CN , D Courses C  Cr  D 'CIS '
                                                                 
                                                                 
                                                                 
                                                                 
                                                                 
         Find students who have taken all
                   CIS courses
ISOM


       TRC:       t|tStudents 
                  
                  
                  
                  
                  

                      c cCourses ^c.dept 'CIS '
                               
                               
                               
                               
                               
                           
                           
                           
                               r rRe gisr.sid  t.sid r.cid  c.cid
                                   
                                   
                                   
                                   
                                                                            
                                                                            
                                                                            
                                                                            
                                                                          



       DRC:
              
              
              
              
                      I , N ,G, A | I , N ,G, A Students
              


              C,CN ,D C,CN , D Courses ^ D 'CIS '
                                        
                                        
                                        
                                        
                                        

                          Ir,Cr,S Ir,Cr,S Re gisI  Ir Cr C
                                                                           
                                                                           
                                                                           
                                                                           
                                                                           


         How will you do this with Relational Algebra?
           Monotonic and Non-Monotonic
                     Queries
ISOM

       • Monotonic queries: queries for which the size of
         the results either increase or stay the same as the
         size of the inputs increase. The result size never
         decreases
       • Non-monotonic queries: queries for which it is
         possible that the size of the result will DECREASE
         when the size of the input increases
       • Examples of each?



       • Which of the algebra operations is non-monotonic?
       • What does this signify?
               Summaries and Aggregates
ISOM


       Calculate the average GPA   select avg. (GPA)
                                   from STUDENT,

       Find the lowest GPA         select min (GPA) as minGPA
                                   from STUDENT,

       How many CIS majors?        select    count (StudentId)
                                   from      STUDENT
                                   where     major=‘CIS’;

       Discarding duplicates       select avg (distinct GPA)
                                   STUDENT
                                   where major=‘CIS’
                                   (is this above query correct?)
                        Aggregate Functions
ISOM

       COUNT (attr)                           - a simple count of values in
       attr
       SUM (attr)                             -   sum of values in attr
       AVG (attr)                             -   average of values in attr
       MAX (attr)                             -   maximum value in attr
       MIN (attr)                             -   minimum value in attr

          Take effect after all the data is retrieved from the database
          Applied to either the entire resulting relation or groups
          Can’t be involved in any query qualifications (where clause)


       Would   the following query be permitted?
                           select    StudentId
                           from      STUDENT
                           where    GPA = max (GPA);
               Grouping Results Obtained
ISOM


       Show all students enrolled in each course.
               select    cno, StudentID
               from      REGISTRATION
               group by cno;                           Is this grouping OK?
       Calculate the average GPA of students by county.
               select    county, avg (GPA) as CountyGPA
               from      STUDENT
               group by county;
       Calculate the enrollment of each class.
               select    cno, year , term, count (StudentID) as enroll
               from      REGISTRATION
                group by cno, year, term;
                        Selections on Groups
ISOM


          Show all CIS courses that are full.
           select         cno, count (StudentID)
           from           REGISTRATION
           group by       cno
           having         count (StudentID) > 29;
             Grouping Results after Join
ISOM



       Calculate   the average GPA of each class

       select          course#, avg (GPA)
       from            STUDENT S, CLASS C
       where           S.StudentID = C.SID
       group by        course#,
                   Nesting Queries
ISOM

       SELECT      attribute(s)
       FROM        relation(S)
       WHERE       attr [not] {in | comparison operator | exists
       }
                   ( query statement(s) );

       List names of students who are taking “BA201”
       select       Name
       from                  Student
       where        StudentID in
                    ( select         StudentID
                      from           REGISTRATION
                    where course#=„BA201‟);
                                 Sub Queries
ISOM

       List all students enrolled in CIS courses
                  select name
                  from    STUDENT
                  where StudentId in
                          (select           StudentId
                          from              REGISTRATION
                          where             cno like ‘CIS%’);

                      List all courses taken by Student (Id 1011)
                                select cname
                                from     COURSE
                                where cnum = any
                                         (select           cno
                                         from              REGISTRATION
                                         where             StudentId = 1011);
                                 Sub Queries
ISOM


       Who received the highest grade in CIS 8140
                select    StudentId
                from      REGISTRATION
                where     cnum = ‘CIS 8140’ and
                          grade >=all
                          (select grade
                          from      REGISTRATION
                          where     cno = ‘CIS 8140’);
                               List all students enrolled in CIS courses.
                                           select   name
                                           from     STUDENT S
                                           where    exists
                                                    (select *
                                                    from       REGISTRATION
                                                    where      StudentId = S.StudentId
                                                               and cno like ‘CIS%’);
                        Relational Views
ISOM


        • Relations derived from other relations.
        • Views have no stored tuples.
        • Are useful to provide multiple user views.

       View 1                View 2                 View N

                  Base                   Base
                Relation 1             Relation 2


       •What level in the three layer model do views belong?
       •Which kind of independence do they support?
                       View Creation
ISOM



       Create View view-name [ ( attr [ , attr ] ...) ]
       AS subquery
       [ with check option ] ;

       DROP VIEW view-name;

           Create a view containing the student ID,
            Name, Age and GPA for those who are
            qualified to take 300 level courses, i.e., GPA
            >=2.0.
                         View Options
ISOM

       • With Check Option enforces the query
         condition for insertion or update
          To enforce the GPA >=2.0 condition on
         all new student tuples inserted into the
         view

       • A view may be derived from multiple
         base relations
         Create a view that includes student IDs, student names
         and their instructors‟ names for all CIS 300 students.
                      View Retrieval
ISOM


       Queries on views are the same as that on base
         relations.
       Queries on views are expanded into queries on
         their base relations.

       select      Name, Instructor-Name
       from        CIS300-Student
       where       Name = Instructor-Name;
                            View: Update
ISOM

       Update on a view actually changes its base relation(s)!

               update Qualified-Student
               set            GPA = GPA-0.1
               where          StudentID = „s3‟;
               insert into    Qualified-Student
               values         ( „s9‟, „Lisa‟, 4.0 )
               insert into    Qualified-Student
               values         ( „s10‟, „Peter‟, 1.7 )



       Why are some views not updateable?
       What type of views are updateable?
          Non-monotonic queries – again!
ISOM


       • Need to use either MINUS or NOT
         EXISTS!
       • Find courses where no student has
         gpa over 3.5
       • Find students who have taken all
         courses that Joe has taken
       • How would you solve these?
                         Summary
ISOM


       • SQL is a low-complexity, declarative query
         language
       • The good thing about being declarative is
         that internally the query can be changed
         automatically for optimization
       • Good thing about being low-complexity?
          No SQL query ever goes into an infinite loop
          No SQL query will ever take indefinite amount of
           space to get the solution
       • Can be used for highly complex problems!

				
DOCUMENT INFO
Shared By:
Categories:
Stats:
views:84
posted:12/3/2010
language:English
pages:54