CS520: Introduction to Database Design & Engineering
Fall 2002
(C) Frieder, Grossman, & Goharian 1996, 2002
1
Database Course Differentiation
Credit not given for both classes since they are similar in content
CS 425 Prerequisite: CS401 Emphasis:
» Database Design & Use » Application Development
CS 520 Prerequisite: CS402 Emphasis:
» Database Design & Eng. » DBMS Development
Project:
» Application Development
Project:
» DBMS Development
(C) Frieder, Grossman, & Goharian 1996, 2002
2
Contents
Introduction SQL Database Design Query Optimization Recovery and Concurrency Control Integration of Structured Data and Text Distributed Database Systems 1 - 44 45 - 122 123 - 188 189 - 211 212 - 233 234 - 241 242 - end
(C) Frieder, Grossman, & Goharian 1996, 2002
3
Background
Initially, installations wrote separate applications with large amounts of repeated code to implement concurrency control, security, and recovery. This was a lot of wasted effort that was also very much error prone
(C) Frieder, Grossman, & Goharian 1996, 2002
4
Example of Common Functionality
Payroll User Interface Business Logic Concurrency Control Security Recovery
(C) Frieder, Grossman, & Goharian 1996, 2002
Inventory User Interface Business Logic Concurrency Control Security Recovery
Marketing User Interface Business Logic Concurrency Control Security Recovery
5
Database System Example
Payroll User Interface Business Logic Inventory User Interface Business Logic Database System Concurrency Control Security Recovery
(C) Frieder, Grossman, & Goharian 1996, 2002
Marketing User Interface Business Logic
6
Definitions
DBMS - a Database Management System is a set of routines that is capable of providing the following basic functions: » Add » Delete » Update » Retrieve
(C) Frieder, Grossman, & Goharian 1996, 2002
7
Primitive Functionality
Add (X) = Find (X); If not found then insert (X) else return (error_code) Delete (X) = Find (X); If found then remove (X) else return (error_code)
(C) Frieder, Grossman, & Goharian 1996, 2002
8
Primitive Functionality (cont)
Update (X, Y) = Delete (X); If not error_code then Add (Y)
Retrieve (X) = Delete (X); If not error_code then Add (X)
(C) Frieder, Grossman, & Goharian 1996, 2002
9
Database Models
Network » Any links supporting quick access Hierarchical » Links but no cycles (hierarchy) Relational » Data Independence Object – Oriented » Entity Abstraction
(C) Frieder, Grossman, & Goharian 1996, 2002
10
Inverted Index
I N D E X
Posting List
(C) Frieder, Grossman, & Goharian 1996, 2002
11
Inverted List Example
Hank Record 1 Record 3 Record 5
Query: find all occurrences of the name (value of attribute) is ‘Hank’ in the database: Hash to the value Hank in the “index” Scan the posting list for all occurrences
(C) Frieder, Grossman, & Goharian 1996, 2002
12
Inverted Index
Associates a posting list with each attribute value
abacus: abatement: … zoology: 83: 2002: (F3AB, 873A, FF32) (6A15) (D381, DA32) (F623, B001, 879D, 76AA) (AAAA, BBBB, CCCC)
Inverted because it lists for each attribute the location on disk where the value is stored.
(C) Frieder, Grossman, & Goharian 1996, 2002
13
Building an Inverted Index
For each relation r in the collection
» For each attribute t in relation r
– Find attribute t in the item dictionary – If term t exists, add a new disk location to its posting list – Otherwise,
Add attribute value t to the item dictionary Add a node to the posting list
(C) Frieder, Grossman, & Goharian 1996, 2002
14
File Organizations
Unsorted files (Heap Files)
– Good storage efficiency, fast insertion, deletion. – Slow searches
Sorted Files
– Good storage efficiency and search of range – Slow insertion and deletion – not practical (files never sorted) => B+ tree data structure
Hashed-Based Files
– – Not efficient storage Fast insertion, deletion, and equality searches
15
(C) Frieder, Grossman, & Goharian 1996, 2002
Sample Database
Hank graduated: » Michigan » IIT » MIT Hank worked: » IBM » Intel » Bellcore » Harris
(C) Frieder, Grossman, & Goharian 1996, 2002
16
Network Model
MIT IIT
Michigan
Graduated Hank Worked
IBM Intel Bellcore Harris
(C) Frieder, Grossman, & Goharian 1996, 2002
17
Hierarchical Model
Hank Graduated Worked
IBM Bellcore Harris Intel
MIT
IIT
Michigan
(C) Frieder, Grossman, & Goharian 1996, 2002
18
Relational Model
Person
Hank Hank Hank
Graduated
IIT MIT Michigan
Person
Hank Hank Hank Hank
Worked
IBM Intel Harris Bellcore
Key: (Person, Graduated)
Key: (Person, Worked)
(C) Frieder, Grossman, & Goharian 1996, 2002
19
Object Oriented Model
Type EMPLOYEE begin name : char (20); graduated : SETOF (schools); worked : SETOF (companies); end;
INSERT ( 1,Hank, {IIT, Michigan, MIT}, {IBM, Intel, Bellcore, Harris} )
(C) Frieder, Grossman, & Goharian 1996, 2002
20
Relational Model
The initial paper on the relational model was written in 1969 by Codd. System R was a relational prototype implemented in the mid 70’s by IBM. Ingres was a relational prototype implemented at UC Berkeley in the mid 70’s. Finally, commercial offerings of relational systems started with Oracle in 1979 and was quickly followed by SQL/DS and DB2 by IBM in the mid 80’s.
(C) Frieder, Grossman, & Goharian 1996, 2002
21
Data Independence
The relational model allows users to simply specify what data they require, not how to get them. This is referred to as data independence and is a key contribution of the relational model. Older models are referred to as navigational as users must navigate through the data and follow pointers from one datum to another.
(C) Frieder, Grossman, & Goharian 1996, 2002
22
Structure of RDBMS
User Query
Query Optimizer Operators File Manager Buffer Manager Disk Space Manager DB
(C) Frieder, Grossman, & Goharian 1996, 2002
Concurrency Control & Crash Recovery: Transaction Manager Lock Manager Recovery Manager
23
Relational Model
Relational algebra is used to specify the operations allowed within the relational model. Relational algebra is theoretically based on set theory. A relation can be illustrated as a table of rows and columns. The table is referred to as a relation, rows are referred to as tuples, and columns are attributes. Relational operators are closed. A relational operation applied to a relation always results in a relation.
(C) Frieder, Grossman, & Goharian 1996, 2002
24
EMPLOYEE(emp#, name, department, salary)
Example Relation:
Emp# 002 004 007
Name Jones Smith Bond Table Row Column
Department Marketing Sales Diplomacy = = = Relation Tuple Attribute
Salary 300.00 150.00 999.00
(C) Frieder, Grossman, & Goharian 1996, 2002
25
Formal Definition
IF R is the set of attributes (columns), commonly referred to as schema, then r(R) is a mapping of a set of tuples (rows ), commonly referred to as instance. Since each attribute is restricted to a limited domain, a relation is actually a subset of: dom(A1) x dom(A2) x dom(A3) where dom(X) indicates the domain or set of valid values for attribute X.
(C) Frieder, Grossman, & Goharian 1996, 2002
26
Characteristics of Relations
No inherent ordering of tuples. Conceptually, no inherent ordering of attributes. In practice, attributes are ordered based on the initial schema definition. All attribute values within a tuple should be atomic (1NF).
(C) Frieder, Grossman, & Goharian 1996, 2002
27
Key Attributes
Since a relation is merely just a set of tuples, NO DUPLICATE ELEMENTS are theoretically possible. (Unfortunately, some implementations violate this uniqueness definition) A column or set of columns must uniquely identify a tuple. Superkey - any combination of attributes that uniquely identify a tuple.
(C) Frieder, Grossman, & Goharian 1996, 2002
28
Keys (continued)
Key - a superkey from R such that the removal of any attribute results in a set of attributes that is NOT a superkey. Hence, a key is a: MINIMAL SUPERKEY Ex: (emp#, name, department, salary) is a superkey but not a key, because name, department, or salary could be removed and we would still have the superkey, emp#.
(C) Frieder, Grossman, & Goharian 1996, 2002
29
Candidate Keys
It is possible that more than one set of attributes qualify as a key. These are called candidate keys. Typically, one is chosen and referred to as the primary key. This key is underlined in the description of the relation. EMPLOYEE(emp#, name, department, salary)
(C) Frieder, Grossman, & Goharian 1996, 2002
30
Student-Grade Database
Student #
1 1 1 2 2 3 4 4 4 5 6 6
GPA
4.0 4.0 4.0 3.8 3.0 2.1 3.5 3.4 3.9 3.6 4.0 3.1
Degree
Bachelors Masters Doctorate Bachelors Masters Bachelors Bachelors Masters Doctorate Bachelors Associates Bachelors
Key: (Student#, Degree) – Assumes only one degree type per person
(C) Frieder, Grossman, & Goharian 1996, 2002
31
SELECT
SELECT - extract tuples from a relation Syntax: σ
(Relation Name)
σGPA=4.0 (SG) - obtains all tuples from the Student-Grade (SG) relation where the GPA is 4.0. Student #
1 1 1 6
(C) Frieder, Grossman, & Goharian 1996, 2002
GPA
4.0 4.0 4.0 4.0
Degree
Bachelors Masters Doctorate Associates
32
Project
Project retrieves columns from a table Syntax: π (Relation Name)
πstudent#, degree(SG) Retrieves student number and degree from the StudentGrade relation.
(C) Frieder, Grossman, & Goharian 1996, 2002
33
Single – Scan Project
πstudent#, degree(SG)
Student #
1 1 1 2 2 3 4 4 4 5 6 6
Degree
Bachelors Masters Doctorate Bachelors Masters Bachelors Bachelors Masters Doctorate Bachelors Associates Bachelors
34
(C) Frieder, Grossman, & Goharian 1996, 2002
Multiple – Scan Project
πstudent#, GPA(SG)
Student #
1 1 1 2 2 3 4 4 4 5 6 6
(C) Frieder, Grossman, & Goharian 1996, 2002
GPA
4.0 4.0 4.0 3.8 3.0 2.1 3.5 3.4 3.9 3.6 4.0 3.1
35
Undergrad & Graduate Student Grade Database
Student # Graduate (GSG)
Key: (Student#, Degree) 1 1 2 4 4
GPA
4.0 4.0 3.0 3.4 3.9
Degree
Masters Doctorate Masters Masters Doctorate
Student #
1 2 3 4 5 6 6
GPA
4.0 3.8 2.1 3.5 3.6 4.0 3.1
Degree
Bachelors Bachelors Bachelors Bachelors Bachelors Associates Bachelors
36
Undergrad (USG)
(C) Frieder, Grossman, & Goharian 1996, 2002
Set Theoretic Operations: USG UNION GSG
List all information on all students
Student #
1 1 1 2 2 3 4 4 4 5 6 6
(C) Frieder, Grossman, & Goharian 1996, 2002
GPA
4.0 4.0 4.0 3.8 3.0 2.1 3.5 3.4 3.9 3.6 4.0 3.1
Degree
Bachelors Masters Doctorate Bachelors Masters Bachelors Bachelors Masters Doctorate Bachelors Associates Bachelors
37
Set Theoretic Operations:
πstudent#(USG) INTERSECTION πstudent#(GSG)
List all students who were both graduate and undergraduate students
Grad & Undergrad (GU)
Student #
1 2 4
(C) Frieder, Grossman, & Goharian 1996, 2002
38
Set Theoretic Operations:
πstudent#(USG) DIFFERENCE πstudent#(GSG)
List all students who were undergraduate but not graduate students
Only Undergrads (OU)
Student #
3 5 6
(C) Frieder, Grossman, & Goharian 1996, 2002
39
Cartesian Product
Relational Algebra includes: GU x OU For the sets:
1 GU 4
2
3 OU
5 6
GU x OU =
<1,3> <2,3> <4,3>
<1,5> <2,5> <4,5>
<1,6> <2,6> <4,6>
40
(C) Frieder, Grossman, & Goharian 1996, 2002
θ - Join
θ - Join is a Cartesian product with the addition of a
condition that determines which tuples are selected.
Can be logically viewed as: » Step 1: Cartesian Product » Step 2: Select from result of Step 1
(C) Frieder, Grossman, & Goharian 1996, 2002
41
Additional Join Types
In a θ - Join, the joining attributes are explicitly specified. An Equi - Join is the most common θ - Join with an equality as the
condition.
A Natural - Join is an Equi-Join where the joining attributes are all those attributes with a common name (implicitly specified)
(C) Frieder, Grossman, & Goharian 1996, 2002
42
πstudent#,degree(GSG) Equi-Join πstudent#,GPA(USG)
Student #
1 1 2 4 4
GPA
4.0 4.0 3.8 3.5 3.5
Degree
Masters Doctorate Masters Masters Doctorate
(C) Frieder, Grossman, & Goharian 1996, 2002
43
Relational Algebra Summary
Relations are viewed as sets. Relational operations are closed. Any operation on one or more relations yields a relation. No inherent ordering. Unary Operators: SELECT, PROJECT Binary Operators: UNION, INTERSECTION, SET DIFFERENCE, CARTESIAN PRODUCT, JOIN Single Scan: SELECT, PROJECT including key Multiple Scan: All others
(C) Frieder, Grossman, & Goharian 1996, 2002
44
Structured Query Language (SQL)
(C) Frieder, Grossman, & Goharian 1996, 2002
45
Structured Query Language
First relational query language, SQUARE was implemented in System R (1975). SQUARE was followed by SEQUEL. First commercial implementation, Oracle (1979), followed closely by SQL/DS in 1982. Major SQL DBMS vendors: Oracle, IBM (DB2), Sybase, Informix, Computer Associates, and Microsoft.
(C) Frieder, Grossman, & Goharian 1996, 2002
46
SQL Overview
Data Manipulation Language (DML)
» » » » SELECT INSERT UPDATE DELETE
Data Definition Language (DDL)
» CREATE TABLE » CREATE INDEX » GRANT
(C) Frieder, Grossman, & Goharian 1996, 2002
47
SELECT Overview
Single Relation » SELECT » Boolean Operators » IN » BETWEEN » Aggregate Operators » Calculated Attributes » Sorting » Wildcard Searches » GROUP BY » HAVING » NULLS » Varchar Multiple Relations
(C) Frieder, Grossman, & Goharian 1996, 2002
48
Syntax:
SELECT FROM Ex: SELECT p#,name,qty FROM PARTS Ex: SELECT * FROM PARTS
p# 1 2 3
PARTS
name Nut Bolt Wheel qty 42 25 15
(C) Frieder, Grossman, & Goharian 1996, 2002
49
Select with WHERE
SELECT FROM WHERE is of the form: [=,<,>,<>,<=,>=] EX: SELECT * p# FROM PARTS 3 WHERE p# = 3
name Wheel qty 15
(C) Frieder, Grossman, & Goharian 1996, 2002
50
Use of Boolean Operators
Conditions can be separated by Boolean operators: AND, OR, NOT EX: “List information about parts 1 and 2” SELECT * p# name qty FROM PARTS Nut 42 WHERE p# = 1 OR p# = 2 1 2 Bolt 25 EX: “LIST information about all wheels that contain more than 20 in stock” SELECT * FROM PARTS WHERE name = ‘Wheel’ and qty > 20
(C) Frieder, Grossman, & Goharian 1996, 2002
51
Shortcut Number 1: IN
To find information for a list of values, the IN operator may be used: Ex: “List the name of all parts whose part # is 1,2, or 5” SELECT name FROM PARTS WHERE p# IN (1, 2, 5)
Name Nut Bolt
instead of: WHERE (p# = 1) OR (p# = 2) OR (p# = 5)
(C) Frieder, Grossman, & Goharian 1996, 2002
52
Shortcut Number 2: BETWEEN
To find values within a range, it is often easier to use BETWEEN. Ex: “Find all parts where the quantity on hand is greater than or equal to twenty parts, but less than or equal to fifty.” p# name qty SELECT * 1 Nut 42 FROM PARTS 2 Bolt 25 WHERE qty BETWEEN 20 and 50 instead of: WHERE qty >= 20 AND qty <= 50
(C) Frieder, Grossman, & Goharian 1996, 2002
53
Aggregate Operators
A common requirement is to compute statistics such as MIN, MAX, and AVERAGE on a range of data. Ex: Find the min, max, and average quantity of all wheels. SELECT MIN(qty), MAX(qty), AVG(qty) FROM PARTS min(qty) max(qty) avg(qty) WHERE name = ‘Wheel’ 15 15 15
(C) Frieder, Grossman, & Goharian 1996, 2002
54
Calculated Attributes
A new attribute is obtained by using arithmetic operators (+,-, *, /) on other numeric attributes. All operators follow standard precedence. Ex: List all parts and their quantity given an increase of 20%
p# 1 SELECT p#,name,(qty+(qty*.2)) ‘qty’ 2 3 FROM PARTS
name qty Nut 50 Bolt 30 Wheel 18
55
(C) Frieder, Grossman, & Goharian 1996, 2002
Sorting
ORDER BY [DESC] can be added to SELECT to obtain sorted output. Ex: List all part names in ascending order: SELECT p#, name, qty FROM PARTS ORDER BY name
p# name 2 Bolt 1 Nut 3 Wheel qty 25 42 15
For descending order change to: ORDER BY name DESC
(C) Frieder, Grossman, & Goharian 1996, 2002
56
Sorting Calculated Attributes
To refer to a computed attribute in the ORDER BY, use the position in the list of columns following SELECT. Ex: “List all part information in descending order of a projected 20 percent reduction in quantity” SELECT p#,name,(qty-(qty*.2)) ‘qty’ FROM PARTS ORDER BY 3 DESC
(C) Frieder, Grossman, & Goharian 1996, 2002
p# 1 2 3
name qty Nut 34 Bolt 20 Wheel 12
57
Wildcard Searches of Strings
The LIKE operator is used to search parts of a string. The following wildcard characters are used: % - zero or more characters _ - exactly one character Ex: List all parts whose name starts with a ‘W’
p# name SELECT * 3 Wheel FROM PARTS WHERE name LIKE ‘W%’
(C) Frieder, Grossman, & Goharian 1996, 2002
qty 15
58
More LIKE Examples
Ex: List all parts whose name starts with a ‘W’ and ends with an ‘L’ p# name Wheel WHERE name LIKE ‘W%L’ 3 Ex: List all parts whose name is three characters long and starts with a ‘N’ name WHERE name LIKE ‘N__’ p#
1 Nut qty 15
qty 42
(C) Frieder, Grossman, & Goharian 1996, 2002
59
Review
List all parts whose name starts with a ‘W’ or whose part number is either 2,4,8,11,12,13,14,15. Sort the list in descending order by quantity. SELECT * FROM PARTS WHERE _____ LIKE _______ OR _____ IN (_,_,_) OR ______ BETWEEN __ AND __ ORDER BY ____ DESC
(C) Frieder, Grossman, & Goharian 1996, 2002
60
Review (Answer)
List all parts whose name starts with a ‘W’ or whose part number is either 2,4,8,11,12,13,14,15. Sort the list in descending order by quantity. SELECT * FROM PARTS WHERE name LIKE ‘W%’ OR p# IN (2,4,8) OR p# BETWEEN 11 AND 15 ORDER BY qty DESC
(C) Frieder, Grossman, & Goharian 1996, 2002
61
More Review
Ex: List the total number of parts that would exist if quantity was increased by 25 percent.
Ex: List all parts that would have a quantity greater than 50, if the quantity was increased by 25 percent.
(C) Frieder, Grossman, & Goharian 1996, 2002
62
More Review (Answer)
Ex: List the total number of parts that would exist if the quantity was increased by 25 percent. SELECT SUM(qty + qty * .25) FROM PARTS Sum(qty + qty * .25)
103
Ex: List all parts that would have a quantity greater than 50, if the quantity was increased by 25 percent. SELECT * p# name 1 Nut FROM PARTS WHERE qty + qty * .25 > 50
(C) Frieder, Grossman, & Goharian 1996, 2002
qty 42
63
GROUP BY
It is often necessary to review data about groups of related tuples. Consider an employee relation that contains the “Department” attribute. Assume one employee may work in only a single department. DEPARTMENT partitions the EMP set into subsets:
Sales Marketing Service Finance
(C) Frieder, Grossman, & Goharian 1996, 2002
64
GROUP BY (continued)
EMPLOYEE (emp#, name, salary, department) emp# name salary department
1 2 3 4 5 6 7 8 Fred Mike Sam Martha Juanita Steve Tom Sue 200 300 400 350 500 800 200 900 Sales Sales Sales Marketing Marketing Finance Service Service
65
(C) Frieder, Grossman, & Goharian 1996, 2002
GROUP BY (continued)
For each department, list the average salary. SELECT department, AVG(salary) FROM EMPLOYEE GROUP BY department department AVG(salary) Sales 300 Marketing 425 Finance 800 Service 550
(C) Frieder, Grossman, & Goharian 1996, 2002
66
Group By (continued)
If a WHERE clause exists, it is executed as well. Ex: For each department, list the highest salary, but exclude all employees whose name starts with a ‘S’ SELECT department, MAX(salary) FROM EMPLOYEE WHERE name NOT LIKE ‘S%’ GROUP BY department
(C) Frieder, Grossman, & Goharian 1996, 2002
67
GROUP BY (continued)
department Sales Marketing Service max(salary) 300 500 200
(C) Frieder, Grossman, & Goharian 1996, 2002
68
GROUP BY (continued)
More refined groups are obtained by using multiple attributes in GROUP BY. Add the attribute “REGION” to the employee relation. Now the department partition may be partitioned into different regions.
North West
MARKETING
South
East
(C) Frieder, Grossman, & Goharian 1996, 2002
69
GROUP BY (continued)
EMPLOYEE (emp#, name, salary, department,rgn) emp# name salary department rgn
1 2 3 4 5 6 7 8 Fred Mike Sam Martha Juanita Steve Tom Sue 200 300 400 350 500 800 200 900 Sales Sales Sales Marketing Marketing Finance Service Service north north east west west south north south
70
(C) Frieder, Grossman, & Goharian 1996, 2002
GROUP BY (multiple partitions)
To specify more than one partition, simply add more attributes after the GROUP BY: Ex: Compute the average salary for each region within each department. SELECT department, region, AVG(salary) FROM EMPLOYEE GROUP BY department, region
(C) Frieder, Grossman, & Goharian 1996, 2002
71
GROUP BY (continued)
department Sales Sales Marketing Finance Service Service reg avg(salary) north 250 east 400 west 425 south 800 north 200 south 900
(C) Frieder, Grossman, & Goharian 1996, 2002
72
HAVING (restricts groups)
Syntax: HAVING (list of conditions) Aggregate functions used (SUM, MIN, MAX, COUNT). Ex: List the average salary for all departments that have more than two employees. SELECT department, AVG(salary) FROM EMPLOYEE department avg(salary) GROUP BY department Sales 300 HAVING COUNT(*) > 2
(C) Frieder, Grossman, & Goharian 1996, 2002
73
HAVING (continued)
Ex: List the minimum salary in departments sales, marketing and service as long as the department has an average salary greater than 400. SELECT department, MIN(salary) FROM EMPLOYEE WHERE department IN (‘sales’,’marketing’,’service’) GROUP BY department HAVING AVG(salary) > 400
department min(salary) Marketing 350
(C) Frieder, Grossman, & Goharian 1996, 2002
74
Multiple Entity Relationships
Multiple tables needed to store multi-entity relationships. One to many: One parent may have many children Many to one: Many people may attend a single meeting Many to Many: Students graduate from multiple colleges Each college graduates by many students
(C) Frieder, Grossman, & Goharian 1996, 2002
75
Single Relation Design
It is tempting to try and stuff all multi-valued relationships into a single relation: emp# name salary 1 Fred 200 2 Ethel 300 3 Mike 400 college1 Harvard IIT MIT college2
Unused
college3
Unused Unused
Michigan Stanford
IIT
(C) Frieder, Grossman, & Goharian 1996, 2002
76
Problems with Poor Design
Incompleteness » Unable to store more than three colleges per individual. Many to many relationships can have an infinite number of values. Query Complexity » Queries such as “list all colleges attended by Mike” become substantially more difficult. Wasted Storage » Many entries are not used.
(C) Frieder, Grossman, & Goharian 1996, 2002
77
Multiple Relations: 2 Relations for Many-Many
EMPLOYEE emp# name salary 1 Fred 200 2 3 Ethel Mike 300 400 3 3 3 MIT Stanford IIT emp# 1 2 2 COLLEGE name Harvard IIT Michigan
emp# in EMPLOYEE is a primary key emp# in COLLEGE is a foreign key emp#,name in COLLEGE is a primary key
(C) Frieder, Grossman, & Goharian 1996, 2002
78
Preserving Relationships
Joining the two relations restores the original relation assuming a key is part of the partitioning. Poor partitioning may result in additional spurious tuples being introduced.
(C) Frieder, Grossman, & Goharian 1996, 2002
79
Equi-Join
Explicit indication of the join attribute conditions. Typically involves joining on foreign key attributes. List all colleges attended by “Mike” SELECT b.name FROM EMPLOYEE a, COLLEGE b WHERE a.emp# = b.emp# AND a.name = ‘Mike’ name
MIT Stanford IIT
(C) Frieder, Grossman, & Goharian 1996, 2002
80
Problem with only using Two Relations for Many-Many
Suppose we need to maintain the college location as well. location must be replicated many times. EMPLOYEE emp# name salary 1 Fred 200 2 3 Ethel Mike 300 400 COLLEGE emp# name 1 Harvard 2 IIT 2 Michigan 3 MIT 3 Stanford 3 IIT location Boston Chicago Ann Arbor Boston Stanford Chicago
(C) Frieder, Grossman, & Goharian 1996, 2002
81
Use of 3 Relations
To avoid needless repetition, we create a separate table for each of the two entities involved in a many-many relationship, and then a third “linking” relation to contain data about the relationship between the entities. All 1-1 information about employees: EMPLOYEE(emp#, name, salary) All 1-1 information about colleges: COLLEGE(col#, name, location) All data pertaining to a single employee attending a single college: ATTENDS(emp#, col#, gpa)
(C) Frieder, Grossman, & Goharian 1996, 2002
82
Use of Three Relations (continued)
EMPLOYEE
emp# 1 2 3 name Fred Ethel Mike salary 200 300 400 col# 11 22 33 44 55
COLLEGE
name Harvard IIT Michigan MIT Stanford gpa 2.45 3.79 3.65 2.85 2.65 4.0
83
location Boston Chicago Ann Arbor Boston Stanford
ATTENDS
emp# 1 2 2 3 3 3 col# 11 22 33 44 55 22
(C) Frieder, Grossman, & Goharian 1996, 2002
Sample Query (3 Relations)
Ex: List the names of all colleges attended by “Mike.”
SELECT b.name FROM EMPLOYEE a, COLLEGE b, ATTENDS c WHERE a.emp# = c.emp# AND name b.col# = c.col# AND Michigan MIT a.name = ‘Mike’
Stanford
(C) Frieder, Grossman, & Goharian 1996, 2002
84
Sample Query (3 Relations)
Ex: List the names of all employees who attended Harvard.
SELECT a.name FROM EMPLOYEE a, COLLEGE b, ATTENDS c WHERE a.emp# = c.emp# AND b.col# = c.col# AND name Fred b.name = ‘Harvard’
(C) Frieder, Grossman, & Goharian 1996, 2002
85
Subqueries
Instead of hard-coding the list that is used by IN, it is possible to dynamically generate the list using a subquery. Ex: SELECT * FROM EMPLOYEE WHERE emp# IN (1,2,3,4,5,7)
could be rewritten as; Ex: SELECT * FROM EMPLOYEE WHERE emp# IN (SELECT num FROM SAMPLE)
Assuming SAMPLE(num) is a relation with tuples: 1,2,3,4,5, and 7.
(C) Frieder, Grossman, & Goharian 1996, 2002
86
EXISTS
EXISTS prefaces a subquery and evaluate to TRUE if one or more tuples are present in the result set of the subquery. Ex: List all employees who attended at least one college. SELECT * FROM EMPLOYEE a WHERE EXISTS (SELECT c.emp# FROM ATTENDS c WHERE c.emp# = a.emp#)
(C) Frieder, Grossman, & Goharian 1996, 2002
87
DISTINCT
DISTINCT is used to remove duplicates. Ex: List all distinct salaries. SELECT DISTINCT(salary) FROM EMPLOYEE
distinct (salary) 200 300 400
(C) Frieder, Grossman, & Goharian 1996, 2002
88
UNION
The union of two result sets (of the same data type) is obtained via the UNION operator (duplicates are removed). The OR query can be written using UNION: Syntax: UNION Ex: Obtain billing from 2000 and 2001. SELECT * FROM BILL2000 UNION SELECT * FROM BILL2001
(C) Frieder, Grossman, & Goharian 1996, 2002
89
EXCEPT
Syntax: EXCEPT Ex: Find employees, who attended IIT but not MIT.
SELECT e.emp# FROM EMPLOYEE a, COLLEGE b, ATTENDS c WHERE a.emp# = c.emp# AND b.col# = c.col# AND b.name = ‘IIT’ EXCEPT SELECT e.emp# FROM EMPLOYEE a, COLLEGE b, ATTENDS c WHERE a.emp# = c.emp# AND b.col# = c.col# AND b.name = ‘MIT’
(C) Frieder, Grossman, & Goharian 1996, 2002
90
Other Data Manipulation
UPDATE » Modify tuples in a single relation DELETE » Remove tuples from a single relation INSERT » Add tuples to a single relation
(C) Frieder, Grossman, & Goharian 1996, 2002
91
INSERT
Syntax: INSERT INTO [list of columns] VALUES () Ex. For the relation: EMPLOYEE (emp#, name, salary) INSERT INTO EMPLOYEE (emp#, name, salary) VALUES (5, ‘Herbert’, 200) Note: If optional column list is not found, the values must be listed in the order of their initial definition
(C) Frieder, Grossman, & Goharian 1996, 2002
92
INSERT - Format 2
Syntax: INSERT INTO (select statement) Ex: Copy all tuples in the EMPLOYEE relation and place them in NEW_EMPLOYEE INSERT INTO NEW_EMPLOYEE SELECT * FROM EMPLOYEE
(C) Frieder, Grossman, & Goharian 1996, 2002
93
UPDATE
Syntax: UPDATE WHERE where is of the form: SET = Ex: Modify John’s salary to 150 UPDATE EMPLOYEE SET salary = 150.00 WHERE name = ‘John’
(C) Frieder, Grossman, & Goharian 1996, 2002
94
UPDATE (continued)
An assignment statement may contain a numeric expression: Ex: Give all employees a ten percent raise. UPDATE EMPLOYEE SET salary = salary * 1.10
(C) Frieder, Grossman, & Goharian 1996, 2002
95
DELETE
Syntax: DELETE FROM WHERE Ex: Remove all employees who work in department 5 DELETE FROM EMPLOYEE WHERE dept = 5 To remove all employees: DELETE FROM EMPLOYEE
(C) Frieder, Grossman, & Goharian 1996, 2002
96
Data Definition Language (DDL)
» » » » » » » Create Table Drop Table Create Index Drop Index GRANT REVOKE ALTER TABLE
(C) Frieder, Grossman, & Goharian 1996, 2002
97
CREATE TABLE
CREATE TABLE [ ], .... [ ] (
) typical data types are: CHAR(x), VARCHAR(x), SMALLINT, INTEGER, DATE, TIME, DECIMAL (x,y)
(C) Frieder, Grossman, & Goharian 1996, 2002
98
CREATE TABLE (example)
CREATE TABLE EMPLOYEE (emp# SMALLINT, name CHAR(20), salary DECIMAL(5,2))
(C) Frieder, Grossman, & Goharian 1996, 2002
99
Varying Length Character
VARCHAR(x) indicates that a string will be no longer than x characters. Fixed length strings are padded to fill fixed space. Varying length strings have a Length Indicator.
FIXED
200 Hank FILL 252.35 200
VARYING
4 Hank 252.35
(C) Frieder, Grossman, & Goharian 1996, 2002
100
Effect of Varying Length Columns on Performance FIXED
tuple 1 tuple 2 tuple 3 tuple 4 tuple 5
VARCHAR
tuple 1 tuple 2 (cont) tuple 3 (cont) tuple 4 tuple 5 tuple 2 tuple 3
A modification to the FIXED table only affects one tuple. A modification to VARCHAR might result in the reshuffling or copying to OVERFLOW of other tuples so that they fit on a single page.
(C) Frieder, Grossman, & Goharian 1996, 2002
101
Rule of Thumb (VARCHAR)
Avoid VARCHAR when it is not necessary. One “rule of thumb” is to avoid VARCHAR when the maximum savings is less then thirty characters. Advantage: Save storage Disadvantage: Degrades performance of UPDATE
(C) Frieder, Grossman, & Goharian 1996, 2002
102
Nulls
An attribute may be defined as null. This indicates that the value is unknown and avoids the need for user-defined special indicators. CREATE TABLE EMPLOYEE (emp# SMALLINT, name CHAR(20), salary
DECIMAL(5,2) NULL)
(C) Frieder, Grossman, & Goharian 1996, 2002
103
Effect of Nulls on Performance
Any data in a tuple that allows nulls is prefaced by a null indicator.
Null Indicator
200 Hank 252.35 Null Indicator (1 byte) 200
No Null
Hank 252.35
(C) Frieder, Grossman, & Goharian 1996, 2002
104
Effect of Nulls on Performance
A table that specifies about NULLS results in one byte added for each tuple stored in the relation. This can be a tremendous waste of storage if no data are ever NULL. Most DBMS default to allow NULLS, while many real world applications do not require NULLS. Performance of retrieval and update is slightly degraded because the null indicator must be examined before checking tuple content.
(C) Frieder, Grossman, & Goharian 1996, 2002
105
Syntax Modifications for Nulls
Allow NULL specification in INSERT » To add an employee whose salary is unknown: – INSERT INTO EMPLOYEE (3,’Hank’, null) Use of NULL in select. SELECT * FROM EMPLOYEE WHERE salary IS NULL
(C) Frieder, Grossman, & Goharian 1996, 2002
106
CREATE INDEX
The relational model does not specify how data should be accessed. To create a separate access path, SQL allows users to use CREATE INDEX to create a separate structure, called access method. Usually B+tree is used.
Ex: CREATE UNIQUE INDEX I1 ON EMPLOYEE (num) SELECT * FROM EMPLOYEE WHERE num = 25 will use a B-tree instead of a sequential scan.
(C) Frieder, Grossman, & Goharian 1996, 2002
107
DROP INDEX / TABLE
To remove an index use: DROP INDEX To remove a table use: DROP TABLE
(C) Frieder, Grossman, & Goharian 1996, 2002
108
Referential Integrity
When a primary key is modified, it is often necessary to delete the corresponding foreign keys. A single employee id might be a foreign key in many tables.
Employee
Colleges
Projects
Dependents
(C) Frieder, Grossman, & Goharian 1996, 2002
109
Specification of Primary and Foreign Key
EMPLOYEE(emp#, name, salary) and COLLEGE (emp#, col#, col_name) emp# is a primary key in the EMPLOYEE relation and emp# is a foreign key of the COLLEGE relation. CREATE TABLE EMPLOYEE CREATE TABLE COLLEGE (emp# SMALLINT, (emp# SMALLINT, name CHAR(20), col# SMALLINT, salary DECIMAL(5,2), col_name CHAR(20), PRIMARY KEY (emp#)) FOREIGN KEY K1 (emp#) REFERENCES EMPLOYEE ON DELETE CASCADE, PRIMARY KEY (emp#, col#))
(C) Frieder, Grossman, & Goharian 1996, 2002
110
Referential Integrity (continued)
ON DELETE: » [CASCADE, SET NULL, RESTRICT] CASCADE » A delete to a primary key results in a delete of all corresponding tuples that contain the foreign key. SET NULL » A delete to a primary key results in null values placed in all corresponding foreign keys. RESTRICT » A delete to primary key results in an error if a matching foreign key exists.
(C) Frieder, Grossman, & Goharian 1996, 2002
111
Views
A view is a logical relation. It is defined as a subset of tuples and attributes of a physical (base) relation. Syntax: CREATE VIEW as Ex: Create a view on the EMPLOYEE relation such that the salary attribute is omitted. CREATE VIEW V1 AS (SELECT num, name FROM EMPLOYEE) Now a user may be given access to only V1 without access to the base relation: EMPLOYEE.
(C) Frieder, Grossman, & Goharian 1996, 2002
112
Views (continued)
The tuples in the base relation may be restricted by adding a WHERE condition to the view definition. EX: Create a view that contains information only about employees in named ‘Steve’ CREATE VIEW V2 as (SELECT * FROM EMPLOYEE WHERE name = ‘Steve’) Any queries against V2 are executed by merging the view definition with the query to ensure the result set only accesses data allowed by the view.
(C) Frieder, Grossman, & Goharian 1996, 2002
113
View Insert
An insert to a view that does not contain all of the attributes in the base relation results in the additional attributes being set to NULL. This is only valid if nulls are permitted. Ex: Consider the view V1 that omits the SALARY attribute. INSERT INTO V1 (4, ‘Hank’) is equivalent to: INSERT INTO EMPLOYEE (4,’Hank’, null) A null value will be placed in the salary attribute.
(C) Frieder, Grossman, & Goharian 1996, 2002
114
Security and Authorization
Access to relations and views is controlled by the GRANT and REVOKE statements. GRANT [ALL, SELECT, INSERT, UPDATE, DELETE] ON