Database Conceptual and Logical Design
Zachary G. Ives
University of Pennsylvania CIS 550 – Database & Information Systems
October 2, 2004
Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan
How Do I Talk to the DB?
Generally, apps are in a different (“host”) language with embedded SQL statements
Static: SQLJ, embedded SQL in C Runtime: ODBC, JDBC, ADO, OLE DB, …
Typically, predefined mappings between host language types and SQL types (e.g., VARCHAR String or char[])
2
Embedded SQL in C
EXEC SQL BEGIN DECLARE SECTION int sid; char name[20]; EXEC SQL END DECLARE SECTION … EXEC SQL INSERT INTO STUDENT VALUES (:sid, :name);
EXEC SQL SELECT name, age INTO :sid, :name FROM STUDENT WHERE sid < 20
3
The Impedance Mismatch and Cursors
SQL is set-oriented – it returns relations There’s no relation type in most languages! Solution: cursor that’s opened, read DECLARE sinfo CURSOR FOR SELECT sid, name FROM STUDENT … OPEN sinfo; while (…) { FETCH sinfo INTO :sid, :name … } CLOSE sinfo;
4
JDBC: Dynamic SQL for Java
Roughly speaking, a Java version of ODBC
You’ll likely use this in the course project
See Chapter 6 of the text for more info
import java.sql.*; Connection conn = DriverManager.getConnection(…); PreparedStatement stmt = conn.prepareStatement(“SELECT * FROM STUDENT”); … ResultSet rs = stmt.executeQuery(); while (rs.next()) { sid = rs.getInteger(1); … }
5
Database-Backed Web Sites
We all know traditional static HTML web sites:
Web Browser
HTTP-Request GET ...
Web-Server
HTML-File
Load File
HTML-File
File-System
6
DB Access with Java Applets and Server Processes
Java Applet
Browser JVM
TCP/UDP IP
Java-Server-Process
JDBC Driver manager
JDBCDriver JDBCDriver JDBCDriver
Sybase
Oracle
...
7
Java Applets: Discussion
Advantages:
Can take advantage of client processing Platform independent – assuming standard java
Disadvantages:
Requires JVM on client; self-contained Inefficient: loading can take a long time ... Resource intensive: Client needs to be state of the art Restrictive: can only connect to server where applet was loaded from (for security … can be configured)
8
Server Pages (*P) and Servlets (IIS, Tomcat, …)
Web Server
Web Server
HTTP Request Load File File-System
HTML HTML File Output
HTML? Script?
File
Server Extension
I/O, Network, DB
9
ASP/JSP/PHP Versus Servlets
The goal: combine direct HTML (or XML) output with program code that’s executed at the server
The code is responsible for generating more HTML, e.g., to output the results of a database table as HTML table elements
How might I do this?
HTML with embedded code (*P) Code that prints out HTML (Servlets)
10
Now: How Do We Get the Database in the First Place?
Database design…
11
Databases Anonymous: A 6-Step Program
1. Requirements Analysis: what data, apps, critical operations 2. Conceptual DB Design: high-level description of data and constraints – typically using ER model 3. Logical DB Design: conversion into a schema 4. Schema Refinement: normalization (eliminating redundancy) 5. Physical DB Design: consider workloads, indexes and clustering of data 6. Application/Security Design
12
Entity-Relationship Diagram (based on our running example)
Underlined attributes are keys
fid
entity set relationship set
PROFESSORS
name
Teaches
semester
STUDENTS sid
Takes serno
COURSES subj
name
exp-grade
cid
13
attributes (recall these have domains)
Conceptual Design Process
What are the entities being represented? What are the relationships?
Takes STUDENTS
What info (attributes) do we store about each?
exp-grade
name
What keys & integrity constraints do we have?
sid
14
Translating Entity Sets to Logical Schemas & SQL DDL
Fairly straightforward to generate a schema…
CREATE TABLE STUDENTS (sid INTEGER, name VARCHAR(15) PRIMARY KEY (sid) ) CREATE TABLE COURSES (serno INTEGER, subj VARCHAR(30), cid CHAR(15), PRIMARY KEY (serno) )
15
Translating Relationship Sets
Generate schema with attributes consisting of:
Key(s) of each associated entity (foreign keys) Descriptive attributes
CREATE TABLE Takes (sid INTEGER, serno INTEGER, exp-grade CHAR(1), PRIMARY KEY (?), FOREIGN KEY (serno) REFERENCES COURSES, FOREIGN KEY (sid) REFERENCES STUDENTS)
16
… OK, But What about Connectivity in the E-R Diagram?
Attributes can only be connected to entities or relationships Entities can only be connected via relationships As for the edges, let’s consider kinds of relationships and integrity constraints…
PROFESSORS Teaches
COURSES
(warning: the book has a slightly different notation here!)
17
Logical Schema Design
Roughly speaking, each entity set or relationship set becomes a table (not always be the case; see Thursday) Attributes associated with each entity set or relationship set become attributes of the relation; the key is also copied (ditto with foreign keys in a relationship set)
18
Binary Relationships & Participation
Binary relationships can be classified as 1:1, 1:Many, or Many:Many, as in:
1:1
1:n
m:n
19
1:Many (1:n) Relationships
Placing an arrow in the many one direction, i.e. towards the entity that’s ref’d via a foreign key Suppose profs teach multiple courses, but may not have taught yet:
PROFESSORS Teaches COURSES
Partial participation (0 or more…)
Suppose profs must teach to be on the roster:
PROFESSORS
Total participation (1 or more…)
20
Teaches
COURSES
Many-to-Many Relationships
Many-to-many relationships have no arrows on edges
The “relationship set” relation has a key that includes the foreign keys, plus any other attributes specified as key STUDENTS Takes
COURSES
21
Examples
Suppose courses must be taught to be on the roster Suppose students must have enrolled in at least one course
22
Representing 1:n Relationships in Tables
• Key of relationship fid INTEGER, serno CHAR(15), set:
CREATE TABLE Teaches(
semester CHAR(4), PRIMARY KEY (serno), FOREIGN KEY (fid) REFERENCES PROFESSORS, FOREIGN KEY (serno) REFERENCES Teaches)
CREATE TABLE Teaches_Course(
• Or embed relationship in “many” entity set:
serno INTEGER, subj VARCHAR(30), cid CHAR(15), fid CHAR(15), when CHAR(4), PRIMARY KEY (serno), FOREIGN KEY (fid) REFERENCES PROFESSORS)
23
1:1 Relationships
If you borrow money or have credit, you might get:
ssn
rid
CreditReport delinquent? debt
Describes
Borrower name
What are the table options?
24
Roles: Labeled Edges
Sometimes a relationship connects the same entity, and the entity has more than one role:
Includes Assembly
qty
Subpart id
Parts
name
This often indicates the need for recursive queries
25
DDL for Role Example
CREATE TABLE Parts (Id INTEGER, Name CHAR(15), … PRIMARY KEY (ID) )
CREATE TABLE Includes (Assembly INTEGER, Subpart INTEGER, Qty INTEGER, PRIMARY KEY (Assemb, Sub), FOREIGN KEY (Assemb) REFERENCES Parts, FOREIGN KEY (Sub) REFERENCES Parts)
26
Roles vs. Separate Entities
id Husband
name What is the difference between these two representations? Married Husband Person Wife id name
27
Married
Wife
name
id
ISA Relationships: Subclassing (Structurally)
Inheritance states that one entity is a “special kind” of another entity: “subclass” should be member of “base class”
id People
name
ISA Employees salary
28
But How Does this Translate into the Relational Model?
Compare these options:
Two tables, disjoint tuples Two tables, disjoint attributes One table with NULLs Object-relational databases
29
Weak Entities
A weak entity can only be identified uniquely using the primary key of another (owner) entity. Owner and weak entity sets in a one-to-many relationship set, 1 owner : many weak entities Weak entity set must have total participation
People ssn
Feeds weeklyCost
Pets name species
name
30
Translating Weak Entity Sets
Weak entity set and identifying relationship set are translated into a single table; when the owner entity is deleted, all owned weak entities must also be deleted CREATE TABLE Feed_Pets ( name VARCHAR(20), species INTEGER, weeklyCost REAL, ssn CHAR(11) NOT NULL, PRIMARY KEY (pname, ssn), FOREIGN KEY (ssn) REFERENCES Employees, ON DELETE CASCADE)
31
N-ary Relationships
Relationship sets can relate an arbitrary number of entity sets:
Student
Indep Study
Project
Advisor
32
Summary of ER Diagrams
One of the primary ways of designing logical schemas CASE tools exist built around ER (e.g. ERWin, PowerBuilder, etc.)
Translate the design automatically into DDL, XML, UML, etc. Use a slightly different notation that is better suited to graphical displays Some tools support constraints beyond what ER diagrams can capture
Can you get different ER diagrams from the same data?
33
Schema Refinement & Design Theory
ER Diagrams give us a start in logical schema design Sometimes need to refine our designs further
There’s a system and theory for this Focus is on redundancy of data
Let’s briefly touch on one key concept in preparation for Thursday’s lecture on normalization…
34
Not All Designs are Equally Good
Why is this a poor schema design?
Stuff(sid, name, cid, subj, grade)
And why is this one better?
Student(sid, name) Course(cid, subj) Takes(sid, cid, exp-grade)
35
Focus on the Bad Design
sid name cid subj exp-grade 1 Sam 570 AI B 23 Nitin 550 DB A 45 Jill 505 OS A 1 Sam 505 OS C
Certain items (e.g., name) get repeated Some information requires that a student be enrolled (e.g., courses) due to the key
36
Functional Dependencies Describe “Key-Like” Relationships
A key is a set of attributes where:
If keys match, then the tuples match
A functional dependency (FD) is a generalization:
If an attribute set determines another, written A ! B then if two tuples agree on A, they must agree on B: sid ! Address What other FDs are there in this data?
FDs are independent of our schema design choice
37
Formal Definition of FD’s
Def. Given a relation scheme R (a set of attributes) and subsets X,Y of R:
An instance r of R satisfies FD X Y if, for any two tuples t1, t2 2 r, t1[X ] = t2[X ] implies t1[Y] = t2[Y]
For an FD to hold for scheme R, it must hold for every possible instance of r (Can a DBMS verify this? Can we determine this by looking at an instance?)
38
General Thoughts on Good Schemas
We want all attributes in every tuple to be determined by the tuple’s key attributes
What does this say about redundancy?
But:
What about tuples that don’t have keys (other than the entire value)? What about the fact that every attribute determines itself?
Stay tuned for Thursday!
39