Introduction to Databases
A Flat-file database
A Flat-file database holds its data in a single file of fixed-size records. Think of
the file as a two-dimensional table:
LNAME FNAME ADDRESS CITY STATE ZIP
Simpson Homer 43 Pine St SpringField NY 10201
Gumble Barney 1 Oak St SpringField NY 10233
Flanders Ned 41 Pine St SpringField NY 10201
Groundskeeper Willy 3 Auch St. SpringField NY 10244
Flat-file databases are adequate in only the simplest situations.
An address application
Data is held in memory in a User-Defined Type (note fixed-size strings):
Type Address
FName As String * 20
LName As String * 20
Address As String * 40
City As String * 15
State As String * 2
Zip As String * 5
End Type
The data is stored in a random access file, so that only one record at a time is
copied into memory, and records at specific offsets can be accessed directly:
Public CR As Address ‘Global variable in code module
' Generate the full data file path and name.
' We store it in the application directory.
FullFileName = App.Path & "\" & FILENAME
FileNum = FreeFile
' Open the data file and calculate the number of records in it..
Open FullFileName For Random As #FileNum Len = Len(CR)
NumberOfRecords = (LOF(FileNum) / Len(CR))
1
Recall that VB makes reading or storing records a snap:
Public CurrentRecord As Integer ‘Global variable in code module
Public Sub DisplayRecord(Record As Integer)
' Reads the specified record from the file and displays it.
Get #FileNum, Record, CR
txtLName.Text = CR.LName
txtFName.Text = CR.FName
txtAddress.Text = CR.Address
txtCity.Text = CR.City
txtState.Text = CR.State
txtZip.Text = CR.Zip
End Sub
Public Sub SaveCurrentRecord()
' Saves the current record.
CR.LName = txtLName.Text
CR.FName = txtFName.Text
CR.Address = txtAddress.Text
CR.City = txtCity.Text
CR.State = txtState.Text
CR.Zip = txtZip.Text
Put #FileNum, CurrentRecord, CR
End Sub
What if you want to use classes? For technical reasons the UDT is necessary
in the Get and Put statements. You can have a class CAddress, and then do
one of the following two things.
1. Hide the UDT in CAddress: make it a private member of the class. The
class CAddress will have I/O methods that contain Get and Put
statements.
2. Define a second class CAddressBuffer and hide the UDT there. This
class’s job will be to transfer data between a random access file and a
CAddress object, using the UDT. If Address were a previously-defined
class (third party?) this is your best option.
2
What is a database?
A database is a place to store organized information for a set of purposes.
A Flat-file database holds its data in a single file of fixed-size records. This is
only adequate in simple situations, otherwise, you need a…
…Relational database, which has multiple tables. A table has rows and
columns. The rows are the records or tuples of the table; the columns are the
fields or attributes classes. An individual cell is an attribute.
With few exceptions, each table in a relational database should have a
primary key that uniquely identifies the record in that table. Usually the
primary key is one of the fields:
CHARACTER table
SIN LNAME FNAME PHONE AGE
404 202 101 Simpson Homer 555-DUFF 35
434 626 474 Gumble Barney 555-BURP 35
454 737 010 Flanders Ned 555-LEFT 37
464 292 171 Groundskeeper Willy 555-AUCH 50
Here the SIN field can be the primary key. If no such field is present in the
problem domain, invent an arbitrary, unique serial number for each record.
Tables will be related (hence relational database).
STOCK table
STOCK_NO DESCRIPTION WHOLESALE RETAIL MAN_ID
101 Hammer … … 177
103 Chisel … … 177
104 Sander … … 202
106 Saw … … 111
3
MANUFACTURER table
ID NAME ADDRESS PHONE
177 Acme Ltd. … 555-ACME
178 Universal Corp. … 555-UNIV
Field MAN_ID in the Stock table takes us to the Manufacturer table. It is
called a foreign key. This is usually done to avoid duplicate information.
The most common way to interact with a database is to use the special
purpose language SQL (Structured Query Language) to form queries and
other statements:
SELECT * FROM STOCK
WHERE RETAIL > 5.00
ORDER BY WHOLESALE
The result of a select query is called a RecordSet or ResultSet. Both VB and
Java use SQL in their database access code.
Most commercial VB applications interact with a database as at least part of
their functionality.
4
Essential SQL
Identifiers
Names of tables and fields can be enclosed in [square brackets]. This is
mandatory when a name contains spaces.
o Field17
o [FIELD17] SQL is case – insensitive
o [Max Temp]
A Field name can be prefixed by its table's name. This is mandatory when the
name is ambiguous.
o Table2.Field17
o [Table2].[Field17]
o [Weather Statistics].[Max Temp]
Where a list of field names is expected, * stands for all fields, and Table2.*
stands for all fields in Table2.
Statements
These are the statements in SQL:
o CREATE – add an index to a table, or add a table to a database
o DROP – remove an index from a table, or a table from a database
o INSERT INTO – add record(s) to a table
o DELETE – remove records
o UPDATE – alter records
o SELECT -- create a collection of records
o SELECT INTO – create a new table as the result of a selection
Convention: SQL reserved word are capitalized.
Whitespace doesn't matter in SQL.
String constants should be delimited with 'single quotes'
5
SELECT statement
SELECT * FROM Publishers
SELECT * FROM [ALL TITLES]
All Titles is the name of a stored query, not a table like Publishers, but both count
as "table expressions".
SELECT PubID, Name, [Company Name] FROM Publishers
You can "project" out just the fields of interest.
WHERE clause
The WHERE clause lets you skip over records that don't meet a criteria
SELECT * FROM Publishers WHERE PubID = 200
How to retrieve one record from a database.
SELECT * FROM Publishers WHERE PubID , =, =, , BETWEEN, IN and LIKE.
BETWEEN is used for ranges (includes end values); IN is used for matching
one of a number of choices.; LIKE is used for pattern matching in strings.
The logical operators include AND, OR and NOT.
Note the special syntax above for testing for the NULL value Many databases
have fields that allow the NULL value (for missing or omitted data).
6
ORDER BY clause
SELECT * FROM Employees ORDER BY [Last Name]
SELECT * FROM Employees ORDER BY [Last Name] DESC
Employees in reverse alphabetical order.
SELECT * FROM Employees ORDER BY [Last Name], [First Name]
Break ties using First name.
SELECT * FROM Employees ORDER BY [Age] DESC, [Last Name]
List older employees first, and with an age group list alphabetically
SELECT * FROM Employees WHERE [Age] = 40 ORDER BY [Last Name]
You can have both WHERE and ORDER BY clauses is a SELECT statement.
The WHERE clause is always written first.
SELECT Predicates: DISTINCT, TOP n [PERCENT]
SELECT DISTINCT [Last Name], [Age] FROM Employees
DISTINCT removes any duplicates from the selected tuples.
SELECT DISTINCTROW [Last Name], [Age] FROM Employees
DISTINCTROW removes duplicates from the source data. If we employ two
forty-year-old Smiths, there will be duplicates in the recordset.
SELECT TOP 10 [Last Name], [First Name] FROM Students
WHERE [Year] = 2000 ORDER BY [GRADE] DESC
The top 10 (+) students in the year 2000
SELECT TOP 10 PERCENT [Last Name], [First Name] FROM Students
WHERE [Year] = 2000 ORDER BY [GRADE] DESC
7
GROUP BY clause
SELECT COUNT(City) FROM Publishers
This will produce a recordset with a single record. Its lone field has value 122.
There are 122 records in Publishers with a non null City field
SELECT City, COUNT(City) AS Tally FROM Publishers WHERE City IS NOT
NULL GROUP BY City ORDER BY 2 DESC, 1
After filtering out records with null City fields, the records are partitioned
according to their City field. Each group will generate a single record in the
resulting recordset. This recordset goes from (New York, 28) to (Watsonville, 1).
Note the use of 1 and 2 to refer to the fields in the result, and "AS Tally" to name
the new field.
Aggregate functions: COUNT, AVG, MIN, MAX, SUM
SELECT COUNT(*) FROM Publishers
The number of records in the table
SELECT COUNT(*), AVG([Year published]), MIN([Year published]),
MAX([Year published]), [Company name]
FROM [All Titles] GROUP BY [Company Name]
SELECT [ShipCountry], SUM(freight) FROM Orders GROUP BY [ShipCountry]
HAVING clause
SELECT [ShipCountry], SUM(freight) FROM Orders GROUP BY [ShipCountry]
HAVING SUM(FREIGHT) > 1000
A WHERE clause filters out records before you do anything else. A HAVING
clause (which can only be will GROUP BY) does a final filtering: which records
produced by grouping are actually output?
8
Summary
The SELECT statement's clauses are all optional, apart from FROM. The clauses
must be given in the following order.
SELECT
Fields
FROM Tables
[WHERE ...]
[GROUP BY ...]
[HAVING ...]
[ORDER BY ...]
subqueries
SELECT e.emp_id, e.last_name, e.first_name, ep.pay_rate
FROM employee_tbl e, employee_pay_tbl ep
WHERE e.emp_id = ep.emp_id AND
ep.pay_rate > (SELECT pay_rate
FROM employee_pay_tbl WHERE emp_id = ‘99’)
The subquery must be placed in parentheses.
Because the above subquery is used where a value is expected, it must
return a result set with one field and one record.
UPDATE employee_pay_tbl
SET pay_rate = pay_rate * 1.15
WHERE emp_id in (SELECT emp_id FROM employee_tbl
WHERE city = ‘Ottawa’)
Because the above subquery is used where a sequence of values is
expected, it must return a result set with one field and any number of records.
DELETE employee_pay_tbl
WHERE emp_id = (SELECT emp_id FROM employee_tbl
WHERE last_name = ‘Simpson’ and
First_name = ‘Homer’)
SELECT c.cust_name FROM customer_tbl c
WHERE 10 < (SELECT SUM(o.qty) FROM orders_tbl o
WHERE o.cust_id = c.cust_id)
The above is an example of a correlated subquery. The query returns the
name of all customers that have ordered at least 10 units of any one product.
9
Views
A view is a predefined query that is stored in the database, has the appearance
of an ordinary table, and is accessed like a table, but does not require physical
storage.
Important: if you change the underlying tables, then use the view, you will
be seeing the new data.
Views are handy for often-used queries. The database only has to parse
and pre-process them once, and they can be shared among all the
applications that access the database. Use them!
CREATE VIEW [all sittings] AS
SELECT *
FROM person_tbl INNER JOIN sitting_tbl
ON person_tbl.id = sitting_tbl.personid
Use the view like a table, although there are a few restrictions, like a view
involving JOIN can not be updated.
SELECT * FROM [all sitting] WHERE . . .
Joins
A Join composes a pair of tables into one. The most common join is when you
connect a foreign key with a primary key. The Join expression is a table
expression and can be used in a FROM clause.
SELECT CategoryName, ProductName
FROM Categories INNER JOIN Products
ON Categories.CategoryID = Products.CategoryID
An inner join only combines tuples when the left tuple matches the right tuple.
There many be information loss: what if there is a category without a product?
SELECT CategoryName, ProductName
FROM Categories LEFT JOIN Products
ON Categories.CategoryID = Products.CategoryID
A left join will create a record made from a tuple from the first table with a tuple of
null values, if there is no other match for the left tuple. In this query, this happens
to unrepresented categories. There is a corresponding right join.
10
SELECT INTO statement
SELECT * INTO NewTable FROM ExistingTable
SELECT * INTO NewTable FROM Authors WHERE [Year Born] IS NOT NULL
The FROM clause can have all the buzzers and whistles described above.
11
INSERT Statement
INSERT INTO Table17 Values (99, 'Fred', 1,)
That's easy, but what if you want to insert a new record into a table with an
autoincrement counter? Specify the other fields after the table name. Missing
fields are given their default value, which means the auto- increment field can do
its thing:
INSERT INTO Table17(Name, Value) Values ('Fred', 1)
INSERT INTO Customers SELECT * FROM [New Customers]
Again, we can go into a full-blown SELECT statement here. The main difference
between this version of INSERT INTO and SELECT… INTO, is that the table
must already exist for the INSERT.
DELETE Statement
DELETE FROM Employees WHERE Title = 'Trainee'
Delete removes entire records. There is no undo, apart from rolling back
transactions. Often you INSERT/SELECT INTO the records into a backup table
before deleting them from the real table.
UPDATE Statement
UPDATE Questions SET Uses = Uses + 1, OK = OK + 1 WHERE ID = 17
DROP Statements
DROP INDEX index17 ON TABLE Employees
DROP TABLE HotPotato
12
CREATE Statements
CREATE TABLE Questions
(ID AUTOINCREMENT, Question TEXT(50) NOT NULL, Position SHORT NOT NULL)
When creating a table, you can also indicate the primary key:
CREATE TABLE Questions (
ID AUTOINCREMENT CONSTRAINT pkey PRIMARY KEY,
Question TEXT(50) NOT NULL,
Position SHORT NOT NULL)
Types you can use in a CREATE table method are:
BOOLEAN
AUTOINCREMENT (built on top of a LONG)
SHORT
LONG
CURRENCY
SINGLE
DOUBLE
TEXT (up to 255 chars)
LONGTEXT ( how does a Gigabyte sound?)
DATE
There are several synonyms, like COUNTER for AUTOINCREMENT.
CREATE INDEX PKey ON Questions(ID) WITH PRIMARY
A primary key is always unique and forces its fields to be non null.
CREATE UNIQUE INDEX FullName ON Employees(LName, FName)
An error will result if there are two employees listed with the same name.
13
constaints
Here is a clean way to specify a primary key in a create statement:
CREATE TABLE abc_tbl
( id AUTOINCREMENT NOT NULL,
another field,
another field,
CONSTRAINT pkey PRIMARY KEY (id)
)
And here is how to specify a foreign key (of course, a table can have both
a primary and a foreign key):
CREATE TABLE xyz_tbl
( A field,
abc_id LONG NOT NULL
another field,
CONSTRAINT abc_fk FOREIGN KEY (abc_id)
REFERENCES abc_tbl [ON DELETE CASCADE]
)
We didn’t specify a field in the table abc_tbl, so the primary key field id is
assumed. The cascading delete is optional (that is the meaning of the square
bracket meta-notation.)
14
ALTER Statement
ALTER TABLE Employees ADD COLUMN Notes TEXT(25)
Any existing records will have a NULL value in this new field
ALTER TABLE Employees DROP COLUMN Notes
How to change a column’s attributes
The short answer is you can’t, in SQL. What you can do is move data from one
column to another column. In the following example, I assume that I want to
“change” a 2-byte integer column val in table T into a 4-byte column.
ALTER TABLE T ADD COLUMN temp LONG;
UPDATE T SET temp = val;
DROP COLUMN val ON TABLE T;
ALTER TABLE T ADD COLUMN val LONG;
UPDATE T SET val = temp;
DROP COLUMN temp ON TABLE T;
And if the column had an index on it, you would have to DROP it before
DROPping the column and CREATE it again after the new column was complete.
15
Normalization
Normalization is the process of decomposing data into tables and
relationships between tables. Redundancy is eliminated as well as
inconsistent dependence between fields. Computer Scientists have identified
a number of normal forms (I lost count at 5th normal form).
Zeroth Normal Form. Fields are atomic. Why follow this form? To make it
possible to search for a given last name.
NAME
NAME PHONE CALLDATE MEMO
Mullder,, Fox
Mu der Fox UFO-1212 16-9-00 little green .
Sculllly,, Dana
Scu y Dana 555-0101 17-9-00 …
Mullder,, Fox
Mu der Fox UFO-1212 17-9-00 …
Sculllly,, Dana
Scu y Dana 555-0101 18-9-00 …
LNAME FNAME PHONE CALLDATE MEMO
Mulder Fox UFO-1212 16-9-00 little green .
Scully Dana 555-0101 17-9-00 …
Mulder Fox UFO-1212 17-9-00 …
Scully Dana 555-0101 18-9-00 …
16
First Normal Form. Eliminate repeating groups
LNAME FNAME PHONE CALLDATE
CALLDATE MEMO
MEMO
Mulder Fox UFO-1212 16--9--00
1 6 9 00 lliittttlle green ..
e green
Scully Dana 555-0101 17--9--00
1 7 9 00 ……
Mulder Fox UFO-1212 17--9--00
1 7 9 00 ……
Scully Dana 555-0101 18--9--00
1 8 9 00 ……
ID LNAME FNAME PHONE PER_ID DATE MEMO
17 Mulder Fox UFO-1212 17 16-9-00 …
18 Scully Dana 555-0101 18 17-9-00 …
17 17-9-00 …
18 18-9-00 …
17
Second Normal Form. Fields should fully depend on the key – if not, place the
field in another table.
PER_ID DATE MEMO CallllType
Ca Type
17 16-9-00 … Alliien
A en
18 17-9-00 … Auttopsy
Au opsy
17 17-9-00 … ESP
ESP
18 18-9-00 … Auttopsy
Au opsy
PER_ID DATE MEMO CALLTYPE
17 16-9-00 … 1
18 17-9-00 … 2
17 17-9-00 … 3
18 18-9-00 … 2
CALLTYPE DESCRIPTION
1 Alien
2 Autopsy
3 ESP
Should you do this with a field like PROVENCE? Since there are a fixed number
of provinces, the complexity of adding a separate table is not worth it. Do this
only for “enumerated” types that need to be flexible.
18
Third Normal Form. Non-key fields should be independent of each other. A
change in one field should not force a change in another.
ID LNAME FNAME PHONE ZIIP
ZP STATE
STATE
17 Mulder Fox UFO-1212 21244
21244 VA
VA
18 Scully Dana 555-0101 21265
21265 VA
VA
ZIP STATE
ID LNAME FNAME PHONE ZIP
17 Mulder Fox UFO-1212 21244 21244 VA
18 Scully Dana 555-0101 21265 21265 VA
… …
Again, it may not be worth the complexity of the additional table to reduce
data to third normal form. Some programmers would not do the last
decomposition.
How would you indicate that you know the state, but not the zip code? Once
again, it seems that the best solution is to generate an internal ID value for
the zip-state table:
ID LNAME FNAME PHONE ZIP_ID ID ZIP STATE
17 Mulder Fox UFO-1212 99 99 NULL VA
18 Scully Dana 555-0101 177 177 21265 VA
… …
19