Understanding Relational Databases by HC120714075126


									Understanding Relational Databases
White Paper by: Rob Hernlund (January 2000)

Target Audience and Learning Objectives for this Paper
Building a Foundation
Understanding the Relational Database Model
The Importance of a Structured Query Language
Understanding a Relational Database Management System (RDBMS)
RDBMSs for Client/Server Environments
The Next Logical Progression Into Web Based Relational Databases
References Used for this Paper

Target Audience and Learning Objectives for this Paper
This paper is intended for anyone interested in understanding relational databases from
a conceptual perspective. This may include relational database users, programmers,
salespeople, managers, technical writers, etc. This paper will also help the reader to
gain a more thorough understanding of how the Structured Query Language came
about, what it is, and why it is always used in conjunction with relational databases. The
buzz phrase “Database Management System (DBMS)” will be defined. Finally, we will
take a brief look at the natural progression into Web Based Relational Databases.

Building a Foundation
In order to understand a relational database, it is important to have a good grasp of the
earlier models of databases. Many people using or developing databases today do not
have a firm appreciation for or through understanding of a relational database because
they have never been exposed to the concepts behind the earlier models. Prior to the
advent of the relational database model, the two models most commonly (if not always)
used were the hierarchical and the network database. The hierarchical model for a
database can be thought of as an inverted tree.

Most people with a background in computer programming or file management will
envision this as a directory tree, with a root directory, and a hierarchical path to various
subdirectories. However, when envisioning a model of a database, you must replace
directories with tables for the model to be correct. Figure one shows how this looks in a
diagram format.

Page 1 of 8
Figure 1: Hierarchical Database Model

Keep in mind, we are examining the earliest type of database, which had almost become
obsolete by the early eighties, but had held up pretty well for the 30 or so years prior to
that. The key point here is that even this early model of a database employed the use of
multiple tables of data in a hierarchical layout. This is where a common misconception
occurs when people try to understand a relational database. They often think of a
relational database in these simple terms: “a relational database is a database where
data from one table can be accessed through data from another table”. As you can see,
even the earlier “non-relational” databases could do that.

However, this early model had many limitations. The main one being that the user could
only access data at the sub levels by starting at the root and addressing through the
branches using pointers. In other words, to access a record in Sub Table 3, the user
would have to start at the Root Table, and know the pointer to Table 1 and the pointer
from Table One to Sub Table 3. This may sound trivial, but the requirement to know all
the pointers necessary to access a record was a major disadvantage of the hierarchical
model. In a complex database, there would be thousands of pointers. This made the
tasks of programming and using these types of databases quite arduous. An even bigger
disadvantage of the hierarchical model is that the relationships between tables were so
limited. Records could only be related to other records in a single branch of the tree, with
no cross access to information from tables in any of the parallel branches. This created
a serious limitation in the complexity of the data that could be dealt with. For example, a
modern day travel agency would have to have several hundred databases of the
hierarchical data model to do the equivalent of what a single relational database can do
today. If a travel agent were only dealing with one airline in one city, the hierarchical
model would be fine. However, if they wanted to compare bookings between 50 different
airlines, the hierarchical model will fall on its face. This is because of the lack of
capability to access records in parallel branches of the chain. In a relational database,
records from 50 different airlines can easily be kept in separate tables, and each table
will have access to data in any of the other tables through indirect relationships. This
concept will be explained in detail in the section entitled Understanding Relational

Page 2 of 8
For now let’s stay focused on understanding the two most common database models
that predated the relational database. The network database was the next step in the
evolution of databases after the hierarchical database. The network database model is
more capable of dealing with complex data, because the model allows a greater number
of relationships between tables and records within the database. In discussing the
network database model, it is important to understand that the term “network” does not
refer to a communications network, but rather, it refers to the structure of the model.
Figure 2 shows how the structure of the network database can be perceived as a
“network of data”.

Figure 2: Network Database Model

In the network database model shown in Figure 2, each table within the database is
linked by a set structure. Unlike the hierarchical model, data can be accessed in tables
that are in parallel branches. In fact, with the advent of set structures, data in any table in
the database can be accessed from any other table in the database by following the
paths defined by the set structures. This feature enables the network database model to
be powerful enough to handle complex data models, such as the case of multiple airline
bookings for scores of cities. To follow this model through, let’s examine how data in
Table 3c table can be accessed from the data in Table 3a. In this model, you can
address data between tables in either direction. Therefore, to get from 3a to 3c, you
would invoke the following set structures: Set 2a to 3a; Set 2a to 3b; Set 2b to 3b; Set 2b
to 3c. You could have even chosen to address the data by following the path from Table
3a up through Table 1 and then back down to Table 3c. In this case, the following set

Page 3 of 8
structures would be invoked: Set 2a to 3a; Set 1a to 2a; Set 1 to 2b; Set 2b to 3c. Try not
to get confused by the fact that the direction of the set structure does not always indicate
the direction you are travelling to access tables. The set structures are only arbitrary
names to indicate that the two tables are connected to each other, and can to used to
access data in either direction. These relationships are kept straight by the fact that each
pair of tables are related to each other in the form of an owner and a member. In Figure
2, Table 1 would most likely be designated as the owner for Table 2a and Table 2b. In
this arrangement, Table 2a and Table 2b would each be members of Table 1. One
owner can have multiple members, but each member may only have one owner.

You can see how this model can get complicated from a programming point of view.
Figure 2 depicts a simple model, but when you are building a database for businesses
with thousands of customers, the model becomes quite complicated. In fact this model
would not be able to handle the complex data required for a business like a travel
agency with the need to track bookings for multiple airlines in multiple cities for
thousands of customers.

Once the set structures are in place in a network database, they cannot be changed
without affecting an applications program that uses the database. If a set structure is
changed, all references from within the applications program to that set structure will
have to be changed. In summary, although the network database was powerful enough
to handle complex data models, it was a pain to program, which eventually brought
about the motivation for the development of the relational database model.

Understanding the Relational Database Model
Discussing the creation of the relational database model without mentioning Dr. E. F.
Codd would be like discussing the invention of the electric light bulb without mentioning
Thomas Edison. Dr. Codd was a research scientist for IBM during the 1960s, and like
many research scientists of the time, he was dissatisfied with the cumbersome methods
being used to handle large amounts of data. With the information age just beginning to
take off, the need to handle large amounts of data with less effort was at the top of the
priority list for those in the development community.

Dr. Codd was a mathematician, and he applied the science of mathematics to solve the
problems inherent in database models that were in use at the time. In 1970, he
presented a paper to the general scientific community entitled “A Relational Model of
Data for Large Shared Data Banks”. The eventual widespread use of the concepts in his
paper revolutionized the world of data management. Dr. Codd’s model is based on two
branches of mathematics: set theory and first order predicate logic. Discussing the
mathematical concepts behind a relational database is beyond the scope of this paper.
However, the reader should understand that the mathematical models used by Dr. Codd
allowed for even broader and more indirect relations between the tables in a database.
We will examine this broader scope of relationships in the graphical view of a relational
database as shown in Figure 3.

Page 4 of 8
Figure 3: Relational Database Model

In this model, the data in the Side Table appears to be unrelated to data in Table 1,
Table 2a, Table 3a, and Table 2b. However, by beginning at the Side Table, and by
using a structured query, data in any other table in the database can easily be accessed.

Although this model is similar to the network database model, it does allow for more
indirect relationships, and there is a key difference is in the way the data is stored. In a
relational database model, data is stored in relations that are perceived by the user as
tables. Each relation is composed of records and fields. Each record in a table is
identified by a field that contains a unique value. The physical order of the records or
fields in a relational database is completely immaterial. For this reason, the data can
exist independent of the way it is physically stored in the computer. The makes the task
of keeping track of the data far easier, because the user does not even need to know the
physical location of a record in order to retrieve its data.

The concept of using key fields to access records is shown in Figure 4.

Page 5 of 8
Figure 4: Relating Tables Using ID Fields

These key fields that establish a relationship between the data in tables greatly improve
the ease with which the user can retrieve the data. With all previous models of
databases, the user had to be painfully familiar with the internal structure of the
database. With the relational model, the user simply has to glance at the layout on the
screen, and form queries using a language created to do just that. This of course, brings
us to a discussion of the Structured Query Language (SQL).

The Importance of a Structured Query Language
First and foremost, it is important to understand that the Structured Query Language
(SQL) was developed out of a specific need to be able to query data from a relational

Tables in relational databases are related to each other through key fields. In most
cases, a table will have a primary key that identifies the table itself, and a foreign key
that identifies which table it is related to, and what the relationship is.

The Structured Query Language is used to retrieve data from specific fields in specific
tables within a database. The following statement is a sample SQL query:

SELECT         Airline, Arrivals, Departures   // fields

Page 6 of 8
FROM          Airlines                       // table
WHERE         Austin
ORDER         By Departures

A basic SQL query normally consists of three components: the SELECT FROM,
WHERE, and ORDER. Although the SELECT and FROM clauses are usually on two
different lines, they are considered to be one component as they are used to identify the
field and table the user wishes to query. The WHERE clause is used to impose criteria
against the data, and the ORDER clause is used to perform sort functions on the data.

Of course, nearly all programming languages are defined by a written standard of some
kind. The American National Standards Institute (ANSI) has adopted SQL as the
standard language for relational database management systems as defined in the
document ANSI X3.135-1989 "Database Language SQL with Integrity Enhancement", or
ANSI SQL89. An excellent resource for all things related to SQL standards can be found
at: http://www.db.informatik.uni-kassel.de/~czi/sql_stnd.html.

Understanding a Relational Database Management System
A Relational Database Management System (RDBMS) is a software program (or suite of
programs) used to create, maintain, modify, and manipulate a relational database. This
includes the Graphical User Interface (GUI) and all the associated underlying code
needed to do these tasks. Most modern day RDBMSs are designed for use in a
client/server environment. In this model, the bulk of the data for the database is on a
central server, and users access this data through applications on their workstations.

RDBMSs for Client/Server Environments
Again, a full appreciation of the improvements of a client/server environment can best be
demonstrated by a trip backward in time. Many readers of this paper will remember the
days before PC networks. People who only have exposure to computer systems from
the late 1980s onward will find the idea of individual PCs on desktops hard to image! In
the early eighties, the entire database had to be contained on one PC, with no
communication with the outside world except through a floppy disk. Large firms were still
using mainframe computers for database management, but many small businesses
began to use desktop RDBMSs for their internal data management needs. Inventory
systems for small businesses were a very common application for these early PC based
models. The first two RDBMSs that became popular were dBASE by Ashton-Tate, and
FoxPro by Fox Software. More advanced PC based RDBMSs were soon offered in the
form of R:BASE developed by Microrim, and Paradox, which was developed by Ansa

During the late 1980s and early 1990s the need to network data between individual PCs
became much greater as we exploded into the information age. With the advent of
widespread use of PC networks, client/server based RDBMSs began to emerge to meet
the needs of handling data over these networks. There are at least 40 or 50 vendors

Page 7 of 8
offering RDBMS products for client/server applications today. An excellent resource for
links to these products can be found at: http://www.llp.fu-berlin.de/lsoft/H/1/index.shtml.

The Next Logical Progression Into Web Based Relational
As we examine the progression from PC based databases to client/server based
databases and on to web based databases, it is important to note that it is the
improvement in hardware technology that opens to the door to the creation of the
software technology. This trend also applies to the development of database
applications that can be used for web based environments. The biggest limiting factor for
relational databases to have practical use on the Internet is in the bandwidth available to
transmit data. Simply put, it takes a large amount of bandwidth to move large amounts of
data though a computer network. Conceptually, an Internet based environment is not
much different than a client/server based environment. The main difference is that the
bandwidth available for transmission over the Internet is typically much less than in a
Local Area Network (LAN), a Wide Area Network (WAN), or even an Intranet

This will change rapidly in the next two years. The hardware technology for transmission
of data over the Internet is improving at a very fast rate, and the home user will see a
tremendous difference over the next few years. As cable modems and other similar
technologies begin to replace phone modems the home user will begin to experience
response that is as much as 50 times faster. Pages that now take as much as 30
seconds to a minute to load will appear on the screen the very instant the mouse button
is clicked to access the page. I predict that by January of 2002, at least 90% of the
residential connections will be running at a minimum transmission rate of T1 (1.544
megabits per second). Currently the fastest speed a phone modem can run at is 56
kilobits per second.

These improvements in hardware technology will pave the way for more widespread use
of web based relational databases. I believe as this situation develops, there will be a
push to develop applications that are extremely trim in terms of program size that can
easily be downloaded over the Internet for use on residential PCs. As, always, the key
is to have a central server carry the majority of the data, and to have the client side of
the system as light as possible in terms of program size and in terms of the volume of
data manipulated.

References Used for this Paper
Celko, Joe, Databases: Concepts in Practice, Morgan Kaufmann Publishers, 1999.
Hernandez, Michael J., Database Design for Mere Mortals, Addison Wesley, 1997.

End of Paper.

Page 8 of 8

To top