Chapter 9
Database
Management
Systems
1
Objectives for Chapter 9
Problems in the flat-file approach to data management
Why these gave rise to the database concept
Relationships among the elements of the database environment
Characteristics of the relational database model
Three stages in database design including
Conceptual design
Logical design
Physical design
Anomalies caused by unnormalized databases and the need for data
normalization
Features of distributed databases
Considerations in deciding on a particular database
configuration
2
Flat-File Versus Database
Environments
Computer processing involves two components: data
and instructions (programs).
Conceptually, there are two methods for designing the
interface between program instructions and data:
File-oriented processing: A specific data file was
created for each application
Data-oriented processing: Creates a single data
repository to support numerous applications
Disadvantages of file-oriented processing include
redundant data and programs and varying formats for
storing the redundant data.
The format for similar fields may vary because the
programmer used inconsistent field formats.
3
Flat-File Environment
User 1 Data
Transactions
Program 1 A,B,C
User 2
Transactions
Program 2
X,B,Y
User 3
Transactions
Program 3
L,B,M
Data Redundancy &
Flat-File Problems
Data Storage - creates excessive storage
costs of paper documents and/or magnetic
form
Data Updating - any changes or additions
must be performed multiple times
Currency of Information - potential problem
of failing to update all affected files
Task-Data Dependency - user’s inability to
obtain additional information as his or her
needs change
5
Database Approach
User 1
Database
Transactions
Program 1
A,
User 2
D B,
Transactions B C,
Program 2 M X,
S Y,
User 3 L,
Transactions M
Program 3
Advantages of the
Database Approach
Data sharing/centralize database resolves flat-file problems:
No data redundancy - Data is stored only once, eliminating
data redundancy and reducing storage costs.
Single update - Because data is in only one place, it
requires only a single update procedure, reducing the time
and cost of keeping the database current.
Current values - A change to the database made by any
user yields current data values for all other users.
Task-data independence - As users’ information needs
expand beyond their immediate domain, the new needs can
be more easily satisfied than under the flat-file approach.
7
Disadvantages of the
Database Approach
Can be costly to implement
additional hardware, software, storage, and
network resources are required
Can only run in certain operating
environments
may make it unsuitable for some system
configurations
Because it is so different from
the file-oriented approach, the database
approach requires training users
may be inertia or resistance
8
Elements of the Database Approach
Database
System Development Administrator
Process
Applications
User DBMS
Transactions
Programs Data
Definition Host
U Language Operating
S Transactions User System
Data
E Programs Manipulation
R Language
S Transactions User
Query
Programs Language Physical
Database
User Queries
DBMS Features
User Programs - make the presence of the DBMS
transparent to the user
Direct Query - allows authorized users to access
data without programming
Application Development - user created
applications
Backup and Recovery - copies database
Database Usage Reporting - captures statistics on
database usage (who, when, etc.)
Database Access - authorizes access to sections of
the database 10
Internal Controls and
DBMS
The purpose of the DBMS is to provide
controlled access to the database.
The DBMS is a special software system
programmed to know which data
elements each user is authorized to
access and deny unauthorized
requests of data.
11
Data Definition Language
(DDL)
DDL is a programming language used to define
the database to the DBMS.
The DDL identifies the names and the relationship of
all data elements, records, and files that constitute
the database.
Viewing Levels:
internal view - physical arrangement of records
(1)
conceptual view - representation of database
(1)
user view - the portion of the database each
user views (many) 12
ANSI Model
External
User Views
Conceptual
Model
Internal
Model
Physical
Database
Data Manipulation
Language (DML)
DML is the proprietary programming
language that a particular DBMS uses to
retrieve, process, and store data.
Entire user programs may be written in
the DML, or selected DML commands can
be inserted into universal programs, such
as COBOL and FORTRAN.
14
Query Language
The query capability permits end users
and professional programmers to access
data in the database without the need for
conventional programs.
IBM’s Structured Query Language (SQL)
is a fourth-generation language that has
emerged as the standard query language.
15
Three Steps in Designing
a Database
Prepare the conceptual model
Identify the entities
Identify the relationships between the entities
Prepare the ER diagram
Specify the logical design
Select the logical database model (relational)
Transform the conceptual data model using a
logical database model
Implement the physical design
Physical structures
Access methods 16
Phase 1
Prepare the Conceptual
Model
Draw an ERD to
capture the
process.
17
ER-Diagram Symbols
ER-Diagram Symbols
Entity Relationship Attribute
Primary Key
Example of a Relationship Linking
Two Entities
1 M
CUSTOMER places ORDER
Order Number Item #
Name Number
An Entity
...is an
individual object
concept
event
...may be a specific
tangible object
intangible object
Entity Class is a collection of entities with
similar attributes.
21
Attributes
A property of an entity that we choose to record (of
interest to an organization).
CUSTOMER (entity) PRODUCT (entity)
customer # product #
name description
address finish
telephone no. price
balance qty. on hand
22
Cardinalities
Entity Relationship Entity
Sales- 1 1 Car
Assigned
person
1 M
Customer Places Order
M M
Vendor Supply Inventory
ER-Diagram using REA Model
R E A
M M M 1
Line items Sales Sales
Inventory Party to
M person
M
1
Pays for Made to
Customer
1
M
M Received
1 M Cash from
Increases
Cash Collections M 1 Cashier
Received 34
by
Phase 2
Specify the Logical
Design
Create relational
tables.
25
Logical Data Structures
A particular method used to organize records
in a database is called the database’s
structure.
The objective is to develop this structure
efficiently so that data can be accessed
quickly and easily.
Four types of structures are:
hierarchical (AKA the tree structure)
network
relational
object-oriented 26
The Relational Model
The relational model portrays data in the
form of two dimensional tables:
relation - the database table
attributes (data elements) - form
columns
tuples (records) - form rows
data - the intersection of rows and columns
27
RESTRICT - filtering out rows, PROJECT - filtering out columns,
such as the purple such as the purple
JOIN
X1 Y1 Y1 Z1 X1 Y1 Z1
X2 Y2 Y2 Z2 X2 Y2 Z2
X3 Y1 Y3 Z3 X3 Y1 Z1
Properly Designed Relational Tables
No repeating values - All occurrences at
the intersection of a row and column are a
single value.
The attribute values in any column must all
be of the same class.
Each column in a given table must be
uniquely named.
Each row in the table must be unique in at
least one attribute, which is the primary key.
29
Relational Model Data
Linkages (>1 table)
No explicit pointers are present. The data are viewed as a
collection of independent tables.
Relations are formed by an attribute that is common to
both tables in the relation.
Assignment of foreign keys:
if 1 to 1 association, either of the table’s primary keys may
be the foreign key.
if 1 to many association, the primary key on one of the
sides is embedded as the foreign key on the other side.
if many to many association, may embed foreign keys or
create a separate linking table.
30
Three Types of Anomalies
Insertion Anomaly: A new item
cannot be added to the table until
at least one entity uses a
particular attribute item.
Deletion Anomaly: If an
attribute item used by only one
entity is deleted, all information
about that attribute item is lost.
Update Anomaly: A modification
on an attribute must be made in
each of the rows in which the
attribute appears.
Anomalies can be corrected by
creating relational tables.
31
Advantages of Relational
Tables
Removes all three anomalies
Various items of interest (customers,
inventory, sales) are stored in separate
tables.
Space is used efficiently.
Very flexible. Users can form ad hoc
relationships.
32
The Normalization Process
A process which systematically splits
unnormalized complex tables into
smaller tables that meet two conditions:
all nonkey (secondary) attributes in the table
are dependent on the primary key
all nonkey attributes are independent of the
other nonkey attributes
When unnormalized tables are split and
reduced to third normal form, they must
then be linked together by foreign keys.
33
Steps in Normalization
Table with
repeating groups Remove
repeating
groups
First normal
form 1NF
Remove
partial
dependencies
Second normal
form 2NF
Remove
transitive
dependencies
Third normal
form 3NF
Remove
remaining
Higher normal anomalies
forms
Accountants and Data
Normalization
The update anomaly can generate conflicting and
obsolete database values.
The insertion anomaly can result in unrecorded
transactions and incomplete audit trails.
The deletion anomaly can cause the loss of
accounting records and the destruction of audit
trails.
Accountants should have an understanding of the
data normalization process and be able to
determine whether a database is properly
normalized.
35
Phase 3
Implement the Physical
Design
Decide about
software and
hardware.
36
Physical Database Design
Transition from theoretical to physical
aspects of database
IS IT
Decisions about software and hardware
Implementation
Populate the database with data
Produce physical user views (multiple)
37
Data Structures
allow records to be located, stored, and
retrieved and allow movement through
the database. Two components:
The organization of a file is the physical
arrangement of records.
The access method is the technique used
to locate records and to navigate through the
database.
38
Distributed Data
Processing
Central Centralized
Site Database
Site A Site B Site C
Distributed Data
Processing
DP is organized around several
information processing units (IPUs)
distributed throughout the organization
and placed under the control of the
end users.
DDP does NOT mean
Decentralization!
IPUs are connected to
one another and
coordinated.
40
Potential Advantages of
DDP
Cost reductions in hardware and data
entry tasks
Improved cost control responsibility
Improved user satisfaction since
control is closer to the user level
Backup of data can be improved
through the use of multiple data storage
sites
41
Potential Disadvantages of
DDP
Loss of control
Mismanagement of organization-wide
resources
Hardware and software incompatibility
Redundant tasks and data
Consolidating incompatible tasks
Difficulty attracting qualified personnel
Lack of standards 42
Centralized Databases in DDP
Environment
The data is retained in a central
location.
Remote IPUs send requests for data.
Central site services the needs of the
remote IPUs.
The actual processing of the data is
performed at the remote IPU.
43
Data Currency
Occurs in DDP with a centralized database
During transaction processing, the data
will temporarily be inconsistent as a
record is being read and updated.
Database lockout procedures are
necessary to keep IPUs from reading
inconsistent data and from writing over a
transaction being written by another IPU.
44
Distributed Databases:
Partitioning
Splits the central database into segments that
are distributed to their primary users
Advantages:
users’ control is increased by having data
stored at local sites
transaction processing response time is
improved
the volume of transmitted data between IPUs is
reduced
reduces the potential data loss from a
disaster
45
The Deadlock Phenomenon
Especially a problem with partitioned
databases
Occurs when multiple sites lock each
other out of data that they are currently
using
One site needs data locked by another site.
Special software is needed to analyze and
resolve conflicts.
Transactions may be terminated and have to be
restarted.
46
The Deadlock Phenomenon
Locked A, waiting for C Locked E, waiting for A
A,B
E, F
C,D
Locked C, waiting for E
47
Distributed Databases:
Replication
The duplication of the entire
database for multiple IPUs
This method is effective for situations with
a high degree of data sharing, but no
primary user, and supports read-only
queries.
The data traffic between sites is reduced
considerably.
48
Concurrency Problems and
Control Issues
Database concurrency is the presence
of complete and accurate data at
all IPU sites. With replicated
databases, maintaining current data at
all locations is a difficult task.
Time stamping may be used to
serialize transactions and to prevent
and resolve any potential conflicts
created by updating data at various
IPUs.
49