Data and Knowledge Management
Chapter 4 1
1. Understand information system concepts
including input, processing, and output,
data and information.
2. Understand information system components
including hardware, software, databases
10. Be able to create a database using a
relational DBMS such as Access.
Demonstrate the ability to create reports,
queries and join two tables in the DBMS 2
Chapter 4 3
Difficulties of Managing Data.
Data is a company’s most valuable resource.
Amount of data increases exponentially.
Data are scattered and collected by many
individuals using various methods and devices.
Data come from many sources including internal
sources, personal sources and external sources.
Data security, quality and integrity are critical.
Chapter 4 4
Bit (a binary digit): a circuit that is either on
Byte: group of 8 bits, represents a single
Field: name, number, or characters that
describe an aspect of a business object or
Chapter 4 5
Data Hierarchy (Continued)
Record: collection of related data
File (or table): collection of related
Database: a collection of integrated and
Chapter 4 6
Chapter 4 7
File Processing Systems
Traditional File Processing:
Data are organized, stored,
and processed in
independent files of data
Chapter 4 8
Problems of File Processing
Data Redundancy – duplicate data requires an update
to be made to all files storing that data
Example – what if the same customer has a checking
account, savings account, and loan?
Lack of Data Integration – data stored in separate files
require special programs for output making ad hoc
Data Dependence – programs must include
information about how the data is stored so a change in
storage format requires a change in programs
Chapter 4 9
Why Data Redundancy is Bad
Loan files will
changes in one
file and not the
Chapter 4 10
Solution: Database Approach
Database management system (DBMS) provides all users
with access to all the data.
DBMSs minimizes the following problems:
Data redundancy: the same data stored in many places.
Data isolation: applications cannot access data
associated with other applications.
Data inconsistency: various copies of the data do not
Chapter 4 11
DBMS = A software system that is used to create, maintain, and provide
controlled and secure access to data
Chapter 4 12
Another example of a
Database Management System
Chapter 4 13
Database Approach (Continued)
DBMSs maximize the following issues:
Data integrity: data meets certain constraints, no
alphabetic characters in zip code field.
Data independence: applications and data are
independent of one another, all applications are
able to access the same data.
Chapter 4 14
Designing the Database
Data model. Diagram that represents the entities in
the database and their relationships.
Entity is a person, place, thing or event.
Attribute is a characteristic or quality of a particular
Primary key is a field that uniquely identifies that
Secondary keys are fields that have identifying
information but may not identify with complete accuracy.
Chapter 4 15
Process where the relationships between data elements are
Standard approach is to create an Entity-Relationship (ER)
Chapter 4 16
Database designers plan the database design in a process
called entity-relationship (ER) modeling.
ER diagrams consists of entities, attributes and
Entity classes are a group of entities of a given type, e.g.
Instance is the representation of a particular entity, i.e.
STUDENT(John Smith, 123-45-6789, …).
Attributes are fields that contain specific data about an
Identifiers (primary keys) are attributes unique to that entity
instance, i.e. StudentIDNumber.
Chapter 4 17
Cardinalities of relationships:
1 – to – 1
1 – to – many
many – to - many
Chapter 4 18
fields that contain specific data about entities
the attribute that uniquely identifies an entity
Chapter 4 19
may place many
orders, but each
order is placed by
a single customer
Chapter 4 20
One order has
many order lines;
each order line is
associated with a
Chapter 4 21
One product can
be in many
order lines, each
order line refers
to a single
Chapter 4 22
and one product is
involved in many
Chapter 4 23
ER Diagram for Product Inventory
and Ordering System
Chapter 4 24
ER Diagram of Employee Database
JobTypes 1|N 1|N
Employees M|N Projects
Entities in boxes
Chapter 4 25
Entities in Employee Database
Employees - data about people in the
Departments - data about the organizational
JobTypes - data about the work
Projects - data about the current projects
Chapter 4 26
Relationships in Employee
Each department has many employees, but each
employee works for only one department (1:N)
There are many employees of a given job type, but each
employee has only one job title (1:N)
An employee who is a manager has many employees
under her, but each employee has only one manager to
report to (1:N)
Each employee can be working on several projects, and
each project may have several employees working on it
Chapter 4 27
Chapter 4 28
What is a relational database?
A database in which:
entity classes are represented by TABLES (also
specific entity instances are represented by ROWS
in the TABLE (records)
attributes are represented by COLUMNS in the
relationships between entities are represented by
associations between primary and foreign keys
Chapter 4 29
A Table in Microsoft Access
Popular examples of relational databases
are Microsoft Access and Oracle.
Chapter 4 30
Relationships View of the tables in
the Employee Database (MS Access)
8 M:N relationships require an
additional table, called an
intersection (or junction) table.
The relationships are The M:N relationship between
implemented via associations Employees and Projects is
between primary keys implemented via the
(shown here in boldface) and EmployeeProject intersection
foreign keys of tables
Chapter 4 31
Design View of Employees Table
is of a
Chapter 4 32
Design View of Employees Table
for a field
Chapter 4 33
1:N Relationship Between
Departments and Employees
The DepartmentID field of the Employees table is a
foreign key. It references the DepartmentID field of
the Departments table (primary key). In this way, we
can see that Sam Smith, Mike Mitri, Alice Friedman,
and Brendan Mitri are all in the Payroll department.
(DepartmentID = 2).
A department has several employees, but each
employee is in only one department.
Chapter 4 34
M:N relationship between
Employees and Projects
The EmployeeProject table is an intersection table that implements the M:N
relationship. The EmployeeID field of the EmployeeProject table is a foreign key
that references the EmployeeID field of the Employees table. Likewise for the
ProjectID fields. Here we see that James Smith is one of the four employees who
works on Accounts Payable project. James Smith also works on the Accounts
Each employee can have several projects and each project can have several
Chapter 4 35
Normalization for Well-Structured
Normalization is a method for analyzing and
reducing a relational database to its most
streamlined form for:
Maximum data integrity;
Best processing performance.
Normalized data is when attributes in the
table depend only on the primary key.
Chapter 4 36
Chapter 4 37
Chapter 4 38
Relationships between Tables
in Normalized Database
Chapter 4 39
A database query is a request view certain,
selected data from a database.
Chapter 4 40
Structured query language (SQL) is the
most popular query language used to request
Query by example (QBE) is a graphical
grid or template that a user fills out to
construct a sample or description of the data
Chapter 4 41
The SELECT Statement
Used for queries on tables in a relational database
Parts of the SELECT statement:
List the columns (and expressions) that should be returned from the query
Indicate the table(s) or view(s) from which data will be obtained
Indicate the conditions under which a row will be included in the result
Indicate categorization of results
Indicate the conditions under which a category (group) will be included
Sorts the result according to specified criteria
Chapter 4 42
Suppose we have these tables
Chapter 4 43
Query Example #1
Get Names of Employees in
Chapter 4 44
Query Example #2
Get Names and Salaries of
Chapter 4 45
Query Example #3
Get Summary Information
(average salaries of different job titles)
Chapter 4 46
Query Example #4
Get Related Data from Multiple Tables
(This kind of query is called a JOIN)
Chapter 4 47
Chapter 4 48
Data Life Cycle
Chapter 4 49
Data warehouse is a repository of historical
data organized by subject to support decision
makers in the organization and include:
Online analytical processing which involves
the analysis of accumulated data by end users;
Multidimensional data structure which allows
data to be represented in a three-dimensional
matrix (or data cube).
Chapter 4 50
Benefits of Data Warehousing
End users can access data quickly and easily
via Web browsers because they are located in
End users can conduct extensive analysis
with data in ways that may not have been
End users have a consolidated view of
Chapter 4 51
Data Marts & Data Mining
Data mart is a small data warehouse,
designed for the end-user needs in a strategic
business unit (SBU) or a department.
Data mining involves searching for valuable
business information in a large database, data
warehouse, or data mart.
Used to predict trends and behaviors.
Identify previously unknown patterns.
Chapter 4 52
Data Warehouse Framework and
Chapter 4 53
Chapter 4 54
Converted to Multidimensional
Chapter 4 55
Chapter 4 56
Online Analytical Processing
Enables mangers and analysts to interactively
examine and manipulate large amounts of
detailed and consolidated data from many
Chapter 4 57
Consolidation – aggregation of data
Drill-down – detail data that comprise
Slice and Dice – ability to look at the
database from different viewpoints
Chapter 4 58
Online Demo of OLAP Technology
Chapter 4 59
be visualized as
Chapter 4 60
OLAP can be Done in Excel:
Three dimensions (page, row, column)
This enables slice-and-dice, allowing user to view any
combination of variables for each dimension
Summary information can be viewed for any
combination of variables
This enables consolidation (aggregation of data)
For any given piece of summary information, user
can get details
This enables drill-down, obtaining details of aggregate
Chapter 4 61
Analyzing the data in a data warehouse or data mart to reveal
hidden patterns and trends in historical business activity
Chapter 4 62
Data Mining Uses
Perform “market-basket analysis” to identify new
Find root causes to quality or manufacturing
Prevent customer attrition and acquire new
Cross-sell to existing customers.
Profile customers with more accuracy.
Chapter 4 63
Data Mining Applications
Retailing and sales. Predict sales, prevent theft and fraud,
determine correct inventory levels and distribution
Banking. Forecast levels of bad loans, fraudulent credit card
use, predict credit card spending by new customers, etc.
Manufacturing and production. Predict machinery
failures, find key factors to help optimize manufacturing
Insurance. Forecast claim amounts, medical coverage costs,
predict which customers will buy new insurance policies.
Chapter 4 64
Data Mining Applications
Policework. Track crime patterns, locations,
criminal behavior; identify attributes to assist in
solving criminal cases.
Health care. Correlate demographics of patients
with critical illnesses, develop better insight to
identify and treat symptoms and their causes.
Marketing. Classify customer demographics to
predict how customers will respond to mailing or
buy a particular product.
Chapter 4 65
Chapter 4 66
Data Visualization Systems
Graphically represent complex data using
interactive three-dimensional forms such as
charts, graphs, and maps
Data visualization tools help users to
interactively sort, subdivide, combine, and
organize data while it is in its graphical form.
Chapter 4 67
Chapter 4 68
Chapter 4 69
Chapter 4 70
Data Visualization Technologies
Geographic Information Systems (GIS) is a
computer-based system for capturing, integrating,
manipulating and displaying data using digitized
maps. (example: GoogleMaps)
Find locations for new restaurants.
Emerging GIS applications integrated with global
positioning systems (GPSs).
Virtual Reality is interactive, computer-generated,
three-dimensional graphics delivered to the user
through a head-mounted display.
Chapter 4 71
Chapter 4 72
Chapter 4 73
Chapter 4 74
Knowledge management (KM) is a process that
helps organizations manipulate important
knowledge that is part of the organization’s
memory, usually in an unstructured format.
Knowledge is information that is contextual,
relevant and actionable; information in action.
Intellectual capital (or intellectual assets) is
another term often used for knowledge.
Chapter 4 75
Explicit knowledge deals with more objective, rational and
Tacit knowledge is the cumulative store of subjective or
Knowledge management systems (KMSs) use modern
information technologies – Internet, intranets, extranets, data
warehouses - to systemize, enhance and expedite intrafirm
and interfirm knowledge management.
Best practices are the most effective and efficient ways of
doing things, readily available to a wide range of employees.
Chapter 4 76
Knowledge Management System
Chapter 4 77
Knowledge Management System
Create knowledge. Determine new ways.
Capture knowledge. Identify as valuable.
Refine knowledge. Make it actionable.
Store knowledge. Store in a reasonable format.
Manage knowledge. Verify it is relevant, accurate.
Disseminate knowledge. Made available.
Chapter 4 78