Quality In - Quality Out
Data Administration Management Association September 14, 1999
Elaine Stricklett
1500 N. Beauregard St. Alexandria, VA 22311 (703) 671-0700

AGENDA
• Introduction
• Examples of Data Quality Problems
• What is Data Quality Anyway?
• Types and Causes of Dirty Data
• Solving Data Quality Problems
• Maintaining Data Quality Over Time
• Summary

DAMA September 14, 1999

Introduction
“Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?”
(T. S. Eliot)

“Where is the information we have lost in data?”
(John Seely Brown)

Data → Information → Knowledge → Wisdom

Introduction
Information?

Introduction
• Quality of data is important to our perception of the quality of service
• Data quality problems are all around us
• The costs of poor data quality are high
• A Data Warehouse will not solve your data quality problems

The Data Warehouse will expose your data quality problems!

Examples of Data Quality Problems
• The Year 2000 “bug”
• Navy reports differed on the number of submarines in the Gulf War
• California’s motor-voter law
• A “simple” request for information on two customers took months to satisfy
• An insurance company discovered that 80% of its medical claims were for a “broken leg”
• In 1992, 96,000 IRS refund checks were returned as “undeliverable” due to bad addresses

What is Data Quality Anyway?
• Accuracy
• Completeness
• Consistency
• Timeliness

Data Quality can best be defined as “fitness for use.”
(Tayi and Ballou)

Management Properties of Data
• We talk about the “management of data” as though it were akin to the management of people, money, and other resources
• Many of the Properties of Data create specific Data Management requirements
• These requirements have implications for Data Quality

Source: Thomas C. Redman

Data Management Implications for Data Quality
• Ease with which data can be copied and created
• Inherent “uniqueness” of data
• Data are intangible - properties are abstract
• “Out of sight, out of mind”

Poor Data Quality Impacts Every Level of the Organization
• Strategic
– Less effective strategic business decisions

• Tactical
– Compromised decision-making
– Inability to reengineer
– Mistrust between internal organizations

• Operational
– Customer dissatisfaction
– Increased costs
– Lowered employee job satisfaction and morale
– Loss of revenue

Types and Causes of Dirty Data
Some data is more likely to get dirty than other data. In order of cleanliness:
• Quantities
• Encoded data
• Structured text
• Free text

Source: Joe Celko

Types and Causes of Dirty Data
• Dummy Values
• Absence of Data
• Multipurpose Fields
• Cryptic Data
• Contradicting Data
• Inappropriate Use of Address Lines
• Violation of Business Rules
• Reused Primary Keys
• Non-unique Identifiers

Source: Larissa Moss

Dummy Values
• Example: Social Security Number = 999-99-9999
• Causes:
– Improper data entry training and incentives
– Inadequate edit checks

Source: Larissa Moss
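
An edit check for dummy values could look roughly like the sketch below. It is only an illustration: the function name `is_dummy_ssn` and the `KNOWN_PLACEHOLDERS` set are assumptions made for this example, and the rule (all digits identical, or a listed filler sequence) is one plausible heuristic rather than a complete test.

```python
import re

# Hypothetical placeholder list; a real system would maintain its own.
KNOWN_PLACEHOLDERS = {"123456789"}


def is_dummy_ssn(value: str) -> bool:
    """Return True when the value looks like a filler rather than a real SSN."""
    digits = re.sub(r"\D", "", value)  # keep digits only, ignore punctuation
    if not digits:
        return False  # an empty field is missing data, not a dummy value
    # Flag values made of one repeated digit, or a known filler sequence.
    return len(set(digits)) == 1 or digits in KNOWN_PLACEHOLDERS


print(is_dummy_ssn("999-99-9999"))
```

Running such a check at data entry addresses the "inadequate edit checks" cause; it does nothing about training and incentives.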

Absence of Data
• Example: The Mortgage Loan Department accesses Consumer Loan information about a customer. Sex and ethnicity information are missing.
• Cause: Differing business needs

Source: Larissa Moss
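
Missing data can be measured before another department starts relying on a file. The sketch below is a minimal completeness profile; the function name `completeness` and the field names are assumptions made for this example, and "missing" is taken to mean `None` or an empty string.

```python
def completeness(records, fields):
    """Return, for each field, the fraction of records with a non-empty value."""
    total = len(records)
    result = {}
    for field in fields:
        # Count records where the field is present and not blank.
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        result[field] = filled / total if total else 0.0
    return result


# Hypothetical consumer loan records as another department would see them.
consumer_loans = [
    {"name": "A", "sex": "F", "ethnicity": ""},
    {"name": "B", "sex": None},
    {"name": "C", "sex": "M", "ethnicity": "X"},
    {"name": "D"},
]
print(completeness(consumer_loans, ["name", "sex", "ethnicity"]))
```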

Multipurpose Fields
• Example: “Kitchen Sink” records (redefines)
– Same field means different things
• Cause: More than one department shares the same production system

Source: Larissa Moss

Cryptic Data
• Example: “Kitchen sink” fields that are used for many purposes
• Cause: Purpose of fields expanded; values changed

Source: Larissa Moss

Contradicting Data
• Example: A property address shows a California ZIP code with New Jersey as the city and state, or a phone number with a Maryland area code
• Cause: Improper edits and lazy data entry practices

Source: Larissa Moss
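
A cross-field edit can catch this kind of contradiction. The sketch below compares the first three digits of a ZIP code with the state. The `ZIP3_RANGES` table is an illustrative, deliberately tiny assumption covering three states with approximate prefix ranges; a production check would use a maintained postal reference file instead.

```python
# Approximate three-digit ZIP prefix ranges, for illustration only.
ZIP3_RANGES = {"CA": (900, 961), "NJ": (70, 89), "MD": (206, 219)}


def zip_matches_state(zip_code: str, state: str):
    """Return True/False when the check can be made, None when it cannot."""
    bounds = ZIP3_RANGES.get(state)
    if bounds is None or len(zip_code) < 5 or not zip_code[:5].isdigit():
        return None  # unknown state or malformed ZIP: no verdict
    low, high = bounds
    return low <= int(zip_code[:3]) <= high


# A California ZIP code recorded with a New Jersey state code.
print(zip_matches_state("94105", "NJ"))
```

Returning `None` for undecidable cases keeps the check from labeling a record contradictory merely because the lookup table is incomplete.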

Inappropriate Use of Address Lines
• Example:
– Address Line 1: Acton Burnell Consulting Group*
– Address Line 2: 1500 N. Beauregard Street,
– Address Line 3: Alex
– Address Line 4: andria, VA 33211

• Cause: Data entry errors

Source: Larissa Moss

Violation of Business Rules
• Example: An adjustable rate mortgage loan where the value of the minimum interest rate is higher than the value of the maximum interest rate
• Causes: Improper data requirements and data modeling

Source: Larissa Moss
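
The business rule in the example can be stated as a single comparison and run against existing data. The sketch below assumes a hypothetical `ArmLoan` record with `min_rate` and `max_rate` fields; the names are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class ArmLoan:
    """Hypothetical adjustable rate mortgage record."""
    loan_id: str
    min_rate: float  # lowest rate the loan may adjust to, in percent
    max_rate: float  # highest rate the loan may adjust to, in percent


def rate_bound_violations(loans):
    """Return the IDs of loans whose minimum rate exceeds the maximum rate."""
    return [loan.loan_id for loan in loans if loan.min_rate > loan.max_rate]


loans = [ArmLoan("L-1", 3.0, 9.5), ArmLoan("L-2", 11.0, 8.0)]
print(rate_bound_violations(loans))
```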

Reused Primary Keys
• Example:
– Bank Branch 84 closes
– Transfers the loans it services to Branch 213
– “84” is assigned to a new branch

• Causes: Improper data requirements determination, data modeling, and policies regarding the assignment of primary keys

Source: Larissa Moss
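
One possible key assignment policy is to retire a key permanently when its owner closes. The sketch below shows that idea with a hypothetical `BranchKeyRegistry` class; it illustrates the policy only and is not a description of any particular bank system.

```python
class BranchKeyRegistry:
    """Hands out branch keys and never reissues a retired one."""

    def __init__(self):
        self._active = set()
        self._retired = set()

    def assign(self, key: int) -> None:
        # Refuse keys that are in use now or were ever used before.
        if key in self._active or key in self._retired:
            raise ValueError(f"branch key {key} has already been used")
        self._active.add(key)

    def retire(self, key: int) -> None:
        # Raises KeyError if the key is not currently active.
        self._active.remove(key)
        self._retired.add(key)

    def is_active(self, key: int) -> bool:
        return key in self._active


registry = BranchKeyRegistry()
registry.assign(84)
registry.assign(213)
registry.retire(84)  # Branch 84 closes; a later assign(84) would raise ValueError
```

With this policy, historical loans that still carry "84" can never be confused with loans from a newer branch.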

Non-unique Identifiers
• Example: Bank Branch located at 10 Main Street may be identified as Branch 65 in the Loan and Investor systems and as Branch 89 in the Savings system
• Causes: Improper data requirements determination, data modeling, and policies regarding the assignment of primary keys
Source: Larissa Moss
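
Non-unique identifiers can be surfaced by grouping records from every system on something the branch physically has, such as its address. The sketch below is a rough illustration: the function name `conflicting_ids` and the `(system, branch_id, address)` row layout are assumptions, and the address normalization (lowercase, collapsed whitespace) is far simpler than real address matching.

```python
from collections import defaultdict


def conflicting_ids(rows):
    """Map each normalized address to its branch IDs when more than one is in use.

    rows: iterable of (system, branch_id, address) tuples.
    """
    ids_by_address = defaultdict(set)
    for _system, branch_id, address in rows:
        key = " ".join(address.lower().split())  # crude normalization
        ids_by_address[key].add(branch_id)
    return {
        address: sorted(ids)
        for address, ids in ids_by_address.items()
        if len(ids) > 1
    }


rows = [
    ("Loan", 65, "10 Main Street"),
    ("Investor", 65, "10 Main  Street"),
    ("Savings", 89, "10 MAIN STREET"),
]
print(conflicting_ids(rows))
```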

Solving Data Quality Problems
Data Quality Audit

PerAuditSM

Solving Data Quality Problems
• Detect and Correct
• Process Control and Improvement
• Process (Re-) Design
• Prevent Future Problems

Detect and Correct
• Corroborate information
• Database bashing
• Business rule - data comparison

“A person with one watch always knows what time it is; a person with two watches is never sure.”
(Mark Twain)
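
Assuming "database bashing" here means comparing records held in two databases that ought to agree, a minimal version might look like the sketch below. The function name `compare_sources`, the dictionary layout, and the sample customer keys are all invented for this illustration.

```python
def compare_sources(source_a, source_b, fields):
    """Compare two keyed record sets and report where they disagree."""
    report = {"only_in_a": [], "only_in_b": [], "mismatches": []}
    for key in sorted(source_a.keys() | source_b.keys()):
        if key not in source_b:
            report["only_in_a"].append(key)
        elif key not in source_a:
            report["only_in_b"].append(key)
        else:
            # Present in both: compare the requested fields one by one.
            for field in fields:
                a_val = source_a[key].get(field)
                b_val = source_b[key].get(field)
                if a_val != b_val:
                    report["mismatches"].append((key, field, a_val, b_val))
    return report


billing = {"C1": {"zip": "22311"}, "C2": {"zip": "94105"}}
shipping = {"C1": {"zip": "33211"}, "C3": {"zip": "07030"}}
print(compare_sources(billing, shipping, ["zip"]))
```

As the two-watches quotation suggests, a mismatch only shows that at least one source is wrong; deciding which one still requires corroboration against the real world.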

Process Control and Improvement
[Figure: Sample Activity Map - ABC Toy Company Order Processing]

Lanes: Customer; Field Sales Representatives; Customer Service; Warehouse; Shipping; Accounting; Information Support

Activities:
A1 Place Order
A2 Take Order
A3 Take Order
A4 Transmit Order
A5 Enter Order
A6 Review Customer Credit
A7 Deny Order
A8 Review Denial Reason
A9 Receive Notice of Denial
A10 Review Pending Shipments
A11 Prioritize Pending Shipments
A12 Generate Backorder
A13 Allocate Inventory
A14 Review Shortages
A15 Re-Prioritize Pending Shipments
A16 Release Order
A17 Print Picking & Packing Slip
A18 Pull Merchandise
A19 Ensure Pulled Order Correctly
A20 Verify Count
A21 Pack Order
A22 Ship Order
A23 Receive Merchandise
A24 Add Freight Charges
A25 Generate Invoice
A26 Mail Invoice
A27 Receive Invoice
A28 Pay Invoice
A29 Receive Payment

Flow labels: Credit OK; Bad Credit; Forward Denial; Override Denial; Daily Ship Plan

Data stores:
D1 Order File
D2 Inventory File
D3 Shipment File
D4 Receivables File

Process (Re-)Design
• Design appropriate edits into new (or reengineered) processes
• Minimize manual processes
• Minimize hand-offs
• Assign responsibility and accountability

Prevent Future Errors
• Promote a healthy Information Ecology
• Begin with data accuracy at the source
• Develop and promulgate a clear understanding of business rules
• Manage data resources using a “living” enterprise data architecture
• Establish data stewardship and accountability for data
• Establish a Data Quality Assurance Program

Starting a Data Quality Assurance Program
• Know the required elements of successful change
– Pressure for change
– Clear, shared vision
– Capacity for change
– Actionable first steps

• Focus carefully
• Start small
• Move quickly

A Data Quality Program is the means for responding to business need, not an end in itself.

The Data Quality Assurance Program: Roles and Responsibilities
• Leaders
– CIO
– Chief Quality Officer
– Chief Financial Officer
– Data Administrators

• Process Owners
– Owners of the cross-functional activities that create, store, and use data and information
– Data Stewards

• Information Professionals
– Responsible for data and information life cycle, quality dimensions, properties as assets
– Data Quality Auditors
– Information System Managers
– Data Warehouse Manager

Source: Thomas Redman

Maintaining Data Quality Over Time
• Senior management leadership and commitment are critical
• Maintain frequent and clear communication about the program
• Recognize and manage risks
• Recognize the characteristics of data as business assets
• Manage data quality processes
• Apply Deming’s Fourteen Points for [Data] Quality Management

(Source: Thomas Redman and Larry English)

Conclusion
• Data quality problems abound
• They affect all levels of the organization as well as customers
• The nature of data makes the problems more difficult to solve
• Changing the data quality processes achieves the most sustainable results
