Information Management Framework
Data Quality
30 Jan 2003
What is quality?
Quality is a dynamic concept that continuously changes in response to changing customer requirements. It is defined in 3 ways:
Conformance to specifications (DQA)
Fitness for use (Surveys)
Quality issues
Problems can result from:
Human error
Machine error
Process error
Purpose
Figure: information lifecycle phases (Collection, Storage, Access and Use, Archive and Disposal).
Lifecycle elements shown: Data Collection, Data Entry Verification, Data Store, Historic data, Approval to publish (vetting), Data Access.
Quality checks shown: Internal quality checks, Data Quality Assessment, User Feedback Analysis and Fitness for use.
Conformance to specifications: Quality Plan, Data Quality Assessments
Data Quality Assessments
Figure: Data Quality management cycle, starting from the original use.
Elements shown: Define Purpose of Data, Define Business Requirements, Define Business/data rules, System Spec, Standards, Prior audits, Quality Plan (Benchmarks), Data Quality Assessment processes, Quality Investigation Module, Use/analyse data, Obtain Feedback, Update Metadata, Update Data Management Plan, Update Benchmarks.
DQ Assessment and Remediation Process
Figure: DQ assessment and remediation process across the information lifecycle phases (Collection, Storage, Access & Use, Archive/Disposal).
Elements shown: Audit Recommendations, Approval / Priority Process, Data Remediation, Part 3 Data Management Plan, Data Store, Historic data, Data Access.
Recording quality - ANZLIC
Quality elements:
Lineage
Positional accuracy
Attribute accuracy
Logical consistency
Completeness
Sub-elements shown: Total Input Errors, Vertical accuracy, Horizontal accuracy, Attribute consistency, Spatial Coverage, Temporal Coverage, Classification
Business rules
Each business rule should have an expected outcome (benchmark).
Business rules need to align to the ANZLIC quality elements.
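A business rule with a benchmark can be expressed as a check plus an expected outcome. The sketch below is a minimal illustration, not the framework's own tooling: the `BusinessRule` structure, the `assess` function, the `hole_depth_m` field and the reading of a benchmark as "minimum fraction of records that must pass" are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Iterable, Mapping

Record = Mapping[str, Any]


@dataclass(frozen=True)
class BusinessRule:
    """A data rule paired with its expected outcome (benchmark)."""
    name: str
    anzlic_element: str                 # ANZLIC quality element the rule aligns to
    check: Callable[[Record], bool]     # True when a record satisfies the rule
    benchmark: float                    # assumed: minimum fraction of records that must pass


def assess(rule: BusinessRule, records: Iterable[Record]) -> tuple[float, bool]:
    """Return the observed pass rate and whether it meets the rule's benchmark."""
    results = [rule.check(r) for r in records]
    if not results:
        return 1.0, True                # assumption: no records means nothing failed
    pass_rate = sum(results) / len(results)
    return pass_rate, pass_rate >= rule.benchmark


# Hypothetical completeness rule on a hypothetical "hole_depth_m" field.
depth_rule = BusinessRule(
    name="hole depth recorded",
    anzlic_element="Completeness",
    check=lambda r: r.get("hole_depth_m") is not None,
    benchmark=0.95,
)
rate, ok = assess(depth_rule, [{"hole_depth_m": 40.0}, {"hole_depth_m": None}])
print(rate, ok)  # 0.5 False
```

Keeping the ANZLIC element on each rule lets assessment results be summarised per quality element.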
Findings - DQ Processes
The processes and guidelines are good!
The Data Management Plan is important
– Needs to be completed for all data sets prior to assessment
Benchmarks for quality established with Data Managers before DQA
Soil Profile
Very large and varied data set (millions of soil properties)
Where data exists, it is mostly good
Many missing values
Data transformation errors
– Data on forms different to values in database
– Missing values set to default values in load program
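Both transformation errors can be caught by comparing the database against the source forms, provided the loader keeps a blank as "missing" rather than substituting a default. A minimal sketch, assuming record ids map to form strings and loaded values; the names and the default value 7.0 are hypothetical.

```python
from __future__ import annotations

from typing import Optional

MISSING_DEFAULT = 7.0  # hypothetical default a load program might substitute for a blank


def parse_value(raw: str) -> Optional[float]:
    """Parse a form field, keeping a blank as None instead of a default."""
    raw = raw.strip()
    return float(raw) if raw else None


def transformation_errors(form: dict[str, str], db: dict[str, Optional[float]]) -> list[str]:
    """List record ids whose database value does not match the source form.

    Flags both mistyped values and blanks that were replaced by a default.
    Exact comparison assumes both sides hold values at the same precision.
    """
    errors = []
    for rec_id, raw in form.items():
        if db.get(rec_id) != parse_value(raw):
            errors.append(rec_id)
    return errors


form = {"P1": "6.2", "P2": "", "P3": "5.8"}
db = {"P1": 6.2, "P2": MISSING_DEFAULT, "P3": 8.5}
print(transformation_errors(form, db))  # ['P2', 'P3']
```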
Data Analysis – Soil Properties
Examples of problems:
Location Accuracy
– Invalid grid references for a grid zone
Mandatory Fields missing data
– Nature of Exposure: 1269 records missing value
Logical Inconsistencies
– If Horizon Code begins with 'B' and ACS Order is 'SO' (Sodosol), then pH >= 5.5: 238 records in error
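The Sodosol pH rule translates directly into a record-level check. A sketch, assuming each record is a dictionary with hypothetical field names `horizon_code`, `acs_order` and `ph`; records with no pH are skipped here because they belong to a separate completeness check.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping


def violates_sodosol_ph_rule(record: Mapping[str, Any]) -> bool:
    """True when the record breaks the rule: if Horizon Code begins with 'B'
    and ACS Order is 'SO' (Sodosol), then pH >= 5.5."""
    horizon = str(record.get("horizon_code") or "")
    if not (horizon.startswith("B") and record.get("acs_order") == "SO"):
        return False                    # rule does not apply to this record
    ph = record.get("ph")
    return ph is not None and ph < 5.5  # missing pH is not counted as a logical error


def count_errors(records: Iterable[Mapping[str, Any]]) -> int:
    return sum(violates_sodosol_ph_rule(r) for r in records)


sample = [
    {"horizon_code": "B21", "acs_order": "SO", "ph": 5.0},  # error
    {"horizon_code": "B2", "acs_order": "SO", "ph": 6.1},   # passes
    {"horizon_code": "A1", "acs_order": "SO", "ph": 4.8},   # not applicable: not a B horizon
    {"horizon_code": "B1", "acs_order": "CH", "ph": 4.9},   # not applicable: order is not 'SO'
]
print(count_errors(sample))  # 1
```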
Data Analysis – Ground Water
Minimal spatial data (point locations only)
Data, where present, is mostly good
Many missing values
Examples of problems:
Invalid Key fields
– Work Number of non-standard format
Location Accuracy
– Invalid grid references for a grid zone
Logical Inconsistencies
– Jobs completed before they started
– Hole depth of 36 km
Mandatory Fields missing data
– Work Type Code: 1503 records missing value
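The ground water examples correspond to simple per-record checks for key format, date order, plausible depth and mandatory fields. A sketch only: the field names, the `GW` plus six digits work number pattern and the 1000 m depth ceiling are hypothetical stand-ins, since the actual specifications are not given here.

```python
from __future__ import annotations

import re
from datetime import date
from typing import Any, Mapping

# Hypothetical specifications, for illustration only.
MAX_PLAUSIBLE_DEPTH_M = 1000.0
WORK_NUMBER_FORMAT = re.compile(r"GW\d{6}")


def groundwater_issues(rec: Mapping[str, Any]) -> list[str]:
    """Return descriptions of the problems found in one works record."""
    issues = []
    if not WORK_NUMBER_FORMAT.fullmatch(str(rec.get("work_number", ""))):
        issues.append("work number of non-standard format")
    start, end = rec.get("start_date"), rec.get("completion_date")
    if start and end and end < start:
        issues.append("job completed before it started")
    depth = rec.get("hole_depth_m")
    if depth is not None and not (0 < depth <= MAX_PLAUSIBLE_DEPTH_M):
        issues.append("implausible hole depth")
    if not rec.get("work_type_code"):
        issues.append("mandatory field missing: work type code")
    return issues


record = {
    "work_number": "GW12A",
    "start_date": date(1998, 5, 10),
    "completion_date": date(1998, 3, 2),
    "hole_depth_m": 36000.0,  # 36 km
    "work_type_code": None,
}
print(groundwater_issues(record))
```

Returning a list of issues per record, rather than a single pass/fail flag, makes it easy to tally counts per problem type such as "Work Type Code missing".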
Data Analysis – Ground Water
Region code  GW Licenses in LAS  GW Licenses in GDS  GW Licenses not in GDS  Percentage missing
10           3622                3420                202                     5%
20           2000                1280                720                     36%
30           3201                3162                39                      1%
40           1912                1807                105                     5%
50           2350                911                 1439                    61%
60           84                  42                  42                      50%
70           1913                1371                542                     28%
80           2345                2002                343                     14%
90           4526                4445                81                      2%
Region names shown: Hunter, North Coast, Murrumbidgee, Sydney - South Coast, Lower Murray / Darling, Lachlan, Macquarie - Western, Murray, Barwon

Issues:
– No load or creation date in database (only update date)
– Impossible to apply date-based business rules
– GW licenses mandatory from 2001 onwards
Logical Inconsistencies:
– License Form A received and no GDS record (1000's): needs investigation
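The "not in GDS" figures are a reconciliation between two systems. With licence identifiers available, a set difference gives the missing licences directly; with only counts, subtraction works if every GDS licence also appears in LAS. A sketch, assuming hypothetical licence ids for the set example and using the slide's per-region counts for the second part; percentages are printed to one decimal place, so they may differ slightly from the rounded figures on the slide.

```python
from __future__ import annotations


def reconcile(las_ids: set[str], gds_ids: set[str]) -> tuple[set[str], float]:
    """Licences recorded in LAS with no GDS record, and the percentage missing."""
    missing = las_ids - gds_ids
    pct = 100.0 * len(missing) / len(las_ids) if las_ids else 0.0
    return missing, pct


# Hypothetical licence ids.
missing, pct = reconcile({"L1", "L2", "L3", "L4"}, {"L1", "L2", "L3"})
print(missing, pct)  # {'L4'} 25.0

# Per-region counts from the slide: region code -> (in LAS, in GDS).
counts = {
    10: (3622, 3420), 20: (2000, 1280), 30: (3201, 3162),
    40: (1912, 1807), 50: (2350, 911), 60: (84, 42),
    70: (1913, 1371), 80: (2345, 2002), 90: (4526, 4445),
}
for code, (in_las, in_gds) in counts.items():
    not_in_gds = in_las - in_gds  # valid only if every GDS licence is also in LAS
    print(code, not_in_gds, f"{100 * not_in_gds / in_las:.1f}%")
```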
Data Analysis
Action lists generated for each data set
Improving data quality goes beyond identifying, measuring and fixing the data in the IT systems.
Improve data capture:
– Train entry staff
– Replace entry processes
– Provide meaningful feedback
– Change motivations to encourage quality
Scope of Remedies
Add defensive checkers, periodic DQ assessments, data cleansing
Data Quality Reporting
– Data Quality Portal: general DQ information, statistical reporting and monitoring
– Data Quality Exception Reporting: management of Data Quality issues
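A defensive checker validates records at the point of entry instead of finding errors later in an assessment, and its rejections can feed exception reporting. A minimal sketch under assumed names: `DefensiveChecker`, the two example checks and the `grid_ref` / `hole_depth_m` fields are hypothetical, and rejected records are simply not stored.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Mapping, Optional

# A check returns an error message, or None when the record is acceptable.
Check = Callable[[Mapping[str, Any]], Optional[str]]


@dataclass
class DefensiveChecker:
    """Validate records at entry and log each failure for exception reporting."""
    checks: list[Check]
    exceptions: list[tuple[str, str]] = field(default_factory=list)

    def accept(self, rec_id: str, record: Mapping[str, Any]) -> bool:
        errors = [msg for check in self.checks if (msg := check(record))]
        for msg in errors:
            self.exceptions.append((rec_id, msg))
        return not errors


def has_location(r: Mapping[str, Any]) -> Optional[str]:
    return None if r.get("grid_ref") else "missing grid reference"


def depth_positive(r: Mapping[str, Any]) -> Optional[str]:
    d = r.get("hole_depth_m")
    return None if d is None or d > 0 else "non-positive hole depth"


checker = DefensiveChecker([has_location, depth_positive])
store = {}
incoming = {"A": {"grid_ref": "example-ref", "hole_depth_m": 12.0}, "B": {"hole_depth_m": -3.0}}
for rec_id, rec in incoming.items():
    if checker.accept(rec_id, rec):
        store[rec_id] = rec
print(sorted(store))       # ['A']
print(checker.exceptions)  # [('B', 'missing grid reference'), ('B', 'non-positive hole depth')]
```

The exception log is the raw material for an exception report: each entry names the record and the rule it failed.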
Fitness for use: user needs (covered later in the day)
Improving quality
Ways of improving quality
Tackle quality at source, not downstream in the lifecycle
Train data collectors in the importance of getting it right
Continual improvement with a quality method
Links among Process Groups in a Phase
Figure: process groups in a phase: Planning process, Executing process (do), Controlling process (check), Closing process; arrows represent flow of information (PMBOK 2000, Fig. 3-1, p. 31)