Department of Infrastructure, Planning and Natural Resources Natural Resource Information Systems

Data quality assessment guidelines
Information Management Framework (IMF)
IMP02/10.1.3 Data quality assurance and management standards

IMF701

June 2003

Draft

Document Release Information
Reviewers
Name: Jane Deller-Smith, Mick Dwyer, Grant Robinson, Trev Mount, Jonathan Doig, Sue Irvine, Fred De Closey
Role: SIP Project Representative, Data Custodian Representative, CNR Data Administrator Representative, NRIC, Editorial Reviewer, Regional Reviewer
Date: 7/3/2003

Project approvals
Name: Neil Bennett
Role: GM, Natural Resource Information Systems; GM, Information Management & Technology (Chief Information Officer); Group GM, Natural Resource Products

Audience
Operational Data Custodians and Data Managers: to help in planning and execution of Data Quality Assessments.
Corporate Data Administrator: to review and maintain this document.

Related documents
How to write a business rule: http://imf.dsnr/binarydata/IMF703.pdf

History
Version: v2d2
Date: 16 June 2003
Author-Editor(s): Adrian Richardson
Notes: Changes made after ALTIS comments in carrying out first DQA.

File details
Filename: IMF701_DatQltyAssessGuide_v2d2.doc
File server location: \\PAR01\DATA1\GROUP\IMTIIC\CORPORAT\IMT\COORD\NRIS\Projects\SIP\10.1_Data_Management\10.1.3 Data Quality Assurance\Guidelines and Templates\
Online location: http://imf.dsnr/binarydata/IMF701.pdf


Contents
1 Background
  1.1 Purpose of document
  1.2 What is a Data Quality Assessment?
  1.3 Supporting the business process
  1.4 Describing quality
2 Data Quality Assessment processes
  2.1 Pre-assessment
  2.2 Planning phase
    2.2.1 Define assessment
    2.2.2 Design tests and expected results
    2.2.3 Access data
  2.3 Execution phase
    2.3.1 Compare expected with actual
    2.3.2 Collate and analyse findings
    2.3.3 Produce recommendations
  2.4 Closure phase
    2.4.1 Project closure
  2.5 Post-assessment
    2.5.1 Assessment outcomes
3 Glossary
4 References

Department of Infrastructure, Planning & Natural Resources

Data Quality Assessment Guide

Draft

1 Background
1.1 Purpose of Document
This document is aimed at Data Managers who want to design and carry out a Data Quality Assessment (DQA). It provides guidance for use during implementation of the DSNR Information Management Framework (IMF). In addition to this document, a number of work instructions and templates are available from the IMF intranet site to help carry out a DQA.

1.2 What is a Data Quality Assessment?
Information has a number of operational lifecycle activities, referred to as the information lifecycle (OIT, 2002 p4). The Data Quality Assessment is a separate process from any Quality Assurance (QA) or data verification. The latter are undertaken within all lifecycle phases (such as QA during data collection), while the Data Quality Assessment is carried out to ascertain the quality of the data currently in 'stock' (see Figure 1).

A Data Quality Assessment:
• Compares the expected quality of the data, as set out in the Data Quality Plan, with the actual data quality.
• Provides recommendations on how to improve data quality where quality has failed to meet the benchmark.
• Is separate from any verification process carried out within an individual phase of the information lifecycle, such as the QA process used in the collection of data.
• Ideally is carried out by an independent body external to the information processes.

IMF701_DatQltyInvestGuide_v2d2.doc

Page 4 of 13


[Figure 1 diagram: within the Data Quality Audit and Remediation Process, the Data Quality Assessment draws on the data store and produces audit recommendations, which feed a data remediation and Data Management Plan (Part 3) approval/priority process. Shown alongside the information lifecycle phases: collection (data collection), storage (data store, historic data), access & use (data access), and archive/disposal.]

Figure 1. Data Quality Assessment within Information Lifecycle Phases

1.3 Supporting the business process
Regular DQAs will benefit business units by ensuring the following:
• Data custodians can properly assess the quality of the data they provide.
• The gap between actual and expected quality is known, allowing plans to be developed to close these gaps.
• Management and clients understand the current quality and limitations of any dataset.

When implemented, the Information Management Framework (IMF) will ensure that information supports business requirements. Figure 2 shows the IMF modules and processes that are designed to ensure that data quality meets the business requirements.



Some business requirements for collecting and managing data are as follows (listed against an increasing need for data quality practices):
• Routine work: covers routine monitoring programs and management of natural resources (including land, water, soil, vegetation, etc.)
• Research and development: includes research projects, and the development and implementation of new techniques, ideas and equipment
• Management and analysis: used for tasks not involving data collection; includes system management tasks and tasks collating data from other tasks for future planning
• Statutory: supports, or is required to carry out, a statutory function

The business requirements are defined by the use of the data and will change over time. Data might have any number of uses but, when defining the expected quality for the assessment criteria, the primary use should be applied.

[Figure 2 diagram: the data quality management cycle, starting from the original use of the data and linking: define business requirements; define purpose of data; define business/data rules; Quality Plan (benchmarks); Data Quality Assessment processes (the Quality Investigation Module); update benchmarks, Data Management Plan and metadata; use/analyse data; obtain feedback. System specifications, standards and prior audits feed into the benchmarks.]

Figure 2. Data Quality Assessment within business processes



1.4 Describing quality
The Data Quality elements in this assessment guideline align with the ANZLIC metadata guidelines (ANZLIC 2001 p85). This is to facilitate up-to-date and accurate ANZLIC metadata for datasets. Accordingly, all tests should use the terminology of the ANZLIC metadata quality elements, and each element should be addressed by at least one test. Help in deciding which quality element a business rule relates to can be found in 'How to write a business rule' (IMF703).

1. Lineage
   1.1. Source of data
   1.2. Processing steps
2. Positional accuracy
   2.1. Determining accuracy requires comparison of a recorded position against the actual position as defined by a known datum. Positional accuracy is determined by how close the represented position of a feature is to its actual position on the earth. This can be done through comparison with aerial photographs or similar methods.
3. Attribute accuracy
   3.1. Determining accuracy requires comparison of a recorded entry against the actual value as defined by predefined standards. Attribute accuracy can cover the classification method used to assign values and how well attributes conform to that classification, described as a percentage:
        % accuracy = 100 x (number of accurate instances / total number of instances)
4. Logical consistency
   4.1. How well the data fits within the logical rules of the data structure.
   4.2. Attribute logical consistency entails the testing of two or more functionally related attributes: the value for one attribute determines the valid values for its related attributes (if X then Y).
   4.3. Feature logical consistency tests for feature-to-feature relationships that are consistent with known or expected rules. For instance, dryland salinity must occur on land.
5. Completeness
   5.1. Completeness of spatial coverage tests expected spatial coverage against actual coverage, either as areas missed or stations covered.
   5.2. Completeness of temporal coverage applies to time series data where there are gaps in the time recordings.
   5.3. Completeness of classification examines how exhaustive the classification system is and whether there are generalisations.
   5.4. Completeness of verification examines the verification method for the data.
   5.5. Completeness of attribution examines whether each record is complete.

Other classifications that can be used:

6. Currency
   6.1. Beginning date
   6.2. End date (processing time between collection and storage if collection is ongoing)
7. Status
   7.1. Maintenance and update frequency
   7.2. Progress

The final report might also present these results broken down by region or by individual data collector, if appropriate.
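The attribute accuracy percentage above is straightforward to compute. A minimal sketch, assuming hypothetical records and a hypothetical vegetation classification (the guideline does not prescribe any particular tooling):

```python
# Sketch: attribute accuracy as defined above,
# % accuracy = 100 x (number of accurate instances / total instances).
# The records, field name and allowed classification are hypothetical.

ALLOWED_VEG_CLASSES = {"forest", "woodland", "grassland", "wetland"}

records = [
    {"site": "A1", "veg_class": "forest"},
    {"site": "A2", "veg_class": "woodland"},
    {"site": "A3", "veg_class": "scrub"},      # not in the classification
    {"site": "A4", "veg_class": "grassland"},
]

def attribute_accuracy(records, field, allowed):
    """Percentage of records whose value for `field` conforms to `allowed`."""
    accurate = sum(1 for r in records if r[field] in allowed)
    return 100.0 * accurate / len(records)

print(attribute_accuracy(records, "veg_class", ALLOWED_VEG_CLASSES))  # 75.0
```

The same counting approach serves for completeness of attribution (count records with no missing fields) and for attribute logical consistency (count records satisfying an "if X then Y" rule).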


2 Data Quality Assessment Processes
Each Data Quality Assessment (DQA) is run as a project, separate from all other projects. A DQA must be established according to the Departmental Project Management Processes (PMP), available at http://projectoffice.imt.dlwc/guide/guide-default.cfm. The internal processes of a Data Quality Assessment project are shown in Figure 3 below.

[Figure 3 diagram: the DQA phases and steps: 2.1 Pre-assessment; 2.2 Planning (2.2.1 define assessment, 2.2.2 design tests and expected results, 2.2.3 access data); 2.3 Execution (2.3.1 compare expected with actual, 2.3.2 collate and analyse findings, 2.3.3 produce recommendations); 2.4 Closure (2.4.1 project closure); 2.5 Post-assessment.]

Figure 3. Data Quality Assessment process diagram

2.1 Pre-Assessment
The quality benchmark, or expectations, should have been established based on the defined purpose for the data. It is recommended that the Data Management Plan (DMP) be completed before any DQA to ensure that the purpose of the data is well understood.



2.2 Planning phase
2.2.1 Define Assessment
A DQA will only reveal the quality of a single dataset at the date of extraction or analysis. Only actively managed (or live) data should be investigated.

1. Inputs
   1.1. Dataset name
   1.2. Data Management Plan
   1.3. Prior plans and lessons learnt
   1.4. Data Quality Assessment Guidelines
2. Tools and techniques
   2.1. Project management processes
   2.2. Tools or systems to be used for data extraction/mining
3. Outputs
   3.1. Project Plan
2.2.2 Design tests and expected results
Document a specific test for each business rule before data extraction. The guideline 'Developing data quality testing scripts' provides information on this process, including some generic tests that can be applied to all data.

1. Inputs
   1.1. Prior test scripts
   1.2. Quality Plan (benchmarks)
   1.3. How to develop data quality test scripts and process discovery
2. Tools and techniques
   2.1. Assessment test template
3. Outputs
   3.1. Test script

2.2.3 Access Data
How the data is to be queried, retrieved or extracted is fundamental to the assessment. The tool is defined in the Assessment Plan.

1. Inputs
   1.1. Assessment Plan
   1.2. Test scripts
2. Tools and techniques
   2.1. SQL scripting
   2.2. Data extraction or direct query
3. Outputs
   3.1. Access to data
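The SQL scripting step might look like the following sketch. An in-memory SQLite database stands in for the corporate data store, and the table and column names are hypothetical:

```python
# Sketch: extracting data for assessment with SQL scripting.
# An in-memory SQLite database stands in for the corporate data store;
# the table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bore_readings (site TEXT, reading REAL, read_date TEXT)")
conn.executemany(
    "INSERT INTO bore_readings VALUES (?, ?, ?)",
    [("B01", 4.2, "2003-01-10"), ("B02", None, "2003-01-11"), ("B03", 5.1, "2003-01-12")],
)

# Extract only the records in scope for the assessment period,
# as defined in the Assessment Plan.
rows = conn.execute(
    "SELECT site, reading FROM bore_readings WHERE read_date >= ?", ("2003-01-01",)
).fetchall()
print(len(rows))  # 3 records extracted for testing
```

Extracting to a snapshot like this (rather than querying the live system repeatedly) fixes the date of extraction that, per section 2.2.1, bounds what the DQA can claim.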


2.3 Execution phase
2.3.1 Compare expected with actual
1. Inputs
   1.1. Data
   1.2. Test script
2. Tools and techniques
   2.1. Data manipulation tool/spreadsheet
   2.2. Execute testing
3. Outputs
   3.1. Test results
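The comparison of expected quality against actual test results can be sketched as follows (the quality elements, benchmark values and results are hypothetical):

```python
# Sketch: comparing expected (benchmark) quality with actual test results.
# The element names, benchmarks and results are hypothetical.

expected = {"Attribute accuracy": 95.0, "Completeness of attribution": 98.0}
actual = {"Attribute accuracy": 91.5, "Completeness of attribution": 99.2}

# One finding per quality element: benchmark, measured value, pass/fail.
findings = {
    element: {"expected": exp, "actual": actual[element], "met": actual[element] >= exp}
    for element, exp in expected.items()
}
for element, f in findings.items():
    status = "meets benchmark" if f["met"] else "below benchmark"
    print(f"{element}: expected {f['expected']}, actual {f['actual']} ({status})")
```

The per-element pass/fail results are the test results that feed the collation step (2.3.2).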

2.3.2 Collate and Analyse Findings
1. Inputs
   1.1. Test results
2. Tools and techniques
   2.1. Review where actual fails to meet expected
   2.2. Review improvement or decline from the previous assessment
   2.3. Review outstanding actions from the previous assessment
   2.4. Meetings with the assessment team and data custodian to decide corrective action
3. Outputs
   3.1. Action sheets (DMP Part 3 template)
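Collating findings into candidate action-sheet entries might be sketched as follows (the finding codes and wording are hypothetical; the DMP Part 3 template defines the real format):

```python
# Sketch: collating test findings into candidate action items for the
# data custodian. Finding codes and wording are hypothetical.

findings = [
    {"test": "AA-01", "element": "Attribute accuracy", "met_benchmark": False},
    {"test": "LC-01", "element": "Logical consistency", "met_benchmark": True},
    {"test": "CP-02", "element": "Completeness", "met_benchmark": False},
]

# Only failures need a corrective action recorded on the action sheet.
actions = [
    {"test": f["test"], "element": f["element"],
     "action": "agree corrective action with custodian"}
    for f in findings
    if not f["met_benchmark"]
]
print([a["test"] for a in actions])  # ['AA-01', 'CP-02']
```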

2.3.3 Produce Recommendations
1. Inputs
   1.1. Action sheets
   1.2. Assessment Report template
2. Tools and techniques
   2.1. Priority estimating/setting
   2.2. Cost estimating
   2.3. Peer review
   2.4. Management sign-off (including Executive Data Custodian sign-off)
3. Outputs
   3.1. Assessment Report



2.4 Closure Phase
2.4.1 Project Closure
1. Inputs
   1.1. Assessment Plan
   1.2. Actual resources used
2. Tools and techniques
   2.1. Compare expected vs actual costs
   2.2. Compare expected vs actual time
   2.3. Compare expected vs actual resources used
   2.4. Peer review
   2.5. Management sign-off (including Executive Data Custodian sign-off)
3. Outputs
   3.1. Lessons learnt/hindsight report

2.5 Post Assessment
2.5.1 Assessment outcomes
The assessment provides recommendations on how to improve data quality. Each business unit must then develop the recommendations into action/project plans for implementation. Actions to be carried out post-assessment include:
1. Update metadata records.
2. Update the Data Management Plan (Part 3) with approved actions arising from the Data Quality Assessment.
3. Update benchmarks and business rules.
4. File records of the assessment (ensuring later access).
5. Provide/store the approved Assessment Report on the intranet site.



3 Glossary
A full glossary of terms can be found at http://imf.dsnr/glossary/glossary-terms.cfm

Archive: Post-project, post-active-management data stored for posterity and later reuse (cf. disposal).
Disposal: Post-project, post-active-management data which is no longer recoverable.
DSNR: Department of Sustainable Natural Resources.
DMP: Data Management Plan.
DQA: Data Quality Assessment.
Information lifecycle: Comprises several phases (collection, storage, access, use and disposal of information) and is a continual process.
IMF: Information Management Framework.
Instance: A single record within a dataset.
KRA: Key Result Area.
Live data: Data that is being actively managed, from the point where data is stored after verification until it is archived.
System: A group of applications, including input/output devices and a database, used to help manage data through the information lifecycle (see above).
Verification: A comparison against standards, usually carried out at the end of the process to ensure quality of individual lifecycle phases.



4 References
ANZLIC (2001). ANZLIC Metadata Guidelines, Version 2, February 2001. [online] Available from: http://www.anzlic.org.au/asdi/metgidv2.pdf [accessed 30 Jan 2003]

International Organisation for Standardisation (1994). ISO 8402:1994, Quality Management and Quality Assurance. Geneva, Switzerland: ISO Press.

Natural Resource Information Management Strategy (NRIMS). Data Management Planning Guidelines. [online] Available from: http://www.nrims.nsw.gov.au/policies/plan_guide.html [accessed 28 Jan 2003]

NSW Department of Information Technology and Management, Office of Information Technology (OIT) (2002). Information Management Framework Guideline, May 2002. [online] Available from: http://www.oit.nsw.gov.au/pages/4.3.14-IM-Framework.htm [accessed 30 Dec 2002]



				