Data quality assessment guidelines 
Draft
Department of Infrastructure, Planning and Natural Resources
Natural Resource Information Systems
Information Management Framework (IMF)
IMP02/10.1.3 Data quality assurance and management standards
IMF701
June 2003

Document Release Information

Reviewers (Name: Role; Signature and Date where recorded)
Jane Deller-Smith: SIP Project Representative
Mick Dwyer: Data Custodian Representative
Grant Robinson: CNR
Trev Mount: Data Administrator Representative
Jonathan Doig: NRIC
Sue Irvine: Editorial Reviewer
Fred De Closey: Regional Reviewer; 7/3/2003

Project approvals (Name: Role; Signature and Date where recorded)
Neil Bennett: GM, Natural Resource Information Systems
GM, Information Management & Technology (Chief Information Officer) Group GM, Natural Resource Products

Audience (Role: Responsibility)
Operational Data Custodian and Data Managers: To help in planning and execution of Data Quality Assessments.
Corporate Data Administrator: To review and maintain this document.

Related documents (Document Name: Location)
How to write a business rule: http://imf.dsnr/binarydata/IMF703.pdf

History (Version; Date; Author-Editor(s); Notes)
v2d2; 16 June 2003; Adrian Richardson; Changes made after ALTIS comments in carrying out first DQA.

File details
Filename: IMF701_DatQltyAssessGuide_v2d2.doc
File server location: \\PAR01\DATA1\GROUP\IMTIIC\CORPORAT\IMT\COORD\NRIS\Projects\SIP\10.1_Data_Management\10.1.3 Data Quality Assurance\Guidelines and Templates\
Online location: http://imf.dsnr/binarydata/IMF701.pdf

Contents
1 BACKGROUND
  1.1 PURPOSE OF DOCUMENT
  1.2 WHAT IS A DATA QUALITY ASSESSMENT?
  1.3 SUPPORTING THE BUSINESS PROCESS
  1.4 DESCRIBING QUALITY
2 DATA QUALITY ASSESSMENT PROCESSES
  2.1 PRE-ASSESSMENT
  2.2 PLANNING PHASE
    2.2.1 Define Assessment
    2.2.2 Design tests and expected results
    2.2.3 Access Data
  2.3 EXECUTION PHASE
    2.3.1 Compare expected with actual
    2.3.2 Collate and Analyse Findings
    2.3.3 Produce Recommendations
  2.4 CLOSURE PHASE
    2.4.1 Project Closure
  2.5 POST ASSESSMENT
    2.5.1 Assessment outcomes
3 GLOSSARY
4 REFERENCES

Department of Infrastructure, Planning & Natural Resources Data Quality Assessment Guide IMF701_DatQltyInvestGuide_v2d2.doc Page 4 of 13 Draft

1 Background

1.1 Purpose of Document
This document is aimed at Data Managers wanting to design and carry out a Data Quality Assessment (DQA). It is a guideline to be used during the implementation of the DSNR Information Management Framework (IMF). In addition to this document, a number of work instructions and templates are available from the IMF Intranet site to help carry out a DQA.

1.2 What is a Data Quality Assessment?
Information has a number of operational lifecycle activities, referred to as the information lifecycle (OIT, 2002 p4). The Data Quality Assessment is a separate process from any Quality Assurance (QA) or data verification. The latter are undertaken within all lifecycle phases (for example, QA during data collection), whereas the Data Quality Assessment is carried out to ascertain the quality of the data currently in ‘stock’ (see Figure 1).

A Data Quality Assessment:
• Compares the expected quality of the data, as set out in the Data Quality Plan, with the actual data quality.
• Provides recommendations on how to improve data quality when quality has failed to meet the benchmark.
• Is separate from any verification process carried out within an individual phase of the information lifecycle, such as the QA process used in the collection of data.
• Ideally is carried out by an independent body external to information processes.

[Figure 1. Data Quality Assessment within Information Lifecycle Phases. Diagram labels: Data Quality Audit and Remediation Process (Data Store, Data Collection, Data Access, Data Quality Assessment, Audit Recommendations, Approval/Priority Process, Data Remediation, Part 3 Data Management Plan, Historic data); Information lifecycle phases (Collection, Storage, Access & Use, Archive/Disposal).]

1.3 Supporting the business process
Regular DQAs will benefit business units by ensuring the following:
• Data custodians can properly assess the quality of the data they provide.
• The gap between actual and expected quality is known, allowing plans to be developed to close it.
• Management and clients understand the current quality and limitations of any dataset.

When implemented, the Information Management Framework (IMF) will ensure that information supports business requirements. Figure 2 (Data Quality Assessment within business processes) shows the IMF modules/processes that are designed to ensure that data quality meets the business requirements.

The business requirements are defined by the use of the data and will change over time. Data might have any number of uses but, when defining the expected quality for the Assessment criteria, the primary use should be applied.

Some business requirements for collecting and managing data are as follows:
• Routine Work: covers routine monitoring programs and management of natural resources (including land, water, soil, vegetation, etc.).
• Research and Development: includes research projects, and the development and implementation of new techniques, ideas and equipment.
• Management and Analysis: used for tasks not involving data collection; includes system management tasks and tasks collating data from other tasks for future planning.
• Supports, or is required, to carry out a statutory function.
[Label accompanying the list: Increased need for data quality practices]

[Figure 2. Data Quality Assessment within business processes. Diagram labels: Data Quality management cycle (Start, Define Purpose of Data, Original use, Define Business Requirements, Define Business/data rules, Standards, System Spec, Quality Plan (Benchmarks), Use/analyse data, Obtain Feedback); Quality Investigation Module (Prior audits, Data Quality Assessment processes, Update Metadata, Update Data Management Plan, Update Benchmarks).]

1.4 Describing quality
The Data Quality elements in this assessment guideline align with the ANZLIC metadata guidelines (ANZLIC 2001 p85). This is to facilitate up-to-date and accurate ANZLIC metadata for datasets. Accordingly, all tests should use the terminology of the ANZLIC metadata quality elements, and each element should be addressed by at least one test. Help in deciding which Quality Element a business rule relates to can be found in ‘How to write a business rule’ (IMF703).

1. Lineage
1.1. Source of Data
1.2. Processing steps
2. Positional accuracy
2.1. Determining accuracy requires comparison of a recorded position against the actual position as defined by a known datum. Positional accuracy is determined by how close the represented position of a feature is to its actual position on the earth. This can be done through comparisons with aerial photographs or similar methods.
3. Attribute accuracy
3.1. Determining accuracy requires comparison of the recorded entry against the actual value as defined by predefined standards. Attribute accuracy can cover the classification method used to assign values and how well the attributes conform to that classification, described as a percentage: % accuracy = 100 x (Number of accurate instances / Total number of instances).
4. Logical consistency
4.1. How well the data fits within the logical rules of the data structure.
4.2. Attribute logical consistency entails the testing of two or more functionally related attributes. The value of one attribute determines the valid values for its related attributes (If X then Y).
4.3. Feature logical consistency is the testing for feature-to-feature relationships that are consistent with known or expected rules. For instance, Dryland Salinity must occur on land.
5. Completeness
5.1. Completeness of spatial coverage tests expected spatial coverage against actual coverage, either as areas missed or stations covered.
5.2. Completeness of temporal coverage applies to time series data and tests for gaps in the time recordings.
5.3. Completeness of classification examines how exhaustive the classification system is and whether it contains generalisations.
5.4. Completeness of verification examines the verification method for the data.
5.5. Completeness of attribution examines whether each record is complete.

Other classifications that can be used:
6. Currency
6.1. Beginning date
6.2. End date (processing time between collection and storage if ongoing collection)
7. Status
7.1. Maintenance and update frequency
7.2. Progress

The final report might also present these broken up by regions or individual data collectors if appropriate.

2 Data Quality Assessment Processes

Each Data Quality Assessment (DQA) is an entirely separate project from all other projects.
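As an illustration of the kind of test a DQA project runs, the attribute accuracy percentage (element 3.1) and an ‘If X then Y’ attribute logical consistency rule (element 4.2) from section 1.4 might be computed along the following lines. This is a minimal sketch only, not a departmental tool: the function names, the soil-type and land-use fields, and the sample records are all hypothetical and chosen purely for illustration.

```python
from typing import Callable, Mapping, Sequence

Record = Mapping[str, object]


def attribute_accuracy_percent(accurate: int, total: int) -> float:
    """% accuracy = 100 x (accurate instances / total instances)."""
    if total <= 0:
        raise ValueError("total number of instances must be positive")
    if not 0 <= accurate <= total:
        raise ValueError("accurate instances must be between 0 and total")
    return 100.0 * accurate / total


def count_accurate(recorded: Mapping[int, str], reference: Mapping[int, str]) -> int:
    """Count recorded values that match the reference (verified) value."""
    return sum(1 for key, value in recorded.items() if reference.get(key) == value)


def rule_violations(
    records: Sequence[Record],
    condition: Callable[[Record], bool],
    requirement: Callable[[Record], bool],
) -> list:
    """Return records where X (condition) holds but Y (requirement) does not."""
    return [r for r in records if condition(r) and not requirement(r)]


# Hypothetical data: recorded soil types versus a verified reference sample.
recorded = {1: "clay", 2: "sand", 3: "loam", 4: "clay"}
reference = {1: "clay", 2: "sand", 3: "clay", 4: "clay"}
accuracy = attribute_accuracy_percent(count_accurate(recorded, reference), len(recorded))
print(accuracy)  # 75.0

# Hypothetical rule: if land_use is "irrigated" then water_source must be present.
records = [
    {"id": 1, "land_use": "irrigated", "water_source": "bore"},
    {"id": 2, "land_use": "irrigated", "water_source": None},
    {"id": 3, "land_use": "dryland", "water_source": None},
]
violations = rule_violations(
    records,
    condition=lambda r: r["land_use"] == "irrigated",
    requirement=lambda r: r["water_source"] is not None,
)
print([r["id"] for r in violations])  # [2]
```

In practice the reference values and the rules would come from the Quality Plan benchmarks and the documented business rules rather than from hard-coded samples.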
Establish each DQA according to the Departmental Project Management Processes (PMP), which must be followed: http://projectoffice.imt.dlwc/guide/guide-default.cfm. The internal processes of a Data Quality Assessment project are shown in Figure 3 below.

[Figure 3. Data Quality Assessment Process diagram. Boxes shown: 2.1 Pre-Audit; 2.2 Planning (2.2.1 Define audit, 2.2.2 Determine tests and expected results, 2.2.3 Access data); 2.3 Execution (2.3.1 Compare items with actual, 2.3.2 Collate and analyse findings, 2.3.3 Produce recommendations); 2.4 Closure (2.4.1 Project closure); 2.5 Post Audit.]

2.1 Pre-Assessment
The quality benchmark, or expectations, should have been established based on the defined purpose for the data. It is recommended that the Data Management Plan (DMP) be completed before any DQA to ensure that the purpose of the data is better understood.

2.2 Planning phase

2.2.1 Define Assessment
A DQA will only reveal the quality of a single dataset at the date of extraction or analysis. Only actively managed (or live) data should be investigated.
1. Inputs
1.1. Dataset name
1.2. Data Management Plan
1.3. Prior plans and lessons learnt
1.4. Data Quality Assessment Guidelines
2. Tools and Techniques
2.1. Project management processes
2.2. Tools or systems to be utilised for data extraction/mining
3. Outputs
3.1. Project Plan

2.2.2 Design tests and expected results
Document a specific test for each business rule before data extraction. The guideline ‘Developing data quality testing scripts’ provides information on this process, including some generic tests that can be applied to all data.
1. Inputs
1.1. Prior test scripts
1.2. Quality Plan (benchmarks)
1.3. How to develop data quality test scripts & process discovery
2. Tools and Techniques
2.1. Assessment test template
3. Outputs
3.1. Test script

2.2.3 Access Data
How the data is to be queried, retrieved or extracted is fundamental to the Assessment. The tool is defined in the Assessment Plan.
1. Inputs
1.1. Assessment Plan
1.2. Test Scripts
2. Tools and Techniques
2.1. SQL Scripting
2.2. Data extraction or direct query
3. Outputs
3.1. Access to data

2.3 Execution phase

2.3.1 Compare expected with actual
1. Inputs
1.1. Data
1.2. Test script
2. Tools and Techniques
2.1. Data manipulation tool/spreadsheet
2.2. Execute testing
3. Outputs
3.1. Test results

2.3.2 Collate and Analyse Findings
1. Inputs
1.1. Test results
2. Tools and Techniques
2.1. Review where actual fails to meet expected
2.2. Review where there is improvement or decline from the previous Assessment
2.3. Review outstanding actions from the previous Assessment
2.4. Meetings with Assessment team and data custodian to decide corrective action
3. Outputs
3.1. Action Sheets (DMP Part 3 Template)

2.3.3 Produce Recommendations
1. Inputs
1.1. Action Sheets
1.2. Assessment Report Template
2. Tools and Techniques
2.1. Priority estimating/setting
2.2. Cost estimating
2.3. Peer review
2.4. Management sign-off (including Executive Data Custodian sign-off)
3. Outputs
3.1. Assessment Report

2.4 Closure Phase

2.4.1 Project Closure
1. Inputs
1.1. Assessment Plan
1.2. Actual resources used
2. Tools and Techniques
2.1. Compare expected vs actual costs
2.2. Compare expected vs actual time
2.3. Compare expected vs actual resources used
2.4. Peer review
2.5. Management sign-off (including Executive Data Custodian sign-off)
3. Outputs
3.1. Lessons Learnt/Hindsight Report

2.5 Post Assessment

2.5.1 Assessment outcomes
The Assessment provides recommendations on how to improve data quality. Each business unit must then develop the recommendations into action/project plans for implementation. Actions to be carried out post assessment include:
1. Update metadata records.
2. Update the Data Management Plan (Part 3) with approved actions arising from the Data Quality Assessment.
3. Update benchmarks and business rules.
4. File records of the Assessment (ensuring later access).
5. Provide/store the approved Assessment Report on the intranet site.

3 Glossary
A full Glossary of terms can be found at http://imf.dsnr/glossary/glossary-terms.cfm

Archive: Data that is no longer actively managed after a project ends, stored for posterity and later reuse (cf. Disposal).
Disposal: Data that is no longer actively managed after a project ends and is no longer recoverable.
DSNR: Department of Sustainable Natural Resources
DMP: Data Management Plan
DQA: Data Quality Assessment
Information lifecycle: Comprises several phases (Collection, Storage, Access, Use and Disposal of information) and is a continual process.
IMF: Information Management Framework
Instance: A single record within a dataset
KRA: Key Result Area
Live data: Data that is being actively managed, from the point where it is stored after verification until it is archived.
System: A group of applications, including input/output devices and a database, used to help manage data through the information lifecycle (see above).
Verification: Data verification is a comparison against standards, but is usually carried out at the end of a process to ensure the quality of individual lifecycle phases.

4 References
ANZLIC (Feb 2001). ANZLIC Metadata Guidelines Version 2. [online] Available from: http://www.anzlic.org.au/asdi/metgidv2.pdf [accessed 30 Jan 2003]
International Organisation for Standardisation. ISO 8402-1994. Quality Management and Quality Assurance. Geneva, Switzerland: ISO Press
Natural Resource Information Management Strategy (NRIMS). Data Management Planning Guidelines. [online] Available from: http://www.nrims.nsw.gov.au/policies/plan_guide.html [accessed 28 Jan 2003]
NSW Department of Information Technology and Management, Office of Information Technology (OIT) (May 2002). Information Management Framework Guideline. [online] Available from: http://www.oit.nsw.gov.au/pages/4.3.14-IM-Framework.htm [accessed 30 Dec 2002]