Data Quality

Description

Data Quality Methodology - Informatica

Shared by: ganeshdw
Posted: 6/28/2009
Language: English
Velocity v8 Data Quality Methodology
INFORMATICA CONFIDENTIAL Velocity v8 Methodology - Data Quality

Data Quality

Executive Summary

There are important considerations that organizations must be aware of when organizing a project that involves data quality. The goal here is to set forth principles that apply to most data quality projects. Experience has shown that adhering to these principles maximizes the potential for success in data quality projects.

Most organizations realize that data quality shortcomings have a negative effect on the organization’s planning and performance. Increasingly, organizations are instituting data quality programs to address the known shortcomings, to discover hidden data quality issues, and to create a consistent, ongoing process for monitoring and improving data quality throughout the organization.

Data quality projects originate from a variety of sources, including:

● Development of an Integration Competency Center (ICC).
● Master data management/customer data integration (MDM/CDI) projects.
● Data integration projects driven by mergers or acquisitions.
● Data migration from older to newer systems.
● Data warehousing.
● Quality based initiatives such as Six Sigma.
● Addressing specific data quality issues that have been identified by management as adversely affecting performance.

Regardless of the impetus, data quality projects will succeed or fail based on these common factors:

● The commitment of business owners to the success of the project.
● The effort expended to discover data quality issues as the project is designed.
● The presence of a data governance process to analyze data quality issues and approve general rules for dealing with the issues.
● The effective documentation of data quality business rules.
● The establishment of scorecards and other metrics to measure data quality issues over time.
● The realization that data quality projects must have a continuing life cycle to discover new data quality issues that were not present when the project commenced and to prevent previous data quality issues from creeping into the data over time.

Business Drivers

The impetus for a data quality project has a greater impact on the scope of the project than it has on the critical aspects for project success. When the data quality initiative derives from a broader initiative such as ICC, MDM, or CDI, then the outcome of the project is likely to have an enterprise impact. In other words, the results of the project will potentially have a higher degree of visibility and financial impact.

On the other hand, a data quality project that derives from a data integration project related to a merger or acquisition may be perceived as having only limited impact and perhaps less visibility. Nevertheless, even a department level project should be analyzed for its potential impact across the organization.

● Is the data that is scoped for the project used outside the confines of the project?
● Is the type of data common to other data sources not covered by the project?
● Is there a reasonable likelihood that the project will serve as a model for future efforts or similar projects?
● Are there potential side effects to the project that extend beyond the confines of the project?

If the answer to any of these questions is “yes”, then the project likely has a broader impact than what was originally perceived.

Key Success Factors

Commitment of Business Owners to the Project

A strong commitment by the project’s business owners to time, resources and focus is the most critical factor that leads to the success of a data quality project. While in some cases IT may “own” the project, IT never owns the data. The business owns the data and the business must live with the data daily.
Decisions about even trivial aspects of data quality business rules can have impacts that are not necessarily easy to foresee without analysis and input from the business owners. Participation by key business personnel throughout the project lifecycle will help to avert unintended surprises and unhappy end users. This requires a commitment and a sense of priority that must be driven downward from the Project Sponsor.

Gaining the appropriate level of commitment requires an understanding throughout the business that data quality issues have real costs and that correcting data quality issues and maintaining a high level of data quality have ongoing benefits. Some examples include:

● Reducing customer support call times by achieving a single view of the customer.
● Avoiding returned shipments.
● Detecting regulatory compliance risks.
● Improving the accuracy of BI metrics.
● Increasing customer satisfaction.
● Gaining a clearer picture of supplier and customer interactions.
● Creating reusable data quality business rules.

These are all demonstrable bottom line benefits.

Effort to Discover Data Quality Issues during the Design Phase

With most data quality initiatives, there are several known data quality issues at the outset of the project. However, not all issues will be known unless effort is expended to profile and analyze the data during the design phase. Informatica Data Explorer (IDE) and Informatica Data Quality (IDQ) both provide profiling and analysis capabilities critical for project success.

It is almost always easier and cheaper to discover problems early in the design phase than it is to retrofit development to address problems discovered during implementation. Consequently, a significant portion of the overall project should be devoted to profiling and analysis.
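As a rough illustration of the quick, column-level profiling checks this section refers to (inferred data types, value patterns, percentage populated), the following Python sketch profiles a single column of raw strings. It is not how IDE or IDQ work internally; the function names, the pattern notation (9 for a digit, A for a letter) and the sample data are assumptions made purely for illustration.

```python
import re
from collections import Counter


def infer_type(value):
    """Classify one raw string as 'int', 'float', or 'str' (hypothetical helper)."""
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(value)
            return name
        except ValueError:
            pass
    return "str"


def pattern_of(value):
    """Reduce a value to its shape: digits become 9, letters become A."""
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", shape)


def profile_column(values):
    """Return basic profile statistics for one column of raw strings."""
    # Treat None, empty and whitespace-only entries as unpopulated.
    populated = [v.strip() for v in values if v is not None and v.strip()]
    percent = 100.0 * len(populated) / len(values) if values else 0.0
    return {
        "percent_populated": percent,
        "types": Counter(infer_type(v) for v in populated),
        "patterns": Counter(pattern_of(v) for v in populated),
        "distinct_values": len(set(populated)),
    }


# Illustrative phone-number column containing an end-user adaptation.
phones = ["555-0100", "555-0199", "", "Do not ship", None]
report = profile_column(phones)
print(report["percent_populated"])
print(report["patterns"].most_common())
```

A pattern that appears only once or twice in a column that is otherwise uniform (here, the free-text entry among phone numbers) is the kind of outlier that typically deserves a closer look with the business owners.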
Some aspects of profiling can be performed quickly and efficiently:

● Data type discovery and enforcement on a column by column basis.
● Patterns in the data (e.g., phone number fields).
● Percentage of populated records.
● Fields with a limited set of valid inputs.

Others can require more detailed analysis:

● Slugged values (e.g., “Training” as a customer name).
● Repurposed fields (e.g., including non-address information in an address field).
● End user adaptations (e.g., strings such as “#NOTE#”, “Do not ship”, etc.)
● Units of measure out of bounds.

In the hands of an experienced user, a very high percentage of these sorts of problems can be discovered during profiling and analysis. This upfront investment in analysis will reduce development and testing times, minimize load errors and reduce project risk.

Data Governance Process

A data governance committee is an ideal forum for discussing and resolving decisions about data quality business rules. The committee might consist of the Project Sponsor, the Business Project Manager, the Technical Project Manager, the Quality Assurance Manager, the Test Manager, the User Acceptance Test Lead, one or more Business Analysts, one or more Data Stewards, one or more End Users, and one or more Data Quality Developers. Other team members may be adjunct members brought into meetings as their skill sets are needed.

Data governance is critical to the success of data quality initiatives. A data governance committee facilitates communication between the business users (who best know the data and the impact of business rules) and the technical personnel charged with implementing and testing business rules.

Data governance also serves as a forum for the prioritization of data quality business rules. Nearly every data quality project has limitations on time and resources that prevent all known data quality issues from being addressed.
A data governance committee is well positioned to assess both the business and the technical impacts of particular data quality issues in order to prioritize them intelligently.

Effective Documentation of Data Quality Business Rules

The data governance committee should be charged with approving and documenting all business rules developed for the project. The documentation should include a description of the field(s) governed by the rule, the detail of the rule, and a brief rationale for the rule. This will serve as input for the implementation of the necessary data quality plans.

The data quality business rule documentation also plays into development of the test plans. As test plans are developed, the testing organization can flag business rules that may be too ambiguous to permit definitive testing. Clear documentation also facilitates reusability of business rules.

Using Scorecards and other Metrics to Measure Data Quality

As with data integration development, data quality development involves frequent test processing, often followed by tweaking data quality plans to implement all required business rules. One way to measure the effectiveness of this process is to create scorecards that measure key data quality indicators from one run to the next. IDQ reporting allows scorecards to be created with relative ease.

These scorecards do not cease to have utility when the project reaches a “go live” state. Data quality is an ongoing discipline, not a project discipline. Continual monitoring and re-analysis are necessary if an organization is to achieve and maintain high levels of data quality.

The Continuing Life Cycle of Data Quality

It is a truism with data quality projects that as soon as a data quality process has completed, the quality of the data begins to decline. Thus it is essential that a data quality program not go fallow when the project goes live.
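To make the idea of tracking key data quality indicators from one run to the next concrete, the sketch below compares two hypothetical scorecard runs and lists the indicators that have declined. It does not represent IDQ reporting; the metric names, the values, and the assumption that a higher percentage is better are all illustrative.

```python
def scorecard_delta(previous, current):
    """Per-metric change (current minus previous) for metrics present in both runs."""
    return {
        name: round(current[name] - previous[name], 2)
        for name in current
        if name in previous
    }


def regressions(previous, current, tolerance=0.0):
    """Names of metrics that dropped by more than `tolerance` since the previous run."""
    return sorted(
        name
        for name, change in scorecard_delta(previous, current).items()
        if change < -tolerance
    )


# Hypothetical scorecards from two consecutive runs (percentages, higher is better).
run_1 = {"pct_populated_phone": 91.5, "pct_valid_postcode": 88.0, "pct_unique_customers": 97.2}
run_2 = {"pct_populated_phone": 93.0, "pct_valid_postcode": 86.5, "pct_unique_customers": 97.2}

print(scorecard_delta(run_1, run_2))
print(regressions(run_1, run_2))
```

Running the same comparison on a schedule after go-live is one simple way to notice when a previously resolved issue starts to reappear.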
A successful data quality system must continue to maintain controls, monitoring and profiling to ensure that data quality does not deteriorate over time.

Last updated: 20-May-08 16:35

Roles

● Velocity Roles and Responsibilities
● Business Analyst
● Business Project Manager
● Data Architect
● Data Integration Developer
● Data Quality Developer
● Data Steward/Data Quality Steward
● Database Administrator (DBA)
● End User
● Project Sponsor
● Quality Assurance Manager
● Technical Project Manager
● Test Engineer
● Test Manager
● User Acceptance Test Lead

Velocity Roles and Responsibilities

The following pages describe the roles used throughout this Guide, along with the responsibilities typically associated with each. Please note that the concept of a role is distinct from that of an employee or full time equivalent (FTE). A role encapsulates a set of responsibilities that may be fulfilled by a single person in a part-time or full-time capacity, or may be accomplished by a number of people working together.

The Velocity Guide refers to roles with an implicit assumption that there is a corresponding person in that role. For example, a task description may discuss the involvement of "the DBA" on a particular project; however, there may be one or more DBAs, or a person whose part-time responsibility is database administration.

In addition, note that there is no assumption of staffing level for each role; that is, a small project may have one individual filling the role of Data Integration Developer, Data Architect, and Database Administrator, while large projects may have multiple individuals assigned to each role.
In cases where multiple people represent a given role, the singular role name is used, and project planners can specify the actual allocation of work among all relevant parties. For example, the methodology always refers to the Technical Architect, when in fact, there may be a team of two or more people developing the Technical Architecture for a very large development effort.

Data Integration Project - Sample Organization Chart

Last updated: 20-May-08 18:51

Business Analyst

The primary role of the Business Analyst (sometimes known as the Functional Analyst) is to represent the interests of the business in the development of the data integration solution. The secondary role is to function as an interpreter for business and technical staff, translating concepts and terminology and generally bridging gaps in understanding.

Under normal circumstances, someone from the business community fills this role, since deep knowledge of the business requirement is indispensable. Ideally, familiarity with the technology and the development life-cycle allows the individual to function as the communications channel between technical and business users.
Reports to:

● Business Project Manager

Responsibilities:

● Ensures that the delivered solution fulfills the needs of the business (should be involved in decisions related to the business requirements)
● Assists in determining the data integration system project scope, time and required resources
● Provides support and analysis of data collection, mapping, aggregation and balancing functions
● Performs requirements analysis, documentation, testing, ad-hoc reporting, user support and project leadership
● Produces detailed business process flows, functional requirements specifications and data models and communicates these requirements to the design and build teams
● Conducts cost/benefit assessments of the functionality requested by end-users
● Prioritizes and balances competing priorities
● Plans and authors the user documentation set

Qualifications/Certifications

● Possesses excellent communication skills, both written and verbal
● Must be able to work effectively with both business and technical stakeholders
● Works independently with minimal supervision
● Has knowledge of the tools and technologies used in the data integration solution
● Holds certification in industry vertical knowledge (if applicable)

Recommended Training

● Interview/workshop techniques
● Project Management
● Data Analysis
● Structured analysis
● UML or other business design methodology
● Data Warehouse Development

Last updated: 09-Apr-07 15:20

Business Project Manager

The Business Project Manager has overall responsibility for the delivery of the data integration solution.
As such, the Business Project Manager works with the project sponsor, technical project manager, user community, and development team to strike an appropriate balance of business needs, resource availability, project scope, schedule, and budget to deliver specified requirements and meet customer satisfaction.

Reports to:

● Project Sponsor

Responsibilities:

● Develops and manages the project work plan
● Manages project scope, time-line and budget
● Resolves budget issues
● Works with the Technical Project Manager to procure and assign the appropriate resources for the project
● Communicates project progress to Project Sponsor(s)
● Is responsible for ensuring delivery on commitments and ensuring that the delivered solution fulfills the needs of the business
● Performs requirements analysis, documentation, ad-hoc reporting and project leadership

Qualifications/Certifications

● Translates strategies into deliverables
● Prioritizes and balances competing priorities
● Possesses excellent communication skills, both written and verbal
● Results oriented team player
● Must be able to work effectively with both business and technical stakeholders
● Works independently with minimal supervision
● Has knowledge of the tools and technologies used in the data integration solution
● Holds certification in industry vertical knowledge (if applicable)

Recommended Training

● Project Management

Last updated: 06-Apr-07 17:55

Data Architect

The Data Architect is responsible for the delivery of a robust scalable data architecture that meets the business goals of the organization. The Data Architect develops the logical data models, and documents the models in Entity-Relationship Diagrams (ERD). The Data Architect must work with the Business Analysts and Data Integration Developers to translate the business requirements into a logical model.
The logical model is captured in the ERD, which then feeds the work of the Database Administrator, who designs and implements the physical database.

Depending on the specific structure of the development organization, the Data Architect may also be considered a Data Warehouse Architect, in cooperation with the Technical Architect. This role involves developing the overall Data Warehouse logical architecture, specifically the configuration of the data warehouse, data marts, and an operational data store or staging area if necessary. The physical implementation of the architecture is the responsibility of the Database Administrator.

Reports to:

● Technical Project Manager

Responsibilities:

● Designs an information strategy that maximizes the value of data as an enterprise asset
● Maintains logical/physical data models
● Coordinates the metadata associated with the application
● Develops technical design documents
● Develops and communicates data standards
● Maintains Data Quality metrics
● Plans architectures and infrastructures in support of data management processes and procedures
● Supports the build out of the Data Warehouse, Data Marts and operational data store
● Effectively communicates with other technology and product team members

Qualifications/Certifications

● Strong understanding of data integration concepts
● Understanding of multiple data architectures that can support a Data Warehouse
● Ability to translate functional requirements into technical design specifications
● Ability to develop technical design documents and test case documents
● Experience in optimizing data loads and data transformations
● Industry vertical experience is essential
● Project Solution experience is desired
● Has had some exposure to Project Management
● Has worked with Modeling Packages
● Has experience with at least one RDBMS
● Strong Business Analysis and problem solving skills
● Familiarity with Enterprise Architecture Structures (Zachman/TOGAF)

Recommended Training

● Modeling Packages
● Data Warehouse Development

Last updated: 01-Feb-07 18:51

Data Integration Developer

The Data Integration Developer is responsible for the design, build, and deployment of the project's data integration component. A typical data integration effort usually involves multiple Data Integration Developers developing the Informatica mappings, executing sessions, and validating the results.

Reports to:

● Technical Project Manager

Responsibilities:

● Uses the Informatica Data Integration platform to extract, transform, and load data
● Develops Informatica mapping designs
● Develops Data Integration Workflows and load processes
● Ensures adherence to locally defined standards for all developed components
● Performs data analysis for both Source and Target tables/columns
● Provides technical documentation of Source and Target mappings
● Supports the development and design of the internal data integration framework
● Participates in design and development reviews
● Works with System owners to resolve source data issues and refine transformation rules
● Ensures performance metrics are met and tracked
● Writes and maintains unit tests
● Conducts QA Reviews
● Performs production migrations

Qualifications/Certifications

● Understands data integration processes and how to tune for performance
● Has SQL experience
● Possesses excellent communications skills
● Has the ability to develop work plans and follow through on assignments with minimal guidance
● Has Informatica Data Integration Platform experience
● Is an Informatica Certified Designer
● Has RDBMS experience
● Has the ability to work with business and system owners to obtain requirements and manage expectations

Recommended Training

● Data Modeling
● PowerCenter - Level I & II Developer
● PowerCenter - Performance Tuning
● PowerCenter - Team Based Development
● PowerCenter - Advanced Mapping Techniques
● PowerCenter - Advanced Workflow Techniques
● PowerCenter - XML Support
● PowerCenter - Data Profiling
● PowerExchange

Last updated: 01-Feb-07 18:51

Data Quality Developer

The Data Quality Developer (DQ Developer) is responsible for designing, testing, deploying, and documenting the project's data quality procedures and their outputs. The DQ Developer provides the Data Integration Developer with all relevant outputs and results from the data quality procedures, including any ongoing procedures that will run in the Operate phase or after project-end. The DQ Developer must provide the Business Analyst with the summary results of data quality analysis as needed during the project. The DQ Developer must also document at a functional level how the procedures work within the data quality applications.

The primary tasks associated with this role are to use Informatica Data Quality and Informatica Data Explorer to profile the project source data, define or confirm the definition of the metadata, cleanse and accuracy-check the project data, check for duplicate or redundant records, and provide the Data Integration Developer with concrete proposals on how to proceed with the ETL processes.
Reports to:

● Technical Project Manager

Responsibilities:

● Profile source data and determine all source data and metadata characteristics
● Design and execute Data Quality Audit
● Present profiling/audit results, in summary and in detail, to the business analyst, the project manager, and the data steward
● Assist the business analyst/project manager/data steward in defining or modifying the project plan based on these results
● Assist the Data Integration Developer in designing source-to-target mappings
● Design and execute the data quality plans that will cleanse, de-duplicate, and otherwise prepare the project data for the Build phase
● Test Data Quality plans for accuracy and completeness
● Assist in deploying plans that will run in a scheduled or batch environment
● Document all plans in detail and hand over documentation to the customer
● Assist in any other areas relating to the use of data quality processes, such as unit testing

Qualifications/Certifications

● Has knowledge of the tools and technologies used in the data quality solution
● Results oriented team player
● Possesses excellent communication skills, both written and verbal
● Must be able to work effectively with both business and technical stakeholders

Recommended Training

● Data Quality Workbench I & II
● Data Explorer Level I
● PowerCenter Level I Developer
● Basic RDBMS Training
● Data Warehouse Development

Last updated: 15-Feb-07 17:34

Data Steward/Data Quality Steward

The Data Steward owns the data and associated business and technical rules on behalf of the Project Sponsor. This role has responsibility for defining and maintaining business and technical rules, liaising with the business and technical communities, and resolving issues relating to the data.
The Data Steward will be the primary contact for all questions relating to the data, its use, processing and quality. In essence, this role formalizes the accountability for the management of organizational data.

Typically the Data Steward is a key member of a Data Stewardship Committee put into place by the Project Sponsor. This committee will include business users and technical staff such as Application Experts. There is often an arbitration element to the role where data is put to different uses by separate groups of users whose requirements have to be reconciled.

Reports to:

● Business Project Manager

Responsibilities:

● Records the business use for defined data
● Identifies opportunities to share and re-use data
● Decides upon the target data quality metrics
● Monitors the progress towards, and tuning of, data quality target metrics
● Oversees data quality strategy and remedial measures
● Participates in the enforcement of data quality standards
● Enters, maintains and verifies data changes
● Ensures the quality, completeness and accuracy of data definitions
● Communicates concerns, issues and problems with data to the individuals that can influence change
● Researches and resolves data issues

Qualifications/Certifications

● Possesses strong analytical and problem solving skills
● Has experience in managing data standardization in a large organization, including setting and executing strategy
● Previous industry vertical experience is essential
● Possesses excellent communication skills, both written and verbal
● Exhibits effective negotiating skills
● Displays meticulous attention to detail
● Must be able to work effectively with both business and technical stakeholders
● Works independently with minimal supervision
● Project solution experience is desirable

Recommended Training

● Data Quality Workbench Level I
● Data Explorer Level I

Last updated: 15-Feb-07 17:34

Database Administrator (DBA)

The Database Administrator (DBA) in a Data Integration Solution is typically responsible for translating the logical model (i.e., the ERD) into a physical model for implementation in the chosen DBMS, implementing the model, developing volume and capacity estimates, performance tuning, and general administration of the DBMS. In many cases, the project DBA also has useful knowledge of existing source database systems.

In most cases, a DBA's skills are tied to a particular DBMS, such as Oracle or Sybase. As a result, an analytic solution with heterogeneous sources/targets may require the involvement of several DBAs. The Project Manager and Data Warehouse Administrator are responsible for ensuring that the DBAs are working in concert toward a common solution.

Reports to:

● Technical Project Manager

Responsibilities:

● Plans, implements and supports enterprise databases
● Establishes and maintains database security and integrity controls
● Delivers database services while managing to policies, procedures and standards
● Tests and implements new technical solutions
● Monitors and supports the database infrastructure (including clients)
● Develops volume and capacity estimates
● Proposes and implements enhancements to improve performance and reliability
● Provides operational support of databases, including backup and recovery
● Develops programs to migrate data between systems
● Works to resolve technical issues
● Contributes to technical and system architectural planning
● Supports data integration developers in troubleshooting performance issues
● Collaborates with other Departments (e.g., Network Administrators) to identify and resolve performance issues

Qualifications/Certifications

● Experience in database administration, backup and recovery
● Expertise in database configuration and tuning
● Appreciation of DI tool-set and associated tools
● Experience in developing and supporting ETL real-time and batch processes
● Strategic planning and system analysis
● Strong analytical and communication skills
● Able to work effectively with both business and technical stakeholders
● Ability to work independently with minimal supervision

Recommended Training

● DBMS Administration

Last updated: 01-Feb-07 18:51

End User

The End User is the ultimate "consumer" of the data in the data warehouse and/or data marts. As such, the end user represents a key customer constituent (management is another), and must therefore be heavily involved in the development of a data integration solution. Specifically, a representative of the End User community must be involved in gathering and clarifying the business requirements, developing the solution and User Acceptance Testing (if applicable).

Reports to:

● Business Project Manager

Responsibilities:

● Gathers and clarifies business requirements
● Reviews technical design proposals
● Participates in User Acceptance testing
● Provides feedback on the user experience

Qualifications/Certifications

● Strong understanding of the business' processes
● Good communication skills

Recommended Training

● Data Analyzer - Quickstart
● Data Analyzer - Report Development

Last updated: 01-Feb-07 18:51

Project Sponsor

The Project Sponsor is typically a member of the business community rather than an IT/IS resource. This is important because the lack of business sponsorship is often a contributing cause of systems implementation failure. The Project Sponsor often initiates the effort, serves as project champion, guides the Project Managers in understanding business priorities, and reports status of the implementation to executive leadership.
Once an implementation is complete, the Project Sponsor may also serve as "chief evangelist", bringing word of the successful implementation to other areas within the organization.

Reports to:

● Executive Leadership

Responsibilities:

● Provides the business sponsorship for the project
● Champions the project within the business
● Initiates the project effort
● Guides the Project Managers in understanding business requirements and priorities
● Assists in determining the data integration system project scope, time, budget and required resources
● Reports status of the implementation to executive leadership

Qualifications/Certifications

● Has industry vertical knowledge

Recommended Training

● N/A

Last updated: 01-Feb-07 18:51

Quality Assurance Manager

The Quality Assurance (QA) Manager ensures that the original intent of the business case is achieved in the actual implementation of the analytic solution. This involves leading the efforts to validate the integrity of the data throughout the data integration processes, and ensuring that the ultimate data target has been accurately derived from the source data.

The QA Manager can be a member of the IT organization, but serves as a liaison to the business community (i.e., the Business Analysts and End Users). In situations where issues arise with regard to the quality of the solution, the QA Manager works with project management and the development team to resolve them. Depending upon the test approach taken by the project team, the QA Manager may also serve as the Test Manager.
Reports to:

● Technical Project Manager

Responsibilities:

● Leads the effort to validate the integrity of the data through the data integration processes
● Ensures that the data contained in the data integration solution has been accurately derived from the source data
● Develops and maintains quality assurance plans and test requirements documentation
● Verifies compliance to commitments contained in quality plans
● Works with the project management and development teams to resolve issues
● Participates in the enforcement of data quality standards
● Communicates concerns, issues and problems with data
● Participates in the testing and post-production verification
● Together with the Technical Lead and the Repository Administrator, articulates the development standards
● Advises on the development methods to ensure that quality is built in
● Designs the QA and standards enforcement strategy
● Together with the Test Manager, coordinates the QA and Test strategies
● Manages the implementation of the QA strategy

Qualifications/Certifications

● Industry vertical knowledge
● Solid understanding of the Software Development Life Cycle
● Experience in quality assurance performance, auditing processes, best practices and procedures
● Experience with automated testing tools
● Knowledge of Data Warehouse and Data Integration enterprise environments
● Able to work effectively with both business and technical stakeholders

Recommended Training

● PowerCenter Level I Developer
● Informatica Data Explorer
● Informatica Data Quality Workbench
● Project Management

Last updated: 01-Feb-07 18:51

Technical Project Manager

The Technical Project Manager has overall responsibility for managing the technical resources within a project.
As such, he/she works with the project sponsor, business project manager and development team to assign the appropriate resources for a project within the scope, schedule, and budget and to ensure that project deliverables are met.

Reports to:
● Project Sponsor or Business Project Manager

Responsibilities:
● Defines and implements the methodology adopted for the project
● Liaises with the Project Sponsor and Business Project Manager
● Manages project resources within the project scope, time-line and budget
● Ensures all business requirements are accurate
● Communicates project progress to Project Sponsor(s)
● Is responsible for ensuring delivery on commitments and ensuring that the delivered solution fulfills the needs of the business
● Performs requirements analysis, documentation, ad-hoc reporting and resource leadership

Qualifications/Certifications
● Translates strategies into deliverables
● Prioritizes and balances competing priorities
● Must be able to work effectively with both business and technical stakeholders
● Has knowledge of the tools and technologies used in the data integration solution
● Holds certification in industry vertical knowledge (if applicable)

Recommended Training
● Project Management Techniques
● PowerCenter Developer Level I
● PowerCenter Administrator Level I
● Data Analyzer Introduction

Last updated: 01-Feb-07 18:51

Test Engineer

The Test Engineer is responsible for completion of test plans and their execution. During test planning, the Test Engineer works with the Testing Manager/Quality Assurance Manager to finalize the test plans and to ensure that the requirements are testable. The Test Engineer is also responsible for complete execution, including designing and implementing test scripts, test suites of test cases, and test data.
The Test Engineer should be able to demonstrate knowledge of testing techniques and to provide feedback to developers. He/she uses the procedures defined in the test strategy to execute tests, report results and progress of test execution, and escalate testing issues as appropriate.

Reports to:
● Test Manager (or Quality Assurance Manager)

Responsibilities:
● Provides input to the test plan and executes it
● Carries out requested procedures to ensure that Data Integration systems and services meet organization standards and business requirements
● Develops and maintains test plans, test requirements documentation, test cases and test scripts
● Verifies compliance to commitments contained in the test plans
● Escalates issues and works to resolve them
● Participates in testing and post-production verification efforts
● Executes test scripts and documents and provides the results to the test manager
● Provides feedback to developers
● Investigates and resolves test failures

Qualifications/Certifications
● Solid understanding of the Software Development Life Cycle
● Experience with automated testing tools
● Strong knowledge of Data Warehouse and Data Integration enterprise environments
● Experience in a quality assurance and testing environment
● Experience in developing and executing test cases and in setting up complex test environments
● Industry vertical knowledge

Recommended Training
● PowerCenter Developer Level I & II
● Data Analyzer Introduction
● SQL Basics
● Data Quality Workbench

Last updated: 01-Feb-07 18:51

Test Manager

The Test Manager is responsible for coordinating all aspects of test planning and execution. During test planning, the Test Manager becomes familiar with the business requirements in order to develop sufficient test coverage for all planned functionality.
He/she also develops a test schedule that fits into the overall project plan. Typically, the Test Manager works with a development counterpart during test execution; the development manager schedules and oversees the completion of fixes for bugs found during testing.

The Test Manager is also responsible for the creation of the test data set. An integrated test data set is a valuable project resource in its own right; apart from its obvious role in testing, the test data set is very useful to the developers of integration and presentation components. In general, separate functional and volume test data sets will be required. In most cases, these should be derived from the production environment. It may also be necessary to manufacture a data set that triggers all the business rules and transformations specified for the application.

Finally, the Test Manager must continually advocate adherence to the Test Plans. Projects at risk of delayed completion often sacrifice testing at the expense of a high-quality end result.
Reports to:
● Technical Project Manager (or Quality Assurance Manager)

Responsibilities:
● Coordinates all aspects of test planning and execution
● Carries out procedures to ensure that Data Integration systems and services meet organization standards and business requirements
● Develops and maintains test plans, test requirements documentation, test cases and test scripts
● Develops and maintains test data sets
● Verifies compliance to commitments contained in the test plans
● Works with the project management and development teams to resolve issues
● Communicates concerns, issues and problems with data
● Leads testing and post-production verification efforts
● Executes test scripts and documents and publishes the results
● Investigates and resolves test failures

Qualifications/Certifications
● Solid understanding of the Software Development Life Cycle
● Experience with automated testing tools
● Strong knowledge of Data Warehouse and Data Integration enterprise environments
● Experience in a quality assurance and testing environment
● Experience in developing and executing test cases and in setting up complex test environments
● Experience in classifying, tracking and verifying bug fixes
● Industry vertical knowledge
● Able to work effectively with both business and technical stakeholders
● Project management

Recommended Training
● PowerCenter Developer Level I
● Data Analyzer Introduction
● Data Explorer

Last updated: 01-Feb-07 18:51

User Acceptance Test Lead

The User Acceptance Test Lead is responsible for leading the final testing and gaining final approval from the business users. The User Acceptance Test Lead interacts with the End Users and the design team during the development effort to ensure the inclusion of all the user requirements within the original defined scope.
He/she then validates that the deployed solution meets the final user requirements.

Reports to:
● Business Project Manager

Responsibilities:
● Gathers and clarifies business requirements
● Interacts with the design team and end users during the development efforts to ensure inclusion of user requirements within the defined scope
● Reviews technical design proposals
● Schedules and leads the user acceptance test effort
● Provides test script/case training to the user acceptance test team
● Reports on test activities and results
● Validates that the deployed solution meets the final user requirements

Qualifications/Certifications
● Experience planning and executing user acceptance testing
● Strong understanding of the business's processes
● Knowledge of the project solution
● Excellent communication skills

Recommended Training
● N/A

Last updated: 12-Jun-07 16:06

Phase 1: Manage

1 Manage
● 1.1 Define Project
   - 1.1.2 Build Business Case
● 1.2 Plan and Manage Project
   - 1.2.1 Establish Project Roles
   - 1.2.2 Develop Project Estimate
   - 1.2.3 Develop Project Plan
   - 1.2.4 Manage Project

Phase 1: Manage

Description

Managing the development of a data integration solution requires extensive planning. A well-defined, comprehensive plan provides the foundation from which to build a project solution. The goal of this phase is to address the key elements required for a solid project foundation. These elements include:

● Scope - Clearly defined business objectives. The measurable, business-relevant outcomes expected from the project should be established early in the development effort. Then, an estimate of the expected Return on Investment (ROI) can be developed to gauge the level of investment and anticipated return.
The business objectives should also spell out a complete inventory of business processes to facilitate a collective understanding of these processes among project team members.
● Planning/Managing - The project plan should detail the project scope as well as its objectives, required work efforts, risks, and assumptions. A thorough, comprehensive scope can be used to develop a work breakdown structure (WBS) and establish project roles for summary task assignments. The plan should also spell out the change and control process that will be used for the project.
● Project Close/Wrap-Up - At the end of each project, the final step is to obtain project closure. Part of this closure is to ensure the completeness of the effort and obtain sign-off for the project. Additionally, a project evaluation will help in retaining lessons learned and assessing the success of the overall effort.

Prerequisites
None

Roles
Business Project Manager (Primary)
Data Integration Developer (Secondary)
Data Quality Developer (Secondary)
Data Transformation Developer (Secondary)
Presentation Layer Developer (Secondary)
Production Supervisor (Approve)
Project Sponsor (Primary)
Quality Assurance Manager (Approve)
Technical Architect (Primary)
Technical Project Manager (Primary)

Considerations
None

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 18:53

Phase 1: Manage
Task 1.1 Define Project

Description

This task entails constructing the business context for the project, defining in business terms the purpose and scope of the project as well as the value to the business (i.e., the business case).
Prerequisites
None

Roles
Business Analyst (Primary)
Business Project Manager (Primary)
Project Sponsor (Primary)

Considerations

There are no technical considerations during this task; in fact, any discussion of implementation specifics should be avoided at this time. The focus here is on defining the project deliverable in business terms with no regard for technical feasibility. Any discussion of technologies is likely to sidetrack the strategic thinking needed to develop the project objectives.

Best Practices
None

Sample Deliverables
Project Definition

Last updated: 01-Feb-07 18:43

Phase 1: Manage
Subtask 1.1.2 Build Business Case

Description

Building support and funding for a data integration solution nearly always requires convincing executive IT management of its value to the business. The best way to do this, if possible, is to build a business case that calculates the project's estimated return on investment (ROI). ROI modeling is valuable because it:

● Supplies a fundamental cost-justification framework for evaluating a data integration project.
● Mandates advance planning among all appropriate parties, including IT team members, business users, and executive management.
● Helps organizations clarify and agree on the benefits they expect, and in that process, helps them set realistic expectations for the data integration solution or the data quality initiative.

In addition to traditional ROI modeling on data integration initiatives, quantitative and qualitative ROI assessments should also include assessments of data quality. Poor data quality costs organizations vast sums in lost revenues. Defective data leads to breakdowns in the supply chain, poor business decisions, and inferior customer relationship management.
Moreover, poor quality data can lead to failures in compliance with industry regulations and even to outright project failure at the IT level.

It is vital to acknowledge data quality issues at an early stage in the project. Consider a data integration project that is planned and resourced meticulously but that is undertaken on a dataset where the data is of a poorer quality than anyone realized. This can lead to the classic “code-load-explode” scenario, wherein the data breaks down in the target system due to a poor understanding of the data and metadata. What is worse, a data integration project can succeed from an IT perspective but deliver little if any business value if the data within the system is faulty. For example, a CRM system containing a dataset with a large quantity of redundant or inaccurate records is likely to be of little value to the business.

Often an organization does not realize it has data quality issues until it is too late. For this reason, data quality should be a consideration in ROI modeling for all data integration projects, from the beginning.

For more details on how to quantify business value and associated data integration project cost, please see Assessing the Business Case.

Prerequisites
1.1.1 Establish Business Project Scope

Roles
Business Project Manager (Secondary)

Considerations

The Business Case must focus on business value and, as much as possible, quantify that value. The business beneficiaries are primarily responsible for assessing the project benefits, while technical considerations drive the cost assessments. These two assessments - benefits and costs - form the basis for determining overall ROI to the business.

Building the Business Case

Step 1 - Business Benefits

When creating your ROI model, it is best to start by looking at the expected business benefit of implementing the data integration solution. Common business imperatives include:

● Improving decision-making and ensuring regulatory compliance.
● Modernizing the business to reduce costs.
● Merging and acquiring other organizations.
● Increasing business profitability.
● Outsourcing non-core business functions to be able to focus on your company’s core value proposition.

Each of these business imperatives requires support via substantial IT initiatives. Common IT initiatives include:

● Business intelligence initiatives.
● Retirement of legacy systems.
● Application consolidation initiatives.
● Establishment of data hubs for customer, supplier, and/or product data.
● Business process outsourcing (BPO) and/or Software as a Service (SaaS).

For these IT initiatives to be successful, you must be able to integrate data from a variety of disparate systems. The form of those data integration projects may vary. You may have a:

● Data Warehousing project, which enables new business insight usually through business intelligence.
● Data Migration project, where data sources are moved to enable a new application or system.
● Data Consolidation project, where certain data sources or applications are retired in favor of another.
● Master Data Management project, where multiple data sources come together to form a more complex, master view of the data.
● Data Synchronization project, where data between two source systems need to stay perfectly consistent to enable different applications or systems.
● B2B Data Transformation project, where data from external partners is transformed to internal formats for processing by internal systems and responses are transformed back to partner appropriate formats.
● Data Quality project, where the goals are to cleanse data and to correct errors such as duplicates, missing information, mistyped information and other data deficiencies.
Once you have traced your data integration project back to its origins in the business imperatives, it is important to estimate the value derived from the data integration project. You can estimate the value by asking questions such as:

● What is the business goal of this project? Is this relevant?
● What are the business metrics or key performance indicators associated with this goal?
● How will the business measure the success of this initiative?
● How does data accessibility affect the business initiative? Does having access to all of your data improve the business initiative?
● How does data availability affect the business initiative? Does having data available when it’s needed improve the business initiative?
● How does data quality affect the business initiative? Does having good data quality improve the business initiative? Conversely, what is the potential negative impact of having poor data quality on the business initiative?
● How does data auditability affect the business? Does having an audit trail of your data improve the business initiative from a compliance perspective?
● How does data security affect the business? Does ensuring secure data improve the business initiative?

After asking the questions above, you’ll start to be able to equate business value, in a monetary number, with the data integration project. Remember to estimate the business value not only over the first year after implementation, but also over the course of time. Most business cases and associated ROI models factor in expected business value for at least three years.
If you are still struggling to estimate the business value of the data integration initiative, see the table below, which outlines common business value categories and how they relate to various data integration initiatives:

Table columns: Business Value Category / Explanation / Typical Metrics / Data Integration Examples

INCREASE REVENUE

New Customer Acquisition
Explanation: Lower the costs of acquiring new customers
Typical Metrics: cost per new customer acquisition; cost per lead; # new customers acquired/month per sales rep or per office/store
Data Integration Examples: Marketing analytics; Integration of third party data (from credit bureaus, directory services, salesforce.com, etc.)

Cross-Sell / Up-Sell
Explanation: Increase penetration and sales within existing customers
Typical Metrics: % cross-sell rate; # products/customer; % share of wallet; customer lifetime value
Data Integration Examples: Single view of customer across all products, channels; Marketing analytics & customer segmentation; Customer lifetime value analysis

Sales and Channel Management
Explanation: Increase sales productivity, and improve visibility into demand
Typical Metrics: sales per rep or per employee; close rate; revenue per transaction
Data Integration Examples: Sales/agent productivity dashboard; Sales & demand analytics; Customer master data integration; Demand chain synchronization

New Product / Service Delivery
Explanation: Accelerate new product/service introductions, and improve "hit rate" of new offerings
Typical Metrics: # new products launched/year; new product/service launch time; new product/service adoption rate
Data Integration Examples: Data sharing across design, development, production and marketing/sales teams; Data sharing with third parties, e.g. contract manufacturers, channels, marketing agencies

Pricing / Promotions
Explanation: Set pricing and promotions to stimulate demand while improving margins
Typical Metrics: margins; profitability per segment; cost-per-impression, cost-per-action
Data Integration Examples: Cross-geography/cross-channel pricing visibility; Differential pricing analysis and tracking; Promotions effectiveness analysis

LOWER COSTS

Supply Chain Management
Explanation: Lower procurement costs, increase supply chain visibility, and improve inventory management
Typical Metrics: purchasing discounts; inventory turns; quote-to-cash cycle time; demand forecast accuracy
Data Integration Examples: product master data integration; demand analysis; cross-supplier purchasing history

Production & Service Delivery
Explanation: Lower the costs to manufacture products and/or deliver services
Typical Metrics: production cycle times; cost per unit (product); cost per transaction (service); straight-through-processing rate
Data Integration Examples: cross-enterprise inventory rollup; scheduling and production synchronization

Logistics & Distribution
Explanation: Lower distribution costs and improve visibility into distribution chain
Typical Metrics: distribution costs per unit; average delivery times; delivery date reliability
Data Integration Examples: integration with third party logistics management and distribution partners

Invoicing, Collections and Fraud Prevention
Explanation: Improve invoicing and collections efficiency, and detect/prevent fraud
Typical Metrics: # invoicing errors; DSO (days sales outstanding); % uncollectible; % fraudulent transactions
Data Integration Examples: invoicing/collections reconciliation; fraud detection

Financial Management
Explanation: Streamline financial management and reporting
Typical Metrics: End-of-quarter days to close; Financial reporting efficiency; Asset utilization rates
Data Integration Examples: Financial data warehouse/reporting; Financial reconciliation; Asset management/tracking

MANAGE RISK

Compliance Risk (e.g. SEC/SOX/Basel II/PCI)
Explanation: Prevent compliance outages to avoid investigations, penalties, and negative impact on brand
Typical Metrics: # negative audit/inspection findings; probability of compliance lapse; cost of compliance lapses (fines, recovery costs, lost business); audit/oversight costs
Data Integration Examples: Financial reporting; Compliance monitoring & reporting

Financial/Asset Risk Management
Explanation: Improve risk management of key assets, including financial, commodity, energy or capital assets
Typical Metrics: errors & omissions; probability of loss; expected loss; safeguard and control costs
Data Integration Examples: Risk management data warehouse; Reference data integration; Scenario analysis; Corporate performance management

Business Continuity/Disaster Recovery Risk
Explanation: Reduce downtime and lost business, prevent loss of key data, and lower recovery costs
Typical Metrics: mean time between failure (MTBF); mean time to recover (MTTR); recovery time objective (RTO); recovery point objective (RPO -- data loss)
Data Integration Examples: Resiliency and automatic failover/recovery for all data integration processes

Step 2 - Calculating the Costs

Now that you have estimated the monetary business value from the data integration project in Step 1, you will need to calculate the costs associated with that project in Step 2. In most cases, the data integration project is inevitable (one way or another, the business initiative is going to be accomplished), so it is best to compare two alternative cost scenarios. One scenario would be implementing the data integration project with tools from Informatica, while the other scenario would be implementing it without Informatica’s toolset.
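The two-scenario comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `Scenario` class, the cost breakdown into development, operations and maintenance, the three-year horizon, and every figure are assumptions made for the example, and ROI is taken here in its common form of net benefit divided by cost. None of the numbers come from the methodology or from any benchmark.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Cost profile for one way of delivering the project (hypothetical)."""
    name: str
    development: float  # one-time build cost
    operations: float   # recurring yearly run cost
    maintenance: float  # recurring yearly maintenance cost

    def total_cost(self, years: int = 3) -> float:
        # One-time development plus recurring costs over the horizon.
        return self.development + years * (self.operations + self.maintenance)


def roi(total_benefit: float, total_cost: float) -> float:
    """Simple ROI: net gain divided by cost."""
    return (total_benefit - total_cost) / total_cost


# All figures below are invented for illustration only.
yearly_benefit = [400_000, 600_000, 700_000]
hand_coded = Scenario("hand coded", development=500_000,
                      operations=100_000, maintenance=100_000)
with_tool = Scenario("with tool", development=400_000,
                     operations=70_000, maintenance=70_000)

total_benefit = sum(yearly_benefit)
for scenario in (hand_coded, with_tool):
    cost = scenario.total_cost()
    print(f"{scenario.name}: cost={cost:,.0f} "
          f"ROI={roi(total_benefit, cost):.0%}")
```

Because the business benefit is the same in both scenarios, the comparison reduces to which scenario delivers it at lower total cost; keeping the benefit and cost inputs separate mirrors the split between Step 1 and Step 2.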
Some examples of benchmarks to support the case for Informatica lowering the total cost of ownership (TCO) on data integration and data quality projects are outlined below:

Benchmarks from Industry Analysts, Consultants, and Authors

Forrester Research, "The Total Economic Impact of Deploying Informatica PowerCenter", 2004
The average savings of using a data integration/ETL tool vs. hand coding:
• 31% in development costs
• 32% in operations costs
• 32% in maintenance costs
• 35% in overall project life-cycle costs

Gartner, "Integration Competency Center: Where Are Companies Today?", 2005
• The top-performing third of Integration Competency Centers (ICCs) will save an average of:
   • 30% in data interface development time and costs
   • 20% in maintenance costs
• The top-performing third of ICCs will achieve 25% reuse of integration components

Larry English, Improving Data Warehouse and Business Information Quality, Wiley Computer Publishing, 1999.
• "The business costs of non-quality data, including irrecoverable costs, rework of products and services, workarounds, and lost and missed revenue may be as high as 10 to 25 percent of revenue or total budget of an organization."
• "Invalid data values in the typical customer database averages around 15 to 20 percent… Actual data errors, even though the values may be valid, may be 25 to 30 percent or more in those same databases."
• "Large organizations often have data redundantly stored 10 times or more."
Ponemon Institute -- Study of costs incurred by 14 companies that had security breaches affecting between 1,500 and 900,000 consumer records
• Total costs to recover from a breach averaged $14 million per company, or $140 per lost customer record
• Direct costs for incremental, out-of-pocket, unbudgeted spending averaged $5 million per company, or $50 per lost customer record, for outside legal counsel, mail notification letters, calls to individual customers, increased call center costs and discounted product offers
• Indirect costs for lost employee productivity averaged $1.5 million per company, or $15 per customer record
• Opportunity costs covering loss of existing customers and increased difficulty in recruiting new customers averaged $7.5 million per company, or $75 per lost customer record
• Overall customer loss averaged 2.6 percent of all customers and ranged as high as 11 percent

In addition to lowering the cost of implementing a data integration solution, Informatica adds value to the ROI model by mitigating risk in the data integration project. In order to quantify the value of risk mitigation, you should consider the cost of project overrun and the associated likelihood of overrun when using Informatica vs. when you don’t use Informatica for your data integration project. An example analysis of risk mitigation value is below:

Step 3 - Putting it all Together

Once you have calculated the three-year business/IT benefits and the three-year costs of using PowerCenter vs. not using PowerCenter, put all of this information into a format that is easy to read for IT and line-of-business executive management. The following is a sample summary of an ROI model:

For data migration projects it is frequently necessary to prove that using Informatica technology for the data migration efforts has benefits over traditional means.
To prove the value, three areas should be considered:

1. Informatica software can reduce the overall project timeline by accelerating migration development efforts.
2. Informatica-delivered migrations will have lower risk due to ease of maintenance, less development effort, higher quality of data, and the additional project management tools that come with the metadata-driven solution.
3. Lineage reports are available that show how the data was manipulated by the data migration process and by whom.

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 19:09

Phase 1: Manage
Task 1.2 Plan and Manage Project

Description

This task incorporates the initial project planning and management activities as well as project management activities that occur throughout the project lifecycle. It includes the initial structure of the project team and the project work steps based on the business objectives and the project scope, and the continuing management of expectations through status reporting, issue tracking and change management.

Prerequisites
None

Roles
Business Project Manager (Primary)
Data Integration Developer (Secondary)
Data Quality Developer (Secondary)
Presentation Layer Developer (Secondary)
Project Sponsor (Approve)
Technical Architect (Primary)
Technical Project Manager (Primary)

Considerations

In general, project management activities involve reconciling trade-offs between business requests as to functionality and timing with technical feasibility and budget considerations. This often means balancing between sensitivity to project goals and concerns ("being a good listener") on the one hand, and maintaining a firm grasp of what is feasible ("telling the truth") on the other.
The tools of the trade, apart from strong people skills (especially, interpersonal communication skills), are detailed documentation and frequent review of the status of the project effort against plan, of the unresolved issues, and of the risks regarding enlargement of scope ("change management"). Successful project management is predicated on regular communication of these project aspects with the project manager, and with other management and project personnel.

For data migration projects there is often a project management office (PMO) in place. The PMO is typically found in high-dollar, high-profile projects such as implementing a new ERP system that will often cost in the millions of dollars. It is important to identify the roles and gain the understanding of the PMO as to how these roles are needed and will intersect with the broader system implementation. More specifically, these roles will have responsibility beyond the data migration, so the resource requirements for the Data Migration must be understood and guaranteed as part of the larger effort overseen by the PMO.

For B2B projects, technical considerations typically play an important role. The format of data received from partners (and replies sent to partners) forms a key consideration in overall business operations and has a direct impact on the planning and scoping of changes. Informatica recommends having the Technical Architect directly involved throughout the process.

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 19:13

Phase 1: Manage
Subtask 1.2.1 Establish Project Roles

Description

This subtask involves defining the roles/skill sets that will be required to complete the project. This is a precursor to building the project team and making resource assignments to specific tasks.
Prerequisites
None

Roles
Business Project Manager (Primary)
Project Sponsor (Approve)
Technical Project Manager (Primary)

Considerations

The Business Project Scope established in 1.1.1 Establish Business Project Scope provides a primary indication of the required roles and skill sets. The following types of questions are useful discussion topics and help to validate the initial indicators:

● What are the main tasks/activities of the project and what skills/roles are needed to accomplish them?
● How complex or broad in scope are these tasks? This can indicate the level of skills needed.
● What responsibilities will fall to the company resources and which are offloaded to a consultant? Who (i.e. company resource or consultant) will provide the project management? Who will have primary responsibility for infrastructure requirements? ...for data architecture? ...for documentation? ...for testing? ...for deployment/training/support?
● How much development and testing will be involved?

This is a definitional activity and very distinct from the later assignment of resources. These roles should be defined as generally as possible rather than attempting to match a requirement with a resource at hand.

After the project scope and required roles have been defined, there is often pressure to combine roles due to limited funding or availability of resources. There are some roles that inherently provide a healthy balance with one another, and if one person fills both of these roles, project quality may suffer. The classic conflict is between development roles and highly procedural or operational roles. For example, a QA Manager or Test Manager or Lead should not be the same person as a Project Manager or one of the development team. The QA Manager is responsible for determining the criteria for acceptance of project quality and managing quality-related procedures.
These responsibilities directly conflict with the developer's need to meet a tight development schedule. For similar reasons, development personnel are not ideal choices for filling such operational roles as Metadata Manager, DBA, Network Administrator, Repository Administrator, or Production Supervisor. Those roles require operational diligence and adherence to procedure as opposed to ad hoc development. When development roles are mixed with operational roles, resulting "shortcuts" often lead to quality problems in production systems.

Tip
Involve the Project Sponsor. Before defining any roles, be sure that the Project Sponsor is in agreement as to the project scope and major activities, as well as the level of involvement expected from company personnel and consultant personnel. If this agreement has not been explicitly accomplished, review the project scope with the Project Sponsor to resolve any remaining questions.

In defining the necessary roles, be sure to provide the Sponsor with a full description of all roles, indicating which will rely on company personnel and which will use consultant personnel. This sets clear expectations for company involvement and indicates if there is a need to fill additional roles with consultant personnel if the company does not have personnel available in accordance with the project timing.

The Role Descriptions in Roles provide typical role definitions. The Project Role Matrix can serve as a starting point for completing the project-specific roles matrix.

Best Practices
None

Sample Deliverables
Project Definition
Project Role Matrix
Work Breakdown Structure

Last updated: 01-Feb-07 18:43

Phase 1: Manage
Subtask 1.2.2 Develop Project Estimate

Description
Once the overall project scope and roles have been defined, details on project execution must be developed.
These details should answer the questions of what must be done, who will do it, how long it will take, and how much it will cost.

The objective of this subtask is to develop a complete WBS and, subsequently, a solid project estimate. Two important documents required for project execution are the:

● Work Breakdown Structure (WBS), which can be viewed as a list of tasks that must be completed to achieve the desired project results. (See Developing a Work Breakdown Structure (WBS) for more details.)
● Project Estimate, which, at this time, focuses solely on development costs without consideration for hardware and software liabilities.

Estimating a project is never an easy task, and often becomes more difficult as project visibility increases and there is an increasing demand for an "exact estimate". It is important to understand that estimates are never exact. However, estimates are useful for providing a close approximation of the level of effort required by the project. Factors such as project complexity, team skills, and external dependencies always have an impact on the actual effort required.

The accuracy of an estimate largely depends on the experience of the estimator (or estimators). For example, an experienced traveller who frequently travels the route between his/her home or office and the airport can easily provide an accurate estimate of the time required for the trip. When the same traveller is asked to estimate travel time to or from an unfamiliar airport, however, the estimation process becomes much more complex, requiring consideration of numerous factors such as distance to the airport, means of transportation, speed of available transportation, time of day that the travel will occur, expected weather conditions, and so on. The traveller can arrive at a valid overall estimate by assigning time estimates to each factor, then summing the whole.
The resulting estimate, however, is not likely to be nearly as accurate as the one based on knowledge gained through experience. The same holds true for estimating the time and resources required to complete development on a data integration solution project.

Prerequisites
None

Roles
Business Project Manager (Primary)
Data Integration Developer (Secondary)
Data Quality Developer (Secondary)
Data Transformation Developer (Secondary)
Presentation Layer Developer (Secondary)
Project Sponsor (Approve)
Technical Architect (Secondary)
Technical Project Manager (Secondary)

Considerations
An accurate estimate depends greatly on a complete and accurate Work Breakdown Structure. Having the entire project team review the WBS when it is near completion helps to ensure that it includes all necessary project tasks. Project deadlines often slip because some tasks are overlooked and, therefore, not included in the initial estimates.

Sample Data Requirements for B2B Projects
For B2B projects (and non-B2B projects that have significant unstructured or semi-structured data transformation requirements), the actual creation and subsequent QA of transformations relies on having sufficient samples of input and output data, and specifications for data formats.

When estimating for projects that use Informatica's B2B Data Transformation, estimates should include sufficient time to allow for the collection and assembly of sample data, any cleansing of sample data required (for example, to conform to HIPAA or financial privacy regulations), and for any data analysis or metadata discovery to be performed on the sample data.
By their nature, the full authoring of B2B data transformations cannot be completed (or in some cases proceed) without the availability of adequate sample data both for input to transformations and for comparison purposes during the quality assurance process.

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 19:17

Phase 1: Manage
Subtask 1.2.3 Develop Project Plan

Description
In this subtask, the Project Manager develops a schedule for the project using the agreed-upon business project scope to determine the major tasks that need to be accomplished and estimates of the amount of effort and resources required.

Prerequisites
None

Roles
Business Project Manager (Primary)
Project Sponsor (Approve)
Technical Project Manager (Secondary)

Considerations
The initial project plan is based on agreements-to-date with the Project Sponsor regarding project scope, estimation of effort, roles, project timelines and any understanding of requirements. Updates to the plan (as described in Developing and Maintaining the Project Plan) are typically based on changes to scope, approach, priorities, or simply on more precise determinations of effort and of start and/or completion dates as the project unfolds. In some cases, later phases of the project, like System Test (or "alpha"), Beta Test and Deployment, are represented in the initial plan as a single set of activities, and will be more fully defined as the project progresses. Major activities (e.g., System Test, Deployment, etc.) typically involve their own full-fledged planning processes once the technical design is completed. At that time, additional activities may be added to the project plan to allow for more detailed tracking of those project activities.
Perhaps the most significant message here is that an up-to-date plan is critical for satisfactory management of the project and for timely completion of its tasks. Keeping the plan updated as events occur, and as client understanding, needs and expectations change, requires an ongoing effort. The sooner the plan is updated and changes are communicated to the Project Sponsor and/or company management, the less likely it is that expectations will be frustrated to a problematic level.

Best Practices
Data Migration Velocity Approach

Sample Deliverables
Project Roadmap
Work Breakdown Structure

Last updated: 01-Feb-07 18:43

Phase 1: Manage
Subtask 1.2.4 Manage Project

Description
In the broadest sense, project management begins before the project starts and continues until its completion and perhaps beyond. The management effort includes:

● Managing the project beneficiary relationship(s), expectations and involvement
● Managing the project team, its make-up, involvement, priorities, activities and schedule
● Managing all project issues as they arise, whether technical, logistical, procedural, or personal

In a more specific sense, project management involves being constantly aware of, or preparing for, anything that needs to be accomplished or dealt with to further the project objectives, and making sure that someone accepts responsibility for such occurrences and delivers in a timely fashion.
Project management begins with pre-engagement preparation and includes:

● Project Kick-off, including the initial project scope, project organization, and project plan
● Project Status and reviews of the plan and scope
● Project Content Reviews, including business requirements reviews and technical reviews
● Change Management as scope changes are proposed, including changes to staffing or priorities
● Issues Management
● Project Acceptance and Close

Prerequisites
None

Roles
Business Project Manager (Primary)
Project Sponsor (Review Only)
Technical Project Manager (Primary)

Considerations
In all management activities and actions, the Project Manager must balance the needs and expectations of the Project Sponsor and project beneficiaries with the needs, limitations and morale of the project team. Limitations and specific needs of the team must be communicated clearly and early to the Project Sponsor and/or company management to mitigate unwarranted expectations and avoid an escalation of expectation-frustration that can have a dire effect on the project outcome. Issues that affect the ability to deliver in any sense, and potential changes to scope, must be brought to the Project Sponsor's attention as soon as possible and managed to satisfactory resolution.

In addition to "expectation management", project management includes Quality Assurance for the project deliverables. This involves soliciting specific requirements, with subsequent review of deliverables that include, in addition to the data integration solution documentation, user interfaces, knowledge-transfer and testing procedures.
Best Practices
None

Sample Deliverables
Issues Tracking
Project Review Meeting Agenda
Project Status Report
Scope Change Assessment

Last updated: 01-Feb-07 18:43

Phase 2: Analyze

2 Analyze
● 2.1 Define Business Drivers, Objectives and Goals
● 2.2 Define Business Requirements
    ○ 2.2.1 Define Business Rules and Definitions
    ○ 2.2.2 Establish Data Stewardship
● 2.3 Define Business Scope
    ○ 2.3.1 Identify Source Data Systems
    ○ 2.3.2 Determine Sourcing Feasibility
    ○ 2.3.3 Determine Target Requirements
● 2.6 Determine Technical Readiness
● 2.7 Determine Regulatory Requirements
● 2.8 Perform Data Quality Audit
    ○ 2.8.1 Perform Data Quality Analysis of Source Data
    ○ 2.8.2 Report Analysis Results to the Business

Phase 2: Analyze

Description
Increasingly, organizations demand faster, better, and cheaper delivery of data integration and business intelligence solutions. Many development failures and project cancellations can be traced to an absence of adequate upfront planning and scope definition. Inadequately defined or prioritized objectives and project requirements foster scenarios where project scope becomes a moving target as requirements may change late in the game, requiring repeated rework of design or even development tasks.

The purpose of the Analyze Phase is to build a solid foundation for project scope through a deliberate determination of the business drivers, requirements, and priorities that will form the basis of the project design and development.
Once the business case for a data integration or business intelligence solution is accepted and key stakeholders are identified, the process of detailing and prioritizing objectives and requirements can begin, with the ultimate goal of defining project scope and, if appropriate, a roadmap for major project stages.

Prerequisites
None

Roles
Application Specialist (Primary)
Business Analyst (Primary)
Business Project Manager (Primary)
Data Architect (Primary)
Data Integration Developer (Primary)
Data Quality Developer (Primary)
Data Steward/Data Quality Steward (Primary)
Database Administrator (DBA) (Primary)
Legal Expert (Primary)
Metadata Manager (Primary)
Project Sponsor (Secondary)
Security Manager (Primary)
System Administrator (Primary)
Technical Architect (Primary)
Technical Project Manager (Primary)

Considerations
Functional and technical requirements must focus on the business goals and objectives of the stakeholders, and must be based on commonly agreed-upon definitions of business information. The initial business requirements are then compared to feasibility studies of the source systems to help the prioritization process that will result in a project roadmap and rough timeline. This sets the stage for incremental delivery of the requirements so that some important needs are met as soon as possible, thereby providing value to the business even though there may be a much longer timeline to complete the entire project.

In addition, during this phase it can be valuable to identify the available technical metadata as a way to accelerate the design and improve its quality. A successful Analyze Phase can serve as a foundation for a successful project.
Best Practices
None

Sample Deliverables
None

Last updated: 01-Feb-07 18:43

Phase 2: Analyze
Task 2.1 Define Business Drivers, Objectives and Goals

Description
In many ways, the potential for success of any data integration/business intelligence solution correlates directly to the clarity and focus of its business scope. If the business objectives are vague, there is a much higher risk of failure or, at least, of a less-than-direct path to likely limited success.

Business Drivers
The business drivers explain why the solution is needed and is being recommended at a particular time by identifying the specific business problems, issues, or increased business value that the project is likely to resolve or deliver. Business drivers may include background information necessary to understand the problems and/or needs. There should be clear links between the project's business drivers and the company's underlying business strategies.

Business Objectives
Objectives are concrete statements describing what the project is trying to achieve. Objectives should be explicitly defined so that they can be evaluated at the conclusion of a project to determine if they were achieved. Objectives written for a goal statement are nothing more than a deconstruction of the goal statement into a set of necessary and sufficient objective statements. That is, every objective must be accomplished to reach the goal, and no objective is superfluous.

Objectives are important because they establish a consensus between the project sponsor and the project beneficiaries regarding the project outcome. The specific deliverables of an IT project, for instance, may or may not make sense to the project sponsor. However, the business objectives should be written so they are understandable by all of the project stakeholders.
Business Goals
Goal statements provide the overall context for what the project is trying to accomplish. They should align with the company's stated business goals and strategies. Project context is established in a goal statement by stating the project's object of study, its purpose, its quality focus, and its viewpoint. A well-defined goal should reference the project's business benefits in terms of cost, time, and/or quality.

Because goals are high-level statements, it may take more than one project to achieve a stated goal. If the goal's achievement can be measured, it is probably defined at too low a level and may actually be an objective. If the goal is not achievable through any combination of projects, it is probably too abstract and may be a vision statement.

Every project should have at least one goal. It is the agreement between the company and the project sponsor about what is going to be accomplished by the project. The goal provides focus and serves as the compass for determining if the project outcomes are appropriate. In the project management life cycle, the goal is bound by a number of objective statements. These objective statements clarify the fuzzy boundary of the goal statement. Taken as a pair, the goal and objectives statements define the project. They are the foundation for project planning and scope definition.

Prerequisites
None

Roles
Business Project Manager (Review Only)
Project Sponsor (Review Only)

Considerations

Business Drivers
The business drivers must be defined using business language. Identify how the project is going to resolve or address specific business problems. Key components when identifying business drivers include:

● Describe facts, figures, and other pertinent background information to support the existence of a problem.
● Explain how the project resolves or helps to resolve the problem in terms familiar to the business.
● Show any links to business goals, strategies, and principles.

Large projects often have significant business and technical requirements that drive the project's development. Consider explaining the origins of the significant requirements as a way of explaining why the project is needed.

Business Objectives
Before the project starts, define and agree on the project objectives and the business goals they define. The deliverables of the project are created based on the objectives, not the other way around.

A meeting between all major stakeholders is the best way to create the objectives and gain a consensus on them at the same time. This type of meeting encourages discussion among participants and minimizes the amount of time involved in defining business objectives and goals. It may not be possible to gather all the project beneficiaries and the project sponsor together at the same time, so multiple meetings may have to be arranged with the results summarized.

While goal statements are designed to be vague, a well-worded objective is Specific, Measurable, Attainable/Achievable, Realistic and Time-bound (SMART).

● Specific: An objective should address a specific target or accomplishment.
● Measurable: Establish a metric that indicates that an objective has been met.
● Attainable: If an objective cannot be achieved, then it's probably a goal.
● Realistic: Limit objectives to what can realistically be done with available resources.
● Time-bound: Achieve objectives within a specified time frame.

At a minimum, make sure each objective contains four parts, as follows:

● An outcome - describe what the project will accomplish.
● A time frame - the expected completion date of the project.
● A measure - metric(s) that will measure success of the project.
● An action - how to meet the objective.
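
The four-part minimum above can be captured as a simple record so that incomplete objectives are easy to spot during review. The following is only an illustrative sketch in Python: the `Objective` class, the `missing_parts` helper, and the sample values are hypothetical and are not part of the Velocity methodology or any Informatica product.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List


@dataclass
class Objective:
    """Hypothetical record holding the four minimum parts of an objective."""
    outcome: str      # what the project will accomplish
    time_frame: date  # expected completion date
    measure: str      # metric(s) that will measure success
    action: str       # how the objective will be met


def missing_parts(objective: Objective) -> List[str]:
    """Return the names of any parts that are empty or unset."""
    return [f.name for f in fields(objective) if not getattr(objective, f.name)]


# Sample values are invented for illustration only.
example = Objective(
    outcome="Reduce duplicate customer records in the CRM",
    time_frame=date(2009, 12, 31),
    measure="Duplicate rate below 2% of active records",
    action="",  # deliberately left blank to show the check
)
print(missing_parts(example))  # ['action']
```

A check of this kind only confirms that each part is present; whether an objective is realistic or attainable still requires stakeholder judgment.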
The business objectives should take into account the results of any data quality investigations carried out before or during the project. If the project source data quality is low, then the project's ability to achieve its objectives may be compromised. If the project has specific data-related objectives, such as regulatory compliance objectives, then a high degree of data quality may be an objective in its own right. For this reason, data quality investigations (such as a Data Quality Audit) should be carried out as early as is feasible in the project life-cycle. See 2.8 Perform Data Quality Audit.

Generally speaking, the number of objectives comes down to how much business investment is going to be made in pursuit of the project's goals. High investment projects generally have many objectives. Low investment projects must be more modest in the objectives they pursue. There is considerable discretion in how granular a project manager may get in defining objectives. High-level objectives generally need a more detailed explanation and often lead to more definition in the project's deliverables to obtain the objective. Lower level, detailed objectives tend to require less descriptive narrative and deconstruct into fewer deliverables to obtain. Regardless of the number of objectives identified, the priority should be established by ranking the objectives with their respective impacts, costs, and risks.

Business Goals
The goal statement must also be written in business language so that anyone who reads it can understand it without further explanation. The goal statement should:

● Be short and to the point.
● Provide overall context for what the project is trying to accomplish.
● Be aligned to business goals in terms of cost, time and quality.

Smaller projects generally have a single goal. Larger projects may have more than one goal, which should also be prioritized.
Since the goal statement is meant to be succinct, regardless of the number of goals a project has, the goal statement should always be brief and to the point.

Best Practices
None

Sample Deliverables
None

Last updated: 18-May-08 17:36

Phase 2: Analyze
Task 2.2 Define Business Requirements

Description
A data integration/business intelligence solution development project typically originates from a company's need to provide management and/or customers with business analytics or to provide business application integration. As with any technical engagement, the first task is to determine clear and focused business requirements to drive the technology implementation. This requires determining what information is critical to support the project objectives and its relation to important strategic and operational business processes. Project success will be based on clearly identifying and accurately resolving these informational needs with the proper timing.

The goal of this task is to ensure the participation and consensus of the project sponsor and key beneficiaries during the discovery and prioritization of these information requirements.

Prerequisites
None

Roles
Business Project Manager (Primary)
Data Quality Developer (Secondary)
Data Steward/Data Quality Steward (Primary)
Legal Expert (Approve)
Metadata Manager (Primary)
Project Sponsor (Approve)

Considerations
In a data warehouse/business intelligence project, there can be strategic or tactical requirements.

Strategic Requirements
● The customer's management is typically interested in strategic questions that often include a significant timeframe. For example, "How has the turnover of product 'x' increased over the last year?" or "What is the revenue of area 'a' in January of this year as compared to last year?"
Answers to strategic questions provide company executives with the information required to build on the company strengths and/or to eliminate weaknesses. Strategic requirements are typically implemented through a data warehouse type project with appropriate visualization tools.

Tactical Requirements
● The tactical requirements serve the "day to day" business. Operational level employees want solutions to enable them to manage their ongoing work and solve immediate problems. For instance, a distributor running a fleet of trucks has an unavailable driver on a particular day. They would want to answer questions such as, "How can the delivery schedule be altered in order to meet the delivery time of the highest priority customer?" Answers to these questions are valid and pertinent for only a short period of time in comparison to the strategic requirements. Tactical requirements are often implemented via operational data integration.

Best Practices
None

Sample Deliverables
None

Last updated: 02-May-08 12:05

Phase 2: Analyze
Subtask 2.2.1 Define Business Rules and Definitions

Description
A business rule is a compact and simple statement that represents some important aspect of a business process or policy. By capturing the rules of the business (the logic that governs its operation), systems can be created that are fully aligned with the needs of the organization.

Business rules stem from the knowledge of business personnel and constrain some aspect of the business. From a technical perspective, a business rule expresses specific constraints on the creation, updating, and removal of persistent data in an information system. For example, a new bank account cannot be created unless the customer has provided an adequate proof of identification and address.
Prerequisites
None

Roles
Data Quality Developer (Secondary)
Data Steward/Data Quality Steward (Primary)
Legal Expert (Approve)
Metadata Manager (Primary)
Security Manager (Approve)

Considerations
Formulating business rules is an iterative process, often stemming from statements of policy in an organization. Rules are expressed in natural language. The following guidelines follow best practices and provide practical instructions on how to formulate business rules:

● Start with a well-defined and agreed-upon set of unambiguous definitions captured in a definitions repository. Re-use existing definitions if available.
● Use meaningful and precise verbs to connect the definitions captured above.
● Use standard expressions to constrain business rules, such as must, must not, only if, no more than, etc. For example, the total commission paid to broker ABC can be no more than xy% of the total revenue received for the sale of widgets.
● Use standard expressions for derivation business rules like "x is calculated from", "summed from", etc. For example, "the departmental commission paid is calculated as the total commission multiplied by the departmental rollup rate."

The aim is to define atomic business rules, that is, rules that cannot be decomposed further. Each atomic business rule is a specific, formal statement of a single term, fact, derivation, or constraint on the business. The components of business rules, once formulated, provide direct inputs to a subsequent conceptual data modeling and analysis phase. In this approach, definitions and connections can eventually be mapped onto a data model, and constraints and derivations can be mapped onto a set of rules that are enforced in the data model.
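
To illustrate how a constraint rule and a derivation rule might later be mapped onto enforceable logic, the two commission examples above can be sketched in Python. This is a hypothetical illustration only: the function names and the figures (a 15% limit, a 0.25 rollup rate) are invented, since the source leaves the percentage as "xy%", and nothing here represents an Informatica API.

```python
from decimal import Decimal


def commission_within_limit(total_commission: Decimal,
                            total_revenue: Decimal,
                            max_percent: Decimal) -> bool:
    """Constraint rule: commission can be no more than max_percent of revenue."""
    return total_commission <= total_revenue * max_percent / Decimal(100)


def departmental_commission(total_commission: Decimal,
                            rollup_rate: Decimal) -> Decimal:
    """Derivation rule: total commission multiplied by the departmental rollup rate."""
    return total_commission * rollup_rate


# Hypothetical figures chosen only to exercise the two rules.
commission = Decimal("1200")
revenue = Decimal("10000")

print(commission_within_limit(commission, revenue, Decimal("15")))  # True
print(departmental_commission(commission, Decimal("0.25")))         # 300.00
```

Each function corresponds to a single atomic rule, which keeps the mapping from the natural-language statement to its enforcement traceable.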
Best Practices
None

Sample Deliverables
Business Requirements Specification

Last updated: 01-Feb-07 18:43

Phase 2: Analyze
Subtask 2.2.2 Establish Data Stewardship

Description
Data stewardship is about keeping the business community involved and focused on the goals of the project being undertaken. This subtask outlines the roles and responsibilities that key personnel can assume within the framework of an overall stewardship program. This participation should be regarded as ongoing because stewardship activities need to be performed at all stages of a project lifecycle and continue through the operational phase.

Prerequisites
None

Roles
Business Analyst (Secondary)
Business Project Manager (Primary)
Data Steward/Data Quality Steward (Secondary)
Project Sponsor (Approve)

Considerations
A useful mix of personnel to staff a stewardship committee may include:

● An executive sponsor
● A business steward
● A technical steward
● A data steward

Executive Sponsor
● Chair of the data stewardship committee
● Ultimate point of arbitration
● Liaison to management for setting and reporting objectives
● Should be recruited from project sponsors or management

Technical Steward
● Member of the data stewardship committee
● Liaison with technical community
● Reference point for technical-related issues and arbitration
● Should be recruited from the technical community with a good knowledge of the business and operational processes

Business Steward
● Member of the data stewardship committee
● Liaison with business users
● Reference point for business-related issues and arbitration
● Should be recruited from the business community

Data Steward
● Member of the data stewardship committee
● Balances data and quality targets set by the business with IT/project parameters
● Responsible for all issues relating to the data, including defining and maintaining business and technical rules and liaising with the business and technical communities
● Reference point for arbitration where data is put to different uses by separate groups of users whose requirements have to be reconciled

The mix of personnel for a particular activity should be adequate to provide expertise in each of the major business areas that will be undertaken in the project.

The success of the stewardship function relies on the early establishment and distribution of standardized documentation and procedures. These should be distributed to all of the team members working on stewardship activities.

The data stewardship committee should be involved in the following activities:

● Arbitration
● Sanity checking
● Preparation of metadata
● Support

Arbitration
Arbitration means resolving data contention issues, deciding which is the best data to use, and determining how this data should best be transformed and interpreted so that it remains meaningful and consistent. This is particularly important during the phases where ambiguity needs to be resolved, for example, when conformed dimensions and standardized facts are being formulated by the analysis teams.

Sanity Checking
There is a role for the data stewardship committee to check the results and ensure that the transformation rules and processes have been applied correctly. This is a key verification task and is particularly important in evaluating prototypes developed in the Analyze Phase, during testing, and after the project goes live.

Preparation of Metadata
The data stewardship committee should be actively involved in the preparation and verification of technical and business metadata.
Specific tasks are:

● Determining the structure and contents of the metadata
● Determining how the metadata is to be collected
● Determining where the metadata is to reside
● Determining who is likely to use the metadata
● Determining what business benefits are provided
● Determining how the metadata is to be acquired

Depending on the tools used to determine the metadata (for example, PowerCenter Profiling option, Informatica Data Explorer), the Data Steward may take a lead role in this activity.

● Business metadata - The purpose of maintaining this type of information is to clarify context, aid understanding, and provide business users with the ability to perform high-level searches for information. Business metadata is used to answer questions such as: "How does this division of the enterprise calculate revenue?"
● Technical metadata - The purpose of maintaining this type of information is for impact analysis, auditing, and source-target analysis. Technical metadata is used to perform analysis such as: "What would be the impact of changing the length of a field from 20 to 30 characters and what systems would be affected?"

Support
The data stewardship committee should be involved in the inception and preparation of training of the user community by answering questions about data and the tools available to perform analytics. During the Analyze Phase the team would provide inputs to induction training programs prepared for system users when the project goes live. Such programs should include, for example, technical information about how to query the system and semantic information about the data that is retrieved.

New Functionality
The data stewardship committee needs to assess any major additions to functionality. The assessment should consider return on investment, priority, and scalability in terms of new hardware/software requirements.
There may be a need to perform this activity during the Analyze Phase if functionality that was initially overlooked is to be included in the scope of the project. After the project has gone live, this activity is of key importance because new functionality needs to be assessed for ongoing development.

Best Practices
None

Sample Deliverables
None

Last updated: 15-Feb-07 17:55

Phase 2: Analyze
Task 2.3 Define Business Scope

Description
The business scope forms the boundary that defines where the project begins and ends. Throughout the project discussions about the business requirements and objectives, it may appear that everyone views the project scope in the same way. However, there is commonly confusion about what falls inside the boundary of a specific project and what does not. Developing a detailed project scope and socializing it with your project team, sponsors, and key stakeholders is critical.

Prerequisites
None

Roles
Informatica Velocity v6 (Primary)
Data Architect (Primary)
Data Integration Developer (Primary)
Data Quality Developer (Primary)
Data Steward/Data Quality Steward (Secondary)
Metadata Manager (Primary)
Project Sponsor (Secondary)
Technical Architect (Primary)
Technical Project Manager (Primary)

Considerations
The primary consideration in developing the business scope is balancing the high-priority needs of the key beneficiaries with the need to provide results within the near term. The Project Manager and Business Analysts need to identify the key business needs and assess the feasibility of meeting them in order to establish a scope that provides value, typically within a 60- to 120-day time frame.
Quick WINS are accomplishments achieved in a relatively short time, without great expense, and with a positive outcome; they can be included in the business scope. WINS stands for Ways to Implement New Solutions.

Tip
As a general rule, involve as many project beneficiaries as possible in the needs assessment and goal definition. A "forum" type of meeting may be the most efficient way to gather the necessary information since it minimizes the amount of time involved in individual interviews and often encourages useful dialog among the participants. However, it is often difficult to gather all of the project beneficiaries and the project sponsor together for any single meeting, so you may have to arrange multiple meetings and summarize the input for the various participants.

A common mistake made by project teams is to define the project scope only in general terms. This lack of definition causes managers and key beneficiaries throughout the company to make assumptions about whether their own processes or systems fall inside or outside the scope of the project. Later, after the project team has completed significant work, some managers are surprised to learn that their assumptions were not correct, resulting in problems for the project team. Other project teams report problems with "scope creep" as their project gradually takes on more and more work. The safest rule is “the more detail, the better,” including details about which related elements are not within scope or will be delayed to a later effort.

Best Practices
None

Sample Deliverables
None

Last updated: 18-May-08 17:35

Phase 2: Analyze
Subtask 2.3.1 Identify Source Data Systems

Description
Before beginning any work with the data, it is necessary to determine precisely what data is required to support the data integration solution.
In addition, the developers must determine what source systems house the data, where the data resides in the source systems, and how the data is accessed.

In this subtask, the development project team needs to validate the initial list of source systems and source formats and obtain documentation from the source system owners describing the source system schemas. For relational systems, the documentation should include Entity-Relationship diagrams (E-R diagrams) and data dictionaries, if available. For file-based data sources (e.g., unstructured, semi-structured and complex XML), documentation may also include data format specifications for both internal and public formats (the latter in the case of open data format standards), along with any deviations from public standards.

The development team needs to carefully review the source system documentation to ensure that it is complete (i.e., specifies data owners and dependencies) and current. The team also needs to ensure that the data is fully accessible to the developers and analysts who are building the data integration solution.

Prerequisites
None

Roles
Business Analyst (Primary)
Data Architect (Primary)
Data Integration Developer (Primary)
Data Quality Developer (Primary)
Data Transformation Developer (Primary)

Considerations
In determining the source systems for data elements, it is important to request copies of the source system data to serve as samples for further analysis. This is a requirement in 2.8.1 Perform Data Quality Analysis of Source Data, but is also important at this stage of development. As data volumes in the production environment are often large, it is advisable to request a subset of the data for evaluation purposes. However, requesting too small a subset can be dangerous in that it fails to provide a complete picture of the data and may hide quality issues that exist.
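The risk of an undersized subset can be made concrete with a rough calculation. The sketch below is illustrative only: it assumes records are drawn by simple random sampling and that a given quality issue affects records independently at a fixed rate, neither of which is stated in the methodology. The function names are hypothetical.

```python
import math


def miss_probability(defect_rate: float, sample_size: int) -> float:
    """Chance that a random sample contains no record with the issue.

    Assumes independent records and a fixed defect rate (a simplification).
    """
    if not 0.0 <= defect_rate <= 1.0:
        raise ValueError("defect_rate must be between 0 and 1")
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")
    return (1.0 - defect_rate) ** sample_size


def min_sample_size(defect_rate: float, confidence: float) -> int:
    """Smallest sample size that shows at least one affected record
    with the requested confidence, under the same assumptions."""
    if not 0.0 < defect_rate < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("defect_rate and confidence must be strictly between 0 and 1")
    # Solve (1 - defect_rate) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - defect_rate))


if __name__ == "__main__":
    # An issue affecting 1% of records, checked against a 100-record sample.
    print(f"{miss_probability(0.01, 100):.2f}")
    print(min_sample_size(0.01, 0.95))
```

Under these assumptions, a 100-record sample misses an issue that affects 1% of records a little over a third of the time, and roughly 300 records are needed to see it at least once with 95% confidence. Real source data is rarely this uniform (issues often cluster by date, region, or entry channel), so the figures are a lower bound on how much data to request rather than a sizing rule.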
Another important element of the source system analysis is to determine the life expectancy of the source system itself. Try to determine if the source system is likely to be replaced or phased out in the foreseeable future. As companies merge, or technologies and processes improve, many companies upgrade or replace their systems. This can present challenges to the team as the primary knowledge of those systems may be replaced as well. Understanding the life expectancy of the source system will play a crucial part in the design process.

For example, assume you are building a customer data warehouse for a small bank. The primary source of customer data is a system called Shucks, and you will be building a staging area in the warehouse to act as a landing area for all of the source data. After your project starts, you discover that the bank is being bought out by a larger bank and that Shucks will be replaced within three months by the larger bank's source of customer data: a system called Grins. Instead of having to redesign your entire data warehouse to handle the new source system, it may be possible to design a generic staging area that could fit any customer source system instead of building a staging area based on one specific source system. Assuming that the bulk of your processing occurs after the data has landed in the staging area, you can minimize the impact of replacing source systems by designing a generic staging area that would essentially allow you to plug in the new source system. Designing this type of staging area takes a large amount of planning and adds time to the schedule, but it can be well worth the effort because the warehouse is then able to handle source system changes.

For Data Migration, the source systems that are in scope should be understood at the start of the project. During the Analyze Phase these systems should be confirmed and communicated to all key stakeholders.
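The generic staging area from the Shucks and Grins example can be pictured as one source-neutral record layout plus a small adapter per source system. The sketch below is a minimal illustration under stated assumptions: the staged fields, the column names attributed to Shucks and Grins, and every function name are hypothetical, since the methodology does not describe either system's schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class StagedCustomer:
    """Source-neutral staging record (hypothetical field set)."""
    source_system: str
    customer_id: str
    full_name: str
    postal_code: str


Adapter = Callable[[dict], StagedCustomer]


def from_shucks(row: dict) -> StagedCustomer:
    # Assumed Shucks layout: CUST_NO, FIRST_NM, LAST_NM, ZIP.
    name = f'{row["FIRST_NM"]} {row["LAST_NM"]}'.strip()
    return StagedCustomer("SHUCKS", str(row["CUST_NO"]), name, row["ZIP"])


def from_grins(row: dict) -> StagedCustomer:
    # Assumed Grins layout: customerId, name, postcode.
    return StagedCustomer("GRINS", str(row["customerId"]), row["name"].strip(), row["postcode"])


# Replacing a source system means registering a new adapter here;
# processing that reads StagedCustomer records is unaffected.
ADAPTERS: Dict[str, Adapter] = {"SHUCKS": from_shucks, "GRINS": from_grins}


def stage(source: str, rows: Iterable[dict]) -> List[StagedCustomer]:
    """Land rows from the named source system in the generic staging layout."""
    adapter = ADAPTERS[source]
    return [adapter(row) for row in rows]


if __name__ == "__main__":
    old = stage("SHUCKS", [{"CUST_NO": 7, "FIRST_NM": "Ada", "LAST_NM": "Lovelace", "ZIP": "12345"}])
    new = stage("GRINS", [{"customerId": "G-7", "name": " Ada Lovelace ", "postcode": "12345"}])
    print(old[0])
    print(new[0])
```

The design choice is that only the adapters know source-specific column names. As long as the bulk of processing happens after staging, swapping Shucks for Grins touches one function and one registry entry rather than the downstream warehouse logic.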
If there is a disconnect about which systems are in and out of scope, it is important to document and analyze the impact. Identifying new source systems may substantially increase the amount of resources needed on the project and require re-planning. Make a point of over-communicating which systems are in scope.

Best Practices
None

Sample Deliverables
None

Last updated: 20-May-08 19:28

Phase 2: Analyze
Subtask 2.3.2 Determine Sourcing Feasibility

Description
Before beginning to work with the data, it is necessary to determine precisely what data is required to support the data integration solution. In addition, the developers must determine:
● what source systems house the data.
● where the data resides in the source systems.
● how the data is accessed.

Take care to focus only on data that is within the scope of the requirements. Involvement of the business community is important in order to prioritize the business data needs based upon how effectively the data supports the users' top priority business problems.

Determining sourcing feasibility is a two-stage process, requiring:
● A thorough and high-level understanding of the candidate source systems.
● A detailed analysis of the data sources within these source systems.

Prerequisites
None

Roles
Application Specialist (Primary)
Business Analyst (Primary)
Data Architect (Primary)
Data Quality Developer (Primary)
Metadata Manager (Primary)

Considerations
In determining the source systems for data elements, it is important to request copies of the source system data to serve as samples for further analysis. Because data volumes in the production environment are often large, it is advisable to request a subset of the data for evaluation purposes.
However, requesting too small a subset can be dangerous in that it fails to provide a complete picture of the data and may hide any quality issues that exist.

Particular care needs to be taken when archived historical data (e.g., data archived on tapes) or syndicated data sets (i.e., externally provided data such as market research) is required as a source to the data integration application. Additional resources and procedures may be required to sample and analyze these data sources.

Candidate Source System Analysis
A list of business data sources should have been prepared during the business requirements phase. This list typically identifies 20 or more types of data that are required to support the data integration solution and may include, for example, sales forecasts, customer demographic data, product information (e.g., categories and classifiers), and financial information (e.g., revenues, commissions, and budgets).

The candidate source systems (i.e., where the required data can be found) can be identified based on this list. There may be a single source or multiple sources for the required data. Types of source include:
● Operational sources - The systems an organization uses to run its business. It may be any combination of the ERP and legacy operational systems.
● Strategic sources - The data may be sourced from existing strategic decision support systems; for example, executive information systems.
● External sources - Any information source provided to the organization by an external entity, such as Nielsen marketing data or Dun & Bradstreet.

The following checklist can help to evaluate the suitability of data sources, which can be particularly important for resolving contention amongst the various sources.
