"Incremental Load in Datastage"
Building a data Warehouse in a Manufacturing Environment Brian Laks Alexey Leontovich What is a Data Warehouse? • Transactional data which has been reorganized specifically for fast data retrieval. • A system which answers Business Intelligence (BI) questions. • A set of tools to query, present, and analyze information. Advantages of Data Warehouse • Allows to query multiple databases in a unified way. • Queries do not impact transactional performance. • Quickly answers Business Intelligence questions. • Generates answers to questions you may not have thought of. Steps to Creating Data Warehouse Requirements Gather Data Experts List Document Requirements Data Model Logical Design Data Design Schema Is all Data Y ETL Logic Scripts ETL Available Test Data Test Scripts Is Data Y Generate Report Accurate NO Report Descriptions Does Report Answer BI Test And Publish Question? Reports Reports Gathering Requirements • In the classic software engineering sense, this is the most important step • Identify Stakeholders and Experts • Identify Key Business Intelligence questions from the top down. Example BI Questions from Manufacturing: • What types of defects are for a particular part number? • What are the top 10 defects for a particular Manufacturing Order? • Which operators are finding the defects? • Which Site has the most defects Designing the Data Warehouse • Fact Tables: Grow quickly. Represent Key BI information. – What Kind of work was done. – What kinds of problems were found • Dimensions: Slowly growing, represent information about a fact or several fact tables – When was the work done – Who did the work Convert ER Diagram in OLTP System to OLAP Cubes Operator Defect Test Repair Instanc e Tracked Unit Manufacturing History Order Part Number D D D e e e f f f e e e c c c t t t s s s P art Number Manufacturing Operator Order Data Warehouse Schema • Star Schema Employee Dimension Station Dimension -employee key -Station key – Single fact with many -Employee Name -shift Defect Fact -Factory -division dimensions -Pay Rate Table -Employee Key -Line -Operation -Part key -Route key Time -Time Key Route • Snowflake schema Dimension -Station Key Dimension -Time key -MO Key -Route key -Hour -Defect Code -Part Number adds additional -Shift -Day -Repair Code -Route Step -Operation -Week dimensions about -Quarter MO Dimension dimensions. Part Dimension -Part key -MO key -Part key -Part Number -MO start Date -Part Revision -Promised Date Schema Guidelines • KISS: Simple means faster • Dimensions can have relationships between them • Fact tables need not be normalized, but caution should be used. • Relationships can exist between Fact tables but not recommended. • Should be designed with the BI questions in mind. Extracting Data from OLTP DB • Extracting required data from Online Transactional Processes may be slow. – System may currently be in production – OLTP DB’s are designed for fast transactions, not necessarily fast queries. – Queries become quite complex as Fact and Dimensional data is extracted simultaneously. Many joins typically exist in these queries • Data is loaded into a temporary location. Incremental Load Approach • Must identify what has been already extracted, to prevent duplication. • Older data can be further aggregated. • Must provide rollback function in case of failure. • Can update records much closer to real time with smaller and more frequent loads. Transformations • Changing of data from one type to another. • Processing some data to mean something different. • Often applications are written which manipulate the data and do the transformations. • Data can be tested and validated in the temporary Location. Loading the OLAP DB Always need to check for duplication of records. Rollback method may be beneficial. Complex system of updating existing records may be needed. Additional aggregation after loading. Testing • Data Accuracy. • Data retrieval speed. • Does data allow answering of BI questions. Reporting • Reports from properly designed data warehouse are: – Accurate – Timely – Fast – Understandable – Flexible (slice and dice) • These reports answer the BI questions Analytical Reporting / ETL Tools • Common Reporting – Excel – Crystal Reports • Enterprise Reporting and Analysis / ETL – Cognos – Business Objects • ETL – Data Junction – Ascential DataStage – Ab Initio – Informatica – Data Mirror – SQL server, Oracle, DB2/400