Incremental Load in Datastage by zkw15238

VIEWS: 172 PAGES: 20

More Info
									Building a data Warehouse in a
 Manufacturing Environment

          Brian Laks
       Alexey Leontovich
   What is a Data Warehouse?
• Transactional data which has been
  reorganized specifically for fast data
• A system which answers Business
  Intelligence (BI) questions.
• A set of tools to query, present, and
  analyze information.
Advantages of Data Warehouse
• Allows to query multiple databases in a
  unified way.
• Queries do not impact transactional
• Quickly answers Business Intelligence
• Generates answers to questions you may
  not have thought of.
Steps to Creating Data Warehouse
   Gather                                                   Data Experts List

                                                                                Data Model
               Logical Design

                                         Data Design                                                            Schema

                                          Is all Data   Y                                                     ETL Logic Scripts

                                                                                    Test Data                      Test Scripts

                                                                                     Is Data
                                                                                                Y   Generate Report

                                                                     NO                                                                           Report

                                                                                                     Does Report
                                                                                                      Answer BI              Test And Publish
                                                                                                      Question?                  Reports

     Gathering Requirements
• In the classic software engineering sense,
  this is the most important step
• Identify Stakeholders and Experts
• Identify Key Business Intelligence
  questions from the top down.
      Example BI Questions from
• What types of defects are for a particular part
• What are the top 10 defects for a particular
  Manufacturing Order?
• Which operators are finding the defects?
• Which Site has the most defects
Designing the Data Warehouse
• Fact Tables: Grow quickly. Represent Key
  BI information.
  – What Kind of work was done.
  – What kinds of problems were found
• Dimensions: Slowly growing, represent
  information about a fact or several fact
  – When was the work done
  – Who did the work
    Convert ER Diagram in OLTP
      System to OLAP Cubes

                 Defect             Test         Repair

           Tracked                Unit                   Manufacturing
           History                                       Order

                                                         Part Number

D                         D                                D
e                         e                                e
f                         f                                f
e                         e                                e
c                         c                                c
t                         t                                t
s                         s                                s
    P art Number              Manufacturing                            Operator
     Data Warehouse Schema
• Star Schema               Employee
                            -employee key                          -Station key
  – Single fact with many   -Employee Name
                            -shift                Defect Fact
    dimensions              -Pay Rate             Table
                                                  -Employee Key
                                                  -Part key
                                                  -Route key
                            Time                  -Time Key        Route

• Snowflake schema          Dimension             -Station Key     Dimension
                            -Time key             -MO Key          -Route key
                            -Hour                 -Defect Code     -Part Number
  adds additional           -Shift
                                                  -Repair Code     -Route Step
  dimensions about          -Quarter
                                                              MO Dimension
  dimensions.                           Part Dimension
                                        -Part key
                                                              -MO key
                                                              -Part key
                                        -Part Number          -MO start Date
                                        -Part Revision        -Promised Date
        Schema Guidelines
• KISS: Simple means faster
• Dimensions can have relationships
  between them
• Fact tables need not be normalized, but
  caution should be used.
• Relationships can exist between Fact
  tables but not recommended.
• Should be designed with the BI questions
  in mind.
 Extracting Data from OLTP DB
• Extracting required data from Online
  Transactional Processes may be slow.
  – System may currently be in production
  – OLTP DB’s are designed for fast transactions,
    not necessarily fast queries.
  – Queries become quite complex as Fact and
    Dimensional data is extracted simultaneously.
    Many joins typically exist in these queries
• Data is loaded into a temporary location.
   Incremental Load Approach
• Must identify what has been already
  extracted, to prevent duplication.
• Older data can be further aggregated.
• Must provide rollback function in case of
• Can update records much closer to real
  time with smaller and more frequent loads.
• Changing of data from one type to another.
• Processing some data to mean something different.
• Often applications are written which manipulate the data
  and do the transformations.
• Data can be tested and validated in the temporary
       Loading the OLAP DB
Always need to check for duplication of
Rollback method may be beneficial.
Complex system of updating existing
  records may be needed.
Additional aggregation after loading.
• Data Accuracy.
• Data retrieval speed.
• Does data allow answering of BI
• Reports from properly designed data
  warehouse are:
  – Accurate
  – Timely
  – Fast
  – Understandable
  – Flexible (slice and dice)
• These reports answer the BI questions
Analytical Reporting / ETL Tools
• Common Reporting
   – Excel
   – Crystal Reports
• Enterprise Reporting and Analysis / ETL
   – Cognos
   – Business Objects
   –   Data Junction
   –   Ascential DataStage
   –   Ab Initio
   –   Informatica
   –   Data Mirror
   –   SQL server, Oracle, DB2/400

To top