Incremental Load in Datastage by zkw15238

VIEWS: 172 PAGES: 20

More Info
									Building a data Warehouse in a
 Manufacturing Environment

          Brian Laks
       Alexey Leontovich
   What is a Data Warehouse?
• Transactional data which has been
  reorganized specifically for fast data
  retrieval.
• A system which answers Business
  Intelligence (BI) questions.
• A set of tools to query, present, and
  analyze information.
Advantages of Data Warehouse
• Allows to query multiple databases in a
  unified way.
• Queries do not impact transactional
  performance.
• Quickly answers Business Intelligence
  questions.
• Generates answers to questions you may
  not have thought of.
Steps to Creating Data Warehouse
                                Requirements
   Gather                                                   Data Experts List
                                 Document
Requirements



                                                                                Data Model
               Logical Design


                                         Data Design                                                            Schema




                                          Is all Data   Y                                                     ETL Logic Scripts
                                                                     ETL
                                           Available



                                                                                    Test Data                      Test Scripts




                                                                                     Is Data
                                                                                                Y   Generate Report
                                                                                    Accurate



                                                                     NO                                                                           Report
                                                                                                                                                Descriptions

                                                                                                     Does Report
                                                                                                      Answer BI              Test And Publish
                                                                                                      Question?                  Reports




                                                                                                                                                  Reports
     Gathering Requirements
• In the classic software engineering sense,
  this is the most important step
• Identify Stakeholders and Experts
• Identify Key Business Intelligence
  questions from the top down.
      Example BI Questions from
          Manufacturing:
• What types of defects are for a particular part
  number?
• What are the top 10 defects for a particular
  Manufacturing Order?
• Which operators are finding the defects?
• Which Site has the most defects
Designing the Data Warehouse
• Fact Tables: Grow quickly. Represent Key
  BI information.
  – What Kind of work was done.
  – What kinds of problems were found
• Dimensions: Slowly growing, represent
  information about a fact or several fact
  tables
  – When was the work done
  – Who did the work
    Convert ER Diagram in OLTP
      System to OLAP Cubes
                                              Operator

                 Defect             Test         Repair
                                    Instanc
                                    e


           Tracked                Unit                   Manufacturing
           History                                       Order


                                                         Part Number




D                         D                                D
e                         e                                e
f                         f                                f
e                         e                                e
c                         c                                c
t                         t                                t
s                         s                                s
    P art Number              Manufacturing                            Operator
                                 Order
     Data Warehouse Schema
• Star Schema               Employee
                            Dimension
                                                                   Station
                                                                   Dimension
                            -employee key                          -Station key
  – Single fact with many   -Employee Name
                            -shift                Defect Fact
                                                                   -Factory
                                                                   -division
    dimensions              -Pay Rate             Table
                                                  -Employee Key
                                                                   -Line
                                                                   -Operation
                                                  -Part key
                                                  -Route key
                            Time                  -Time Key        Route

• Snowflake schema          Dimension             -Station Key     Dimension
                            -Time key             -MO Key          -Route key
                            -Hour                 -Defect Code     -Part Number
  adds additional           -Shift
                            -Day
                                                  -Repair Code     -Route Step
                                                                   -Operation
                            -Week
  dimensions about          -Quarter
                                                              MO Dimension
  dimensions.                           Part Dimension
                                        -Part key
                                                              -MO key
                                                              -Part key
                                        -Part Number          -MO start Date
                                        -Part Revision        -Promised Date
        Schema Guidelines
• KISS: Simple means faster
• Dimensions can have relationships
  between them
• Fact tables need not be normalized, but
  caution should be used.
• Relationships can exist between Fact
  tables but not recommended.
• Should be designed with the BI questions
  in mind.
 Extracting Data from OLTP DB
• Extracting required data from Online
  Transactional Processes may be slow.
  – System may currently be in production
  – OLTP DB’s are designed for fast transactions,
    not necessarily fast queries.
  – Queries become quite complex as Fact and
    Dimensional data is extracted simultaneously.
    Many joins typically exist in these queries
• Data is loaded into a temporary location.
   Incremental Load Approach
• Must identify what has been already
  extracted, to prevent duplication.
• Older data can be further aggregated.
• Must provide rollback function in case of
  failure.
• Can update records much closer to real
  time with smaller and more frequent loads.
              Transformations
• Changing of data from one type to another.
• Processing some data to mean something different.
• Often applications are written which manipulate the data
  and do the transformations.
• Data can be tested and validated in the temporary
  Location.
       Loading the OLAP DB
Always need to check for duplication of
  records.
Rollback method may be beneficial.
Complex system of updating existing
  records may be needed.
Additional aggregation after loading.
                Testing
• Data Accuracy.
• Data retrieval speed.
• Does data allow answering of BI
  questions.
                 Reporting
• Reports from properly designed data
  warehouse are:
  – Accurate
  – Timely
  – Fast
  – Understandable
  – Flexible (slice and dice)
• These reports answer the BI questions
Analytical Reporting / ETL Tools
• Common Reporting
   – Excel
   – Crystal Reports
• Enterprise Reporting and Analysis / ETL
   – Cognos
   – Business Objects
• ETL
   –   Data Junction
   –   Ascential DataStage
   –   Ab Initio
   –   Informatica
   –   Data Mirror
   –   SQL server, Oracle, DB2/400

								
To top