Best Practices of ETL Process Control




      best practices of ETL processes controlling   1
Agenda

   ETL Process Control Challenges
   Prototype on Open Source Solution
   Case Studies and Analysis
   Opportunities To Partner
   Discussion & Questions




ETL Cost Analysis - Data Warehouse Institute




Outline of the ETL Process Control Challenge
 The complexity of ETL is rapidly increasing
 ETL features inter-related parallel processes
 A job delay on one path can cause cascading problems
 Multi-platform and multi-site processing adds to the problem
 A mix of technologies in the data sources adds to the challenge
 Manual intervention is expensive and is often too slow

Requirements of the ETL Platform

 Business Data Safety
   Business data consistency
   Reversible
   Fine transaction control
 ETL Operation
   Alien program cooperation
   Support for multiple OS
   Controllable
   Multi-site running
   Error scope control
 ETL Development
   Decentralized development
   Utilize what we have
   Based on J2EE
   Better logical layer division
   Standard workflow

Goal – ETL Application Integration


  Lower ETL Cost
  More ETL Flexibility
  Wider Open Architecture




Prototype of ETL Control




Index

   Prototype implementation Cases
   Prototype elements
   Model Hierarchy
   Model Relation
   Software Architecture
   ETL development flow



Product Based on the Prototype


 ETL Manager, release version 1.10
   1999 – 2005 (Nantian Software Company)
   Working as a back-end product for banks
     Data Center System of China Post Deposit Bank head office
     Data Warehouse System of China Post Deposit Bank head office
     Data Warehouse System of China Communication Bank of China, Tianjin Province
    ……


Prototype Elements

 System (System layer)
 Job (Application layer)
 Action (Executing layer)
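
The three prototype elements can be pictured as a simple containment hierarchy. The sketch below is only an illustration, not the ETL Manager data model: the class fields, the `command` attribute, and the sample names (`billing`, `daily_load`, `export.sh`, `load.sh`) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Action:
    """Executing layer: one concrete step, such as a script or a load."""
    name: str
    command: str  # hypothetical: what the executing layer would run


@dataclass
class Job:
    """Application layer: an ordered group of actions."""
    name: str
    actions: List[Action] = field(default_factory=list)


@dataclass
class System:
    """System layer: one source system and the jobs that serve it."""
    name: str
    jobs: List[Job] = field(default_factory=list)

    def action_paths(self) -> List[str]:
        """Flatten the hierarchy into 'system/job/action' paths."""
        return [
            f"{self.name}/{job.name}/{action.name}"
            for job in self.jobs
            for action in job.actions
        ]


# Hypothetical sample data for illustration only.
billing = System(
    name="billing",
    jobs=[Job("daily_load", [Action("export", "export.sh"),
                             Action("load", "load.sh")])],
)
print(billing.action_paths())
```

Keeping each layer as its own type lets control rules (dependence, priority, reversibility) attach at the level where they apply.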




Layer Dividing




Model Hierarchy

 [Figure: ETL Manager at the top; layers labeled Action, Job, and Sys; source systems OLTP A, OLTP B, and OLTP C at the bottom]



Elements Relation




                           Dependence
                           Priority of Queue
                           Data Consistency
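
One way to combine the first two relations, dependence and queue priority, is a dependency-aware priority queue. The following is a minimal sketch under stated assumptions: the function name `run_order`, the convention that a lower number means higher priority, and the job names are hypothetical and are not taken from the prototype.

```python
import heapq
from typing import Dict, List, Set


def run_order(priority: Dict[str, int],
              depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return an execution order that respects dependencies.

    Among jobs whose dependencies have all finished, the job with the
    lowest priority number runs first (ties break by name).
    """
    remaining = {job: set(depends_on.get(job, set())) for job in priority}
    ready = [(priority[j], j) for j, deps in remaining.items() if not deps]
    heapq.heapify(ready)
    order: List[str] = []
    while ready:
        _, job = heapq.heappop(ready)
        order.append(job)
        del remaining[job]
        # Release every job that was waiting only on the finished one.
        for other, deps in remaining.items():
            if job in deps:
                deps.remove(job)
                if not deps:
                    heapq.heappush(ready, (priority[other], other))
    if remaining:
        raise ValueError(f"unresolved dependencies among: {sorted(remaining)}")
    return order


# Hypothetical jobs: C waits for A; F waits for B and C.
order = run_order(
    priority={"A": 2, "B": 1, "C": 1, "F": 3},
    depends_on={"C": {"A"}, "F": {"B", "C"}},
)
print(order)
```

A cycle, or a dependency on a job that is not registered, leaves jobs stranded and is reported as an error rather than silently skipped.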




Relation Model




Working Sample




Software Architecture




Development Model




Driving Model




                     Implementing approach
                     work flow engine




Fundamental Techniques

 J2EE Architecture
   Application Server
     JBoss
     Apache
     Tomcat
   Portlet
   XML configuration
 Workflow Engine – Open Source
   Pentaho BI platform
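
To make the "XML configuration" item concrete, here is a small sketch of how job definitions might be read from XML. The element and attribute names (`system`, `job`, `action`, `priority`, `type`) are invented for this example; the slides do not show the real configuration schema, and this is not the Pentaho format.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration; the schema is an assumption for illustration.
CONFIG = """
<system name="billing">
  <job name="daily_load" priority="1">
    <action name="export" type="shell"/>
    <action name="load" type="sql"/>
  </job>
</system>
"""


def load_jobs(xml_text: str) -> dict:
    """Parse job definitions into a plain dictionary keyed by job name."""
    root = ET.fromstring(xml_text.strip())
    jobs = {}
    for job in root.findall("job"):
        jobs[job.get("name")] = {
            "priority": int(job.get("priority", "0")),
            "actions": [(a.get("name"), a.get("type"))
                        for a in job.findall("action")],
        }
    return jobs


jobs = load_jobs(CONFIG)
print(jobs)
```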

Platform Architecture of Pentaho




Case Studies




      Case No.1 – ETL Cooperation

 Productive systems
   Billing System A (AIX/Oracle)
   Management System B (Linux/Oracle/J2EE)
 ETL processes and schedules
   ETL (A): Day
   ETL (B): Day
   ETL (C): Day
   ETL (D): Month 5/10/15/20/25/30
   ETL (E): Month 5/10/15/20/25/30
   ETL (F)
 Day-end checking process: time limitation < 1 hour
 DW
   Data Warehouse System
   Banking BI Application
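
The schedules in this case can be expressed as a small rule table. The sketch below assumes that "Day" means the process runs every day and that "Month 5/10/15/20/25/30" means it runs on those days of the month; both readings are assumptions, as are the function names. ETL (F) is left out because the diagram gives no schedule for it.

```python
from datetime import date
from typing import List

# Assumed reading of the diagram: daily runs vs. fixed days of the month.
MONTHLY_DAYS = {5, 10, 15, 20, 25, 30}
SCHEDULES = {"A": "day", "B": "day", "C": "day", "D": "month", "E": "month"}


def due_today(etl: str, today: date) -> bool:
    """Return True if the given ETL process is scheduled to run today."""
    if SCHEDULES[etl] == "day":
        return True
    return today.day in MONTHLY_DAYS


def due_list(today: date) -> List[str]:
    """List every scheduled ETL process that is due on the given date."""
    return sorted(etl for etl in SCHEDULES if due_today(etl, today))


print(due_list(date(2005, 11, 10)))
print(due_list(date(2005, 11, 11)))
```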

Continued

 Controller (calls and loads each ETL process)
   ETL Process A: Table Flag
   ETL Process B: SQL Export Process
   ETL Process C: FTP Mission
   ETL Process D: Time
   ETL Process E: UNIX Shell
   ETL Process F: PROC C
 DW
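
A central controller of this kind has to decide when each process may start. The sketch below assumes that "Table Flag" and "Time" are trigger conditions and that the other labels are implementation types; that interpretation, the flag name `day_end_done`, and all class and function names are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, List


@dataclass
class ControlledProcess:
    name: str
    kind: str                          # e.g. "sql_export", "unix_shell"
    is_ready: Callable[[dict], bool]   # trigger check against shared state


def flag_trigger(flag: str) -> Callable[[dict], bool]:
    """Ready once an upstream process has set the named flag."""
    return lambda state: state["flags"].get(flag, False)


def time_trigger(start: time) -> Callable[[dict], bool]:
    """Ready once the clock has reached the start time."""
    return lambda state: state["now"] >= start


def ready_processes(processes: List[ControlledProcess], state: dict) -> List[str]:
    """Names of the processes the controller may call right now."""
    return [p.name for p in processes if p.is_ready(state)]


# Hypothetical registrations for two of the processes.
processes = [
    ControlledProcess("B", "sql_export", flag_trigger("day_end_done")),
    ControlledProcess("E", "unix_shell", time_trigger(time(2, 0))),
]
state = {"flags": {"day_end_done": True}, "now": time(1, 30)}
print(ready_processes(processes, state))
```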
Continued

 Solution
   Run ETL processes on multiple sites
   Control them in one virtual platform
 Benefit
   Enable fine-grained logical control of ETL processes
   Less monitoring and maintenance
   More open architecture




      Case No.2 – ETL Development

 Site       Productive system      Database             ETL
 New York   Financial System A     Sybase               ETL (A)
 Beijing    Billing System B       Oracle 9i            ETL (B)
 Toronto    Management System C    MS SQL Server 2000   ETL (C)

 Each productive system is labeled SA in the diagram; an AIX/Oracle label also appears at the top right
 ETL Developer Group
 DW
   Data Warehouse System (Sybase IQ)
   Banking BI Application

      Continued

 Site       Productive system      Database             ETL       Implemented as
 New York   Financial System A     Sybase               ETL (A)   BCP script
 Beijing    Billing System B       Oracle 9i            ETL (B)   Procedure
 Toronto    Management System C    MS SQL Server 2000   ETL (C)   DTS

 Each productive system is labeled SA in the diagram; an AIX/Oracle label also appears at the top right
 ETL Control
 ETL Developer
 DW
   Data Warehouse System (Sybase IQ)
   Banking BI Application

Continued

 Solution
   Develop ETL processes across multiple sites, operating systems, environments, and databases
   Control them in one virtual platform
 Benefit
   Use the skills your people already have
   Save money on software purchases
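
The idea of controlling differently built ETL processes from one place can be sketched as a registry of adapters behind a common call. The adapter functions below are hypothetical stand-ins that simply return a status; a real adapter would invoke the site's own tool, and no actual command lines are shown because the slides do not give any.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for the three site-specific implementations.
def run_bcp_script() -> bool:
    return True


def run_stored_procedure() -> bool:
    return True


def run_dts_package() -> bool:
    raise RuntimeError("source unavailable")  # simulated failure


ADAPTERS: Dict[str, Callable[[], bool]] = {
    "ETL (A)": run_bcp_script,
    "ETL (B)": run_stored_procedure,
    "ETL (C)": run_dts_package,
}


def run_all(adapters: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run every registered process and report one status per process."""
    results = {}
    for name, run in adapters.items():
        try:
            results[name] = "ok" if run() else "failed"
        except Exception as exc:
            # One site failing must not stop the others from being reported.
            results[name] = f"error: {exc}"
    return results


results = run_all(ADAPTERS)
print(results)
```

Because the controller only sees the common call, each site can keep the tool its developers already know.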




      Case No.3 – Data Consistency

 Table update jobs: ETL A, ETL B, ETL C
 Reversible ETL process
 DW star schema: Table A, Table B, Table C
 Outputs: Cube, Report



Continued

 Solution
   Divide ETL into logical layers and enable ETL reversibility at all three layers
   Control them in one virtual platform
 Benefit
   Keep business data consistent
   Save system resources
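
Reversibility can be illustrated with a run loop that undoes completed steps when a later step fails. This is a minimal in-memory sketch, not the prototype's mechanism: the `run_reversible` function, the step tuples, and the use of Python lists as stand-ins for warehouse tables are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

# A step is (name, apply, undo); both callables take no arguments.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_reversible(steps: List[Step]) -> bool:
    """Apply steps in order; on failure, undo finished steps in reverse."""
    done: List[Step] = []
    for step in steps:
        _, apply, _ = step
        try:
            apply()
            done.append(step)
        except Exception:
            for _, _, undo_prev in reversed(done):
                undo_prev()
            return False
    return True


# In-memory stand-ins for Table A, Table B, and Table C.
tables: Dict[str, List[str]] = {"A": [], "B": [], "C": []}


def load(table: str) -> Callable[[], None]:
    return lambda: tables[table].append("row")


def unload(table: str) -> Callable[[], None]:
    return lambda: tables[table].pop()


def failing_load() -> None:
    raise RuntimeError("load failed")


ok = run_reversible([
    ("A", load("A"), unload("A")),
    ("B", load("B"), unload("B")),
    ("C", failing_load, lambda: None),
])
print(ok, tables)
```

After the failed run the tables are back in their starting state, so a cube or report built on them never sees a partial update.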




Opportunities to Partner




Research Opportunities

 Purpose
 Volunteers
 Demo for open source
 Benefit



Opportunities for volunteers

 Seeking developers willing to
  participate in further Open Source
  Development
 Chance to be part of a unique
  development initiative
 Opportunity for mutual benefits




Discussion & Questions




http://roseparadise.blogspot.com/2006/02/orlando-fl-november-10-2005-today.html
               Thanks!


