Install Guide Pentaho BI Data Integration by krabah

Global Open Versity Vancouver Canada                     Install Guide Pentaho Data Integration with MySQL v1.2




                             Global Open Versity
             Pentaho Business Intelligence BI Suite Training Manual

                                                 Part II

       Install Guide Pentaho Data Integration (Kettle) with MySQL

                                               Kefa Rabah
                            Global Open Versity, Vancouver Canada
                                  krabah@globalopenversity.org
                                    www.globalopenversity.org

Table of Contents

INSTALL GUIDE PENTAHO DATA INTEGRATION (KETTLE) WITH MYSQL

Introduction

Background Information

Part 1: Starting MySQL Server

Part 2: Download & Install Pentaho Data Integration (Kettle)

Part 4: Hands-On Lab Assignment 1

References

Part 5: Need More Training on Windows
  Data Warehousing and BI Principles using Pentaho BI

Other Related Training

Part 6: Hands-on Labs Assignments




A GOV Open Access Technical Academic Publications
Enhancing education & empowering people worldwide through eLearning in the 21st Century

© April 2007, Kefa Rabah, Global Open Versity, Vancouver Canada

www.globalopenversity.org                         BM301 - Data Warehousing & Business Intelligence Principles

By Kefa Rabah, krabah@globalopenversity.org               Sept. 13, 2010                     GTS Institute


Introduction
The Pentaho BI Project is open source application software that provides enterprise reporting, OLAP
analysis, dashboards, data mining, workflow, and ETL capabilities on a single Business Intelligence (BI)
platform. These capabilities have made it the world’s leading and most widely deployed open source BI
suite. It also offers self-service dashboard design for business users and cloud computing support for IT.

In Part I of this guide we showed you how to install the Pentaho Business Intelligence (BI) Suite CE
server with MySQL, Report Designer CE, and Design Studio CE on a Linux machine, and how to set up
Pentaho Data Integration (Kettle). In this second part of the series, we’ll continue working with
Pentaho Data Integration and show you how to build a simple input-output transformation using your
own data source from a MySQL database. This guide assumes you have some basic knowledge of Linux
and MySQL.

Background Information
Data integration focuses mainly on databases. A database is an organized collection of data. It's
similar to a file system, which is an organizational structure for files so they're easy to find,
access, and manipulate.

Pentaho Data Integration (PDI) is a powerful, metadata-driven ETL tool designed to bridge the
gap between business and IT. Kettle is an acronym for "Kettle E.T.T.L. Environment." Kettle is
designed to help you with your ETTL needs, which include the Extraction, Transformation,
Transportation and Loading of data.

Kettle is part of the Pentaho BI applications suite. It was an independent project started by Matt
Casters until Pentaho acquired it in 2006. Since then, Kettle has also been known as Pentaho Data
Integration (PDI). Matt himself still leads PDI project development at Pentaho.

Kettle comprises four applications:

    •   Spoon - graphical designer for building job and transformation definitions. It is based on
        SWT (the Standard Widget Toolkit), not Swing.
    •   Pan - script used to execute a transformation, either from a .ktr XML file or from
        a repository.
    •   Kitchen - script used to execute a job, either from a .kjb XML file or from a
        repository.
    •   Carte - a lightweight web server used to execute jobs and transformations remotely, in a
        cluster or in parallel.
Spoon is a graphical user interface that allows you to design transformations and jobs that can be
run with the Kettle tools Pan and Kitchen. Pan is a data transformation engine that performs a
multitude of functions such as reading, manipulating, and writing data to and from various data
sources. Kitchen is a program that executes jobs designed in Spoon and stored either as XML or in a
database repository. Jobs are usually scheduled in batch mode to be run automatically at regular intervals.
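
As a rough illustration of how Pan and Kitchen are invoked, the sketch below builds (but does not run) the command line for a given definition file. It assumes a Linux install that ships pan.sh and kitchen.sh and uses their -file and -level options; the helper name kettle_command and all paths are hypothetical and only serve the example.

```python
from pathlib import PurePosixPath


def kettle_command(install_dir, definition_file, log_level="Basic"):
    """Build the argument list that runs a .ktr with Pan or a .kjb with Kitchen."""
    suffix = PurePosixPath(definition_file).suffix.lower()
    if suffix == ".ktr":
        script = "pan.sh"        # transformations are executed by Pan
    elif suffix == ".kjb":
        script = "kitchen.sh"    # jobs are executed by Kitchen
    else:
        raise ValueError(f"expected a .ktr or .kjb file, got {definition_file!r}")
    return [
        "sh",
        str(PurePosixPath(install_dir) / script),
        f"-file={definition_file}",
        f"-level={log_level}",
    ]


if __name__ == "__main__":
    # Hypothetical paths; replace them with your own install directory and file.
    cmd = kettle_command("/opt/pentaho/data-integration",
                         "/home/user/etl/load_customers.ktr")
    print(" ".join(cmd))
    # To actually execute it: subprocess.run(cmd, check=True)
```

A scheduler such as cron could call the resulting Kitchen command at regular intervals, which is one common way to get the batch-mode behaviour described above.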

Transformations and jobs are described in XML files or stored in a Kettle database repository.
Pan or Kitchen then reads that definition to execute the steps described in the transformation
or to run the job. In summary, Pentaho Data Integration makes data warehouses easier to build,
update, and maintain.
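
To make the idea of an XML-described transformation concrete, here is a small sketch that reads the steps and hops out of a transformation definition. The XML is a hand-written, heavily simplified stand-in for a .ktr file (a real file saved by Spoon contains many more elements), and the names SAMPLE_KTR and summarize are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written stand-in for a .ktr file (not produced by Spoon).
SAMPLE_KTR = """\
<transformation>
  <info><name>customers_to_warehouse</name></info>
  <order>
    <hop><from>Read customers</from><to>Write customers</to><enabled>Y</enabled></hop>
  </order>
  <step><name>Read customers</name><type>TableInput</type></step>
  <step><name>Write customers</name><type>TableOutput</type></step>
</transformation>
"""


def summarize(ktr_xml):
    """Return the transformation name, its (step name, step type) pairs, and its hops."""
    root = ET.fromstring(ktr_xml)
    name = root.findtext("info/name")
    steps = [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]
    hops = [(h.findtext("from"), h.findtext("to")) for h in root.findall("order/hop")]
    return name, steps, hops


if __name__ == "__main__":
    name, steps, hops = summarize(SAMPLE_KTR)
    print(name)
    for step_name, step_type in steps:
        print(f"  step: {step_name} ({step_type})")
    for source, target in hops:
        print(f"  hop:  {source} -> {target}")
```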

ETL and Data Warehousing - as an ETL tool, Kettle is an environment designed to:

    •   collect data from a variety of sources (extraction)
    •   move and modify data (transport and transform) while cleansing, denormalizing,
        aggregating and enriching it in the process
    •   frequently (typically on a daily basis) store data (loading) in the final target destination,
        which is usually a large, dimensionally modeled database called a data warehouse
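
The three stages above can be sketched in a few lines of plain Python. This is not how Kettle works internally; it is only a minimal illustration of extract, transform, and load, using made-up rows and an in-memory SQLite database in place of MySQL so that it runs without a server. The table name sales_by_country and all sample data are hypothetical.

```python
import sqlite3

# Made-up source rows standing in for data pulled from an operational system.
SOURCE_ROWS = [
    {"customer": "alice", "country": " ca ", "amount": "120.50"},
    {"customer": "bob", "country": "CA", "amount": "79.50"},
    {"customer": "carol", "country": "us", "amount": "not-a-number"},
]


def extract():
    """Extraction: collect the raw rows from the source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transformation: normalize country codes, skip rows whose amount is not
    numeric, and aggregate the amounts per country."""
    totals = {}
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleansing: drop rows that cannot be parsed
        country = row["country"].strip().upper()
        totals[country] = totals.get(country, 0.0) + amount
    return sorted(totals.items())


def load(conn, totals):
    """Loading: store the aggregated rows in the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_country "
        "(country TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales_by_country VALUES (?, ?)", totals)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    print(conn.execute("SELECT country, total FROM sales_by_country").fetchall())
```

In Kettle the same flow would be drawn in Spoon as input, transformation, and output steps connected by hops rather than written as functions.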



Part 1: Starting MySQL Server
								