Informatica Shortcuts

Document Sample
Informatica Shortcuts Powered By Docstoc
					                  __ sorting the tables in ascending or descending and
                     aslo to obtain Distinct records.
  RANK            __ Top or bottom 'N' analysis .

                  __ Join two different sources cmng from different and
  JOINER             same location .
  FILTER          __ filters the rows that do not meet the condition.

 ROUTER           __ It is useful to test multiple conditions .

                  __ To perform group calculation such as count ,
AGGREGATOR           max , min , sum , avg (mainly to perform
                     calculation or multiple rows or group)
                  __ Reads cobol files ( denormalized format).
                     Split a single row into multiple rows.
                  __ It performs many tasks such as override default
                     sql query , filtering records , join data from two or
                     more table etc
                     Represents the flatfile or relational data.

  UNION           __ It merges data from multiple sources similar to
                     the UNION ALL SQL statement to combine the
                     results from two or more SQL statements. Similar
                     to the UNION ALL statement, the Union
                     transformation does not remove duplicate rows.
             __      You can use the Expression transformation to
EXPRESSION           calculate values in a single row before you write
                     to the target.
             __      Use a Lookup transformation in a mapping to look
 LOOK UP             up data in a flat file or a relational table, view, or
             __      stored procedures to automate tasks that are too
  STORED             complicated for standard SQL statements.You can
PROCEDURE            call by using Stored Procedure Transformation.

             __      When you add an XML source definition to a
                     mapping, you need to connect it to an XML Source
                     Qualifier transformation.
  UPDATE          __ To flag rows for insert, delete, update, or reject..
                  Data Warehouse Architecture


     Star schema architecture is the simplest
     data warehouse design. The main feature
     of a star schema is a table at the center,
     called the fact table and the dimension
     tables which allow browsing of specific
     categories, summarizing, drill-downs and
     specifying criteria.
     Typically, most of the fact tables in a star
     schema are in database third normal form,
     while dimensional tables are de-normalized
     (second normal form).

     Fact table
The fact table is not a typical relational
database table as it is de-normalized on
purpose - to enhance query response
times. The fact table typically contains
records that are ready to explore, usually
with ad hoc queries. Records in the fact
table are often referred to as events, due
to the time-variant nature of a data
warehouse environment.
The primary key for the fact table is a
composite of all the columns except
numeric values / scores (like QUANTITY,
TURNOVER, exact invoice date and time).
Typical fact tables in a global enterprise
data warehouse are (apart for those, there
may be some company or business specific
fact tables):

sales fact table - contains all details
regarding sales
orders fact table - in some cases the table
can be split into open orders and historical
orders. Sometimes the values for historical
orders are stored in a sales fact table.
budget fact table - usually grouped by
month and loaded once at the end of a
forecast fact table - usually grouped by
month and loaded daily, weekly or
inventory fact table - report stocks, usually
refreshed daily

Dimension table
Nearly all of the information in a typical
fact table is also present in one or more
dimension tables. The main purpose of
maintaining Dimension Tables is to allow
browsing the categories quickly and easily.
The primary keys of each of the dimension
tables are linked together to form the
composite primary key of the fact table. In
a star schema design, there is only one de-
normalized table for a given dimension.
Typical dimension tables in a data
warehouse are:

time dimension table
customers dimension table
products dimension table
key account managers (KAM) dimension
sales office dimension table
Star schema example
An example of a star schema architecture
is depicted below.


Snowflake schema architecture is a more complex variation of a star
schema design. The main difference is that dimensional tables in a
snowflake schema are normalized, so they have a typical relational
database design.
Snowflake schemas are generally used when a dimensional table becomes
very big and when a star schema can’t represent the complexity of a data
structure. For example if a PRODUCT dimension table contains millions of
rows, the use of snowflake schemas should significantly improve
performance by moving out some data to other table (with BRANDS for
The problem is that the more normalized the dimension table is, the more
complicated SQL joins must be issued to query them. This is because in
order for a query to be answered, many tables need to be joined and
aggregates generated.

An example of a snowflake schema architecture is depicted below.
For each star schema or snowflake schema it is possible to construct a
fact constellation schema.
This schema is more complex than star or snowflake architecture, which is
because it contains multiple fact tables. This allows dimension tables to be
shared amongst many fact tables.
That solution is very flexible, however it may be hard to manage and
The main disadvantage of the fact constellation schema is a more
complicated design because many variants of aggregation must be
In a fact constellation schema, different fact tables are explicitly assigned
to the dimensions, which are for given facts relevant. This may be useful
in cases when some facts are associated with a given dimension level and
other facts with a deeper dimension level.

Use of that model should be reasonable when for example, there is a sales
fact table (with details down to the exact date and invoice header id) and
a fact table with sales forecast which is calculated based on month, client
id and product id.

In that case using two different fact tables on a different level of grouping
is realized through a fact constellation model.
            Source System
   A database, application, file, or other
 storage facility from which the data in a
       data warehouse is derived.
The definition of the relationship and data
 flow between source and target objects.
               Meta data
    Data that describes data and other
structures, such as objects, business rules,
 and processes. For example, the schema
  design of a data warehouse is typically
stored in a repository as meta data, which
 is used to generate scripts used to build
   and populate the data warehouse. A
      repository contains meta data.
             Staging Area
 A place where data is processed before
        entering the warehouse.
 The process of resolving inconsistencies
 and fixing the anomalies in source data,
   typically as part of the ETL process.
 The process of manipulating data. Any
    manipulation beyond copying is a
    transformation. Examples include
 cleansing, aggregating, and integrating
       data from multiple sources.
     The process of moving copied or
 transformed data from a source to a data
             Target System
   A database, application, file, or other
 storage facility to which the "transformed
source data" is loaded in a data warehouse.
  Figure 1.12 : Sample ETL Process Flow
   Informatica is a powerful ETL tool from
Informatica Corporation, a leading provider
of enterprise data integration software and
               ETL softwares.
  The important Informatica Components
             • Power Exchange
              • Power Center
          • Power Center Connect
             • Power Channel
           • Metadata Exchange
             • Power Analyzer
               • Super Glue
In Informatica, all the Metadata information
 about source systems, target systems and
      transformations are stored in the
Informatica repository. Informatica's Power
Center Client and Repository Server access
     this repository to store and retrieve
 Note: To know more about Metadata and
    its significance, please click here.
          Source and Target
       Consider a Bank that has got many
    branches throughout the world. In each
    branch data may be stored in different
     source systems like oracle, sql server,
  terradata, etc. When the Bank decides to
 integrate its data from several sources for
  its management decisions, it may choose
one or more systems like oracle, sql server,
      terradata, etc. as its data warehouse
        target. Many organisations prefer
     Informatica to do that ETL process,
  because Informatica is more powerful in
designing and building data warehouses. It
can connect to several sources and targets
   to extract meta data from sources and
 targets, transform and load the data into
              target systems.
Guidelines to work with Informatica Power
     • Repository: This is where all the
      metadata information is stored in the
      Informatica suite. The Power Center
        Client and the Repository Server
         would access this repository to
            retrieve, store and manage
     • Power Center Client: Informatica
       client is used for managing users,
          identifiying source and target
           systems definitions, creating
        mapping and mapplets, creating
        sessions and run workflows etc.
    • Repository Server: This repository
            server takes care of all the
      connections between the repository
          and the Power Center Client.
   • Power Center Server: Power Center
         server does the extraction from
       source and then loading data into
   • Designer: Source Analyzer, Mapping
       Designer and Warehouse Designer
      are tools reside within the Designer
       wizard. Source Analyzer is used for
        extracting metadata from source
   Mapping Designer is used to create
 mapping between sources and targets.
Mapping is a pictorial representation about
 the flow of data from source to target.
Warehouse Designer is used for extracting
metadata from target systems or metadata
  can be created in the Designer itself.
 •   Data Cleansing: The PowerCenter's
    data cleansing technology improves
     data quality by validating, correctly
          naming and standardization of
       address data. A person's address
          may not be same in all source
    systems because of typos and postal
    code, city name may not match with
           address. These errors can be
      corrected by using data cleansing
      process and standardized data can
      be loaded in target systems (data
• Transformation: Transformations help
           to transform the source data
       according to the requirements of
        target system. Sorting, Filtering,
    Aggregation, Joining are some of the
           examples of transformation.
   Transformations ensure the quality of
   the data being loaded into target and
         this is done during the mapping
          process from source to target.
• Workflow Manager: Workflow helps to
   load the data from source to target in
    a sequential manner. For example, if
    the fact tables are loaded before the
   lookup tables, then the target system
      will pop up an error message since
    the fact table is violating the foreign
           key validation. To avoid this,
     workflows can be created to ensure
     the correct flow of data from source
                     to target.
  • Workflow Monitor: This monitor is
   helpful in monitoring and tracking the
       workflows created in each Power
                  Center Server.
       • Power Center Connect: This
    component helps to extract data and
        metadata from ERP systems like
        IBM's MQSeries, Peoplesoft, SAP,
         Siebel etc. and other third party
     • Power Center Exchange: This
    component helps to extract data and
        metadata from ERP systems like
        IBM's MQSeries, Peoplesoft, SAP,
         Siebel etc. and other third party
            Power Exchange
  Informatica Power Exchange as a stand
 alone service or along with Power Center,
    helps organizations leverage data by
avoiding manual coding of data extraction
programs. Power Exchange supports batch,
    real time and changed data capture
   options in main frame(DB2, VSAM, IMS
etc.,), mid range (AS400 DB2 etc.,), and for
  relational databases (oracle, sql server,
  db2 etc) and flat files in unix, linux and
              windows systems.
            Power Channel
  This helps to transfer large amount of
encrypted and compressed data over LAN,
 WAN, through Firewalls, tranfer files over
                 FTP, etc.
         Meta Data Exchange
 Metadata Exchange enables organizations
  to take advantage of the time and effort
already invested in defining data structures
within their IT environment when used with
Power Center. For example, an organization
 may be using data modeling tools, such as
    Erwin, Embarcadero, Oracle designer,
 Sybase Power Designer etc for developing
data models. Functional and technical team
 should have spent much time and effort in
       creating the data model's data
   structures(tables, columns, data types,
   procedures, functions, triggers etc). By
   using meta deta exchange, these data
   structures can be imported into power
     center to identifiy source and target
 mappings which leverages time and effort.
 There is no need for informatica developer
to create these data structures once again.
            Power Analyzer
Power Analyzer provides organizations with
 reporting facilities. PowerAnalyzer makes
     accessing, analyzing, and sharing
enterprise data simple and easily available
to decision makers. PowerAnalyzer enables
to gain insight into business processes and
      develop business intelligence.
 With PowerAnalyzer, an organization can
     extract, filter, format, and analyze
corporate information from data stored in a
  data warehouse, data mart, operational
 data store, or otherdata storage models.
 PowerAnalyzer is best with a dimensional
data warehouse in a relational database. It
can also run reports on data in any table in
 a relational database that do not conform
         to the dimensional model.
               Super Glue
Superglue is used for loading metadata in a
  centralized place from several sources.
 Reports can be run against this superglue
           to analyze meta data.
              Power Mart
   Power Mart is a departmental version of
   Informatica for building, deploying, and
     managing data warehouses and data
  marts. Power center is used for corporate
enterprise data warehouse and power mart
 is used for departmental data warehouses
   like data marts. Power Center supports
      global repositories and networked
   repositories and it can be connected to
    several sources. Power Mart supports
  single repository and it can be connected
to fewer sources when compared to Power
Center. Power Mart can extensibily grow to
an enterprise implementation and it is easy
     for developer productivity through a
            codeless environment.
   Note:This is not a complete tutorial on
  Informatica. We will add more Tips and
 Guidelines on Informatica in near future.
    Informatica - Transformations
  In Informatica, Transformations help to
transform the source data according to the
   requirements of target system and it
   ensures the quality of the data being
            loaded into target.
 Transformations are of two types: Active
              and Passive.
        Active Transformation
 An active transformation can change the
number of rows that pass through it from
source to target i.e it eliminates rows that
       do not meet the condition in
       Passive Transformation
A passive transformation does not change
the number of rows that pass through it i.e
      it passes all rows through the
  Transformations can be Connected or
     Connected Transformation
Connected transformation is connected to
other transformations or directly to target
          table in the mapping.
    UnConnected Transformation
   An unconnected transformation is not
 connected to other transformations in the
    mapping. It is called within another
transformation, and returns a value to that
       List of Transformations
 Following are the list of Transformations
               available in
       • Aggregator Transformation
       • Expression Transformation
           • Filter Transformation
          • Joiner Transformation
         • Lookup Transformation
       • Normalizer Transformation
           • Rank Transformation
         • Router Transformation
   • Sequence Generator Transformation
    • Stored Procedure Transformation
          • Sorter Transformation
     • Update Strategy Transformation
   • XML Source Qualifier Transformation
      • Advanced External Procedure
        • External Transformation
          • Union Transformation
 In the following pages, we will explain all
the above Informatica Transformations and
  their significances in the ETL process in
     Aggregator Transformation
Aggregator transformation is an Active and
       Connected transformation. This
     transformation is useful to perform
  calculations such as averages and sums
(mainly to perform calculations on multiple
 rows or groups). For example, to calculate
 total of daily sales or to calculate average
   of monthly or yearly sales. Aggregate
   functions such as AVG, FIRST, COUNT,
PERCENTILE, MAX, SUM etc. can be used in
          aggregate transformation.
      Expression Transformation
Expression transformation is a Passive and
  Connected transformation. This can be
  used to calculate values in a single row
 before writing to the target. For example,
to calculate discount of each product or to
  concatenate first and last names or to
       convert date to a string field.
         Filter Transformation
   Filter transformation is an Active and
  Connected transformation. This can be
used to filter rows in a mapping that do not
meet the condition. For example, to know
  all the employees who are working in
Department 10 or to find out the products
that falls between the rate category $500
               and $1000.
         Joiner Transformation
  Joiner Transformation is an Active and
  Connected transformation. This can be
used to join two sources coming from two
different locations or from same location.
    For example, to join a flat file and a
relational source or to join two flat files or
   to join a relational source and a XML
source. In order to join two sources, there
must be at least one matching port. While
joining two sources it is a must to specify
  one source as master and the other as
detail. The Joiner transformation supports
        the following types of joins:
                  • Normal
               • Master Outer
                • Detail Outer
                 • Full Outer
 Normal join discards all the rows of data
from the master and detail source that do
   not match, based on the condition.
     Master outer join discards all the
 unmatched rows from the master source
   and keeps all the rows from the detail
  source and the matching rows from the
             master source.
Detail outer join keeps all rows of data from
 the master source and the matching rows
   from the detail source. It discards the
  unmatched rows from the detail source.
 Full outer join keeps all rows of data from
   both the master and detail sources.
        Lookup Transformation
Lookup transformation is Passive and it can
 be both Connected and UnConnected as
    well. It is used to look up data in a
relational table, view, or synonym. Lookup
  definition can be imported either from
       source or from target tables.
 For example, if we want to retrieve all the
    sales of a product with an ID 10 and
   assume that the sales data resides in
  another table. Here instead of using the
sales table as one more source, use Lookup
 transformation to lookup the data for the
     product, with ID 10 in sales table.
   Difference between Connected and
  UnConnected Lookup Transformation:
 Connected lookup receives input values
 directly from mapping pipeline whereas
UnConnected lookup receives values from:
       LKP expression from another
Connected lookup returns multiple columns
 from the same row whereas UnConnected
lookup has one return port and returns one
          column from each row.
 Connected lookup supports user-defined
  default values whereas UnConnected
  lookup does not support user defined
      Normalizer Transformation
Normalizer Transformation is an Active and
Connected transformation. It is used mainly
   with COBOL sources where most of the
    time data is stored in de-normalized
  format. Also, Normalizer transformation
can be used to create multiple rows from a
            single row of data.
         Rank Transformation
   Rank transformation is an Active and
  Connected transformation. It is used to
 select the top or bottom rank of data. For
 example, to select top 10 Regions where
the sales volume was very high or to select
         10 lowest priced products.
        Router Transformation
      Router is an Active and Connected
      transformation. It is similar to filter
transformation. The only difference is, filter
 transformation drops the data that do not
 meet the condition whereas router has an
option to capture the data that do not meet
  the condition. It is useful to test multiple
 conditions. It has input, output and default
   groups. For example, if we want to filter
       data like where State=Michigan,
  State=California, State=New York and all
    other States. It’s easy to route data to
               different tables..
Sequence Generator Transformation
  Sequence Generator transformation is a
Passive and Connected transformation. It is
 used to create unique primary key values
   or cycle through a sequential range of
    numbers or to replace missing keys.
     It has two output ports to connect
transformations. By default it has two fields
  CURRVAL and NEXTVAL(You cannot add
ports to this transformation). NEXTVAL port
    generates a sequence of numbers by
connecting it to a transformation or target.
CURRVAL is the NEXTVAL value plus one or
   NEXTVAL plus the Increment By value.
  Stored Procedure Transformation
    Stored Procedure transformation is a
  Passive and Connected & UnConnected
  transformation. It is useful to automate
time-consuming tasks and it is also used in
     error handling, to drop and recreate
   indexes and to determine the space in
  database, a specialized calculation etc.
   The stored procedure must exist in the
     database before creating a Stored
  Procedure transformation, and the stored
 procedure can exist in a source, target, or
any database with a valid connection to the
 Informatica Server. Stored Procedure is an
executable script with SQL statements and
 control statements, user-defined variables
   and conditional statements. In case of
stored procedure transformation procedure
    will be compiled and executed in a
relational data source. You need data base
connection to import the stored procedure
             in to your maping
        Sorter Transformation
 Sorter transformation is a Connected and
 an Active transformation. It allows to sort
  data either in ascending or descending
  order according to a specified field. Also
used to configure for case-sensitive sorting,
    and specify whether the output rows
             should be distinct.
   Source Qualifier Transformation
Source Qualifier transformation is an Active
   and Connected transformation. When
   adding a relational or a flat file source
    definition to a mapping, it is must to
       connect it to a Source Qualifier
    transformation. The Source Qualifier
     performs the various tasks such as
   overriding default SQL query, filtering
 records; join data from two or more tables
   Update Strategy Transformation
Update strategy transformation is an Active
and Connected transformation. It is used to
   update data in target table, either to
maintain history of data or recent changes.
You can specify how to treat source rows in
table, insert, update, delete or data driven.
XML Source Qualifier Transformation
  XML Source Qualifier is a Passive and
 Connected transformation. XML Source
Qualifier is used only with an XML source
definition. It represents the data elements
that the Informatica Server reads when it
  executes a session with XML sources.
    Advanced External Procedure
       Advanced External Procedure
transformation is an Active and Connected
 transformation. It operates in conjunction
with procedures, which are created outside
    of the Designer interface to extend
 PowerCenter/PowerMart functionality. It is
 useful in creating external transformation
     applications, such as sorting and
aggregation, which require all input rows to
 be processed before emitting any output
         Union Transformation
The union transformation is used to merge
 multiple datasets from various streams or
         pipelines into one dataset. This
 transformation works similar to the UNION
ALL, it does not remove any duplicate rows.
   It is recommended to use aggregator to
remove duplicates are not expected at the
 External Procedure Transformation
   External Procedure transformation is an
     Active and Connected/UnConnected
 transformations. Sometimes, the standard
     transformations such as Expression
     transformation may not provide the
  functionality that you want. In such cases
   External procedure is useful to develop
   complex functions within a dynamic link
library (DLL) or UNIX shared library, instead
    of creating the necessary Expression
        transformations in a mapping.
 Differences between Advanced External
    Procedure and External Procedure
 External Procedure returns single value,
 where as Advanced External Procedure
        returns multiple values.
  External Procedure supports COM and
  Informatica procedures where as AEP
  supports only Informatica Procedures