Informatica Mappping Design Tips by pcherukumalla


									Mapping Design


Use the PowerCenter tool suite to create an efficient execution environment.


Although PowerCenter environments vary widely, most sessions and/or mappings can
benefit from the implementation of common objects and optimization procedures. Follow
these procedures and rules of thumb when creating mappings to help ensure optimization.
Use mapplets to leverage the work of critical developers and minimize mistakes when
performing similar functions.

General Suggestions for Optimizing
   1. Reduce the number of transformations
   2. There is always overhead involved in moving data between transformations.
   3. Consider more shared memory for large number of transformations. Session
       shared memory between 12M and 40MB should suffice.
   4. Calculate once, use many times.
   5. Avoid calculating or testing the same value over and over.
   6. Calculate it once in an expression, and set a True/False flag.
   7. Within an expression, use variables to calculate a value used several times.
   8. Only connect what is used.
   9. Delete unnecessary links between transformations to minimize the amount of data
       moved, particularly in the Source Qualifier.
   10. This is also helpful for maintenance, if you exchange transformations (e.g., a
       Source Qualifier).
   11. Watch the data types.
           o The engine automatically converts compatible types.
           o Sometimes conversion is excessive, and happens on every transformation.
           o Minimize data type changes between transformations by planning data
               flow prior to developing the mapping.
   13. Facilitate reuse.
           o Plan for reusable transformations upfront.
           o Use variables.
           o Use mapplets to encapsulate multiple reusable transformations.
   15. Only manipulate data that needs to be moved and transformed.
       o    Delete unused ports particularly in Source Qualifier and Lookups.
            Reducing the number of records used throughout the mapping provides
            better performance
        o Use active transformations that reduce the number of records as early in
            the mapping as possible (i.e., placing filters, aggregators as close to source
            as possible).
        o Select appropriate driving/master table while using joins. The table with
            the lesser number of rows should be the driving/master table.
17. When DTM bottlenecks are identified and session optimization has not helped,
    use tracing levels to identify which transformation is causing the bottleneck (use
    the Test Load option in session properties).
18. Utilize single-pass reads.
        o Single-pass reading is the servers ability to use one Source Qualifier to
            populate multiple targets.
        o For any additional Source Qualifier, the server reads this source. If you
            have different Source Qualifiers for the same source (e.g., one for delete
            and one for update/insert), the server reads the source for each Source
        o Remove or reduce field-level stored procedures.
        o If you use field-level stored procedures, PowerMart has to make a call to
            that stored procedure for every row so performance will be slow.
20. Lookup Transformation Optimizing Tips
        o When your source is large, cache lookup table columns for those lookup
            tables of 500,000 rows or less. This typically improves performance by
        o The rule of thumb is not to cache any table over 500,000 rows. This is
            only true if the standard row byte count is 1,024 or less. If the row byte
            count is more than 1,024, then the 500k rows will have to be adjusted
            down as the number of bytes increase (i.e., a 2,048 byte row can drop the
            cache row count to 250K 300K, so the lookup table will not be cached in
            this case).
        o When using a Lookup Table Transformation, improve lookup
            performance by placing all conditions that use the equality operator = first
            in the list of conditions under the condition tab.
        o Cache only lookup tables if the number of lookup calls is more than 10-
            20% of the lookup table rows. For fewer number of lookup calls, do not
            cache if the number of lookup table rows is big. For small lookup tables,
            less than 5,000 rows, cache for more than 5-10 lookup calls.
        o Replace lookup with decode or IIF (for small sets of values).
        o If caching lookups and performance is poor, consider replacing with an
            unconnected, uncached lookup
        o For overly large lookup tables, use dynamic caching along with a
            persistent cache. Cache the entire table to a persistent file on the first run,
            enable update else insert on the dynamic cache and the engine will never
             have to go back to the database to read data from this table. It would then
             also be possible to partition this persistent cache at run time for further
             performance gains
         o Review complex expressions.
22. Examine mappings via Repository Reporting and Dependency Reporting within
    the mapping.
23. Minimize aggregate function calls.
24. Replace Aggregate Transformation object with an Expression Transformation
    object and an Update Strategy Transformation for certain types of Aggregations.
25. Operations and Expression Optimizing Tips
         o Numeric operations are faster than string operations.
         o Optimize char-varchar comparisons (i.e., trim spaces before comparing).
         o Operators are faster than functions (i.e., || vs. CONCAT).
         o Optimize IIF expressions.
         o Avoid date comparisons in lookup; replace with string.
         o Test expression timing by replacing with constant
27. Use Flat Files
         o Using flat files located on the server machine loads faster than a database
             located in the server machine
         o Fixed-width files are faster to load than delimited files because delimited
             files require extra parsing
         o If processing intricate transformations, consider loading first to a source
             flat file into a relational database, which allows the PowerCenter
             mappings to access the data in an optimized fashion by using filters and
             custom SQL Selects where appropriate
29. If working with data that is not able to return sorted data (e.g., Web Logs)
    consider using the Sorter Advanced External Procedure.
30. Use a Router Transformation to separate data flows instead of multiple Filter
31. Use a Sorter Transformation or hash-auto keys partitioning before an Aggregator
    Transformation to optimize the aggregate. With a Sorter Transformation, the
    Sorted Ports option can be used even if the original source cannot be ordered
32. Use a Normalizer Transformation to pivot rows rather than multiple instances of
    the same Target
33. Rejected rows from an Update Strategy are logged to the Bad File. Consider
    filtering if retaining these rows is not critical because logging causes extra
    overhead on the engine
34. When using a Joiner Transformation, be sure to make the source with the smallest
    amount of data the Master source
35. If an update override is necessary in a load, consider using a lookup
    transformation just in front of the target to retrieve the primary key. The primary
    key update will be much faster than the non-indexed lookup override
Suggestions for Using Mapplets
A mapplet is a reusable object that represents a set of transformations. It allows you to
reuse transformation logic and can contain as many transformations as necessary. Use the
Mapplet Designer to create mapplets.

   1. Create a mapplet when you want to use a standardized set of transformation logic
      in several mappings. For example, if you have several fact tables that require a
      series of dimension keys, you can create a mapplet containing a series of Lookup
      transformations to find each dimension key. You can then use the mapplet in each
      fact table mapping, rather than recreate the same lookup logic in each mapping.
   2. To create a mapplet, add, connect, and configure transformations to complete the
      desired transformation logic. After you save a mapplet, you can use it in a
      mapping to represent the transformations within the mapplet. When you use a
      mapplet in a mapping, you use an instance of the mapplet. All uses of a mapplet
      are all tied to the parent mapplet. Hence, all changes made to the parent mapplet
      logic are inherited by every child instance of the mapplet. When the server runs a
      session using a mapplet, it expands the mapplet. The server then runs the session
      as it would any other session, passing data through each transformation in the
      mapplet as designed.
   3. A mapplet can be active or passive depending on the transformations in the
      mapplet. Active mapplets contain at least one active transformation. Passive
      mapplets only contain passive transformations. Being aware of this property when
      using mapplets can save time when debugging invalid mappings.
   4. There are several unsupported transformations that should not be used in a
      mapplet, these include: COBOL source definitions, joiner, normalizer, non-
      reusable sequence generator, pre- or post-session stored procedures, target
      definitions, and PowerMart 3.5 style lookup functions
   5. Do not reuse mapplets if you only need one or two transformations of the mapplet
      while all other calculated ports and transformations are obsolete
   6. Source data for a mapplet can originate from one of two places:
           o Sources within the mapplet. Use one or more source definitions
               connected to a Source Qualifier or ERP Source Qualifier transformation.
               When you use the mapplet in a mapping, the mapplet provides source data
               for the mapping and is the first object in the mapping data flow.
           o Sources outside the mapplet. Use a mapplet Input transformation to
               define input ports. When you use the mapplet in a mapping, data passes
               through the mapplet as part of the mapping data flow.
   8. To pass data out of a mapplet, create mapplet output ports. Each port in an Output
      transformation connected to another transformation in the mapplet becomes a
      mapplet output port.
           o Active mapplets with more than one Output transformations. You
               need one target in the mapping for each Output transformation in the
               mapplet. You cannot use only one data flow of the mapplet in a mapping.
o   Passive mapplets with more than one Output transformations. Reduce
    to one Output Transformation otherwise you need one target in the
    mapping for each Output transformation in the mapplet. This means you
    cannot use only one data flow of the mapplet in a mapping.

To top