Cache

					Welcome to the World of “Cache”
           The Hidden Agenda
a) Basics of Cache
      1) Memory Cache
      2) Where the cache files are created
      3) Naming Conventions
      4) Cache Calculations
b) Advanced Cache
      1) Lookup Cache
      2) Aggregator Cache
      3) Joiner Cache
      4) Ranker Cache
               Let’s get to the Basics:
Cache is a combination of:

1)   Index Cache: the server stores key values or condition values, which it uses to find
     rows quickly.
2)   Data Cache: the server stores output values.

Cache Storage Overview:

•    For Index Caches:
     a) Aggregators store group-by values from the group-by ports.
     b) Rankers store group-by values.
     c) Joiners store index values for the master source (the join condition columns).
     d) Lookups store lookup condition information.

•    For Data Caches:
     a) Aggregators store aggregate data based on the group-by ports (variable ports,
        output ports, and non-group-by ports).
     b) Rankers store ranking information based on the group-by ports (output columns
        other than the ranked column).
     c) Joiners store master source rows (output columns not in the join condition).
     d) Lookups store lookup data that is not stored in the index cache.
                     Memory Cache :

•   The server creates a memory cache based on the size specified in the session
    properties. You can set this size manually, based on the calculations described later.

•   By default, the PowerCenter Server allocates 1,000,000 bytes to the index cache and
    2,000,000 bytes to the data cache for each transformation instance.

•   If the PowerCenter Server cannot allocate the configured amount of cache memory,
    it cannot initialize the session and the session fails.

•   If the PowerCenter Server requires more memory than the configured cache size,
    it pages to disk. Since paging to disk can slow session performance, configure
    the index and data cache sizes so that the data fits in memory.
            Where Are the Cache Files Created?
•      The PowerCenter Server creates the index and data cache files by default in the PowerCenter
       Server variable directory, $PMCacheDir.

•      If you do not define $PMCacheDir, the PowerCenter Server saves the files in the PMCache
       directory specified in the UNIX configuration file or the cache directory in the Windows
       registry. If the UNIX PowerCenter Server does not find a directory there, it creates the index
       and data files in the installation directory. If the PowerCenter Server on Windows does not
       find a directory there, it creates the files in the system directory.

•      If a cache file handles more than 2 GB of data, the PowerCenter Server creates multiple
       index and data files. When creating these files, the PowerCenter Server appends an overflow
       index to the end of the filename, such as PMAGG*.idx.1 and PMAGG*.idx.2. The number of index
       and data files is limited only by the amount of disk space available in the cache directory.

Cache files remain after the session completes in three cases:

•      a) The session performs incremental aggregation.
•      b) You configure the Lookup transformation to use a persistent cache.
•      c) The session does not complete successfully.
          Naming Convention Followed by the Informatica Server:
•   [<Name Prefix> | <Prefix> <session ID>_<transformation ID>]_[partition
    index]<suffix>.[overflow index]

•   For example, the file name PMLKUP8_4_2.idx breaks down as:

PMLKUP: the transformation type (Lookup)
8: the session ID
4: the transformation ID
2: the partition index
File Name Component   Description
Name Prefix           Cache file name prefix configured in the Lookup transformation.
Prefix                Describes the type of transformation:
                      Aggregator transformation is PMAGG.
                      Joiner transformation is PMJNR.
                      Lookup transformation is PMLKUP.
                      Rank transformation is PMAGG.
Session ID            Session instance ID number.
Transformation ID     Transformation instance ID number.
Partition Index       If the session contains more than one partition, this identifies the
                      partition number. The partition index is zero-based, so the first
                      partition has no partition index. Partition index 2 indicates a cache
                      file created in the third partition.
Suffix                Identifies the type of file:
                      Index file is .idx.
                      Data file is .dat.
Overflow Index        If a cache file handles more than 2 GB of data, the PowerCenter Server
                      creates multiple index and data files. When creating these files, the
                      PowerCenter Server appends an overflow index to the filename, such as
                      PMAGG*.idx.1 and PMAGG*.idx.2. The number of index and data files is
                      limited by the amount of disk space available in the cache directory.
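The naming rules above are regular enough to parse mechanically, which can help when scanning $PMCacheDir for files left behind by a failed session. The Python sketch below is illustrative only: `parse_cache_file_name` and `CacheFileName` are hypothetical names, it handles only the default prefixes (not a Name Prefix configured on a Lookup), and it assumes that when a file has no partition index the `_<partition index>` part is simply absent.

```python
import re
from typing import NamedTuple, Optional

# Default prefixes from the table; Rank shares PMAGG with Aggregator.
PREFIX_TO_TYPE = {"PMAGG": "Aggregator/Rank", "PMJNR": "Joiner", "PMLKUP": "Lookup"}

# Assumption: with no partition index, the "_<partition index>" part is absent.
CACHE_FILE_RE = re.compile(
    r"^(?P<prefix>PMAGG|PMJNR|PMLKUP)"
    r"(?P<session_id>\d+)_(?P<transformation_id>\d+)"
    r"(?:_(?P<partition_index>\d+))?"
    r"\.(?P<suffix>idx|dat)"
    r"(?:\.(?P<overflow_index>\d+))?$"
)


class CacheFileName(NamedTuple):
    transformation_type: str
    session_id: int
    transformation_id: int
    partition_index: Optional[int]  # zero-based; None when absent (first partition)
    file_type: str                  # "index" (.idx) or "data" (.dat)
    overflow_index: Optional[int]   # None unless the file overflowed past 2 GB


def _optional_int(value: Optional[str]) -> Optional[int]:
    return int(value) if value is not None else None


def parse_cache_file_name(name: str) -> CacheFileName:
    """Split a default-prefix cache file name into its components."""
    match = CACHE_FILE_RE.match(name)
    if match is None:
        raise ValueError(f"not a default-prefix cache file name: {name!r}")
    return CacheFileName(
        transformation_type=PREFIX_TO_TYPE[match.group("prefix")],
        session_id=int(match.group("session_id")),
        transformation_id=int(match.group("transformation_id")),
        partition_index=_optional_int(match.group("partition_index")),
        file_type="index" if match.group("suffix") == "idx" else "data",
        overflow_index=_optional_int(match.group("overflow_index")),
    )


info = parse_cache_file_name("PMLKUP8_4_2.idx")
print(info.transformation_type, info.session_id, info.transformation_id, info.partition_index)
# Lookup 8 4 2
```

Overflow files such as `PMAGG12_3.dat.2` parse the same way, with `overflow_index` set to 2.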
                 Cache Calculations
•    Aggregator:
    Index size: (Sum of column sizes in group-by ports + 17) X number of groups.
    Data Size: (Sum of column sizes of output ports + 7) X number of groups.

•    Rank:
    Index size: (Sum of column sizes in group-by ports + 17) X number of groups.
    Data Size: [Number of ranks X (Sum of column sizes of output ports + 10) + 20] X number of groups.

•    Joiner:
    Index Size: (Sum of master column sizes in join condition + 16) X number of rows in
    master table.
    Data Size: (Sum of master column sizes NOT in join condition but on output ports
    + 8) X number of rows in master table.

•    Lookup:
    Maximum Index Size: # rows in lookup table X [(Σ column size) + 16] X 2
    Data Size: # rows in lookup table X [(Σ column size) + 8]
Datatype                                            Aggregator, Rank (bytes)   Joiner, Lookup (bytes)
Binary                                              precision + 2              precision + 8, rounded to the nearest multiple of 8
Date/Time                                           18                         24
Decimal, high precision off (all precision)         10                         16
Decimal, high precision on (precision <= 18)        18                         24
Decimal, high precision on (precision > 18, <= 28)  22                         32
Decimal, high precision on (precision > 28)         10                         16
Decimal, high precision on (negative scale)         10                         16
Double                                              10                         16
Real                                                10                         16
Integer                                             6                          16
String                                              ASCII mode: precision + 3  ASCII mode: precision + 9
Small integer                                       6                          16
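The table can be encoded as a small function that turns a port's datatype into the column size the formulas need. This is a minimal Python sketch: `column_size` and the family constants are hypothetical names, only ASCII-mode strings are covered, and the Binary rule for Joiner and Lookup is read as rounding up to a multiple of 8, which is an assumption (the table only says "nearest").

```python
import math

AGG_RANK = "agg_rank"        # Aggregator, Rank column of the table
JOIN_LOOKUP = "join_lookup"  # Joiner, Lookup column of the table

# Datatypes with a fixed size: (Aggregator/Rank, Joiner/Lookup)
FIXED_SIZES = {
    "date/time": (18, 24),
    "double": (10, 16),
    "real": (10, 16),
    "integer": (6, 16),
    "small integer": (6, 16),
}


def column_size(datatype: str, family: str, precision: int = 0,
                high_precision: bool = False, scale: int = 0) -> int:
    """Per-column cache size in bytes, following the table above (ASCII mode only)."""
    if family not in (AGG_RANK, JOIN_LOOKUP):
        raise ValueError(f"unknown transformation family: {family}")
    col = 0 if family == AGG_RANK else 1
    if datatype in FIXED_SIZES:
        return FIXED_SIZES[datatype][col]
    if datatype == "string":
        return precision + (3, 9)[col]
    if datatype == "binary":
        if family == AGG_RANK:
            return precision + 2
        # Assumption: "round to nearest multiple of 8" is read as rounding up.
        return math.ceil((precision + 8) / 8) * 8
    if datatype == "decimal":
        if not high_precision or scale < 0 or precision > 28:
            return (10, 16)[col]
        if precision <= 18:
            return (18, 24)[col]
        return (22, 32)[col]
    raise ValueError(f"unknown datatype: {datatype}")


# String(23) in a Joiner: 23 + 9 = 32; the same column in a Rank: 23 + 3 = 26.
print(column_size("string", JOIN_LOOKUP, precision=23), column_size("string", AGG_RANK, precision=23))
# 32 26
```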
           Lookup Caches Overview

•   The PowerCenter Server builds a cache in memory when it processes the
    first row of data in a cached Lookup transformation.
•   It allocates memory for the cache based on the amount you configure in the
    transformation or session properties.
•   The PowerCenter Server stores condition values in the index cache and
    output values in the data cache.
•   The PowerCenter Server queries the cache for each row that enters the
    transformation.
•   The PowerCenter Server also creates cache files by default in
    $PMCacheDir.
•   If the data does not fit in the memory cache, the PowerCenter Server stores
    the overflow values in the cache files. When the session completes, the
    PowerCenter Server releases cache memory and deletes the cache files
    unless you configure the Lookup transformation to use a persistent cache.
             Types of Lookup Cache
•   When configuring a lookup cache, you can specify any of the following options:
•   Persistent cache. You can save the lookup cache files and reuse them the next time the
    PowerCenter Server processes a Lookup transformation configured to use the cache.
•   Recache from source. If the persistent cache is not synchronized with the lookup table, you can
    configure the Lookup transformation to rebuild the lookup cache.
•   Static cache. You can configure a static, or read-only, cache for any lookup source. By default,
    the PowerCenter Server creates a static cache. It caches the lookup file or table and looks up
    values in the cache for each row that comes into the transformation. When the lookup condition is
    true, the PowerCenter Server returns a value from the lookup cache. The PowerCenter Server
    does not update the cache while it processes the Lookup transformation.
•   Dynamic cache. If you want to cache the target table and insert new rows or update existing rows
    in the cache and the target, you can create a Lookup transformation to use a dynamic cache. The
    PowerCenter Server dynamically inserts or updates data in the lookup cache and passes data
    to the target table. You cannot use a dynamic cache with a flat file lookup.
•   For example, suppose your lookup table is also your target table. When you configure the Lookup
    transformation to use a dynamic cache, it looks up each incoming row. If there is no match, it
    inserts the row into both the target and the lookup cache (hence the name "dynamic": the cache
    builds up as rows pass through). If there is a match, it updates the row in the target. A static
    cache, on the other hand, does not get updated during the lookup.
•   Shared cache. You can share the lookup cache between multiple transformations. You can
    share an unnamed cache between transformations in the same mapping. You can share a named
    cache between transformations in the same or different mappings.
        Calculating the Lookup Index Cache
•   The lookup index cache holds data for the columns used in the lookup
    condition.
•   The formula for calculating the minimum lookup index cache size is different
    from the formula for the maximum size.
•   For best session performance, specify the maximum lookup index cache
    size.
•   Calculating the minimum lookup index cache:
    200 * [(Σ column size) + 16]
    Columns: columns in the lookup condition.
•   The minimum size for a lookup index cache is independent of the number of
    source rows.
•   Calculating the maximum lookup index cache:
    # rows in lookup table * [(Σ column size) + 16] * 2
    Columns: columns in the lookup condition.
                    Difference between Static and Dynamic Cache
Static cache:
•   You cannot insert or update the cache.

•   The Informatica server returns a value from the lookup table or cache when the
    condition is true. When the condition is not true, the Informatica server returns the default
    value for connected transformations and null for unconnected transformations.

•   You can use a relational or flat file lookup.

Dynamic cache:

•   You can insert rows into the cache as you pass them to the target.

•   The Informatica server inserts rows into the cache when the condition is false. This
    indicates that the row is not in the cache or target table. You can pass these rows to
    the target table.

•   You can use a relational lookup only.
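The behavioural difference can be modelled with two toy classes. This is a hypothetical Python sketch, not PowerCenter code: dictionaries stand in for the lookup source and the target table, `StaticLookupCache` and `DynamicLookupCache` are made-up names, and the dynamic cache is assumed to be configured to both insert new rows and update changed ones.

```python
from typing import Dict, Hashable, Optional


class StaticLookupCache:
    """Read-only cache: built from the lookup source on first use, never updated afterwards."""

    def __init__(self, lookup_source: Dict[Hashable, str], default: Optional[str] = None):
        self._source = lookup_source
        self._cache: Optional[Dict[Hashable, str]] = None
        # Connected lookup: the configured default value; unconnected lookup: None (null).
        self._default = default

    def lookup(self, key: Hashable) -> Optional[str]:
        if self._cache is None:            # build the cache when the first row arrives
            self._cache = dict(self._source)
        return self._cache.get(key, self._default)


class DynamicLookupCache:
    """Cache of the target table: rows are inserted or updated as they pass through."""

    def __init__(self, target_rows: Dict[Hashable, str]):
        self.target = dict(target_rows)    # stands in for the relational target table
        self.cache = dict(target_rows)

    def process(self, key: Hashable, value: str) -> str:
        if key not in self.cache:          # condition false: row is not in cache or target
            action = "insert"
        elif self.cache[key] != value:
            action = "update"
        else:
            return "unchanged"
        self.cache[key] = value            # keep the cache in step with the target
        self.target[key] = value
        return action


static = StaticLookupCache({101: "PROMO_A"}, default="NONE")
print(static.lookup(101), static.lookup(999))  # PROMO_A NONE

dynamic = DynamicLookupCache({101: "PROMO_A"})
print(dynamic.process(202, "PROMO_B"), dynamic.process(101, "PROMO_C"))  # insert update
```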
•   Example:

•   The Lookup transformation, LKP_PROMOS, looks up values based on the
    ITEM_ID. It uses the following lookup condition:

•   ITEM_ID = IN_ITEM_ID1

•   ITEM_ID column size Column in lookup condition integer = 16

•   The lookup condition uses one column, ITEM_ID, and the table contains
    60,000 rows.
•   Use the following calculation to determine the minimum index cache
    requirements:
•   200 * (16 + 16) = 6,400
•   Use the following calculation to determine the maximum index cache
    requirements:
•   60,000 * (16 + 16) * 2 = 3,840,000
•   Therefore, this Lookup transformation requires an index cache size between
    6,400 and 3,840,000 bytes.
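As a quick check of the two formulas, here is a small Python function that returns the minimum and maximum index cache sizes. `lookup_index_cache_bytes` is a hypothetical helper, and the column sizes are taken from the datatype table rather than read from a repository.

```python
def lookup_index_cache_bytes(condition_column_sizes, lookup_rows):
    """Return (minimum, maximum) lookup index cache sizes in bytes."""
    per_row = sum(condition_column_sizes) + 16
    minimum = 200 * per_row               # independent of the number of rows
    maximum = lookup_rows * per_row * 2
    return minimum, maximum


# LKP_PROMOS: ITEM_ID (Integer, 16 bytes) in the condition, 60,000 lookup rows
print(lookup_index_cache_bytes([16], 60_000))  # (6400, 3840000)
```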
           Calculating the Lookup Data Cache
•   In a connected transformation, the data cache contains data for the
    connected output ports, not including ports used in the lookup condition.
    In an unconnected transformation, the data cache contains data from the
    return port.
•   1) PROMOTION_ID: connected output port not in the lookup condition,
    Integer, column size 16
•   2) DISCOUNT: connected output port not in the lookup condition,
    Decimal, column size 16
•   The lookup table has 60,000 rows.
•   Use the following calculation to determine the minimum data cache
    requirements:
•   60,000 * (32 + 8) = 2,400,000
•   This Lookup transformation requires a data cache size of 2,400,000 bytes.
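The data cache formula fits in one line of Python. `lookup_data_cache_bytes` is an illustrative name; it expects the sizes of the connected output ports that are not in the lookup condition (or of the return port, for an unconnected lookup).

```python
def lookup_data_cache_bytes(output_column_sizes, lookup_rows):
    """Data cache bytes: # rows in lookup table * [(sum of column sizes) + 8]."""
    return lookup_rows * (sum(output_column_sizes) + 8)


# PROMOTION_ID (Integer, 16) and DISCOUNT (Decimal, 16), 60,000 lookup rows
print(lookup_data_cache_bytes([16, 16], 60_000))  # 2400000
```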
                  Aggregator Cache
•   When the PowerCenter Server runs a session with an Aggregator
    transformation, it stores data in memory until it completes the aggregation.

•   If you use incremental aggregation, the PowerCenter Server saves the
    cache files in the cache file directory.

     Note: The PowerCenter Server uses memory to process an Aggregator
    transformation with sorted ports. It does not use cache memory. You do not
    need to configure cache memory for Aggregator transformations that use
    sorted ports.
       Configuring the Session for Incremental Aggregation
•   Use the following guidelines when you configure the session for incremental
    aggregation:

•   Verify the location where you want to store the aggregate files.
•   If you want the PowerCenter Server to write the incremental aggregation
    cache file names in the session log, configure the session with Verbose Init
    tracing.
•   Verify the incremental aggregation settings in the session properties.
    You can configure the session for incremental aggregation in the
    Performance settings on the Properties tab.
•   You can also configure the session to reinitialize the aggregate cache. If
    you choose to reinitialize the cache, the Workflow Manager displays a
    warning indicating that the PowerCenter Server overwrites the existing cache,
    and a reminder to clear this option after running the session.
      Calculating the Aggregator Index Cache
The index cache holds group information from the group-by ports.
   # groups * [(Σ column size) + 17]
   Columns: group-by columns
In the example,
STORE_ID: Integer, size 6
ITEM: String, size 18
Therefore, total column size = 6 + 18 = 24.
Assuming there are 72,000 groups, the minimum index cache calculation is:
72,000 * (24 + 17) = 2,952,000
The maximum index cache calculation is double that amount:
2,952,000 * 2 = 5,904,000
Therefore, this Aggregator transformation requires an index cache size between
2,952,000 and 5,904,000 bytes.
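The same calculation as a Python sketch. `aggregator_index_cache_bytes` is a hypothetical helper; it doubles the minimum to get the maximum, as the worked example does.

```python
def aggregator_index_cache_bytes(group_by_column_sizes, groups):
    """Return (minimum, maximum) Aggregator index cache sizes in bytes."""
    minimum = groups * (sum(group_by_column_sizes) + 17)
    return minimum, 2 * minimum


# STORE_ID (Integer, 6) and ITEM (String, 18), 72,000 groups
print(aggregator_index_cache_bytes([6, 18], 72_000))  # (2952000, 5904000)
```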
             Calculating the Aggregator Data Cache
•   The data cache holds row data for variable ports and connected output ports. As a result, the data
    cache is generally larger than the index cache. To reduce the data cache size, connect only the
    necessary input/output ports to subsequent transformations. Use the following information to
    calculate the minimum aggregate data cache size:
•   # groups * [(Σ column size) + 7]
•   Column size covers:  a) Non-group-by input/output ports.
                         b) Local variable ports.
                         c) Ports containing an aggregate
                            function (multiply by three).*


In the example,
ORDER_ID: Integer, 6
SALES_PER_STORE_ITEMS: Decimal, 30* (10 * 3, because the port contains an aggregate function)
Total = 36
The total number of groups, as calculated for the index cache size, is 72,000. Use the following
    calculation to determine the minimum data cache requirements:
•   72,000 * (36 + 7) = 3,096,000
•   Therefore, this Aggregator transformation requires a data cache size of 3,096,000 bytes.
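A Python sketch of the data cache formula, keeping aggregate-function ports separate so that the multiply-by-three rule applies to them only. `aggregator_data_cache_bytes` and its parameter names are hypothetical.

```python
def aggregator_data_cache_bytes(plain_column_sizes, aggregate_column_sizes, groups):
    """Minimum Aggregator data cache in bytes.

    plain_column_sizes: non-group-by input/output ports and local variable ports.
    aggregate_column_sizes: ports containing an aggregate function (counted three times).
    """
    total = sum(plain_column_sizes) + 3 * sum(aggregate_column_sizes)
    return groups * (total + 7)


# ORDER_ID (Integer, 6) plus SALES_PER_STORE_ITEMS (Decimal, 10, counted 3x), 72,000 groups
print(aggregator_data_cache_bytes([6], [10], 72_000))  # 3096000
```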
                           Joiner Cache
•   When using the Joiner cache, the PowerCenter Server first reads the data from the master
    source and builds the index and data caches from the master rows. After building the cache,
    the PowerCenter Server performs the join based on the detail source data and the
    cache data.
•   The server creates the index cache as it reads the master source into the data cache.
    The server uses the index cache to test the join condition. When it finds a match, it
    retrieves row values from the data cache.
•   The PowerCenter Server caches all master rows with a unique key in the index
    cache, and all master rows in the data cache.
•   For instance,
    Index cache. The PowerCenter Server caches 100 master rows with unique keys.
    Data cache. The PowerCenter Server caches the master rows in the data cache that
    correspond to the 100 rows in the index cache. The number of rows it stores in the
    data cache depends on the data. For example, if every master row contains a unique
    key, the PowerCenter Server stores 100 rows in the data cache. However, if the
    master data contains multiple rows with the same key, the PowerCenter Server
    stores more than 100 rows in the data cache.
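The index/data split described above behaves much like the build side of a hash join. The following Python sketch is a simplified model, not the server's implementation: `build_joiner_cache` and `join_detail` are hypothetical names, rows are plain dictionaries, and only a normal join is modelled (detail rows with no matching master key are dropped).

```python
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def build_joiner_cache(master_rows: Iterable[Row], key: str) -> Tuple[Dict[Hashable, int], List[List[Row]]]:
    """Index cache: one entry per unique join key. Data cache: every master row, grouped by key."""
    index_cache: Dict[Hashable, int] = {}   # join key -> slot in the data cache
    data_cache: List[List[Row]] = []
    for row in master_rows:
        if row[key] not in index_cache:      # first master row with this key
            index_cache[row[key]] = len(data_cache)
            data_cache.append([])
        data_cache[index_cache[row[key]]].append(row)
    return index_cache, data_cache


def join_detail(detail_rows: Iterable[Row], key: str,
                index_cache: Dict[Hashable, int], data_cache: List[List[Row]]) -> Iterator[Row]:
    """Normal join: test each detail row against the index cache, then fetch master rows."""
    for detail in detail_rows:
        slot = index_cache.get(detail[key])
        if slot is None:
            continue                         # no match: the detail row is dropped
        for master in data_cache[slot]:
            yield {**master, **detail}


masters = [
    {"ITEM_NO": 1, "ITEM_NAME": "pen"},
    {"ITEM_NO": 2, "ITEM_NAME": "ink"},
    {"ITEM_NO": 1, "ITEM_NAME": "pen refill"},  # duplicate key
]
index_cache, data_cache = build_joiner_cache(masters, "ITEM_NO")
print(len(index_cache), sum(len(rows) for rows in data_cache))  # 2 3
for joined in join_detail([{"ITEM_NO": 1, "QTY": 5}], "ITEM_NO", index_cache, data_cache):
    print(joined)
```

With a duplicate master key, the index cache keeps one entry while the data cache keeps both rows, mirroring the point that the data cache can hold more rows than the index cache.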
                  Joiner Index Cache Calculation
The index cache holds rows from the master source that are in the join
   condition.

# master rows * [(Σ column size) + 16]
Column size: master columns in the join condition.

In the example, the transformation joins the sources ORDERS and PRODUCTS on ITEM_NO:
• ITEM_NO: Decimal(10), column size 16

•   PRODUCTS is the master source and has 90,000 rows. Use the following
    calculation to determine the minimum index cache requirements:
•   90,000 * (16 + 16) = 2,880,000
•   Double the size to determine the maximum index cache requirements:
•   2,880,000 * 2 = 5,760,000
•   Therefore, this Joiner transformation requires an index cache size between
    2,880,000 and 5,760,000 bytes.
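A Python check of the Joiner index figures. `joiner_index_cache_bytes` is a hypothetical helper; it uses the +16 per-row overhead from the Cache Calculations summary, which is also what the worked numbers above use.

```python
def joiner_index_cache_bytes(join_column_sizes, master_rows):
    """Return (minimum, maximum) Joiner index cache sizes in bytes."""
    minimum = master_rows * (sum(join_column_sizes) + 16)
    return minimum, 2 * minimum


# ITEM_NO (Decimal(10), 16 bytes) joins ORDERS to PRODUCTS; PRODUCTS has 90,000 rows
print(joiner_index_cache_bytes([16], 90_000))  # (2880000, 5760000)
```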
                     Joiner Data Cache Calculation
•   The data cache holds rows from the master source until the PowerCenter Server
    joins the data.
•   # master rows * [(Σ column size) + 8]
•   Column size: master columns not in the join condition and used for output.
•   In the example, the connected output ports for JNR_ORDERS_PRODUCTS are:
•   ITEM_NAME: String(23), column size 23 + 9 = 32
•   PRODUCT_CATEGORY: String(21), column size 21 + 9 = 30
•   Total column size = 62
•   The master source has 90,000 rows.
•   Use the following calculation to determine the minimum data cache requirements:
•   90,000 * (62 + 8) = 6,300,000
•   This Joiner transformation requires a data cache size of 6,300,000 bytes.
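The data cache figure, again as a hypothetical Python helper that takes the sizes of the master output columns that are not in the join condition:

```python
def joiner_data_cache_bytes(output_column_sizes, master_rows):
    """Data cache bytes: # master rows * [(sum of column sizes) + 8]."""
    return master_rows * (sum(output_column_sizes) + 8)


# ITEM_NAME (32) and PRODUCT_CATEGORY (30), 90,000 master rows
print(joiner_data_cache_bytes([32, 30], 90_000))  # 6300000
```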
                            Rank Caches
•   When the PowerCenter Server runs a session with a Rank transformation, it
    compares an input row with rows in the data cache. If the input row out-ranks a
    stored row, the PowerCenter Server replaces the stored row with the input row.
•   For example, you configure a Rank transformation to find the top three sales. The
    PowerCenter Server reads the following input data:
•   SALES
•   10,000
•   12,210
•   5,000
•   2,455
•   6,324
•   The PowerCenter Server caches the first three rows (10,000, 12,210, and 5,000).
    When the PowerCenter Server reads the next row (2,455), it compares it to the cached
    values. Since the row is lower in rank than all the cached rows, it discards the row with
    2,455. The next row (6,324), however, is higher in rank than one of the cached rows (5,000).
    Therefore, the PowerCenter Server replaces the cached 5,000 row with the higher-ranked
    input row.
•   If the Rank transformation is configured to rank across multiple groups, the
    PowerCenter Server ranks incrementally for each group it finds.
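The replacement logic above is essentially a bounded top-N buffer. This Python sketch uses a min-heap (`heapq`) so the lowest-ranked cached row is always the one compared and evicted; `top_n` is a hypothetical helper, and the heap is this sketch's choice, not necessarily how the PowerCenter Server organizes its data cache. For a Rank transformation with group-by ports, one such cache would be kept per group.

```python
import heapq
from typing import Iterable, List


def top_n(values: Iterable[int], n: int) -> List[int]:
    """Keep the n highest values, replacing the lowest cached value when an input outranks it."""
    cache: List[int] = []                     # min-heap: cache[0] is the lowest-ranked cached value
    for value in values:
        if len(cache) < n:
            heapq.heappush(cache, value)      # fill the cache with the first n rows
        elif value > cache[0]:
            heapq.heapreplace(cache, value)   # input outranks a cached row: replace the lowest
        # otherwise the input row is discarded
    return sorted(cache, reverse=True)


print(top_n([10_000, 12_210, 5_000, 2_455, 6_324], 3))  # [12210, 10000, 6324]
```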
               Calculating the Rank Index Cache
•   The index cache holds group information from the group-by ports. Use
    the following information to calculate the minimum rank index cache size:
•   Rank index calculation:
•   # groups * [(Σ column size) + 17]
•   Columns: group-by columns.
•   PRODUCT_CATEGORY: String(21), column size 24
•   There are 10,000 product categories, so the total number of groups is
    10,000. Use the following calculation to determine the minimum index cache
    requirements:
•   10,000 * (24 + 17) = 410,000
•   Double the size to determine the maximum index cache requirements:
•   410,000 * 2 = 820,000
•   Therefore, this Rank transformation requires an index cache size between
    410,000 and 820,000 bytes.
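The Rank index cache has the same shape as the Aggregator index cache (+17 per group, doubled for the maximum). A hypothetical Python helper reproducing the PRODUCT_CATEGORY example:

```python
def rank_index_cache_bytes(group_by_column_sizes, groups):
    """Return (minimum, maximum) Rank index cache sizes in bytes."""
    minimum = groups * (sum(group_by_column_sizes) + 17)
    return minimum, 2 * minimum


# PRODUCT_CATEGORY (String(21), 24 bytes), 10,000 groups
print(rank_index_cache_bytes([24], 10_000))  # (410000, 820000)
```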
              Calculating the Rank Data Cache
•   The data cache size is proportional to the number of ranks. It holds row data
    until the PowerCenter Server completes the ranking and is generally larger
    than the index cache. To reduce the data cache size, connect only the
    necessary input/output ports to subsequent transformations. Use the
    following information to calculate the minimum rank data cache size:
•   # groups * [(# ranks * (Σ column size + 10)) + 20]

•   ITEM_NO: Decimal(10), column size 10
•   ITEM_NAME: String(23), column size 26
•   PRICE: Decimal(14), column size 10
•   Total column size = 46
•   RNK_TOPTEN ranks by price, and the total number of ranks is 10. The
    number of groups is 10,000.
•   Use the following calculation to determine the minimum data cache
    requirements:
•   10,000 * [(10 * (46 + 10)) + 20] = 5,800,000
•   This Rank transformation requires a data cache size of 5,800,000 bytes.
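Finally, the Rank data cache formula as a hypothetical Python helper. Note that the number of ranks multiplies the per-row size before the fixed 20 bytes per group is added.

```python
def rank_data_cache_bytes(column_sizes, ranks, groups):
    """Minimum Rank data cache in bytes: # groups * [(# ranks * (sum + 10)) + 20]."""
    return groups * (ranks * (sum(column_sizes) + 10) + 20)


# RNK_TOPTEN: ITEM_NO (10), ITEM_NAME (26), PRICE (10); 10 ranks; 10,000 groups
print(rank_data_cache_bytes([10, 26, 10], ranks=10, groups=10_000))  # 5800000
```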

				
DOCUMENT INFO
posted:9/9/2011
language:English
pages:37