PERFORMANCE EVALUATION:
ORACLE AND INFORMATICA




TABLE OF CONTENTS


DOCUMENT PURPOSE

SECTION 1: TUNING OF INFORMATICA SERVER AND MAPPINGS
   1.1 Suggested Informatica Server Configuration
      1.1.1 PMServer Configuration
      1.1.2 Repository Server Configuration
      1.1.3 Informatica Environment Configuration
   1.2 Informatica Server Tuning Framework
   1.3 Informatica Mapping Tuning Framework
      1.3.1 Tuning Goal
      1.3.2 Mapping Tuning Steps

SECTION 2: TUNING OF ORACLE SERVER AND APPLICATION COMPONENTS
   2.1 Defining the Tuning Scope and Effort Estimation
   2.3 The Tuning Procedure
      2.3.1 Instance Optimization
         2.3.1.1 SGA Parameters
         2.3.1.2 I/O Parameters
         2.3.1.3 Sort Parameters
         2.3.1.4 Parallel Query Parameters
         2.3.1.5 Oracle Parallel Server Parameters
         2.3.1.6 OS Specific Parameters
         2.3.1.7 Contention Issues
      2.3.2 Space Management
         2.3.2.1 Categorization of tablespaces – Small extent, Medium extent, Large extent
         2.3.2.2 Organization of tablespaces – multiple data files on different drives
         2.3.2.3 Tablespace fragmentation – Internal / External
      2.3.3 SQL Reuse Opportunities
      2.3.4 Optimal Data Access with Suitable Object Placement
      2.3.5 Query Tuning – Concepts
         2.3.5.1 Optimizer Approach / Goal
         2.3.5.2 Full Table Scans vs. Range Scans
         2.3.5.3 Index Scan Types
         2.3.5.4 Join Types
         2.3.5.5 Stats Collection
      2.3.6 Query Tuning – Steps
         2.3.6.1 Using Hints – Overview
         2.3.6.2 Optimizer Approach / Goal Hints
         2.3.6.3 Hints for Access Paths
         2.3.6.4 Hints for Query Transformations
         2.3.6.5 Hints for Join Orders
         2.3.6.6 Hints for Join Operations
         2.3.6.7 Hints for Parallel Operations
         2.3.6.8 Miscellaneous Hints

3 REFERENCES

APPENDIX A
   APPENDIX A.1 – PMServer Configuration File

APPENDIX B – ORACLE STATS COLLECTION APPROACH
   B.1 – Stats Collection
   B.2 – Oracle Stats Analytics




Document Purpose
This document has been prepared for the DW BI practice of the e-Enabling group (Finance & Securities)
with the following objectives:


         Tune Informatica server and mappings by employing the suggested framework and best practices.

         Tune Oracle server and components by employing the suggested framework and best practices.


The document is structured as follows:



   SECTION 1 – Tuning of Informatica Server and Mappings

    This section is structured along the following lines:

       Framework for tuning the Informatica server and its environment.

       Framework for tuning Informatica mappings.


   SECTION 2 – Tuning of Oracle Server and Application Components

    This section is structured along the following lines:

       Framework for tuning the Oracle server and its environment.

       Framework for tuning Oracle components.




SECTION 1: Tuning of Informatica Server and Mappings
In this section, we present an actionable framework for improving the performance of the Informatica
server and of mappings/sessions.

1.1 Suggested Informatica Server Configuration
The Informatica server should preferably be hosted on a hardware platform with at least the following
characteristics:

- An 8-CPU SMP box, to support up to 8 partitions without any appreciable degradation in scale-up.
- 1 GB RAM dedicated to each processor (8 GB RAM in total).
- High-speed swap space of at least 40 GB mapped to the CPU bank. This ensures that $PMCacheDir
  is mapped to high-speed swap space instead of mounted drives. Empirical tests on the effects of
  high-speed swap vs. mounted drives have shown a paging performance gain of about 20%.
- At least 10 Mbps network connectivity between the Informatica server, clients, file systems and
  database servers.

1.1.1 PMServer Configuration
PMServer must support a reasonable number of concurrent sessions without loss of performance owing to
scale-up issues. The performance-specific parameters are listed below; for all parameters of the
pmserver config file, please refer to Appendix A.1.

        CommThreadPoolSize=30
        ServiceThreadPoolSize=30
        EngineThreadPoolSize=30
        LMRepoConnPoolSize=60 /*To support 60 concurrent repository connections */


        # Absolute or relative path of the file where the LM will write its messages.

        LogFileName=/path/<Rep Name>/pmserver.log         /* should be on high speed mounted
        drive */

        # Determine one of the two server data movement modes: UNICODE or ASCII.
        # If not specified, ASCII data movement mode is assumed.

        DataMovementMode=ASCII /* Unless we are processing multiple languages, ASCII mode
        is ok from performance perspective */

        # Maximum number of sessions – Based on our peak time requirement. Set it 20% over
        the maximum concurrency requirement to account for processing first time failures.

        MaxSessions=200

        # Load Manager (pmserver) shared memory, in bytes
        # For each 10-count increase of MaxSessions, increase this by 2000000 bytes

        LMSharedMem=40000000

        # Obsolete parameter PMCacheDir: PowerMart Cache directory location
        # Will be de-supported in future release
        # Recommendation: Use server variable $PMCacheDir in the Workflow Manager

        PMCacheDir=/tmp /* Should be high speed swap space rather than mounted drive to
        improve paging efficiency */



        # Applies if there is no free LMSHM slot available to run a session. If set
        # to Yes, it fails the session else it places the session in the pending queue.

        FailSessionIfMaxSessionsReached=No
        # Specify Yes to turn on PowerMart 4.0 / PowerCenter 1.0 date handling
        # compatibility.

        # Max # of DB connections to be used by Lookup/SP.
        # 0 means unlimited

        MaxLookupSPDBConnections=0

        # Number of deadlock retries that will be attempted per partition.

        NumOfDeadlockRetries=10

        # Number of seconds to sleep before attempting next deadlock retry

        DeadlockSleep=0


1.1.2 Repository Server Configuration
The repository server should be able to manage concurrent requests from the pmserver without any
noticeable overhead. It should keep session initialization time during session execution within 10-15
seconds, and should rarely time out while fetching information from the repository database.

        DynamicConfigRefreshInterval=10


        /*** Repserver configuration ***/
        MaximumConnections=500
        MaximumLocks=50000
        MessageReceiveTimeout=200
        MessageSendTimeout=200
        ThreadWaitTimeout=60
        KeepAliveTimeout=120
        DatabasePoolSize=500
        DatabaseConnectionTimeout=600

        ErrorSeverityLevel=Warning
        LogFileName=/path../<Rep Server Name >/pmrepagent.log
        /* Should be on a high speed mounted drive */

        DateDisplayFormat=DY MON DD HH24:MI:SS YYYY

The above Rep Server configuration has performed well in a very demanding computational environment.




1.1.3 Informatica Environment configuration
The following shell script excerpt is executed by every shell that is started:

         #
         # this file gets executed by every shell we start
         #

         limit descriptors 4096
         limit datasize unlimited
         setenv ORACLE_HOME <Path of Oracle Bin>
         setenv PM_HOME <Path of Informatica Bin>
         setenv PM_REPHOME $PM_HOME/repositoryserver
         setenv NLS_LANG <Appropriate Language Code>

         # make sure the path is correct
         set path=(. Set of Application Paths)

         setenv TNS_ADMIN $ORACLE_HOME/network/admin
         setenv LD_LIBRARY_PATH $PM_REPHOME/:$ORACLE_HOME/lib:$PM_HOME/:<Path of
         appropriate libraries>

         setenv SYBASE <Path of Sybase Runtime environment>
         setenv DSQUERY LN_GGLDREP_PROD

         # setup umask to something reasonable (sic. paranoid)
         umask 022




1.2 Informatica Server Tuning Framework
We have recorded server performance against the criteria mentioned below. This gives a realistic
estimate of what to expect from the Informatica server in its operating environment, and establishes
the read and write throughput thresholds. Based on these estimates, we can arrive at the upper limit
of a mapping/session's performance and its running time.

Criteria and measures:

- Single-partition write throughput on a non-constrained table: 7,000 rows/sec.
- Single-partition write throughput on a constrained table (key constraints, row-level trigger for
  sequence number generation): 2,500 rows/sec.
- Single-partition read throughput from a flat file located on the network file system:
  12,000 rows/sec.
- Metadata load and session initialization: under 1 minute, with the repository database server
  connected to the Informatica server via a 10BASE-T (10 Mbps) network connection.
- Mapping 1 – read 20 attributes from a very large flat file (1 GB+, 30 million records), group by
  10 ports in an aggregator transform and dump the output to a flat file (/dev/null): read throughput
  10,000 rows/sec; time to build the aggregator cache 50 minutes; total execution time 80 minutes
  (single partition).
- Mapping 2 – read 20 attributes from a very large table (30 million records), group by 10 ports in
  an aggregator transform and dump the output to a flat file: read throughput 6,000 rows/sec; time to
  build the aggregator cache 90 minutes; total execution time 120 minutes (single partition).
- Mapping 3 – read 20 attributes from a very large table (30 million records), perform lookups
  against a set of very large lookup tables (> 1 million rows) that have default data and index
  caches of 2 MB and 1 MB respectively, and dump the output to a flat file: read throughput
  4,000 rows/sec; time to build the 4 lookup caches 10 minutes; stabilized write throughput
  3,800 rows/sec; total execution time 150 minutes (single partition).
- Scalability test – estimation of performance with multiple partitions: roughly 20% depreciation
  observed owing to thread communication and maintenance overhead. This means that for 2 partitions,
  performance improves by about 1.6 times.


We can also look for ways to improve the above-mentioned metrics by experimenting with the parameters
of the configuration file, the network connectivity, etc. By evaluating the criteria above, we are
seeking to determine:

        Optimal DB Write throughput (Can also be improved by increasing the TNS packet size)
        Optimal DB Read throughput
        Repository Database access efficiency
        Aggregation efficiency
        Paging efficiency
        Server scalability for multi-partition session execution

The above metrics were obtained on a 24-CPU, 24 GB RAM Sun Fire E6800 machine with an 80 GB high-speed
swap drive. We can benchmark the server operating environment along the criteria mentioned above;
this gives us the best-case run-time estimates of the mappings/sessions before we tune them.




1.3 Informatica Mapping Tuning Framework
1.3.1 Tuning Goal
We should first set a realistic execution time for the mapping that is to be tuned. This can be estimated from:

- Volume read by the source qualifier – time taken to read it at the best-case read throughput
  estimated in Section 1.2.
- Volume written by the mapping to the relational/flat-file target – time taken to write it at the
  best-case write throughput estimated in Section 1.2.
- Time taken to build the aggregator cache, estimated from the results obtained in Section 1.2
  (a worked example follows this list).
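
As a hedged illustration (volumes assumed for the example): a mapping reading 30 million rows from a
flat file at the best-case 12,000 rows/sec needs about 30,000,000 / 12,000 = 2,500 seconds (roughly
42 minutes) just to read, and writing 5 million rows to a constrained table at 2,500 rows/sec adds
5,000,000 / 2,500 = 2,000 seconds (roughly 33 minutes). These figures, plus the aggregator cache
build time, bound the best-case single-partition run time.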

The execution time estimated above is the best-case single-partition run time for the mapping with its
existing design. At this point, we may want to appraise the choice of transforms (AGG, LKP, JNR) used
in the mapping: we can look into the possibility of using a LKP for aggregation instead of an AGG
transform, or a SQL join instead of a LKP or JNR.

If the processing logic does not allow for the innovations prescribed above, then we have to look into
the possibility of tuning the transforms used in the mapping and get as close as possible to the
best-case running time calculated above.

Note – Partitioning of the mapping should be done only after tuning the constituent transformations of the
mapping.

1.3.2 Mapping Tuning Steps
We can tune the mapping along the following lines:

        Transformation tuning: Database queries – LKP SQLs, Source Qualifier SQLs, UPDATE strategy
         override SQLs. Use SQL tuning details presented in Section 2.3.6.
        Transformation Cache sizes – AGG, JNR, LKP, SRTR transforms.
        DB Write on tables – Storage properties of the table object, TNS packet configuration.
        Session properties – Block buffer, DTM buffer, Commit interval, Partition points, Session
         partitioning.

   Transformation tuning: Database queries – LKP SQLs, Source Qualifier SQLs, UPDATE
    strategy override SQLs

Database queries should be tuned or rewritten to get the best possible throughput. For details on query
tuning, please refer to Section 2.3.6 of this document.

As lookup caches are built serially, we need to minimize the time spent building them; we should
therefore aim to retrieve all records from the database as quickly as possible.

We can also customize the UPDATE strategy override with an adequate WHERE clause and hints to speed up
the process, as sketched below.
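
A minimal sketch of such a target update override, assuming a hypothetical target TGT_ACCT_BAL keyed
on ACCT_ID (Informatica substitutes update-port values for the :TU qualifier; the index name is
illustrative):

UPDATE /*+ INDEX(t TGT_ACCT_BAL_PK) */ TGT_ACCT_BAL t
SET    t.BALANCE_AMT = :TU.BALANCE_AMT,
       t.LAST_UPD_DT = SYSDATE
WHERE  t.ACCT_ID = :TU.ACCT_ID   /* restrict the WHERE clause to the key columns only */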

We should also seek to do all data validations and transformations in the SQL query itself (applicable
to relational sources). For example:

decode(nvl(company_code,'0000'),
'-','0000',company_code) as COMPANY_CODE

Such clauses ensure that we do not do any data validations in EXP transform with IIF function calls.




We should write transformation logic that avoids warning messages in the session log, such as:

TE_7004 [<<PM Parse Warning>> [SUBSTR]: operand converted to a string
... IIF(substr(>>>>Attribute Name <<<<, 0, 5) = '28695', (Attrib Name 1– Attrib Name2) *
-1, (Attrib Name3 – Attrib Name4) * -1)]

In order to curb such warning messages, we can model expression transform port logic like:

IIF(group_num_for_lkp = -1, in_pp_indicator, IIF(amt_for_lkp >= 0 AND substr(in_acct, 1, 1) <> '1', '08',
IIF(amt_for_lkp < 0 AND substr(in_acct, 1, 1) <> '2', '08',in_pp_indicator)))

We can simplify the substr(in_acct, 1, 1) <> '1' and substr(in_acct, 1, 1) <> '2' checks by evaluating
them at the database level itself, as extra ports in the source qualifier of the mapping.

Example: an additional flag port in the SQ can be:

decode (substr(IN_ACCT, 1, 1), '1', 'ONE_ACCT', '2', 'TWO_ACCT', 'OTHER') as acct_type

This can in turn be used in the port logic, thus speeding up the transform (minimizes function calls in
Informatica).

We can also eliminate ISNULL checks by computing them in the SQ through an additional port.

Example: an additional flag port for a null check on an attribute can be:

decode (IN_ACCT, null, 'NULL', 'NOT_NULL') as null_ind

This prevents null checks like ISNULL(IN_ACCT) on the Informatica side. In a nutshell, we should always
look for ways to minimize Informatica transformation-language function calls as much as possible.

Transformation bottlenecks are also caused by the use of an Aggregator transform in the mapping. We can
explore the possibility of using a LOOKUP transform for aggregation purposes instead; this can speed up
performance dramatically. For example, the lookup SQL can perform the aggregation itself:

SELECT sum(nvl(t1.AMT_USD_EQV, 0)) as AMT_USD_EQV,
       decode(nvl(t1.COMPANY_CODE, '0000'), '-', '0000', t1.COMPANY_CODE) as COMPANY_CODE,
       nvl(t2.Credit_Line_Id, '-') as Credit_Line_Id,
       nvl(t2.Credit_Source, '-') as Credit_Source
FROM   Table1 t1,
       Table2 t2,
       RulesTable RT
WHERE  Set of Constraints
GROUP BY
       decode(nvl(t1.COMPANY_CODE, '0000'), '-', '0000', t1.COMPANY_CODE),
       nvl(t2.Credit_Line_Id, '-'),
       nvl(t2.Credit_Source, '-')

We can also place a Filter transform (if applicable) as close to the SQ of the mapping as possible. We
should also explore the possibility of performing single-pass reads wherever possible.




   Transformation Cache sizes – AGG, LKP, JNR, SRTR transforms

For very large lookup, aggregator, joiner and sorter caches, it is not possible to allocate enough RAM
to prevent paging altogether. But we can minimize the effects of paging by:

        o    Allocating large amount of RAM for index caches as compared to data caches. This will
             minimize effect of paging on index cache. This will improve transformation performance.
        o    Having high speed swap space for $PMCacheDir and not mounted network drives.
        o    Following the 40-60 rule – keep 40% of the data and index caches in RAM and the
             remainder in swap space.

We can also review the order in which comparisons are made in the condition ports of the LKP transform.
This helps in optimal formation of the index cache and improves the performance of the LKP transform.

   Database write – Storage properties of the tables

Relational target bottlenecks can be caused by the following (remedies included):

        o    I/O contention on the underlying tablespace datafiles of the table. A quick way to check it is
             by firing the following query during high activity time slices:

             select d.name, f.phyrds, f.phywrts
             from
             v$datafile d,
             v$filestat f
             where d.file# = f.file#
             and f.phyrds >100000 and f.phywrts > 100000

        o    Too many concurrent writes on set of tables belonging to the same table space. In such
             scenarios, it is useful to consider multiplexing a set of table objects on a set of table spaces.
        o    In case of multiple partitions writing to the same table object, we should review the
             INITRANS parameter of the table object and increase it to a value between 10 and 20 to
             allow for adequate concurrency.
        o    We should also examine the possibility of going in for uniform extent management in the
             storage clause of the table object. In case of large volume batched inserts, it allows for fewer
             extent allocation cycles at run time. A desirable STORAGE clause of table objects could be:

                 INITRANS 10    /* to support adequate concurrency */
                 MAXTRANS 255
                 STORAGE (
                    INITIAL 10M
                    NEXT 10M
                    PCTINCREASE 0   /* uniform extent management with large extent size */
                    MINEXTENTS 1
                    MAXEXTENTS 2147483645
                 )
        o    Have fewer data-validation constraints at the database level. We may not want NOT NULL
             DEFAULT value constraints on non-key columns; instead, build such validations into the
             mapping itself.
        o    Improve the TNS packet size on the Informatica server.

For minimizing flat-file bottlenecks, we may want to stage data locally on the Informatica server and
then FTP it to the desired location.



   Session properties – Block buffer, DTM buffer, Commit interval, Partition points, Session
    partitioning

        o   Line Sequential Length – 1024 bytes by default. Consider increasing it when the row size
            of the SQ of the mapping exceeds 1024 bytes.
        o   Block buffer – in large-volume mappings, we can increase it to 128 KB from the default
            64 KB.
        o   DTM buffer – in large-volume mappings, we can consider increasing it from its default
            value of 12 MB.
        o   Commit interval – in order to utilize large DTM buffer allocations, we should increase
            the commit interval from its default setting of 10,000 rows. This ensures fewer database
            write cycles at run time; study the session log to see the number of DB write cycles.
        o   Partition points – consider introducing partition points at appropriate stages of the
            mapping pipeline if the existing default partition-point threads show a utilization
            above 50%. Then check the new partition-point thread's utilization; if it is below 20%,
            reconsider having a partition point at that stage in the pipeline.
        o   Session partitioning – as a general rule, the partition count should not exceed the CPU
            bandwidth. Data should be partitioned on a key that distributes records uniformly across
            partitions.




SECTION 2: Tuning of Oracle Server and Application Components
This section provides actionable information for tuning the highest impact database performance areas,
such as:

   Server instance memory configuration parameters – SGA performance
   Instance parameters – Redo Log, Shared pool, Buffer cache and Sort area.
   Database object sizing, placement, reorganization, etc., for better space management and
    optimal data access.
   Application SQL tuning
   Indexing strategies as per application semantics

The database tuning process consists of tasks such as:

   Balancing the different types of competing resources in the database environment so that the most
    important applications have priority access to the resources they need to use.

   Identifying and eliminating resource bottlenecks

   Optimizing the use of existing resources within the database environment

   Taking advantage of database features for the types of work performed on the database


Tuning an Oracle application/server usually follows the logical steps mentioned below:

    1.    Tune the business rules
    2.    Tune the data design
    3.    Tune the application design
    4.    Tune the logical structure of the database
    5.    Tune the SQL
    6.    Tune the access paths
    7.    Tune the memory
    8.    Tune the I/O and physical structure
    9.    Tune the resource contention
    10.   Tune the underlying platforms

Decisions made in one step are also likely to impact other steps. For example, rewriting a SQL
statement in Step 5 may have significant bearing on the parsing and caching issues addressed in Step 7.




2.1 Defining the tuning scope and effort estimation
In a nutshell, the tuning process can be broadly enumerated as follows:

Type of Tuning         Tuning Scope

Application Tuning     SQL reuse opportunities; optimal data access / query tuning
Instance Tuning        Instance parameter optimizations
Space Management       Appropriate space management

Effort estimate can be illustrated in the matrix below:

Tuning                   Instance       Space        SQL        Data Access   Data Access    Data Access
Consideration            Optimization   Management   Reuse      by Table      by Workload    Index Rebuild

Amount of data           Small          Medium to    Small to   Small to      Small to       Small
to collect                              Large        Medium     Large         Large

Time required to         Short          Medium to    Short to   Short to      Short to       Short
collect data                            Long         Medium     Long          Long

Impact of collection     Negligible     Low          Low        Low           Low            Low
effort on database

Manual effort required   Low            Low          Low        Low to High   Low            Low
to collect/edit data

Potential complexity     Low            Low          Low        Low           Low            Low
of implementing tuning
recommendations

Potential gain from      Medium         Medium       Medium     Medium to     Medium to      Medium
implementing tuning                                             High          High
recommendations

Based on project realities, we can define the tuning scope as illustrated above.




2.3 The Tuning Procedure
We will present actionable steps to tune the Oracle server and its applications. The tuning procedure
is structured as follows:

         Instance Optimization
         Space Management
         SQL Reuse Opportunities
         Optimal Data Access with suitable object placement
         Query Tuning – Concepts
         Query Tuning – Approach


Please note that each tuning step presented above benefits the step that follows it: instance
optimization (SGA, redo logs, shared pool, buffer cache, sort area, etc.) ensures optimal system
response and aids space management, and effective space management in turn ensures optimal I/O, query
response time, etc.

We can take up one or more of these steps based on our cost-benefit analysis.


2.3.1 Instance Optimization
Instance parameters control the behavior of the database and certain configuration options, such as how
the database uses memory resources on the system or how it handles various contention problems. Please
refer to Appendix B for a methodology for collecting instance tuning data.

2.3.1.1 SGA Parameters

These parameters affect the total size of the instance's System Global Area (SGA). Appropriate settings
of these parameters result in efficient utilization of memory and prevent reparsing SQL statements
except when necessary. Examples of these parameters include db_block_buffers and
shared_pool_reserved_size. We can view the current settings of the instance parameters in
V$PARAMETER.
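
For instance, a quick way to inspect the current values of the parameters discussed in this section
(parameter names as exposed in this Oracle release):

select name, value
from   v$parameter
where  name in ('db_block_buffers', 'shared_pool_size', 'log_buffer',
                'shared_pool_reserved_size');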

The main composition of Oracle SGA is:

        The database buffer cache
        The shared SQL Area
        The redo log buffer

The SGA should be

        Large enough to fit comfortably into the system’s existing memory
        Small enough to coexist with other applications and not allocate more memory than it requires

We can see the current SGA size for a database by executing the following query:

select name, bytes from v$sgastat

SGA size can be altered by modifying the following parameters in init.ora:

DB_BLOCK_BUFFERS – The number of database blocks (of size DB_BLOCK_SIZE) allocated to the
database buffer cache.



LOG_BUFFER – The size in bytes of the redo log buffer.

SHARED_POOL_SIZE – The size in bytes of the shared SQL area.

   Database buffer size

The database buffer cache is the memory buffer within the SGA that holds copies of data blocks read
from (and possibly changed relative to) the physical database files. There are as many buffers in this
cache as the value of DB_BLOCK_BUFFERS. They include:

        Dirty buffers – buffers that have been changed but not yet written back to disk.
        Pinned buffers – buffers that are currently being accessed.
        Free buffers – buffers that are available for use.

Since it is desirable to have Oracle work within the SGA memory area as much as possible, the hit rate
within the database buffer cache should be very high – usually above 70%.

select name, value
from v$sysstat
where name in ('consistent gets', 'db block gets', 'physical reads')

Hit ratio = 1 - (physical reads / (consistent gets + db block gets))
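
The same ratio can also be computed in a single statement; a sketch (statistic names as exposed in
v$sysstat):

select round(1 - phy.value / (cg.value + dbg.value), 4) as hit_ratio
from   (select value from v$sysstat where name = 'physical reads')  phy,
       (select value from v$sysstat where name = 'consistent gets') cg,
       (select value from v$sysstat where name = 'db block gets')   dbg;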

   Shared Pool Size

The SGA shared pool area is composed primarily of two entities:

        Shared SQL cache – retains previously executed queries, procedures, and other SQL-based
         operations in the SGA.
        Data dictionary cache – holds the results of calls made to the data dictionary, which must be
         consulted before every single action in the database.

As with the database buffer cache, the efficiency of the shared pool cache is determined by a hit ratio
that indicates how often the Oracle RDBMS can process information in memory and how often it must
retrieve information from disk. The database should work as much from memory as possible before going
to disk. Although this is not always practical, we should record these ratios during high-activity
time slices and verify that they are within acceptable limits.

The following script compares the number of pins (how often an item was executed) to the number of
reloads (how often a miss occurred):

select sum (pins) pins, sum (reloads) reloads
from v$librarycache

The ratio of reloads to pins is:

Ratio = (reloads / pins) * 100

If the value is 1 or greater, we can bring the ratio down by tuning the shared SQL area, i.e., by
increasing the size of the shared pool.

Similarly, the data dictionary cache statistics determine how often Oracle goes to disk when it
accesses information on users, privileges, tables, indexes, and so on. Most database systems reuse the
same database objects repeatedly; therefore, if a high degree of disk access takes place for operations
that run the same programs, the information is likely being aged out too often. The same rule holds
true for other shared pool areas.



The following script enables us to retrieve the number of GETS (information requests on the object) and
GETMISSES (cached or missed queries):

select sum(gets) GETS, sum(getmisses) GETMISSES
from v$rowcache

Ratio = (getmisses / gets) * 100

If the ratio is > 10%, we may consider increasing the value of the SHARED_POOL_SIZE parameter.
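
Again, the percentage can be computed directly; a sketch:

select round(sum(getmisses) / sum(gets) * 100, 2) as miss_pct
from   v$rowcache;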

   Ramifications of SGA changes

It is relatively easy to change the SGA parameters, but we also need to consider the ramifications of
making changes.

The most obvious benefit of increasing the size of the SGA is that the larger the SGA, the more information
can be processed in memory. By enabling the database to have most of its data cached, physical disk I/O is
minimized, which results in a system that is constrained more by the CPU speed than by the speed of the
I/O devices. The law of diminishing marginal utility applies, however. Depending on the size of the
database and the amount of activity being performed, increasing the size of SGA buffers ceases to have any
positive effect after a certain point. After this occurs, the database begins to hoard memory that could have
been better utilized by other OS processes or applications.

There are some SGA-related parameters (e.g., sort area) that incur memory per user connection. It is
also possible that some SGA settings actually induce paging and swapping on the system. The following
guidelines may be useful during iterative tuning of SGA parameters:

        SGA should fit comfortably in available RAM (not VM) without starving other
         processes/applications
        Buffer size should not be larger than they need to be.
        Watch out for database parameters that incur memory for every user, such as SORT_AREA_SIZE;
         setting them too high can induce paging/swapping.
        We need to be careful about parameters that can cause changes in other parameter values (e.g. –
         Database block size).
        While making changes to the SGA, one needs to account for factors like how fast the database is
         growing (size plus user connections).


2.3.1.2 I/O Parameters

These parameters affect the throughput or distribution of I/O for the instance. Examples of these parameters
include the checkpoint_process and db_file_multiblock_read_count parameters.

We can quickly pinpoint which database files are accessed most heavily and spread them across disks to
balance the load. We can issue the following SQL to determine the I/O load on the database files. We need
to take readings during low activity, medium activity and high activity time slices to arrive at a more
realistic picture.

select d.name, f.phyrds, f.phywrts
from
v$datafile d,
v$filestat f
where d.file# = f.file#




Apart from examining the possibility of moving a data file from one drive to another, Oracle also gives
the option of striping a data file into multiple smaller files. Striping enables the database to
distribute its data across the disks, and it speeds I/O access by minimizing contention against disk
drives.

2.3.1.3 Sort Parameters

These parameters influence how the Oracle Server performs sort operations on behalf of the user. Examples
of these parameters include the sort_direct_write and sort_area_retained_size parameters.

   Sort Area Size

SORT operation is performed by the database when the SQL query has explicit clauses like GROUP BY
and ORDER BY operation. It is also performed implicitly in case of index creation. Internal sorting
(performing the sorting in memory) is always faster than external sorting (performing the sorting in disk).

Whenever an operation is undertaken that requires sorting, Oracle attempts to do it in the memory of the
user process that requests the sort. Sorts are constrained by the following init.ora parameters:

SORT_AREA_SIZE – The maximum amount of space in bytes that a user process has available to perform
a sort.

SORT_AREA_RETAINED_SIZE – The minimum amount of space in bytes that a user process will ever
have available.

Exceeding the sort area size causes the sort to occur on disk: when no memory is available for the sort
area, Oracle resorts to using the designated temporary tablespace for sorting operations.

The following script allows us to see whether the sorting is performing efficiently:

select name, value
from v$sysstat
where name like 'sort%'

Because of the large volumes of data typically processed, it is impossible to prevent external sorting
entirely. At best we can ensure optimal usage of the in-memory sort area.
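
As a sketch, the share of sorts going to disk can be computed as:

select d.value as disk_sorts, m.value as memory_sorts,
       round(d.value / (d.value + m.value) * 100, 2) as pct_disk
from   (select value from v$sysstat where name = 'sorts (disk)')   d,
       (select value from v$sysstat where name = 'sorts (memory)') m;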

2.3.1.4 Parallel Query Parameters

These parameters are specific to the parallel query behavior for the instance. Examples of these parameters
include the parallel_min_servers and parallel_max_servers parameters.
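
To gauge whether the configured range is adequate, the parallel server pool can be observed at run
time; a sketch (assuming the v$pq_sysstat view available in this Oracle release):

select statistic, value
from   v$pq_sysstat
where  statistic like 'Servers%';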


2.3.1.5 Oracle Parallel Server Parameters

These parameters are specific to the Oracle Parallel Server environment. Examples of these parameters
include the gc_files_to_locks and gc_releasable_locks parameters.


2.3.1.6 OS Specific Parameters

These parameters are specific to the instance parameters of the operating system and their availability
varies from platform to platform. These parameters can have a significant impact on performance.
Examples of these parameters include disk_async_io and dbwr_io_slaves parameters.




2.3.1.7 Contention Issues

When contention happens, the database must wait for an event to occur. Such an event – writing a block
of data to a physical device, or locking a row inside a database table – causes an appreciable slowdown
in performance. Mitigating contention allows the database to perform efficiently.

   I/O Contention and Load Balancing

Contention among physical devices is the most common type of contention. A database is made of several
data files that can reside on the same physical disk drive. If the database requests access to several
data files on the same disk, the result is contention, as the multiple requests must be serialized
(each must wait).

We have the following types of database files:

         o    Database files (Read/Write operations, Heavy I/O)
         o    Redo Logs (Sequential Writes, Read Only for recovery)
         o    Archive Logs (Copy only, no access after copy)
         o    Control Files (Periodic update, read only at start up)

In general, we have multiple files that need to be placed on a smaller number of available disk drives.
It pays to put the following files on a physical disk separate from other database files:

         o    Rollback segments
         o    Redo Logs
         o    Archive Logs

It is also important to optimize their physical layout so that contention is minimized between the ARCH,
DBWR, and LGWR processes. It is also useful for I/O balancing that we put index and table files on
separate devices.

As described in Section 2.3.1.2, we should pinpoint which database files are accessed most heavily
(using the v$datafile / v$filestat query shown there, with readings taken during low-, medium- and
high-activity time slices) and spread them across disks – or stripe heavily accessed data files – to
balance the load.

   Rollback Segment Contention

A transaction that physically changes data – an INSERT, UPDATE, or DELETE SQL statement – produces
information that Oracle writes to its online rollback segments. Because Oracle attempts to provide data
consistency when a query is issued, SELECT statements also use rollback segments when they access data
(reads and writes on the same partition can thus cause contention). When a query is issued, if a row
has changed but not committed, Oracle returns information from the rollback segments to provide read
consistency. Rollback segments are also used when an instance is forced down or ends with an abnormal
termination.

Rollback segment contention can occur whenever a transaction accesses a block within a rollback segment
that another transaction needs. We can use the query below to determine the amount of contention being
experienced within the rollback segments.

select r.name, s.gets, s.waits
from v$rollstat s, v$rollname r
where s.usn=r.usn

The following ratio compares how often a rollback segment was accessed with how often the database
waited to access information with a rollback segment:

Ratio = (waits/gets) * 100

If the ratio is >= 2, there is contention within the rollback segments. To alleviate this, we may
consider increasing the number of rollback segments.

The rollback segment space consumed per transaction depends upon the nature of the transaction
(large-volume inserts/deletes without intermediate commits are quite taxing on rollback segment usage).

Like tables, rollback segments are constrained by the maximum number of extents to which they can grow
and by the amount of physical space available in the tablespace. Once these limits are reached, the
database cannot allocate a new extent for the rollback segment. Therefore, if a rollback segment or its
tablespace is sized incorrectly, it is possible that the amount of rollback space needed will exceed
the total size of the rollback segment.

It is useful to spread rollback segment across as many drives as possible.

   Redo Log Contention

A buffer area in the SGA holds redo log information. This information is stored in memory and regulated
through the use of two latches, or RAM-level locks: the redo allocation latch controls the allocation
of space for writing redo log information to the buffer, and the redo copy latch is used to copy
information to the buffer.

Willing-to-wait latch requests wait, sleep, and then make the request again until they acquire the
latch. Immediate latch requests, conversely, do not wait; they continue processing. We can use the
following query to determine the status of both types of latch requests:

select name, gets, misses, sleeps, immediate_gets, immediate_misses
from v$latch
where name in ('redo allocation', 'redo copy')

Immediate contention = (immediate_misses / (immediate_gets + immediate_misses))*100

Wait contention = (misses / (gets + misses))* 100

If either value is greater than 1, contention is happening for that latch. To alleviate contention for a redo
allocation latch, reduce the amount of time that a latch is held by a process by lowering the value of the
LOG_SMALL_ENTRY_MAX_SIZE parameter in the init.ora file. To alleviate contention for a redo copy
latch, increase the number of latches by raising the value of the LOG_SIMULTANEOUS_COPIES
parameter in the init.ora file.




2.3.2 Space Management
The goal of optimal space management is to curb external and internal fragmentation. Proper allocation
of tablespaces to tables and indexes is achieved by following the guidelines mentioned below.

2.3.2.1 Categorization of tablespaces – Small extent, Medium extent, Large extent.

We can follow a policy of uniform extent management for databases that have large-volume batched
inserts, as is typical of data warehouses/data marts; ad hoc inserts are usually found in OLTP
databases.

To follow uniform extent management, the table space storage clause should be:

TABLESPACE <Table Space Name>
    NOLOGGING
    PCTFREE 10
    PCTUSED 40
    INITRANS 1 -- Revisit this parameter to provide adequate concurrency for
               -- inserts by partitioned Informatica mappings.
    MAXTRANS 255
  STORAGE
  (
     INITIAL 10M
     NEXT 10M
     PCTINCREASE 0 --- Indicates uniform extent management
     MINEXTENTS 1
     MAXEXTENTS 2147483645
  )

For static tables, it is useful to have a small initial extent size (1K). For medium-volume tables
(100,000 – 500,000 records), we can have an initial extent size of 1M, and for large-volume tables
(> 1 million records) it is better to go for an initial extent size of 10M. It will also be useful to
have separate tablespaces based on the above-mentioned table volume classification.

It is also desirable to follow a consistent extent definition for a tablespace. If recovery is not a
very important goal, then it is better to go for the NOLOGGING option.

If a data object undergoes frequent inserts/deletes, we can have appropriate maintenance routines to
rebuild its indexes and coalesce extents in the data tablespace, as sketched below.
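
A minimal sketch of such maintenance statements (object names are illustrative):

ALTER INDEX fact_sales_idx01 REBUILD NOLOGGING;   -- rebuild a stagnated index
ALTER TABLESPACE data_large COALESCE;             -- merge adjacent free extents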

Desired goal – achieve 5,000-6,000 rows/sec write throughput per Informatica partition on a moderately
constrained table, and 2,000-2,500 rows/sec on a highly constrained table (indexes and data-validation
constraints).

2.3.2.2 Organization of tablespaces – multiple data files on different drives.

It should thus be possible to achieve 5,000-6,000 rows/sec write throughput on a moderately constrained
table and 2,000-2,500 rows/sec on a highly constrained table. If not, evaluate the following symptom:
more than 100,000 concurrent reads/writes on the constituent data files –

select d.name, f.phyrds, f.phywrts
from
v$datafile d,
v$filestat f
where d.file# = f.file#
and f.phyrds > 100000 and f.phywrts > 100000

If so, introduce extra data files for the tablespaces in question on different drives to curb pending
I/O reads/writes.
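
A sketch of such a change (tablespace name, path and size are illustrative):

ALTER TABLESPACE data_large
  ADD DATAFILE '/u05/oradata/prod/data_large_02.dbf' SIZE 2000M;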

2.3.2.3 Table space fragmentation – Internal / External

Physical reads and writes to fragmented tablespaces result in additional rotational-latency delays at
the disk drive. This delay becomes pronounced especially in the case of large-volume reads and writes
to the database.

If the number of extents is approaching MAXEXTENTS, it is time to defragment the table.

TABLE FRAGMENTATION

select a.owner, segment_name, segment_type,
       sum(bytes), max_extents, count(*)
from   dba_extents a, dba_tables b
where  segment_name = b.table_name
group by a.owner, segment_name, segment_type, max_extents
having count(*) > 3
order by a.owner, segment_name, segment_type, max_extents

INDEX FRAGMENTATION

select a.owner, segment_name, segment_type,
       sum(bytes), max_extents, count(*)
from   dba_extents a, dba_indexes b
where  segment_name = index_name
group by a.owner, segment_name, segment_type, max_extents
having count(*) > 3
order by a.owner, segment_name, segment_type, max_extents

2.3.3 SQL Reuse Opportunities
We can identify similar statements that are prevented from being reused in the shared pool because of
differences in case and/or spacing. Normalizing them allows the Oracle server to maintain only one copy
of each distinct SQL statement within the cache, maximizing memory use and minimizing redundant parsing
and validation.
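
One hedged way to spot such near-duplicates is to group shared-pool cursors after collapsing case and
whitespace differences (the normalization below is deliberately crude):

select upper(replace(replace(sql_text, ' '), chr(10))) as normalized_text,
       count(*) as copies
from   v$sqlarea
group by upper(replace(replace(sql_text, ' '), chr(10)))
having count(*) > 1;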


2.3.4 Optimal Data Access with suitable object placement
An access method is a strategy used to retrieve data from a database in the most optimal way. We can
perform access method tuning in the following ways:

   Perform comprehensive index evaluation on tables referenced by worst performing
    SQL statements

We can analyze a poorly performing SQL statement's ratio of physical reads per execution; a high ratio
points to the possibility of fragmentation in the underlying indexes used in the execution plan.




   Identify scope for creating additional indexes, drop sparsely used indexes

We should create indexes on tables with usage monitoring enabled. This lets us know which indexes are
heavily used and which are sparsely used, and allows us to evaluate our existing indexing strategy, as
sketched below.
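
In Oracle 9i and later this can be done with index usage monitoring; a sketch (index name
hypothetical):

ALTER INDEX fact_sales_idx01 MONITORING USAGE;

-- later, check whether the index has been used since monitoring began
select index_name, monitoring, used
from   v$object_usage;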

   Reorganization of database objects

Over time, database performance can be affected by problems such as row chaining and/or migration and
index stagnation, which degrade data access and overall performance. We can:

   Reorganize specific schema objects
   Reorganize an entire tablespace
   Repair migrated rows

We will need to resort to this (maintenance-intensive) option if we encounter the following situations:

   Excessive row migration

If an UPDATE statement increases the amount of data in a row so that the row no longer fits in its data
block, the Oracle server tries to find another block with enough free space to hold the entire row. If
such a block is available, the Oracle server migrates the data for the entire row to the new block,
leaving a forwarding address at the original row's block. This is called migrating a row; the rowid of
a migrated row does not change.

The effect of row migration on performance is that each time a migrated row is needed for a query, an
additional I/O call is required for the row piece that is not stored in the original block. It is
possible to minimize unnecessary row migration for a table by correctly setting the table's PCTFREE
value. Being too pessimistic on PCTFREE can waste some space but will more or less guarantee no row
migration, while being too optimistic can result in row migration and reduced performance.

Reorganizing the table or table partition will fix migrated rows.
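
A hedged sketch of detecting and repairing migrated rows (table name illustrative; the CHAINED_ROWS
table is created by the utlchain.sql script):

-- record the rowids of migrated/chained rows
ANALYZE TABLE fact_sales LIST CHAINED ROWS INTO chained_rows;

-- rebuild the table in place; note that its indexes must be rebuilt afterwards
ALTER TABLE fact_sales MOVE;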

   Disorganized indexes

As indexed values are inserted or deleted, the amount of unusable space within the index increases.
Because the index contains a mixture of data and empty areas, scans of the index are less efficient.
Rebuilding the index will eliminate unused index space.
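
The degree of disorganization can be estimated before deciding to rebuild; a sketch (index name
illustrative):

ANALYZE INDEX fact_sales_idx01 VALIDATE STRUCTURE;

-- a high proportion of deleted leaf rows suggests a rebuild is worthwhile
select name, lf_rows, del_lf_rows,
       round(del_lf_rows / lf_rows * 100, 2) as pct_deleted
from   index_stats;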

   Inefficient extent sizes

In order to avoid unnecessary I/O for objects that are fully scanned, a segment’s extent size should be a
multiple of the multi-block I/O size which is defined as DB_FILE_MULTIBLOCK_READ_COUNT *
DB_BLOCK_SIZE. If unnecessary I/O is being performed, we can consider rebuilding the object
specifying an INITIAL and NEXT value that is a multiple of the multi-block I/O size.
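
As a sketch, the multi-block I/O size can be computed from the instance parameters; for example, with
an 8 KB block size and a read count of 16 it is 128 KB, so INITIAL and NEXT should be multiples of
128 KB:

select to_number(b.value) * to_number(m.value) / 1024 as multiblock_io_kb
from   v$parameter b, v$parameter m
where  b.name = 'db_block_size'
and    m.name = 'db_file_multiblock_read_count';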

   Overextended object

The MAXEXTENTS storage parameter specifies the maximum number of extents that can be allocated to
the segment. Once a segment has filled the maximum number of extents, any row insertion will fail. If an
object is approaching its maximum extents setting, we can consider increasing the value of the segment's
MAXEXTENTS storage parameter, or rebuilding the segment with a larger extent size. Ensure that the
extents within the segment are the same size by specifying storage parameters where the INITIAL and
NEXT values are the same and PCTINCREASE=0.




2.3.5 Query Tuning -- Concepts
We will assume the Cost Based Optimizer (CBO) is in force, because the RBO is expected to be
decommissioned in future Oracle releases. Before we go about tuning SQL queries, it will be useful to
have adequate background in the following concepts:

         Optimizer Approach / Goal
         Full Table Scans Vs Range Scans
         Index Scans
         Join Types
         Stats Collection


2.3.5.1 Optimizer Approach / Goal

The execution plan produced by the optimizer can vary depending on the optimizer’s goal. Optimizing for
best throughput is more likely to result in a full table scan rather than an index scan, or a sort merge join
rather than a nested loops join. Optimizing for best response time, however, more likely results in an index
scan or a nested loops join.

Based on application needs, we choose the optimizer approach/ goal. The following guidelines may be
useful:

   For applications performed in batch, such as Oracle Reports/ETL applications, we need to optimize for
    best throughput. Usually, throughput is more important in batch applications, because the user
    initiating the application is only concerned with the time necessary for the application to complete.
    Response time is less important, because the user does not examine the results of individual statements
    while the application is running.

   For interactive applications, such as Oracle Forms applications or SQL*Plus queries, we need to
    optimize for best response time. Usually, response time is important in interactive applications,
    because the interactive user is waiting to see the first row or first few rows accessed by the statement.

The optimizer’s behavior when choosing an optimization approach and goal for a SQL statement is affected
by the following factors:

         o   OPTIMIZER_MODE Initialization Parameter
         o   OPTIMIZER_GOAL Parameter of the ALTER SESSION Statement
         o   CBO Statistics in the Data Dictionary
         o   Optimizer SQL Hints

The details are tabulated below:

OPTIMIZER_MODE Initialization Parameter

Establishes the default behavior for choosing an optimization approach for the instance. Its possible
values behave as follows:

CHOOSE - The optimizer chooses between a cost-based approach and a rule-based approach based on whether
statistics are available. If the data dictionary contains statistics for at least one of the accessed
tables, then the optimizer uses a cost-based approach and optimizes with a goal of best throughput. If the
data dictionary contains only some statistics, then the cost-based approach is still used, and the
optimizer must guess the statistics for the objects without any statistics; this can result in suboptimal
execution plans. If the data dictionary contains no statistics for any of the accessed tables, then the
optimizer uses a rule-based approach. CHOOSE is the default value.

ALL_ROWS - The optimizer uses a cost-based approach for all SQL statements in the session regardless of
the presence of statistics and optimizes with a goal of best throughput (minimum resource use to complete
the entire statement).

FIRST_ROWS_n - The optimizer uses a cost-based approach, regardless of the presence of statistics, and
optimizes with a goal of best response time to return the first n rows. n can equal 1, 10, 100, or 1000.

FIRST_ROWS - The optimizer uses a mix of costs and heuristics to find a best plan for fast delivery of
the first few rows. Note: the heuristic sometimes leads the CBO to generate a plan whose cost is
significantly larger than the cost of a plan without applying the heuristic. FIRST_ROWS is available for
backward compatibility and plan stability.

RULE - The optimizer chooses a rule-based approach for all SQL statements regardless of the presence of
statistics.

OPTIMIZER_GOAL Parameter of the ALTER SESSION Statement

The OPTIMIZER_GOAL parameter of the ALTER SESSION statement can override the optimizer approach and goal
established by the OPTIMIZER_MODE initialization parameter for an individual session.

The value of this parameter affects the optimization of SQL statements issued by the user, including
those issued by stored procedures and functions called during the session. It does not affect the
optimization of internal recursive SQL statements that Oracle issues during the session for operations
such as space management and data dictionary operations.

The OPTIMIZER_GOAL parameter can have the same values as the OPTIMIZER_MODE initialization parameter.

Optimizer SQL Hints

A FIRST_ROWS(n), FIRST_ROWS, ALL_ROWS, CHOOSE, or RULE hint in an individual SQL statement can override
the effects of both the OPTIMIZER_MODE initialization parameter and the OPTIMIZER_GOAL parameter of the
ALTER SESSION statement. By default, the cost-based approach optimizes for best throughput.

We can change the goal of the CBO in the following ways:

o   To change the goal of the CBO for all SQL statements in the session, issue an ALTER SESSION SET
    OPTIMIZER_MODE statement with the ALL_ROWS, FIRST_ROWS, or FIRST_ROWS_n (where n = 1, 10, 100, or
    1000) clause.

o   To specify the goal of the CBO for an individual SQL statement, use the ALL_ROWS, FIRST_ROWS(n)
    (where n = any positive integer), or FIRST_ROWS hint.

CBO Statistics in the Data Dictionary

The statistics used by the CBO are stored in the data dictionary. We can collect exact or estimated
statistics about physical storage characteristics and data distribution in schema objects by using the
DBMS_STATS package or the ANALYZE statement.

To maintain the effectiveness of the CBO, we must have statistics that are representative of the data. It
is possible to gather statistics on objects using either of the following:

o   Starting with Oracle8i, use the DBMS_STATS package.

o   For releases prior to Oracle8i, use the ANALYZE statement.

For table columns that contain skewed data (in other words, values with large variations in number of
duplicates), we should collect histograms.

The resulting statistics provide the CBO with information about data uniqueness and distribution. Using
this information, the CBO is able to compute plan costs with a high degree of accuracy. This enables the
CBO to choose the best execution plan based on the least cost.
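
The three levels can be exercised as follows (a sketch; the table and bind variable are hypothetical):

-- Instance level: set in the initialization parameter file
-- OPTIMIZER_MODE = FIRST_ROWS_10

-- Session level: override the instance default
ALTER SESSION SET OPTIMIZER_MODE = ALL_ROWS;

-- Statement level: a hint overrides both of the above
SELECT /*+ FIRST_ROWS(10) */ * FROM orders WHERE customer_id = :b1;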




Notes:

   Cost - The cost represents units of work or resource used. The CBO uses disk I/O, CPU usage, and
    memory usage as units of work. So, the cost used by the CBO represents an estimate of the number of
    disk I/Os and the amount of CPU and memory used in performing an operation. The operation can be
    scanning a table, accessing rows from a table using an index, joining two tables together, or sorting a
    row set. The cost of a query plan is the number of work units expected to be incurred when the query
    is executed and its result produced. Note that the plan produced by the CBO depends on these cost
    estimates, which change as statistics and the optimizer environment change, so plans may differ over
    time. Plans can be made stable by storing a plan outline (plan stability).

   Access Path - The access path represents the number of units of work done in accessing data from a
    base table. The access path can be a table scan, a fast full index scan, or an index scan. During table
    scan or fast full index scan, multiple blocks are read from the disk in a single I/O operation. Therefore,
    the cost of a table scan or a fast full index scan depends on the number of blocks to scan and the
    multiblock read count value. The cost for an index scan depends on the levels in the B-tree, the number
    of index leaf blocks to scan, and the number of rows to fetch using the rowid in the index keys. The
    cost to fetch rows using rowids depends on the index-clustering factor. Although the clustering factor
    is a property of the index, the clustering factor actually relates to the spread of similar indexed column
    values within data blocks in the table. A lower clustering factor indicates that the individual rows are
    concentrated within fewer blocks in the table. Conversely, a high clustering factor indicates that the
    individual rows are scattered more randomly across blocks in the table. Therefore, a high clustering
    factor means that it costs more to fetch rows by rowid using a range scan, because more blocks in the
    table need to be visited to return the data.

2.3.5.2 Full Table Scans Vs Range Scans

Full table scans are cheaper than index range scans when accessing a large fraction of the blocks in the
table. This is because full table scans can use larger I/O calls, and fewer large I/O calls are cheaper than
many small I/O calls.

The elapsed time for an I/O call consists of two components:

    o    Call setup time (head seek time, rotational latency)
    o    Data transfer time (this is typically 10 MB/second or better)

In a typical I/O operation, setup costs consume most of the time. Data transfer time for an 8 KB buffer is
less than one ms out of a total call time of about ten ms. This means that we can transfer 128 KB in about
20 ms with a single 128 KB call, as opposed to 160 ms with sixteen 8 KB calls.

Example:

Suppose we are accessing 20% of the blocks of a 10,000 block table, with:

DB_FILE_MULTIBLOCK_READ_COUNT = 16
DB_BLOCK_SIZE = 8K

Number of 8K I/O calls required for index access = 2,000 (table blocks) + index block reads
Number of 128K I/O calls required for a full table scan = 10,000 / 16 = 625

So we observe that the full table scan in this case is faster than the index range scan.
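
We can verify which access path the optimizer actually picks with EXPLAIN PLAN. A sketch, assuming a
PLAN_TABLE created via utlxplan.sql and, for DBMS_XPLAN, Oracle9i Release 2 or later:

-- Generate and display the plan without executing the query
EXPLAIN PLAN FOR
SELECT * FROM orders WHERE order_date >= SYSDATE - 30;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);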

2.3.5.3 Index Scan Types

An index scan retrieves data from an index based on the value of one or more columns of the index. To
perform an index scan, Oracle searches the index for the indexed column values accessed by the statement.
If the statement accesses only columns of the index, then Oracle reads the indexed column values directly
from the index, rather than from the table.

The index contains not only the indexed value, but also the rowids of rows in the table having that value.
Therefore, if the statement accesses other columns in addition to the indexed columns, then Oracle can find
the rows in the table with a table access by rowid or a cluster scan. Various Index scan types are
enumerated in a table below:

    Index Scan Type                                              Explanation

Index Unique Scans           This returns, at most, a single rowid. Oracle performs a unique scan if there is a
                             UNIQUE or a PRIMARY KEY constraint that guarantees that the statement accesses
                             only a single row.

                             SELECT STATEMENT
                             TABLE ACCESS BY ROWID OF SO_HEADERS_ALL
                             INDEX UNIQUE SCAN OF SO_HEADERS_U2

                             Index unique scan is used when all columns of a unique (B-tree) index are specified
                             with equality conditions.

Index Range Scans            Index range scan is a common operation for accessing selective data. It can be
                             bounded (bounded on both sides) or unbounded (on one or both sides). Data is
                             returned in the ascending order of index columns. Multiple rows with identical
                             values are sorted in ascending order by rowid. If data must be returned in sorted
                             order, we should use an ORDER BY clause rather than rely on an index. If an index
                             can be used to satisfy the ORDER BY clause, then the optimizer uses it and avoids
                             a sort.

                             SELECT STATEMENT
                             TABLE ACCESS BY ROWID OF SO_HEADERS_ALL
                             INDEX RANGE SCAN OF SO_HEADERS_N5

                             The optimizer uses a range scan when the optimizer finds one or more leading
                             columns of an index in conditions, such as the following:
                             col1 = :b1
                             col1 < :b1
                             col1 > :b1

                             The condition col1 LIKE '%ASD' does not result in a range scan. Range scans can
                             use unique or nonunique indexes. Range scans avoid sorting when index columns
                             constitute the ORDER BY/GROUP BY clause.


Index Range Scan DESC        Index range scan descending is identical to index range scan, except that the data is
                             returned in a descending order. (Indexes, by default, are stored in ascending order.)
                             Usually, this is used when ordering data in a descending order to return the most recent
                             data first, or when seeking a value less than some value.

                             SELECT STATEMENT
                             TABLE ACCESS BY ROWID OF SO_HEADERS_ALL
                             INDEX RANGE SCAN DESCENDING SO_HEADERS_U2

                             The optimizer uses index range scan descending when an order by descending clause
                             can be satisfied by an index.

Index Skip Scans        Index skip scans improve index scans by nonprefix columns. Often, it is faster to scan
                        index blocks than it is to scan table data blocks.

                        Skip scanning lets a composite index be logically split into smaller subindexes.
                        For example, table emp (sex, empno, address) with a composite index on (sex,
                        empno).

                        The number of logical subindexes is determined by the number of distinct values in the
                        initial column. Skip scanning is useful when the initial column of the composite index
                        is not specified in the query. In other words, it is "skipped." (A short sketch
                        follows this table.)


Full Scans              This is available if a predicate references one of the columns in the index. The
                        predicate does not need to be an index driver. Full scan is also available when there is
                        no predicate, if all of the columns in the table referenced in the query are included in
                        the index and at least one of the index columns is not null. This can be used to
                        eliminate a sort operation, because the data is ordered by the index key. It reads the
                        blocks singly.


Fast Full Index Scans   Fast full index scans are an alternative to a full table scan when the index contains all
                        the columns that are needed for the query, and at least one column in the index key has
                        the NOT NULL constraint. Fast full scan accesses the data in the index itself, without
                        accessing the table. It cannot be used to eliminate a sort operation, because the data is
                        not ordered by the index key. It reads the entire index using multiblock reads (unlike a
                        full index scan) and can be parallelized.

                        Fast full scan is available only with the CBO. We can specify it with the initialization
                        parameter OPTIMIZER_FEATURES_ENABLE or the INDEX_FFS hint.

                        Fast full index scans cannot be performed against bitmap indexes.

                        A fast full scan is faster than a normal full index scan in that it can use multiblock I/O
                        and can be parallelized just like a table scan. The following query and plan illustrate
                        this feature.

                        SELECT COUNT(*)
                        FROM t1, t2
                        WHERE t1.c1 > 50
                        AND t1.c2 = t2.c1

                        Plan
                        ------
                        SELECT STATEMENT
                        SORT AGGREGATE
                        HASH JOIN
                        TABLE ACCESS t1 FULL
                        INDEX t2_c1_idx FAST FULL SCAN

                        Because index t2_c1_idx contains all columns needed from table t2, the
                        optimizer uses a fast full index scan on that index.


Index Joins             This is a hash join of several indexes that together contain all the columns from the
                        table that are referenced in the query. If an index join is used, then no table access is
                        needed, because all the relevant column values can be retrieved from the indexes.

                        An index join cannot be used to eliminate a sort operation. Index join is available
                        only with the CBO.

                                The following statement uses an index join to access the empno and sal columns,
                                both of which are indexed, in the emp table:

                                SELECT empno, sal
                                FROM emp
                                WHERE sal > 2000;

                                Plan
                                ------------------------------
                                OPERATION OPTIONS OBJECT_NAME
                                ------------------------------
                                SELECT STATEMENT
                                VIEW index$_join$_001
                                HASH JOIN
                                INDEX RANGE SCAN EMP_SAL
                                INDEX FAST FULL SCAN EMP_EMPNO

Bitmap Joins                    This uses a bitmap for key values and a mapping function that converts each bit
                                position to a rowid. Bitmaps can efficiently merge indexes that correspond to several
                                conditions in a WHERE clause, using Boolean operations to resolve AND and OR
                                conditions.
                                Bitmap access is available only with the CBO (Oracle 9i + Enterprise Edition).
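
As referenced in the Index Skip Scans row above, a sketch of a query that can use a skip scan on the
emp(sex, empno) example (the index name and the plan shown are illustrative):

-- The leading index column (sex) is absent from the predicate
SELECT * FROM emp WHERE empno = 7839;

-- Indicative plan: the index is probed once per distinct value of sex
SELECT STATEMENT
TABLE ACCESS BY INDEX ROWID EMP
INDEX SKIP SCAN EMP_SEX_EMPNO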


2.3.5.4 Join Types

Joins are statements that retrieve data from more than one table. A join is characterized by multiple tables in the FROM
clause, and the relationship between the tables is defined through the existence of a join condition in the WHERE
clause.

        Join Type                                                    Explanation
                                Nested loop (NL) joins are useful for joining small subsets of data and if the join
Nested Loop Joins               condition is an efficient way of accessing the second table.

                                It is very important to ensure that the inner table is driven from the outer table. If the
                                inner table’s access path is independent of the outer table, then the same rows are
                                retrieved for every iteration of the outer loop. This can degrade performance
                                considerably. In such cases, hash joins joining the two independent row sources
                                perform better.

                                NESTED LOOP
                                <Outer Loop>
                                <Inner Loop>

                                SELECT STATEMENT
                                NESTED LOOPS
                                TABLE ACCESS BY INDEX ROWID SO_HEADERS_ALL
                                INDEX RANGE SCAN SO_HEADERS_N1
                                TABLE ACCESS BY INDEX ROWID SO_LINES_ALL
                                INDEX RANGE SCAN SO_LINES_N1

                                The optimizer uses NL joins when joining a small number of rows with a good driving
                                condition between the two tables.

                                This operation is used when an outer join is used between two tables. The outer join
Nested Loop Outer Joins         returns the outer (preserved) table rows, even when there are no corresponding rows in
                                the inner (optional) table.

                                In a regular join, the optimizer chooses the order of tables (driving and driven)
                                based on the cost. However, in an outer join, the order of tables is determined by
                                the join condition. The outer table (whose rows are being preserved) is used to
                                drive to the inner table.

                   SELECT STATEMENT
                   SORT GROUP BY
                   NESTED LOOPS OUTER
                   TABLE ACCESS BY INDEX ROWID RA_CUSTOMERS
                   INDEX RANGE SCAN RA_CUSTOMERS_N2
                   INDEX RANGE SCAN SO_HEADERS_N1

                   The optimizer uses nested loop joins to process an outer join if the following are true:

                        o    It is possible to drive from the outer table to inner table.
                        o    Data volume is low enough to make nested loop method efficient.

                   Hash joins are used for joining large data sets. The optimizer uses the smaller of the
Hash Joins         two tables/data sources to build a hash table on the join key in memory. It then scans
                   the larger table, probing the hash table to find the joined rows.

                   This is best when the smaller table is small enough to fit in available memory. The
                   cost is then limited to a single read pass over the data for the two tables. However, if
                   the hash table grows too big to fit into the memory, then the optimizer breaks it up into
                   different partitions. As the partitions exceed allocated memory, parts are written to
                   disk to temporary segments.

                   After the hash table is complete:

                        o    Table 2 is scanned.
                        o    It is broken up into partitions like table 1.
                        o    Partitions are written to disk.

                   When the hash table build completes, it is possible that an entire hash table partition is
                   resident in memory. Then, we do not need to build the corresponding partition for the
                   second table. When table 2 is scanned, rows that hash to the resident hash table
                   partition can be joined and returned immediately.

                   Now, each hash table partition is read into memory:

                        o    The corresponding table 2 partition is scanned.
                        o    The hash table is probed to return the joined rows.

                   This is repeated for the rest of the partitions. The cost can increase to two read passes
                   over the data and one write pass over the data. There is also the possibility that if the
                   hash table does not fit in the memory, then parts of it must be swapped in and out,
                   depending on the rows retrieved from the second table. Performance for this scenario
                   can be extremely poor.
                   The optimizer uses a hash join to join two tables if they are joined using an equijoin
                   and if either of the following are true:

                        o    A large amount of data needs to be joined.
                        o    A large fraction of the table needs to be joined.

                   SELECT STATEMENT
                   HASH JOIN
                   TABLE ACCESS FULL SO_HEADERS_ALL
                   TABLE ACCESS FULL SO_LINES_ALL

                   We can use the USE_HASH hint to advise the optimizer to use a hash join when
                   joining two tables together. We can investigate the values for the parameters
                   HASH_AREA_SIZE and HASH_JOIN_ENABLED if we are having trouble getting
                   the optimizer to use hash joins.


Hash Outer Joins         This operation is used for outer joins where the optimizer decides that the amount of
                         data is large enough to warrant a hash join, or it is unable to drive from the outer
                         table to the inner table.

                         As in a nested loop outer join, the order of tables is determined not by cost but by the
                         join condition. The outer table (whose rows are being preserved) is used to build the
                         hash table, and the inner table is used to probe the hash table.

                         SELECT STATEMENT
                         FILTER
                         HASH JOIN OUTER
                         TABLE ACCESS BY INDEX ROWID RA_CUSTOMERS
                         INDEX RANGE SCAN RA_CUSTOMERS_N2
                         INDEX FAST FULL SCAN SO_HEADERS_N1

                         The optimizer uses hash joins for processing an outer join if the data volume is high
                         enough to make hash join method efficient or if it is not possible to drive from the
                         outer table to inner table.

Sort Merge Joins         Sort merge joins can be used to join rows from two independent sources. Hash joins
                         generally perform better than sort merge joins. However, sort merge joins can perform
                         better than hash joins if the row sources are sorted already, and if a sort operation does
                         not have to be done.

                         However, if this involves choosing a slower access method (index scan vs. full table
                         scan), then the benefit of using a sort merge might be lost.

                         Sort merge joins are useful when the join condition between two tables is an inequality
                         condition (but not a nonequality) like <, <=, >, or >=. Sort merge joins perform better
                         than nested loop joins for large data sets. (We cannot use hash joins unless there is an
                         equality condition).

                         In a merge join, there is no concept of a driving table.

                              o    Both the inputs are sorted on the join key - sort join operation
                              o    The sorted lists are merged together - merge join operation

                         If the input is sorted already by the join column, then there is not a sort join operation
                         for that row source.

                         SELECT STATEMENT
                         SORT GROUP BY
                         MERGE JOIN
                         SORT JOIN
                         TABLE ACCESS FULL SO_LINES_ALL
                         FILTER
                         SORT JOIN
                         TABLE ACCESS FULL SO_LINES_ALL

                         The optimizer can choose sort merge join over hash join for joining large amounts of
                         data if any of the following are true:

                              o    The join condition between two tables is not an equi-join
                              o    OPTIMIZER_MODE is set to RULE
                              o    HASH_JOIN_ENABLED is false
                              o    Because of sorts already required by other operations, the optimizer
                                   finds it is cheaper to use sort merge over hash join
                              o    The optimizer thinks that the cost of hash join is higher based on the
                                   settings of HASH_AREA_SIZE and SORT_AREA_SIZE

                         We can use the USE_MERGE hint to advise the optimizer to use a merge join when
                         joining the two tables together.

                         The optimizer uses sort merge for an outer join if a nested loop join is inefficient. A
Sort Merge Outer Joins   nested loop join can be inefficient because of data volumes, or because, given sorts
                         already required by other operations, the optimizer finds it cheaper to use a sort
                         merge over a hash join.

                              SELECT STATEMENT
                              SORT GROUP BY NOSORT
                              MERGE JOIN OUTER
                              SORT JOIN
                              INDEX FAST FULL SCAN MTL_SYSTEM_ITEMS_U1
                              SORT JOIN
                              TABLE ACCESS FULL SO_LINES_ALL
                              A Cartesian join happens when one or more of the tables does not have any join
Cartesian Joins               conditions to any other tables in the statement. The optimizer joins every row from one
                              data source with every row from the other data source (Cartesian product of the two
                              sets).

                              A full outer join acts like a combination of the left and right outer joins. In addition to
Full Outer Joins              the inner join, rows from both tables that have not been returned in the result of the
                              inner join are preserved and extended with nulls. In other words, full outer joins let us
                              join tables together, yet still show rows that do not have corresponding rows in
                              the joined tables.



2.3.5.5 Stats Collection

The cost-based optimization approach uses statistics to calculate the selectivity of predicates and to
estimate the cost of each execution plan. Selectivity is the fraction of rows in a table that the SQL
statement's predicate chooses. The optimizer uses the selectivity of a predicate to estimate the cost of a
particular access method and to determine the optimal join order.

The statistics are stored in the data dictionary, and they can be exported from one database and imported
into another.

For example, we can transfer statistics to a test system to simulate the production environment.

Statistics should be generated with the DBMS_STATS package.

The statistics generated include the following:

        Table statistics

             o     Number of rows
             o     Number of blocks
             o     Average row length

        Column statistics

             o     Number of distinct values (NDV) in column
             o     Number of nulls in column
             o     Data distribution (histogram)

        Index statistics

             o     Number of leaf blocks
             o     Levels
             o     Clustering factor
        System statistics

              o    I/O performance and utilization
              o    CPU performance and utilization

Oracle recommends setting the ESTIMATE_PERCENT parameter of the DBMS_STATS gathering
procedures to DBMS_STATS.AUTO_SAMPLE_SIZE to maximize performance gains while achieving
necessary statistical accuracy.

For example, to collect table and column statistics for all tables in OE schema with auto-sampling:

EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS('OE',DBMS_STATS.AUTO_SAMPLE_SIZE);
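
To also gather histograms on skewed columns, the METHOD_OPT argument can be supplied; a sketch for the
same schema:

BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname          => 'OE',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    -- let Oracle decide which columns need histograms and how many buckets
    method_opt       => 'FOR ALL COLUMNS SIZE AUTO');
END;
/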

The statistics gathering procedures in the DBMS_STATS package are enumerated below:

                Procedure                                                   Description
GATHER_INDEX_STATS                             Collects index statistics.

GATHER_TABLE_STATS                             Collects table, column, and index statistics.

GATHER_SCHEMA_STATS                            Collects statistics for all objects in a schema.

GATHER_DATABASE_STATS                          Collects statistics for all objects in a database.

GATHER_SYSTEM_STATS                            Collects CPU and I/O statistics for the system.

EXPORT_SCHEMA_STATS                            Saves schema statistics into a user statistics table,
                                               optionally under a separate statistics identifier.

IMPORT_SCHEMA_STATS                            Restores previously saved statistics from a user statistics
                                               table into the data dictionary.
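
A sketch of transferring schema statistics to a test system with these procedures (the statistics table
name and its owning schema are hypothetical):

-- On production: create a user statistics table and copy the stats into it
EXECUTE DBMS_STATS.CREATE_STAT_TABLE('DBA_USER', 'STATTAB');
EXECUTE DBMS_STATS.EXPORT_SCHEMA_STATS('OE', 'STATTAB', NULL, 'DBA_USER');

-- Ship DBA_USER.STATTAB to the test database (for example, with exp/imp),
-- then load the stats into its data dictionary
EXECUTE DBMS_STATS.IMPORT_SCHEMA_STATS('OE', 'STATTAB', NULL, 'DBA_USER');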




2.3.6 Query Tuning - Steps
As an application designer, we have more information about the underlying data than the optimizer. For
example, we might know that a certain index is more selective for certain queries. Based on this
information, we can choose a more efficient execution plan than the optimizer. We use hints to force the
optimizer to use the optimal execution plan.

2.3.6.1 Using Hints – Overview

We can use hints to specify the following:

         The optimization approach for a SQL statement
         The goal of the cost-based optimizer for a SQL statement
         The access path for a table accessed by the statement
         The join order for a join statement
         A join operation in a join statement

Hints provide a mechanism to direct the optimizer to choose a certain query execution plan based on the
following criteria:

         Join order
         Join method
         Access path
         Parallelization

Hints apply only to the optimization of the statement block in which they appear. A statement block is any
one of the following statements or parts of statements:

         A simple SELECT, UPDATE, or DELETE statement.
         A parent statement or subquery of a complex statement.
         A part of a compound query.

For example, a compound query consisting of two component queries combined by the UNION operator
has two statement blocks, one for each component query. For this reason, hints in the first component query
apply only to its optimization, not to the optimization of the second component query.

{DELETE|INSERT|SELECT|UPDATE} /*+ hint [text] [hint[text]]... */

An example of hint usage may be handy at this point:

SELECT /*+ ORDERED INDEX (b, jl_br_balances_n1) USE_NL (j b)
USE_NL (glcc glf) USE_MERGE (gp gsb) */
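
Since the snippet above is only a fragment, a complete (hypothetical) statement combining several hints
may help; ORDERED fixes the join order, USE_NL makes dept the inner table of a nested loop, and INDEX
nominates an assumed index dept_pk:

SELECT /*+ ORDERED USE_NL(d) INDEX(d dept_pk) */ e.ename, d.dname
FROM emp e, dept d
WHERE e.deptno = d.deptno;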

2.3.6.2 Optimizer Approach / Goal Hints

If a SQL statement has a hint specifying an optimization approach and goal, then the optimizer uses the
specified approach regardless of the presence or absence of statistics, the value of the OPTIMIZER_MODE
initialization parameter, and the OPTIMIZER_GOAL parameter of the ALTER SESSION statement.


          Hint                                                    Explanation
                            The ALL_ROWS hint explicitly chooses the cost-based approach to optimize a
ALL_ROWS                    statement block with a goal of best throughput (that is, minimum total resource
                            consumption).
                            The hints FIRST_ROWS( n) (where n is any positive integer) or FIRST_ROWS
                            instruct Oracle to optimize an individual SQL statement for fast response.
                            FIRST_ROWS(n) affords greater precision, because it instructs Oracle to choose the
                            plan that returns the first n rows most efficiently. The FIRST_ROWS hint, which
                            optimizes for the best plan to return the first single row, is retained for backward
FIRST_ROWS(N)               compatibility and plan stability.

                            The optimizer ignores this hint in DELETE and UPDATE statement blocks
                            and in SELECT statement blocks that contain any of the following syntax:

                                 o    Set operators (UNION, INTERSECT, MINUS, UNION ALL)
                                 o    GROUP BY clause
                                 o    FOR UPDATE clause
                                 o    Aggregate functions
                                 o    DISTINCT operator

CHOOSE                      The CHOOSE hint causes the optimizer to choose between the rule-based and cost-
                            based approaches for a SQL statement. The optimizer bases its selection on the presence
                            of statistics for the tables accessed by the statement. If the data dictionary has statistics
                            for at least one of these tables, then the optimizer uses the cost-based approach and
                            optimizes with the goal of best throughput. If the data dictionary does not have statistics
                            for these tables, then it uses the rule-based approach.

2.3.6.3 Hints for Access Paths

Specifying one of these hints causes the optimizer to choose the specified access path only if the access
path is available based on the existence of an index or cluster and on the syntactic constructs of the SQL
statement. If a hint specifies an unavailable access path, then the optimizer ignores it. We also need to
specify the table as it appears in the statement.

          Hint                                                    Explanation
                           The full hint explicitly chooses a full table scan.
FULL
                           select /*+ FULL(pg) */ * from some_table pg
                           where pg_postdate < '20041001'

ROWID                      The ROWID hint explicitly chooses a table scan by rowid for the specified table.


CLUSTER                    The CLUSTER hint explicitly chooses a cluster scan to access the specified table. It
                           applies only to clustered objects.

HASH                       The HASH hint explicitly chooses a hash scan to access the specified table. It applies
                           only to tables stored in a cluster.

INDEX                      The INDEX hint explicitly chooses an index scan for the specified table. We can use
                           the INDEX hint for domain, B-tree, bitmap, and bitmap join indexes. However,
                           Oracle recommends using INDEX_COMBINE rather than INDEX for bitmap indexes,
                           because it is a more versatile hint.

                           This hint can also optionally specify more than one index.

                           o    If this hint specifies a single available index, then the optimizer performs a scan on
                                this index. The optimizer does not consider a full table scan or a scan on another
                                index on the table.
                           o    If this hint specifies a list of available indexes, then the optimizer considers the
                                cost of a scan on each index in the list and then performs the index scan with the
                                lowest cost. The optimizer can also choose to scan multiple indexes from this list
                                and merge the results, if such an access path has the lowest cost. The optimizer
                                does not consider a full table scan or a scan on an index not listed in the hint.
                           o    If this hint specifies no indexes, then the optimizer considers the cost of a scan on
                                each available index on the table and then performs the index scan with the lowest
                                cost. The optimizer can also choose to scan multiple indexes and merge the results,
                                if such an access path has the lowest cost. The optimizer does not consider a full
                                table scan.
                           select /*+ INDEX(pg) */ * from some_table pg
                           where pg_postdate <= '20010101'
INDEX_ASC                  The INDEX_ASC hint explicitly chooses an index scan for the specified table. If the
                           statement uses an index range scan, then Oracle scans the index entries in ascending
                           order of their indexed values.

                           Because Oracle’s default behavior for a range scan is to scan index entries in ascending
                           order of their indexed values, this hint does not specify anything more than the INDEX
                           hint. However, we might want to use the INDEX_ASC hint to specify ascending range
                           scans explicitly should the default behavior change.

INDEX_COMBINE              The INDEX_COMBINE hint explicitly chooses a bitmap access path for the table. If no
                           indexes are given as arguments for the INDEX_COMBINE hint, then the optimizer uses
                           whatever Boolean combination of bitmap indexes has the best-cost estimate for the
                           table. If certain indexes are given as arguments, then the optimizer tries to use some
                           Boolean combination of those particular bitmap indexes.

                        The INDEX_JOIN hint explicitly instructs the optimizer to use an index join as an
INDEX_JOIN              access path. For the hint to have a positive effect, a sufficiently small number of indexes
                        must exist that contain all the columns required to resolve the query.
                        The INDEX_DESC hint explicitly chooses an index scan for the specified table. If the
INDEX_DESC              statement uses an index range scan, then Oracle scans the index entries in descending
                        order of their indexed values. In a partitioned index, the results are in descending order
                        within each partition.
                        The INDEX_FFS hint causes a fast full index scan to be performed rather than a full
INDEX_FFS               table scan.
                        The NO_INDEX hint explicitly disallows a set of indexes for the specified table.
NO_INDEX
                        The AND_EQUAL hint explicitly chooses an execution plan that uses an access path
AND_EQUAL               that merges the scans on several single-column indexes.

                        select /*+ AND_EQUAL(pg TABLE_IDX1 TABLE_IDX8) */ *
                        from some_table pg
                        where pg_postdate <= '20010101'

2.3.6.4 Hints for Query Transformations

        Hint                                                     Explanation
USE_CONCAT                     The USE_CONCAT hint forces combined OR conditions in the WHERE clause
                               of a query to be transformed into a compound query using the UNION ALL set
                               operator.

                               Generally, this transformation occurs only if the cost of the query using the
                               concatenations is cheaper than the cost without them.
                               The USE_CONCAT hint turns off IN-list processing and OR-expands all
                               disjunctions, including IN-lists.

                               The NO_EXPAND hint prevents the cost-based optimizer from considering OR-
NO_EXPAND                      expansion for queries having OR conditions or IN-lists in the WHERE clause.
                               Usually, the optimizer considers using OR expansion and uses this method if it
                               decides that the cost is lower than not using it.

                               The REWRITE hint forces the cost-based optimizer to rewrite a query in terms
REWRITE                        of materialized views, when possible, without cost consideration. We can use the
                               REWRITE hint with or without a view list. With a view list, Oracle uses that
                               view regardless of its cost.
NOREWRITE                      The NOREWRITE hint disables query rewrite for the query block, overriding the
                               setting of the parameter QUERY_REWRITE_ENABLED.
                               The MERGE hint allows us to merge a view on a per-query basis. If a view's
MERGE                          query contains a GROUP BY clause or DISTINCT operator in the SELECT
                               list, then the optimizer can merge the view's query into the accessing statement
                               only if complex view merging is enabled.
                               Complex merging can also be used to merge an IN subquery into the accessing
                               statement if the subquery is uncorrelated.

                               Complex merging is not cost-based; that is, the accessing query block must
                               include the MERGE hint.

                               The NO_MERGE hint causes Oracle not to merge mergeable views.
NO_MERGE
                               SELECT /*+ NO_MERGE(pa_v) */ cp.col1, cp.col2, cp.col3
                               FROM table1 cp,
                                    (SELECT col1, col2
                                     FROM table2 pa
                                     WHERE batch_run_id = 310100000) pa_v
                               WHERE cp.batch_run_id = 310100000
                               AND cp.col1 = pa_v.col1
                               AND cp.col2 = pa_v.col2
                                When the NO_MERGE hint is used without an argument, it should be placed in
                                the view query block. When NO_MERGE is used with the view name as an
                                argument, it should be placed in the surrounding query.

STAR_TRANSFORMATION             The STAR_TRANSFORMATION hint makes the optimizer use the best plan in
                                which the transformation has been used. Without the hint, the optimizer could
                                make a cost-based decision to use the best plan generated without the
                                transformation, instead of the best plan for the transformed query.

                                Even if the hint is given, there is no guarantee that the transformation will take
                                place. The optimizer only generates the subqueries if it seems reasonable to do
                                so. If no subqueries are generated, then there is no transformed query, and the
                                best plan for the untransformed query is used, regardless of the hint. This hint
                                overrides whatever value is set in the STAR_TRANSFORMATION_ENABLED
                                initialization parameter.
FACT                            The FACT hint is used in the context of the star transformation to indicate to the
                                transformation that the hinted table should be considered as a fact table.

NO_FACT                         The NO_FACT hint is used in the context of the star transformation to indicate
                                to the transformation that the hinted table should not be considered as a fact
                                table.


2.3.6.5 Hints for Join Orders

            Hint                                                   Explanation

ORDERED                         The ORDERED hint causes Oracle to join tables in the order in which they
                                appear in the FROM clause. If we omit the ORDERED hint from a SQL
                                statement performing a join, then the optimizer chooses the order in which to
                                join the tables.

                                We might want to use the ORDERED hint to specify a join order as we are more
                                likely to know something about the number of rows selected from each table that
                                the optimizer does not. This can allow us to choose an inner and outer table
                                better than the optimizer could.

                                The statement joins table TAB1 to table TAB2 and then joins the result to table
                                TAB3:

                                SELECT /*+ ORDERED */ tab1.col1, tab2.col2, tab3.col3
                                FROM tab1, tab2, tab3
                                WHERE tab1.col1 = tab2.col1
                                AND tab2.col1 = tab3.col1



STAR                            The STAR hint forces a star query plan to be used, if possible. A star plan has
                                the largest table in the query last in the join order and joins it with a nested loops
                                join on a concatenated index. The STAR hint applies when there are at least
                                three tables, the large table’s concatenated index has at least three columns, and
                                there are no conflicting access or join method hints. The optimizer also considers
                                different permutations of the small tables.

2.3.6.6 Hints for Join Operations

We must specify a table to be joined exactly as it appears in the statement. If the statement uses an alias for
the table, then we must use the alias rather than the table name in the hint. The table name within the hint
should not include the schema name if the schema name is present in the statement.

Use of the USE_NL and USE_MERGE hints is recommended with the ORDERED hint. Oracle uses these
hints when the referenced table is forced to be the inner table of a join, and they are ignored if the
referenced table is the outer table.

              Hint                                                   Explanation
                                   The USE_NL hint causes Oracle to join each specified table to another row
USE_NL                             source with a nested loops join using the specified table as the inner table.

                                   SELECT /*+ ORDERED USE_NL(customers) */
                                   accounts.balance, customers.last_name, customers.first_name
                                   FROM accounts, customers
                                   WHERE accounts.custno = customers.custno

                                   Here, the customers table is the inner table for the nested loops operation.
                                   The USE_MERGE hint causes Oracle to join each specified table with another
USE_MERGE                          row source with a sort-merge join.

                                   SELECT /*+USE_MERGE(emp dept)*/ *
                                   FROM emp, dept
                                   WHERE emp.deptno = dept.deptno

                                   The USE_HASH hint causes Oracle to join each specified table with another
USE_HASH                           row source with a hash join.

                                   SELECT /*+use_hash(emp dept)*/ *
                                   FROM emp, dept
                                   WHERE emp.deptno = dept.deptno

                                   The DRIVING_SITE hint forces query execution to be done at a different site
DRIVING_SITE                       than that selected by Oracle. This hint can be used with either rule-based or cost-
                                   based optimization.

                                   SELECT /*+DRIVING_SITE(dept)*/ *
                                   FROM emp, dept@rsite
                                   WHERE emp.deptno = dept.deptno

                                   If this query is executed without the hint, then rows from dept are sent to the
                                   local site, and the join is executed there. With the hint, the rows from emp are
                                   sent to the remote site, and the query is executed there, returning the result to the
                                   local site. This hint is useful if we are using distributed query optimization.

                                   The LEADING hint causes Oracle to use the specified table as the first table in
LEADING                            the join order. If we specify two or more LEADING hints on different tables,
                                   then all of them are ignored. If we specify the ORDERED hint, then it overrides
                                   all LEADING hints.
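                                    As a hedged illustration (emp and dept here are generic
                                    sample tables), the following makes dept the first table
                                    in the join order:

                                    SELECT /*+ LEADING(dept) */ *
                                    FROM emp, dept
                                    WHERE emp.deptno = dept.deptno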


HASH_AJ, MERGE_AJ &                For a specific query, we can place the MERGE_AJ, HASH_AJ, or NL_AJ hint
NL_AJ                              into the NOT IN subquery. MERGE_AJ uses a sort-merge anti-join, HASH_AJ
                                   uses a hash anti-join, and NL_AJ uses a nested loop anti-join.
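                                    A hedged sketch on the same sample tables; the hint goes
                                    inside the NOT IN subquery, and the anti-join rewrite
                                    assumes deptno is NOT NULL:

                                    SELECT * FROM dept
                                    WHERE deptno NOT IN (SELECT /*+ HASH_AJ */ deptno
                                    FROM emp WHERE sal > 200000)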


HASH_SJ, MERGE_SJ &                For a specific query, we can place the HASH_SJ, MERGE_SJ, or NL_SJ hint
NL_SJ                              into the EXISTS subquery. HASH_SJ uses a hash semi-join, MERGE_SJ
                                   uses a sort merge semi-join, and NL_SJ uses a nested loop semi-join.



                                SELECT * FROM dept
                                WHERE exists (SELECT /*+HASH_SJ*/ *
                                FROM emp
                                WHERE emp.deptno = dept.deptno
                                AND sal > 200000)




2.3.6.7 Hints for Parallel Operations

            Hint                                                 Explanation

PARALLEL                        The PARALLEL hint allows us to specify the desired number of concurrent
                                servers that can be used for a parallel operation. The hint applies to the INSERT,
                                UPDATE, and DELETE portions of a statement as well as to the table scan
                                portion. If any parallel restrictions are violated, the hint is ignored.

                                SELECT /*+ FULL(po) PARALLEL(po, 5) */ *
                                 FROM po_table po



NOPARALLEL                      The NOPARALLEL hint overrides a PARALLEL specification in the table
                                clause. In general, hints take precedence over table clauses.

                                SELECT /*+ NOPARALLEL(po) */ *
                                 FROM po_table po

                                The PQ_DISTRIBUTE hint improves parallel join operation performance. We
PQ_DISTRIBUTE                   can do this by specifying how rows of joined tables should be distributed among
                                producer and consumer query servers. Using this hint overrides decisions the
                                optimizer would normally make. The optimizer ignores the distribution hint if
                                both tables are serial.
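                                 A hedged example (r and s are generic tables; HASH, HASH
                                 sends the rows of both inputs to the consumer servers by
                                 hashing the join key):

                                 SELECT /*+ ORDERED PQ_DISTRIBUTE(s HASH, HASH) USE_HASH(s) */ *
                                 FROM r, s
                                 WHERE r.c = s.c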

                                The PARALLEL_INDEX hint specifies the desired number of concurrent
PARALLEL_INDEX                  servers that can be used to parallelize index range scans for partitioned indexes.

                                The NOPARALLEL_INDEX hint overrides a PARALLEL attribute setting on
NOPARALLEL_INDEX                an index to avoid a parallel index scan operation.




2.3.6.8 Miscellaneous Hints

            Hint                                               Explanation
                              The APPEND hint allows us to enable direct-path INSERT. In direct-path
APPEND                        INSERT, data is appended to the end of the table, rather than using existing
                              space currently allocated to the table. In addition, direct-path INSERT
                              bypasses the buffer cache and ignores integrity constraints. As a result, direct-
                              path INSERT can be considerably faster than conventional INSERT.
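                              A minimal sketch (sales_hist and sales are placeholder
                              tables):

                              INSERT /*+ APPEND */ INTO sales_hist
                              SELECT * FROM sales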

                              The NOAPPEND hint enables conventional INSERT by disabling parallel
NOAPPEND                      mode for the duration of the INSERT statement. (Conventional INSERT is
                              the default in serial mode, and direct-path INSERT is the default in parallel
                              mode).
                              The CACHE hint specifies that the blocks retrieved for the table are placed at
CACHE                         the most recently used end of the LRU list in the buffer cache when a full table
                              scan is performed. This option is useful for small lookup tables.

                              SELECT /*+ FULL (po) CACHE(po) */ *
                              FROM po_table po

                              The NOCACHE hint specifies that the blocks retrieved for the table are placed
NOCACHE                       at the least recently used end of the LRU list in the buffer cache when a full
                              table scan is performed. This is the normal behavior of blocks in the buffer
                              cache.
                              SELECT /*+ FULL (po) NOCACHE(po) */ *
                              FROM po_table po

                              Setting the UNNEST_SUBQUERY session parameter to TRUE enables
UNNEST                        subquery unnesting. Subquery unnesting unnests and merges the body of the
                              subquery into the body of the statement that contains it, allowing the optimizer
                              to consider them together when evaluating access paths and joins.

                              UNNEST_SUBQUERY first verifies if the statement is valid. If the statement is
                              not valid, then subquery unnesting cannot proceed. The statement must then
                              pass a heuristic test.

                              If the UNNEST_SUBQUERY parameter is set to true, then the UNNEST hint
                              checks the subquery block for validity only. If it is valid, then subquery
                              unnesting is enabled without Oracle checking the heuristics.
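                              A hedged sketch on the sample emp/dept tables; the hint is
                              placed in the subquery block:

                              SELECT * FROM dept
                              WHERE deptno IN (SELECT /*+ UNNEST */ deptno FROM emp)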


NO_UNNEST                     NO_UNNEST hint overrides the value of UNNEST_SUBQUERY parameter for
                              specific subquery blocks.


PUSH_PRED                     The PUSH_PRED hint forces pushing of a join predicate into the view.

                              SELECT /*+ PUSH_PRED(v) */ t1.x, v.y
                              FROM t1,
                              (SELECT t2.x, t3.y
                              FROM t2, t3
                              WHERE t2.x = t3.x) v
                              WHERE t1.x = v.x and t1.y = 1

NO_PUSH_PRED                  The NO_PUSH_PRED hint prevents pushing of a join predicate into the view.

                              The PUSH_SUBQ hint causes non-merged subqueries to be evaluated at the
PUSH_SUBQ                     earliest possible place in the execution plan. Generally, subqueries that are not
                              merged are executed as the last step in the execution plan. If the subquery is
                              relatively inexpensive and reduces the number of rows significantly, then it
                              improves performance to evaluate the subquery earlier.



                                      This hint has no effect if the subquery is applied to a remote table or one that is
                                      joined using a merge join.
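                                       A hedged sketch (emp, dept and emp_hist are placeholder
                                       tables; in this Oracle release the hint is placed in
                                       the outer query):

                                       SELECT /*+ PUSH_SUBQ */ emp.ename
                                       FROM emp, dept
                                       WHERE emp.deptno = dept.deptno
                                       AND emp.sal > (SELECT avg(sal) FROM emp_hist)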

CURSOR_SHARING_EXACT                  Oracle can replace literals in SQL statements with bind variables if it is safe to
                                      do so. This is controlled with the CURSOR_SHARING startup parameter. The
                                      CURSOR_SHARING_EXACT hint causes this behavior to be switched off. In
                                      other words, Oracle executes the SQL statement without any attempt to replace
                                      literals by bind variables.
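                                       A minimal illustration (emp is a placeholder table):
                                       with the hint, the literal 7369 stays in the SQL text
                                       instead of being replaced by a bind variable.

                                       SELECT /*+ CURSOR_SHARING_EXACT */ *
                                       FROM emp
                                       WHERE empno = 7369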

Note:

The CACHE and NOCACHE hints affect system statistics "table scans(long tables)" and "table scans(short tables)", as
shown in the V$SYSSTAT view.




3        REFERENCES
    o   Informatica – Informatica manuals and Informatica Developer Network resources.
    o   Oracle – Oracle documentation at www.oracle.com and MetaLink notes at
        www.metalink.oracle.com.
    o   UBS project learnings.




APPENDIX A

Appendix A.1 – PMServer Configuration File
      CommThreadPoolSize=30
      ServiceThreadPoolSize=30
      EngineThreadPoolSize=30
      LMRepoConnPoolSize=60 /*To support 60 concurrent repository connections */


      # Absolute or relative path of the file where the LM will write its messages.

      LogFileName=/path/<Rep Name>/pmserver.log   /* should be on high speed mounted
      drive */

      # Determine one of the two server data movement modes: UNICODE or ASCII.
      # If not specified, ASCII data movement mode is assumed.

      DataMovementMode=ASCII /* Unless we are processing multiple languages, ASCII mode
      is fine from a performance perspective */

      # Maximum number of sessions – Based on our peak time requirement. Set it 20% over
      the maximum concurrency requirement to account for processing first time failures.

      MaxSessions=200

      # Load Manager (pmserver) shared memory, in bytes
      # For each 10-count increase of MaxSessions, increase this by 2000000 bytes
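      # e.g. the values below follow this rule: for MaxSessions=200,
      # (200 / 10) * 2000000 = 40000000 bytes.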

      LMSharedMem=40000000

      # Obsolete parameter PMCacheDir: PowerMart Cache directory location
      # Will be de-supported in future release
      # Recommendation: Use server variable $PMCacheDir in the Workflow Manager

      PMCacheDir=/tmp /* Should be high speed swap space rather than mounted drive to
      improve paging efficiency */

      # Applies if there is no free LMSHM slot available to run a session. If set
      # to Yes, it fails the session; otherwise it places the session in the pending queue.

      FailSessionIfMaxSessionsReached=No
      # Specify Yes to turn on PowerMart 4.0 / PowerCenter 1.0 date handling
      # compatibility.

      DateHandling40Compatibility=No


      # 3X compatibility forces AggregateTreatNullAsZero to 1 and
      # AggregateTreatRowAsInsert to 1.

      Pmserver3XCompatability=0


      # If this flag is set SUM(X) with all values X = NULL will return zero.
      # Otherwise SUM(X) will return NULL.

      AggregateTreatNullAsZero=0


      #   For a data-driven session with non-incremental aggregation, the
      #   Aggregator will treat all row types as additions to the list of
      #   groups. If not set, rows of type delete will be removed from the
      #   list of groups.



AggregateTreatRowAsInsert=0


# Let pmserver know which format to use when displaying Date/Time
# e.g. DateDisplayFormat=YYYY/MM/DD HH24:MI:SS

DateDisplayFormat=DY MON DD HH24:MI:SS YYYY
TreatNullInComparisonOperatorsAs=


# Enable debugging capabilities of this server

LoadManagerAllowDebugging=Yes


# Create indicator files for file output

CreateIndicatorFiles=No


# Max # of DB connections to be used by Lookup/SP.
# 0 means unlimited

MaxLookupSPDBConnections=0


# Max # of internal DB connections that can be used by
# the application.
# The default (and minimum value) is 100

MaxSybaseConnections=100
MaxMSSQLConnections=100


# Number of deadlock retries that will be attempted per partition.

NumOfDeadlockRetries=10


# Number of seconds to sleep before attempting next deadlock retry

DeadlockSleep=0


#   Indicates whether the SybaseIQ 12 server is located on the same
#   machine as the Informatica Server. The pmserver uses this to decide
#   whether to use a pipe or a regular file for SybaseIQ 12 external
#   loading. If the flag is set to Yes and the SybaseIQ 12 server is
#   running on the same machine, the external loader will use the pipe
#   to improve the performance of the external load.

SybaseIQLocalToPmServer=No


#   If denormalized data is fed into the XML writer, many
#   "duplicate row" messages are generated, which may be undesirable.
#   To stop the XML Writer from outputting those warnings, set the
#   following parameter to "No"

XMLWarnDupRows=Yes
OutputMetaDataForFF=


# Whether to validate codepage compatibility between source/server/target/lkup/SP

ValidateDataCodePages=Yes


# Output session log in UTF8 format




         SessionLogInUTF8=No


APPENDIX B – Oracle Stats Collection Approach

B.1 – Stats Collection

The approach followed to collect the Oracle stats is as follows:

An Informatica job or stored procedure can be scheduled every 15 minutes to collect stats for the
database. User-defined tables such as DB_BLOCK_BUFFER_STATS, LIB_STATS and DATA_DICT_CACHE are
created to store the stats data.

        DB_BLOCK_BUFFER_STATS

         select name, value
         from v$sysstat
         where name in ('consistent gets', 'db block gets', 'physical reads')

        LIB_STATS

         select sum (pins) pins, sum (reloads) reloads
         from v$librarycache

        DATA_DICT_CACHE

         select sum(gets) GETS, sum(getmisses) GETMISSES
         from v$rowcache


        SORT_STATS

         select name, value
         from v$sysstat
         where name like 'sort%'

        DATA FILE STATS

         select d.name, f.phyrds, f.phywrts
         from
         v$datafile d,
         v$filestat f
         where d.file# = f.file#

        ROLLBACK_STATS

         select r.name, s.gets, s.waits
         from v$rollstat s, v$rollname r
         where s.usn=r.usn

       LATCH_STATS

         select name, gets, misses, sleeps, immediate_gets, immediate_misses
         from v$latch
         where name in ('redo allocation', 'redo copy')




        REDO_SPACE_REQ

          select name, value
          from v$sysstat
          where name = 'redo log space requests'

Each run is distinguished by two attributes: RUN_ID and RUN_DATE.
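As an illustration, a single collection step for DB_BLOCK_BUFFER_STATS might look like the sketch
below. The column list and the :run_seq bind variable are assumptions (named to match the analytics
queries in B.2); the actual table DDL and run-stamping mechanism are project-specific.

        insert into DB_BLOCK_BUFFER_STATS (db_name, run_seq, run_date, name, value)
        select 'GGLDWP2', :run_seq, sysdate, name, value
        from   v$sysstat
        where  name in ('consistent gets', 'db block gets', 'physical reads')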

B.2 – Oracle Stats Analytics

The analytics below are computed from the collected tables: DB_BLOCK_BUFFER, DATA_DICT_CACHE,
DFILE_STATS, LIB_STATS, REDO_LOG_STATS, REDO_SPACE_REQ, ROLLBACK_STATS and SORT_STATS.

STATS 1: DB_BLOCK_BUFFER Hit Stats

select t1.run_date, t1.run_seq,
       t1.name as "physical reads", t1.value as phys_reads,
       t2.name as "consistent gets", t2.value as consis_gets,
       t3.name as "db block gets", t3.value as db_block_gets,
       1 - (t1.value / (t2.value + t3.value)) as "Hit Ratio"
from
  (select db_name, run_date, run_seq, name, value from DB_BLOCK_BUFFER
   where name = 'physical reads') t1,
  (select db_name, run_seq, name, value from DB_BLOCK_BUFFER
   where name = 'consistent gets') t2,
  (select db_name, run_seq, name, value from DB_BLOCK_BUFFER
   where name = 'db block gets') t3
where t1.db_name in ('GGLDWP2', 'GGLDWP1')
and   t1.db_name = t2.db_name
and   t1.run_seq = t2.run_seq
and   t2.db_name = t3.db_name
and   t2.run_seq = t3.run_seq


Hit Ratio = 1- (physical reads /(consistent gets+ db block gets))

Logical reads = db block gets + consistent gets

Or

'Hit Ratio = (logical reads - physical reads) / logical reads'

Suggestion-

If the hit ratio is less than 60%-70%, increase the initialization
parameter DB_BLOCK_BUFFERS. ** NOTE: Increasing this parameter will
increase the SGA size.




STATS 2 – Library Cache

Library cache misses indicate that the shared pool is not big enough to hold the shared SQL area for
all concurrently open cursors. We have library cache misses when RELOADS <> 0. When RELOADS = 0
(the ideal situation), we may still get an incremental performance improvement by setting
CURSOR_SPACE_FOR_TIME = TRUE, which prevents Oracle from de-allocating a shared SQL area while an
application cursor associated with it is open.

PINS – Indicates the number of times an item in the library cache was executed.
RELOADS – Indicates the number of times the requested item was not found in the library cache
during execution.

select db_name, run_seq, run_date, sum_pins, sum_reloads,
       (sum_reloads/sum_pins)*100 as Ratio
from   LIB_STATS
where  db_name = 'GGLDWP2' {or 'GGLDWP1'}

Ideally, RELOADS should be as close to 0 as possible.

It is also possible to study the library cache stats by name spaces. Toward this, the following SQL may be
useful:

select namespace, trunc(gethitratio * 100) "Hit ratio",
trunc(pinhitratio * 100) "pin hit ratio", reloads "reloads"
from v$librarycache;

STATS 3 – Data Dictionary Cache

We should aim to keep the miss ratio (GETMISSES/GETS) below 5% to keep the data dictionary cache in
the SGA. We may need to increase the SHARED_POOL_SIZE to improve this statistic.

**NOTE: Increasing the SHARED_POOL_SIZE will increase the SGA.

select db_name, run_seq, run_date, sum_gets, sum_misses
from   DATA_DICT_CACHE
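A hedged variant that computes the miss ratio directly (assuming the same DATA_DICT_CACHE columns
as above):

select db_name, run_seq, run_date, sum_gets, sum_misses,
       (sum_misses/sum_gets)*100 as Ratio
from   DATA_DICT_CACHE
where  db_name = 'GGLDWP2' {or 'GGLDWP1'}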


STATS 4 – Data File I/O Stats

The table-partitioning scheme should minimize serialization waits while reading/writing data to the
data files, and should mirror the query semantics and access patterns. We can minimize physical
reads by adding incremental data buffers; however, this is not always possible because of the
physical limit on RAM. We can determine whether adding DB buffers will really help by doing the
following:

    o   Set db_block_lru_statistics = TRUE and db_block_lru_extended_statistics = TRUE in the init.ora
        parameters.
    o   Check the stats again.

select * from dfile_stats where db_name = 'GGLDWP2' {or 'GGLDWP1'}

STATS 5- Rollback Contention

GETS - # of gets on the rollback segment header



WAITS - # of waits for the rollback segment header

select db_name, run_seq, run_date, (sum(waits)/sum(gets))*100 as Ratio
from   rollback_stats
where  db_name = 'GGLDWP2' {or 'GGLDWP1'}
group by db_name, run_seq, run_date

If the ratio of waits to gets is more than 1% or 2%, we may consider creating more rollback segments.

STATS 6- Redo Log Contention

The table REDO_SPACE_REQ holds information on how often the user processes had to wait for space in
the redo log buffer.

select * from REDO_SPACE_REQ

This value should be near 0. If this value increments consistently, processes have had to wait for
space in the redo buffer. If this condition persists over time, we can increase the size of
LOG_BUFFER in the init.ora file in increments of 5% until the value nears 0.

NOTE: increasing the LOG_BUFFER value will increase total SGA size.
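To check whether the value increments consistently across runs, the following hedged sketch may
help (it assumes analytic functions are available and that REDO_SPACE_REQ carries a db_name column
like the other stats tables):

select run_seq, run_date, value,
       value - lag(value) over (order by run_seq) as delta
from   REDO_SPACE_REQ
where  db_name = 'GGLDWP2' {or 'GGLDWP1'}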

STATS 7- Latch Contention

GETS - # of successful willing-to-wait requests for a latch
MISSES - # of times an initial willing-to-wait request was unsuccessful
IMMEDIATE_GETS - # of successful immediate requests for each latch
IMMEDIATE_MISSES - # of unsuccessful immediate requests for each latch
SLEEPS - # of times a process waited and requested a latch again after an initial willing-to-wait
request

If the latch requested with a willing-to-wait request is not available, the requesting process waits a short
time and requests again. If the latch requested with an immediate request is not available, the requesting
process does not wait, but continues processing.


select db_name, run_seq, run_date, gets, misses, sleeps,
       immediate_gets, immediate_misses
from   latch_stats

If either the miss ratio (MISSES/GETS) or the immediate-miss ratio
(IMMEDIATE_MISSES/(IMMEDIATE_GETS + IMMEDIATE_MISSES)) exceeds 1%, performance will be affected.
Decreasing the size of LOG_SMALL_ENTRY_MAX_SIZE reduces the number of processes copying information
on the redo allocation latch. Increasing the size of LOG_SIMULTANEOUS_COPIES will reduce contention
for redo copy latches.
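A hedged sketch computing both ratios per run from the collected LATCH_STATS table (the decode
calls guard against division by zero):

select db_name, run_seq, run_date, name,
       (misses/decode(gets, 0, 1, gets))*100 as miss_ratio,
       (immediate_misses/decode(immediate_gets + immediate_misses, 0, 1,
        immediate_gets + immediate_misses))*100 as imm_miss_ratio
from   latch_stats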

STATS 8: Checking for fragmented Data objects (Index and Data Table Spaces)

Physical reads and writes to fragmented tablespaces incur additional rotational latency at the disk
drive. This delay becomes especially pronounced for large-volume reads and writes to the database.
Our databases (GGLDWP1 and GGLDWP2) share this operational characteristic of large-volume reads and
writes.

If the number of extents is approaching MAXEXTENTS, it is time to defragment the table.

TABLE Fragmentation

select a.owner, segment_name, segment_type,
       sum(bytes), max_extents, count(*)
from   dba_extents a, dba_tables b
where  segment_name = b.table_name
group by a.owner, segment_name, segment_type, max_extents
having count(*) > 3
order by a.owner, segment_name, segment_type, max_extents

INDEX FRAGMENTATION

select a.owner, segment_name, segment_type,
       sum(bytes), max_extents, count(*)
from   dba_extents a, dba_indexes b
where  segment_name = b.index_name
group by a.owner, segment_name, segment_type, max_extents
having count(*) > 3
order by a.owner, segment_name, segment_type, max_extents



