Tivoli Software




                  Relational Database Design and Performance Tuning for DB2 Database Servers


                                                                                              How to document



                                                                                                  Document Version: 1.0

                                                                                          Document Status: Draft




                    NOTE: The hard copy version of this document is FOR REFERENCE ONLY. The online version is the master.
                          It is the responsibility of the user to ensure that they have the current version.
                          It is also the responsibility of the user to ensure the completeness of this document prior to use.
                          Any outdated hard copy is invalid and must be removed from possible use.

                                                                                       IBM SWG - Tivoli Software
                                                                                            3901 S. Miami Blvd.
                                                                                             Durham, NC 27703
                                                                                             +1 (919) 224-1598
                                                                                             +1 (919) 224-2560

                                                                        Issue Date:
                                                                              Author:                             Edward Bernal








HOW TO DOCUMENT

1.      DOCUMENT CONTROL
     1.1        SUMMARY OF CHANGES
2.      INTRODUCTION
     2.1     DEFINITIONS
     2.2     PERFORMANCE FACTORS
        2.2.1    Database Design
        2.2.2    Application Design
        2.2.3    Hardware Design and Operating System Usage
3.      DATABASE DESIGN DETAILS
     3.1     KEY FACTORS
        3.1.1    Tablespaces
        3.1.2    Tables
        3.1.3    Bufferpools
        3.1.4    Logging
        3.1.5    Indexing
     3.2     DATABASE MAINTENANCE
        3.2.1    REORG
        3.2.2    RUNSTATS
        3.2.3    REBIND
4.      APPLICATION DESIGN DETAILS
     4.1     KEY FACTORS
        4.1.1    Review application SQL for efficiencies *** IMPORTANT ***
        4.1.2    Concurrency Control and Isolation Level
        4.1.3    Locking
        4.1.4    Fetch needed columns only
        4.1.5    Reuse resources
        4.1.6    SQL Statement Preparation
        4.1.7    Query Tuning
5.      HARDWARE DESIGN AND OPERATING SYSTEM USAGE
     5.1     KEY FACTORS
        5.1.1    Memory
        5.1.2    CPU
        5.1.3    I/O
        5.1.4    Network
6.      TUNING OPPORTUNITIES
     6.1        INSERT/DELETE PERFORMANCE
     6.2        DATABASE MANAGER CONFIGURATION TUNING
     6.3        DATABASE CONFIGURATION TUNING
     6.4        BUFFERPOOLS
     6.5        REGISTRY VARIABLES
7.      MONITORING TOOLS






8.      TUNING METHODOLOGY
     8.1     TUNING APPROACH
     8.2     SKILLS NEEDED
     8.3     GENERAL PROCESS
     8.4     DB2 SPECIFIC TUNING
        8.4.1    SQL Reviews
        8.4.2    Database Statistics
        8.4.3    SNAPSHOT and EVENT Monitors
        8.4.4    DB2BATCH
9.      PUBLICATIONS & ONLINE HELP

10.         SIZING

11.         REFERENCE

12.         ADDENDUM




1.         Document Control

1.1           Summary of Changes
The table below contains the summary of changes:
Version                    Date                            Description of changes

1.0                                                        Distribution of 1st draft








2.    Introduction
Any performance tuning requires some knowledge of the application involved, and relational database
systems are no different. In fact, the more knowledge available about an application, the better the
decisions that can be made to positively affect the performance of that application. While it is possible to
perform some amount of tuning of a relational database system after the fact, the more attention you pay
to the overall design of your entire system up front, the better the results will generally be.

While much of this material is applicable to DB2 on all available platforms, this paper deals specifically
with DB2 on distributed platforms, i.e., Windows, UNIX, and Linux. The material was obtained from
readily available publications on DB2 and from over 18 years of experience designing, developing, and
tuning DB2 systems. Although the material is DB2 specific, many of the concepts are applicable to
relational databases in general, such as Oracle and Microsoft SQL Server.

This document is not intended to replace the detailed information that is available in various manuals,
Redbooks, and other sources that deal specifically with DB2 performance. The intent is to point out some
of the major factors that affect DB2 performance and to provide an easy-to-use reference, so that the
reader does not have to work through all of the previously mentioned reference material in detail. The
detailed material can, and should, be referenced when addressing a specific area of concern. Refer to
Section 9 for a listing of some of these publications. It is important to note that all of these factors should
be addressed for each application developed.



2.1    Definitions
      •  Throughput - The amount of data transferred from one place to another or processed in a
         specified amount of time. Data transfer rates for disk drives and networks are measured in
         terms of throughput. Typically, throughput is measured in kbps, Mbps, and Gbps.
      •  Optimizer - When an SQL statement needs to be executed, the SQL compiler must determine
         the access plan to the database tables. The optimizer creates this access plan, using
         information about the distribution of data in specific columns of tables and indexes if these
         columns are used to select rows or join tables. The optimizer uses this information to estimate
         the costs of alternative access plans for each query. Its decisions are heavily influenced by
         statistical information about the size of the database tables and the available indexes.
      •  Clustered index - An index whose sequence of key values closely corresponds to the sequence
         of rows stored in a table. The degree to which this correspondence exists is measured by
         statistics that are used by the optimizer.
      •  Cardinality - With respect to tables, the number of rows in the table. With respect to indexed
         columns, the number of distinct values of that column in a table.
      •  Prefetch - An operation in which data is read before, and in anticipation of, its use. DB2
         supports the following mechanisms:
             o Sequential prefetch - A mechanism that reads consecutive pages into the buffer pool
                before the pages are required by the application.
             o List prefetch - Sometimes called list sequential prefetch. Prefetches a set of
                non-consecutive data pages efficiently.

2.2     Performance Factors
There are a number of areas that factor into the overall performance of any application. Below are the
general areas of concern along with an explanation of each.

2.2.1        Database Design
The term “Database Design” can mean different things to different people. There are two main types of
data models: logical and physical. A logical model is a representation, often graphical in nature, of the
information requirements of a business area; it is not itself a database. Its main purpose is to ensure that
its structure and content can represent and support the business requirements of an area of interest. It is
independent of any database technology.

After completing your logical database design, there are a number of issues you should consider about the
physical environment in which your database and tables will reside. These include understanding the files
that will be created to support and manage your database, understanding how much space will be required
to store your data, and determining how you should use the tablespaces that are required to store your
data.

This document deals only with the aspects of physical database design.

2.2.2        Application Design
For the purposes of this document, application design deals with aspects of how you access your database
system. There are a number of techniques that will be discussed that, if used, can positively influence the
performance of your application.


2.2.3        Hardware Design and Operating System Usage
For any database system, there are a number of common areas that need to be addressed and sized
appropriately in order to support your application workload. This section will discuss common, and
platform specific, hardware and operating system components.


3.      Database Design Details





3.1       Key Factors
3.1.1      Tablespaces
A tablespace is a physical storage object that provides a level of indirection between a database and the
tables stored within the database. It is made up of a collection of containers into which database objects
are stored. A container is an allocation of space to a table space. Depending on the table space type, the
container can be a directory, device, or file. The data, index, long field, and LOB portions of a table can
be stored in the same table space, or can be individually broken out into separate table spaces.

When working with database systems, the main objective is to be able to store and retrieve data as quickly
and efficiently as possible. One important consideration when designing your database or analyzing a
performance problem on an existing database is the physical layout of the database itself. DB2 provides
support for two types of tablespaces:
    •  System Managed Space (SMS) - stores data in operating system files. SMS tablespaces are an
       excellent choice for general purpose use. They provide good performance with little
       administration cost.
    •  Database Managed Space (DMS) - with DMS table spaces, the database manager controls the
       storage space. A list of devices or files is selected to belong to a tablespace when it is defined, and
       the space on those devices or files is managed by the DB2 database manager. There is some
       additional administration cost with DMS tablespaces, primarily due to monitoring and adjusting
       the size of the pre-allocated files. A DMS tablespace can easily be increased in size by either
       ALTERing an existing container or adding a new container to it.
3.1.1.1    Recommendations

    •  DMS tablespaces usually perform better than SMS tablespaces because they are pre-allocated and
       do not have to spend time extending files when new rows are added. DMS tablespaces can use
       either raw devices or file system files. DMS table spaces in raw device containers provide the best
       performance because double buffering does not occur. Double buffering, which occurs when data
       is buffered first at the database manager level and then at the file system level, might be an
       additional cost for file containers or SMS table spaces.
    •  If you use SMS tablespaces, consider running the db2empfa command against your database. The
       db2empfa (Enable Multipage File Allocation) tool enables the use of multipage file allocation for
       a database. With multipage file allocation enabled for SMS table spaces, disk space is allocated
       one extent at a time rather than one page at a time, improving INSERT throughput.
    •  Using DMS tablespaces also allows a single table to store its data, index, and large object portions
       in up to three different DMS tablespaces, thus improving performance through parallel disk I/O
       operations. Parallel I/O is the process of reading from or writing to two or more I/O devices at the
       same time to reduce response time. For example, look at the following statements (assume there
       are no large object data types):

                  CREATE TABLE T1
                     (COLA INTEGER NOT NULL, ...)
                     IN TS1 INDEX IN TS2





                 CREATE INDEX IX1 ON T1
                 (COLA ASC)

          The table data will be placed in tablespace TS1 and the index data will be placed in tablespace TS2.

          It would be important to put the different tablespaces on different disk drives in order to enable the
          possibility of parallel I/O operations. In fact, each tablespace can have multiple containers, and
          each of those containers could be on a different disk drive. The ideal configuration has to consider
          a number of factors, such as the number of disks available, RAID level, etc.
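
           As an illustration of these recommendations, the sketch below creates a DMS tablespace with two
           file containers placed on separate disks and later grows it. The container paths, page counts, and
           names are illustrative assumptions only, not values taken from this document.

              -- Sketch only: paths, sizes (in pages), and names are assumptions
              CREATE TABLESPACE TS1
                 MANAGED BY DATABASE
                 USING (FILE '/disk1/db2/ts1c1' 10000,
                        FILE '/disk2/db2/ts1c2' 10000)

              -- A DMS tablespace can later be grown by extending or adding containers
              ALTER TABLESPACE TS1 EXTEND (ALL 5000)
              ALTER TABLESPACE TS1 ADD (FILE '/disk3/db2/ts1c3' 10000)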

3.1.1.2    Platform Specific Recommendations
Windows

File system caching is performed as follows:
    •  For DMS file containers (and all SMS containers), the operating system might cache pages in the
       file system cache.
    •  For DMS device container table spaces, the operating system does not cache pages in the file
       system cache.
    •  On Windows, the DB2 registry variable DB2NTNOCACHE specifies whether or not DB2 will
       open database files with the NOCACHE option. If DB2NTNOCACHE=ON, file system caching
       is eliminated. If DB2NTNOCACHE=OFF, the operating system caches DB2 files. This applies to
       all data except for files that contain LONG FIELDS or LOBs. Eliminating system caching allows
       more memory to be available to the database so that the buffer pool or sortheap can be increased.
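
A minimal sketch of setting this registry variable from a Windows command prompt follows; the instance
must be recycled for the change to take effect.

              rem Eliminate file system caching for DB2 files
              db2set DB2NTNOCACHE=ON

              rem Recycle the instance so the registry change takes effect
              db2stop
              db2start

              rem List the registry variables currently in effect
              db2set -all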


3.1.2      Tables
As previously discussed, before creating your physical database tables, you should draw a logical design
of your data model. We will briefly discuss aspects of logical database design that affect performance.

3.1.2.1    Normalization
This is the process of restructuring a data model by reducing its relations to their simplest forms. It is a
key step in the task of building a logical relational database design. Normalization reduces redundancy
in your data and can improve the performance of update and delete statements, since each piece of data
is maintained in only one place. By normalizing your data, you try to ensure that all columns in a table
depend on the primary key. The disadvantage of a fully normalized data structure is in data retrieval
operations, specifically when a query accesses a large number of related pieces of data from different
tables via join operations. For more information about normalization, the works of C.J. Date are among
the better resources and can be located by searching the internet.
3.1.2.2    Denormalization
Denormalization is the intentional duplication of columns in multiple tables, with the consequence of
increased data redundancy. It is sometimes necessary to minimize performance problems and is a key
step in designing a physical relational database design. The decision about whether or not to denormalize
should be based on the following:
    •  Can you use some of the other methods described in this document to tune your database and
       improve performance to an acceptable level without denormalizing?
    •  Can you quantify the likely performance gains from denormalizing, and are they a reasonable
       trade-off against the added update overhead?
3.1.2.3    Other considerations

    •  Nulls - In general, columns defined as NOT NULL perform better than nullable columns because
       of the reduction in path length; the database manager does not have to check for null values in a
       NOT NULL column. Also, every nullable column requires one extra byte per column value. Use
       nulls where appropriate and not as a default.
    •  Column lengths - Define your column lengths, particularly VARCHAR columns, as small as
       possible for your application. The resulting space savings can reduce the number of table and
       index pages used, and the number of index levels, which can improve query performance.
           o If you create an index on a column defined as VARCHAR, all index entries for that
             column take the maximum length of the VARCHAR definition, even if the actual data
             length in the column is smaller than the maximum.
    •  Identity columns - A significant performance enhancement can be realized by using DB2-generated
       identity values rather than values generated by the application. They are typically used for
       generating unique primary key values (see the sketch after this list).
    •  Put frequently updated columns together and at the end of the row. This affects update
       performance because of the following logging considerations:
           o For fixed-length row updates, DB2 logs from the first changed column to the last changed
             column.
           o For variable-length row updates, DB2 logs from the first changed byte to the end of the
             row. If the length of a variable-length column changes, this results in a change to the row
             header (which includes the row length), and thus the entire row is logged.
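
The following sketch illustrates a DB2-generated identity column and the placement of frequently updated
columns at the end of the row. The table and column names, and the identity options, are illustrative
assumptions.

              CREATE TABLE ORDERS
                 (ORDER_ID     INTEGER NOT NULL
                               GENERATED ALWAYS AS IDENTITY
                               (START WITH 1, INCREMENT BY 1, CACHE 100),
                  ORDER_REF    VARCHAR(30) NOT NULL,
                  -- frequently updated columns kept together at the end of the row
                  STATUS       CHAR(1) NOT NULL,
                  LAST_UPDATED TIMESTAMP)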
3.1.3      Bufferpools
A bufferpool is an area of memory into which database pages are read, modified, and held during
processing. On any system, accessing memory is faster than disk I/O. DB2 uses database buffer pools to
attempt to minimize disk I/O.

There is no definitive answer to the question of how much memory you should dedicate to the buffer
pool. Generally, more is better. A good rule of thumb is to start with about 75% of your system's main
memory devoted to buffer pool(s), but this rule is applicable only if the machine is a dedicated database
server. Since the buffer pool is a memory resource, its use has to be considered along with all other
applications and processes running on the server.

    •  A spreadsheet that can be used to estimate DB2 memory usage will be provided later in this
       document.

If your tablespaces have multiple page sizes, then you should create only one buffer pool for each page
size.

There are some cases where defining multiple buffer pools of the same page size can improve
performance but, if badly configured, multiple buffer pools can have a huge negative impact on
performance. Consider the following when deciding whether to create multiple buffer pools:
    •  You create tables which reside in table spaces using a page size other than the 4 KB default. A
       buffer pool with a matching page size is required in this case (as mentioned above).
    •  You have tables which are accessed frequently and quickly by many short update transaction
       applications. Dedicated buffer pool(s) for these tables may improve response times.
    •  You have tables larger than main memory which are always fully scanned. These could have their
       own dedicated buffer pool.
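
As a hedged example of the first point, the sketch below creates a dedicated buffer pool for an 8 KB page
size and a tablespace that uses it. The names, sizes (in pages), and container path are assumptions.

              CREATE BUFFERPOOL BP8K SIZE 50000 PAGESIZE 8K

              CREATE TABLESPACE TS8K
                 PAGESIZE 8K
                 MANAGED BY DATABASE
                 USING (FILE '/disk1/db2/ts8kc1' 20000)
                 BUFFERPOOL BP8K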
3.1.4      Logging
One of the main purposes of all database systems is to maintain the integrity of your data. All databases
maintain log files that keep records of database changes. DB2 logging consists of a set of primary and
secondary log files that contain log records that record all changes to a database. The database log is used
to roll back changes for units of work that are not committed and to recover a database to a consistent
state. DB2 provides two logging strategy choices.
3.1.4.1    Circular logging
This is the default log mode. With circular logging, the log records fill the log files and then overwrite the
initial log records in the initial log file. The overwritten log records are not recoverable. This type of
logging is typically not suited for a production application.
3.1.4.2    Log Retain logging
Each log file is archived when it fills with log records. New log files are made available for log records.
Retaining log files enables roll-forward recovery. Roll-forward recovery reapplies changes to the database
based on completed units of work (transactions) that are recorded in the log. You can specify that roll-
forward recovery is to the end of the logs, or to a particular point in time before the end of the logs.
Archived log files are never directly deleted by DB2; therefore, it is the application's responsibility to
maintain them, i.e., archive, purge, etc.
3.1.4.3    Log Performance
Ignoring the performance of your database in relation to its logging can be a costly mistake, the main cost
being time. Placement of the log files needs to be optimized, not only for write performance, but also for
read performance, because the database manager will need to read the log files during database recovery.
3.1.4.4    Recommendations

    •  Use the fastest disks available for your log files
           o Use a separate array and/or channel if possible
    •  Use Log Retain logging
    •  Mirror your log files
    •  Increase the size of the Log Buffer database configuration parameter (logbufsz)
           o This parameter specifies the amount of the database heap to use as a buffer for log records
             before writing these records to disk. The log records are written to disk when one of the
             following occurs:
                 -  A transaction commits, or a group of transactions commit, as defined by the
                    mincommit configuration parameter.
                 -  The log buffer is full.
                 -  Some other internal database manager event occurs.
           o Buffering the log records results in more efficient log file I/O, because the log records are
             written to disk less frequently and more log records are written each time.
    •  Tune the Log File Size (logfilsiz) database configuration parameter so that you are not creating
       excessive log files.
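
A minimal sketch of these recommendations using the database configuration commands; the database
name and values are placeholders that must be tuned to your workload. Note that enabling archive (log
retain) logging places the database in backup pending state until a backup is taken.

              UPDATE DATABASE CONFIGURATION FOR MYDB USING LOGBUFSZ 256
              UPDATE DATABASE CONFIGURATION FOR MYDB USING LOGFILSIZ 10000

              -- Switch from circular logging to archive (log retain) logging
              UPDATE DATABASE CONFIGURATION FOR MYDB USING LOGARCHMETH1 LOGRETAIN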


3.1.5    Indexing
An index is a set of keys, each pointing to a row, or rows in a table. An index serves two primary
purposes:
   1. To ensure uniqueness, as in the case of a Primary Key. Unique indexes can be created to ensure
      uniqueness of the index key. An index key is a column or an ordered collection of columns on
      which an index is defined. Using a unique index will ensure that the value of each index key in the
      indexed column or columns is unique.
    2. To allow more efficient access to rows in a table by creating a direct path to the data through
        pointers. The SQL optimizer automatically chooses the most efficient way to access data in
        tables. The optimizer takes indexes into consideration when determining the fastest access path to
        data.
The main advantages of indexes were pointed out above. Creating an index to ensure uniqueness is
typically driven by a business requirement of the application, and such indexes are therefore necessary.
Beyond that, you should be very careful about the number and size of the indexes that you create on your
database tables. Each additional index has an impact on the following:
    •  Disk storage
    •  INSERT and DELETE processing
           o CPU
           o I/O
           o Logging
    •  Database maintenance
           o REORG
           o RUNSTATS
3.1.5.1    Index Considerations
The creation of any index for the purpose of efficient access should be done based on a review of the
actual SQL that has been, or will be, written against the tables.
In addition to the comment above, the following items need to be considered when determining which
indexes to create:
    •  The size of the index, determined by:
            o The number and size of the columns in the index
            o The projected, or actual, volume of data in your tables.

          Indexes are implemented as a B-Tree structure, with a root level, intermediate level(s), and leaf
          pages (the lowest level) which contain actual pointers to the data. A typical index on a moderately
          sized table would be 3 levels deep. That means that if the optimizer decided to use this index to
          read the data, it would have to do a minimum of 4 I/Os to satisfy the query:
                      1. the index root page
                      2. one intermediate level page
                      3. one leaf page
                      4. one data page

          As the size of the index increases, the number of pages needed to store the index entries and/or the
          number of intermediate levels increases as well.

    •  The cardinality of the indexed columns - one of the most often overlooked issues when creating
       an index. Consider the following example to illustrate the point:

                 CREATE TABLE T1
                 (COLA INTEGER          NOT NULL,
                  COLB SMALLINT         NOT NULL,
                  COLC CHAR(10)         NOT NULL)

                 CREATE UNIQUE INDEX IX1 ON T1
                 (COLA ASC) CLUSTER

                 CREATE INDEX IX2 ON T1
                 (COLB ASC)

                 For this example, assume
                     1. T1 has 100,000,000 rows
                     2. the cardinality of COLB is 4, with an even distribution among the rows




                     3. we are not considering Multi-Dimensional Clustering (MDC)
                     4. the following query is typically run against this table

                        SELECT COLA, COLC
                        FROM T1
                        WHERE COLB = 3

                 On the surface, this looks like it all makes sense. We have a SELECT statement with an
                 “=” predicate on COLB, and we have an index defined on that column, what could be
                 better!

                 This table has a clustering index defined on COLA. The intent of a clustering index is so
                 that the sequence of key values closely corresponds to the sequence of rows stored in a
                 table. Each table can only have one clustering index, so our index on COLB is not
                 clustered.

                 Assuming an equal distribution of rows across the COLB predicate, our WHERE clause,
                 “COLB = 3”, will return 25,000,000 rows to our query: 100,000,000/4.

                 It is highly unlikely that the DB2 optimizer would select the use of the COLB index (IX2)
                 to satisfy this query, due to high I/O cost. It would more likely decide to scan the entire
                 table by taking advantage of sequential Prefetch.

                 Since this index would not be used to satisfy the query, there would only be a negative
                 impact by creating it as discussed above; i.e. disk, CPU, INSERT and DELETE processing,
                  etc.

    •  The order of the columns in the index - another of the most often overlooked issues when creating
       an index. Consider the following example to illustrate the point:

                 CREATE TABLE T1
                 (COLA INTEGER         NOT NULL,
                  COLB SMALLINT        NOT NULL,
                  COLC CHAR(10)        NOT NULL,
                  COLD INTEGER)

                 CREATE UNIQUE INDEX IX1 ON T1
                 (COLA ASC)

                 CREATE INDEX IX2 ON T1
                 (COLB ASC,
                  COLC ASC,
                  COLD ASC) CLUSTER

                 For this example, assume





                     1. T1 has 100,000,000 rows
                     2. the following query is typically run against this table

                        SELECT COLB, COLD
                        FROM T1
                        WHERE COLC      = 'DATA'
                        AND COLD        > 42

                 We have a SELECT statement with an “=” predicate on COLC, a second predicate with a
                 “>” predicate on COLD, and we have an index (IX2) defined on those columns, plus
                 COLB, which is also in our SELECT list.

                  Will the DB2 optimizer select index IX2 to satisfy this query? The answer is not so clear.
                  Remember that DB2 indexes are created as a B-tree structure. The first column of the IX2
                  index is COLB, but we have not provided a predicate for that column in our WHERE clause,
                  so DB2 cannot effectively navigate the B-tree because the high-order part of the key was not
                  provided.

                  This does not, however, eliminate the possibility of the use of this index. Since all of the
                  data referenced (COLB, COLC, and COLD) is in the index, DB2 has two possibilities:
                  either scan the entire table, or scan the entire index. Since there would be fewer pages in
                  the index, and since there is no need to go to the data pages, an index scan would probably
                  be selected. Index scans are also eligible for prefetch operations.

                 If this was a typical query used against this table, a better definition of the IX2 index would
                 be to order the columns as follows:

                 CREATE INDEX IX2 ON T1
                 (COLC ASC,
                  COLD ASC,
                  COLB ASC) CLUSTER

                 This would allow effective use of the B-tree structure by providing matching values on the
                 high order part of the key.

These examples were provided for illustration only. You need to consider all of the database access
queries against your tables, not just an individual SELECT statement, in order to determine the best index
strategy.

3.1.5.2   Index Recommendations
    o Create as few indexes as possible
    o Consider creating the INDEXES with the “ALLOW REVERSE SCANS” option





      o Pay close attention to the order of the columns in the index
    o Don't create redundant indexes
      o Use DB2 “Explain” facilities to determine the actual usage of the indexes
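
A short sketch of the last two recommendations follows. The table, column, and index names are
assumptions; the EXPLAIN tables must already exist, and a tool such as db2exfmt can then format the
captured plan to confirm whether the index is actually used.

              CREATE INDEX IX_ORDERS_REF ON ORDERS
                 (ORDER_REF ASC, LAST_UPDATED ASC)
                 ALLOW REVERSE SCANS

              -- Capture the access plan for a typical query, then format it with db2exfmt
              EXPLAIN PLAN FOR
                 SELECT ORDER_ID
                 FROM ORDERS
                 WHERE ORDER_REF = 'ABC123'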



3.2       Database Maintenance
Regular maintenance is a critical factor in the performance of a database environment. This involves
running the Reorg, Runstats, and Rebind facilities, in that order, on the database tables. A regularly
scheduled maintenance plan is essential in order to maintain peak performance of your system.
3.2.1      REORG
After many changes to table data caused by INSERT and DELETE activity, and by UPDATEs of
variable-length columns, logically sequential data may reside on non-sequential physical data pages, so
that the database manager must perform additional read operations to access data. You can reorganize
DB2 tables to eliminate fragmentation and reclaim space using the REORG command.
    •  Significant reductions in elapsed time, due to improved I/O, can result from regularly scheduled
       REORGs.
    •  DB2 provides two types of REORG operation:
           o Classic REORG
                 -  Provides the fastest method of REORG
                 -  Indexes are rebuilt during the reorganization
                 -  Ensures perfectly ordered data
                 -  Access is limited to read-only during the UNLOAD phase, with no access during
                    other phases
                 -  Is not restartable
           o In-Place REORG
                 -  Slower than the Classic REORG; takes longer to complete
                 -  Does not ensure perfectly ordered data or indexes
                 -  Requires more log space
                 -  Can be paused and restarted
                 -  Allows applications to access the database during reorganization
3.2.1.1    Recommendations

    •  Implement a regularly scheduled maintenance plan
    •  If you have an established database maintenance window, use the Classic REORG
    •  If you operate a 24 by 7 operation, use the In-Place REORG

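Hedged examples of the two REORG types follow; the schema and table name are assumptions.

              -- Classic (offline) REORG of the table and its indexes, run in a maintenance window
              REORG TABLE DB2ADMIN.ORDERS
              REORG INDEXES ALL FOR TABLE DB2ADMIN.ORDERS

              -- In-place (online) REORG for 24 by 7 operation; can be paused and restarted
              REORG TABLE DB2ADMIN.ORDERS INPLACE ALLOW WRITE ACCESS
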

3.2.2      RUNSTATS
It was mentioned earlier that the DB2 optimizer uses information and statistics in the DB2 catalog in
order to determine the best access to the database based on the query provided. Statistical information is
collected for specific tables and indexes in the local database when you execute the RUNSTATS utility.
When significant numbers of table rows are added or removed, or if data in columns for which you collect
statistics is updated, execute RUNSTATS again to update the statistics. Use the RUNSTATS utility to
collect statistics in the following situations:
    •  When data has been loaded into a table and the appropriate indexes have been created.
    •  When you create a new index on a table. You need to execute RUNSTATS only for the new index
       if the table has not been modified since you last ran RUNSTATS on it.
    •  When a table has been reorganized with the REORG utility.
    •  When the table and its indexes have been extensively updated by data modifications, deletions,
       and insertions. ("Extensive" in this case may mean that 10 to 20 percent of the table and index
       data has been affected.)
    •  Before binding, or rebinding, application programs whose performance is critical.
    •  When you want to compare current and previous statistics. If you update statistics at regular
       intervals, you can discover performance problems early.
    •  When the prefetch quantity is changed.
    •  When you have used the REDISTRIBUTE DATABASE PARTITION GROUP utility.
There are various forms of the RUNSTATS command, mainly determining the depth and breadth of the
statistics collected. The more you collect, the longer the command takes to run. Some of the options are
as follows:
    •  Collect either SAMPLED or DETAILED index statistics
    •  Collect statistics on all columns, or only on columns used in JOIN operations
    •  Collect distribution statistics on all, key, or no columns. Distribution statistics are very useful
       when you have an uneven distribution of data on key columns.

3.2.2.1    Recommendations

    •  Care must be taken when running RUNSTATS, since the information collected will influence the
       optimizer's selection of access paths.
    •  Implement RUNSTATS as part of a regularly scheduled maintenance plan, and run it whenever
       some of the above-mentioned conditions occur.



    •  To ensure that the index statistics are synchronized with the table, execute RUNSTATS to collect
       both table and index statistics at the same time.
    •  Consider the following factors when deciding what type of statistics to collect:
           o Collect statistics only for the columns used to join tables or in the WHERE, GROUP BY,
                and similar clauses of queries. If these columns are indexed, you can specify the columns
                with the ONLY ON KEY COLUMNS clause for the RUNSTATS command.
           o Customize the values for num_freqvalues and num_quantiles for specific tables and
                specific columns in tables.
           o Collect DETAILED index statistics with the SAMPLED DETAILED clause to reduce the
                amount of background calculation performed for detailed index statistics. The SAMPLED
                DETAILED clause reduces the time required to collect statistics, and produces adequate
                precision in most cases.
           o When you create an index for a populated table, add the COLLECT STATISTICS clause
                to create statistics as the index is created.
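
Two hedged examples of the RUNSTATS forms discussed above; the schema and table name are
assumptions.

              -- Table and index statistics together, with distribution and sampled detailed index statistics
              RUNSTATS ON TABLE DB2ADMIN.ORDERS
                 WITH DISTRIBUTION AND SAMPLED DETAILED INDEXES ALL

              -- Distribution statistics restricted to indexed (key) columns
              RUNSTATS ON TABLE DB2ADMIN.ORDERS
                 WITH DISTRIBUTION ON KEY COLUMNS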
3.2.3    REBIND
After running RUNSTATS on your database tables, you need to rebind your applications to take
advantage of those new statistics. This is done to ensure the best access plan is being used for your SQL
statements. How that rebind takes place depends on the type of SQL you are running. DB2 provides
support for the following:
    •  Dynamic SQL - SQL statements that are prepared and executed at run time. In dynamic SQL, the
       SQL statement is contained as a character string in a host variable and is not precompiled.
    •  Static SQL - SQL statements that are embedded within a program and are prepared during the
       program preparation process before the program is executed. After being prepared, a static SQL
       statement does not change, although values of host variables specified by the statement can
       change. Static statements are stored in a DB2 object called a package.
Both dynamic SQL statements and packages can be stored in one of DB2's caches. Based on the above
types of SQL, a rebind will take place under the following conditions:
    •  Dynamic SQL
           o If the statement is not in the cache, the SQL optimizer will "bind" the statement and
             generate a new access plan.
           o If the statement is in the cache, no "rebind" will take place.
                 -  To clear the contents of the SQL cache, use the FLUSH PACKAGE CACHE SQL
                    statement.
    •  Static SQL
           o An explicit REBIND <package> is executed.
           o A rebind occurs implicitly if the package is marked "invalid".
                 -  This can occur if, for example, an index that the package was using has been
                    dropped.


3.2.3.1     Recommendations
    o Perform a REBIND after running RUNSTATS as part of your normal database maintenance
      procedures.
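
A minimal sketch of the rebind options discussed above; the database, schema, and package names are
assumptions.

              -- Rebind a single static package after collecting fresh statistics
              REBIND PACKAGE DB2ADMIN.MYPKG

              -- Or rebind all packages in the database from the command line:
              --    db2rbind MYDB -l rebind.log all

              -- Invalidate cached dynamic SQL so statements are re-optimized at the next prepare
              FLUSH PACKAGE CACHE DYNAMIC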




4.        Application Design Details

4.1        Key Factors
4.1.1       Review application SQL for efficiencies *** IMPORTANT ***
If there is any one thing that you should focus on from this entire paper, it is this topic. In a significant
majority of cases, the single most important factor in DB2 performance is how efficiently your SQL
statements are written. This topic mainly deals with SQL search criteria, which can be present in
SELECT, UPDATE, DELETE, or INSERT (through a subselect) statements.
Reviewing SQL serves the following purposes:
    •  It provides the database designer with the information needed to determine the proper indexes to
       create on your database tables. These statements are essential for the designer to be able to create
       the optimal indexes to support your database access. All of the index considerations mentioned
       above apply.
    •  It allows an independent review of the SQL for the purpose of applying efficient SQL coding
       techniques.
    •  It determines whether locking strategies are appropriate.
    •  It assesses the impact of changes in your data model or data content.
    •  It assesses the impact of applying service to the database manager.
    •  How to review SQL statements is discussed in Section 8: Tuning Approach.
4.1.1.1     Recommendations
      o Implement a formal SQL review process for your application(s)
4.1.2      Concurrency Control and Isolation Level
An isolation level determines how data is locked or isolated from other processes while the data is being
accessed. The isolation level will be in effect for the duration of the unit of work. DB2 supports the
following isolation levels, listed in order from most restrictive to least restrictive:
    •  Repeatable Read - An isolation level that locks all the rows in an application that are referenced
       within a transaction. When a program uses repeatable read protection, rows referenced by the
       program cannot be changed by other programs until the program ends the current transaction.
    •  Read Stability - An isolation level that locks only the rows that an application retrieves within a
       transaction. Read stability ensures that any qualifying row that is read during a transaction is not
       changed by other application processes until the transaction is completed, and that any row
       changed by another application process is not read until the change is committed by that process.
    •  Cursor Stability - An isolation level that locks any row accessed by a transaction of an application
       while the cursor is positioned on the row. The lock remains in effect until the next row is fetched
       or the transaction is terminated. If any data is changed in a row, the lock is held until the change is
       committed to the database.
    •  Uncommitted Read - An isolation level that allows an application to access uncommitted changes
       of other transactions. The application does not lock other applications out of the row that it is
       reading, unless the other application attempts to drop or alter the table. Sometimes referred to as
       "Dirty Reads".
4.1.2.1    Recommendations
    o Make sure you know the isolation level under which you are running. Do not count on default
      values, which can change based on how you are accessing the database.
    o Because the isolation level determines how data is locked and isolated from other processes while
      the data is being accessed, you should select an isolation level that balances the requirements of
      concurrency and data integrity for your particular application. The isolation level that you specify
      is in effect for the duration of the unit of work.
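
Two hedged ways of making the isolation level explicit are sketched below; the table and column names
are assumptions, and the BIND example assumes an embedded SQL bind file.

              -- Statement-level override: read uncommitted data for this query only
              SELECT ORDER_ID, STATUS
              FROM ORDERS
              WHERE ORDER_REF = 'ABC123'
              WITH UR

              -- Bind-time choice for a static package (CS = Cursor Stability):
              --    BIND myprog.bnd ISOLATION CS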
4.1.3      Locking
To provide concurrency control and prevent uncontrolled data access, the database manager places locks
on tables, table blocks, or table rows. A lock associates a database manager resource with an application,
called the lock owner, to control how other applications can access the same resource. Locking is a
fundamental process of any database manager and is used to ensure the integrity of the data. But while
maintaining those locks, there is a potential impact on the concurrency and throughput of your
application.
There are a number of factors that the database manager uses to determine whether to use row level or
table level locking:
	The different isolation levels described above control whether an application can access
           uncommitted data, and whether lost updates, non-repeatable reads, and phantom reads can occur.
           Use the minimum isolation level that satisfies your application needs.
         The access plan selected by the optimizer. Table scans, index scans, and other methods of data
          access each require different types of access to the data.
         The LOCKSIZE attribute for the table. This parameter indicates the granularity of the locks used
          when the table is accessed. The choices are either ROW for row locks, or TABLE for table locks.
         The amount of memory devoted to locking. The amount of memory devoted to locking is
          controlled by the locklist database configuration parameter.
4.1.3.1    Recommendations
    o *** IMPORTANT *** - COMMIT as frequently as possible and/or practical in order to release
      any locks your application holds. If possible, design your application so that you can easily vary
      the commit frequency for large batch operations. This will allow you to optimally balance the
      throughput and concurrency of your system.
    o Use ALTER TABLE... LOCKSIZE TABLE for read-only tables. This reduces the number of
      locks required by database activity.
     o If the lock list fills, performance can degrade due to lock escalations and reduced concurrency on
       shared objects in the database. If lock escalations occur frequently, increase the value of either
       locklist or maxlocks, or both (see the sketch below)
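As a sketch of these recommendations, assuming a hypothetical read-only reference table REF.CODES and a database named MYDB (the values shown are illustrative, not tuned):

           ALTER TABLE REF.CODES LOCKSIZE TABLE;
           UPDATE DB CFG FOR MYDB USING LOCKLIST 400 MAXLOCKS 20;

The first statement switches the table to table-level locking; the second enlarges the lock list memory and the per-application escalation threshold.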
4.1.4      Fetch needed columns only
There is additional CPU cost associated with each column selected or fetched from the database. Higher
I/O cost may also be experienced if sorting is required.
4.1.4.1    Recommendations
	Select or fetch only the columns that you need
	Never code “SELECT *” to retrieve all columns in a table (see the example below)
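A brief example, assuming a hypothetical EMPLOYEE table where the application only uses two columns:

           SELECT * FROM EMPLOYEE ORDER BY LASTNAME;                  (fetches every column)
           SELECT EMPNO, LASTNAME FROM EMPLOYEE ORDER BY LASTNAME;    (fetches only what is used)

The second form reduces the per-row CPU cost and the amount of data that may have to be sorted.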
4.1.5      Reuse resources
Consider reuse of the following components:
4.1.5.1    Recommendations
	Database connections – this can be accomplished using the connection pooling features of DB2.
           Connection pooling is a process in which DB2 drops the inbound connection with an application
           that requests disconnection, but keeps the outbound connection to the host in a pool. When a new
           application requests a connection, DB2 uses one from the existing pool. Using the already-present
           connection reduces the overall connection time, as well as the high processor connect cost on the
           host. Connection pooling is implemented using:
              o   DB2 Connect
              o   JDBC, using the WebSphere connection pooling feature
4.1.6      SQL Statement Preparation
Before an SQL statement can be executed, it must be converted from text form to an executable form, by
submitting it to the SQL compiler. This is referred to as the SQL Statement Prepare process. After the
statement is prepared, the bind process occurs. This process converts the output from the SQL compiler
to a usable control structure, such as an access plan, application plan, or package. During the bind process,
access paths to the data are selected and some authorization checking is performed. DB2 supports two
types of SQL processing.
         Static - SQL statements that are embedded within a program, and are prepared during the program
          preparation process before the program is executed. After being prepared, a static SQL statement
          does not change, although values of host variables specified by the statement can change.
         Dynamic - SQL statements that are prepared and executed at run time. In dynamic SQL, the SQL
          statement is contained as a character string in a host variable and is not precompiled.
Static SQL offers the advantage of executing the statement preparation process only once, thus eliminating
that processing step each time the statement is executed.
Dynamic SQL statements, by definition, are prepared and executed at run time. You can, however,
minimize the effect of statement preparation by writing your dynamic SQL statements using parameter
markers. Parameter markers act in a similar fashion to host variables in static SQL statements. If you use
them in a dynamic SQL statement, you first issue a single PREPARE statement, followed by multiple
EXECUTE statements that substitute values for the parameter markers. The savings can be significant for
simple SQL statements, like an INSERT, that are executed many times with different values.
4.1.6.1    Recommendations

	Use static SQL whenever possible
              o If using Java, SQLJ supports static SQL
	If you use dynamic SQL (see the sketch below)
              o Code the statements using parameter markers
              o Increase the size of the database package cache. This cache stores dynamic SQL
                statements and allows for their reuse
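A minimal sketch of the parameter marker pattern, assuming a hypothetical ORDERS table. The statement text is prepared once and then executed repeatedly with different values; in JDBC this corresponds to a single PreparedStatement reused across executions:

           INSERT INTO ORDERS (ORDER_ID, CUSTOMER_ID, STATUS) VALUES (?, ?, ?)

Each subsequent execution supplies only the values for the three markers, for example (1001, 42, 'NEW'), then (1002, 57, 'NEW'), and so on, without recompiling the statement.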
4.1.7      Query Tuning
The following SQL statement clauses may improve the performance of your application.
         Use the FOR UPDATE clause to specify the columns that could be updated by a subsequent
          positioned UPDATE statement.
         Use the FOR READ/FETCH ONLY clause to make the returned columns read only.
         Use the OPTIMIZE FOR n ROWS clause to give priority to retrieving the first n rows in the full
          result set.
         Use the FETCH FIRST n ROWS ONLY clause to retrieve only a specified number of rows.
         Use the DECLARE CURSOR WITH HOLD statement to retrieve rows one at a time and maintain
          cursor position after a commit
	Take advantage of row blocking by specifying the FOR READ ONLY, FOR FETCH ONLY, or
            OPTIMIZE FOR n ROWS clause, or by declaring your cursor as scrollable. This will
            improve performance and also improve concurrency, because exclusive locks are never
            held on the rows retrieved.
	Avoid DISTINCT or ORDER BY if not required. This helps to eliminate any potential sorting
            that may otherwise have to occur.
               o Proper indexing may also be used to eliminate sorting
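The following sketch combines several of these clauses in one query against a hypothetical ORDERS table; the clause order follows the select-statement syntax diagram and the values are illustrative:

           SELECT ORDER_ID, ORDER_DATE
             FROM ORDERS
             WHERE STATUS = 'OPEN'
             ORDER BY ORDER_DATE
             FETCH FIRST 20 ROWS ONLY
             FOR READ ONLY
             OPTIMIZE FOR 20 ROWS;

The read-only and OPTIMIZE FOR clauses also make the query eligible for row blocking.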




5.        Hardware Design and Operating System Usage

5.1       Key Factors
This section discusses overall considerations for memory, CPU, I/O, and network resources; it does not
cover detailed calculations for capacity planning purposes.
5.1.1       Memory
Understanding how DB2 organizes memory helps you tune memory use for good performance. Many
configuration parameters affect memory usage. Some may affect memory on the server, some on the
client, and some on both. Furthermore, memory is allocated and de-allocated at different times and from
different areas of the system. While the database server is running, you can increase or decrease the size
of memory areas inside the database shared memory. You should understand how memory is divided
among the different heaps before tuning to balance overall memory usage on the entire system. Refer to
the “DB2 Administration Guide: Performance” for a detailed explanation of DB2's memory model and all
of the parameters that affect memory usage.
5.1.2       CPU
The CPU utilization goal should be about 70 to 80% of the total CPU time. Lower utilization means that
the CPU can cope better with peak workloads. Utilization between 85% and 90% results in queuing delays
for CPU resources, which affect response times. CPU utilization above 90% usually results in
unacceptable response times. While running batch jobs, backups, or loading large amounts of data, the
CPU may be driven to high percentages, such as 80 to 100%, to maximize throughput.
DB2 supports the following processor configurations:
	Uni-Processor – a single system that contains only one CPU
	SMP (Symmetric Multiprocessor) – a single system that can contain multiple CPUs. Scalability
           is limited to the CPU sockets provided on the motherboard.
	MPP (Massively Parallel Processors) – a system with multiple nodes connected over a high-speed
           link. Each node has its own CPU(s). Scalability is achieved by adding new nodes.
Things to consider regarding CPU:
	Inefficient data access methods cause high CPU utilization and are a major problem for database
           systems. Refer back to section 4.1.1.
         Paging and swapping requires CPU time. Consider this factor while planning your memory
          requirements.
5.1.3      I/O
The following are rules of thumb that can be used to estimate the total disk space required by an application.
If you have more detailed information, use that instead of these rules of thumb.
	Calculate the raw data size
              o Add up the column lengths of your database tables
              o Multiply by the number of rows expected
	Once you have the raw data size, use the following scale-up ratios to factor in space for
           indexing, working space, etc. (a worked example follows this list)
              o OLTP ratio: 1:3
              o DSS ratio: 1:4
              o Data warehouse ratio: 1:5
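As a worked illustration with assumed numbers: if the columns of a table add up to 200 bytes per row and 5,000,000 rows are expected, the raw data size is roughly 200 * 5,000,000 = 1,000,000,000 bytes, or about 1 GB. Applying the OLTP ratio of 1:3 suggests planning for approximately 3 GB of disk space for that table once indexes and working space are included.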
Consider the following to improve disk efficiency:
	Minimize I/O – access to main memory is orders of magnitude faster than access to disk. Provide as
           much memory as possible to the database buffer pools and various memory heaps to avoid I/O
	When I/O is needed, reading simultaneously from several disks is the fastest way. Provide for
           parallel I/O operations by:
              o Using several smaller disks rather than one big disk
              o Placing the disk drives on separate controllers
5.1.3.1    Choosing Disk Drives
There are several trends in current disk technology:
         They get bigger every year, roughly doubling in capacity every 18 months.
         The cost per GB is lower each year.
         The cost difference of the two smallest drives diminishes until there is little point in continuing
          with the smaller drive.
         The disk drives improve a little each year in seek time.
         The disk drives get smaller in physical size.
While the disk drives continue to increase capacity with a smaller physical size, the speed improvements,
seek, etc., are small in comparison.
A database that would have taken 36 * 1 GB drives a number of years ago can now be placed on one disk.
This highlights the database I/O problems. For example, if each 1 GB disk drive can do 80 I/O operations
a second, this means the system can do a combined 36 * 80 = 2880 I/O operations per second. But a
single 36 GB drive with a seek time of 7 ms can do only 140 I/O operations per second. While increased
disk drive capacity is good news, the lower numbers of disks cannot deliver the same I/O throughput.
5.1.3.2    Recommendations

	When determining your I/O requirements, consider:
              o OLTP systems
                         Reading data will involve reading indexes
                         Inserts and updates require data, index, and log records to be written
	Provide for parallel I/O operations (see the sketch after this list)
              o Use DMS tablespaces and place data and indexes in separate tablespaces
              o Use the smallest disk drives possible, purely on the basis of increasing the number of disks
                for I/O throughput. If buying larger drives, use only half the space (the middle area – it's
                the fastest) for the database, and the other half for:
                         Backups
                         Archiving data
                         Off-hour test databases
                         Extra space used for upgrades
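A minimal sketch of separating data and indexes into DMS tablespaces on different disks, using hypothetical container paths, sizes (in pages), and a hypothetical ORDERS table:

           CREATE TABLESPACE DATA_TS MANAGED BY DATABASE
               USING (FILE '/disk1/db2/data_ts.dat' 25600);
           CREATE TABLESPACE IDX_TS MANAGED BY DATABASE
               USING (FILE '/disk2/db2/idx_ts.dat' 12800);
           CREATE TABLE ORDERS (ORDER_ID INTEGER NOT NULL, STATUS CHAR(10))
               IN DATA_TS INDEX IN IDX_TS;

Placing the two containers on drives attached to separate controllers allows data and index I/O to proceed in parallel.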
5.1.4      Network
The network can influence the overall performance of your application, but this usually manifests itself
as a delay in one of the following situations:
     o The time between when a client machine sends a request to the server and the server receives this
       request
     o The time between when the server machine sends data back to the client machine and the client
       machine receives the data
Once a system is implemented, the network should be monitored to ensure that no more than 50% of its
bandwidth is being consumed.
5.1.4.1    Recommendations
The following techniques can be used to improve overall performance and avoid high network
consumption:
       o Transmit a block of rows to the client machine in a single operation. This is accomplished by
         using the BLOCKING option in the pre-compile or bind procedures (see the sketch below). Refer
         to section 4.1.7, Query Tuning, for other factors that influence row blocking.
      o Use stored procedures to minimize the number of accesses to the database. Stored procedures are
        programs that reside on the RDBMS server and can be executed as part of a transaction by the
        client applications. This way, several pre-programmed SQL statements can be executed by using
        only one CALL command from the client machine.
              o Using stored procedures will typically make it more difficult to run your application on
                different database platforms, like Oracle or SQL Server, because of the syntactical
                differences of their stored procedure implementation. So if you need to run your
                application on multiple database platforms, be aware of this consideration.
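As an illustration of these two techniques, assuming a hypothetical bind file myapp.bnd and a hypothetical ORDER_SUMMARY stored procedure, from a CLP session already connected to the database:

           BIND myapp.bnd BLOCKING ALL;
           CALL ORDER_SUMMARY(2011);

The BIND option requests row blocking for all cursors in the package, and the single CALL replaces what would otherwise be several separate SQL round trips over the network.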



6.        Tuning Opportunities

This section contains some additional tuning considerations not already discussed.
6.1       Insert/Delete Performance
Here are some things to consider about insert and delete performance. The biggest bottlenecks are
typically:

      o    Read and Write I/O for index and data
      o    Active Log Write
      o    CPU Time
      o    Locking

We've previously discussed ways to address many of these issues, such as:

      o Using DMS tablespaces and placing the table and index data into separate tablespaces to enable
        parallel I/O.
      o Providing for efficient logging
      o The use of parameter markers to prepare an INSERT statement once, and execute it many times.
      o Batching of SQL Statements
            o Via JDBC batch facility
      o Minimize Indexing

Here are a few other suggestions to improve insert performance (see the sketch after this list):

       o Consider the use of APPEND MODE
       o Insert multiple rows with one INSERT statement
       o Tune the PCTFREE values for the data, clustering index, and non-clustering index components.
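A brief sketch of the first two suggestions, assuming a hypothetical ORDERS table; APPEND mode is appropriate only when rows do not need to be clustered and space freed by deletes does not need to be reused:

           ALTER TABLE ORDERS APPEND ON;

           INSERT INTO ORDERS (ORDER_ID, CUSTOMER_ID, STATUS)
               VALUES (1001, 42, 'NEW'),
                      (1002, 57, 'NEW'),
                      (1003, 61, 'NEW');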
The following database parameter is important.
   o Number of asynchronous page cleaners (NUM_IOCLEANERS) - This parameter controls the
       number of page cleaners that write changed pages from the buffer pool to disk. You may want to
       increase this to the number of physical disk drive devices you have. The default is 1.

The following database manager parameter may be important.

      o Enable intra-partition parallelism (INTRA_PARALLEL) – if you have a multi-processor SMP
        system, setting this parameter to YES may improve performance. The default is NO

The following registry variable can be used:

       o DB2MAXFSCRSEARCH - The setting of this registry variable determines the number of Free
         Space Control Records (FSCRs) in a table that are searched for an INSERT. The default value
         for this registry variable is five. If no space is found within the specified number of FSCRs,
         the inserted record is appended at the end of the table. To optimize INSERT speed, subsequent
         records are also appended to the end of the table until two extents are filled. After the two extents
         are filled, the next INSERT resumes searching at the FSCR where the last search ended.
             o To optimize for INSERT speed at the possible expense of faster table growth, set the
                  DB2MAXFSCRSEARCH registry variable to a small number. To optimize for space reuse
                  at the possible expense of INSERT speed, set DB2MAXFSCRSEARCH to a larger
                  number.
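The settings above are changed with commands along the following lines, shown with illustrative values and a hypothetical database named MYDB:

           UPDATE DB CFG FOR MYDB USING NUM_IOCLEANERS 4;
           UPDATE DBM CFG USING INTRA_PARALLEL YES;
           db2set DB2MAXFSCRSEARCH=1

The first two are CLP commands; db2set is run from the operating system command line, and registry variable changes generally take effect only after the instance is restarted.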

6.2      Database Manager Configuration Tuning
Each instance of the database manager has a set of database manager configuration parameters (also
called database manager parameters). These affect the amount of system resources that will be allocated
to a single instance of the database manager. Some of these parameters are used for configuring the setup
of the database manager and other, non-performance related information.
There are numerous database manager configuration parameters. I will only list some of the ones that
have a high impact on performance. Refer to the “DB2 Administration Guide: Performance” for a detailed
explanation of all the database manager configuration parameters.
      o Agentpri - This parameter controls the priority given both to all agents, and to other database
        manager instance processes and threads, by the operating system scheduler. Use the default
        unless you run a benchmark to determine the optimal value.
       o Aslheapsz – the application support layer heap represents a communication buffer between the
         local application and its associated agent. This buffer is allocated as shared memory by each
         database manager agent that is started. If a request to the database manager, or its associated
         reply, does not fit into the buffer, it is split into two or more send-and-receive pairs. The size
         of this buffer should be set to handle the majority of requests using a single send-and-receive pair.
      o Intra_parallel - This parameter specifies whether the database manager can use intra-partition
        parallelism on an SMP machine. Multiple processors can be used to scan and sort data for index
        creation.
      o Java_heap_sz - This parameter determines the maximum size of the heap that is used by the Java
        interpreter started to service Java DB2 stored procedures and UDFs (User Defined Functions).
      o Max_querydegree - This parameter specifies the maximum degree of intra-partition parallelism
        that is used for any SQL statement executing on this instance of the database manager. An SQL
        statement will not use more than this number of parallel operations within a partition when the
        statement is executed. The intra_parallel configuration parameter must be set to “YES” to enable
        the database partition to use intra-partition parallelism.
      o Sheapthres – the sort heap threshold determines the maximum amount of memory available for all
        the operations that use the sort heap including: sorts, hash joins, dynamic bitmaps (used for index
        ANDing and Star Joins), and operations where the table is in memory. Ideally, you should set
        this parameter to a reasonable multiple of the largest sortheap parameter you have in your
        database manager instance. This parameter should be at least two times the largest sortheap
        defined for any database within the instance.
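For example, applying the two-times guideline with illustrative values, where the largest sort heap in the instance belongs to a hypothetical database MYDB:

           UPDATE DB CFG FOR MYDB USING SORTHEAP 5000;
           UPDATE DBM CFG USING SHEAPTHRES 10000;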


6.3      Database Configuration Tuning
Each database has a set of database configuration parameters (also called database parameters). These
affect the amount of system resources that will be allocated to that database. In addition, there are some
database configuration parameters that provide descriptive information only and cannot be changed;
others are flags that indicate the status of the database.
There are numerous database configuration parameters. I will only list some of the ones that have a high
impact on performance. Refer to the “DB2 Administration Guide: Performance” for a detailed explanation
of all the database configuration parameters.
       o Chngpgs_thresh - This parameter specifies the level (percentage) of changed pages in the buffer
         pool at which the asynchronous page cleaners are started. The page cleaners write changed pages
         from the buffer pool (or the buffer pools) to disk before the space in the buffer pool is required by
         a database agent. As a result, database agents should not have to wait for changed pages to be
         written out so that they might use the space in the buffer pool. This improves overall performance
         of the database applications.
      o Locklist - This parameter indicates the amount of storage that is allocated to the lock list. This has
        a high impact on performance if there are frequent lock escalations.
      o Maxlocks – maximum percent of lock list before escalation. Used in conjunction with the locklist
        parameter to control lock escalations.
       o Logbufsz - This parameter allows you to specify the amount of the database heap (defined by the
         dbheap parameter) to use as a buffer for log records before writing these records to disk.
         Buffering the log records results in more efficient log file I/O because the log records are written
         to disk less frequently, and more log records are written each time.
      o Num_iocleaners - This parameter allows you to specify the number of asynchronous page cleaners
        for a database. These page cleaners write changed pages from the buffer pool to disk before the
        space in the buffer pool is required by a database agent. As a result, database agents should not
        have to wait for changed pages to be written out so that they might use the space in the buffer
        pool. This improves overall performance of the database applications. Set this parameter to be
        between one and the number of physical storage devices used for the database.
      o Num_ioservers - I/O servers are used on behalf of the database agents to perform prefetch I/O and
        asynchronous I/O by utilities such as backup and restore. This parameter specifies the number of
        I/O servers for a database. A good value to use is generally one or two more than the number of
        physical devices on which the database resides.
      o Pckcachesz – The package cache is used for caching of sections for static and dynamic SQL
        statements on a database. Caching packages and statements allows the database manager to
        reduce its internal overhead by eliminating the need to access the system catalogs when reloading
        a package; or, in the case of dynamic SQL, eliminating the need for compilation.
      o Sortheap - This parameter defines the maximum number of private memory pages to be used for
        private sorts, or the maximum number of shared memory pages to be used for shared sorts. Each
        sort has a separate sort heap that is allocated as needed, by the database manager. This sort heap is
        the area where data is sorted. Increase the size of this parameter when frequent large sorts are
        required.
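The database configuration parameters above are set per database. The following is a combined sketch for a hypothetical database MYDB; the values are purely illustrative and should be derived from monitoring rather than copied:

           UPDATE DB CFG FOR MYDB USING
               CHNGPGS_THRESH 60
               LOCKLIST       400
               MAXLOCKS       20
               LOGBUFSZ       256
               NUM_IOCLEANERS 4
               NUM_IOSERVERS  6
               PCKCACHESZ     1000
               SORTHEAP       2000;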


6.4      Bufferpools
The buffer pool is the area of memory where database pages (table rows or indexes) are temporarily read
and manipulated. All buffer pools reside in global memory, which is available to all applications using the
database. The purpose of the buffer pool is to improve database performance. Data can be accessed much
faster from memory than from disk. Therefore, the more data (rows and indexes) the database manager is
able to read from or write to memory, the better the database performance. The default buffer pool
allocation is usually not sufficient for production applications and needs to be monitored and tuned before
placing your application in production.
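For example, the default buffer pool can be enlarged, or an additional pool created and dedicated to a tablespace; the names, sizes (in pages), and path below are illustrative assumptions:

           ALTER BUFFERPOOL IBMDEFAULTBP SIZE 50000;
           CREATE BUFFERPOOL BP32K SIZE 10000 PAGESIZE 32K;
           CREATE TABLESPACE LARGE_TS PAGESIZE 32K MANAGED BY DATABASE
               USING (FILE '/disk3/db2/large_ts.dat' 25600) BUFFERPOOL BP32K;

Buffer pool hit ratios from the snapshot monitor (see Section 7) should drive how large each pool needs to be.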


6.5      Registry Variables
Each instance of the database manager has a set of Registry and Environment variables. These affect
various aspects of DB2 processing.
There are numerous Registry and Environment variables. I will only list some of the ones that have a
high impact on performance. Refer to the “DB2 Administration Guide: Performance” for a detailed
explanation of all the Registry and Environment variables.


      o DB2_Parallel_IO - While reading or writing data from and to table space containers, DB2 may use
        parallel I/O for each table space value that you specify. The degree of parallelism is determined by
           the prefetch size and extent size for the containers in the table space. For example, if the prefetch
           size is four times the extent size, then there are four extent-sized prefetch requests. The number of
           containers in the table space does not affect the number of prefetchers. To enable parallel I/O for
           all table spaces, use the wildcard character, ″*″. To enable parallel I/O for a subset of all table
           spaces, enter the list of table spaces. If there is more than one container, extent-size pieces of any
           full prefetch request are broken down into smaller requests executed in parallel based on the
           number of prefetchers. When this variable is not enabled, the number of prefetcher requests
           created is based on the number of containers in the table space.
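A short illustration of setting this variable from the operating system command line; the table space IDs in the second form are illustrative:

           db2set DB2_PARALLEL_IO=*
           db2set DB2_PARALLEL_IO=1,2

The first form enables parallel I/O for all table spaces; the second limits it to the table spaces with IDs 1 and 2.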



7.        Monitoring Tools

DB2 provides several tools that can be used for monitoring or analyzing your database. These monitoring
and analysis tools, along with their purposes, are:
      o Snapshot Monitor – captures performance information at periodic points of time. Used to
        determine the current state of the database
      o Event Monitor - provides a summary of activity at the completion of events such as statement
        execution, transaction completion, or when an application disconnects.
      o Explain Facility - provides information about how DB2 will access the data in order to resolve the
        SQL statements.
      o db2batch tool - provides performance information (benchmarking tool)




8.        Tuning Methodology

8.1       Tuning Approach

The objective of tuning an RDBMS is to make sure that the system is delivering good performance. As
with most things, the ability to meet that objective depends on the effort and resources you apply to the
tuning effort. The following are areas that would cause a tuning effort to be started:
          Regular task - Regular periodic monitoring and tuning is standard practice. Many sites do a review
           of performance on quarterly, half-yearly, or yearly intervals.
          Generated warning - The automated monitoring of the system to warn that performance is
           degrading and has hit some threshold.
         Emergency - There is an emergency in performance or response time, which has been highlighted
          by user feedback. The tuning must identify the problem, recommend a solution, and then work out
          how to avoid this happening again.
	New system - A newly built system requires initial tuning for maximum performance before going
           into production. In many cases, a new system might be put in production before being optimally
           tuned because of the difficulty of generating user workloads artificially and of predicting real user
           workloads, work patterns, and data volume and distribution. For this reason, it is
           critical that the design principles outlined in this document be followed, so as to minimize the
           effects of the unknown as much as possible.
         System change – if the change is significant, this is similar to a new system if testing cannot
          effectively simulate production.
Regardless of the reason, the tuning approach will largely be the same.
8.2       Skills Needed
While this document focuses on database design and tuning, there are many other factors that influence
the overall performance of any application. Other skills that may be required to tune the overall system
include:
         Hardware Experts – for the various hardware platforms you plan to run on
         Operating System Experts – for the various operating system platforms you plan to run on. This
          could include:
             o System administration experts
             o Operating system performance and tuning experts
         Relational Database skills including:
             o DBA skills
             o SQL Tuning
	Middleware Experts – if using middleware products such as WebSphere
         Applications Experts
             o Functional knowledge of the application
             o Technical experts on the products used to build the application
                        Java
                        C++
8.3       General Process
The following process is recommended to improve the performance of any system:
      1. Establish performance indicators.
      2. Define performance objectives.
      3. Develop a performance monitoring plan.
      4. Carry out the plan.
      5. Analyze your measurements to determine whether you have met your objectives. If you have,
         consider reducing the number of measurements you make because performance monitoring itself
         uses system resources. Otherwise, continue with the next step.
      6. Determine the major constraints in the system.
      7. Decide where you can afford to make trade-offs and which resources can bear additional load.
         (Nearly all tuning involves trade-offs among system resources and the various elements of
         performance.)
      8. Adjust the configuration of your system. If you think that it is feasible to change more than one
         tuning option, implement one at a time. If there are no options left at any level, you have reached
         the limits of your resources and need to upgrade your hardware.
      9. Return to Step 4 above and continue to monitor your system.


On a periodic basis, or after significant changes to your system:
             Perform the above procedure again from step 1.
             Re-examine your objectives and indicators.
             Refine your monitoring and tuning strategy.


8.4           DB2 Specific Tuning
There are numerous documents and books written describing detailed formal tuning methodologies. This
document is not intended to repeat, or rewrite, any of that. I will discuss some specific DB2 topics, tools,
and techniques that can be used to tune your DB2 database.
8.4.1          SQL Reviews
As mentioned earlier in this document, SQL reviews are essential for a well-performing RDBMS
system. SQL reviews in DB2 are generally done using the SQL Explain facility. Explain allows you to
capture information about the access plan chosen by the optimizer as well as performance information
that helps you tune queries. Before you can capture explain information, you need to create the relational
tables in which the optimizer stores the explain information and set the special registers that
determine what kind of explain information is captured. These tables can be created:
	Automatically by the DB2 Control Center
	By running the following command from a DB2 command window
              o db2 -tf EXPLAIN.DDL             (located in the sqllib/misc directory)
DB2 provides a number of facilities to view the information generated by the EXPLAIN facility. These
 include:
        Visual Explain – you invoke Visual Explain from the Control Center to see a graphical display of
         a query access plan. You can analyze both static and dynamic SQL statements.
        db2exfmt – this command line tool is used to display explain information in preformatted output.
	db2expln and dynexpln – these command line tools are used to see the access plan information
          available for one or more packages of static SQL statements. db2expln shows the actual
          implementation of the chosen access plan. It does not show optimizer information. The dynexpln
          tool, which uses db2expln within it, provides a quick way to explain dynamic SQL statements
          that contain no parameter markers. This use of db2expln from within dynexpln is done by
          transforming the input SQL statement into a static statement within a pseudo-package. When this
          occurs, the information may not always be completely accurate. If complete accuracy is desired,
          use the explain facility. The db2expln tool does provide a relatively compact and English-like
          overview of what operations will occur at run time by examining the actual access plan
          generated.


Which tool you use depends on a number of factors, including:
	How complicated the SQL is
	Whether you have access to the actual program/code that will be running
	Your familiarity with the tools
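A minimal sketch of capturing and formatting explain information from the command line, assuming the explain tables already exist, a database named MYDB, and a hypothetical query:

           SET CURRENT EXPLAIN MODE EXPLAIN;
           SELECT ORDER_ID FROM ORDERS WHERE STATUS = 'OPEN';
           SET CURRENT EXPLAIN MODE NO;

           db2exfmt -d MYDB -1 -o query_plan.txt

While the special register is set to EXPLAIN, statements are explained rather than executed; db2exfmt then formats the captured plan into a readable report.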


Previously, I discussed the functions of the DB2 optimizer. The DB2 optimizer uses information and
statistics in the DB2 catalog in order to determine the best access to the database based on the query
provided. This information is generally gathered using the RUNSTATS utility. The catalog information
includes the following types of information:
       Number of rows in a table
       Indexes defined on the table
       Whether the table needs to be REORGanized
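Statistics are typically refreshed with a RUNSTATS command along the following lines; the schema and table names are hypothetical:

           RUNSTATS ON TABLE APPSCHEMA.CUSTOMER WITH DISTRIBUTION AND DETAILED INDEXES ALL;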


8.4.2    Database Statistics
Ideally, you would use the explain facility mentioned above against a database that has production level
volumes loaded into it. The access path chosen may be drastically different if a table has 100 rows in a
test database and 10,000,000 rows in a production database. This difference is exacerbated if the SQL
query involves a join of multiple tables.
Often, it is not easy or practical, particularly early on, to have a database loaded with production
level volumes. DB2 provides a facility for simulating a production database by allowing you to code
SQL UPDATE statements that operate against a set of predefined catalog views in the SYSSTAT
schema. With SYSSTAT, the database administrator is able to simulate production volumes in a test
database. Updating these views allows the optimizer to create different access plans under different
conditions. The following is a sample of one of these UPDATE statements:
           UPDATE SYSSTAT.TABLES SET CARD = 850000 WHERE TABNAME = 'CUSTOMER'


Although this facility is provided, it should be used only as an initial approach to updating statistics.
There are many inter-relationships among the catalog statistics for the database tables, and understanding
how to update all of them correctly is extremely difficult.


8.4.2.1    Using Database Statistics for SQL Access Plan Analysis
The following sources should be used in the order listed when using the EXPLAIN facility to analyze
your SQL statements. Make sure any statistics you use are current by running the RUNSTATS utility.
     1. A production database (or an image of one)
     2. A test database loaded with a significant amount of data
     3. A test database updated with production statistics
              o The DB2 tool db2look is designed to capture all table DDL and statistics of a production
                database so that they can be replicated on the test system.
     4. A test database with the SYSSTAT views updated manually
              o The UPDATE statements for the SYSSTAT views can be generated by the db2look facility
                (see the example below).
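For instance, a command along the following lines extracts the DDL and the statistics UPDATE (mimic) statements from a hypothetical production database PRODDB so that they can be replayed against a test database:

           db2look -d PRODDB -e -m -o prodstats.sql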


8.4.3      SNAPSHOT and EVENT Monitors
DB2 maintains data about its operation, its performance, and the applications using it. This data is
maintained as the database manager runs, and can provide important performance and troubleshooting
information. For example, you can find out:
         The number of applications connected to a database, their status, and which SQL statements each
          application is executing, if any.
         Information that shows how well the database manager and database are configured, and helps
          you to tune them.
         When deadlocks occurred for a specified database, which applications were involved, and which
          locks were in contention.
         The list of locks held by an application or a database. If the application cannot proceed because it
          is waiting for a lock, there is additional information on the lock, including which application is
          holding it.


Collecting performance data introduces overhead on the operation of the database. DB2 provides
monitor switches to control which information is collected. You can turn these switches on by using the
following DB2 commands:
         UPDATE MONITOR SWITCHES USING BUFFERPOOL                           ON ;
         UPDATE MONITOR SWITCHES USING LOCK                                 ON ;
         UPDATE MONITOR SWITCHES USING SORT                                 ON ;
         UPDATE MONITOR SWITCHES USING STATEMENT                            ON ;
         UPDATE MONITOR SWITCHES USING TABLE                                ON ;
         UPDATE MONITOR SWITCHES USING UOW                                  ON ;
You can access the data that the database manager maintains either by taking a snapshot or by using an
event monitor.
8.4.3.1    SNAPSHOTs
Use the GET SNAPSHOT command to collect status information and format the output for your use. The
information returned represents a snapshot of the database manager operational status at the time the
command was issued. There are various formats of this command that are used to obtain different kinds
of information. The specific syntax can be obtained from the DB2 Command Reference. Some of the
more useful ones are:
         GET SNAPSHOT FOR DATABASE - Provides general statistics for one or more active databases
          on the current database partition.
         GET SNAPSHOT FOR APPLICATIONS - Provides information about one or more active
          applications that are connected to a database on the current database partition.
         GET SNAPSHOT FOR DATABASE MANAGER - Provides statistics for the active database
          manager instance.
         GET SNAPSHOT FOR LOCKS - Provides information about every lock held by one or more
          applications connected to a specified database.
         GET SNAPSHOT FOR BUFFERPOOLS - Provides information about buffer pool activity for the
          specified database.
         GET SNAPSHOT FOR DYNAMIC SQL - Returns a point-in-time picture of the contents of the
          SQL statement cache for the database.
You can create some simple scripts and schedule them to get periodic snapshots during your test cycles.
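A simple sketch of such a script, assuming a database named MYDB and that the relevant monitor switches have already been turned on; it could be saved as snapshot.clp and run from a scheduler with db2 -tvf snapshot.clp >> snapshots.log:

           CONNECT TO MYDB;
           GET SNAPSHOT FOR DATABASE ON MYDB;
           GET SNAPSHOT FOR BUFFERPOOLS ON MYDB;
           GET SNAPSHOT FOR DYNAMIC SQL ON MYDB;
           CONNECT RESET;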


8.4.4    DB2BATCH
A benchmark tool called db2batch is provided in the sqllib/bin subdirectory of your DB2 installation.
This tool can read SQL statements from either a flat file or standard input, dynamically describe and
prepare the statements, and return an answer set. You can specify the level of performance-related
information supplied, including the elapsed time, CPU and buffer pool usage, locking, and other statistics
collected from the database monitor. If you are timing a set of SQL statements, db2batch also summarizes
the performance results and provides both arithmetic and geometric means. For syntax and options, type
db2batch.
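An illustrative invocation, assuming the statements to be timed are in a file named queries.sql and the database is named MYDB; the results, including timing summaries, are written to results.out:

           db2batch -d MYDB -f queries.sql -r results.out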



9.      Publications & Online Help
The following books serve as reference material.


Information Units             Contents
Administration Guide:         This book describes how to configure and tune your
Performance                   database environment to improve performance.
SC09-4821

Database Performance          This Redbook contains hints and tips from experts who
Tuning on AIX                 work on RDBMS performance every day. It also provides
SG24-5511-01                  introductions to general database layout concepts from a
                              performance point of view, design and sizing guidelines,
                              tuning recommendations, and performance and tuning
                              information for DB2 UDB, Oracle, and IBM Informix
                              databases.
DB2 UDB V7.1                  This IBM Redbook will provide you with guidelines for
Performance Tuning Guide      system design, database design, and application design
SG24-6012                     with DB2 UDB for AIX Version 7.1. It also discusses the
                              methods that are available for performance analysis and
                              tuning.




10. Sizing

The following spreadsheets are provided in the Project Database on an “as-is” basis as samples to assist
with tablespace and index sizing, and DB2 memory utilization.
   1. TBLDATA-zOS.XLS - Spreadsheet to estimate space requirements for a table on the mainframe
   2. INDXDATA-zOS.XLS - Spreadsheet to estimate space requirements for an index on the
      mainframe
   3. TBLDATA-dist.XLS - Spreadsheet to estimate space requirements for a table in the distributed
      environment
   4. INDXDATA-dist.XLS - Spreadsheet to estimate space requirements for an index in the distributed
      environment
   5. DB2UDBMEMORY.XLS - a spreadsheet containing the estimated memory usage of the system




11. Reference




12. Addendum


