                                       14 June, 2011
                                          Revision 1

Objectivity Testbed
RD45 Guidelines Concerning Objectivity/DB
Parameters and Performance Measurements

1 Introduction

Since 1995, the RD45 [1] project has been investigating potential solutions to the problems
of providing object persistency services to the event data of the LHC experiments. The
currently preferred solution is based upon a combination of Objectivity/DB and HPSS,
accessible via the HepODBMS interface, distributed as part of LHC++.

Extensive performance tests have been carried out during the past years using
Objectivity/DB, in terms of scalability, indexing, clustering, re-clustering, and storage
overhead [1][3][4][5]. As a result of these tests, a number of recommendations concerning
various Objectivity/DB parameters can be made. This note discusses the main parameters
of interest, and makes recommendations concerning appropriate settings.

Although a number of wide-area tests were performed by the RD45 project, these were
tests of the basic database functionality rather than detailed studies of realistic use
cases typical of the HEP environment. Such work is currently being performed in the
context of the MONARC project [2].

To assist the MONARC project in their work, we attempt below to define the relevant set
of parameters that are particularly important from the point of view of distributed ODBMS
performance based on Objectivity/DB. Thus, the main objective of this white paper is to
highlight the various issues that should be considered whenever attempting to understand
the performance of a test application.

2 The Distributed Environment

In a distributed database environment, there is a set of protocol stacks that collectively have
a significant influence on the final performance of the system. In this paper, we concentrate
on issues directly related to the distributed architecture of Objectivity/DB, as shown in
Figure 1 below. There are clearly many other issues that affect the overall performance of
the system, such as network and local I/O; these can (and should) be optimised
separately. That is, the performance of a given test application should always be compared
to that of the underlying systems (e.g. as measured using a simple program performing C
I/O) and not given as absolute numbers. Some of the other issues that need to be
considered are listed below.
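
A baseline of the kind mentioned above can be obtained with a few lines of plain C I/O. The following is a minimal sketch, not a calibrated benchmark: the names IoSample, baseline_write, baseline_read and relative_to_baseline are hypothetical, and the timings include stdio buffering but not a forced flush to disk.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Result of one baseline run: bytes moved and wall-clock seconds taken.
struct IoSample {
    std::size_t bytes;
    double seconds;
    double mb_per_s() const { return seconds > 0.0 ? bytes / seconds / 1.0e6 : 0.0; }
};

// Write `blocks` blocks of `block_size` bytes to `path` using plain C I/O.
IoSample baseline_write(const char* path, std::size_t block_size, std::size_t blocks) {
    assert(block_size > 0);
    std::vector<char> buf(block_size, 'x');
    const auto t0 = std::chrono::steady_clock::now();
    std::size_t written = 0;
    std::FILE* f = std::fopen(path, "wb");
    if (f != nullptr) {
        for (std::size_t i = 0; i < blocks; ++i)
            written += std::fwrite(buf.data(), 1, block_size, f);
        std::fclose(f);  // flushes stdio buffers, not the OS page cache
    }
    const auto t1 = std::chrono::steady_clock::now();
    return {written, std::chrono::duration<double>(t1 - t0).count()};
}

// Read the whole file back in blocks of `block_size` bytes.
IoSample baseline_read(const char* path, std::size_t block_size) {
    assert(block_size > 0);
    std::vector<char> buf(block_size);
    const auto t0 = std::chrono::steady_clock::now();
    std::size_t total = 0;
    std::FILE* f = std::fopen(path, "rb");
    if (f != nullptr) {
        std::size_t n = 0;
        while ((n = std::fread(buf.data(), 1, block_size, f)) > 0)
            total += n;
        std::fclose(f);
    }
    const auto t1 = std::chrono::steady_clock::now();
    return {total, std::chrono::duration<double>(t1 - t0).count()};
}

// Express an application rate as a fraction of the baseline rate.
double relative_to_baseline(double app_mb_per_s, double baseline_mb_per_s) {
    assert(baseline_mb_per_s > 0.0);
    return app_mb_per_s / baseline_mb_per_s;
}
```

Reporting the ratio returned by relative_to_baseline, rather than the raw MB/s of the test application, keeps results comparable between hosts with different disks and networks.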

                   Figure 1 - Objectivity/DB Distributed Architecture

2.1 Other Performance-Related Issues
We list below a number of issues that need to be borne in mind when optimising an
application. Recommendations concerning these topics can be found in other RD45 white
papers and status reports.

-   Data Clustering: by using appropriate data clustering, the number of page I/Os can be
    minimised (objects that are accessed together are stored together) and hence
    performance improved.
-   The Object Model: has a direct influence on the way objects are accessed and loaded
    into memory.
-   Some C++ binding programming guidelines: for example, use ooHandles instead of
    ooRefs. If a small set of objects will be referenced repeatedly, ooHandles will save the
    overhead of having to get the object from disk or locate it in cache because the object
    stays pinned in memory. Further details on this and other binding issues are given in
    Appendix A.
-   Parallel reads and writes using multithreading and data partitioning to increase I/O
    bandwidth; refer to K. Holtman, J. Bunn [3]. Recent tests have demonstrated a write
    speed of up to 172MB/s with 30 worker processes running in parallel.

3 Objectivity/DB Parameters

The issues discussed below are valid not only for Objectivity/DB, but also for other
ODBMS products that are based upon a similar architecture. Some are configuration
parameters whilst others are based upon statistics gathered from the execution of a job.
Both need to be considered together to fully understand the overall behaviour of the
system.

We have divided the relevant parameters of our distributed architecture into three groups:

1. client application,
2. data server,
3. lock server.

For each parameter we describe the impact on the overall system and how it can be set, if it
is a configurable parameter, or else measured. In the case of configurable parameters, the
default value, if any, is given.

In this paper, all system parameters refer to the Solaris operating system – the one used for
MONARC test-bed activities and the primary Unix platform for Objectivity/DB.
Equivalent parameters may typically be found on other systems.

Most of the parameters detailed in the following sections can be set via an environment
variable or via a C++ binding function. If you plan to set environment variables rather than
use the C++ binding, the use of the d_session class, included as part of the HepODBMS
library, is strongly recommended [7].

3.1 Client side
Measurable parameters

              MEASURABLE                              DESCRIPTION                               VALUE
Object Model  Number of classes                       Total number of classes per schema.
                                                      Although a schema can be sub-divided
                                                      into named schemas, this does not
                                                      affect performance.
              Fraction of attributes used             Fraction of attributes used inside an
                                                      object. If a class has many attributes
                                                      but you only ever access a few of
                                                      them, redesign the object model, e.g.
                                                      by splitting the object in two or by
                                                      creating an indexed key from those
                                                      attributes.
              Object size                             Object size on disk:
                                                       . 14 bytes of Objectivity overhead +
                                                       . the size of the attributes +
                                                       . 14 bytes if any association exists +
                                                       . 12 bytes per unidirectional
                                                         association link.
                                                      Refer to the Objectivity/DDL manual
                                                      for details.
                                                      Each bidirectional link requires twice
                                                      the storage required for a
                                                      unidirectional link.
                                                      Use short association links if the
                                                      objects belong to the same container.
                                                      Objects are stored on eight-byte
                                                      boundaries, so round your calculation
                                                      up to a multiple of eight.
              Size of ooVArray                        An ooVArray is a variable-size array
                                                      that can be embedded in an object. If
                                                      you use embedded ooVArrays, their
                                                      size directly affects cache space
                                                      because they are loaded into memory
                                                      at once. A solution is to partition
                                                      large ooVArrays into smaller ones.
                                                      The maximum number of elements in
                                                      an ooVArray is 2 ** 32
                                                      (4,294,967,296). In any case you will
                                                      run out of virtual memory before you
                                                      reach this limit.

Per           TRANSACTION_TYPE                        The type of transaction influences the
transaction   (Read, Read-Mrow, Write)                locking protocol. Specified when
                                                      opening the object to be accessed.
                                                      A related parameter is the lock wait
                                                      option.
              REAL_TIME_TRANS                         The real time of a transaction,
              = real+user+system                      measured from just before start is
                                                      called until just after commit is
                                                      called. Use the timer functions in the
                                                      HepODBMS library (goodies
                                                      directory).
              CPU transaction = user time + system
              Number of containers expanded           Number of container expansions
                                                      within a transaction.
                                                      Expanding a container takes a lot of
                                                      CPU time.
                                                      If it is known beforehand that several
                                                      containers will be filled, it is better
                                                      to set a high growth value or a large
                                                      initial size.
                                                      Obtained with ooRunStatus() [8].
              Number of objects pinned in memory      Affects the cache used and possible
                                                      swapping.
                                                      To be calculated by the test
                                                      application.
              Number of DBs pinned                    Affects catalogue lookup and
                                                      clustering.
                                                      To be calculated by the test
                                                      application.
              AMS server used                         Yes or no.
                                                      For a local DB the AMS server is not
                                                      used.
              Concurrent equivalent jobs accessing    Note that the data server in release
              the same AMS                            5.1 is not yet multithreaded; clients
                                                      wait sequentially in a queue.
              Lock server used                        Yes or no.
                                                      In STANDALONE mode the lock
                                                      server is not used. This mode is only
                                                      recommended for non-concurrent
                                                      FDB usage.

              Number of associations resized          If this number is high, it is
                                                      recommended to set at least one
                                                      association immediately after creating
                                                      a new object; this reserves space for
                                                      the association links in the same page
                                                      as the object itself. If you use many
                                                      associations per object, we recommend
                                                      using an ooVArray of ooRefs instead.
                                                      Obtained with ooRunStatus().
              Number of buffers used                  If this is higher than the initial
              IMPORTANT                               number of buffer pages, the initial
                                                      number of buffer pages should be
                                                      increased.
                                                      Obtained with ooRunStatus().
              Number of forced file closes            If this is not zero, the initial number
                                                      of file descriptors given to ooInit is
                                                      too small and should be increased.
                                                      Obtained with ooRunStatus().
              Number of hash overflows                If this value is high, increase the
                                                      initial size of the container in which
                                                      the hashed objects are located.
                                                      Obtained with ooRunStatus().
              Number of containers accessed in one    If many containers are accessed, the
              transaction                             clustering should be reconsidered.
                                                      This has a direct influence on lock
                                                      contention: even in read mode, each
                                                      access to a container sends a message
                                                      to the lock server.
                                                      Obtained from the test application.
              Disk reads                              The number of calls to the system
                                                      read() function.
                                                      Objectivity (usually) reads data in
                                                      blocks that are a multiple of the page
                                                      size. The smallest is one page, the
                                                      largest is 64KB. Blocks larger than one
                                                      page (large objects and large
                                                      ooVArrays) are read in a single read
                                                      call (some of them are system data).
                                                      See Appendix C.
              Ratio buffers read/writes to disk       Indicates the clustering efficiency of
              IMPORTANT                               the program and the cache efficiency.
              Number of replicas                      For an update transaction, the number
                                                      of replicas of the database being
                                                      updated.
Job           Real elapsed time                       Obtained from the system time.
              Number of transactions per job          The test program job may consist of
                                                      many transactions.
              Space in memory occupied                Memory space needed by the process
                                                      to run; e.g. with the top command you
                                                      can obtain the total amount of
                                                      physical memory used by the task, in
                                                      kilobytes. Was the job swapping
                                                      when running the test?
Federation    DB sizes                                Direct influence on replication
                                                      protocol and lookup performance.
              Number of containers per DB             Number of containers in a DB. It
                                                      gives an idea of the clustering.
              Number of databases                     Number of databases.
                                                      Large catalogues influence the lookup
                                                      time to find a database.
Network and   Network bandwidth                       Use a tool to calculate the throughput
File system                                           of the network.
                                                      Use tcp program to calculate raw
                                                      network bandwidth between two
                                                      points. It is in
              File system bandwidth                   To check the file system protocol.
              Latency                                 To calculate the latency on the
                                                      network use the ping command.
              Bandwidth seen from the application     For an update, the total number of
              for write                               bytes written divided by the duration
                                                      of the commit.
HOST          CPU description
              RAM                                     Mbytes
              Raw disk write speed                    Important if we access DBs on the
                                                      local disk without accessing the
Configurable parameters

              CONFIGURABLE                            DESCRIPTION                               VALUE
Federation    PAGE_SIZE                               The maximum page size is 65,536
                                                      bytes. Set with the oonewfd tool.
Per           OO_CONT_GROW                            Specifies container growth as a           10%
transaction   (variable used by ooSession)            percentage of the current size. Set
                                                      when creating a container; should be
                                                      set large enough if the container will
                                                      be filled up.
              OO_CONT_INIT                            Initial number of pages to allocate for   4 pages (hashed container)
              (variable used by ooSession)            the container. Should be set if the       2 pages (non-hashed)
                                                      final size of the container is
                                                      approximately known. For data-
                                                      intensive write operations it is
                                                      recommended to set a large initial
                                                      size and also a large container growth
                                                      percentage, to avoid many small
                                                      container expansions.
              OO_LOCK_WAIT                            ooSetLockWait(int32 waitOption =
              (variable used by ooSession)            oocNoWait)
                                                      By default lock waiting is set to the
                                                      RPC timeout. In environments where
                                                      the network latency may be high or
                                                      there are large numbers of concurrent
                                                      users, it is recommended to use a
                                                      higher number.
              AMS TIMEOUT                             ooSetRpcTimeout(long sec = 25)
                                                      The AMS timeout is the time the
                                                      client waits for an answer from the
                                                      AMS. You can set a higher value
                                                      with this function.
                                                      IMPORTANT NOTE: if the RPC
                                                      timeout is set and, at the same time,
                                                      the Lock Wait option is greater than
                                                      zero, the application crashes with a
                                                      misleading error message indicating a
                                                      lock server timeout when in reality it
                                                      is the AMS server that timed out.
              INDEX_MODE                              Only set if indexes are used and there
              (oocInsensitive, oocSensitive)          are continuous updates to them. For
                                                      applications which only read the
                                                      indexes this parameter has no effect.
                                                      Refer to Appendix A.

Per process   OO_CACHE_INIT                           Initial number of buffer pages.           200
              (Can be set if you use ooSession)       Set in ooInit() function inside
                                                      You should try to reduce the size of
                                                      the buffer if you access the data only
                                                      once, to avoid additional swapping.
                                                      Refer also to Appendix A.
              OO_CACHE_MAX                            Maximum number of buffer pages.           500
              (Can be set if you use ooSession)       Set in ooInit() function.
                                                      If the application needs more pages at
                                                      once than this limit, an error is issued.
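
To see why a small OO_CONT_INIT combined with a small OO_CONT_GROW leads to many container expansions, the following sketch counts them under a simplified growth model. The model (each expansion adds the given percentage of the current size, rounded up to whole pages) is an assumption made for illustration; the actual allocation policy of Objectivity/DB may differ, and expansions_needed is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Count how many expansions a container needs to grow from `initial_pages`
// to at least `target_pages`, assuming each expansion adds `grow_percent`
// percent of the current size, rounded up to a whole page.
std::size_t expansions_needed(std::size_t initial_pages,
                              unsigned grow_percent,
                              std::size_t target_pages) {
    assert(initial_pages > 0 && grow_percent > 0);
    std::size_t pages = initial_pages;
    std::size_t count = 0;
    while (pages < target_pages) {
        const std::size_t step = (pages * grow_percent + 99) / 100;  // ceiling, always >= 1
        pages += step;
        ++count;
    }
    return count;
}
```

Under this model, a hashed container starting at the default 4 pages with 100% growth reaches 32 pages in 3 expansions (4, 8, 16, 32), whereas with the default 10% growth it initially gains only one page per expansion.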

3.2 AMS server

              PARAMETER                               DESCRIPTION                               VALUE
AMS process   % CPU used by AMS
performance   Disk reads                              Number of disk reads by the AMS.
              Concurrent users                        Number of concurrent processes
                                                      accessing the AMS.
              Memory occupied                         Check if the process is swapping.
              Average number of socket connections    Number of concurrent sockets opened.
                                                      Use the command netstat -a.
              CPU description
              Disk read/write speed

3.3 Lock Server
              PARAMETER                               DESCRIPTION                               VALUE
LockServer    % CPU used by Lockserver
process       Number of concurrent processes on the   The maximum number of concurrently
              same lock server                        active transactions using a given
                                                      Lock Server is 1031 (i.e., unique
                                                      transaction ids). The maximum
                                                      number of concurrently locked
                                                      resources (to a first approximation,
                                                      containers, since there are very few
                                                      FD and DB locks) on any given Lock
                                                      Server is 100,003. The number of
                                                      concurrently held locks (a given
                                                      container can be locked by more than
                                                      one transaction) is limited only by
                                                      available memory (swap space,
                                                      really).
              Disk reads                              Should be zero.                           0
              Memory occupied                         Check if the process is swapping.
              Average number of connections           Refers to socket connections. Can be
                                                      obtained with netstat -a.
HOST          CPU description
              RAM
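
The two hard limits quoted in the table can be turned into a simple check for a planned test configuration. This is a minimal sketch; the function name and the idea of passing estimated counts are illustrative assumptions, not part of the Objectivity/DB API.

```cpp
#include <cassert>

constexpr long kMaxActiveTransactions = 1031;    // unique transaction ids per Lock Server
constexpr long kMaxLockedResources    = 100003;  // to a first approximation, locked containers

// Return true if the estimated load stays within both Lock Server limits.
bool fits_lock_server(long concurrent_transactions, long distinct_locked_containers) {
    assert(concurrent_transactions >= 0 && distinct_locked_containers >= 0);
    return concurrent_transactions <= kMaxActiveTransactions &&
           distinct_locked_containers <= kMaxLockedResources;
}
```

The number of held locks is deliberately not checked, since the table states that it is bounded only by available memory.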

    The AMS in Objectivity 5.1 is not multithreaded, which means that clients are served sequentially. To be solved in version 5.2.

APPENDIX A: C++ binding guidelines

This appendix is not intended to provide an exhaustive guide to all programming issues
that need to be considered when optimising the performance of an application. However,
the most common pitfalls and recommended solutions – some of which are extracted from
Objectivity technical notes – are provided.

1- Use ooRefs if you are going to access the objects only once.

If you know in advance that you are going to access an object only once, or you are simply
scanning a set of objects (to count them, for example), we recommend using ooRefs to
avoid pinning the objects in memory and to save space.

2- Prefer ooHandles to ooRefs if you plan repeated access to the objects (take care
NOT to overfill the cache with pointers).

Objectivity/DB creates and manages a cache for storing user objects. This cache
management scheme is one of the features that allow the database developer to design and
deploy an application that can store large amounts of data and give it the ability to truly
Coincidentally, it is also one of the features that we get the most questions about. Here is
an overview of how it works.

When a process first starts the user is required to call a function called ooInit(). Two of the
parameters to this function are initial cache size and maximum cache size. The initial size
is what the process starts with and maximum is what it is not allowed to grow beyond.

If the cache has grown to the maximum size and the application tries to create another
object the user will get the following message:

     Error #4306: new(<handle>): Storage Manager: All buffers are in use and no more can be allocated

The one simple rule of cache management is that as long as there is an ooHandle pointing
to an object, that object must stay in cache.

The above error means that every object in the cache has a handle pointing to it and there is
no more room for creating another.

This can happen if for example a user tries to create 10,000 objects by writing a recursive
routine that looks something like this:

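  void foo( int i )
  {
              ooHandle(UserObj) aH;      // one handle per stack frame
              aH = new UserObj();        // the object stays pinned while aH is in scope
              if( i < 10000 )
                     foo( i + 1 );       // every pending frame keeps its handle alive
  }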

By creating a handle for each object the routine pins every object in memory.

A way to make the error message go away is to simply increase the maximum size of the
cache. The only problem is that if the system does not have the virtual memory for that
cache size the application will eventually get the following error:

     Warning #4: new(<handle>): Can't allocate more heap space by using malloc(), the system will free
some space and retry it

A better way to do the same thing would have been the following:

  void foo( int i )
  {
              ooHandle(UserObj) aH;      // a single handle, reused for every object
              for( i = 0; i < 1000000; i++ )
                    aH = new UserObj();  // reassignment unpins the previous object
  }

This is because if the handle has been reassigned to another object or it goes out of scope
then the object it was pointing to can be taken out of cache and the space can be reused by
another object.
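
The difference between the two routines can be reproduced without a database by using a toy model of the pinning rule. Everything in this sketch is hypothetical: ToyCache only counts pinned buffers and ToyHandle merely stands in for ooHandle; neither is part of the Objectivity/DB API.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Toy stand-in for the object cache: it only counts pinned buffers.
class ToyCache {
public:
    explicit ToyCache(std::size_t max_buffers) : max_(max_buffers) { assert(max_buffers > 0); }
    void pin() {
        if (pinned_ == max_) throw std::runtime_error("all buffers are in use");
        ++pinned_;
        if (pinned_ > peak_) peak_ = pinned_;
    }
    void unpin() { assert(pinned_ > 0); --pinned_; }
    std::size_t pinned() const { return pinned_; }
    std::size_t peak() const { return peak_; }
private:
    std::size_t max_;
    std::size_t pinned_ = 0;
    std::size_t peak_ = 0;
};

// Toy stand-in for a handle: pins one buffer while it refers to an object.
class ToyHandle {
public:
    explicit ToyHandle(ToyCache& cache) : cache_(cache) {}
    ~ToyHandle() { if (bound_) cache_.unpin(); }
    ToyHandle(const ToyHandle&) = delete;
    ToyHandle& operator=(const ToyHandle&) = delete;
    void create_object() {
        cache_.pin();                // the new object needs a buffer first
        if (bound_) cache_.unpin();  // then the previously referenced object is released
        bound_ = true;
    }
private:
    ToyCache& cache_;
    bool bound_ = false;
};

// One handle per stack frame: every object stays pinned until the recursion unwinds.
void create_recursive(ToyCache& cache, int remaining) {
    if (remaining == 0) return;
    ToyHandle h(cache);
    h.create_object();
    create_recursive(cache, remaining - 1);
}

// One handle reused in a loop: at most two buffers are pinned at any moment.
void create_in_loop(ToyCache& cache, int count) {
    ToyHandle h(cache);
    for (int i = 0; i < count; ++i) h.create_object();
}

// Run `fn` against a fresh cache and report whether it exhausted the buffers.
bool runs_out_of_buffers(void (*fn)(ToyCache&, int), std::size_t max_buffers, int count) {
    ToyCache cache(max_buffers);
    try { fn(cache, count); } catch (const std::runtime_error&) { return true; }
    return false;
}
```

With a cache of 8 buffers, creating 100 objects through create_recursive exhausts the cache, while create_in_loop completes with a peak of two pinned buffers.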

Sometimes when a program runs out of cache, it is due not to a design flaw but to a bug.
An easy way to track the problem down is to call the following routine:

  void opiPrintHList(uint32 level, uint32 number);

This routine will print out how many handles are currently being used by the process and
should help track down where the handles are being created.

The first parameter, level, controls the amount of data the routine prints for each handle.
If level = 0, the only thing printed is the final message with the number of handles in the
stack. If set to 1, it prints the address of each handle, and if set to 2, it prints a detail line
with the handle contents.

The second parameter, number, can be used to restrict the number of handles printed. If set
to zero, the default is infinity. Note that setting number does not restrict the number of
handles scanned, but only those printed, so opiPrintHList can still be used to test the entire

3- Use indexes for fast random access.

Indexes are represented as b-trees in Objectivity. When a search for an object is done using
an index, the root page of the 'tree' is read into cache and then a search for the object page
is done based on the index key. Only those pages needed to find the object are read in.

Indexes are good for range-based queries. They provide fast and predictable search
capabilities at the cost of additional disk space and memory space to maintain the index.
Each index is stored in a container, although the index itself can operate on objects over a
specified scope: a container, a database or the federated database. Note that an application
running in oocSensitive mode (see ooTrans::start(xxx, yyy, indexMode)) will perform poorly
if a scan is done for each object created. If the application does not perform any scans on
the newly created indexes, then it is better to use oocInsensitive mode, in which case all
indexes are updated once at the end of the transaction.

4- Use ooMaps for fast random access to frequently updated objects.

ooMaps are good for looking up one object at a time. They give better performance than
indexes on data involving frequent updates. Indexes perform better on read-only data.
ooMaps are implemented using a dynamically extensible hash table. To avoid
re-hashing the map, the user needs a good estimate of the initial number of hash bins. Also,
the user should make sure the ooMap and its ooMapElems are in a container by
themselves. Mixing the ooMap/ooMapElems with objects of another type will result
in poor performance of the map.

APPENDIX B: Calculation of maximum number of objects per container
Note: this example is extracted in its entirety from Objectivity/DB documentation.

An object is referenced via an Object Identifier (OID). An OID is composed of four 16-bit
fields (DB-OC-PN-SN) where:

           DB = database ID
           OC = object cluster (container) ID
           PN = logical page number
           SN = logical slot number on PN

   Theoretically, the logical slot number SN has a maximum of (2**16)-1. However, since
each slot takes 6 bytes and each object takes at least 8 bytes, the maximum number of slots
per page is (pagesize/14). For example, for a pagesize of 65528 bytes, the maximum number
of slots per page is 4680. Note that this maximum was obtained for objects of zero size (the
8 bytes in the equation represent Objectivity overhead). As the size of your objects
increases, the number of slots per page decreases.


    Pagesize = 8192
    Average Object Size = 20 bytes (+ 14 bytes of Objectivity overhead)
    Containers have hashing enabled.
    No Objects (i.e., each container is 4 pages)

    Maximum # of databases = 65535
    Maximum # of containers/database = smaller of:
                        2000000000/(4 x 8192) = 61035
                         - or -
                       = 32767
    Maximum # objects/container = maximum pages per container(*)
                      x maximum number of objects per page
                   = 32767 x (8192/(20+14))
                   = 7894880

    (*) Assuming one container in this database (file) for this estimate,
       limited by the total number of pages for ALL containers in a given
       database to be less than 2GB/pagesize.
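
The estimate above can be reproduced with integer arithmetic. This is a sketch of the calculation only; the function and constant names are hypothetical, and the limits are the ones quoted in this appendix. Because only whole objects fit on a page, 8192/(20+14) is taken as 240, which gives 7,864,080 objects per container, slightly below the 7,894,880 quoted above (that figure follows from the unrounded quotient of about 240.94).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kSlotBytes            = 6;           // bytes per slot
constexpr std::uint64_t kMinObjectBytes       = 8;           // minimum bytes per object
constexpr std::uint64_t kObjectOverhead       = 14;          // overhead used in the example
constexpr std::uint64_t kMaxPagesPerContainer = 32767;
constexpr std::uint64_t kMaxContainersPerDb   = 32767;
constexpr std::uint64_t kMaxDbBytes           = 2000000000;  // 2GB, as written in the example

// Upper bound on slots per page, reached for zero-size objects.
std::uint64_t max_slots_per_page(std::uint64_t page_size) {
    return page_size / (kSlotBytes + kMinObjectBytes);
}

// Whole objects of the given average size that fit on one page.
std::uint64_t objects_per_page(std::uint64_t page_size, std::uint64_t object_bytes) {
    return page_size / (object_bytes + kObjectOverhead);
}

// Containers per database: limited by the file size or by the container id range.
std::uint64_t max_containers_per_db(std::uint64_t page_size, std::uint64_t pages_per_container) {
    assert(page_size > 0 && pages_per_container > 0);
    return std::min(kMaxDbBytes / (pages_per_container * page_size), kMaxContainersPerDb);
}

// Objects per container, assuming a single container in the database.
std::uint64_t max_objects_per_container(std::uint64_t page_size, std::uint64_t object_bytes) {
    return kMaxPagesPerContainer * objects_per_page(page_size, object_bytes);
}
```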

APPENDIX C: Calculation of bytes written/read inside a program

The number of disk reads is the number of calls to the read() system function. The
calculation of bytes read/written depends on whether we are accessing the data via the AMS
server or the local file system.
Objectivity (usually) reads data in blocks that are a multiple of the page size. The smallest
is one page, and the largest is 64KB if we go via the AMS. Blocks larger than one page are
used for large objects (object size bigger than page size) and large ooVArrays (some of
them are system data).
Tests have shown that large ooVArrays are written and read via the local file system in a
single read/write system call.

A better indication of how much data was written or read can be obtained in the following
way:
 #include <HepODBMS/goodies/ooStats.h>

  // start transaction here

     xsmStat     stats; // this is unsupported class from ooSession
     osmGetStats( &stats ); // and this is unsupported function call
     int bw = stats.diskWriteBytes;
     int br = stats.diskReadBytes;

  // do your stuff
  // commit

     osmGetStats( &stats );
     int bytes_written_in_transaction = stats.diskWriteBytes - bw;
     int bytes_read_in_transaction = stats.diskReadBytes - br;
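
Once the counters have been sampled before and after a transaction, the "bandwidth seen from the application" is simply the byte difference divided by the elapsed time. The sketch below assumes the counters have been copied into a plain struct; IoCounters, TransactionIo and io_delta are hypothetical names, and 64-bit integers are used on the assumption that byte counts may exceed the range of int in long jobs.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical snapshot of cumulative I/O counters, e.g. copied from the
// statistics structure shown above.
struct IoCounters {
    std::uint64_t read_bytes;
    std::uint64_t written_bytes;
};

// I/O performed by one transaction and the rates derived from it.
struct TransactionIo {
    std::uint64_t read_bytes;
    std::uint64_t written_bytes;
    double seconds;
    double read_mb_per_s() const { return seconds > 0.0 ? read_bytes / seconds / 1.0e6 : 0.0; }
    double write_mb_per_s() const { return seconds > 0.0 ? written_bytes / seconds / 1.0e6 : 0.0; }
};

// Difference between two snapshots taken before start and after commit.
TransactionIo io_delta(const IoCounters& before, const IoCounters& after,
                       std::chrono::duration<double> elapsed) {
    assert(after.read_bytes >= before.read_bytes);
    assert(after.written_bytes >= before.written_bytes);
    return {after.read_bytes - before.read_bytes,
            after.written_bytes - before.written_bytes,
            elapsed.count()};
}
```

The elapsed time would typically come from two std::chrono::steady_clock::now() calls placed around the transaction, in the same places as the two statistics calls.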

4 References

[1] RD45: A Persistent Object Manager for HEP.
[2] MONARC Project: http://www.cern.ch/MONARC/
[3] K. Holtman: CPU requirements for 100 MB/s writing with Objectivity.
[4] K. Holtman: Clustering and Reclustering HEP data in object databases.
[5] M. Schaller: Objectivity Storage overhead.
[6] Objectivity: http://www.objectivity.com
[7] HepODBMS library.
[8] Using Objectivity C++ manual, "Monitoring and Tuning Performance", chapter 21-1.