Type Your Title Here - DOC

Document Sample
Type Your Title Here - DOC Powered By Docstoc
					                                                                                       Database Administration

                               DATA COMPRESSION IN ORACLE 11G
                                                                                        Djordje Jankovic, MTS Allstream

Data compression is being implemented for quite some time in a number of different areas that deal with data
transmission and storage. Oracle first implemented compression in version 9i, with the introduction of bulk load
table compression and index compression. In 10g RMAN and DataPump metadata compression were introduced,
and then in 11g few important additions were made to the oracle compression arsenal. Two major improvements in
11g are the introduction of the new table compression algorithm along with the change of compression scope: while
in previous version compression was supported only for batch table loads, in 11g the full DML is supported for
compression. The second major addition in 11g is a new LOB storage method, called SecureFile which, amongst
other features, supports compression. Compression is also supported through the other tools: RMAN, Data Pump,
In this paper we will give a more detailed overview of the overall Oracle compression support in the version 11g.
Data compression is a process of encoding information using fewer bits than the non-compressed information would
take. An everyday example of compression is the use of acronyms, or shortcuts. So instead of writing “Please do
that as soon as possible” we write “Please do that ASAP”, encoding the 19–letter string “as soon as possible” with a
4-letter string ASAP. The compression ratio for the acronym alone is 19/4=4.75, which means that the compressed
message will need 4.75 times less storage then the non-compressed message.
The compression theory was first introduced in 1940s as part of the Information Theory. At that time the main area
of investigation was data transmission. The transmission media was much slower then nowadays and the ability to
send the information with fewer bits was very attractive. The compression theory has common elements with coding
theory and encryption. Nowadays compression is used in many areas: in data transmission, data storage: image, file
compression … One of the most well known implementation of the compression is the ZIP utility used to compress
The compression is based on the information entropy. A string that is deterministic, for example a string made of
repeating characters, has the entropy of 0, and the absolutely random string has the entropy of 1. The lower the
entropy of a string is, or in other words the higher the redundancy, the higher compression ratios can be achieved.
One of the important enablers of the compression is the redundancy of human languages. For example, the letters in
English language are not random. If all 28 letters had the same frequency, each letter would appear with the
probability of 3.57% (100/28). However, the most frequent letter, E1, appears with the probability 12.7% (i.e.
approximately every eight letter is E) and the least frequent letter, Z, appears with the probability of 0.074%. So letter
E is more then 170 times more frequent then the letter Z. In the traditional encoding schemes, e.g. ASCII, each letter
is coded with the same number of bits. However, if we were to code letter E with less bits and letter Z with more
bits, we would be able to encode the overall text with much less bits. This is the basic idea behind one of the
fundamental compression algorithms, the Huffman‟s code, which is used as part of most of the well known
compression algorithms: DEFLATE which is used in PKZIP, JPEG, …. If we go one step further from the letters
alone, the combination of letters and symbols, where by symbols we are thinking of punctuation characters, space, …,
has even higher redundancy. So, for example, the three-letter combinations “ a “ and “the” has much higher
probability then a random three-letter combination.
In a database environment we can look at redundancy at an additional dimension: atop of the redundancy inside an
individual string, i.e. a row, the redundancy usually exists at a column level across the table rows. The measure of

Paper #367                                                                                           Page 1 / 20
                                                                                           Database Administration

column redundancy is the column cardinality. A column with lower cardinality, i.e. with a low number of different
values, has higher redundancy and is so more suitable for compression. So, the idea in database compression would
be to find a string in a row (compression can and does go over the column boundaries) that appears more often then
a random string, and to code that string with the lower number of bits. In order to do that a dictionary with all the
frequent strings can be created. A dictionary entry would consist of an entry id and the string, where the length of the
entry_id would be much less then the length of the string. The row would then, instead of the string itself, contain
only the entry id that would point to the string in the dictionary table. Since the entry id is shorter then the string,
and since the string appears multiple times in the block (for now we will assume that compression is done on the
block level) the compression will bring the gain in storage. So for example, if in a block the string “ENGLISH”
appears 100 times, it would be put in the dictionary table, and the entry_id pointer would be left in those 100 rows.
Without compression the 100 occurrences of word ENGLISH would take 700 bytes (100 rows * 7 bytes per string),
whereas after the compression the storage used would be around 207 bytes: 9 in the symbol table (string+entry_id)+
100*2, where we assumed that the entry_id is 2 bytes. So we have saved ~ 500 bytes of storage.
As most of the techniques do, compression brings some benefits but also has the associated costs. The main benefits
are the storage savings and the performance gain. Since the compressed data is packed more densely, more rows will
fit in a block, and when the data is retrieved fewer blocks will need to be read. Since the database systems are most
often I/O bound the effect of the I/O gain could be significant. The price we pay for the compression, is the
process of compressing and decompressing data, which will consume CPU cycles. So the decision whether to
implement the compression, from the performance point a view, is a tradeoff between I/O gain and CPU loss.
Depending on the platform, system architecture and configuration, gains and losses can have different relationships.
Oracle performs table compression on the block level. That means that data redundancy at the block level is only
checked. Although global compression could in most cases achieve higher compression ratios, it would in most cases
be unfeasible, since all the data to be compressed would need to be buffered, which, in data warehouse environments
could require terabytes of buffering space. Another advantage of the block local compression is that the block is
self-contained: when it is been decompressed, no other blocks need to be read, which would reduce I/O and
contention for blocks with the compression dictionary. A consequence of the block level compression is that higher
compression ratios could be achieved if the table data is sorted before the load so that the redundancy on the block
level is increased2.
Table compression was first introduced in version 9i. It was than limited to bulk load operations, and so it was
mostly geared towards the Data Warehouse environments. In 11g, a new compression algorithm is implemented, and
even more importantly, compression implementation scope is widened to all the DML operations. This makes
compression suitable for OLTP environments as well. However, because of the compression operation overhead,
compression benefits are highest with static or WORM (Write Once Read Many) data, for example logging, auditing,
archive and similar data sets.
Table compression can be achieved in three ways:
       By specifying the clause when the table is created
       By altering a non-compressed table
       By creating a table in a compressed tablespace

    A table can be created as a compressed table by including the COMPRESS clause in the CREATE TABLE
    statement, like in:

2A valuable source for understanding internals of Oracle table compression is [POESS]. Although it talks about the encryption in
9i, most of the principles of Oracle table compression were routed in that version.
Paper #367                                                                                                Page 2 / 20
                                                                                     Database Administration

create table address
   (address_id       number
  , address          varchar2(100)
   pctfree 10 tablespace PERS_DAT;

The COMPRESS clause can also define one of the two compression modes: “FOR DIRECT_LOAD
that only the data that is loaded using a direct load will be compressed. The direct load operations are Direct Path
Sql*Loader load, “CREATE TABLE … AS SELECT” statement, parallel INSERT statement, INSERT statement
with the APPEND hint, and single row or array inserts or updates. The way the direct-load works is that, instead of
going through the SQL layer, the blocks are formed on the application/client side, and such formed blocks are
appended to the table. The new 11g clause “FOR ALL OPERATIONS” enables encryption for all DML operations:
individual inserts, updates and deletes. So, the example statement above, with addition of the “all” clause would be:
create table address
   (address_id       number
  , address          varchar2(100)
   compress for all operations
   pctfree 10 tablespace PERS_DAT;
If the COMPRESS sub-clause is not specified the “DIRECT LOAD ONLY” is the default, which makes the
statement backwards compatible with 9i and 10g.
There is also a NOCOMPRESS clause for the CREATE TABLE statement. Although the lack of any compress
setting by default means that there is no compression, the NOCOMPRESS clause should be not equated with the
lack of the clause. The NOCOMPRESS clause states the intention that compression is not allowed. When we
discuss the tablespace compression the difference between the two will be obvious.
The compression status of the table can be changed by altering the table, like in:
alter table address compress;
This statement, however, changes only the status of the table and does not compress the existing rows in the table.
Only the newly inserted rows in the blocks that were empty at the time the alter table statement was issued will be
compressed. The existing rows and the new rows inserted in the blocks that contained rows at the time the alter table
statement was issued will be left uncompressed. Similar to that, the reverse command:
alter table address nocompress;
will only change the status of the table and all the existing rows will be left compressed. This fact can create some
confusion, since there is no way of finding out the compression history of the table. A table can be created as
compressed, filled up, and then changed to NOCOMPRESS. When one checks the data dictionary, she will be under
the impression that the data is not compressed, whereas all the data in the table can be compressed. So, without
inspecting block dumps, and before Oracle releases some tools where this information could be figured out, one
would not be able to conclude whether the table has ever been compressed, i.e. whether it contains compressed rows.
In order to compress the existing rows in a table one should specify the MOVE clause in the alter table statement,
like in:
alter table address move compress;
The main use of this command, outside of the compression discussion, is to move a table to a new segment. The
new segment can be in another tablespace, can have different storage parameters, and, as in the example above, can
have different compression setting. So while the data is copied from the existing to a new segment the rows are
compressed. As opposed to the alter table command without the MOVE clause, which is a dictionary operation, and
which, so, finishes instantaneously, the alter table with the MOVE clause, will take time proportional to the number
of rows in the table.

Paper #367                                                                                       Page 3 / 20
                                                                                        Database Administration

The third way of compressing a table is to create the table in a compressed tablespace. A compressed tablespace is
created through the COMPRESS sub-clause inside the DEFAULT clause of the CREATE TABLESPACE


The COMPRESS clause has the same two mode sub-clauses, allowing compression only for direct load operations, or
for all loads. As with the table, the default is “FOR DIRECT_LOAD OPERATIONS”. The tablespace
compression setting is “only” the default setting, so table setting has precedence. The following table shows what the
outcome would be, for different combinations of table and tablespace compression settings:

               \       Tablespace     COMPRESS                NOCOMPRESS                 No clause
           Table \
           COMPRESS                   Compressed              Compressed                 Compressed
           NOCOMPRESS                 Not compressed          Not compressed             Not compressed
           no clause                  Compressed              Not compressed             Not compressed

As expected the table setting takes precedence, and the tablespace setting takes affect only in case where no
compression clause is specified on the table level. Note that NOCOMPRESS clause in CREATE TABLESPACE
statement is identical to no setting, unlike in the CREATE TABLE case, where NOCOMPRESS clause mandates
that there is no compression regardless of the tablespace setting.
Note that one can also alter the tablespace to a compressed or noncompressed one. The ALTER TABLESPACE
statement changes the default compression setting for the tablespace, which might, depending on the create table
settings, have effect on all newly created tables in that tablespace, but for the existing tables the compression setting
does not change. So, this is just a dictionary operation, which returns instantaneously.
The compression works on all datatypes except for variants of LOBs and data types derived from LOBs. For LOB
compression see next chapter: “LOB Compression: SecureFiles”.
For partitioned tables compression can be enabled or disabled on partition, table, or tablespace level. So, one table
can have both compressed and non-compressed partitions. This is very useful, since one may decide to compress
only the historic, read-only partitions. As we discussed for the non-partitioned tables, in order to compress the rows
in the table one should use the “MOVE” clause. So, for a partitioned table to compress only one partition, one
would use something like the following statement:
alter table XXX move partition XXX_2007_Q4 compress;

When the statement is run, all the local indexes for the XXX_2007_Q4 partition will become unusable, and they
would have to be rebuilt. To avoid that, the clause “UPDATE INDEXES” can be added to the “alter table”
statement shown above.
Compression setting of a table can be seen from the meta-data dictionary through the columns COMPRESSION and
COMPRESS_FOR columns of the xxx_TABLES views. The COMPRESSION column can have values
“ENABLED” or “DISABLED”,and the COMPRESS_FOR is NULL when the table is not compressed and will

Paper #367                                                                                            Page 4 / 20
                                                                                      Database Administration

have one of the values “FOR ALL OPERATIONS” or “DIRECT LOAD ONLY” when the table is compressed.
The information about the tablespace compression can be seen through the DEF_TAB_COMPRESSION and
COMPRESS_FOR columns of the DBA_TABLESPACES view. The column can have the same meaning and
contents as the corresponding columns in the xxx_TABLES view. For the partitioned tables, the metadata views also
contain compression related columns, with the similar meaning as in the xxx_TABLES: the DEF_COMPRESSION
As far as we know there is currently no way of finding out the level of compression ratio achieved in an existing table,
other then testing. In a presentation at the Oracle OpenWorld ‟07 [HODAK], a future package dbms_tabcomp that
will contain a procedure getratio which will be able to show the compression ratio, was announced to be “coming
soon”. One nice advertised feature of the procedure is that it can be run against an uncompressed table, in which
case it would show the projected compression ratio. In the White Paper [OWP02] a PL/SQL block that will create a
subset of the compressed and non-compressed table of the one that you are thinking of compressing is given. The
same concept is used in the code that follows:

The OLTP table compression (“FOR ALL OPERATIONS”) is part of the Advanced Compress option and is
licensed separately.

The compressed table block in its header contains a new structure: the symbol table (see Figure 1. copied from the
[OWP01]). The symbol table contains data that is repeating in the block‟s rows. The rows themselves contain a
pointer to the symbol table. That way, a repeating string appears only one time inside the block, and the rows contain
a much shorter pointer. The gain obtained by the compression is something like:
CompressionGain = (N-1)*CompressedStringLength - N*PointerLength - SymbolTableOverhead.
Where N is the number of times the compressed string appears in the block. So, the longer the compressed string is,
and the more times it appears inside the block, greater the gain is. That would point to the fact that low cardinality
CHAR and VARCHAR columns are the best candidates for table compression. Note that this is a somewhat
simplistic view since compression is not done on the column level, but on the row level. So if there is correlation
between the columns that should increases the compression ratio. Since very little details of the compression are
published the best way to estimate the compression that could be achieved for a table we are considering for
compression is through a try. We have selected one of our production tables which contains few low cardinality
columns of moderate length: the ADDRESS table. We compressed the whole table and achieved the compression
ratio of 2.37. We wanted then to see what is the contribution of individual columns to the compression ratio. So, we
selected three columns:

COLUMN_NAME                     Distnct NunNullsPerc AverageLen
------------------------------ -------- ------------- -----------
CITY_NM                           5,538          1.2%          10
PROVINCE_STATE_ID                    56          0              3
STREET_TYP_PLACEMENT_CD               3         10.4%           7

The CITY_NM name column has pretty high cardinality, whereas STREET_TYP_PLACEMENT_CD has very low
cardinality. At the same time the later column takes 7 bytes in average, so one would expect that quite a good
compression ratio could be achieved by compressing the table containing only that column. However, when we
created a table with that column alone, the achieved compression ratio was insignificant: 1.07, i.e. only 7% gain. We
tried sorting the column before the load, but the compression rate did not change. Similar, insignificant, gains were
achieved for the other two one-column tables. We then created a table with all of those three columns, and we
achieved the compression ratio of 1.63, which is still lower then the compression ratio for the entire table, 2.37, but
much higher then for any of the individual columns. The immediate conclusion, which could be made, but would
Paper #367                                                                                           Page 5 / 20
                                                                                      Database Administration

have to be proven with much more testing, is that compression benefit is higher for the tables with more columns.
But the real main conclusion is that before a table is compressed a test for that particular table should be run.

                              Figure 1 Compressed vs. Non-compressed Block Structure

A dump of a compressed block shows difference from the block of the noncompressed table. Here the few
differences that we noted:
- The symbol table is not clearly marked in the dump, but can be identified in the “byte” dump part of the block.
- The rows in the block dump are separated in two logical “tables”, tab0 and tab1, one containing the non-
compressed and one containing the compressed rows. The compressed rows are located at the “bottom” of the
block, and the non-compressed blocks are at the “top” of the block – closer to the header. This allows for
incremental compression which will be discussed later in this chapter.
- There is quite a difference in the row representation as well. The traditional representation of a row in a block
dump, and the one that is still in effect for non-compressed blocks, is something like:
. .   .
tab   0, row 0, @0x1f6b
tl:   21 fb: --H-FL-- lb: 0x0            cc: 3
col     0: [ 7] 4c 41 43 48 49           4e 45
col     1: [ 2] c1 0b
col     2: [ 6] 53 55 46 46 49           58
tab   0, row 1, @0x1f5d
tl:   14 fb: --H-FL-- lb: 0x0            cc: 2
col     0: [ 7] 4d 49 52 41 42           45 4c
col     1: [ 2] c1 0b
tab   0, row 2, @0x1f49

Paper #367                                                                                          Page 6 / 20
                                                                                   Database Administration

tl: 20 fb: --H-FL-- lb: 0x0 cc: 3
col 0: [ 6] 45 4c 4d 49 52 41
col 1: [ 2] c1 09
col 2: [ 6] 53 55 46 46 49 58
. . .

So, for each row, the contents of all the row columns, except for the NULL columns at the end of the row, is shown
in the block dump. The table this dump was done for, has three columns.
The dump of the compressed block:

tab 0, row 0, @0x1c6d
tl: 5 fb: --H-FL-- lb:         0x0 cc: 2
col 0: [ 6] 53 55 46           46 49 58
col 1: [ 2] c1 09
bindmp: 00 67 02 13 1f
tab 0, row 1, @0x1bbe
tl: 7 fb: --H-FL-- lb:         0x0 cc: 2
col 0: [ 6] 50 52 45           46 49 58
col 1: [ 2] c1 0b
bindmp: 00 40 02 17 ca         c1 0b
tab 0, row 2, @0x1bec
tl: 5 fb: --H-FL-- lb:         0x0 cc: 2
col 0: [ 6] 53 55 46           46 49 58
col 1: [ 2] c1 03
bindmp: 00 35 02 13 20

looks somewhat different in few respects. The row information does not always show all the rows, although it
sometimes does. It seems that for the non-compressed rows, more of the rows are shown. However when a row
contents in the byte block dump is inspected it does not seem that all the columns that are shown in the row part of
the dump exist, non-compressed, in the row. We were not able yet to find out what is the logic there. Each row
information is ended by bindmp information, which is exclusive to compressed block, but appears in both
compressed and non-compressed rows.
The compression ratio does not seem to have been significantly, if at all, changed between 9i, 10g and 11g database
For DML compression, which was introduced in 11g, compression is performed in delayed/batch fashion. When
rows are inserted through a non-direct load (e.g. insert statement) they are first inserted non-compressed. So, no
performance impact is endured during the inserts. When a block is full up to an internal threshold it is then
compressed. So, each block can contain compressed and non-compressed rows (as seen in the block dump as
described above). The whole compression process is shown on Figure 2. ([OWP01]). As can be seen from the
diagram, a block in a compressed table can contain only non-compressed rows (before it reaches the fullness
threshold), it can contain only the compressed rows, and it can contain a mixture of compressed and non-compressed
rows, after the block has once been compressed, but the newly added rows have still not filled up the block to the
The batch compression is good from the performance point of view since no time is lost for the compression on the
insert time. On the negative side the delayed compression can lead to somewhat lower compression ratios.
To test the delayed batch compression we conducted the following experiment. We created three tables: one non-
compressed and two compressed. Of the two compressed tables we loaded one using a direct load, and the other one
using DML load. The space usage of the three tables is:

TableName                      NumBlocks AvgRowsPerBlock
------------------------------ --------- ---------------
T03_ADDR_ORIG_DA                      84          119.04
T03_ADDR_ORIG_DML                    114           86.82
T03_ADDR_ORIG_NC                     197           50.76
Paper #367                                                                                      Page 7 / 20
                                                                                     Database Administration

So, in the non-compressed table, Oracle was able to pack 50.76 rows in a block using 197 blocks in total. In the
Direct-Load table there were 119 rows in the block with 84 blocks used, for a compression factor of 2.35. The DML
table was somewhere in between: almost 87 rows per block with 114 blocks used, and compression factor 1.73. The
explanation for that is that Oracle first fills the blocks with non-compressed rows, and then, while further blocks are
being populated, the already populated blocks get compressed, some space gets released in those blocks and then new
rows get inserted there, …, which finally ends up with not all blocks being fully populated.

                                     Figure 2 Delayed DML Compression in 11g

There does not seem to exist any initialization parameters related to the compression. However, in the version we
were working on, there are 24 hidden parameters dealing with compression.
When compression is enabled “FOR ALL OPERATIONS” new columns can be added to the table and existing
columns can be dropped. However, when compression is enabled for “FOR DIRECT_LOAD OPERATIONS”
only, columns cannot be dropped, and the new columns can be dropped only if DEFAULT values are not specified
for those columns.
If table compression needs to be enabled on one or more partitions of a table where none of the partitions already
have compression enabled, and if the table has bitmap indexes defined, the bitmap indexes need first to be marked
unusable, the compression attribute changed, and the bitmap indexes rebuilt. Once at least one partition of a
partitioned table has compression enabled, when the further partitions need to be compressed, there is no need for
any change to the bitmap indexes.
Logical standby databases do not support the segment compression storage type.

Paper #367                                                                                         Page 8 / 20
                                                                                       Database Administration

In our experiments we obtained the same compression ratios as Oracle claims: between 2 and 4. As we mentioned
earlier the compression ratio for a particular table will heavily depend on the column cardinality, clustering of rows
with the similar/same values, size of the block, and it is quite difficult to estimate, so running a test is a best
estimating mechanism.
When selecting the compressed data, the decompression will consume some CPU time, but then at the same time the
fact that a compressed table is stored in less blocks then the non-compressed one, a table scan that reads many blocks
of the table, would benefit from lower I/O. So, CPU loss for decompression – I/O gain while reading the full table.
Which one would prevail in a particular case would depend on the query and table specifics: compression ratio, query
type, … . Overall, our tests show that selecting from compressed data is in average 10% slower, but we were getting
different results for different tests, so again the best advice is that one should run a test for each of their particular
cases which should give a best estimate of the future performance gain or loss.
In terms of the compression impact on the data-load, Oracle claims that overhead on loading the data is between 1%
and 3%. Our tests did not find any major slowdown neither for direct load nor for the DML load.
We did not notice a significant difference for the DELETE statement. For a DELETE with a filtering condition, if
the columns used in the filtering conditions are compressed, there will be some decompression involved. Also, if a
row that is deleted is compressed, the symbol table would need to be changed: either the counter of the number of
columns that are compressed has to be decremented, or if this is the last column that was compressed, then the entry
should be removed from the symbol table. So, in any case some additional operations would need to be performed,
but they are typically insignificant from the performance point of view.
For the UPDATE it all depends on the data. If part of the row that is compressed is updated, the UPDATE can
endure a considerable slowdown, since the data would need to be decompressed and then updated, which might
further require a change in the symbol table … . There is, however, one potential benefit for an UPDATE statement.
Suppose that the table contains a column called COLOR, and that you want to update the table to change all
occurrences of color=„BLUE‟ to color=‟DARK BLUE”. If the value „BLUE‟ was compressed, the only occurrence
of it will appear in the symbol table, so Oracle would need to change only the symbol table, and would not need to
touch any rows (the pointers will remain the same) which might provide performance gain.
In Oracle 11g a new storage paradigm for LOBs is introduced. The legacy LOB storage is now called BasicFile and
the new way of storing LOBs is requested by specifying the storage parameter SecureFile. The SecureFile storage
offers few benefits: compression of data, avoiding duplicate LOBs, encryption.
With the SecureFile LOBs storage Oracle has reengineered the way the LOB is managed. That is why many of the
LOB parameters changed their meaning, or were made obsolete, and some new parameters were introduced – for
example the ones to support the new functionality” compression, deduplication and encryption. The comparison of
the LOB storage parameters was given in the following table:

Parameter                  BasicFile meaning                                SecureFile meaning
CACHE / NOCACHE            CACHE: Oracle places LOB blocks in the buffer cache for faster access, which is convenient for
/ CACHE READS              LOBs that are frequently read and written. CACHE implies that the LOGGING is set as well.
                           NOCACHE (default): LOB blocks are not cached, which is convenient for LOBs that are
                           occasionally read and never written.
                           CACHE READS: LOBS are placed in buffer cache only for reads and not for writes, which is
                           convenient for LOBs that are frequently read and rarely written to.
CHUNK                      The amount of space, which is a multiple of      Only the advisory size.
                           database block size, a multiple of which will
                           be allocated for any out-of-line stored LOB.
Paper #367                                                                                            Page 9 / 20
                                                                                     Database Administration

Parameter                BasicFile meaning                                SecureFile meaning
FREEPOOLS                Number of groups of free lists for the LOB       The parameter is ignored.
                         segment. Can be specified only if the
                         database is running in automatic undo
                         mode. In this mode, FREEPOOLS is the
                         default unless you specify the FREELIST
                         GROUPS parameter.
FREELISTS                Number of free lists for each of the free list   The parameter is ignored.
                         groups for the LOB. Ignored for ASSM
FREELISTS GROUPS         The number of groups of free lists for the       The parameter is ignored.
                         LOB object. Ignored for ASSM tablespaces.
LOGGING /                LOGGING – DDL and DML is logged in the redo log file.
NOLOGGING /              NOLOGGING – DDL and DML is not logged.
LOGGING                  FILESYSTEM_LIKE_ LOGGING –                       FILESYSTEM_LIKE_ LOGGING – only
                         Invalid for BasicFile LOBs.                      the metadata is logged.
MAXSIZE                  Can be specified for SecureFile LOBs only.       Specifies the amount of space that old LOB
                                                                          can take when the RETENTION parameter is
                                                                          set to MAX. If and when this size is
                                                                          consumed, the old version disk space is
                                                                          recycled regardless of time requirements and
                                                                          as needed.
PCTVERSION               Percentage of LOB data space that can be         The parameter is ignored.
                         occupied by old versions of LOB data blocks.
                         As soon as old versions of LOB data blocks
                         start to occupy more than the PCTVERSION
                         amount of used LOB space, Oracle tries to
                         reclaim the old versions and reuse them.
RETENTION                An alternative to PCTVERSION parameter.          Database will automatically manage the
                         The RETENTION parameter specifies that           LOB retention for Flashback and
                         LOB retention would be governed by the           consistent reads. The possible values for
                         database setting of the                          this parameter are:
                         UNDO_RETENTION setting. For the                  AUTO (default) – UNDO is retained for
                         RETENTION parameter the database has             consistant read purposes only.
                         to be in the Automatic Undo Management           MIN n – Can be set when the database is in
                         Mode.                                            Flasback mode and the LOBs will be kept
                                                                          for n seconds.
                                                                          MAX – Limits the size of the LOB UNDO
                                                                          in bytes. IF MAX is specified the
                                                                          MAXSIZE has to be specified as well.
                                                                          NONE – No UNDO is required neither
                                                                          for consistent read nor for the flashback

Since Oracle did not publish much about the internal workings of LOB compression, all the conclusions are based on
our own tests, and should be confirmed in the environments where the compression will be implemented. LOB
Paper #367                                                                                       Page 10 / 20
                                                                                     Database Administration

compression is done inside an individual LOB - each LOB is compressed individually. So, the higher redundancy
inside a LOB the higher compression ratio could be achieved, whereas the redundancy between different LOBs will
not result in higher compression ratios. LOB compression is most likely using one of the variants of a standard
compression algorithm.
Another feature of the SecureFile LOB storage is LOB deduplication. In certain application exact same LOBs are
stored multiple times in the database: e.g. mails sent to multiple addresses, mail attachments, … . LOB deduplication
is designed to achieve gains in these cases: only one copy of a same LOB is stored in the database, and the pointers to
that copy are stored in the rows instead of the duplicate contents. This not only conserves the storage, but also
makes the access faster.
The desired storage paradigm, BasicFile or SecureFile, can be set through the LOB storage clause in the CREATE
TABLE statement, or by setting the initialization parameter db_securfile. An example of the CREATE TABLE
statement with all possible parameters for the lob storage both a BasicFile and a SecureFile LOB is:

Paper #367                                                                                        Page 11 / 20
                                                                                      Database Administration

create table test_lob
   (DF        CLOB
  , BF        CLOB
  , SFD       CLOB
  , SFN       CLOB
  , SFCD      CLOB
  , SFCM      CLOB
  , SFCH      CLOB
  , SFKD      CLOB
  , SFDD      CLOB
  , SFCDD     CLOB
   LOB(DF)    STORE AS                          TEST_LOB_DF

Let‟s analyze these LOBs:
DF         - Since storage type is not specified the BasicFile, or traditional, LOB is created. This guarantees that all
              legacy applications will work the same way as in earlier versions.
BF         - The BasicFile storage paradigm is specified, so a legacy LOB is created. There is no difference between
              BF and DF.
SFD        - The SecureFile storage paradigm is specified. Note that there are no compression or deduplication
              parameters specified, so no compression and deduplication is enabled.
SFN        - The NOCOMPRESS clause is specified, so compression will not be enabled. The SFN is exactly the
              same as SFD.
SFCD - The COMPRESS clause is specified, so compression is enabled. Since no compression degree is
              specified, the default MEDIUM compression will be applied.
SFCM - The medium compression is set. The SFCM is identical to SFCD.
SFCH - The high compression is set. Higher level of compression requires more CPU resources, and as a
              consequence it will take more time. How much more compression would the HIGH achieve over the
              MEDIUM compression setting depends on the data.
SFKD - The KEEP_DUPLICATES clause is specified. This is a default setting, so SFKD is exactly same as
SFDD - The DEDUPLICATION clause is specified, so duplicate LOBs will not be stored.
SFCDD - The deduplication and compression are interdependent setting, and both can be set, or only one of them
              as in the previous examples.
Different LOB parameters can be set for partition LOBs: in one partition LOB can have SecureFile and in another
BasicFile storage. The SecureFile parameters, compression and duplication, can also be configured differently in
different partitions, as shown in the following example:

Paper #367                                                                                        Page 12 / 20
                                                                                    Database Administration

CREATE TABLE test_lob_part
   (REGION       VARCHAR2(20)
  , ID           NUMBER
  , A            BLOB
   tablespace CMPR_CA
           lob(A) STORE AS BASICFILE tlp_a_p1
           lob(A) STORE AS tlp_a_p2
           lob(A) STORE AS tlp_a_p3 ( NOCOMPRESS )
           lob(A) STORE AS tlp_a_p4 ( KEEP_DUPLICATES )
Note that most of the traditional LOB storage parameters are still in effect, however some are kept for backward
compatibility but have no effect,
The default LOB setting can be set through the DB_SECUREFILE initialization parameter that can be set on the
session or on the system level. The parameter can take the following values:
    NEVER           - Any LOBs that are specified as SecureFiles are created as BasicFile LOBs. All SecureFile-
                     specific storage options and features (for example, compress, encrypt, deduplicate) will cause an
                     exception. The BasicFile LOB defaults will be used for storage options not specified.
    PERMITTED - LOBs are allowed to be created as SecureFiles. This is DEFAULT setting.
    ALWAYS          - All LOBs created in the system are created as SecureFile LOBs. If the LOB is not created in
                     an ASSM tablespace, an error will occur. Any BasicFile LOB storage options are ignored. The
                     SecureFile defaults will be used for all storage options not specified.
    IGNORE          - The SECUREFILE keyword and all SecureFile options are ignored.

If the COMPATIBLE initialization parameter is not set to 11.1 or higher, then LOBs are not treated as SecureFiles.
If there is a LOB column with two partitions, one in an ASSM-enabled tablespace and the other one in a
tablespace for which ASSM is not enabled, the LOB in the partition in the ASSM-enabled tablespace will be
treated as SecureFiles and the LOB in the other partition will be treated as BasicFile LOB.
The SecureFile LOB can be created only in an ASSM tablespace. If contrary is tried the error:

ORA-43853: SECUREFILE lobs cannot be used in non-ASSM tablespace "XYZ"

would be returned.
There are few columns added to the DBA_LOBS view in relation to the SecureFile storage paradigm:
SECUREFILE             - Shows the storage paradigm selected. It is „YES‟ for SecureFile and „NO‟ for BasicFile
COMPRESSION            - Shows whether the LOB is compressed. The value for BasicFile is „NONE‟, and for
                          SecureFile: „NO‟ if there is no compression, „MEDIUM‟ and „HIGH‟ for the medium
                          and high compression respectively.
DEDUPLICATION - Shows whether the de-duplication is turned on for the LOB. The values are „NONE‟ for
                          BasicFile and „YES‟ or „NO‟ for SecureFile stored LOBs.
ENCRYPT                - Shows whether the encryption is turned on for the SecureFile. The encryption is outside
                          of the scope of this paper.

Paper #367                                                                                      Page 13 / 20
                                                                                        Database Administration

As far as we know there are no tools available to finding out the level of compression achieved, or the gain achieved
through deduplication.
When talking of LOB encryption performance we should first look into the performance of the SecureFile LOB
without any “special” options versus BasicFile LOB. The SecureFile LOB name in the 11g Beta was FastFile. This
name is justified by the performance improvement over the BasicFile LOB. In our tests the improvement for both
writing (populating the LOB column) and reading is over 30%. The improvement is only in the case of LOBs stored
in their own segment (“DISABLE STORAGE IN ROW”). For in-row stored LOBs it appears that there were no
changes in the LOB architecture, so consequently there are no performance implications.

Three types of Oracle indexes may be subject to index key compression. Bitmap indexes are by their definition
stored in compressed format and they are not in the scope of our further analysis. The B-tree indexes and Index
Organized Tables (IOT) are subject to key compression. The principle of index key compression is similar in both of
those objects, so in the rest of this paragraph we will talk about index compression, and most of the said will apply to
both index and index-organized-tables compression.
Similarly to the table compression, index compression is performed on the block level. The idea behind the
compression comes as a natural one: since the index entries inside are ordered by the index key, one can expect that
on a block level there will be redundancy, especially at the beginning of the index key. So, if an index is a compound
one, i.e. the index key is made of multiple columns, one can expect that in one block the first columns in the index
key will have proportionally more repeating values. And where there are repeating values, there is a potential for
compression. The previous assumption is true only if the first column of a compound index does not have too high
selectivity. If it is highly selective, for example it is a unique column, index compression can turn from a friend to an
enemy. In case where the columns to be compressed are not carefully selected, the compressed index can end up
being larger then the non-compressed index.
When creating a compressed index Oracle divides the indexing columns in prefix columns and suffix columns. The
prefix is the grouping part and suffix is the unique part. The prefix columns values are stored in a prefix table in the
block header, and the index entries contain only a pointer to the prefix table. If the prefix columns are not selective
on a block level, there will be quite a few entries in the prefix table, so the index entries‟ pointers would just be an
overhead. Also, if the prefix columns are very short, the pointer can be longer then the prefix columns, which will
again result in the loss of space,, and the compressed index being larger then the non-compressed one. So, correctly
selecting the prefix columns is essential for the creation of an efficient compressed index.
From the discussion above it can be concluded that the best index candidates for compression, are compound
indexes, where the initial index columns have low cardinality. This might, at first, seem contradictory, to the usual
practice of putting the high cardinality columns as the initial columns of a compound index. From the point of view
of index scan efficiency the order of columns does not really matter. It does matter if an index would be used for
searches where only few indexes columns are used in the filtering condition, so a tradeoff between the compression
and that use of index should be evaluated for each particular case.
When compressing an index one can specify how many columns will make a prefix. By default, if the number of
prefix columns is not specified, Oracle will for non-unique indexes put all the columns in the prefix, and for the
unique index the first n-1 columns will be in the prefix. From that it is obvious that a unique index with only one
column cannot be compressed, since the prefix would have to contain none of the columns. At the same time
compressing a unique index with one column would anyway not make too much sense.
It is very important to properly asses which columns should create the compressed index prefix. Suppose that the
index is created of three columns (col1, col2, col3), where col1 has low cardinality, and col2 and col3 make the index
unique. Since the index is unique, there are two options: the prefix could contain col1 alone or (col1, col2). In the
case of col1 only the prefix will have low cardinality, and in case (col1, col2) it might have pretty high cardinality. So,
most likely the much better solution would be to define the prefix with col1 alone.

Paper #367                                                                                           Page 14 / 20
                                                                                  Database Administration

An index is compressed using a statement:
CREATE INDEX XXX_I01 on XXX(col1, col2, col2) COMPRESS n;
In the statement above, “n” should be substituted with the number of leading columns that should make the prefix.
As we discussed earlier for a non-unique index n can be between 1 and the number of index columns (in the above
example 3), and for a unique index n can be between 1 and number of index columns minus one (if the index in the
above case were unique it would be between 1 and 2). The consequence of that is that one column unique index
cannot be compressed:

Paper #367                                                                                    Page 15 / 20
                                                                                     Database Administration

create unique index XXX_U01 on XXX(col1) compress;
ERROR at line 1:
ORA-25193: cannot use COMPRESS option for a single column key
The COMPRESSION column in the xxx_INDEXES view will indicate whether an index is compressed
(ENABLED) or not (DISABLED), and the and PREFIX_LENGTH column bears the information as to how many
indexing columns are used as the compression prefix.
Compression can be enabled or disabled when the index is rebuilt as in:
For the partitioned indexes compression has to be enabled on the index level when the index is built. Compression
settings for individual partitions can then be enabled or disabled either when they are created or rebuilt.
The best way to evaluate whether an index should be compressed is to run a test: create a table with the representative
data, and compare the storage requirements and the gain in search time. For the initial assessment, one can run the
ANALYZE INDEX statement with the VALIDATE STRUCTURE clause, and then inspect the INDEX_STATS
view. There are two columns that would provide a guideline, as to how many columns should make the index prefix,
and how much space saving could be gained by compressing the index:
OPT_CMPR_COUNT NUMBER                     - Optimal key compression length
OPT_CMPR_PCTSAVE NUMBER                   - Corresponding space savings after an ANALYZE
There is a also a script at the ixora www (see [IXORA]) that should suggest which indexes are good candidates for
compression. Although compression, in most cases where the storage gain is obvious, will be a right choice, one
should be aware that compression can increase the contention of database blocks.

DataPump compression was first introduced in 10g. It was then limited to metadata compression. In 11g the
compression was extended to the data itself.
The DataPump compression is set through the parameter COMPRESS, which has the following possible values:
ALL, DATA_ONLY, METADATA_ONLY, and NONE. The default setting is METADATA_ONLY. In order to
use the new 11g options, i.e. all except then METADATA_ONLY, the COMPATIBILITY parameter has to be set
to 11.0.0.
The storage gain the compression will provide depends on the data contents. The metadata compression usually does
not offer too much space savings in absolute numbers. However, in a lot of shops the metadata exports are run on a
daily basis, and are kept for a longer time for auditing purposes, so the number of export files that are kept can
become considerable, and so the storage savings. The relative savings in storage for metadata DataPump export is
tens of percents. So, for example, a meta-data-only export that would non-compressed take 10M, it might take
somewhere between 5M and 7M when compressed. The data compression usually offers much higher level of
compression, both in relative an absolute terms. The compression ratios are similar to the ones that would be
achieved if the DataPump is done without compression and then the file is compressed using a zip program. Our
tests indicate that DataPump compression ratio is a bit higher then the zip compression ratio.
The very compression operation in DataPump will obviously consume some CPU resources. However, our tests did
not show significant increase in the DataPump runtime, when data is being compressed. The reason for that can
probably be contributed to the fact that the compressed file is much smaller in size, so the loss in CPU time to
perform compression is offset by the gain in I/O to write the bigger file to disk. We also compared doing DataPump
compression, and doing DataPump without compression, and then compressing the file using the zip utility. The
conclusion is that DataPump compression is much more efficient, for same reason that were just discussed.
The DataPump compression is part of the Advanced Compress option and is licensed separately.

Paper #367                                                                                       Page 16 / 20
                                                                                     Database Administration

Recovery Manager, RMAN, offers binary compression of the backup set backups. The compression was introduced
in 10g, and it was improved in 11g by adding a new, more efficient, compression algorithm ZLIB. Note that the
binary compression is different from RMAN “compression” in earlier releases, where by compression Oracle called
the process of avoiding to copy empty blocks to the backup set, which made the backup set smaller then the data
copy backup.
While database compression will generate smaller backups it will increase the CPU usage. So, again it is a tradeoff
between storage savings and CPU usage increase. One should asses both of those resources and make a decision
which is the best approach in the appropriate situation. The situations which may warrant paying the performance
price and use the compression are:
    When disk space in your flash recovery area or other disk-based backup destination is limited.
    When a backup to some device over a network is performed and the reduced network bandwidth is more
     important than CPU usage.
     An archival backup media such as CD or DVD is used, where reducing backup sizes saves on media costs and
      archival storage.
If the backup is done to the tape and the tape device performs its own compression, you should not use both RMAN
backup set compression and the media manager vendor's compression. In most instances the media manager's
compression will offer better results.
Compression is performed in the RMAN Copy phase, when blocks are copied from database files to backup set.
This is done at the same time when backup validation and encryption are done. Since RMAN is using the standard
binary compression algorithms, the compression ratio is very similar to the one that would be achieved by
compressing the non-compressed backup using the operating system compression tools. It depends on the type of
the data stored in the database and its redundancy, but one can expect compression ratios of 2-10. Since RMAN
appears to be doing compression on the block level, if the tables are already compressed using the table compression,
we can expect a lower compression ratio.
The database compression is set by using the “COMPRESSED BACKUPSET” clause in the backup statement, as in:


It can also be configured as default using the CONFIGURE statement as in:

new RMAN configuration parameters:
new RMAN configuration parameters are successfully stored

Note that compression is not available for the DATA COPY backups, but only the backup set backup.
When the database is recovered, there is no need to specify any different clauses in the recovery clauses then if the
backup is not compressed. RMAN will infer the compression setting and automatically configure decompression by
the proper compression algorithm.
By default Oracle will use the BZIP2 algorithm for compression. The BZIP2 algorithm is optimized for maximum
compression ratio. The RMAN compression, as any other compression, adds CPU load to the system. If this leads
to serious performance degradation, a new algorithm, ZLIB, introduced in 11g, which is optimized for speed, can be
used to lower the CPU load. Since it is optimized for speed and not for compression ratio, the ZLIB compression
generates larger backups then BZIP2. In order to use the ZLIB algorithm, the COMPATIBILE initialization
parameter has to be set to at least In order to use the ZLIB algorithm one needs to have the license for
the Advanced Compression option. The RMAN compression algorithm can be set using the CONFIGURE
command as in:

Paper #367                                                                                       Page 17 / 20
                                                                                    Database Administration

The list of available algorithms can be seen through the V$RMAN_COMPRESSION_ALGORITHM. Amongst
other information it shows whether the algorithm is compatible with the database compatibility setting and which is
the default algorithm. For now there are two compression algorithms supported, that we mentioned above.
The information about the backup compression setting can be seen in a number of ways:
       The configured encryption algorithm can be seen by running the SHOW command as in:
        RMAN> show compression algorithm;
        RMAN configuration parameters for database with db_unique_name DJ11 are:
       Whether the compression is on or not, can be seen from the backup output while the backup is running, and
        from the log afterwards, can be seen in the output message:
        channel ORA_DISK_1: starting compressed incremental level 1 datafile backupset
       The compression setting of the past backups can be seen from the COMPRESSED column (the
        V$BACKUP_FILES also contains the BS_COMPRESSED column) of the following views:
       The achieved compression ratio can be seen from the following views:
       The output of the LIST BACKUP command also shows whether the backup was compressed or not, as in:
        RMAN> list backup;

        using target database control file instead of recovery catalog

        List of Backup Sets

        BS Key Type LV Size        Device Type Elapsed Time Completion Time
        ------- ---- -- ---------- ----------- ------------ ---------------
        1       Full    1.03M      DISK        00:00:22      24-FEB-08
                BP Key: 1   Status: AVAILABLE Compressed: YES Tag: TAG20080224T104641
                Piece Name:
          SPFILE Included: Modification time: 24-FEB-08
          SPFILE db_unique_name: DJ11
          Control File Included: Ckp SCN: 3061672       Ckp time: 24-FEB-08

3   Note that documentation wrongly adds the word “TO” to the statement.
Paper #367                                                                                      Page 18 / 20
                                                                                      Database Administration

     BS Key Type LV Size        Device Type Elapsed Time Completion Time
     ------- ---- -- ---------- ----------- ------------ ---------------
     2       Full    1.70G      DISK        00:06:57     24-FEB-08
             BP Key: 2   Status: AVAILABLE Compressed: NO Tag: TAG20080224T111007
             Piece Name:
       List of Datafiles in backup set 2
       File LV Type Ckp SCN    Ckp Time Name
       ---- -- ---- ---------- --------- ----
       1       Full 3062388    24-FEB-08
     . . .

If the compatibility was not set correctly, or the compression algorithm was not specified correctly one of the RMAN
compression-related errors might be returned:
RMAN-06806: compression algorithm not supported
        Cause:    An unsupported compression algorithm was specified for the backup.
        Action:   Query the V$RMAN_COMPRESSION_ALGORITHM view for the list of supported
                  compression algorithms. Specify a valid compression algorithm and retry the command.
RMAN-06807: compression algorithm cannot be used because database compatibility is less than string
        Cause:    The compression algorithm is not supported because the current compatibility level of the database
                  is too low.
        Action:   Query the V$RMAN_COMPRESSION_ALGORITHM view for the list of supported
                  compression algorithms. Specify a valid compression algorithm and retry the command.

Physical Data Guard is Oracle solution for Disaster Recovery. A standby database, which is an exact copy of a live
database is kept at a distant location, and if the live database becomes subject to a disaster, the standby database can
assume its role. The synchronization between the live and the standby database is archived through redo logs or
archived redo logs. The logs are shipped from the live database location to the standby location and applied to the
standby database, which is operating in a special, so called managed recovery, mode. There is a possibility that logs
do not get shipped from the live to the standby location. This is most often caused by the network problems. The
standby database will then start lagging more and more behind the live database. This situation is called a redo log
gap. After the underlying problem that disallowed redo log shipment has been resolved, all the logs that were
generated in the meantime, need to be copied to the standby location and applied to the standby database. This is
called gap resolution. Depending on the duration of the problem situation, the number of archived logs that need to
be transferred can become significant, and so the shipment of those logs can take extended time. At the same time it
is critical that the gap resolution is done in a shortest possible time, so that the live and standby database again get
into sync. In order to speed up the gap resolution, in 11g, Oracle has introduced an option that archived logs can be
compressed during the gap resolution. This option can be configured by specifying the compress clause in the
ARCHIVED_LOG_DEST_n initialization parameter, as shown in the following setting:
Note again that this setting applies only while the Data Guard gap resolution is in effect, and not for normal archived
log generation. The compression setting for the archived log destination can be seen through the view

Paper #367                                                                                         Page 19 / 20
                                                                                   Database Administration

Oracle has started implementing different compression techniques started with version 9: table compression for batch
loads, index compression. In 10g they added compression of metadata in then new export utility DataPump. Finally,
in version 11g, they further enhanced the compression support, and packaged it separately licensed option: Advanced
Compression Option.
Most of the 11g added features are part of the Oracle Advanced Compression option, which is separately licensed,
and which contains the following features:
    SecureFile Compression and Deduplication
    Data Pump Compression
    Fast RMAN Compression (using the ZLIB algorithm)
    Data Guard Network Compression


[DSF_LOB] Oracle® Database SecureFiles and Large Objects Developer's Guide, 11g Release 1 (11.1), Part
          Number B28393-02
[DSQL]    Oracle® Database SQL Language Reference, 11g Release 1 (11.1), Part Number B28286-01
[FOOTE]   Richard Foote‟s Blog, A series of Blogs on index compression, and indexes in general,

[GORMAN] Tim Gorman, Quick Tips #3: Table Compression Do‟s and Don‟ts, IOUG-A Live 2003
[HODAK]   Bill Hodak, Advanced Compression in Oracle 11g, Oracle Open World November 2007
[IXORA]   Ixora News – October 2000,
[KYTE]    Thomas Kyte, Expert Oracle Database Architecture, 9i and 10g Programming Techniques and
          Solutions, Apress 2005
[MISHRA]  Sanjay Mishra, Compressing Data for Space and Speed, Oracle Magazine, March/April 2004.
[OWP01]   Oracle Advanced Compression, An Oracle White Paper, June 2007
[OWP02]   Table Compression in Oracle Database 10g R2, An Oracle White Paper, May 2005
[OWP03]   Table Compression in Oracle9i Release 2, An Oracle White Paper, May 2002
[OWP04]   Table Compression in Oracle9i Release 2: A Performance Analysis, An Oracle White Paper, May 2002
[POESS]   Meikel Poess, Dmitry Potapov, Data Compression in Oracle, Proceedings of the 29th VLDB
          Conference, Berlin, Germany
[WIKI]    Wikipedia, Different entries referred to in the footnotes,

Paper #367                                                                                     Page 20 / 20

Shared By: