Schema Design by vivek1982s

VIEWS: 406 PAGES: 75

									Schema Design - Constraints It is time to visit the business rules for each attribute, documenting what you find in column comments, and enforcing it by constraints. However on tables that are a part of data-heavy loads or transformations, or performance-sensitive queries, you should evaluate each and every constraint to make absolutely sure it is required and defined correctly. NOT NULL Use NOT NULL judiciously. Although NOT NULL check constraint enforcement requires miniscule overhead, there is no need to put them on every column in your model unless the results of business analysis and requirements gathering demand it. Add a NOT NULL constraint to any columns that will be used by predicates in performance-intensive queries. Although a nullable column requires less space to index, when the CBO has to compare a column to NULL, it frequently forces a full scan of the table. This is a disaster on huge table query performance. If the raw data loaded into that column is actually empty, replace it with something non-empty but that can still be indexed and treated as NULL. I frequently use '~', 'U', -1 and 'Unknown' as m y NULL replacements. Tip: If your development, delivery and change environment require the databases to be compared frequently and kept in-sync, consider naming each and every NOT NULL constraint (see example below). If you do not explicitly name them, Oracle names them for you starting with "SYS_". The system-named constraints will never match across systems and always show up on difference reports. Unique There is never a good reason, even if performance is a concern, to skip or remove a uniqueness constraint. If there is a natural column or columns in the table that describe a unique piece of data that should never be repeated, ensure you enforce that rule with a unique constraint. The consequence can and probably will be duplicate rows that skew results, mislead users, ruin reports and crash or hurt applications. Referential It is job #1 of every data modeler, architect and DBA to protect the integrity of the data. One of the primary methods of accomplishing this is by defining and enforcing relationships through referential constraints, aka. foreign keys. In the performance-sensitive portions of your systems, pay special attention to the child columns in foreign keys. They must be indexed either independently, or as the leading column in a composite index, to prevent response time problems during parent key DML validation and updates.


This query can help you find possible missing FK indexes: SELECT DECODE(b.table_name, NULL, 'MisngIdx?', 'OK') status, a.table_name, a.columns, b.columns FROM (SELECT table_name, constraint_name, LTRIM(SYS_CONNECT_BY_PATH(column_name, ','), ',') columns FROM (SELECT ucc.table_name, ucc.constraint_name, column_name, ROW_NUMBER() OVER(PARTITION BY ucc.constraint_name ORDER BY position) rn, COUNT(*) OVER(PARTITION BY ucc.constraint_name) cnt FROM user_cons_columns ucc, user_constraints uc WHERE ucc.constraint_name = uc.constraint_name AND uc.constraint_type = 'R') WHERE rn = cnt START WITH rn = 1 CONNECT BY PRIOR constraint_name = constraint_name AND PRIOR rn = rn - 1) a, (SELECT table_name, index_name, LTRIM(SYS_CONNECT_BY_PATH(column_name, ','), ',') columns FROM (SELECT table_name, index_name, column_name, ROW_NUMBER() OVER(PARTITION BY index_name ORDER BY column_position) rn, COUNT(*) OVER(PARTITION BY index_name) cnt FROM user_ind_columns) WHERE rn = cnt START WITH rn = 1 CONNECT BY PRIOR index_name = index_name AND PRIOR rn = rn - 1 ORDER BY table_name, index_name) b WHERE a.table_name = b.table_name(+) AND b.columns(+) LIKE a.columns || '%'; There are very rare exceptions in tables used primarily for loading massive quantities of data, where having foreign key validation, although miniscule per row, can still cost too much time. If the data can be cleaned and validated before loading or during transformation, humans aren't allowed to enter data into the table and the columns


won't be used as the primary method of filtering a predicate, you might choose to skip the referential constraint and accompanying index. Check Check constraints are another great, built-in method of protecting your data and enforcing business rules. Create a data model checklist that you fill out or use as a guide for each new table created, and ensure that it includes an item to "Carefully analyze whether each column requires a check constraint to protect it". Unlike datatypes, check constraints aren't required in order to create a table. Having a checklist to introduce a little consistency and rigor has helped me a lot. Like the advice with foreign keys, on certain tables dedicated to receiving massive amounts of data from an automated loading system, if humans aren't included in the process and you can be reasonably certain about the cleanliness of the data, it may improve performance to leave off check constraints. In the DDL example below, the check constraint for the two-value flag column is probably unnecessary if only one internal system feeds this table. But in a table with data that a human can enter and modify, you would want to have as many client-side and server-side protections as possible to keep the data in compliance with business rules. You might also make the judgment call on rare occasions to reduce the number of moving parts in your system. In general, it is always a good idea to create a new "type" or "lookup" parent table for each new set of unique codes you find during analysis. However, if a particular codeset will only be used on one table, it only contains a handful of valid values and there is no need to store additional attributes about each code because the codes are self-explanatory, you might use a simple check constraint. This would be in place of a full-blown parent table and foreign key, to monitor the column for the short list of valid values. You may find this improves performance as well as reducing system complexity and storage space used for the child column indexes. This is an example of what carefully crafted DDL looks like (minus table and column comments and storage clauses for the table and indexes): CREATE TABLE temp ( temp_id INTEGER CONSTRAINT temp_id_nn NOT NULL ,temp_cd VARCHAR2(10) CONSTRAINT temp_cd_nn NOT NULL ,temp_descr VARCHAR2(500) ,active_yn VARCHAR2(1) DEFAULT 'Y' CONSTRAINT temp_active_yn_nn NOT NULL ,mod_by VARCHAR2(60) DEFAULT NVL( SYS_CONTEXT('userenv','client_identifier'), SYS_CONTEXT('userenv','session_user') ) ,mod_ts TIMESTAMP(4) DEFAULT SYSTIMESTAMP ,CONSTRAINT temp_pk PRIMARY KEY (temp_id)


,CONSTRAINT temp_uk UNIQUE (temp_cd) ,CONSTRAINT temp_active_yn_chk CHECK (active_yn IN ('Y','N')) ); If your shop's DDL tends to look like a simple list of columns and datatypes, this is an indication that the modeling phase is proceeding too quickly. This will introduce errors and performance issues and put your data integrity in jeopardy. Cascading Constraints Since one of the primary principles of performance tuning is to visit each piece of data as few times as possible, for certain tables that participate in delete transactions, you can create ON DELETE CASCADE constraints that will automatically traverse the relationships and delete the data from the child tables when the parent data is removed.

Triggers The explanation on using triggers as a part of designing for performance will be short: don't use them. Triggers have their place, but in a resource-constrained system, the context switch between the SQL and PL/SQL engines, as well as the mandatory detour from bulk/array to row-by-row operations make triggers deadly to performance goals. If you are used to using triggers to ensure the primary key is filled in transparently from a sequence, fill it in explicitly. If you are used to using triggers to monitor complex business rules, it will have to be done in another way, in bulk SQL operations. Just as the routines in your code should do one thing and one thing well, the automated drivers and processes in your system should have one job. Instead of trying to do loading, validation and transformation in one operation, design your application to resemble the assembly line, with each process performing its one job efficiently, putting its output in a bin for the next process, then returning to wait for more input. Such a design requires more tables (incoming and outgoing "bins" in the assembly line), code and thought, but in the end will be much more scalable and robust. These are the sorts of differences that accompany designing for a very large system. One can no longer mindlessly created textbook-like tables and slap a few indexes and triggers on them. Everything must be approached early on from the perspective of performance goals, which greatly alters how things are normally done.

One Sequence per Artificial Key Create one sequence for each artificial key. Name it after the table or the column that it supports, suffixed with "_SEQ". The idea of a single sequence to feed all your artificial keys is dangerous. This creates a point of lock contention that will cripple your future


scalability. Worse, once a system has started down the single-sequence path, it is prohibitively difficult to migrate data and fix the problem later on, so difficult that companies tend to decide to just live with the mistake and hope they survive long enough for the system to be rewritten. Some argue that one sequence per table is too many extra moving parts and increases the difficulty of managing the system. This is misguided. Sequences are the most reliable part of Oracle and rarely require much attention if named and used consistently. Adjust the Cache Based on your expected incoming data volume, adjust the sequence cache accordingly. In systems that receive thousands of rows per minute, the default cache of 20 is far too low. Managers and accountants worried about holes and wasted numbers shouldn't be. In 99% of the cases, there is no valid business reason to ensure there are never holes in the sequence-generated numbers. If there really is a valid reason, then it will take either a ponderously slow, disk-only, NOCACHE sequence; or some complex PL/SQL and home-grown code to write your own monotonically-increasing, gapless sequence generator. For everyone else, adjust the cache to 500, 1000, whatever matches your insert volume, and don't worry about the occasional loss of numbers on database restart. Don't Hide the Sequence For best performance, call upon sequences directly in your insert statements and PL/SQL. Never fill them by trigger or by dynamic SQL in a misguided attempt to abstract real sequence names from the code. The recursive SQL required to get a sequence's NEXTVAL is bad enough on performance. Do not exacerbate the problem by getting a PL/SQL function and dynamic SQL involved as well. Recall that dynamic SQL requires significant overhead to perform parsing. Also recall that a synonym is already an abstraction of a real object; there is no need to layer another abstraction on the top of a sequence. Schema Design - Indexes Adding an index is a quick and easy way of improving response time by several orders of magnitude. But it is not painless. An index comes at a cost of increased disk space and slower DML execution. When dealing with terabyte-class tables, a couple additional indexes that slow loading and backups, and double the amount of disk will need to be well researched and solidly justified. This is very important to remember if you are designing tables meant for high throughput; usually the tables receiving all the incoming data and transactions from users and external systems. Keep indexes on these tables to a bare minimum to support unique constraints, foreign keys, and the occasional critical query. As an example, using this table:


CREATE TABLE load_test( ld_id NUMBER, ld_dt TIMESTAMP, ld_msg VARCHAR2(30), ld_flg VARCHAR2(1) ) PCTFREE 1 PCTUSED 99 / Averaged over 3 runs using (WinXP, 2GB RAM, 8K block size), here are the completion times when inserting rows in a fashion that mimics a high-volume OLTP web app vs. a high-volume ETL database: Condition Bare table single inserts 100K 3.9 s single bulk inserts bulk inserts 1M 100K inserts 1M 55.4 s 52.6 s 59.4 s 92.9 s 151.1 s 197.0 s 252.2 s 690.7 s 0.14 s 0.11 s 0.2 s 2.6 s 1.4 s 4.6 s 3.5 s 7.0 s 5.4 s 3.7 s 5.0 s 14.1 s 40.8 s 72.0 s 78.7 s 69.5 s

Added NOT NULL constraint to 4.0 s ID Above + added simple Check Constraint to ld_flg Above + added PK to ID (PK is constraint and index) Above + added index to ld_dt Above + added index to ld_msg Above + added index to ld_flg Above + replaced ld_flg index with bitmap index 5.0 s 7.3 s 11.2 s 13.2 s 17.2 s 50.4 s

As you can see the NOT NULL constraint adds overhead of no concern. The check constraint adds a little overhead ,but not much. And the addition of indexes is where the performance of OLTP-oriented DML gets worse by 1X for every index. Bulk DML displays different behavior. Bulk inserts take much less time that any row-by-row insert (10X faster on the bare table), but show much worse degradation as indexes are added (5X worse for 1 index to 40X worse for 4 indexes). So the lesson to be learned is that indexes add quite a bit of overhead to DML operations, even if you have the luxury of using bulk loading and update operations. Keep this in mind when you are led to the possibility of a new index as the solution to your problems. If the table isn't meant for high throughput, but needs to support many and varied query activity, a handful of indexes may be just the thing to preempt or solve performance problems. If you do choose to add an index, there are many choices


available, each with certain advantages and restrictions. Study the Oracle documentation to get to know indexing strategies intimately. Below are a few general tips and lessons learned that might not have been obvious. Composite When dealing with large tables that serve query-heavy applications, it is a good idea to group the most frequently requested and searched columns together in a composite nonunique index. Of course this assumes that your business analysis and requirements gathering have pinpointed the sorts of queries, views and reports that will make use of this data. If such analysis is available, create the new index, gather statistics, and then monitor its use and results. In a system where the uses of the data cannot be anticipated up front, it will be imperative for you to make educated guesses about the required indexes before unleashing the system on end users. Use one of the various Oracle features to monitor the most resource-intensive queries, and adjust your indexes accordingly. Function-based One of the classic ways for a database developer or report writer to ruin a query has been to wrap one of the columns in a function, like TRUNC or UPPER. If you find yourself in a similar situation, with the function wrapper forcing a full scan of the table's blocks, consider using a function-based index (FBI). This is essentially a materialized version of that column with the function pre-added and computed. Some refer to this as a virtual column. This feature has been overshadowed by the new virtual column feature in 11g, which accomplishes the same goal. On 9i though, the use of an FBI has certain limitations and known bugs, most of which are fixed by version For example, the use of a logical standby database does not support FBIs until version 10g. Bitmap If you are designing or improving an OLTP system, using regular DML to add and change data in the tables; do not consider tinkering with bitmap indexes. Bitmap indexes are terrible for OLTP systems with incremental data changes, but wonderful for data warehouses and similar reporting-centric databases where new data is accompanied by partition exchanging or complete index rebuilds. If you are designing a warehousing/reporting-type system, where long, slow queries are running against primarily read-only tables, bitmap indexes may be the right choice for many of your low-cardinality columns used heavily in predicates. As an added bonus, the CBO can combine these single-column bitmaps at runtime to satisfy certain multicolumn predicate, eliminating the need for extra query-specific composite indexes. Index Tips


Do your indexes need the default of 10% block freespace? Create them with less freespace if not. Create your indexes in a separate tablespace from your data. With modern storage architectures, many of the arguments for isolating and aligning discs and tablespaces are moot, but it is still a good idea to keep your indexes and tables separated for administrative, maintenance and certain concurrent access performance aspects. Create separate tablespaces for really large tables and indexes, even partitions of the same, opening up backup, restore and administrative options not possible if all objects are kept in one tablespace. When maintaining massive tables, it is usually quicker to re-create or rebuild indexes, than to let Oracle handle the changes on an existing index transparently. You'll need to test both options using expected data volume to determine which one is the best option. There are special features, like parallel DDL, NOLOGGING and the DBMS_PCLXUTIL package that help you re-create and rebuild indexes as fast as possible. Composite indexes and natural key indexes are composed of multiple columns. When the values in these columns repeat, differentiated only by the last or last few values in the index, you will benefit from using key compression. Even if you do not use key compression, you can take advantage of the newer 10g and 11g index, table and tablespace compression. The benefit it seems to provide is to conserve a great deal of disk space without much cost in additional object access time. Composite or natural key indexes which encompass almost every column in a table indicate performance would benefit if the table were converted to an index-organized table. I create all my reference (parent/lookup) tables as IOTs. This has improves response times. If you do use IOTs, consider creating the IOT tables in the index tablespace since the table's data is kept in the underlying index's segments anyway. Dropping uniqueness constraints usually drops the underlying index as well. This can be undesirable on massive tables where an index rebuild could take hours or days. If you perform certain maintenance operations where the constraint needs to be dropped temporarily, you can avoid the cost of rebuilding the index by using the KEEP INDEX clause of the ALTER TABLE...DROP CONSTRAINT syntax. This is seldom used, but can save a tremendous amount of time. One can work with Oracle for years before finding out that certain existing indexes, if they meet the criteria, can serve as the index for a new uniqueness constraint or foreign key. The columns of the new constraint just have to be found as the leading columns in an existing index. If that condition is met, you can use the USING INDEX clause of the CREATE INDEX syntax to refer to the existing index, rather than creating a brand new one. This too can save a tremendous amount of time and storage space in massive database environments.

Schema Design - Physical Structures and Storage


As you design the physical structures that will hold large data; storage hardware, media and attributes should be carefully considered to ensure:
    

Data can be written and read as fast as required. Block space is used efficiently. Freelists and transaction slots in block headers are sufficient for expected transactions and concurrent user load. Future archival and purging plans can be accommodated with minimal effort. Bytes are organized effectively to match the data and its expected usage patterns.

Storing Data With modern file and storage systems, the range of options to physically and logically store and organize data is much larger than in the past. The hardware you possess or can budget for plays a critical role in storage decisions, but the variety and combinations of solutions available today is virtually endless. The focus here will be on the kinds of questions that should be answered during business and data analysis phases which drive how the data is to be stored, and ultimately how the tables and indexes are designed. Luckily with the advent of locally managed tablespaces and ASSM, most of the storage attributes that use to be fret about are deprecated or moot; leaving PCTFREE, PCTUSED, COMPRESS, CACHE and TABLESPACE as the relevant attributes still requiring input. Consider the purpose of and demands upon each table and index. The answers should help in the decisions that need to be made about the organization and storage attributes of each table and index. Will the table be used mainly for receiving incoming data, or querying? Tables used primarily for data loads will see little to no DML activity after the initial insert. This suggests a heap table with very few indexes and constraints to optimize insert operations. The PCTFREE can be set low and the PCTUSED high. Tables in OLTP or operational databases used primarily for queries and reporting suggest a heap or IOT table, with sufficient constraints and indexes to maintain clean, responsive data. Tables in large DW databases are usually built as IOT or heap fact tables, with accompanying IOT dimension tables. Will Direct Path loading be used? If you will be using direct path loads (either through import, data pump, SQL*Loader or by use of the APPEND hint), there are a number of restrictions that affect how the table will be designed. Direct path loading requires more extra disk and temp space than conventional loads since data is added after the high water mark. With direct path, it is better to have no indexes at all loading into a bare loading/staging table. If you must have indexes on your loading table, know that index maintenance is reserved until after the load is done, which on millions of rows could take an unacceptable amount of time. There are various features available to build indexes quickly, using parallel DDL, and


NOLOGGING. Test first. But depending on your volume of data, it may be faster to drop the indexes before the load and recreate them after it is done. Also remember that a direct path load requires an exclusive lock on the whole table. This is another reason to do direct path loads into an interim staging table, one not required by end users, which will be read by a later data transformation and validation process responsible for moving clean data into the permanent tables. Will the data be updateable and visited frequently by end users, or rarely? If the table will see significant DML activity, set the PCTFREE and PCTUSED to a reasonable level to accommodate the growing and shrinking blocks. You might start with tests using 20/50 or 30/70 to determine if space is efficiently used, row chaining does not occur and response time is still adequate. Will the table be fairly static after the initial load, or changing constantly? If the table is compact and static (typical of code/lookup tables), consider creating it as an IOT with no columns in overflow Another option is as an ordered cluster. If the table is static, but not necessarily compact, spend time testing various PCTTHRESHOLD and INCLUDING settings. This is done to keep rarely used columns in the overflow area. Also use key compression if the primary key is a composite key. If the table is not static, but changes frequently, a heap table is probably the best choice. Will the table be one of the system's hot points, or only referred to rarely? If the table will be one of the most frequently-visited areas of your database, ensure that access to it is as fast as possible. For heap and cluster tables, use the CACHE clause. For IOT tables, caching behavior is already inherent. If money is no object, consider keeping ultra-hot data in RAM disks, or in-memory databases like TimesTen from Oracle. How long must the data be kept online and immediately useable? If the table is large to massive, any requirements around history, archival, backups, nearline/offline access, and purging tend to suggest the use of partitioning and transportable tablespaces. In particular, proper partitioning can yield maintenance operations (like adding or deleting months of data) that take seconds, instead of days to complete. Understand the level of granularity the data must be managed by: hourly, daily, monthly, yearly? Partition accordingly. With 11g's addition of automatically named and extended partitions, the excuse of not wanting to manage so many partitions has been eliminated. If data volume is massive, it is a good idea to have a separate tablespace per partition as well; easing administration, backup, transportation, purging and reintroduction of entire partitions. Remember that if you are using transportable tablespaces, the data structures and constraints need to be designed to be self-contained in the same tablespace. If this is not


possible due to referential constraints to data in other tablespaces, you may need to add a denormalized version of the table to your design and capacity plans so the data can pass TTS tests and export successfully. Are there any columns in the table that are unstructured and rarely changed or queried? Certain columns that are rarely queried or updated, like large LOB data can be kept "out of line" by explicitly specifying a separate tablespace and testing settings for PCTVERSION, RETENTION and DISABLE STORAGE IN ROW. Will the table be the subject of heavy contention, with many concurrent users needing access to similar blocks? Generally the default settings for INITRANS, MAXTRANS, FREELISTS and FREELIST GROUPS are sufficient and need no alteration, especially with 10g and ASSM which deprecates MAXTRANS and ignores most of the typical STORAGE clause. But if you can accept the additional space required by more interested transaction slots (ITLs) in each block header, and you anticipate many transactions working on the same rows concurrently, consider setting INITRANS a little higher to avoid the minor overhead of ITL allocation from the block's free space at runtime. Start with 2 or 3 and check results. I wouldn't recommend going any higher than the average expected number of concurrent transactions for a given block (which is fairly difficult to guess until the system is in production, at which time it is usually too late to rebuild the large table). The extra space per block for additional ITLs is usually too costly to stomach for most shops, so Oracle recommends that in general, you just use the default INITRANS setting. If you are not using ASSM, and your table will see significant concurrent DML activity, set FREELISTS and FREELIST GROUPS a little higher to accommodate the expected average load (the default of 1 is generally insufficient for busy tables). Start with 2 for both attributes and increase only if buffer busy wait events do not improve measurably. Is storage space at a premium? Despite the low cost of disk nowadays, shops will often continue to have all available storage space eaten up by backups, denormalizations, mirrors, extra environments, etc.. If you are or will be in a similar budgetary/logistical pinch, design your segments to purposefully conserve space.

  

Don't allow the default PCTFREE/PCTUSED of 10/40 to waste needless space on tables that are rarely updated. Consider using all or most of the blocks with a high PCTUSED setting. Use table and index compression. With IOT structures use key compression (doing an ANALYZE with index validation can even tell you the optimal level to use). Go sparingly with secondary indexes on IOT tables since they often take more space than the entire IOT they support.


 

Use partitioning effectively so old partitions can be quickly archived and purged freeing more space. Re-examine any tables that have an INITRANS set higher than 1.

Ensure compact surrogate keys are being used and data is normalized as far as is reasonable to prevent redundancy and unnecessarily lengthy columns. Schema Design - Partitioning Although it is an extra cost option, for any firm dealing with large to massive quantities of data (currently large is between 100GB and 500GB, massive is several TB or more), partitioning is a necessity. Many make the mistake of thinking partitioning is meant for improving query performance. Partitioning can reduce the number of blocks scanned for certain queries, but the primary reason for its existence is to simplify and render data maintenance operations as blazing fast. In any database, the data growth trend always tends to be logarithmic instead of linear. All disk space is frequently eaten up months or years ahead of schedule. The need to archive and purge old business data or irrelevant data (like ancient logging and auditing rows) quickly becomes critical to free up disk space and reduce the quantity of blocks visited by full scan operations. Without partitioning, deleting a month or quarter off the tail end of a heavily used, heavily indexed permanent table with hundreds of millions of rows could take days and will probably not complete successfully unless designed well. With partitioning, maintenance operations, like removing the last few months, or adding the most recent month, take a mere second or two. The benefit of increased uptime alone is usually more than worth the cost of the partitioning option. But partitioning is also a great boon to those systems that need parallel DML or DDL, and can speed up certain intensive queries. Much has been written about partitioning. Click here to learn more. Here we'll try to touch on just the points that affect how you design your schema's tablespaces and data segments. Dedicated Tablespace per Partition As mentioned above, one of the primary purposes for partitioning is data maintenance activities, in particular removing partitions from the back end of a table, and adding newly loaded partitions to the front end. In both cases, when dealing with really large quantities of data per partition, it may improve manageability to keep each partition in its own dedicated tablespace. If this is overkill for your shop, at least create a separate tablespace per partitioned table. If you end up needing to use Oracle's Transportable Tablespaces feature (TTS), dedicated tablespaces means little to no further effort need be expended to take advantage of TTS. With dedicated tablespaces per partition, reporting and data integrity can be further protected by rendering older segments as read-only (with 11g, this is even easier). Partition Naming


If you are using list partitioning and the data is fairly static, it is a good idea to name each partition in some manner to represent the data contained in each partition. For example, if you are partitioning by sales data for each state in the United States, each partition name could be the two letter state code since it is unlikely states will be removed or added to the Union anytime soon. But if your partitioned lists are likely to change, then a meaningful name can get in the way since it will now be misleading unless it is renamed. In this case name the list partitions more generically, like PART1, PART2, etc. If you are using explicit hash partitioning or range partitioning, name the partitions for the scheme you have chosen to indicate what is contained in each partition. Remember that Oracle object names cannot begin with a number. I recommend beginning each partition name with "P" and each subpartition name with "SP". Follow that by the numeric or data scheme used. If you are partitioning, say by facility ID, the partitions could be named P0000TO1000 and P1001TO2000, for example, or simply by the endpoints like P1000, P2000, etc. If you are partitioning by year, P2007, P2008, etc. If by month P200702, P200703, etc. If by day P20090101, P20090102, etc. When you use a naming scheme that is easily filled by an existing Oracle format modifier, it makes partition addition easier and automatable. With 11g, this is trivial using the new interval partitioning feature. Other Design Considerations Although each tablespace can have its own blocksize, and each partition can have its own storage attributes, it is recommended that each partition and subpartition inherit from the parent table's definition or from a partitioning template. To have unique storage definitions (other than dedicated tablespaces named after the partition) makes a partitioned table very difficult and time-consuming to manage. Schema Design - Advanced Queuing & Scheduling You have a list of items you need from the grocer. So you go to the store, wander up and down the aisles looking for the products you need, and finally check out an hour after leaving home. Imagine instead that a new grocery store has opened its doors that pulls the products from the shelves for you. You email your list ahead of time and since the grocer is more efficient and familiar with his products than you are, he/she usually has the order ready by the time you get to the store. If there is a large backlog of customers, it's not a problem if the order takes a little longer, because you can still use the extra time to visit a spa, or your favorite music store. In this case, the grocer gives you a quick call when your order is complete. You pick up the sacks and go home happy. In the first scenario, you had to personally shepherd the grocery transaction from beginning to end, binding you so that nothing else could get done while you were shopping. This is typical of synchronous computer processes. The second scenario left you free to engage in other activities while your message to the grocer kicked off a flurry


of activity on his end. You really didn't have to know where everything was at, or go searching for it, or worry how the grocer would get it done. The grocer just lets you know when your order is complete and you take it from there. You are able to get more done while your order is being filled, and it seems to you like grocery shopping is now pain-free and fast. The grocer is able to specialize in order-fulfillment (modularity), achieve huge efficiencies (performance) and get more orders out the door (scalability) than if the customers were allowed to do their own shopping. This is the advantage of asynchronous processing and demonstrates how the decoupled nature of messageoriented systems work. So one way of improving the performance of database-centric applications, or at least perceived performance, is to decouple pieces of your system that can do independent work. In Oracle, you can do this the hard way, designing your own task tables, and PL/SQL code to manage it all. Or you can do it the easy way; taking advantage of the built-in, robust Advanced Queuing and Scheduling features. We recommend that you learn more about Advanced Queuing, DBMS_AQ, DBMS_JOB and new Scheduler interface to become familiar with how easy it is to use them. Knowing when to use Scheduling or Queuing, while designing a database for performance, is what will be briefly discussed here. Scheduling Prior to 10g, task scheduling was accomplished using basic OS-level schedulers like AT on Windows and cron on *nix systems, expensive commercial software, or using the DBMS_JOB interface in Oracle to automate independent processes. Most shops that needed to automate a lot of things found AT, cron and DBMS_JOB too basic and limited, frequently constructing their own task/scheduling subsystem to manage task/job parameters, dependency chains, logging/debugging, and status notification. Most of this weakness was addressed with the introduction of the "Scheduler" in 10g, which is meant to replace the old DBMS_JOB interface. When to use a job is fairly straightforward: when something needs to run on a periodic basis, and can run without human intervention. However, less well known is that using a job scheduler can take advantage of the full power of your database host's CPU and memory resources. By breaking up certain tasks into chunks and submitting each chunk as a separate job, an extra degree of parallelism can be obtained not possible through a typical single-session PL/SQL process. You can also submit multiple jobs to mimic full load/stress/concurrency testing. If you have a data loading process that seems likes it is drinking from a tsunami when it connects to the data source, consider breaking the incoming data into independent channels and jobs if your machine has the horsepower. This is done for you implicitly when you use the parallel DML or DDL engine, and execute certain routines like the DBMS_STATS.gather_*_stats procedure which have a degree of parallelism parameter. Tip: If you decide to take advantage of all your machine's CPUs with concurrent jobs, understand that it is not very scalable since new sessions will need to wait until one of


the jobs completes. Any parallel operations that max out your machines' resources need to be reserved for hosts with little load, or reserved for operation during off hours. Scheduling affects schema design in that when using scheduled jobs to divide and conquer the work, it is best that the table(s) being operated on are divided in a manner that matches the division of work. This generally means a design solution using partitioning. Work on simple tables can still be logically divided by primary key or date range, but there is risk of contention for shared blocks. So if you have the partitioning option, use it and partition your tables along the lines in which work like data extracts or loads are divided. You may also wish to create a job log table so that each job can report on its status and details about work completed and errors handled. If you are using 10g or higher, it is recommended that you use the DBMS_SCHEDULER interface instead of DBMS_JOB. It is many times richer, more flexible and robust that DBMS_JOB. For example, instead of two data dictionary views to give you metadata on jobs and their execution, there are twenty. And the ability to chain jobs, share schedules and programs, expire jobs, group related jobs, prioritize and allocate resources, and start jobs based on external events (through Advanced Queuing) are all fantastic. Queuing If you encounter a business process that you anticipate needing to scale very well, besides load balanced servers and other measures to reduce network traffic and resource consumption; consider introducing message-oriented middleware (MOM). Oracle Advanced Queuing (AQ), officially part of the Oracle Streams suite of features as of 10g, is a very well designed, flexible and reliable implementation of MOM. Where the business transactions touch the database and need to scale, you can set up message queues. External or internal subsystems (to the database) produce messages or events that signal waiting or interested database processes that there is work to be done, sometimes passing a structured or unstructured payload of parameters or data. Each message is processed once by the interested process. Oracle AQ has a rich set of features that guarantee the messages will be delivered. The producer of the message or event does not need to know anything about the consumer, and can go back to doing other work once the message is put on the queue. Using broadcast, direct recipient notification, subscriber notification or simple dequeuing; consumers become aware of new messages. Like the grocer, they can focus on doing one thing and one thing well, performing their intended task as fast as possible, then going back to awaiting new messages on the queue. All this push/pull message-passing is accomplished through queues, which are implemented as a set of queue tables and scheduled jobs. This is where using AQ for performance affects schema design. The queue tables can be created in the main application schema, or a separate schema dedicated to the queues. If you create them in the application schema, since they are permanent tables, they will show up in reverse-


engineered DDL and models. This will happen even though Oracle is entirely responsible for their design and contents. You may want to filter them out of your model when reverse engineering your databases or exporting entire schemas to other environments. Tip: Queue tables can be dedicated to single-consumer queues or multi-consumer queues. Consider prefixing single-consumer queues with S_, SC_ or SCQ_; multiconsumer prefix with M_, MC_, or MCQ_. Another feature of Oracle AQ that can improve performance is instance affinity on RAC systems. If you are using RAC, you have a number of database hosts that can participate in the processing of work, but might not be fully utilized. With instance affinity, you can set up certain queues to prefer using a given instance in the cluster, thereby explicitly spreading work across the nodes, much like concurrent jobs mentioned in the Scheduling section above, to take advantage of all the computing resources available. Data Design - Datatypes Performance and memory utilization can be improved by using sane and appropriate data types and data lengths to contain and constraint your company's data. Using the right data length is a simple, no-overhead way of constraining your data to ensure that values of unexpected lengths are rejected, rather than polluting your database. As a side benefit, less memory and resources are required to process the slimmer, trimmer data. When strings and numbers are constrained by reasonable length declarations, there are only three reasons for getting a length-related error during DML operations: either the business rules were not accurate to begin with, the business rules have changed, or junky data is being fed into your system. In any case, having the purposeful length constraint complaint is better than silently accepting the data that breaks the rules. Strings If your columns' are generically typed as VARCHAR2 or CLOB, as the least significant consequence, your database will end up performing frequent implicit conversion on those columns which store numerical data. Implicit means unplanned computing resources were used. In a system where every millisecond counts, systems which handle millions and billions of rows, you don't want to force Oracle to implicitly convert even one column in every one of those rows. If it is a number, make it a NUMBER. If it is alphanumeric, make it a VARCHAR2. Resist the temptation to make everything a VARCHAR2 just because it can store digits as well. This principle applies to XML as well. Just because XML is flexible doesn't mean you should use it. In a large, performance-intensive system, just the extra byte and parsing overhead alone should be enough to get you fired. If you anticipate frequent changes to the attributes of your entities (typical in systems where requirements aren't fully discoverable up front, or where constant change like market forces and government regulation force periodic redesign) plan and budget for the resources required to


refactor your data model and interfaces when needed. Do not try to save money by burying your entire data model in a wide-open XML-based column. For example, let's say there is a core table that stores all metrics kept about the flow of energy through the components of your regional electricity grid, but the metrics change all the time due to regulation, business and technology changes. The hardware sends about 2 billion rows of data per month in a table with very wide rows to accommodate all the possible metrics. The table has 200 columns, with an average 300 bytes of data per row. In a well designed model, each month of that table will occupy at least half a terabyte of space, to say nothing of block and index overhead. If you instead reduced the variable attributes to a single CLOB column that stored XML, you might have reduced the number of columns and made it easier to change/add metrics, but just the extra characters required for the XML element tags will occupy an additional 2.7 terabytes each month kept online. What if, you had to remove one of the attributes, or update a code across all historical rows? Dropping a column in 9i and up takes no time at all, but you will have to parse and update every row if it were kept in an XML CLOB. The operation could take days. Updating a given XML element would be much worse as each row would have to be visited, parsed and updated as many more bytes than necessary are processed. XML is for flexible communication between disparate systems, not for modeling and storing your data. A generic data model, usually implemented as a few tables with generic names or as CLOB columns filled with XML, will perform poorly, will be immensely difficult to write SQL to and maintain, and will be impossible to constraint and index sufficiently. Until quantum computing is a reality, one constant is that generic data models are a horrible idea. They only work in trivial databases and textbooks. By the same token, do not model your columns' data lengths too widely to save yourself the trouble of having to widen them in the future when business rules change. Doing so allows bad data into your tables and can rob performance of precious milliseconds as Oracle may have to allocate, read, and write more bytes than it had to. You can easily widen existing columns if the rules change. The change to the data dictionary takes less than a second. But shrinking them -- after you discover the wide column is hindering data integrity and performance -- takes an enormous amount of time as columns have to be copied, emptied, shrunk, then refilled from backup. That's three full passes on your massive table, each requiring massive UNDO and REDO overhead, possibly costing you or your customers days of downtime. Take the time to model your data properly up front. Ensure the model expresses the business rules as they are today, perhaps with a small nod toward future growth. Numbers If you are modeling monetary columns, obtain the business rules about how many digits of precision are required beyond the decimal point. In most cases NUMBER(9,2) is more than sufficient for monetary columns. If the business rules say the column will never have a price in it more than $1000.00, by all means, shorten the datatype to NUMBER(6,2) to reflect that rule. Although it does not prevent any values from $1000.01


to $9999.99, it does prevent randomly large numbers from being entered. Also, if the built-in protection of the length declaration is sufficient, you might skip the check constraint altogether, which would otherwise be added to enforce sane monetary values. Skipping unnecessary check constraints can save precious milliseconds in heavilyloaded systems. If you are modeling numeric identifiers, obtain the business rules and design your model to match them. If the identifier is currently 3 digits long and foreseen to never be longer than 6 digits, declare the datatype as NUMBER(6,0). If you are unsure how large the number could grow, but want to ensure only whole numbers are used, declare the column to be NUMBER(*,0) or INTEGER. Both declarations will ensure digits to the right of the decimal are not allowed. Dates and Timestamps If your business rules or modeling approach require a date and time stamp as part of the unique key, determine whether there is a possibility for sub-second accuracy. If it is possible to have multiple rows arriving or being stamped within the same second, you will need to use one of the new TIMESTAMP datatypes to achieve sub-second uniqueness. LONG and LONG RAW Oracle has been warning us for years to not use LONG and LONG RAW. I couldn't agree more. These data types are extremely limiting. If you find them in a legacy system, do what you can to push through an enhancement to convert them to CLOB and BLOB. Certainly don't use them in a new data design. RAW and BLOB If you have the need to store binary data (perhaps a PGP passphrase, unstructured payload obtained from a C pipe, etc.) and the amount of bytes is small, consider using RAW instead of BLOB. In most other circumstances BLOB is the right datatype to use when storing unstructured, binary content. Remember that RAW is limited to 2000 bytes, not 4000 as it is with its VARCHAR2 counterpart. As with VARCHAR2, be explicit about the expected data length, instead of opening it up as wide as possible "just in case." ROWID and UROWID In general if you find the need to stamp a column as ROWID or UROWID, there is a flaw in the implementation. It is not a good idea to store physical or logical rowids as they can both shift and disappear. Logical standby databases cannot support columns of this type either. In rare cases though, you may find that temporarily holding a set of ROWID values in a global temporary table is just the ticket to achieve performance goals. I've used this


technique in a few large systems where it helped us quickly update, log and handle changes to downstream reporting data when retroactively modifying a couple thousand rows found scattered among a few hundred million. Click here to learn best practices for using ROWID. Collections It is recommended that you not use collection columns unless you have a very specific need that cannot be solved in any other way. Many database drivers do not support them. They are rather difficult to pack and unpack except for advanced Oracle developers. As of 11g, logical standby databases cannot support them. So when possible, avoid collection columns. Data Design - Codes and Identifiers In the appropriate and advisable quest to remove most redundancy from the data model (normalization), you will be simplifying access to each piece of business data by assigning it a unique key. Most rows in tables that contain the business' static, reference data can be uniquely identified by a name, code or number that they already carry with them internally. For example it can be things like UPC number, social security number, country name, currency code, bank account number, phone number, email address, host name, contract ID, license plate "number", etc. In other cases, you will create a new code, name, abbreviation or number to uniquely identify the piece of business knowledge it represents. All of these types of identifiers carry meaning, and are sometimes based on naming or numbering schemes. These types of identifiers are known as intelligent or natural keys. Natural Key If you have to create a new name, code, or abbreviation (unless constrained by the business rules), make the column a little wider than you might think necessary. Names can be 30 to 100 chars long. Codes can be 10 to 20 chars long. Short codes -- best saved for reference tables with a handful of values -- can be 1 to 10 chars long. If you really need a shorter version of the code, probably to conserve screen or report space, add a "display" version of the name or code to the table, and a new unique constraint to it to prevent inadvertent duplication. Tip: Natural key columns are named after the attribute, of course, followed by the token representing the domain, like "_CD" for codes, "_NUM" for meaningful numbers, and "_NM" for names. Although it seems elegant and more simple to use the natural key as the primary key for a given reference table, avoid the instinct to do so. Common reasons heard for using natural keys as the primary keys have usually relied on the argument that queries are easier for developers and report writers "because you don't have to join" to the parent, followed by the reassurance that the "code will never change."


Wrong. Meaningful identifiers tend to change, some more than others. The underlying naming scheme changes, area codes switch, banks merge, countries topple, things die, leave, transfer or are retired. When change happens, and the natural key needs to change with it, on large systems, you could be looking at hours to days of downtime as historical data is corrected to reflect the new key value. You may also be faced with weeks of code refactoring and regression testing to ensure the change didn't break anything. An amusing example of this is Oracle's own OTN site. When Oracle designed the site, they decided to use the registrant's email address as the primary key for an OTN member. This was used on a number of tables that stored the member's profile and posts. Then members started switching email providers and employers. They needed to change their ID on OTN, but they could not. For those who had been members of the site for years would now lose their history. If you got a new email address, you simply had to start from scratch and register as a new member. This is the sort of user experience you want to avoid by planning for performance and flexibility using good data modeling principles up front. Artificial Key In 99% of the tables you will design to represent the conceptual and logical entities, the primary key should be a sequence-generated, meaningless numeric column. This sort of primary key is known in various data modeling circles as the "surrogate" or "artificial" or "auto-increment" key. It allows each row of data to be identifying by a single, 2 to 6 byte number (rarely will your numeric identifiers stretch into the trillions or above). This value is usually never seen by the end user in the application. When, not if, the natural key changes, the artificial key never has to change with it. Primary keys should simply never carry any implicit meaning unless you are 100% certain the values will never change. Even then you should favor artificial keys. Tip: The name of the column should be derived from the entity or table name, suffixed with the token "_ID". So if the table name is SEC_USER, the artificial key should be SEC_USER_ID or USER_ID. Data models in 3rd normal form, with artificial primary keys may look difficult to understand, join or query, but they are fantastic for preserving data integrity, easy to modify, and easy to tune. Normalized models are a solid foundation upon which you can then build additional layers that meet developer, user and performance needs. As a side benefit to adding artificial keys to your tables, modeling tools and development tools that have relationship and master-detail navigation functionality tend to have an easier time and work better, making possible certain cool automation and self-discovery features. Artificial keys make joining easier, since there is only one equality check per join, instead of several when dealing with multi-column natural keys. And due to the


reduced amount of bytes and logic involved and blocks processed, performance can be improved as well. Data Design - Normalization No discussion of database design is complete without a review of "normalization," the process of eliminating duplication so that each piece of data is kept in only one place. Start with 3rd Normal Form Sensible modelers typically create first draft DDL in third normal form (3NF). As a quick review: a table in first normal form contains the same number of fields for each row, does not allow duplicates, and has removed repeating attributes to a separate child table. In 2nd normal form, columns are put where they belong. The technical description is that each non-key attribute must depend wholly on the entire natural key. If there is a column that does not depend on the natural key at all, or only part of the key, the column belongs somewhere else. Putting a table in 3rd normal form is usually the step where the last reference tables are flushed out, surrogate keys are added to the parent reference tables, and placed in the child tables where the redundant parent data used to be. Technically, this is eliminating transitive dependencies, where a column is only semirelated to the entity represented by the row. If a non-key column (B) is an attribute of another non-key column (A), it is a transitive dependency and should be broken out into its own table, where column A serves as the natural key of the new table. Performance Aspects of Normalization Eliminating duplication results in more compact, concise tables with relationships to one another described and codified by referential constraints, emptiness (NULL) constraints and uniqueness constraints. By keeping a data model clean and concise, "soft" performance goals can be realized. Normalizing a data model also reduces risk. Data becomes nearly invulnerable to mistakes when it is kept in only one place. When every piece of data has only one home, it is much more likely to be stored and updated correctly. But when a given piece of data is allowed to spread all over the model, sometimes kept in columns with different names, risk increases that the various homes for the fact may be missed. This would leave orphaned, inaccurate and inconsistent information. Bad data can break a program, a sale, even an entire business given the right circumstances. Since this is about designing for performance, it is useful to know that normalization aids performance of OLTP (write-mostly) systems in that inserts, updates and deletes of columns and instances (rows) are much easier and faster than in non-normalized or denormalized models. For example, updating a value in a normalized AREA_CODE table touches a single column on a single row. If however, a lazy modeler or misguided OO-centric architect managed to push through a model by composition, where the area code attribute is stored with every entity that has a telephone number, the update could take hours. Each row on every table with area code must first be searched and then relevant blocks updated. In eBay's case, if they were to include area code or telephone


number on their registered member table, a single area code change would require working through over 200 million rows. This is write performance disaster. Unfortunately, due to all the tables that a normalized model engenders, the relationships that must be traversed and physical bytes that must be gathered, querying data in normalized models can have performance limitations. This can be overcome by limited and well-planned denormalization, as described in the next section. But if the database starts out non-normalized, with duplicate data everywhere and missing integrity constraints, no amount of workarounds will ever make it behave properly. Please do not begin development until you have a well normalized physical model. Then as details are ironed out, and initial stress tests are constructed, solve performance problems with the model by building on top of your solid foundation, rather than tearing it down. Make use of the strategies described in Denormalization to overcome performance obstacles.

Tuning PL/SQL Exploit SQL Never underestimate the supremacy of SQL. We have seen many programmers so overwhelmed by the idea of PL/SQL (finally, a structured programming language!) that they forget basic SQL. Simple tasks that are better suited to pure SQL are coded in PL/SQL, using many times more lines of code than are actually necessary. In general, SQL is easier to code, easier to understand, and much more efficient. For example, the following PL/SQL: DECLARE CURSOR C1 IS SELECT FROM WHERE xfld1 xfld2 xfld_tot BEGIN OPEN C1; FETCH C1 INTO xfld1, xfld2; xfld_tot := xfld1 + xfld2; UPDATE SET summary_table fld1 = 0, fld2 = 0, fld3 = xfld_tot

fld1, fld2 summary_table fld3 = 1234; NUMBER; NUMBER; NUMBER;


WHERE END; should be coded as BEGIN UPDATE SET

fld4 = 1234;


summary_table fld1 = 0, fld2 = 0, fld3 = fld1 + fld2 fld4 = 1234;

ORACLE ADMINISTRATION Oracle System Architecture Oracle is an Object-Relational Database Management System, or, ORDBMS for short. A traditional RDBMS (Oracle Version 7 and below) stores data in tables called relations. These relations are two dimensional representations of data where the rows, called tuples in relational jargon, represent records, and the columns, called attributes, are the pieces of information contained in the record. Oracle provides object oriented extensions to the Oracle RDBMS forming the new entity the Object-Relational database. In an Object-Relational database columns can represent either a single value (as in standard relational databases), a varray (a fixed number of additional records) or a REF to a second table where a variable amount of data can be stored. This takes the two dimensional relational view and adds a third dimension. In addition, in an object relational database procedures known as methods can be tied to the tables. Oracle introduced a new concept in its new version called Grid Computing and named the new version as 10g. As the name suggests, Grid Computing is a step towards the most flexible and cost effective way to manage information and applications. Grid Computing internally creates a large pool of Servers, Storage and networks wherein resources from one pool can easily be allocated to another depending upon the work load. Grid Computing uses a Workload Manager which enables the applications to share resources across the network in a Grid. It thus offers a high performance and scalable system. The Oracle Concepts of Database and Instance In an Oracle configuration there is the base set of files, data files, redo logs, control files, parameter files, which make up the database and then there are one or more sets of operating system processes and shared memory areas (each known as an SGA) that make up an Oracle instance.


  

Click Here to view a diagram of a typical Oracle10g system. Click Here to view a diagram of a typical Oracle8/9 system. Click Here to view a diagram of a typical Oracle7 system.

The diagrams show the relation of these files and processes to the SGA (or SGAs in the case of Oracle Parallel Server). As the diagrams illustrate, Oracle is more than just a collection of programs that allow ease of data access. Oracle can be compared to an operating system that overlays the operating system of the computer on which it resides. Oracle has its own file structures, buffer structures, global areas, and performance tuning parameters above and beyond those provided within the operating system. Oracle controls its own processes, monitors its own records and consistencies, and cleans up after itself. Oracle Database Files Oracle files are either used for storage of data, transaction information or parameter values. Oracle data is stored in files known as datafiles. The tablespace, a logical unit of storage in Oracle, maps to one or more datafiles. Each Oracle instance must have at least a single datafile, for the SYSTEM tablespace, a control file, to maintain transaction control numbers and datafile information, two redo logs to contain transaction redo information and a parameter file to specify constants used to initialize the Oracle system. Therefore the minimal Oracle installation will consist of a set of 5 files and the Oracle executables. Once Oracle starts up, more files are created. These files fall into either event or trace logging categories. A single alert log that tracks overall instance status is opened and, trace files for all baseline Oracle processes are started. There will be a minimum of four baseline Oracle processes, PMON, SMON, DBWR, LGWR with a possible fifth (RECO) if the distributed option is loaded. Generally the trace files will contain process startup information and will only contain more if there are errors with their process. You should periodically check the contents of the location specified for trace files and clean out any old trace files. The alert log should be checked frequently during the database operation (in UNIX you can place a "tail -f" against all monitored database alert logs and have them in their own windows on your monitoring screen) some errors such as the archive logging process getting stuck or redo log corruption in a mirrored set, will only show up in the alert log. If the database has more tablespaces than just the SYSTEM tablespace (and it should) then there will be additional datafiles, at least one for each additional tablespace. If there are optional processes started (such as ARCH, CKPT, Sxxx and Dxxx, SNPxx ) then they too will have trace files started for them. In most installations there will be a minimum of 3 redo log groups of two mirrored files each, generally more. It is suggested that as a minimum the following tablespaces be created in any instance:


    

USERS TEMP RBS TOOLS SYSTEM (the required tablespace)

At least two additional tablespaces, APL_DATA and APL_INDEX (or ones named for the specific application), are usually created, as well as those mentioned previously. In complex databases (especially parallel server databases) some tables may be placed in their own tablespace adding to the number of required datafiles. The only datafile required for instance startup is the SYSTEM tablespace. Tablespaces are the logical repository for physically grouped data. The SYSTEM tablespace should never be used for mundane day-to-day storage of common indexes and tables. The SYSTEM tablespace is the strict domain of the data dictionary tables and indexes. Being the only required tablespace, if the SYSTEM tablespace is somehow taken offline, the instance will immediately shutdown. A tablespace used for rollback segments should also only be used for rollback segments, just as a temporary tablespace, such as TEMP, should only be used for temporary segments. In the absence of any application tablespaces for tables or indexes, the USER tablespace should be used for these items. Loss of any of the datafiles will result in the instance not starting up if it is shutdown. Loss of a datafile while active will result in several possible scenarios depending on which datafile is lost. Loss of any datafile is considered a media failure, and recovery can be either incomplete or complete depending on the file lost. Parameter files, control files, redo logs, and Datafiles make up the physical database. The Control Files The control files contain information on all physical database files (the database physical structure) and their current transaction state. The control files are read to mount and open the database and transaction numbers are recorded for each datafile. If the control files and datafiles are out of sync, the database will not startup and will report either that recovery is needed or that the datafiles are out of sync with the control files. Control files are required for database startup and recovery. The database is required to have one control file, however a minimum of two on separate physical disks or on separate disk farms in a raid configuration, are recommended. Control files play an important role in database backup and recovery. Recovery Manager (RMAN) uses the control file of a target database to store some or all,


depending upon the configuration, of the backup information. Even if a recovery catalog database is created for RMAN, RMAN uses Control files of the target database to store this information, which is then synchronized at regular intervals or manually. Hence, it is strictly recommended to multiplex the Control files. The Parameter File (init_<SID>.ora) While not considered to be a part of the database since the Oracle processes do not write to it, the parameter file contains the initialization parameters that instruct the Oracle instance how to set itself up with buffers, processes, caches and the like. The parameter file is read while starting the instance during the mount and open phases of startup. Thus, the parameter file sizes all SGA components either through direct parameter values or indirectly from calculations based on parameter values specified in the parameter file. The DBA is responsible for tuning the database using the initialization parameters. Depending on the version of Oracle, there will be more or less parameters available. Click Here to view a list of the init.ora parameters. System Parameter File (SPFILE) SPFILE was first introduced in Oracle 9i and is a major breakthrough in tuning of initialization parameters. The major drawback of PFILE was that any change in the database parameter was to be recorded to the PFILE and the instance was to be restarted to effect that change. Any parameter value that can be changed dynamically had to be modified manually in the PFILE. SPFILE gives the ability to change the value of a parameter dynamically while updating the SPFILE with this current change. Thus any subsequent restart of the database will have the modified value in the SPFILE. Oracle introduced this SPFILE as a dynamic means of maintaining initialization parameters. Moreover, RMAN has the ability to take the backup of the SPFILE thus enabling us to restore this important file, in case of loss of this file. SPFILE is a binary file and cannot be modified or read using any text editor. To check whether the database has been restarted using a SPFILE, following query can be used: SQL> select isspecified, count(*) from v$spparameter group by isspecified; ISSPEC COUNT(*) ------ ---------FALSE 227 TRUE 32 2 rows selected.


SQL> The Alert Log (alert_<SID>.log) The alert log contains informational, warning and error messages dealing with the Oracle core processes and the SGA. Additionally, the alert log contains a history of any physical changes to the database such as the addition or status change of datafiles, redo logs or rollback segments. Using optional initialization parameters information concerning checkpoints can also be recorded in the alert log. The alert log is the only location where errors such as detection of a corrupted member of a mirrored redo log or the filling of the archive log destination (archiver stuck) are reported. Other informational messages useful for tuning (such as excessive archive waits for a checkpoint or waits due to the redo logs writing to archive) are also reported in the alert log. Oracle recommends that the log be checked at least once per day. The alert log is continuously written to, thus, it continuously grows in size. If the disk location of the alert log fills up, the instance will slow down or stop until space is made available. You should periodically compress and archive the alert log, or, if it contains no important information, remove it. The instance will recreate it if it's not present. The Redo Log Files The redo log files are set up at instance creation. A minimum of two one-member groups in a single thread is required for instance startup. We recommend running three or more groups of two mirrored members with each member being placed on a separate disk or separately controlled disk farm if possible to reduce contention. In most cases we have found five groups optimal to prevent checkpoint and archiver wait errors. The redo logs hold records used for recovery purposes. The redo logs contain information on all data modifying transactions that occur in the database unless these transactions have been run as non-recoverable. The LGWR process writes data on changes from the redo log buffers to the redo logs. A COMMIT is not considered complete until the LGWR signals that all redo log entries have been written to disk. Remember, the redo log buffer is a part of the SGA. Database Datafiles The Oracle system uses logical and physical storage. Logical storage uses the concept of tablespaces. A tablespace is physically implemented through one or more datafiles. Datafiles are subdivided into segments which are subdivided into extents which may be of several types depending on what they store or their usage:
 

Table segment Index segment


 

Rollback segment Temporary segment

A single segment extent cannot span multiple datafiles and must be contiguous. The table segment contains data that corresponds to a table object in the database. These will usually be the most common type of segments since they store the actual data contained in the database. The index segment contains table index information. The index segments will contain data and ROWID pointers and will be stored as either B-tree (the most common), cluster (either B-tree or hash) or a bitmapped format, depending on the type of index object the segment is assigned to. The rollback segment contains records of changes for multiple transactions. Each transaction gets assigned to a single rollback segment extent. The most costly statement in terms of rollback segment space usage is an update because it must capture both the before and after image; the least expensive is a delete because the rollback only captures the deleted ROWIDs. The Trace Files (*.trc) The trace files are created either at process startup, when a process abnormally exits, or if certain errors occur in a process. Trace files are forced to be generated by setting initialization parameters, enabling session tracing, or by turning on tracing at the process level with the ALTER SESSION command. Traces are used to track internal process errors and to track what a user process actually does in the case of session traces. A special type of trace called a core dump is used for severe errors; it's a dump of the state of the memory in the memory core area of the SGA. Session tracing should be used to capture information for tuning session SQL processes. Using a trace file and the TKPROF utility all of a processes SQL statements can be analyzed and the statements execution plans generated. Trace files are placed in the locations specified in the initialization file using the following parameters:
  


Trace file sizes are limited by the MAX_DUMP_FILE_SIZE initialization parameter. The trace file destinations and core dump destinations should be monitored for space usage, and periodically the files should be cleaned out.


Watch out for core dumps in development environments as they take up a large amount of disk space. (Any abnormal termination of a C process using Oracle may result in a core dump, as will abnormal terminations of other third party tools, such as Visual Basic and Power Builder). If the locations for these dump files fill up, it could result in the instance stalling when it attempts to write to a trace file. Archive Logs If the database is created in ARCHIVELOG Mode, the redo entries in the redo log files are written into Archive Log files. Hence, these files are exact copies of the online redo log files and are often called Offline Log Files. ARCn, Archiver Process, is responsible for creating these offline log files, as and when the redo log files get full. In a single thread instance, these files are created sequentially. Archive log file plays an important role in database recovery and need to be applied in the order they were generated. Any loss of an archive log file will require the database to be opened with the resetlogs option and results in a loss of data. Archive Log files also provide the capability to recover a tablespace to any specific point in time in the database. For any application considered to be production- or missioncritical, archive logging must be turned on. These files can be stored to disk, tape, or even optical storage such as WORM. Using operating system backups such as BACKUP on VMS or TAR on UNIX, and with the application of archive logs, a database can be quickly recovered after disaster. Archive logs can only be used to recover when cold or hot backups are used for Oracle backup. After each successful hot or cold backup of an Oracle database, the associated archive and export files may be removed and either placed in storage or deleted. In an active database these files may average tens of megabytes per day; storage for this amount of data needs to be planned for. Just for example, at one installation doing Oracle development with no active production databases, 244 MB of archives and over 170 MB of exports were generated in a one-week period. If archive logging is turned on and you run out of archive disk space, the database stops after the last redo log is filled. Plan ahead and monitor disk usage for instances using archive logging. Archive Log Files are also required for configuring and applying changes to the standby servers, Oracle streams and dataguard. Log Miner also reads these files to capture changes done in the database. The Oracle Instance Oracle defines the Oracle instance as the shared global area (SGA) and the required background processes. The base and mandatory processes are SMON, PMON, DBWR, LGWR, CKPT and MMAN (Memory Manager). MMAN process is introduced in Oracle 10g and is used when Automatic Shared Memory Managemet is used. Depending upon the database and application type, there are other optional processes. Some of them are, ARCH, RECO (required on distributed databases), LCKn (where n is 0 to 9, one per parallel instance required), Dnnn (dispatcher process for multi-threaded server MTS),


Snnn (server process for MTS), Pnnnn (parallel query slave process for Parallel Query Option [PQO]), SNPn (snapshot/job queue processes). Apart from these background processes, there are some other mandatory background processes when using Real Application Clusters. These are LCKn, LMDn, LMON. Some memory from the SGA is allocated for these background processes and can be seen in FIXED VALUE when querying SHOW SGA. The combination of the SGA and processes is the instance. If any of the required processes dies, the Oracle instance dies. Loss of optional processes such as ARCH, RECO, LCKn or the server or dispatcher processes may result in instance failure. This type of failure is an instance failure and is automatically recovered on the next startup or by one of the other instances in a parallel server configuration as soon as the failed instance's heartbeat loss is detected. The SGA and its Contents The SGA is made up of several memory components, the largest of these is usually the database base buffer cache, followed by the shared pool area, and the redo log buffers. The shared pool consists of the shared SQL area (also known as the library cache) where SQL and PL/SQL statements are kept, the data dictionary cache where data dictionary related entries are kept, control structures (latches and locks), control structures (such as character set data, and, if MTS is being used, request and response queues). The SGA also contains the sort extent pool in non-MTS instances; in MTS instances the User Global Area (UGA) contains the sort extent pool. The Data Base Buffer Cache The major component of the SGA is the database buffers. The database buffers are controlled by the initialization parameters DB_BLOCK_SIZE and DB_BLOCK_BUFFERS or can be specified using DB_CACHE_SIZE. The DB_BLOCK_SIZE sets the size of database blocks used in all other components of the SGA. The DB_BLOCK_SIZE is a multiple of the operating system block size which is usually a multiple of 512 bytes. On most systems the DB_BLOCK_SIZE is a minimum of 2k bytes. Either of the two can be used to configure the buffer cache. For example: DB_BLOCK_SIZE = 8192 DB_BLOCK_BUFFERS = 10000 So Buffer Cache Size will be = 8192 * 10000 = 78MB or a single parameter: DB_CACHE_SIZE = 78M We strongly recommend using DB_CACHE_SIZE. The reason being db_cache_size is dynamically modifiable and provides a dynamic performance view called Buffer Cache Advisory which helps in tuning the parameter. Using either of the two mechanism to configure the buffer cache, the internal mechanism remains the same. The database buffer area is divided into the dirty list (buffers which have been modified and are waiting to be written to disk) and the least recently used


(LRU) list. Free buffers are buffers that have not been modified and are available for use. If an Oracle server (user) process attempts to move a buffer to the dirty list and the list is full, it signals DBWR, and DBWR writes the dirty buffers back to disk, placing them on the free list. The only action which adds blocks to the database buffer cache is a read by a server process at the request of a user. The database buffer cache stores copies of data blocks and is shared by all users. Alternatively, multiple buffer pools can be used based on the application understanding. DB_KEEP_CACHE_SIZE and DB_RECYCLE_CACHE_SIZE are two different parameters which are used to configure KEEP and RECYCLE buffer pools in the SGA. KEEP pool can be used to maintain frequently accessed objects whereas a RECYCLE pool can be used to prevent large objects from consuming unnecessary space in cache. The size of the buffers in the KEEP and RECYCLE pools is the primary block size specified by the DB_BLOCK_SIZE parameter. Starting with Oracle 9i, Oracle introduced multiple buffer size sub-cache to accommodate and make use of multiple block size objects. DB_nK_CACHE_SIZE parameter specifies the value of nK Buffers. The Shared Pool The next largest component of the SGA is the shared pool. The shared pool is controlled in size by the initialization parameter SHARED_POOL_SIZE. The parameter SHARED_POOL_SIZE is set in bytes and is generally set to 10-20 percent of available memory. Although in some heavy ad hoc environments it can be much larger. The shared pool holds the parsed SQL and PL/SQL code in the area known as the shared SQL area or library caches. In dedicated server environments the shared pool can be smaller than the equivalent multi-threaded server environments (MTS). This happens because, in a dedicated server system, the sort areas used by each process are kept in the users Process Global Area (PGA) while in an MTS environment the User Global Area (UGA) of the Shared Pool holds the sort areas. In both environments the temporary extents used for sorting are tracked using the Sort Extent Pool (SEP) which is a part of the shared pool. Anytime a process needs a temporary segment for sorting, it checks the SEP first before allocating a new segment. The Java Pool Java uses the shared pool and the Java pool. The default for the Java pool is 20MB. If you get memory-related errors when dealing with Java objects, increase the size of the Java pool. Error reporting in out-of-memory situations is inconsistent and in some cases may result in a core dump. If you suspect that memory usage is the cause of a core dump, adjust the JAVA_POOL_SIZE in the init.ora file. JAVA_POOL_SIZE should be set to 50,000,000 or


higher for large applications. The default value of 20,000,000 should be adequate for typical Java Stored Procedure usage. The Stream Pool This memory area is newly introduced in Oracle 10g and is used to scale up the Stream Replication. In previous versions of Oracle 10% of the Shared Pool was used for Streams replication thus causing unnecessary waste of memory and overhead on Shared Pool. Stream Pool, if configured using STREAMS_POOL_SIZE, is a separate memory area assigned in the SGA. This parameter can be specified in bytes, K, M or G and can be dynamically modified. The Large Pool This is an optional pool and is specified by using the LARGE_POOL_SIZE initialization parameter. Unlike Shared Pool, Large Pool does not have an LRU list. Hence, Oracle does not attempts to age out the objects in the large pool. The memory to this pool is used and is advisable when any of these instances are configured: 1. If the queries use Parallel query slaves. Otherwise memory from the shared pool will be used to cache parallel execution message buffers, or 2. Shared Server Configuration is used. Else, the session memory will be used from the Shared Pool, or 3. RMAN is used to take backups. Else memory from the shared pool will be used to cache I/O buffers during backups and recovery operations. Large Pool will only be used during RMAN backup and recovery operation, when either of these two parameters are set: DBWR_IO_SLAVES to non-zero or BACKUP_TAPE_IO_SLAVES to True. Other Shared Global Area Sections The rest of the SGA is set with other initialization parameters such as:

LOG_ARCHIVE_BUFFER_SIZE which sets the size of each archival buffer in log file blocks (equivalent to operating system blocks) LOG_ARCHIVE_BUFFERS which sets the number of LOG_ARCHIVE_BUFFER_SIZE sections of memory to allocate for archive log writing LOG_BUFFER which sets the size of the redo circular buffer.



The PCM Cache in Parallel Server


In the parallel server enabled database, the parameter GC_FILES_TO_LOCKS and GC_DB_LOCKS can control a large if not the largest section of SGA in the parallel server configuration. The PCM (parallel cache management) lock cache is controlled by the GC_FILES_TO_LOCKS parameter that maps the PCM locks to database block buffer size sets of datafile blocks. While the GC_DB_LOCKS sets the maximum number of PCM locks, the locks aren't mapped to an actual cache site until they are mapped to datafile blocks using the GC_FILES_TO_LOCKS parameter. Each cache entry can take up to 115 bytes of memory storage in the SGA, this memory is not taken from other SGA components but is additive. The PGA and its Contents The Process Global Area (PGA) contains the user's session variables (session information) and arrays (stack space). Each user is assigned a PGA and in multithreaded server systems, a UGA. The following parameters determine the PGA size:

OPEN_LINKS - Determines the number of database links that each process can use simultaneously. DB_FILES - Sets the soft limit for the number of database files which can be simultaneously accessed by the database (The actual limit is set by the database build parameter MAX_DATABASE_FILES). LOG_FILES - Sets the soft limit on the maximum number of log files (redo logs) available for the system (the actual maximum is set by the database build parameter MAX_LOG_FILES and MAX_LOG_GROUPS).



Though not a defined part of the SGA, the user's PGA interacts intimately with the SGA. The Oracle Instance Processes The instance consists not only of the SGA and its components, but also the set of processes which interacts with the SGA. Oracle's processes are listed below: DBWn - Database Writer Note that Oracle changed the naming format of the DBWR process to DBWn as of Oracle8, where n can be a number from 0 through 9. If you use multiple DB writer processes, you should note that the db_writers parameter has changed to db_writer_processes. The DBWn (Dirty Buffer Writer) process writes modified (or dirty) blocks from the database buffer cache to the disk files. Blocks are not written on commit, only the rollback segment segments and header are affected by a commit. In general, DBWn writes only when more data needs to be read into the system global area and too few


database buffers are free. The least recently used data is written to the datafiles first. DBWR's entire purpose in life is to manage the database buffer cache to that the server processes can always find free buffers. As of Oracle 7.3, if blocks have their lock bit set (that is, they were modified by a transaction) then as they are written out of the cache, DBWn resets their lock bits back to unlocked status. In versions prior to 7.3 the blocks were written as-is and the next process to read them reset the lock bits. A dirty buffer is any buffer that has had any of its bits twiddled. Even if only the lock bit is turned on and no other changes are performed the block must be re-written to disk. Writes are always deferred if needed to optimize system I/O. It's important to note that DBWn only writes. It does no reading other than from cache; the dedicated and MTS serer processes do all reading of disks. It is the only process that writes changed blocks from the database buffer cache to disk. DBWn behavior is controlled by initialization parameters. The initialization parameter ASYNC_WRITE tells DBWn to enable asynchronous writing to database files on systems which allow this to be switchable. On systems where asynchronous writes are the normal mode, ASYNC_WRITE is not required to be set. LGWR - Log Writer The LGWR (Log Writer) process writes out the redo buffer to the redo logs on a commit. At specific times, all modified database buffers in the system global area are written to the datafiles by DBWR; this event is called a checkpoint. The LGWR process is responsible for signaling DBWR at checkpoints and updating all the datafiles and control files of the database to indicate the most recent checkpoint. SMON - System Monitor The SMON (Server Monitor) process monitors the Oracle instance, recovers temporary segment space when it is no longer needed. It coalesces contiguous areas of freespace in database tablespaces where the default storage parameter, PCTINCREASE, is not set to zero, and recovers dead transactions skipped during crash and instance recovery because of file-read or offline errors. These transactions are eventually recovered by SMON when the tablespace or file is brought back online. If SMON dies, the instance will crash requiring instance recovery on restart. PMON - Process Monitor The PMON (Process Monitor) process cleans up failed transactions. This clean up involves cleaning out cache entries, releasing locks and freeing other process resources. If MTS is being used, PMON watches for dead dispatcher and server processes and restarts them as needed. If PMON dies, the instance will crash requiring instance recovery on startup. In the situation where an instance crashed and is then restarted,


PMON will rollback any uncommitted transactions using the information in the rollback segments. ARCH - Archiver Process The ARCH (Archiver) process is responsible for copying filled redo log files to the archive log destination. The ARCH process is only present if archive logging is enabled for the instance. If the archive log destination is full, the archive log process will silently hang, only reporting a problem to new logins and the alert log. No redo log will be reused until it has been archived if archive logging is in effect for the database. The ARCHIVE_LOG_START parameter must be set to TRUE and the ARCHIVELOG mode must be set in the database for archive logging to be in effect. If only the ARCHIVELOG mode is set via the ALTER DATABASE command, database processing will continue until all available redo logs are filled, and then Oracle will hang with an error in the alert log indicating that all online redo logs need archiving. If the ARCH process abnormally terminates, the database will continue operation until all online redo logs are filled before it will hang. RECO - Distributed Transaction Recoverer The RECO process recovers failed distributed transactions and is only present if the distributed database option is installed in the database system. The RECO process is started only if the DISTRIBUTED_TRANSACTIONS initialization parameter is set to a value greater than zero. For databases with the distributed option, the default for DISTRIBUTED_TRANSACTIONS is 32, so if you want to disable the RECO process, you must explicitly set this parameter to zero in the parameter file. LCKn - Lock Process This process is used for inter-instance locking in a parallel server environment. These multiple processes, named LCK0 through LCK9, are set via the GC_LCK_PROCS INIT.ORA parameter. Dnnnn and Snnnn Process(es) The Dnnnn (Dispatcher) process(es) (nnnn is a sequential number) distribute instance attachments to Snnnn (server) processes. Snnnn (server) processes (nnnn is a sequential number) take the place of dedicated servers for user connection in multi-threaded server enabled systems (MTS). Dnnnn and Snnnn processes are only present in MTS systems. The Dnnnn and Snnnn processes are controlled by the MTS_DISPATCHERS, MTS_MAX_DISPATCHERS, MTS_SERVERS and MTS_MAX_SERVERS initialization parameters. The MTS_DISPATCHERS and MTS_SERVERS parameters set the minimum number of Dnnnn and Snnnn processes created at instance startup. The MTS_MAX_DISPATCHERS and MTS_MAX_SERVERS parameters set the maximum number of Dnnnn and Snnnn processes allowed to be automatically started as needed.


In an MTS system the SQLNET listener process determines if a user connection should be dedicated or should use a shared server process. If the listener determines that a shared connection is appropriate, it gives the user process the address of a dispatcher process. In a dedicated server environment both the D and S processes are combined and a single process is started for each user; this single process is simply called a server process. This dedicated environment is also referred to as being a single-tasked or combined user and server process environment. LISTENER If you are running TCP/IP, this process will be running as well (only one per node). CKPTxx If the CKPT process (optional in Oracle 7.x and mandatory in ORACLE8) is present the checkpoint responsibilities of LGWR are assumed by the CKPT process. The CKPT process is controlled by the CHECKPOINT_PROCESS initialization parameter in versions less than 8.0. The LGWR process also allows for recovery of committed data in the database buffer cache at the time of an instance failure. Checkpoints help to reduce the time required to recover an instance. Checkpoints cause DBWR to write all of the blocks that have been changed since the last checkpoint to the datafile. The datafile headers and control files will also be updated to record the checkpoint. The CKPT background process may be added to split the LGWR functions of signaling a checkpoint and copying the redo log entries. CKPT is enabled by setting the INIT.ORA parameter CHECKPOINT_PROCESS to TRUE. If the LGWR or CKPT process needs a file that the DBWR process is checkpointing, it waits until DBWR completes the checkpoint operation. SNPnn The SNPnm (Snapshot) process (where n is an integer) was initially used strictly for snapshot processing. When job queues were added to the Oracle system, the SNP rpocesses were also tasked with checking and executing jobs for the job queues. A failed snapshot process may inhibit job queue or snapshot processing and will generate a trace file, but otherwise will not affect Oracle instance operation. Snapshot processes (or job queue processes) failure will result in an alert log entry. The job queue processes are controlled by the JOB_QUEUE_PROCESSES, JOB_QUEUE_INTERVAL and JOB_QUEUE_KEEP_CONNECTIONS (in some installations these may begin with SNAPSHOT instead of JOB) initialization parameters. The JOB_QUEUE_PROCESSES sets the number of SNP processes created at instance


startup. The JOB_QUEUE_INTERVAL parameter sets the wakeup interval for the processes (how often they wakeup and check the job and snapshot queue tables). The JOB_QUEUE_KEEP_CONNECTIONS initialization parameter tells the snapshot refresh process to keep database links open between process runs. The QMNnn Process Associated with Oracle's advanced queuing, this process is called the queue manager process. To enable this optional process, set the init.ora parameter aq_tm_processes. This parameter can be set to only 0 or 1. The process monitors the Oracle advanced queuing message queues. On certain platforms, there are problems enabling the QMN process. Oracle has created a workaround document to deal with this issue. Reference Oracle bug 638191 if you have a problem starting this process. Some Oracle documentation reports that you can start up to 10 of these processes, but this is not the case in Oracle8.0.5, where only one process may be started. EXTPROC These are the callout queues, there is one for each session performing call outs. Currently these processes are not multi-threaded. As of the 8.0.2 beta this was not working with environments where multi-threaded server was enabled. This functionality has been promised in 8.0.3. ARCHMON - Archive Monitor On a Unix system, this additional process may be present. This process monitors the archive process and writes the redo logs to the archives. It will require a dedicated window or terminal on BSD systems. Pnnnn Processes Pnnnn (Parallel Query Slave) processes (nnnn is a sequential number) are started and stopped by the parallel query server. A single parallel query can result in dozens of Pnnnn processes depending on table size and number of processors and disks as well as initialization parameters. Loss of a Pnnnn process may result in a failed query but otherwise shouldn't affect instance operations. On some Oracle versions, abnormal or forced termination of a Pnnnn process can result in instance termination; this is a known Oracle bug. The Pnnnn processes are controlled by the initialization parameters PARALLEL_MIN_SERVERS, PARALLEL_MAX_SERVERS, PARALLEL_MIN_PERCENT and PARALLEL_SERVER_IDLE_TIME. The PARALLEL_MIN_SERVERS parameter sets the minimum number of parallel query slave processes to start at instance startup. The PARALLEL_MAX_SERVERS parameter is used to control the maximum number of parallel query servers that can be started for


the entire instance. The PARALLEL_MIN_PERCENT parameter determines the minimum number of parallel query slave processes that must be available for a query to be processed in parallel. The PARALLEL_SERVER_IDLE_TIME parameter sets the number of seconds a slave process can be idle before it is removed. MMON MMON is the acronym for Memory Monitor, a new process introduced in Oracle Database 10g associated with the Automatic Workload Repository (AWR). AWR gets the necessary statistics for automatic problem detaction and tuning with the help of MMON. MMON writes out the required statistics for AWR on a regularly scheduled basis to disk. MMON performs various manageability related tasks like issuing an alert whenever a given metrics violates given threshold values, taking snapshots while spawning additional MMON slave processes and capturing statics values for SQL objects recently modified. MMNL Memory Monitor Light (MMNL) is another new process in Oracle Database 10g, which assists the AWR to write the full statistics buffers to disk on an as-needed basis. MMNL captures session history and performs metrics computations. MMAN This is used from internal database tasks. RBAL RBAL coordinates rebalance activities on disk groups in an ASM instance and performs global opens on ASM disks. RBAL is an ASM related process to perform the rebalancing of disks controlled by an ASM instance. ORBn ORBn performs actual data extent movement for rebalance in an ASM instance and may vary from instance to instance from ORB0 to …. OSMB This process is present on a database instance and accountable for communication between the database instance and ASM managed disk groups. RVWR Recovery Writer (RVWR) is a new background process introduced in Oracle Database 10g to write flashback logs with pre-images of data blocks to disk. The location can be specified by the db_recovery_file_dest parameter.


ASMB ASM stands for Automatic Storage Management. ASMB is the background process used to pass the necessary information to and from the Cluster Synchronization Services to manage the disk resources used by ASM. ASMB also updates statistics and maintains the heartbeat mechanism for storage management ARBx The RBAL process to do the actual rebalancing operations on ASM controlled disk recources uses ARBx processes. The total number of ARBx processes started is controlled by the asm_power_limit_parameter. CTWR Change Tracking Writer (CTWR) is a new process that works with the new block changed tracking features in Oracle Database 10g. Block Change Tracking is used to optimize incremental backups. A new file is introduced. The CTWR background process tracks changed blocks as redo is generated. RMAN database backups automatically use the change-tracking file. You can use the V$BLOCK_CHANGE_TRACKING view to monitor the usage.

Logical Structures Overview Tablespaces A Database is divided into logical structures called Tablespaces. A tablespace groups together other logical objects like Tables, Indexes etc. Each Database is comprised of one or more Tablespaces which in turn consists of one or more datafiles. Tablespaces can be created as Locally Managed (LMT) or Dictionary Managed (DMT). Due to performance reasons, we recommend creating Locally Managed Tablespaces. The free space in the LMT is stored in the header of each datafile as a bit. Therefore, it is easier for Oracle to check for the free space instead of checking the data dictionary objects as in DMT. This reduces the number of recursive calls made to the data dictionary on behalf of a user session and thus improves the performance greatly. SYSTEM and SYSAUX Tablespaces Every Oracle Database contains a SYSTEM Tablespace and a SYSAUX Tablespace. These tablespaces are created automatically. The SYSTEM Tablespace contains the Data Dictionary called as the Metadata of Oracle Database. SYSAUX tablespace contains system-related tables, such as statistics repository, that are not part of the core data dictionary. SYSAUX tablespace has been introduced as an auxiliary tablespace to SYSTEM tablespace to reduce load on the system tablespace. Many functionalities which used SYSTEM Tablespaces as default now use the SYSAUX tablespace. Other Oracle


products, like OEM, Logical Standby, Log Miner, Data Mining, Oracle Streams, Statspack etc., which required a dedicated tablespace to store their own repository can now use SYSAUX tablespace thus reducing the number of tablespaces in a Database. The SYSAUX Tablespace is created at the time of database creation. The SYSAUX Datafile is created using the SYSAUX DATAFILE clause in the CREATE DATABASE statement. The SYSAUX Tablespace should have a minimum size of 250m. The SYSAUX Tablespace needs to be PERMANENT with extent management local and ASSM (Automatic Segment Space Management). SYSAUX Tablespace cannot be made read only. Hence, proper care should be taken while creating the SYSAUX tablespace as the tablespace attributes are not modifiable once they are set. A simple Oracle Database 10g Installation will have following as its mandatory tablespace components: 1. The System Tablespace: Tablespaces , as mentioned earlier, are logical and are accessible only when the Database is UP and Running. Ttablespaces consist of one or more datafiles which are physically located in the Operating Systems. The System Tablespace stores the metadata of Oracle Database and hence is managed internally by Oracle. 2. The SYSAUX Tablespace: This tablespace is new in Oracle Database 10g. It stores non data dictionary objects such as the AWR tables. SEGMENTS All Tablespaces, discussed above, contains segments. A segment is associated to single tablespace, except partitioned tables and indexes, in which each partition can be assigned to different tablespaces. Different types of Segments are: 1. 2. 3. 4. Table Segment Index Segment Temporary Segments Undo Segments

A basic structure of an Oracle Database is a Table. Oracle 10g supports many types of Tables, including the following: Normal Heap Table: These are normal heap tables required to store rows inserted by an application. These rows can be further manipulated using DML commands. These tables are created using a simple CREATE TABLE command. Index-Organized Tables (IOT's): IOT's allow storing of data in a sorted manner using a primary key, similar to an index. An IOT stores the table as if it is an index. These are used to improve performance of the application queries in which the table is frequently accessed based on the primary key. There are certain limitations when using IOT's. If the


table is frequently updated or is frequently accessed using a non-primary key column, it might impact the performance. No additional indexes can be created on these tables. External Tables: Contents in a flat file are considered as a table and users can access these by directly querying against these external tables. Partitioned Tables: These are used to create multiple sub-tables based on some keys. Each sub-table will store a particular subset of data based on the partitioned key. These are particularly useful for large tables, since the queries involve only an identifiable subset of data in the appropriate partitions without accessing other partitions. These are used for high availability and to improve query performance. Materialized Views: These are also used to improve query performance. Unlike normal views, which do not store the data and only stores a query definition, materialized views store the data itself which can be retrieved without actually querying these views. Materialized Views were first introduced in Oracle 8 and were called Snapshots. These were then renamed to Materialized View with some more features. Snapshot v/s Materialized Views The concept of Snapshots was originally taken from Oracle Discoverer, but the queries executed from applications other than Discoverer were unable to take advantage of the pre-computed data. Hence, Oracle came out with the concept of a Snapshot in Oracle 8. The drawback was that the queries were to be rewritten by providing the name of the Snapshot instead of the base tables. Users, who knew that the data required by them was pre-computed and stored in the snapshot, could execute their queries against the snapshot. But those users who were unaware of this still executed the queries against the base tables, thus consuming resource. Oracle then introduced materialized views. Mviews were similar to snapshots but the queries which were to be rewritten manually were rewritten by the optimizer automatically. Thus the application and the queries remained unchanged. Initialization parameters QUERY_REWRITE_ENABLED and QUERY_REWRITE_INTEGRITY dictate the behaviour of the materialized views. Temporary Tables Temporary Tables are created and used to store data required for temporary processing. Multiple users can insert and manipulate data into these tables. A user can see only the data entered by his session and not another session. These tables are created using the CREATE GLOBAL TEMPORARY TABLE clause. When the data is inserted into these tables, the segments allocated to these are allocated from a temporary tablespace. Indexes can be created on these tables. These tables can be created using the DELETE ON COMMIT clause which deletes rows once commited, or the PERSERVE ROWS clause which preserves the rows in the table even after commit but only through the duration of a session. Clustered Tables


Clustered Tables are used to store data of two tables which are joined and queried frequently. Dropped Tables Oracle 10g introduced this new functionality which enables a user to quickly recover a dropped table. This can be done using the FLASHBACK TABLE TO BEFORE DROP command. Flashback can be used to recover multiple tables at a time or the entire database to a prior point in time. Objects dropped using the DROP command are due to release space unless the PURGE command is issued to finally purge the document. Once the object is dropped, it is recorded in the USER's RECYCLEBIN view. This view can be queried based on the object_name recorded in the ORIGINAL column or based on Tablespace_name recorded in TS_NAME columm. Index Segment Indexes are used to speed up the access to the tables. Oracle provides many types of Indexes. These are: B*tree Indexes These are the default and are used to access the tables based on some search condition made on index columns. These are created using the CREATE INDEX command. It stores the rowid of the table row along with the column value on which it is created. Bitmap Indexes Bitmap Indexes are mostly created and used in Datawarehouse applications. Answer to the Adhoc Queries are retrieved quickly using Bitmap Indexes since the optimizer has the ability to use more than one index on a table in a single query. Reverse Key Indexes Reverse Key Indexes are well suited for columns in which the data is entered via sequence. A normal index will have a severe I/O contention and creating a reverse key index will eliminate this contention. A reverse key index is not advisable if the column on which it is created is queried with range predicates. It is well suited only if the column is queried using equality predicate. Function Based Indexes These can be created on functions that are commonly used in the query to return some rows based on the input. The output of the function should be deterministic otherwise the optimizer will not use these indexes. Partitioned Indexes


These are created on partitioned tables and are either local to table partitions or global to all rows in a table. Text Indexes Text is stored in the database by either physically storing the text in a table or using a pointer to an external file. These indexes speed up a text search. Oracle 10g provides CONTEXT, CTXCAT and CTXRULE indexes to speed up a text search. ORACLE BLOCKS When a datafile is created and assigned to a tablespace, Oracle formats the file into blocks for the actual storage. The size of each Oracle block is determined by the DB_BLOCK_SIZE initialization parameter, which is specified at the time of creation of database. Normally, Oracle block size is in multiples of OS Block size, which is generally 512 bytes. An Oracle block is a basic unit of storage into which all Oracle objects store data. Oracle block is divided into three distinct segments: 1. Fixed Block Header 2. Variable Block Header, and 3. Data Storage Area Structure of an Oracle Block Fixed Block Header Variable Block Header ROW1 ROW2 ROW3 … Row 1 Data Row 2 Data Row 3 Data Free Space Fixed Block Header The fixed block header is exactly the same size for every block of a table, regardless of the Oracle Block size. The size depends on the INITRANS setting which is 2 by default. Variable Block Header The Variable Block Header immediately follows the fixed block header in the block, and is dependent on the number of rows stored in the block.


Although Oracle stores rows of data in Oracle blocks, which exist in the context of a particular operating system file, the Administrator has little control over which particular block is used for a particular database object. However, the administrator does control which tablespace an object is created in and how much space within that tablespace is allocated to the object. This control is made at the time of table creation. A sample create table command is show below: CREATE TABLE TEST ( EMPID NUMBER(4) primary key, ENAME VARCHAR2(30), DOB DATE, SAL NUMBER(10,2)) PCTFREE 0 PCTUSED 50 INITRANS 2 MAXTRANS 20 TABLESPACE USER_DATA STORAGE (INITIAL 10M NEXT 10M PCTINCREASE 0 MINEXTENTS 2); Space usage within the blocks is determined by the PCTFREE, PCTUSED, INITRANS, and MAXTRANS parameters, whereas the space usage within the tablespace or datafile is determined by the STORAGE clause. PCTFREE sets the percentage of the block space reserved for future updates of the rows inserted into the block. The default setting for PCTFREE is 10 percent. If records within a block are rarely updated after they are inserted, you should set PCTFREE lower to allow for full space usage. If records within a block are subject to many updates, you should set PCTFREE higher to allow for more growth. Once the PCTFREE threshold is reached, no additional rows will be inserted until the PCTUSED threshold is reached due to deletion of data. If PCTFREE is set too low, row chaining and row migration will result because updates to the record will not fit into the block. ROW Chaining and Migration are performance killers as an additional I/O is incurred when accessing these rows. PCTUSED sets the threshold for determining when the block is available for more rows to be inserted. The default setting for PCTUSED is 40 percent. If data is static, you will be able to set the PCTUSED lower and more fully use the space. If large amounts of data are inserted and deleted, you should set the PCTUSED higher to prevent block fragmentation. INITRANS is the initial number of concurrent transactions allocated in each block header when the block is allocated. The default is 2, which is the minimum. The maximum is 255. The size of each transaction entry is operating system-dependent. MAXTRANS is the maximum number of concurrent transactions for the blocks within the tablespace. The default is 255, which is the maximum. The minimum setting is 2. When an object is created, there must be sufficient contiguous space for the initial extent. For rollback segments, an initial and at least one additional extent is required during creation. If sufficient contiguous free space is not available, the extent cannot be created and the user will not be able to create the table or rollback segment. When an object


needs an additional extent, there must be sufficient contiguous free space to meet the NEXT_EXTENT size requirement. If there is not sufficient contiguous space for the next extent to be created, the end user will receive an error message that there is not sufficient space to create the extent. The settings for objects created take the defaults for the tablespace within which they are created unless you specify otherwise. The table below shows the extent allocation parameters and the default settings. Parameter INITIAL NEXT MAXEXTENTS MINEXTENTS PCTINCREASE Description Size in bytes for the first extent; default is 5 Oracle blocks Size in bytes for the next extent to be allocated; default is 5 Oracle blocks Total number of extents allowed; default is operating systemdependent Total number of extents to be allocated when the object is created; default is 1 for data, except rollback segments, which is 2 Percent the next extent will increase over the current next extent setting; default is 50 percent; recommended setting is 0 percent

PCTFREE and PCTUSED play an important role in FREELIST management. Administrators face a difficult task in deriving an optimal value for these parameter to reduce freelist contention. Automatic Segment Space Management (ASSM) eliminates this task and maintains it automatically thus reducing I/O Contention. When using ASSM, the user needs to mention a proper value of PCTFREE, to keep aside some percentage of the row free for future updates. The SYSAUX Tablespace The SYSAUX tablespace is a new tablespace component in Oracle10g. When you upgrade your Oracle database to Oracle 10g, you need to create a new tablespace called SYSAUX. Also, when you create your first Oracle 20g database, Oracle will create SYSAUX. The SYSAUX Tablespace The SYSAUX tablespace is a new tablespace that is required in Oracle Database 10g. SYSAUX is a secondary tablespace for storage of a number of database components that were previously stored in the SYSTEM tablespace. It is created as a locally managed tablespace using automatic segment space management. The SYSAUX Tablespace should have a minimum size of 250M. SYSAUX Tablespace needs to be PERMANENT with extent management local and ASSM (Automatic Segment Space Management). The SYSAUX Tablespace cannot be made read only. Hence, proper care should be taken while creating the SYSAUX tablespace as the tablespace attributes are not modifiable once these are set.


Previously, many Oracle features required their own separate tablespaces (such as the RMAN recovery catalog, Ultra Search, Data Mining, XDP, and OLAP). This increases the management responsibility of the DBA. The SYSAUX tablespace consolidates these tablespaces into one location, which becomes the default tablespace for these Oracle features. By default, when you create an Oracle database, Oracle creates the SYSAUX tablespace for you. If you are using Oracle Managed Files (OMF), the tablespace is created in the appropriate OMF directory. If you use the sysaux datafile clause in the CREATE DATABASE statement, the SYSAUX tablespace datafile(s) will be created in the location you define. Finally, if no sysaux datafile clause is included and OMF is not configured, Oracle creates the SYSAUX tablespace in a default location that is OS-specific. Here is an example of a create database statement with the sysaux datafile clause in it: CREATE DATABASE my_db DATAFILE 'c:\oracle\oradata\my_db\my_db_system_01.dbf' SIZE 300m SYSAUX DATAFILE 'c:\oracle\my_db\my_db_sysaux_01.dbf' SIZE 100m DEFAULT TEMPORARY TABLESPACE dtemp_tbs tempfile 'c:\oracle\my_db\my_db_temp_01.dbf' SIZE 100m UNDO TABLESPACE undo_tbs_one DATAFILE 'c:\oracle\my_db\my_db_undo_tbs_one_01.dbf' SIZE 100m; When you migrate to Oracle Database 10g, the SYSAUX tablespace needs to be created as a part of that migration. This is done after mounting the database under the new Oracle Database 10g database software. Once you have mounted it, open the database in migrate mode with the startup migrate command. After the database is open, you can create the SYSAUX tablespace. Here is the create tablespace statement to use in order to perform this operation: CREATE TABLESPACE sysaux DATAFILE 'c:\oracle\oradata\my_db\my_db_sysaux_01.dbf' SIZE 300m EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO; The SYSAUX tablespace must be created with the attributes shown in the preceding example. The following restrictions apply to the usage of the SYSAUX tablespace:

When migrating to Oracle 10g, you can create the SYSAUX tablespace only when the database is open in migrate mode. When migrating to Oracle 10g, if a tablespace is already named SYSAUX, you need to remove it or rename it while you are in migrate mode. Once you have opened your Oracle 10g database, you cannot drop the SYSAUX tablespace. If you try, an error will be returned. You cannot rename the SYSAUX tablespace during normal database operations.






The SYSAUX tablespace cannot be transported to other databases via Oracle's transportable tablespace feature. Once the SYSAUX tablespace is in place and the database has been upgraded, you can add or resize datafiles associated with a SYSAUX tablespace just as you would any other tablespace through the ALTER TABLESPACE command, as shown in this example:

ALTER TABLESPACE sysaux ADD DATAFILE 'd:\oracle\oradata\my_db\my_db_sysaux_01.dbf' SIZE 200M; Managing Occupants of the SYSAUX Tablespace Each set of application tables within the SYSAUX tablespace is known as an occupant. The V$SYSAUX_OCCUPANTS view manages occupants in the SYSAUX tablespace. This view allows you to monitor the space usage of occupant application objects in the SYSAUX tablespace, as shown in this example: SELECT occupant_name, space_usage_kbytes FROM v$sysaux_occupants; In this case, Oracle will display the space usage for the occupants, such as the RMAN recovery catalog. If you determine that you need to move the occupants out of the SYSAUX tablespace, then the MOVE_PROCEDURE column of the V$SYSAUX_OCCUPANTS view will indicate the procedure that you should use to move the related occupant from the SYSAUX tablespace to another tablespace. This can also be a method of "reorganizing" your component object tables, should that be required. Note: The loss of the SYSAUX tablespace is not fatal to your database. In our testing it appears that the only real impact is that certain functionality related to the occupants of the SYSAUX tablespace is lost. CREATE TABLE - Insights Tables are the most common structures in an Oracle database to store your application data. These structures contain header and data sections. The segment header contains freelist information, the block header contains transaction entries and the row header contains row length (one byte if less than 256 bytes, else 3 bytes if longer than 256 bytes) and data type indicators. The data, or row sections contain the actual data values. The header grows from the bottom up while the data area grows from the top down. Table placement in relation to other database objects is important for optimal database performance. You should attempt to minimize disk contention by placing tables away from other IO intensive database objects such as redo logs, rollback segments, and other highly active database segments. For some large tables you may want to consider manual striping against several disks however this is a labor-intensive operation and


before considering its use you must thoroughly understand how your data is used. This manual striping of tables involves manually striping tablespace datafiles across several disks and then sizing the table extents such that they just fill a single tablespace datafile. Table partitioning has made manual striping a much easier task. All other database objects either depend completely on tables for existence (such as indexes) or, operate against tables (functions, procedures, packages) or, provide data for tables (sequences). Most tables are relational tables. Oracle 8 introduced the ability to create object tables. Oracle 8i introduced the ability to create index-organized tables and XML-type tables. Requirements

To create a table in your schema, you must have the CREATE TABLE system privilege and quota on the tablespace you wish to create the table in, or the UNLIMITED TABLESPACE system privilege. To create a table in a different schema, you must have CREATE ANY TABLE system privilege. The owner of the table must have quota on the tablespace you wish to create the table in, or the UNLIMITED TABLESPACE system privilege.


Constraint Insights

The foreign key constraint is enforced; however, no index is automatically generated. It is suggested that indexes be maintained on foreign keys or else excessive full table scans may result on DML operations.

Cache Insights

You can specify table caching on a per-table basis with the CACHE clause of the CREATE TABLE, ALTER TABLE, CREATE CLUSTER, and ALTER CLUSTER commands. You should use a hint to specify the caching behavior if you want a table treated differently upon each access.

Parameter Insights

PCTFREE and PCTUSED Should be set to a small value if the table is infrequently updated (such as reference or code lists). If a value of 0 (zero) is specified then no space is reserved for update activity and any update will force a row to chain once the data block is filled. Should be set to a high value for tables which have large amounts of update activity. A high value will leave unused space, in infrequently updated tables this equates to wasted space.


For example, for a high use (update activity) table you may wish to set PCTFREE to 40 and PCTUSED to 60 or less (remember, the sum cannot exceed 100) . For a low update table (such as a history table) you may wish to set PCTFREE to 5 and PCTUSED to 95 or less. For a table where you have initially loaded data, but expect updates, run statistics to determine initial average row length, then allow updates to occur and re-run the statistics. By comparing the first average row length to the second you will see the amount that an average update changes the row, set PCTFREE accordingly. growth rate = ((L2-L1)/L1)*100 Where: L2 = Second value for average row length L1 = First value for average row length (before updates) The sum of PCTUSED and PCTFREE can never exceed 100, but it should be less than 100. If PCTUSED plus PCTFREE equals 100, then the block can be bounced on and off the freelist with every UPDATE and DELETE on this block.


INITRANS is used to set the minimum number of transactions that can access a data block. Normally this parameter (which defaults to 1) should not be changed. MAXTRANS is used to specify the maximum number of transactions that can access the datablocks within a table. This parameter (which defaults to 255) shouldn't be altered from its default.


Creating Tables with Default Values You define a default value when you issue a create table statement, as shown here: CREATE TABLE my_table ( id NUMBER PRIMARY KEY, uname VARCHAR2(30) DEFAULT 'UNKNOWN'); In this case, we have defined a default value for the uname column. This default value will be used only if the default keyword is used in an insert or update of the column. Materialized Views - Overview The concept of snapshots was expanded in Oracle 8i to include materialized views. Materialized views are similar to snapshots except that they can reside in the same database as their master table(s). Another powerful feature of materialized views is that they can be the subject of DIMENSIONS, that explains to the optimizer how a


hierarchical or parent-child relationship in a materialized view is constructed, thus allowing query re-write. Query re-write is when the optimizer recognizes that a query on a materialized views base table(s) can be re-written to use the materialized view to operate more efficiently. Materialized Views (Snapshots) Another feature of Oracle that needs administration is the snapshot, now known as a materialized view. Materialized views are copies of either an entire single table or set of its rows (simple snapshot) or a collection of tables, views, or their rows using joins, grouping, and selection criteria (complex snapshots). Materialized views are very useful in a distributed environment where remote locations need a query-able copy of a table from the master database. Instead of paying the penalty for using the network to send out the query and get back the data, the query is against a local table image and is thus much faster. With later versions of Oracle 7 and in Oracle 8 and Oracle 8i, snapshots can be made update-able. The new materialized view is actually a special form of “same database” snapshot. Materialized views are asynchronous in nature; it reflects a table’s or a collection’s state at the time the snapshot was taken. A simple snapshot or materialized view can be periodically refreshed by the use of either a snapshot log containing only the changed rows for the snapshot (fast refresh), or a totally new copy (complete refresh). In most cases, the fast refresh is quicker and just as accurate. A fast refresh can only be used if the snapshot or materialized view has a log, and that log was created prior to the creation or last refresh of the snapshot. For a complex snapshot or materialized view, a complete refresh is required. It is also possible to allow the system to decide which to use. One problem with a snapshot or materialized view log is that it keeps a copy of each and every change to a row. Therefore, if a row undergoes 200 changes between one refresh and the next, there will be 200 entries in the snapshot or materialized view log that will be applied to the snapshot at refresh. This could lead to the refresh of the snapshot or materialized view taking longer than a complete refresh. Each snapshot or materialized view should be examined for the amount of activity it is seeing and if this is occurring with any of them, the snapshot or materialized view log should be eliminated or the refresh mode changed to COMPLETE. A materialized view is simply a snapshot that is contained in the current instance instead of a remote instance. Other than the keyword MATERIALIZED VIEW the CREATE SNAPSHOT and CREATE SNAPSHOT LOG commands are identical to the CREATE MATERIALIZED VIEW and CREATE MATERIALIZED VIEW LOG commands. If you wish query rewrite to be available on the materialized views created, the owner of the underlying tables and the materialized view must have QUERY REWRITE or, the creator of the materialized view must have GLOBAL QUERY REWRITE privilege.


In a data warehousing situation a materialized view can be used by Oracle to re-write queries on the fly that the optimizer determines would profit from using the materialized view rather than the base tables. You should take this into consideration when the concurrency of the data is important since a materialized view is only as current as its last refresh. Materialized View Logs In order to facilitate fast refreshes of materialized views, you must create a materialized view log. If a materialized view log is not available for a materialized view, a fast refresh cannot be done and a complete refresh is the only refresh mechanism. A complete refresh involves truncating the materialized view table and then repopulating the materialized view by re-executing its build query. A fast refresh uses the information in the materialized view log to only apply changes to the materialized view. A materialized view can have a materialized view log to speed refresh operations. The materialized view log must exist before a materialized view will use it. Partitioned Tables Oracle 8 provides the next step in partitioning data with the advent of partitioned tables and partitioned indexes. Now, rather than just striping extents, or creating a partitioned view, DBAs can partition based on data values across multiple tablespaces. Oracle supports very large partitioning schemes, allowing you to create up to 64,000 partitions for a single object. A partitioned table has to be a straight relational table in Oracle 8, in Oracle8i this restriction is removed and you must be careful to allow for all LOB or Nested storage to be carried through to all partition storage areas. A partitioned table is used to split up a table's data into separate physical as well as logical areas. This gives the benefits of being able to break up a large table in more manageable pieces and allows the Oracle kernel to more optimally retrieve values. Equipartitioned And Nonequipartitioned Objects You can create equipartitioned objects or nonequipartitioned objects. Two tables or indexes are said to be equipartitioned if they have identical logical partitioning attributes. However, objects don't have to be of the same type to be equipartitioned; a table and an index can be equipartitioned if they have identical logical partitioning attributes. The benefits of equipartitioning include:

Offering possible performance increases in SQL operations. Equipartitioning table partitions and index partitions can reduce the number of sorts and joins that the statement requires.



Making tablespace recovery much easier. For example, you may equijoin a parent and child table, placing them in the same tablespace. Thus, you could recover corresponding table and index partitions to a point in time using tablespace point in time recovery available with Oracle8. Equipartitioning might also make development platforms easier to use if you often have to recover certain large tables because of data-destructive testing. If you can partition the data that is being changed into its own partition and its own tablespace, recovery back to the point in time before the destructive testing might be quicker. Creation and maintenance of equipartitioned indexes on a partitioned table is generally easier than creation and maintenance on nonequipartitioned indexes.


The Partition Key Of A Partitioned Table A partition key may consist of up to 16 columns and is an ordered list of the values for the partitioning columns. A partition key may not contain the LEVEL, ROWID, or MLSLABEL pseudocolumns or any column that is of type ROWID. The partition key is sorted by its binary value. Partitioning allows you to size each partition as you wish. Controlling the size of each partition is handy in cases where the data is cyclical: Large volumes come in during well-defined times, and smaller volumes come in during others. You can then more effectively manage your disk space, allocating a smaller amount of space to one partition and allocating a larger amount to another. Partitioned Table Example In this example we have a sales entity that will store results from sales for the last twelve months. This type of table is a logical candidate for partitioning because: 1. Its values have a clear separator (months). 2. It has a sliding range (the last year). 3. We usually access this type of date by sections (months, quarters, years). The DDL for this type of table would look like this: CREATE TABLE sales ( acct_no NUMBER(5), sales_person VARCHAR2(32), sales_month NUMBER(2), amount_of_sale NUMBER(9,2), po_number VARCHAR2(10)) PARTITION BY RANGE (sales_month) ( PARTITION sales_mon_1 VALUES LESS THAN (2), PARTITION sales_mon_2 VALUES LESS THAN (3), PARTITION sales_mon_3 VALUES LESS THAN (4), PARTITION sales_mon_4 VALUES LESS THAN (5),


PARTITION sales_mon_5 VALUES LESS THAN (6), PARTITION sales_mon_6 VALUES LESS THAN (7), PARTITION sales_mon_7 VALUES LESS THAN (8), PARTITION sales_mon_8 VALUES LESS THAN (9), PARTITION sales_mon_9 VALUES LESS THAN (10), PARTITION sales_mon_10 VALUES LESS THAN (11), PARTITION sales_mon_11 VALUES LESS THAN (12), PARTITION sales_mon_12 VALUES LESS THAN (13), PARTITION sales_bad_mon VALUES LESS THAN (MAXVALUE)); In the above example we created the sales table with 13 partitions, one for each month plus an extra to hold improperly entered months (values >12). Always specify a last partition to hold MAXVALUE values for your partition values. The MAXVALUE clause allows you to insert partition key values into a partitioned table greater than the maximum partitioned boundary. The MAXVALUE clause represents all values possible that are higher than the greatest partition boundary in a partitioned table Subpartitioning New to Oracle 8i is the concept of subpartitioning. Subpartitioning allows a table partition to be further subdivided to allow for better spread of large tables. In the next example we create a table for tracking the storage of data items stored by various departments. We partition by storage date on a quarterly basis and do a further storage subpartition on data_item. The normal activity quarters have 4 partitions, the slowest has 2 and the busiest has 8. CREATE TABLE test5 (data_item INTEGER, length_of_item INTEGER, storage_type VARCHAR(30), owning_dept NUMBER, storage_date DATE) PARTITION BY RANGE (storage_date) SUBPARTITION BY HASH(data_item) SUBPARTITIONS 4 STORE IN (data_tbs1, data_tbs2, data_tbs3, data_tbs4) (PARTITION q1_1999 VALUES LESS THAN (TO_DATE('01-apr-1999', 'dd-mon-yyyy')), PARTITION q2_1999 VALUES LESS THAN (TO_DATE('01-jul-1999', 'dd-mon-yyyy')), PARTITION q3_1999 VALUES LESS THAN (TO_DATE('01-oct-1999', 'dd-mon-yyyy')) (SUBPARTITION q3_1999_s1 TABLESPACE data_tbs1, SUBPARTITION q3_1999_s2 TABLESPACE data_tbs2), PARTITION q4_1999 VALUES LESS THAN (TO_DATE('01-jan-2000', 'dd-mon-yyyy')) SUBPARTITIONS 8 STORE IN (q4_tbs1, q4_tbs2, q4_tbs3, q4_tbs4, q4_tbs5, q4_tbs6, q4_tbs7, q4_tbs8),


PARTITION q1_2000 VALUES LESS THAN (TO_DATE('01-apr-2000', 'dd-mon-yyyy'))): / The items to notice in the above code example is that the partition level commands override the default subpartitioning commands, thus, partition Q3_1999 only gets two subpartitions instead of the default of 4 and partition Q4_1999 gets 8. The main partitions are partitioned based on date logic while the subpartitions use a hash value calculated off of a VARCHAR2 value. The subpartitioning is done on a round robin fashion depending on the hash value calculated filling the subpartitions equally. Note that no storage parameters were specified in the example. The tablespaces were created such that the default storage for the tablespaces matched what we needed for the subpartitions. This made the example code easier to write and clearer to use for the visulaization of the process involved. Modifying Table Partitions You can modify a table or table partition in any of the following ways. You cannot combine partition operations with other partition operations or with operations on the base table in one ALTER TABLE statement. ADD PARTITION Use ALTER TABLE ADD PARTITION to add a partition to the high end of the table (after the last existing partition). If the first element of the partition bound of the high partition is MAXVALUE, you cannot add a partition to the table. You must split the high partition.
 

You can add a partition to a table even if one or more of the table indexes or index partitions are marked UNUSABLE. You must use the SPLIT PARTITION clause to add a partition at the beginning or the middle of the table.

The following example adds partition JAN98 to tablespace YR98: ALTER TABLE sales ADD PARTITION jan98 VALUES LESS THAN( '980201' ) TABLESPACE yr98; DROP PARTITION ALTER TABLE DROP PARTITION drops a partition and its data. If you want to drop a partition but keep its data in the table, you must merge the partition into one of the adjacent partitions.



If you drop a partition and later insert a row that would have belonged to the dropped partition, the row will be stored in the next higher partition. However, if you drop the highest partition, the insert will fail because the range of values represented by the dropped partition is no longer valid for the table. This statement also drops the corresponding partition in each local index defined on table. The index partitions are dropped even if they are marked as unusable. If there are global indexes defined on table, and the partition you want to drop is not empty, dropping the partition marks all the global, nonpartitioned indexes and all the partitions of global partitioned indexes as unusable. When a table contains only one partition, you cannot drop the partition. You must drop the table.




The following example drops partition JAN96: ALTER TABLE sales DROP PARTITION jan96; EXCHANGE PARTITION This form of ALTER TABLE converts a partition to a nonpartitioned table and a table to a partition by exchanging their data segments. You must have ALTER TABLE privileges on both tables to perform this operation. The statistics of the table and partition, including table, column, index statistics, and histograms, are exchanged. The aggregate statistics of the partitioned table are recalculated. The logging attribute of the table and partition is exchanged. The following example converts partition FEB97 to table SALES_FEB97: ALTER TABLE sales EXCHANGE PARTITION feb97 WITH TABLE sales_feb97 WITHOUT VALIDATION; MODIFY PARTITION Use the MODIFY PARTITION options of ALTER TABLE to:
  

Mark local index partitions corresponding to a table partition as unusable. Rebuild all the unusable local index partitions corresponding to a table partition. Modify the physical attributes of a table partition.

The following example marks all the local index partitions corresponding to the APR96 partition of the procurements table UNUSABLE: ALTER TABLE procurements MODIFY PARTITION apr96


UNUSABLE LOCAL INDEXES; The following example rebuilds all the local index partitions which were marked UNUSABLE: ALTER TABLE procurements MODIFY PARTITION jan98 REBUILD UNUSABLE LOCAL INDEXES; The following example changes MAXEXTENTS for partition KANSAS_OFF: ALTER TABLE branch MODIFY PARTITION kansas_off STORAGE(MAXEXTENTS 100) LOGGING; MOVE PARTITION This ALTER TABLE option moves a table partition to another segment. MOVE PARTITION always drops the partition's old segment and creates a new segment, even if you do not specify a new tablespace.

If partition_name is not empty, MOVE PARTITION marks all corresponding local index partitions and all global nonpartitioned indexes and all the partitions of global partitioned indexes as unusable. ALTER TABLE MOVE PARTITION obtains its parallel attribute from the PARALLEL clause, if specified. If not specified, the default PARALLEL attributes of the table, if any, are used. If neither is specified, it performs the move without using parallelism. The PARALLEL clause on MOVE PARTITION does not change the default PARALLEL attributes of table.



The following example moves partition STATION3 to tablespace TS097: ALTER TABLE trains MOVE PARTITION station3 TABLESPACE ts097 NOLOGGING; MERGE PARTITION While there is no explicit MERGE statement, you can merge a partition using either the DROP PARTITION or EXCHANGE PARTITION clauses. You can use either of the following strategies to merge table partitions. If you have data in partition PART1, and no global indexes or referential integrity constraints on the table, PARTS, you can merge table partition PART1 into the next highest partition, PART2. To merge partition PART1 into partition PART2:


1. Export the data from PART1. 2. Issue the following statement: ALTER TABLE PARTS DROP PARTITION PART1;

3. Import the data from step 1 into partition PART2. Note: The corresponding local index partitions are also merged. Here is another way to merge partition PART1 into partition PART2: 1. Exchange partition PART1 of table PARTS with "dummy" table PARTS_DUMMY. 2. Issue the following statement: ALTER TABLE PARTS DROP PARTITION PART1;

3. Insert as SELECT from the "dummy" tables to move the data from PART1 back into PART2. SPLIT PARTITION The SPLIT PARTITION option divides a partition into two partitions, each with a new segment, new physical attributes, and new initial extents. The segment associated with the old partition is discarded.

This statement also performs a matching split on the corresponding partition in each local index defined on table. The index partitions are split even if they are marked unusable. With the exception of the TABLESPACE attribute, the physical attributes of the LOCAL index partition being split are used for both new index partitions. If the parent LOCAL index lacks a default TABLESPACE attribute, new LOCAL index partitions will reside in the same tablespace as the corresponding newly created partitions of the underlying table. If you do not specify physical attributes (PCTFREE, PCTUSED, INITRANS, MAXTRANS, STORAGE) for the new partitions, the current values of the partition being split are used as the default values for both partitions.





If partition_name is not empty, SPLIT PARTITION marks all affected index partitions as unusable. This includes all global index partitions as well as the local index partitions which result from the split. The PARALLEL clause on SPLIT PARTITION does not change the default PARALLEL attributes of table.


The following example splits the old partition STATION5 creating a new partition for STATION9: ALTER TABLE trains SPLIT PARTITION STATION5 AT ( '50-001' ) INTO ( PARTITION station5 TABLESPACE train009 (MINEXTENTS 2), PARTITION station9 TABLESPACE train010 ) PARALLEL ( DEGREE 9 ); TRUNCATE PARTITION Use TRUNCATE PARTITION to remove all rows from a partition in a table. Freed space is deallocated or reused depending on whether DROP STORAGE or REUSE STORAGE is specified in the clause.

This statement truncates the corresponding partition in each local index defined on table. The local index partitions are truncated even if they are marked as unusable. The unusable local index partitions are marked valid, resetting the UNUSABLE indicator. If there are global indexes defined on table, and the partition you want to truncate is not empty, truncating the partition marks all the global nonpartitioned indexes and all the partitions of global partitioned indexes as unusable. If you want to truncate a partition that contains data, you must first disable any referential integrity constraints on the table. Alternatively, you can delete the rows and then truncate the partition.



The following example deletes all the data in the part_17 partition and deallocates the freed space: ALTER TABLE shipments TRUNCATE PARTITION part_17 DROP STORAGE; RENAME Use the RENAME option of ALTER TABLE to rename a table or to rename a partition. The following example renames a table:


ALTER TABLE emp RENAME TO employee; In the following example, partition EMP3 is renamed: ALTER TABLE employee RENAME PARTITION emp3 TO employee3; Modifying a Table's Sub-partitions The addition of sub-partitioning including the ability to do a mixed partition (hash and range) adds a whole new level of complexity to the Oracle ALTER commands. Let's examine a few ALTER commands for use with Oracle8i and table subpartitions. MODIFY PARTITON partition_name ADD SUBPARTITION This form of the ALTER TABLE command adds a subpartition. In a table where a new subpartition is added, any local indexes involving the subpartition must be rebuilt. To add subpartions, the table must already be composit partitioned (have subpartitions.) An example would be to add 4 bi-weekly subpartitions to the SALES table for each month (assuming a weekly subpartition already exists): ALTER TABLE sales MODIFY PARTITION feb97 ADD SUBPARTITION biweek1 TABLESPACE feb_subs; ALTER TABLE sales MODIFY PARTITION feb97 ADD SUBPARTITION biweek2 TABLESPACE feb_subs; ALTER TABLE sales MODIFY PARTITION feb97 ADD SUBPARTITION biweek3 TABLESPACE feb_subs; ALTER TABLE sales MODIFY PARTITION feb97 ADD SUBPARTITION biweek4 TABLESPACE feb_subs; EXCHANGE SUBPARTITION This form of ALTER TABLE converts a partition to a nonpartitioned table and a table to a partition by exchanging their data segments. You must have ALTER TABLE privileges on both tables to perform this operation.

The statistics of the table and partition, including table, column, index statistics, and histograms, are exchanged. The aggregate statistics of the partitioned table are recalculated. The logging attribute of the table and partition is exchanged.

The following example converts partition FEB97 to table SALES_FEB97: ALTER TABLE sales


  

Mark local index partitions corresponding to a table partition as unusable. Rebuild all the unusable local index partitions corresponding to a table partition. Modify the physical attributes of a table partition.

The following example marks all the local index partitions corresponding to the APR96 partition of the procurements table UNUSABLE: ALTER TABLE procurements MODIFY PARTITION apr96 UNUSABLE LOCAL INDEXES; The following example rebuilds all the local index partitions which were marked UNUSABLE: ALTER TABLE procurements MODIFY PARTITION jan98 REBUILD UNUSABLE LOCAL INDEXES; The following example changes MAXEXTENTS for partition KANSAS_OFF: ALTER TABLE branch MODIFY PARTITION kansas_off STORAGE(MAXEXTENTS 100) LOGGING; MOVE PARTITION This ALTER TABLE option moves a table partition to another segment. MOVE PARTITION always drops the partition's old segment and creates a new segment, even if you do not specify a new tablespace.

If partition_name is not empty, MOVE PARTITION marks all corresponding local index partitions and all global nonpartitioned indexes and all the partitions of global partitioned indexes as unusable. ALTER TABLE MOVE PARTITION obtains its parallel attribute from the PARALLEL clause, if specified. If not specified, the default PARALLEL attributes of the table, if any, are used. If neither is specified, it performs the move without using parallelism. The PARALLEL clause on MOVE PARTITION does not change the default PARALLEL attributes of table.



The following example moves partition STATION3 to tablespace TS097:


ALTER TABLE trains MOVE PARTITION station3 TABLESPACE ts097 NOLOGGING; MERGE PARTITION While there is no explicit MERGE statement, you can merge a partition using either the DROP PARTITION or EXCHANGE PARTITION clauses. You can use either of the following strategies to merge table partitions. If you have data in partition PART1, and no global indexes or referential integrity constraints on the table, PARTS, you can merge table partition PART1 into the next highest partition, PART2. To merge partition PART1 into partition PART2: 1. Export the data from PART1. 2. Issue the following statement: ALTER TABLE PARTS DROP PARTITION PART1;

3. Import the data from step 1 into partition PART2. Note: The corresponding local index partitions are also merged. Here is another way to merge partition PART1 into partition PART2: 1. Exchange partition PART1 of table PARTS with "dummy" table PARTS_DUMMY. 2. Issue the following statement: ALTER TABLE PARTS DROP PARTITION PART1;

3. Insert as SELECT from the "dummy" tables to move the data from PART1 back into PART2. SPLIT PARTITION The SPLIT PARTITION option divides a partition into two partitions, each with a new segment, new physical attributes, and new initial extents. The segment associated with the old partition is discarded.

This statement also performs a matching split on the corresponding partition in each local index defined on table. The index partitions are split even if they are


marked unusable.

With the exception of the TABLESPACE attribute, the physical attributes of the LOCAL index partition being split are used for both new index partitions. If the parent LOCAL index lacks a default TABLESPACE attribute, new LOCAL index partitions will reside in the same tablespace as the corresponding newly created partitions of the underlying table. If you do not specify physical attributes (PCTFREE, PCTUSED, INITRANS, MAXTRANS, STORAGE) for the new partitions, the current values of the partition being split are used as the default values for both partitions. If partition_name is not empty, SPLIT PARTITION marks all affected index partitions as unusable. This includes all global index partitions as well as the local index partitions which result from the split. The PARALLEL clause on SPLIT PARTITION does not change the default PARALLEL attributes of table.




The following example splits the old partition STATION5 creating a new partition for STATION9: ALTER TABLE trains SPLIT PARTITION STATION5 AT ( '50-001' ) INTO ( PARTITION station5 TABLESPACE train009 (MINEXTENTS 2), PARTITION station9 TABLESPACE train010 ) PARALLEL ( DEGREE 9 ); TRUNCATE PARTITION Use TRUNCATE PARTITION to remove all rows from a partition in a table. Freed space is deallocated or reused depending on whether DROP STORAGE or REUSE STORAGE is specified in the clause.

This statement truncates the corresponding partition in each local index defined on table. The local index partitions are truncated even if they are marked as unusable. The unusable local index partitions are marked valid, resetting the UNUSABLE indicator. If there are global indexes defined on table, and the partition you want to truncate is not empty, truncating the partition marks all the global nonpartitioned indexes and all the partitions of global partitioned indexes as unusable.




If you want to truncate a partition that contains data, you must first disable any referential integrity constraints on the table. Alternatively, you can delete the rows and then truncate the partition.

The following example deletes all the data in the part_17 partition and deallocates the freed space: ALTER TABLE shipments TRUNCATE PARTITION part_17 DROP STORAGE; RENAME Use the RENAME option of ALTER TABLE to rename a table or to rename a partition. The following example renames a table: ALTER TABLE emp RENAME TO employee; In the following example, partition EMP3 is renamed:
ALTER TABLE employee RENAME PARTITION emp3 TO employee3;

Buffer Cache - Overview
Click on each link below for information on that topic.
    

Cache Hit Ratio Buffer Aging and the LRU Lists DBWR and the Dirty List Full Table Scans: CACHE and NOCACHE Buffer Cache Problems

All Oracle data is obtained by users from the buffer cache. The basic purpose of the cache is to minimize physical disk I/O by holding (buffering) copies of requested data blocks in memory. Data that is buffered in the cache can be served to users at memory access speed, much faster than going to disk for it. The large performance penalty of disk I/O makes tuning the buffer cache an extremely important task for the DBA. Oracle data is stored on disk in identically sized units called blocks. Block size is determined at database creation by the DB_BLOCK_SIZE initialization parameter. The buffer cache is also divided into buffers of this same size, and each buffer can hold exactly one database block. Thus, the block is the basic unit of data transfer between disk and memory in Oracle. 63

The figure below shows a schematic of block movement between disk and the buffer cache through user processes and the DBWR background process. User processes obtain blocks from the cache if they can, otherwise they read from disk into the cache. DBWR is responsible for writing dirty (modified) blocks out from the cache to disk.

Cache Hit Ratio
The buffer cache is a shared resource, accessible by all users. When a user process requests data, Oracle firsts looks for that data's block in the cache. If the data is buffered in the cache it is returned to the requestor immediately. This is called a cache hit. When the data is not found, a cache miss occurs and the user process reads the data from disk into an available buffer in the cache. The cache hit ratio is the overall percentage of data requests that are served directly from the cache. In Oracle, the buffer cache hit ratio is normally computed using the following formula:
Cache Hit Ratio = 100 * (1 - physical reads/logical reads)

In this formula, 'physical reads' corresponds to cache misses and 'logical reads' corresponds to total data requests. Tuning the buffer cache for optimum performance usually involves adding buffers to the cache until the hit ratio has been maximized. The number of buffers in the cache is specified by the DB_BLOCK_BUFFERS initialization parameter.

Buffer Aging and LRU Lists
An Oracle database typically has many more data blocks on disk than memory buffers in the buffer cache. Since not all blocks can be buffered in the cache at once, new block requests (cache misses) must find room in the cache to be read in from disk. When this happens, another block in the cache is usually discarded since the cache is normally full (and fixed in size). The buffer cache is carefully designed to favor keeping frequently requested blocks in memory and allow less popular blocks to be replaced by new block requests. These decisions about which blocks to replace are made using a least recently used (or LRU) algorithm. This algorithm uses a data structure called the LRU list. This list basically orders the buffers in the cache according to when they were last accessed by a user. When a block is accessed, it is moved to the MRU (most recently used) end of the list. Blocks in the cache that are not accessed for awhile will find more and more blocks ahead of them in the list, and they will be closer to the LRU end of the list. This is also known as buffer aging in the LRU list. Buffers are replaced in the cache from the least recently used end of the LRU list. This helps insure that frequently accessed buffers are not discarded, as they are regularly


moved to the MRU end of the list with each access. This mechanism of keeping the most requested blocks in the buffer cache is normally very effective at minimizing disk I/O. Managing a single LRU list can sometimes be a bottleneck in a heavily loaded database. The buffer cache can be divided into multiple working sets of buffers, each of which is effectively an individual LRU list. The number of working sets (LRU lists) used by Oracle is determined by the DB_BLOCK_LRU_LATCHES initialization parameter. This parameter is also important for configuring multiple buffer pools, as each pool must have at least one working set (and thus LRU latch).

DBWR and the Dirty List
In addition to the LRU list Oracle keeps a list of buffers that contain data that has been modified by users. This list is called the dirty list. Changed data committed by users must eventually be written to disk, as this is the permanent storage of the database. The DBWR background process is responsible for moving blocks from the dirty list to their permanent locations in disk files. Dirty blocks cannot be replaced in the cache until they have been written to disk, otherwise the changes would be lost. An overabundance of dirty buffers can negatively impact cache efficiency by reducing available slots for new blocks. This can happen when DBWR is unable to keep up with the volume of update activity. Multiple DBWR processes can be configured in this case to increase the capacity to write out dirty blocks.

Full Table Scans: CACHE and NOCACHE
The LRU aging algorithm of Oracle's buffer cache treats blocks accessed for full table scans differently than blocks accessed randomly based on the observation that full scans could flood the MRU end of the list with blocks that have low probability of re-use. To avoid the negative impact on cache performance that this could cause, Oracle places blocks from full scans at the LRU end of the list, allowing them to be quickly replaced rather than aging other blocks. In certain cases (like small lookup tables) full table scan blocks may actually be good candidates for re-use and would be better placed at the MRU end of the list. Oracle provides the ability to override the default cache treatment of full table scans (for large or small tables) at the table level using the CACHE or NOCACHE clauses of the CREATE TABLE and ALTER TABLE statements. Specifying CACHE indicates that the table's blocks should be placed at the MRU end of the LRU list when read during a full table scan.


Full scan LRU treatment of blocks can also be specified at the individual SQL statement level using the CACHE and NOCACHE hints.

Buffer Cache Problems
The LRU algorithm of the Oracle buffer cache is normally very good at providing efficient minimization of physical disk I/O. However, there are some situations where normal buffer cache aging may not be the best option for overall performance, for instance:
 

Blocks that should not go to the MRU end of the list Blocks that should be excluded from aging and stay in the cache

The first situation can occur when very large tables are accessed randomly by users with very little block use overlap between users. In this case, the MRU end of the list is flooded by blocks that will not result in subsequent cache hits yet age other blocks down to the LRU end of the list. These other blocks may be replaced when they could have resulted in cache hits had they been kept. The second situation occurs when there are data blocks that will definitely be requested regularly and we want to insure that physical disk I/O is not necessary to obtain their data. An example might be small lookup tables, or perhaps specific indexes.
The multiple buffer pool feature allows greater control over buffer cache usage to help address these problems and obtain even better cache performance.

Buffer Cache - Data Dictionary Views
Click on each link below for information on that topic.
  


There are several Oracle fixed tables and views that provide valuable information about the buffer cache and the new buffer pools.

The X$BH Fixed Table

Perhaps the most complete source of information about the Oracle buffer cache is the X$BH fixed table. This table contains one row for every buffer in the buffer cache. There are more than 50 columns in this table, and they are not officially documented. Much of the information is relevant mostly to Oracle Parallel Server environments, where buffer cache coordination between instances introduces additional complexity. The following table includes interesting and useful information on some of the columns in X$BH. Column ADDR BUF# OBJ TS# FILE# DBARFIL DBABLK SET_DS CLASS STATE FLAG Data Type RAW Description Address of buffer in SGA

NUMBER Sequential number of buffer in cache NUMBER Object id of segment owning block in the buffer NUMBER Tablespace number owning block in the buffer NUMBER File number owning block in the buffer NUMBER Relative file number of the block in the buffer NUMBER Block number in file of block in the buffer RAW Joins to X$KCBWDS.ADDR to identify working set of buffer NUMBER Block class (1 = data block, 4 = header block) NUMBER Used to determine status of block (e.g. FREE) NUMBER Encodes various information about the block, including dirty status and whether read sequentially or random Address of next buffer on LRU list Address of previous buffer on LRU list


As with the other internal fixed tables, X$BH can only be queried by the SYS user. Oracle provides a view called V$BH which externalizes X$BH, however V$BH is designed specifically for Oracle Parallel Server installations and will not display information when queried from a non-OPS instance. This is unfortunate, since X$BH has such interesting information. Another shortcoming of X$BH is that it does not tell directly which buffer pool a block currently belongs to, which can be very useful information when testing this new feature.

A view called Z$BUFFER_CACHE has been created to overcome some of the shortcomings of X$BH. This view externalizes some of the useful X$BH information noted above as well as joining with other X$ tables to determine the buffer pool of each block. One interesting fact is that these other X$ tables have changed somewhat between Oracle8.0 and Oracle8i, so two different versions of the view definition are required.


The following code segments define the view Z$BUFFER_CACHE. They must be run as SYS and then SELECT privilege on the view granted out to other users requiring buffer cache information.
rem ********************************************* rem * Externalizes useful buffer cache information rem * from the X$BH and X$KCBWBPD fixed tables, rem * Oracle8 version. rem ********************************************* CREATE OR REPLACE VIEW z$buffer_cache (buf_addr ,buf_no ,dba_file ,dba_blk ,tbs_id ,obj_id ,blk_class ,status ,pool ,dirty ,io_type ,nxt_repl ,prv_repl ) AS SELECT bh.addr ,bh.buf# ,bh.dbarfil ,bh.dbablk ,bh.ts# ,bh.obj ,bh.class ,DECODE(bh.state,0,'FREE',1,'XCUR',2,'SCUR', 3,'CR',4,'READ',5,'MREC',6,'IREC') ,bp.bp_name ,DECODE(BITAND(bh.flag,1),0,'N','Y') ,DECODE(BITAND(bh.flag,524288),0,'RANDOM','SEQUENTIAL') ,nxt_repl ,prv_repl FROM x$kcbwbpd bp ,x$bh bh WHERE bp.bp_size > 0 AND bh.buf# >= bp.bp_lo_bnum AND bh.buf# <= bp.bp_hi_bnum AND bh.inst_id = USERENV('Instance') AND bp.inst_id = USERENV('Instance'); rem ********************************************* rem * Externalizes useful buffer cache information rem * from the X$BH, X$KCBWBPD and X$KCBWDS fixed rem * tables, Oracle8i version. rem ********************************************* CREATE OR REPLACE VIEW z$buffer_cache


(buf_addr ,buf_no ,dba_file ,dba_blk ,tbs_id ,obj_id ,blk_class ,status ,pool ,dirty ,io_type ,nxt_repl ,prv_repl ) AS SELECT bh.addr ,bh.buf# ,bh.dbarfil ,bh.dbablk ,bh.ts# ,bh.obj ,bh.class ,DECODE(bh.state,0,'FREE',1,'XCUR',2,'SCUR', 3,'CR',4,'READ',5,'MREC',6,'IREC') ,bp.bp_name ,DECODE(BITAND(bh.flag,1),0,'N','Y') ,DECODE(BITAND(bh.flag,524288),0,'RANDOM','SEQUENTIAL') ,ds.nxt_repl ,ds.prv_repl FROM x$kcbwbpd ,x$bh ,x$kcbwds WHERE AND AND AND AND AND AND AND bp.bp_size > 0 ds.set_id >= bp.bp_lo_sid ds.set_id <= bp.bp_hi_sid bh.buf# >= ds.start_buf# bh.buf# <= ds.end_buf# bh.inst_id = USERENV('Instance') bp.inst_id = USERENV('Instance') ds.inst_id = USERENV('Instance'); bp bh ds

The Z$BUFFER_CACHE view can answer all kinds of useful queries about the buffer cache and pools. For instance, the following SQL shows buffer counts by pool, object owner, type and I/O type:
SELECT pool ,owner ,object_type ,io_type ,count(*) FROM z$buffer_cache BC ,dba_objects O WHERE BC.obj_id = O.object_id


GROUP BY pool,owner,object_type,io_type;


The V$BUFFER_POOL view supplies basic information about how many LRU working sets and buffers are assigned to each pool, and the range of buffers assigned to each pool. The underlying fixed table is X$KCBWBPD, which is joined to X$BH in the Z$BUFFER_CACHE view to associate each buffer with the pool it belongs to. Normally V$BUFFER_POOL will not be very useful except to verify the buffer pool configuration. The V$BUFFER_POOL_STATISTICS view contains lots of information about buffer cache performance. It is created by the CATPERF.SQL script located in the <ORACLE_HOME>/rdbms/admin directory. CATPERF.SQL is not executed automatically when the database is created so it must be explicitly executed (as SYS) to create the view. The following table lists the columns in V$BUFFER_POOL_STATISTICS. Column Data Type Description ID NAME SET_MSIZE CNUM_REPL CNUM_WRITE CNUM_SET BUF_GOT SUM_WRITE SUM_SCAN FREE_BUFFER_WAIT WRITE_COMPLETE_WAIT BUFFER_BUSY_WAIT FREE_BUFFER_INSPECTED NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER Buffer pool number Maximum set size Total buffers on LRU lists Total buffers on dirty lists Total working set buffers Total buffers gotten Total buffers written Total buffers scanned Total free buffer wait events (V$SYSTEM_EVENT) Total write complete wait events (V$SYSTEM_EVENT) Total buffer busy wait events (V$SYSTEM_EVENT) Total free buffer inspected (V$SYSSTAT) Total dirty buffers inspected (V$SYSSTAT) Total block changes (V$SYSSTAT) Total block gets (V$SYSSTAT) Total consistent gets (V$SYSSTAT) Total physical reads 70 VARCHAR2(20) Buffer pool name


(V$SYSSTAT) Total physical writes (V$SYSSTAT) The statistics in V$BUFFER_POOL_STATISTICS represent totals across all working sets for each pool. Many of the columns give performance statistics at the buffer pool level that were previously only available at the instance level in either V$SYSSTAT or V$SYSTEM_EVENT. The V$BUFFER_POOL_STATISTICS view is key to analyzing the performance of multiple Oracle buffer pools. For instance, we can calculate the individual buffer pool hit ratios using the following SQL: PHYSICAL_WRITES NUMBER
SELECT name buffer_pool ,100 * (1-(physical_reads/(db_block_gets+consistent_gets))) hit_ratio FROM v$buffer_pool_statistics;

Buffer Pools - Overview
The buffer cache can be divided into three separate pools of buffers to help segregate blocks with different LRU aging requirements. These pools are known as the RECYCLE, KEEP and DEFAULT buffer pools.

In the first case, random access blocks flooding the MRU end of the list will not have a high re-use rate, yet they can age more desirable blocks out of the cache. Physical disk reads will be incurred when these other blocks are subsequently requested again. It would be better if these random access blocks could "age faster" in the cache and be quickly replaced themselves rather than pushing other blocks out. The Oracle8 RECYCLE buffer pool is specifically designed to provide working sets of buffers that can be rapidly aged out of the cache. By isolating blocks from large, randomly accessed tables away from the rest of the buffer cache the RECYCLE pool relieves the pressure these blocks can place on the LRU list. This allows buffers from other objects to age less quickly and increases the chances of subsequent cache hits. The RECYCLE pool can also be used to place a strict upper bound on the number of buffers any particular object will consume in the buffer cache. Since blocks from objects assigned to the RECYCLE pool will not likely be needed again soon, the pool itself can often be far smaller than the number of buffers these objects would occupy in the DEFAULT pool. This results in more memory available for the other pools, increasing their efficiency.


The KEEP Pool
Some blocks would be better off if they "aged slower" (or not at all) out of the cache. Imagine a transaction that randomly accesses some table and joins in data from five small lookup tables based on foreign keys. If the blocks from these lookup tables and their indexes are not in the cache, this transaction may incur an additional 10 or more physical disk reads to get the data. When response time is critical, this extra overhead can cause a wide disparity between the best and worst cases. The Oracle8 KEEP buffer pool allows objects to be effectively "pinned" into the buffer cache and excluded from the LRU aging process. The KEEP pool is used to insure the presence of buffers in the cache regardless of when they were last accessed. This feature should be used carefully, though, as pinning infrequently used objects into the pool can be a waste of memory that could be better used elsewhere.

The DEFAULT pool is for all objects not explicitly assigned to one of the other pools. There is always a DEFAULT pool and it will most likely be the largest of the pools under normal circumstances.

Configuring and Using Multiple Buffer Pools
The DEFAULT, KEEP and RECYCLE buffer pools are configured using the initialization parameters shown in the following table. Parameter DB_BLOCK_BUFFERS BUFFER_POOL_KEEP BUFFER_POOL_RECYCLE Description Total number of block buffers for all pools Number of buffers and latches for the KEEP pool Number of buffers and latches for the RECYCLE pool

DB_BLOCK_LRU_LATCHES Total number of LRU latches for all pools

The KEEP and RECYCLE pools are explicitly configured with buffers and LRU latches, the DEFAULT pool is allocated the remainder from the overall totals specified by DB_BLOCK_BUFFERS and DB_BLOCK_LRU_LATCHES. For example, suppose we have 1000 total cache buffers to allocate. We can assign 700 to DEFAULT, 200 to KEEP and 100 to the RECYCLE pools with one LRU latch per pool as follows: 72

db_block_buffers = 1000 db_block_lru_latches = 3 buffer_pool_keep = (buffers:200,lru_latches:1) buffer_pool_recycle = (buffers:100,lru_latches:1)

Note the syntax for the two new parameters BUFFER_POOL_KEEP and BUFFER_POOL_RECYCLE. The minimum size of an LRU working set is 50 buffers, so the number of buffers configured for any of the pools must be at least 50 times the number of LRU latches. Conversely, the number of LRU latches cannot exceed DB_BLOCK_BUFFERS divided by 50. Each LRU working set in a pool is equally sized at the number of buffers divided by the number of latches. The maximum number of LRU latches that can be configured for all the pools is six times the number of CPUs.

Load Balancing Multiple DBWRs
When multiple buffer pools are used in combination with multiple DBWR processes, there is a potential for uneven load balancing to occur between the DBWR processes. The DBWR processes are assigned to LRU working sets in round-robin fashion, and the working sets from different pools may be of different sizes and impose different database writing workloads. For instance, a DBWR assigned to working sets from the KEEP and DEFAULT pools may have a much different workload than one assigned sets from the RECYCLE pool. Oracle recommends that such load imbalances can be avoided by allocating to each pool a number of LRU latches equal to a multiple of the number of DBWR processes. Under this strategy each DBWR will draw the same number of working sets from each pool and thus the load will be balanced.

Assigning Objects to Buffer Pools
Segments can be assigned to buffer pools using the new BUFFER_POOL option of the STORAGE clause. For example, a table can be created and assigned to the KEEP pool as follows:

Other valid values are BUFFER_POOL RECYCLE and BUFFER_POOL DEFAULT. The following statements accept this new STORAGE clause option:


     


When a pool is not explicitly specified, segments are assigned to the DEFAULT pool. Clustered tables inherit the pool of their cluster. Note also that pools can be specified at the partition level for partitioned segments.
If a segment is reassigned from one pool to another using an ALTER command, blocks in the cache from the segment stay in their current pool until aged out. New blocks from the segment will be loaded into the new pool. "N i T i N" <>, "nitin tiwari"

Reorganize tablespaces using Oracle 10g Data Pump

Export tablespaces as a unit
In the past, the export (exp) and import (imp) utilities had three modes: You could export a single table and its dependent objects such as indexes; you could export all objects owned by a specific user; or you could export the entire database. But tablespaces were a problem. Objects owned by many different users could be stored in a given tablespace, but some of their objects might be stored in other tablespaces. So, the only solution was to query the data dictionary to find the exact list of tables and their owners and use table-mode export to export the objects individually. In Oracle 10g, the Data Pump version of export (expdp) lets you directly export all the objects in a tablespace. The TABLESPACES parameter lets you specify which tablespace(s) you want to export.
TABLESPACES=name [,...]

This is particularly useful if you've inherited a database with a lot of dictionary-based tablespaces, and you want to reduce fragmentation by recreating the tablespaces as locally managed, and then re-import the contents.

Rename datafile names during import
When migrating a database from one platform to another prior to 10g, the DBA was required to precreate the tablespaces and their datafiles before importing. Why? Because the dump file created by


export contained datafile pathnames in the format of the original database's operating system. These pathnames would cause errors if used with a different operating system on import. In the 10g Data Pump version of import (impdp), the REMAP_DATAFILE parameter can be used to rename these datafiles on the fly. The format is:

This option is used with FULL imports only, and the userID you specify must have the IMP_FULL_DATABASE role.

Change tablespace names during import
The impdp utility also lets you load objects into different tablespaces than they came from originally. Before 10g, the way to do this was complex. First, you had to remove your quota on the original tablespace so that you had no privileges to write there. Then, you set your default tablespace to the desired one. During the import, objects that were in the original tablespace would be stored in the default tablespace for the user. Then you had to remember to set everything back again when you were done. In 10g import, the REMAP_TABLESPACE parameter makes this a lot easier. You still need to have quota on the destination tablespace, but no other preparation is required. Simply add the parameter:

Objects will be automatically sent to the new tablespace.


To top