Reviewed by Oracle Certified Master Korea Community
( http://www.ocmkorea.com http://cafe.daum.net/oraclemanager )
ORACLE 9I RAC WITH STORAGE FOUNDATION ARCHITECTURES
The promise of Oracle 9i Real Application Clusters is “Applications won’t break and they'll run faster at a lower cost.”
The dynamic business environment places incredible demands on the applications supporting the
business. The data to be managed and stored is growing exponentially, at an unprecedented rate. Business initiatives
depend on the ability to access data and applications 7x24. This paper will explore techniques for optimizing the
configuration of 9i RAC using storage subsystems to streamline the process and assure the highest application and
data availability possible. The different approaches to storage virtualization and the similarities between storage
virtualization architectures and 9i RAC clustering architecture will be discussed in an effort to help identify the best
methods for implementing 9i RAC in network storage environments.
This paper will talk about:
• Some Industry Directions and Trends
• How a Storage Foundation Architecture helps
• Benefits of Oracle 9i RAC
• Storage configuration for 9i RAC
SOME INDUSTRY TRENDS
There are several trends that are increasing the pressure on the IT industry. These are:
• Explosive growth in storage requirements.
• More applications are becoming mission-critical with 7x24 requirements.
• IT staffing challenges; staff need to become more generalist.
• The cost of IT personnel is increasing.
The growth in storage is unprecedented. Between 1.5 and 2 exabytes of data are generated each year. An exabyte is 10^18 bytes.
To put this in perspective, the following list shows the sizes we are talking about.
• KB - kilobyte 10^3 or 2^10
• MB - megabyte 10^6 or 2^20
• GB - gigabyte 10^9 or 2^30
• TB - terabyte 10^12 or 2^40
• PB - petabyte 10^15 or 2^50
• EB - exabyte 10^18 or 2^60
• The number of elementary particles in the universe is estimated at 10^80
This growth means that in four years, companies will have to manage ten times the data that they have today. This
explosive growth is in all areas of IT. As an example, email is the first worldwide mission-critical application,
generating nearly one trillion messages a year.
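As a quick check on the tenfold-in-four-years figure, the implied compound annual growth rate can be worked out directly. The Python sketch below is illustrative only; the function name is an assumption made for this example.

```python
def implied_annual_growth(multiple: float, years: float) -> float:
    """Return the compound annual growth rate that yields `multiple` after `years`."""
    if multiple <= 0 or years <= 0:
        raise ValueError("multiple and years must be positive")
    return multiple ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Ten times the data in four years, as stated in the text.
    rate = implied_annual_growth(10, 4)
    print(f"Implied annual growth: {rate:.1%}")
```

A tenfold increase over four years corresponds to roughly 78% growth per year.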
The following figure shows new data creation. Please notice that the split between unique and replicated content fits the 80/20
rule. The amount that is under the control of a database is significantly smaller.
New Data Creation
The total amount of data under the control of a database is not available, but data warehouse information is tracked,
and the growth in data warehousing is stunning. The growth rate on existing data warehouse projects is a 72%
compound annual growth rate (CAGR). In addition to this growth, the percentage of companies with more than 100 employees
that are adopting data warehousing has gone from 45% in 2000 to an estimated 62% this year. Combining this
information places the storage growth at an estimated 92% CAGR. This gives us an estimated 250 petabytes of
storage for data warehousing alone. The following figure represents the explosive growth in data warehousing.
Data Warehouses Experiencing Dramatic Growth
This information reflects the changes that are occurring today in the data warehouse area. The growth in other
database areas is also great. For example, when I first began working with Oracle ten years ago, a database of 15 to 20
gigabytes was considered “large”. Five years later, I worked on a medium-sized 30 to 40-gigabyte Oracle database. Now,
I hear of “small” 50-gigabyte Oracle databases, and 200 to 300 gigabytes is more normal. A large database today is in the
terabyte range. This is a growth rate of 10 to 15 times in ten years, and the rate is accelerating.
MORE APPLICATIONS ARE MISSION-CRITICAL WITH 7X24 REQUIREMENTS.
One of the things that IT needs to deal with is the increasing demand for improved uptime. This is needed to support
the applications that enable your business to compete and thrive in the dynamic business environment. Today, email
has become mission-critical in many businesses. The 7 by 24 uptime requirement for most, if not all of your
applications, has made planning for any infrastructure changes or software upgrades very difficult.
You may very well have a five 9s target for uptime. This is very challenging: five 9s represents 5 minutes and 15
seconds of downtime over the course of one year. Four 9s is about one minute a week and three 9s is about 10
minutes a week.
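The downtime figures above follow from simple arithmetic. The short Python sketch below reproduces them; it assumes a 365-day year and 52 weeks, and the function name is chosen for this example only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def allowed_downtime_minutes(nines: int) -> float:
    """Minutes of downtime per year permitted at an availability of `nines` nines."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** (-nines)  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        per_year = allowed_downtime_minutes(n)
        per_week = per_year / 52
        print(f"{n} nines: {per_year:.2f} min/year, {per_week:.2f} min/week")
```

Five 9s works out to about 5.26 minutes per year, which matches the 5 minutes and 15 seconds quoted above.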
IT STAFFING CHALLENGES; NEED TO BECOME MORE IT GENERALISTS.
The skill sets of IT personnel are changing; they need to become generalists across the technology.
IT PERSONNEL COST IS INCREASING.
As the hardware costs have decreased, the personnel costs have increased (recruiting, training, retention…). This has
had the effect of driving up total costs.
These trends will continue to apply pressure to IT organizations.
STORAGE FOUNDATION ARCHITECTURE
Just as a well designed database provides a solid foundation for supporting applications, the storage supporting the
systems should provide a solid foundation for the IT infrastructure.
The definition of a Storage Foundation Architecture is an IO subsystem that provides networked storage similar to a
utility. The storage is presented to a server as an open standard SCSI device. The hardware provides the spindle
management and the client systems are provided with the amount of storage that they need with the performance
characteristics that these systems require. The storage foundation device will provide:
• Management of storage, not spindles; all disks are presented as if they are one massive drive
• As storage is needed for a server, it is carved out specifically for that server—creating a virtual disk which is
striped across all available drives
• Many disks brought to bear, more IOs per second, more bandwidth
• Flexibility in Storage creation and use
• It can mix drive sizes and spindle speeds; efficiently use all the drive space
This will enable:
• Mixing and matching RAID levels
• Mixing and matching drive sizes
• Mixing and matching drive speeds
• Adding physical storage (drives) without incurring downtime
• Restriping virtual data volumes (LUNs) to use the new hardware without incurring downtime
The storage foundation will virtualize (simplify) the management of physical devices. It:
• Is an architecture that combines physical disks into a storage pool
• Allows virtual disks (LUNs) to be presented to servers with various characteristics
• RAID Level
• Stripe size
• Allows physical storage (drive) changes without downtime
• Allows restriping virtual data volumes to use the new hardware without downtime
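To make the idea of a virtual disk striped across all available drives concrete, the following Python sketch maps a byte offset within a virtual disk to a physical drive and an offset on that drive. It assumes a simple round-robin layout with a fixed stripe size and does not model mirroring or parity; a real storage array will use its own layout, and the function name is hypothetical.

```python
from typing import Tuple


def locate_block(logical_offset: int, stripe_size: int, num_disks: int) -> Tuple[int, int]:
    """Map a byte offset in a virtual disk to (disk index, byte offset on that disk).

    Assumes plain round-robin striping: consecutive stripe units are placed
    on consecutive disks, wrapping around after the last disk.
    """
    if logical_offset < 0 or stripe_size <= 0 or num_disks <= 0:
        raise ValueError("offset must be >= 0; stripe size and disk count must be > 0")
    stripe_unit = logical_offset // stripe_size   # index of the stripe unit overall
    disk = stripe_unit % num_disks                # units rotate across the disks
    row = stripe_unit // num_disks                # complete rows below this unit
    offset_on_disk = row * stripe_size + logical_offset % stripe_size
    return disk, offset_on_disk


if __name__ == "__main__":
    stripe = 256 * 1024  # 256k stripe size, used here purely as an example value
    for offset in (0, stripe, 8 * stripe, 9 * stripe + 100):
        print(offset, locate_block(offset, stripe, num_disks=8))
```

Because consecutive stripe units land on different drives, a large sequential read or many concurrent small reads are spread over every spindle in the pool.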
The Storage Foundation Architecture is a step toward automated storage management, and in turn toward
the “holy grail” of application-driven, attribute-based storage management.
[Figure: Level of Automation plotted against Time to Implement; labels: Automated Storage, Application Driven, Virtualization, IT Infrastructure]
The Holy Grail of enterprise storage is complete automation of storage allocation and administration.
However, there are iterative steps along the way.
The corporate agenda drives business requirements that almost always have some impact on your IT infrastructure.
The horizontal axis represents the amount of time it takes to implement that change.
As it exists today, most IT infrastructures have little or no virtualization to help automate storage administrative tasks
(you could argue that RAID is a basic level of virtualization).
The storage foundation architecture, and its underlying virtualization technology, simplify many storage administrative
tasks and reduce the time to implement change to your IT environment.
The next step in automation is application driven storage management. By this we are talking about the application
taking control of the storage subsystem to automate tasks, such as breaking mirrors. This, in and of itself, creates
many efficiencies. But when it is coupled with virtualization, it becomes even more powerful.
Where things get really interesting is when we start talking about attribute-based storage management. Now we are able
to allow the administrator (or application) to simply choose the attributes of the storage without having to manage the
underlying physical characteristics of the disks. This requires disk level virtualization.
This Holy Grail of storage will provide the following items.
APPLICATION-DRIVEN, ATTRIBUTE-BASED STORAGE
• Data striped across all spindles
• IO subsystems pooled together
• Automation of key tasks
• All storage (disk and tape) pooled together
• Capacity Management
• Min/Max by LUN or user
• Time of day
• Length of use
• Policy-based growth (JIRO?)
• Bill-back policies
• Performance Level
• Response time
• Availability Level
• Remote replication
• Remote mirroring
• Remote Management
• User-rights allow more flexibility and consistency
• Web-based – manage any storage anywhere
All of these can be driven by the application to automatically provision storage as needed. We are not there yet, but
the industry is moving in this direction, and so is Oracle. With 9i RAC, a cluster file system and
Oracle Managed Files (OMF) can greatly reduce the storage management requirements of the DBA. Oracle can create
the files that it needs, manage the file names and dynamically grow those files as needed. In the future Oracle will be
able to request that the storage subsystem provide a new LUN and then send you an email requesting that more disks
be purchased. Given the incredible storage growth, that may not be a great idea.
BENEFITS OF 9I RAC
Oracle has been touting that 9i RAC is unbreakable. Beyond the marketing hype, there are benefits to the architecture.
• Availability – The loss of a single node in a multiple node cluster may not be noticed...
• Scalability – The ability to add CPU power to the cluster without shutting down the cluster and without requiring data
partitioning provides for a very smooth growth path.
• Horizontal scaling – very cost effective. The hardware cost savings may be 10 to 1.
• Flexibility – With enough nodes in a cluster, one (or more) of the nodes could be dedicated to reporting or end of
period activities and minimize the impact to the other users of the database.
9i RAC is a more complex install, but most of the complexity is in configuring the inter node communication and
One thing that 9i RAC provides is the ability to provision CPU power in a manner that is more granular than the forklift
upgrades we have done in the past. 9i RAC allows the addition of nodes to a cluster and the replacement of existing
nodes in a cluster with more powerful machines. This requires that the OS and the
version of 9i be consistent across the nodes.
CONFIGURING 9I RAC ON A STORAGE FOUNDATION ARCHITECTURE
Before configuring Oracle on a storage foundation architecture, it is important to understand the I/O profiles of the
files that make up the database. These files have different IO requirements. In order to simplify the storage
configuration, Oracle published a paper in 2000 called “Optimal Storage Configuration Made Easy”; this paper may be
found at http://otn.oracle.com/deploy/performance/pdf/opt_storage_conf.pdf. This is referred to as the SAME
(Stripe And Mirror Everything) methodology.
The key points of this paper are:
1. Stripe all files across all disks
2. Mirror data for high availability
3. Place frequently accessed data on the outside half of the disk drives
4. Subset data by partition, not disk
The main conclusion of the SAME methodology is that (from the paper) “In all the experiments, the simple ‘stripe all files over every
disk’ approach performed as well or better than the best subset that experienced experts were able to come up with.”
ORACLE FILE IO PROFILE
• Redo Logs – Sequential synchronous writes. This is the key to Oracle update performance. On a commit, nothing
happens until the redo log file is written. One set is required for each instance.
• Control Files – Read on startup, written on log switches, checkpoints and on file additions.
• spfile – Shared (server) parameter file. A binary form of init.ora. Read on startup, written to by certain
“alter system” statements. Introduced to support 9i RAC and automatic tuning of the database. Synchronous update.
• Voting or Quorum Disk – Real Application Clusters uses a voting disk to improve cluster availability. Oracle stores
cluster status information on the partition reserved for the voting disk.
• System Tablespace – The data dictionary. For DML the tables are read. For DDL and dictionary managed
tablespaces, reads and writes.
• Undo Tablespace or Rollback Segments – Oracle 9i introduces the ability to undo actions; this is managed with the
undo tables and flashback queries. The UNDO tablespace is used instead of the rollback segments. Mostly writes.
Reads for database recovery, rollback, transaction read consistency and for flashback queries.
• Temporary Segments – Used for sort operations that “spill” to disk and, in 8i and up, for global temporary tables.
• User Tables & Indexes – Dependent on the application. Generally mostly reads with 20% random
• Archive Logs – Asynchronously written by the archive process. This process only needs to complete before the
online redo log being copied needs to be used again. A log switch will not occur until the archiving is completed.
When we look at a storage foundation architecture, we have various parameters that we can use to tune the IO
subsystem. As defined earlier, these are RAID level, stripe size, spindle count and spindle speed.
• Online redo logs – Following the recommendations in the SAME methodology, the redo log should have a smaller
stripe size (dependent on the application behavior) and the logs should be striped across as many spindles as
possible.
• Control files – These files are written serially to all copies of the file and they should follow the same guidelines as
the redo logs.
• spfile – This file has low IO activity; it is read on instance startup and written when an “ALTER
SYSTEM” command is issued with a scope that includes the spfile.
• The voting or quorum disk is required for 9i RAC on Windows or Linux. The cluster management software uses
this for node and cluster management.
• System Tablespace – This file should have high bandwidth for reading (RAID10). The writing is asynchronous.
Again following the recommendations from the SAME methodology, RAID 10 and as many spindles as possible.
• Undo Tablespace – There will be one of these for each instance in the 9i RAC cluster. This file will be mostly
written for the ability to rollback (or undo) a transaction. If there is heavy use of the flashback query, then there
will be heavy read activity as well. This should have high bandwidth and follow the SAME methodology.
• Temporary Segments – This is used for sorts that spill to disk and for temporary tables. Although the goal of a well
tuned database is to minimize disk activity, the temporary segment should also have high bandwidth to minimize
performance impact when a transaction does need to spill to disk.
• User Tables & Indexes – This will vary according to each application, but the reads need to be as quick as possible
and this can be achieved by a moderately large stripe size and again, as many spindles as possible. The write
activity is asynchronous.
• Archive Logs – The archive logs are copies of the online redo logs and the performance of this needs to be “fast”
enough to complete the archive activity before the online redo log is needed. In this case, the archive logs are
“bulk” storage and they may be placed on a slower disk set.
The minimum configuration for 9i RAC on the storage would be to establish a logical volume, or LUN, with the
characteristics desired for each group of datafiles. For example a LUN could be created with the characteristics for
performance for the redo logs. This would have the following characteristics:
• RAID10 – best performance
• Stripe across all available disks – bandwidth
• 32k stripe size.
• One gigabyte in size
The online redo logs, control files and quorum disk could be placed here.
Following the SAME methodology, the rest of the database files (excluding archive logs) would be placed into a large
LUN that has these characteristics.
• RAID10 – best performance
• Stripe across all available disks – bandwidth
• 256k stripe size.
• Fifty gigabytes in size – as an example.
The archive logs would be placed in the last LUN with these characteristics:
• RAID5 p3 – for better storage utilization and good performance
• Stripe across all available disks – bandwidth
• 32k stripe size. (for RAID5)
• The sizing should be set large enough to handle the number of archive logs that are required to be available. Four
gigabytes in size – as an example.
This type of a configuration should provide adequate performance following the SAME methodology. If more
performance is required, then select tablespaces may be located on LUNs that have better performance characteristics.
This may be a faster spindle speed, more spindles or a different stripe size.
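The three-LUN layout described above can be written down as data, which makes it easy to review or to check a planned file placement against it. The Python sketch below is one possible representation; the class, the LUN labels ("redo", "data", "archive") and the helper function are hypothetical names introduced for illustration, while the RAID levels, stripe sizes and example capacities are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LunSpec:
    name: str            # hypothetical label for this example
    raid_level: str      # as written in the text
    stripe_size_kb: int
    size_gb: int         # example sizes from the text


# The three LUNs from the minimum configuration described above.
LUNS = {
    "redo": LunSpec("redo", "RAID10", 32, 1),
    "data": LunSpec("data", "RAID10", 256, 50),
    "archive": LunSpec("archive", "RAID5 p3", 32, 4),
}

# Which LUN each kind of file is placed on, following the text.
PLACEMENT = {
    "online redo logs": "redo",
    "control files": "redo",
    "quorum disk": "redo",
    "spfile": "data",
    "system tablespace": "data",
    "undo tablespace": "data",
    "temporary segments": "data",
    "user tables and indexes": "data",
    "archive logs": "archive",
}


def lun_for(file_type: str) -> LunSpec:
    """Return the LUN specification that a given file type is placed on."""
    return LUNS[PLACEMENT[file_type]]


if __name__ == "__main__":
    for file_type in PLACEMENT:
        lun = lun_for(file_type)
        print(f"{file_type}: {lun.name} ({lun.raid_level}, {lun.stripe_size_kb}k stripe)")
```

If select tablespaces later need better performance, this kind of table only needs an extra LUN entry and a changed placement line.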
The vanilla install of 9i RAC has a series of required files. A cluster file system simplifies the set up of the database
using files. Otherwise these must be raw partitions on disk. The files are:
• Each node will require at least three files or raw partitions
• One for the UNDO tablespace
• Two (minimum) for the REDO logs.
• The initial database created by the DBCA (Database Configuration Assistant) will create 12 files or raw partitions
• spfile (Server parameter file)
• Voting disk (Server Configuration raw device)
• Two copies of the control file
• System tablespace
• User tablespace
• Temp tablespace
• Example tablespace
• CMLITE tablespace
• Index Tablespace
• Tools Tablespace
• DRSYS Tablespace
• Plus what is needed for your application.
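When raw partitions are used, it helps to know how many must be created before the install begins. The Python sketch below estimates the minimum from the counts listed above, assuming the 12 files of the initial database and the three files per node simply add together and leaving out anything the application itself needs. The names are illustrative only.

```python
SHARED_FILES = 12    # files or raw partitions created for the initial database
PER_NODE_FILES = 3   # one UNDO tablespace plus a minimum of two redo logs per node


def minimum_files(nodes: int) -> int:
    """Minimum number of files or raw partitions for a cluster of `nodes` nodes."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return SHARED_FILES + PER_NODE_FILES * nodes


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(f"{n} nodes: at least {minimum_files(n)} files or raw partitions")
```

Under these assumptions a two-node cluster needs at least 18 files or raw partitions, and each additional node adds three more.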
Please note that Oracle has a cluster file system for Windows and Linux. There are also some third-party cluster file systems,
such as PolyServe for Linux and the Tru64 cluster file system. Other operating systems, such as IBM AIX, require that 9i RAC be installed
on raw partitions.