Aspiring to be a RAC DBA Part 2
Ashok Singh Winona,MN-55987 asingh@fastenal.com
Previous Session
What is Real Application Cluster? Cluster Manager(9i) ,Cluster Ready Services(10g) Cluster Files System Cache Fusion and Interconnect Routine Administration of RAC
Real Application Clusters
Objective
Cache Fusion Instance Recovery Backup and Recovery Wait Events specific to RAC Performance Tuning Areas of Concern Routine Administration Conclusion
Cache Fusion Architecture
Revolutionary concept ,removes inefficiencies , enhances performance and scalability Use GC_FILES_TO_LOCKS for old cache fusion behavior Cache Fusion provides a single buffer Cache through the Interconnect . Data Dictionary and Library Cache are synced Cache Fusion model uses –
Global Cache Service (GCS) Global Enqueue Service (GES) Global Resource Directory (GRD)
Cache Fusion :Resource Modes and Roles
lock mode : Null (N), Shared (S) , Exclusive (E) Lock Role : Local (L) Global (G) Past Image (PI) :(1) PI exists , (0) No PI exists The different combinations could be one of the following : NL0 Null Local and No past Images SL0 Shared Local with no past image XL0 Exclusive Local with no past image NG0 Null Global - Instance owns current block image SG0 Global Shared Lock - Instance owns current image XG0 Global Exclusive Lock - Instance own current image NG1 Global Null – Another Instance Owns the Past image Block. SG1 Shared Global – Another Instance owns past image XG1 Global Exclusive Lock – Another Instance owns Past Image.
Cache Fusion Model
Global Resource Directory : -in SGA ,maintained by GES and GCS -used at instance recovery and cluster reconfigure. -latest locations of a block(node) -modes and roles of the block Enqueues : serializes access to any resource Past Image (PI) :Whenever a dirty block is sent to any remote cache using CF, a copy is kept in the local cache (in case of failures) Dynamic Remastering of resources ,enhances scalability _lm_dynamic_remastering (false)
Buffers States
Buffers can have many States (gv$bh): - e.g. free, xcur, scur, cr, read, mrec, irec, wri, pi Only one current copy per dba but many in CR state - limited by _db_block_max_cr_dba (6) and 10 cs Light_work_rule : This rule is applied when the cost of creating a CR block is high . Fairness_down_converts : X Lock is converted to null depending upon the _fairness_threshold (deflt 4) Chk values of light_works & fairness_down_converts from gv$bsp OR gv$cr_block_server
•
•
•
•
Cache Fusion :Read From Disk
Cache Fusion :Read to Read
Cache Fusion :Read to Write
Cache Fusion :Write to Write
Checkpoint /Writing to Disk
At log switch/checkpoint the following action takes place :
GCS will notify all nodes with PI of the block DBWR of the most current PI will perform the write to disk BWR (block written record ) may not be the checkpoint node Resources modified i.e. Global to local GRD is updated
Instance Recovery in RAC
Failure is detected by Cluster manager or CGS SMON will complete the Instance Recovery GRD is frozen All resources on the lost are re-distributed GES resources are reconfigured first GCS recovery is done in two pass - recovery set is prepared (list of blocks) and locks requested on these blocks -blocks recovered and made available
Alert log :Instance Recovery
ALERT LOG
List of nodes: 1 Global Resource Directory frozen * dead instance detected - domain 0 invalid = TRUE Update rdomain variables Communication channels reestablished Master broadcasted resource hash value bitmaps Non-local Process blocks cleaned out Mon Sep 5 10:13:00 2005 LMS 1: 0 GCS shadows cancelled, 0 closed Mon Sep 5 10:13:00 2005 LMS 0: 0 GCS shadows cancelled, 0 closed Set master node info
v$resource_limit
ges_procs ges_ress ges_locks ges_cache_ress ges_reg_msgs ges_big_msgs ges_rsv_msgs gcs_resources gcs_shadows 30 7810 8298 557 69 20 0 4041 2745 37 251 7821 6419 8305 9518 750 0 247 930 62 930 0 500 4041 9992 2745 9992 251 UNLIMITED UNLIMITED UNLIMITED UNLIMITED UNLIMITED 500 9992 9992
Backup and Recovery
RAC and RMAN integrate well . Conceptually ,same as single instance Archived logs should be on CFS and readable by all nodes . RMAN will autolocate –data,control & archieved log files Channel Failover introduced in 10gR1
RMAN Configuration
Configuration is persistent across all nodes Location (ie absolute path) must be available to all nodes
Configure controlfile autobackup Format ‘ ‘
RMAN will perform backup for controlfile on the first allocated channel when
Configure controlfile autobackup ON
Channels can be allocated on all nodes of RAC After this configuration
Backup database
Archived Logs –
One common location *.log_archive_dest=‘+usr02/TEST/ARCH/’ Shared storage (CFS,ASM) Read by all nodes during backup and recovery Log_archive_format should have %t%s%r
Archived Log Location
1. Common Shared location for Archived Logs 2. This Shared Location can be on CFS/ASM 3. *. log_archive_dest_1=/arc_dest 4. *.log_archive_format=arch%s_%t_%r.log
RMAN
RMAN connects to only one of the nodes from the command line .
rman target sys/oracle@node2 catalog rman/cat@catdb
RMAN allocates channels on multiple nodes using allocate channel or configure channel RMAN detects Node Affinity and uses this info while deciding which datafile should be backed by which node. Any allocated channel can delete archived logs from any thread (if using shared location ) Recovery Catalog should not be used as per the HA recommendation ie
use controlfile Catalog the backup after backups –resyn catalog
Rman:Backups
racnode2:oracle> rman target / connected to target database: WMST (DBID=957552121) RMAN> @bck.sh select inst_id,module from gv$session where module like 'back%' INST_ID MODULE 2 backup incr datafile: ORA_DISK_2 1 backup incr datafile: ORA_DISK_1 2 rows selected The script :bck.sh
configure channel 1 device type disk format ‘…..’ connect 'sys/XXXX@wmst1' ; configure channel 2 device type disk format ‘…..’ connect 'sys/XXXX@wmst2' ; backup incremental level 0 database ; Release channel 1,2 ;
Channel Failover in RAC
Any Backup is done using multiple Channels These channels could be from multiple nodes Nodes may fail during the duration of this Backup RMAN is now capable of completing this work on another allocated channel . The error is also recorded in the v$rman_output Currently not working as documented –Bug 4422034
Cloning a RAC Database
Duplicate to a Single Instance then Convert into a Cluster DB Create and Copy the Pfile to the desired host. Make the required changes -*.cluster_database=FALSE db_file_name_convert=('+TBLA','+USR01') log_file_name_convert=('+TBLA','+USR01') control_files='+USR01/picky/controlfile/control01.ctl‘,…. mkdir –p …udump ,.. bdump,….,cdump Add the auxiliary instance to tnsnames.ora Depending upon the config add in the listener.ora file Create a password file Startup nomount using this new pfile –auxiliary instance is ready
Cloning a Database
From one of the RAC nodes : rman target / auxiliary sys/xxxx@yyy Duplicate target database to ….. Starting 10gR2, from alert log “Re-creating tempfile +USR01 as +USR01/picky/tempfile/temp.270.566600837” alter database add logfile thread 2
group 3 ('+USR01/PICKY/ONLINELOG/redo3.log') size 25m, group 4 ('+USR01/PICKY/ONLINELOG/redo4.log') size 25m;
Alter database enable public thread 2 Start the other instance from the other node Register this database in the OCR racnode4:oracle> srvctl add database -d PICKY -o $ORACLE_HOME racnode4:oracle> srvctl config
PICKY
racnode4:oracle> srvctl add instance -d PICKY -i PICKY1 -n racnode3 racnode4:oracle> srvctl add instance -d PICKY -i PICKY2 -n racnode4
Recovering a Datafile
rm /usr05/oradata/WMST/WMST/test01.dbf srvctl start database -d WMST -o mount SQL> alter tablespace test offline; SQL> alter database datafile '/usr05/oradata/WMST/WMST/test01.dbf' offline; SQL> alter database open ; restore datafile '/usr05/oradata/WMST/WMST/test01.dbf' preview ;
List of Backup Sets BS Key Type LV Size Device Type Elapsed Time Completion Time ------- ---- -- ---------- ----------- ------------ --------------1178 Incr 0 5M SBT_TAPE 00:01:48 04-SEP-05 BP Key: 1532 Status: AVAILABLE Compressed: NO Tag: TAG20050904T160811 Handle: LEVEL0WMST481120050904 Media: dd_01.RO List of Datafiles in backup set 1178 File LV Type Ckp SCN Ckp Time Name ---- -- ---- ---------- --------- ---8 0 Incr 41914078 04-SEP-05 /usr05/oradata/WMST/WMST/test01.dbf
restore datafile '/usr05/oradata/WMST/WMST/test01.dbf' ; recover datafile '/usr05/oradata/WMST/WMST/test01.dbf' ; sql 'alter tablespace test online' ;
Performance Tuning
All techniques used in single instance databases are also applicable here Global Cache is made up multiple local Cache. Cache Fusion introduces additional waits Waits -gc_cr_request,gc_cache_busy, enqueues select * from v$sqlarea where cluster_wait_time > 0 Level 7–identify segments for interconnect traffic Statspack report has cluster specific statistics GCS workload Characteristics GES summary statistics GCS and GES messaging statistics GES detail statistics
New Wait Events
global enqueue gets sync global enqueue gets async global enqueue get time global enqueue releases gcs messages sent ges messages sent global enqueue CPU used by this session gc cr blocks served gc cr block build time gc cr block flush time gc cr block send time gc current blocks served gc current block pin time gc current block flush time gc current block send time gc cr blocks received gc cr block receive time gc current blocks received gc current block receive time gc blocks lost gc claim blocks lost gc blocks corrupt gc CPU used by this session global undo segment hints helped global undo segment hints were stale
STATSPACK
Statspack can still be used Change to level 7 to collect stats related to block contention and segment block waits
execute statspack.snap(i_snap_level=>7,i_modify_parameter=>'true');
EM used statspack data (until 9i) Starting 10g,EM uses data collected by AWR AWR collects and stores data internally . Statspack needs to be executed on all nodes Use sprepins for any specific instance However ,AWR is the recommended method
STASPACK report
Global Cache Load Profile ~~~~~~~~~~~~~~~~~~~~~~~~~ Per Second Per Transaction Global Cache blocks received: 0.08 0.41 Global Cache blocks served: 0.19 0.91 GCS/GES messages received: 1.86 9.17 GCS/GES messages sent: 3.45 17.02 DBWR Fusion writes: 0.04 0.17 Global Cache Efficiency Percentages (Target local+remote 100%) Buffer access - local cache %: 99.91 Buffer access - remote cache %: 0.07 Buffer access disk %: 0.02 Global Cache and Enqueue Services - Workload Characteristics
Avg global enqueue get time (ms): Avg global cache cr block receive time (ms): Avg global cache current block receive time (ms): Avg global cache cr block build time (ms): Global cache log flushes for cr blocks served %: 0.1 0.0 1.3 0.3 5.7
Global Cache and Enqueue Services - Messaging Statistics
% of direct sent messages: % of indirect sent messages: % of flow controlled messages: 30.58 55.65 13.77
Performance Tuning
Calculate and Monitor :
Current Block average Latency Block Mode conversions GCS Request latency Avg Global Enqueue Time Avg lock convert time
Populate ges_convert_local and remote by enabling event 29700 Set fast_start_mttr_target to reduce PI and CR If using 10g – Use ADDM Analysis ADDM is proactive ,captures RAC related service Issues AWR reports –use awrrpti.sql for RAC Use GV$sysstat,gv$system_event,gv$...transfer gv$bsp,gv$cr_block_server
Reverse Key Indexes
OPS introduced RKI to avoid contention To preserve the B-Tree structure keys have to be inserted into specific blocks. Many rows are inserted in the same block Many nodes may try to insert into the same block All these will lead to contention which will reduce scalability Contention may lead to gcs and ges waits This contention is more prominent if sequences are used
Reverse Key Indexes
RKI will –
Distribute the load ie place the leaf blocks far away Reduces contention when nodes are competing NOT allow range scan NOT do efficient space management
Use dbms_utility.current_instance to avoid this contention
It will place leaf blocks away to avoid this contention
create table ioug (id number ,instance_id number ); alter table ioug add constraint ioug_pk primary key (instance_id,id) ; insert into ioug values (dbms_utility.current_instance,ioug_sequence.nextval) ;
Sequences in RAC
Many applications have to use them for unique number generation . Use alongwith dbms_utility.current_instance if it is a PK Be prepared to have some gaps . Options CACHE and ORDER cannot be used for RAC If ORDER is used then Data Dictionary will be accessed every time –overhead GCS maintains the sequences across the instances.
System Change Numbers
Generated Globally,Unique across all instances Written to redo logs for synchronization Two schemes –Lamport / Broadcast_on_commit Depends on max_commit_propogation_delay(mcpd) Lamport (9i) / Latch_free SCN scheme 2 (10g) - default ,mcpd = 700 centiseconds - no overhead ,all messages contain scn - not suitable for rapid and simultaneous update/query Broadcast_on_commit(mcpd=0) :some overhead as on commit LGWR will send a msg to other nodes
Parallelism in RAC
When dealing with large volume of data When we have extra CPU and memory Hint is different -- /*+ parallel (ps_item,2,2) */ QC will always be on the initiating node Slaves can be on others nodes Configured using
instance_group –instances have to be member(one or more) parallel_instance_group –dynamically ,all instances(default)
Freelist Group vs ASSM
Primarily ,to avoid buffer_busy_waits Freelist Group is especially useful for RAC - to reduce contention on segment header - space usage may not be very efficient Starting 9i ,ASSM may be a good solution - internal and automatic maintenance - good when nodes are added - not tunable, space ,slower for FTS - Contradictory results ASSM need not be manipulated if nodes are added or removed Remember this while migrating single instances to RAC
select tablespace_name from dba_tablespaces where segment_space_management = 'MANUAL‘
DBMS_SCHEDULER
Will replace dbms_jobs. Also support shell scripts which is very useful for RAC Scripts can now be located at a shared location Any available node of RAC can execute these script Instance_stickiness is very helpful for RAC . instance_stickiness- BOOLEAN
If this option is set to TRUE, then for the first run of the job the scheduler will choose the instance with the lightest load to run this job on. Subsequent runs will use the same instance that the first run used (unless this instance is down). If this is FALSE then the scheduler will choose the first available instance to schedule the job on on all runs.
SQLPLUS /OIFCFG
Alter system checkpoint local Alter system checkpoint global Alter system switch logfile –LOCAL Alter system archive log current –GLOBAL USE OIFCFG
To define network interfaces To retrieve configuration information To change network configuration
oifcfg getif
eth0 172.16.8.0 global public eth1 192.168.215.0 global cluster_interconnect
oifcfg iflist
eth0 172.16.8.0 eth1 192.168.215.0
The Universal installer uses oifcfg to identify and display
Other Features
Logminer – no continuous_mine Glogin.sql –
set sqlprompt ‘_connect_identifier’
Dbms_jobs –instance is a parameter Crontabs /dbms_scheduler gv$Views—catclust.sql ,inst_id External files ,utl_file
Question & Answers
Questions?