Monitoring Your RAC 10g Cluster Environment

The following is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.

Monitoring Your RAC 10g Cluster Environment V2.0

Gary McGalliard

RAC Pack - Technical Manager



Subjects to Discuss

• Why we Monitor
• What to Monitor
• How to Monitor
• Questions



Why Monitor?






Common Oracle DBA Tasks

• Installing Oracle software
• Creating Oracle databases
• Performing upgrades
• Starting up and shutting down the database
• Managing the database’s storage structures
• Managing users and security
• Managing schema objects
• Making database backups and performing recovery when necessary
• Proactively monitoring the database’s health and taking preventive or corrective action as required
• Monitoring and tuning performance



Track Application Usage

• What are the busy periods?
• Is the workload as expected?
• Has the disk usage gone up?
• Is the average transaction length increasing?
• Are we using more CPU?
• Are there more users than last month, last quarter, last year?
• Are we meeting user expectations?
  – Service Level Agreement/Objectives (SLA/SLO)
  – This is the ultimate measure of IT success
• Same questions as single instance monitoring



Evaluate Changes

• Did the last change …
  – Help lower CPU usage?
  – Increase the read rate?
  – Reduce the write rate?
  – Change the average transaction profile?
  – Improve the user’s perception of response time?
• Same questions as single instance



Capacity Planning

• When should another machine be ordered?
• How long will the current storage unit last?
• Is network performance still within limits?
• Can the systems handle the next change?
• Are additional resources needed before increasing application users by X%?
• Same questions as single instance
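
A question such as "How long will the current storage unit last?" can be approximated by fitting a trend to recorded usage. The sketch below is only an illustration: it assumes one usage sample per day and straight-line growth, and the function name and sample figures are hypothetical, not taken from the presentation.

```python
from typing import Optional, Sequence


def days_until_full(used_gb: Sequence[float], capacity_gb: float) -> Optional[float]:
    """Estimate days until capacity is reached from evenly spaced daily samples.

    Fits a least-squares straight line to the samples (one sample per day)
    and extrapolates it. Returns None when usage is flat or shrinking.
    """
    n = len(used_gb)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    days = range(n)
    mean_x = sum(days) / n
    mean_y = sum(used_gb) / n
    # Least-squares slope: growth in GB per day.
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, used_gb))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Day index at which the fitted line crosses the capacity.
    day_full = (capacity_gb - intercept) / slope
    # Days remaining, counted from the most recent sample.
    return max(0.0, day_full - (n - 1))


if __name__ == "__main__":
    # Hypothetical daily readings of used space in GB (illustrative numbers only).
    samples = [400.0, 402.0, 404.0, 406.0, 408.0]
    print(days_until_full(samples, capacity_gb=500.0))
```

Real growth is rarely linear, so an estimate like this is a prompt to look closer rather than a date to commit to.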



Prevent Unplanned Outages

• Use effective management practices
• Check logs for error messages
• Review application testing reports
• Adhere to capacity planning standards
• Unplanned downtime drains business bottom lines
• Same methods as single instance
• Service Level Agreements/Objectives (SLA/SLO) define outage types



Service Level Agreements

• Clearly define SLO’s
  – Cannot architect, design, OR manage a system without clearly understanding the SLOs
  – 24x7 is NOT an SLO
• Sufficiently granular
  – Define HA/recovery time objectives, throughput, response time, data loss, etc
  – Need to be established with an understanding of the cost of downtime for the system.
  – RTO and RPO are key availability metrics
  – Response time and throughput are key performance metrics
• Must address different failure conditions
  – Planned vs unplanned
  – Localized vs site-wide
• Must be linked to the business requirements
  – Response time and resolution time
• Must be realistic
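
"24x7 is NOT an SLO" because it names no measurable target. One way to make an availability objective concrete is to convert a percentage into a downtime budget. The sketch below does only that arithmetic; the function name and the 99.9% figure are hypothetical examples, not targets from the presentation.

```python
def allowed_downtime_minutes(availability_pct: float, period_hours: float) -> float:
    """Downtime budget, in minutes, implied by an availability target over a period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_hours * 60.0 * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    # Hypothetical target: 99.9% availability over a 365-day year.
    budget = allowed_downtime_minutes(99.9, period_hours=24 * 365)
    print(f"{budget:.1f} minutes of downtime per year")
```

A budget like this still has to be split between planned and unplanned outages and checked against the cost of downtime for the system.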



Why Monitor? - Summary

• Part of DBA’s Common Task List
• Track Application Usage/trends
• Evaluate Changes (relative to SLA/SLO’s)
• Capacity Planning
• Prevent Unplanned Outages
• Meeting Service Level Agreements/Objectives



What to Monitor?

EVERYTHING

• Same resources as single instance
• For RAC:
  – Each instance carrying planned load (balanced?).
  – Shared storage access is equal.
  – Interconnect
    • Load
    • Latency
• High CPU usage - Oracle processes getting enough resources.
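
Checking whether each instance carries its planned load can be reduced to comparing each instance's share of the total against the plan. The sketch below assumes the plan is an even split across instances and that some per-instance load figure (for example sessions or CPU seconds) has already been collected. The function name, the tolerance and the instance names are all hypothetical.

```python
from typing import Dict, List


def unbalanced_instances(load_by_instance: Dict[str, float], tolerance: float = 0.10) -> List[str]:
    """Return instances whose share of total load differs from an even split
    by more than `tolerance` (expressed as a fraction of total load)."""
    total = sum(load_by_instance.values())
    if total <= 0:
        return []
    even_share = 1.0 / len(load_by_instance)
    flagged = []
    for name, load in sorted(load_by_instance.items()):
        share = load / total
        if abs(share - even_share) > tolerance:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical load figures per instance (illustrative numbers only).
    print(unbalanced_instances({"inst1": 80.0, "inst2": 20.0}))
```

If the planned distribution is deliberately uneven, for instance when services are pinned to particular instances, the even split would need to be replaced by the planned shares.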



What to Monitor?

Performance Statistics, Logs, Errors at ALL Levels

• Application Level
• Database Level
• OS Level



OS Level Statistics

• Each cluster member, check usage
  – CPU – blocked queue length, %idle
  – IO – queue length, response times
    • Storage
    • Network - Public - Private Interconnect (RAC)
  – Memory – paging, swapping, scan rates
  – Log: /var/log/messages – error messages



What to Monitor? CRS 10.2.0.x

• ORA_CRS_HOME
  – CRS alert log - log//alert.log
  – CRS logs - log//crsd/
  – CSS logs - log//cssd/
  – EVM logs - log//evmd & evm/log/
  – SRVM logs - log//client
  – OPMN logs - opmn/logs
  – Resource specific logs - log//racg
  – Cluster Network Communication logs - log
• ORACLE_HOME (rdbms)
  – Resource specific logs - log//racg
  – SRVM logs - log//client
• Note 331168.1 - Oracle Clusterware consolidated logging in 10gR2



What to Monitor? ASM

• alert_.log
  – Default: ORACLE_HOME/rdbms/log
• Trace Files
  – Default: ORACLE_HOME/rdbms/log
  – bdump - background_dump_dest
  – cdump - core_dump_dest
  – udump - user_dump_dest



What to Monitor? RDBMS

• alert_.log
  – Default: ORACLE_HOME/rdbms/log
• Trace Files
  – Default: ORACLE_HOME/rdbms/log
  – bdump - background_dump_dest
  – cdump - core_dump_dest
  – udump - user_dump_dest
• AWR / Statspack (each node for RAC)
  – retain for one full business cycle
• listener_.log
  – Default: ORACLE_HOME/network/log



What to Monitor? Application

• Must be designed and coded into the application.
• Mid-tier server OS level monitoring can use the same methods as the database server.
• Remember, monitoring is about identifying deviations from “normal” processing expectations.
  – Establish baselines at all levels
  – The deviations are then investigated as possible problems.
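
Identifying a deviation from "normal" needs a rule for how far from the baseline is too far. The sketch below uses one common rule, a threshold of k standard deviations around the historical mean. That rule, the default k = 3 and the sample response times are assumptions for illustration; the presentation does not prescribe a method.

```python
from statistics import mean, stdev
from typing import Sequence


def is_deviation(history: Sequence[float], current: float, k: float = 3.0) -> bool:
    """True when `current` lies more than k sample standard deviations
    from the mean of `history`."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    centre = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any different value is a deviation.
        return current != centre
    return abs(current - centre) > k * spread


if __name__ == "__main__":
    # Hypothetical response times in milliseconds (illustrative numbers only).
    normal = [10.0, 12.0, 11.0, 9.0, 13.0]
    print(is_deviation(normal, 12.0), is_deviation(normal, 20.0))
```

A flagged value is only a candidate problem; it still has to be investigated.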



How to Monitor?






What Are Baselines?

• Baselines are time-lagged calculations (usually averages of one sort or another).
• Provides a basis for making comparisons of past performance to current performance.
  – Compare past Mondays to this Monday, past weeks to this week, etc.
• May also be forward-looking.
  – Determining whether the trends show you're likely to meet an established goal.
• Be aware of how your systems perform.
• Record baseline information and review on a regular schedule.
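
Comparing past Mondays to this Monday amounts to averaging a metric per weekday and then measuring today's value against the matching average. The sketch below is a minimal illustration of that idea; the function names and the sample values are hypothetical, and the metric could be anything recorded once per day.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, Tuple


def weekday_baseline(samples: Iterable[Tuple[date, float]]) -> Dict[int, float]:
    """Average a metric per weekday (0 = Monday ... 6 = Sunday)."""
    by_weekday = defaultdict(list)
    for day, value in samples:
        by_weekday[day.weekday()].append(value)
    return {weekday: mean(values) for weekday, values in by_weekday.items()}


def percent_change(baseline: float, current: float) -> float:
    """Relative change of `current` against `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline is zero; relative change is undefined")
    return (current - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical daily transaction counts (illustrative numbers only).
    history = [
        (date(2024, 1, 1), 100.0),  # Monday
        (date(2024, 1, 8), 120.0),  # Monday
        (date(2024, 1, 2), 50.0),   # Tuesday
    ]
    baseline = weekday_baseline(history)
    this_monday = 132.0
    print(percent_change(baseline[0], this_monday))
```

The same grouping works for hours of the day or weeks of the quarter; the point is to compare like with like.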



OS (Unix) Tools

• top – Top Processes
• ps – Process Status
• iostat - I/O Statistics
• netstat - Network Statistics
• vmstat - Virtual Memory Statistics
• ping - Checks network host connectivity
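
Output from these tools is easier to baseline once it is turned into named values. The sketch below parses text in the layout that Linux `vmstat` prints (a line of column names followed by rows of integers). The parser and the sample text are illustrative: the sample was written by hand in that layout, not captured from a real system, and other Unix platforms print different columns.

```python
from typing import Dict, List

# Hand-written sample in the Linux vmstat layout (illustrative values only).
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 812340  20480 401232    0    0     5    12  110  240  1  1 98  0  0
 3  1      0 809112  20480 401300    0    0     0    48  350  900 22  6 70  2  0
"""


def parse_vmstat(output: str) -> List[Dict[str, int]]:
    """Return one dict per numeric row, keyed by the column names that precede it."""
    rows: List[Dict[str, int]] = []
    header: List[str] = []
    for line in output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if all(t.lstrip("-").isdigit() for t in tokens):
            # Numeric row: pair it with the most recent line of column names.
            if len(tokens) == len(header):
                rows.append(dict(zip(header, (int(t) for t in tokens))))
        else:
            # Non-numeric line: remember it as the candidate header.
            header = tokens
    return rows


if __name__ == "__main__":
    for row in parse_vmstat(SAMPLE):
        # b = blocked processes, id = CPU idle percentage.
        print(row["b"], row["id"])
```

On Linux, the `b` and `id` columns correspond to the blocked queue length and %idle figures named on the OS Level Statistics slide.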



OS Watcher (OSW)

• A collection of UNIX shell scripts intended to collect and archive operating system and network metrics.
• Support in diagnosing:
  – complex RAC issues
  – generic performance issues
• OSW operates as a set of background processes, gathering OS data on a regular basis using Unix utilities.
• OSW can be installed and run standalone.
• Data collection intervals are configurable by the user.



OS Watcher (OSW)

• OSW is certified on the following platforms:
  – AIX, Tru64, Solaris, HP-UX, Linux
• OSW invokes distinct OS utilities
  – ps, top, mpstat, iostat, netstat, traceroute, vmstat
• startOSW.sh - start OSW processes
  – arg1 = snapshot interval in seconds.
  – arg2 = number of hours of archive data to store.
• stopOSW.sh - terminate all OSW processes
• Metalink Note 301137.1
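
The two startOSW.sh arguments trade detail against disk space: a shorter interval and a longer archive both mean more data kept. The sketch below only does that arithmetic and builds the argument list; it does not run OSW. The helper names, the `./startOSW.sh` relative path and the 30-second / 48-hour values are hypothetical examples.

```python
from typing import List


def snapshots_retained(interval_seconds: int, archive_hours: int) -> int:
    """Approximate number of snapshots per collector that fit in the archive window."""
    if interval_seconds <= 0 or archive_hours <= 0:
        raise ValueError("interval and retention must be positive")
    return (archive_hours * 3600) // interval_seconds


def start_command(interval_seconds: int, archive_hours: int) -> List[str]:
    """Argument vector for startOSW.sh: arg1 = interval in seconds, arg2 = hours kept."""
    return ["./startOSW.sh", str(interval_seconds), str(archive_hours)]


if __name__ == "__main__":
    # Hypothetical settings: a snapshot every 30 seconds, 48 hours of archive.
    print(start_command(30, 48), snapshots_retained(30, 48))
```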



How to Monitor? – OS Level Summary

• There are many tools which collect statistics at the OS level.
  – Pick one/several you like
  – Collect the information
  – Review the results
  – Review the methods used on a regular basis
    • Change as needed - e.g. New tools are available



Automatic Workload Repository

• Superior to Any Other Data Collection Tool
• Automatic, Self-Managing, More Efficient
• Set-up Out-of-Box
• Pre-Calculated Metrics
  – E.g. transactions/second, logon/second, etc.
• Foundation of Self-Management
• Enables Historical Performance Analysis
  – My user complained about poor performance at 3 AM last night. What was going on then?
  – Who was using the system at any given time in the past and what exactly were they doing?
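
A per-second metric such as transactions/second is commonly derived from a cumulative counter read at two points in time: the difference in the counter divided by the elapsed seconds. The sketch below shows that calculation on its own. It does not query AWR, and the function name and sample figures are hypothetical.

```python
from datetime import datetime


def per_second_rate(value_start: int, value_end: int,
                    t_start: datetime, t_end: datetime) -> float:
    """Average per-second rate of a cumulative counter between two snapshots."""
    elapsed = (t_end - t_start).total_seconds()
    if elapsed <= 0:
        raise ValueError("end snapshot must be later than start snapshot")
    if value_end < value_start:
        # A cumulative counter that shrinks suggests a restart between snapshots.
        raise ValueError("counter went backwards between snapshots")
    return (value_end - value_start) / elapsed


if __name__ == "__main__":
    # Hypothetical counter readings one hour apart (illustrative numbers only).
    rate = per_second_rate(1_000, 37_000,
                           datetime(2024, 1, 1, 3, 0), datetime(2024, 1, 1, 4, 0))
    print(rate)
```

Keeping the snapshots, rather than only the latest rate, is what makes it possible to answer "what was going on at 3 AM?" after the fact.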



Automatic Workload Repository

Regularly Monitor

• Load Profile
• Top 5 Timed Events
• RAC Statistics
  – Global Cache Load Profile
  – Global Cache Efficiency Percentages
  – Global Cache and Enqueue Services



Oracle Enterprise Manager 10g

• Enables management of RAC environments as single system image
• Cluster Database page provides RAC-wide view
  – Aggregated status, performance data across all instances
  – Supports operations on database and services
  – Drill down to pages for specific instances
  – Drill up to cluster page
• Cluster page
  – Shows hardware and operating system configuration, performance, and status across cluster
  – Drill down to pages for specific nodes



RAC Administration

• Single system image
• Cluster Database page provides RAC-wide view
  – Aggregated status
  – Performance data across all instances
  – Database Operations
  – Drill down to instances
  – Drill up to cluster
• Cluster page
  – Hardware
  – OS configuration
  – Performance
  – Status
  – Drill down to nodes



RAC Monitoring

• CRS Monitoring
• RAC DB and Instance monitoring
• Interconnect monitoring
• Cluster cache diagnostics

• User transparency
• Cluster awareness
  – Database
  – Hosts (OS) e.g. storage alerts
• Database-level alerts
• Cluster-aware EM jobs
• RAC-specific performance management
• Service Assurance Management



Cluster Database Performance






RAC Interconnect Monitoring

• Monitor private and public interconnects
• Identify interconnects used
• Traffic generated
• Interconnect alerts



RAC Cluster Cache Diagnostics

• Monitor inter-instance communication
• Identify performance problems due to object contention



Comprehensive System Monitoring

• Integrated Database and OS Monitoring
• Comprehensive Performance Monitoring for All Supported Database Versions
  – Well Defined, Intuitive, Performance Management Workflow
  – Detailed Wait, Session, SQL Drilldowns
• Historical Performance Data
  – Event, Metric History
• Full Integration with New Oracle10g Data Sources
  – AWR, ASH



Monitor your system

• Define key metrics and monitor them actively
  – Establish a (performance) baseline
• Learn how to use Oracle-provided tools
  – RDA (+ RACDDT)
  – AWR/ADDM
  – Active Session History
  – OSWatcher
  – Enterprise Manager
• Coordinate monitoring and collection of OS level stats as well as db-level stats
  – Problems observed at one layer are often just symptoms of problems that exist at a different layer
• Don’t jump to conclusions



References

• Metalink Note: 301137.1 - “OS Watcher User Guide”
• Metalink Note: 175853.1 - “Remote Diagnostics Agent (RDA)”
• Metalink Note: 250655.1 - “How to use the Automatic Database Diagnostic Monitor”
• Metalink Note: 243132.1 - “10g New Feature Active Session History (Ash) And Analysis Of Ash Online And Offline”
• OTN - Enterprise Manager 10g Grid Control: ScreenWatch Demos (Monitoring)
  – http://www.oracle.com/technology/products/oem/htdocs/demos.html
• Oracle® Database 2 Day DBA
• Oracle® Database PL/SQL Packages and Types Reference
• Oracle® Database Performance Tuning
• “Service Level Agreement in the Data Center” By Edward Wustenhoff – Sun Professional Services



QUESTIONS
ANSWERS



Thank You!


