RAC
Real Application Clusters
Agenda
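Architecture and Concepts
Clusterware
Installation, Configuration and Management
How the VIP Works
Load Balancing & Failover
Automatic Workload Management
Performance Tuning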
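What Is a Cluster?
– Many interconnected nodes that work together like a single machine.
– Cluster software hides the underlying structure.
– The disks can be read and written by all nodes.
– All nodes run the same operating system.
[Figure: nodes attached to the public network and linked to each other by a private interconnect, with Clusterware running on each node and disks shared by all nodes]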
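What Is Oracle Real Application Clusters?
– Multiple instances access the same database.
– One instance per node.
– Physical or logical access to each database file.
– Software controls access to the data.
[Figure: instances spread across nodes share a cache over the interconnect and access the common database files]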
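RAC (Real Application Clusters)
Nodes work in parallel; performance scales as nodes are added
Automatic load balancing
Instant failover
Nodes can be added or removed online, on demand; low maintenance and administration cost
Free clusterware
[Figure: multiple nodes exchanging heartbeats and failing over to one another, all sharing one database]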
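Oracle 10g Grid Computing
Now provides virtualization and on-demand provisioning of:
– Applications
– Information
– Servers
– Storage
Built from low-cost, standardized hardware modules
– Scales out freely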
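Cluster Mode and Single-Machine Mode
[Figure: in the SMP model, CPUs with their own caches share one memory and storage and keep their caches coherent; in the RAC model, each instance has its own SGA and background processes (BGP), and the instances' caches are joined by Cache Fusion over shared storage]
BGP: Background process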
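Oracle RAC Architecture
[Figure: end users reach the database instances through middleware and the network, with management through OEM in a browser; the instances are linked by a low-latency, high-speed interconnect switch and reach the disk array through a Fibre Channel or network switch (SAN); there is no single point of failure]
Drive and exploit industry advances in clustering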
Understanding the Architecture: Shared Data Model
[Figure: each instance runs GES & GCS and has its own shared memory (global area) with SQL and buffer caches; the redo logs are shared, and all instances access one shared-disk database]
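Understanding the Architecture
[Figure: nodes 1, 2, 3, … on the public network, linked by the cluster interconnect; each node runs a database instance, an ASM instance and CRS on its own operating system. Shared storage holds the redo logs of all instances, the database and control files, the OCR and voting disks, and (optionally) the Oracle home]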
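Understanding the Clusterware Architecture
[Figure: the same cluster, with the CRSD, CSSD and EVMD daemons running on each node's operating system and the nodes linked by the cluster interconnect; shared storage holds the redo logs of all instances, the database and control files, the OCR and voting disks, and (optionally) the Oracle home]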
Understanding Clusterware-Managed Resources
SERVICE
INSTANCE
ASM
LISTENER
ONS
GSD
VIP
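RAC Component Architecture
[Figure: each node runs a database instance on top of the cluster software (Clusterware) and its operating system; Grid Control manages all nodes]
Shared disk: file type and supported storage
– Voting disk: raw / OCFS / third-party file system
– OCR: raw / OCFS / third-party file system
– SPFILE: raw / OCFS / ASM / third-party file system
– Control files and redo logs: raw / OCFS / ASM / third-party file system
– Data files: raw / OCFS / ASM / third-party file system
– Flash recovery area: raw / OCFS / ASM / third-party file system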
crsd – CRS Daemon
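Monitors the resources in the cluster
– Resources typically include:
Listener
VIP
ASM Instance
Database Instance
Services (AWM, Automatic Workload Management)
Starts, stops, checks and fails over resources
Maintains the configuration information in the OCR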
cssd – CSS Daemon
Provides group services to the cluster
– Tracks who is part of the cluster
– Notifies CRS, RAC, and other cssds when nodes join or leave the cluster
Provides CRS lock services
– Locking within Clusterware to coordinate actions across the cluster
– Entirely separate from Cache Fusion locking
At its most basic, CSS is a "node monitor"
evmd – Event Manager Daemon
Provides an event-based messaging channel between crsd, cssd, and other processes
Can be extended to trigger special, business-specific actions when events occur
– Example:
Node down: send an alert e-mail
Node down: write an entry to a log
VIP – Virtual IP
Each server has:
– Static IP racnode1 (145.1.1.101)
– VIP racnode1-vip (145.1.1.201)
Clients always use the VIP
VIP allows rapid detection of server failures
OCR & Voting Disks
Oracle Cluster Registry (OCR)
– Binary file containing a database (state directory) of CRS configuration and status information
– Maintained by the CRS Daemon
– Can be mirrored in 10g R2
Voting Disk
– Used by CSS to resolve split-brain scenarios
– Can be 1 or 3 files in 10g R2
These files must be outside of ASM
They can be either raw partitions or regular files on a cluster file system
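The way CSS uses the voting disks can be pictured with a small model. This is a simplified sketch, not Oracle's CSS code, and the function names are hypothetical: it assumes a node must see a strict majority of the configured voting disks to stay in the cluster, and that when the interconnect splits the cluster, the larger subcluster survives, with ties going to the subcluster that contains the lowest node number.

```python
"""Simplified model of CSS voting-disk quorum and split-brain resolution.

Hypothetical helper names; this follows the commonly described rules,
not Oracle's actual CSS implementation.
"""


def has_voting_quorum(accessible_disks: int, configured_disks: int) -> bool:
    """A node may stay in the cluster only if it sees a strict majority of voting disks."""
    return accessible_disks > configured_disks // 2


def surviving_subcluster(subclusters: list[set[int]]) -> set[int]:
    """Pick the group of nodes that survives when the interconnect splits the cluster.

    Assumed rule: the largest subcluster wins; on a tie, the subcluster that
    contains the lowest node number wins. All other nodes are evicted.
    """
    return max(subclusters, key=lambda nodes: (len(nodes), -min(nodes)))


if __name__ == "__main__":
    # With 3 voting disks a node survives the loss of one disk, but not two.
    print(has_voting_quorum(2, 3))  # True
    print(has_voting_quorum(1, 3))  # False
    # A 4-node cluster split 2/2: nodes {1, 2} survive, {3, 4} are evicted.
    print(surviving_subcluster([{3, 4}, {1, 2}]))  # {1, 2}
```

The majority rule is also why voting disks come in odd numbers: with 3 disks, losing one still leaves a majority of 2.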
Querying CRS Status
[oracle]$ $CRS_HOME/bin/crs_stat
NAME=ora.linux1.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on linux1

NAME=ora.linux1.ASM1.asm
TYPE=application
TARGET=ONLINE
STATE=ONLINE on linux1

NAME=ora.linux2.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on linux1

NAME=ora.linux2.ASM2.asm
TYPE=application
TARGET=ONLINE
STATE=OFFLINE on linux2

NAME=ora.ract.mydb1.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on linux1

NAME=ora.ract.mydb2.inst
TYPE=application
TARGET=ONLINE
STATE=OFFLINE on linux2
...
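With many resources, the record-per-resource crs_stat output is easier to scan with a small filter (crs_stat -t also prints a tabular summary). The sketch below assumes the NAME=/TYPE=/TARGET=/STATE= layout shown above, with each record starting at NAME=, and reports resources whose STATE does not match their TARGET; the helper names are hypothetical.

```python
"""Flag CRS resources whose current STATE differs from their TARGET.

Input is the default `crs_stat` output: one KEY=VALUE pair per line,
grouped into records that each start with NAME=.
"""


def parse_crs_stat(text: str) -> list[dict[str, str]]:
    """Split crs_stat output into one dict per resource."""
    records: list[dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip blank lines and anything that is not KEY=VALUE
        key, value = line.split("=", 1)
        if key == "NAME":
            records.append({})  # every resource record begins with NAME=
        if records:
            records[-1][key] = value
    return records


def mismatched(records: list[dict[str, str]]) -> list[str]:
    """Names of resources that are not in their target state."""
    bad = []
    for rec in records:
        # STATE looks like "ONLINE on linux1" or "OFFLINE"; keep the first word.
        state = rec.get("STATE", "").split(" ")[0]
        if state != rec.get("TARGET"):
            bad.append(rec["NAME"])
    return bad


SAMPLE = """\
NAME=ora.linux1.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on linux1

NAME=ora.linux2.ASM2.asm
TYPE=application
TARGET=ONLINE
STATE=OFFLINE on linux2
"""

if __name__ == "__main__":
    print(mismatched(parse_crs_stat(SAMPLE)))  # ['ora.linux2.ASM2.asm']
```

In practice the text would come from running crs_stat itself; the embedded sample only keeps the sketch self-contained.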
Starting / Stopping CRS
/etc/init.d/init.crs start
/etc/init.d/init.crs stop
/etc/init.d/init.crs disable
/etc/init.d/init.crs enable
Starting / Stopping Resources
[oracle]$ srvctl stop database -d mydb
– Stops all RAC instances
[root]# srvctl stop nodeapps -n racnode1
– Stops Listener, VIP, GSD, ONS
[oracle]$ srvctl start asm -n racnode1
– Starts ASM on racnode1 and all required dependencies
[oracle]$ srvctl start instance -d mydb -i mydb1
– Starts one instance and all required dependencies
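Installation Flowchart
1. Configure the hardware and Unbreakable Linux
2. Configure the private network
3. Configure the storage, including ASMLib
4. Install Oracle CRS; VIPCA runs automatically, launched from root.sh
5. Install the Oracle Database software, including RAC and ASM (install and configure the RDBMS)
6. Create the database with DBCA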
Cluster Verification Utility
(Cluvfy)
Allows customers to verify cluster during various
stages of its deployment from hardware setup, CRS
Install, DB install, add node etc.
Extensible framework
Non-intrusive verification
Command Line only
cluvfy comp peer -n $MYNODES | more
Does not take any corrective action following the
failure of a verification task
Cluster Verification Utility
Stage Verification
Component Verification
Cluvfy Stage Checks
RAC deployment is logically divided into several operational phases, called "stages"
Each stage comprises a set of operations during RAC deployment
Each stage has its own set of entry (pre-check) and/or exit (post-check) criteria
Cluvfy Stage List
$> ./cluvfy stage -list
Valid stage options and stage names are:
-post hwos : post-check for hardware & operating system
-pre cfs : pre-check for CFS setup
-post cfs : post-check for CFS setup
-pre crsinst : pre-check for CRS installation
-post crsinst : post-check for CRS installation
-pre dbinst : pre-check for database installation
-pre dbcfg : pre-check for database configuration
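Because the stages follow the installation flow, the checks can be scripted in deployment order. A minimal sketch that prints the command lines and only runs them when asked; it assumes cluvfy is on the PATH, the node names are placeholders, and stages that need extra arguments (such as -pre dbcfg) are left out. The -n option takes a comma-separated node list and -verbose requests a detailed report.

```python
"""Print (and optionally run) cluvfy stage checks in deployment order.

Assumes `cluvfy` is on the PATH; the node names are placeholders.
Stages that need extra arguments (for example -pre dbcfg) are left out.
"""
import subprocess

# (pre/post flag, stage name) in the order the stages occur during a RAC install
STAGES = [
    ("-post", "hwos"),     # after hardware and operating system setup
    ("-pre", "crsinst"),   # before installing Clusterware (CRS)
    ("-post", "crsinst"),  # after installing Clusterware (CRS)
    ("-pre", "dbinst"),    # before installing the database software
]


def stage_commands(nodes: list[str]) -> list[list[str]]:
    """One cluvfy command line per stage; -n takes a comma-separated node list."""
    node_list = ",".join(nodes)
    return [
        ["cluvfy", "stage", flag, name, "-n", node_list, "-verbose"]
        for flag, name in STAGES
    ]


def run_checks(nodes: list[str], dry_run: bool = True) -> None:
    for cmd in stage_commands(nodes):
        print(" ".join(cmd))
        if not dry_run:
            # cluvfy only reports; review its output and fix any failure
            # yourself before moving on to the next stage.
            subprocess.run(cmd)


if __name__ == "__main__":
    run_checks(["racnode1", "racnode2"])
```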
Cluvfy Component Checks
An individual subsystem or module of the RAC cluster is known as a "component" in Cluvfy
The availability, integrity, liveness, sanity or any other specific behavior of a cluster component can be verified.
Components can be simple, like a specific storage device, or complex, like the CRS stack, which involves a number of subcomponents such as CRSD, EVMD, CSSD and OCR.
Cluvfy Component List
$> ./cluvfy comp -list
Valid components are:
nodereach : checks reachability between nodes
nodecon : checks node connectivity
cfs : checks CFS integrity
ssa : checks shared storage accessibility
space : checks space availability
sys : checks minimum system requirements
clu : checks cluster integrity
clumgr : checks cluster manager integrity
ocr : checks OCR integrity
crs : checks CRS integrity
nodeapp : checks node applications existence
admprv : checks administrative privileges
peer : compares properties with peers
Starting and Stopping RAC Instances with SQL*Plus
[stc-raclin01] $ echo $ORACLE_SID
RACDB1
sqlplus / as sysdba
SQL> startup
SQL> shutdown
[stc-raclin02] $ echo $ORACLE_SID
RACDB2
sqlplus / as sysdba
SQL> startup
SQL> shutdown
Or, from a single node:
[stc-raclin01] $ sqlplus / as sysdba
SQL> startup
SQL> shutdown
SQL> connect sys/oracle@RACDB2 as sysdba
SQL> startup
SQL> shutdown
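Starting and Stopping RAC Instances with SRVCTL
– start/stop syntax (open, mount and nomount apply to start; normal, transactional, immediate and abort apply to stop):
srvctl start|stop instance -d <db_name> -i <inst_name_list>
[-o open|mount|nomount|normal|transactional|immediate|abort]
[-c <connect_str> | -q]
srvctl start|stop database -d <db_name>
[-o open|mount|nomount|normal|transactional|immediate|abort]
[-c <connect_str> | -q]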
– Examples:
$ srvctl start instance -d RACDB -i RACDB1,RACDB2
$ srvctl stop instance -d RACDB -i RACDB1,RACDB2
$ srvctl start database -d RACDB -o open
Switch Between the Automatic and Manual Policies
$ srvctl config database -d RACB -a
ex0044 RACB1 /u01/app/oracle/product/10.2.0/db_1
ex0045 RACB2 /u01/app/oracle/product/10.2.0/db_1
DB_NAME: RACB
ORACLE_HOME: /u01/app/oracle/product/10.2.0/db_1
SPFILE: +DGDB/RACB/spfileRACB.ora
DOMAIN: null
DB_ROLE: null
START_OPTIONS: null
POLICY: AUTOMATIC
ENABLE FLAG: DB ENABLED
$
$ srvctl modify database -d RACB -y MANUAL
RAC Initialization Parameter Files
– An SPFILE is created if you use the DBCA.
– The SPFILE must be created on a shared volume or shared
raw device.
– All instances use the same SPFILE.
– If the database is created manually, then create an SPFILE
from a PFILE.
[Figure: on Node1 and Node2, instances RAC01 and RAC02 each have a local initRAC01.ora / initRAC02.ora containing only SPFILE=…, which points to the single shared SPFILE]
SPFILE Parameters
– You can change parameter settings using the ALTER SYSTEM SET command from any instance:
ALTER SYSTEM SET <parameter>=<value> SCOPE=MEMORY SID='<sid|*>';
– SPFILE entries such as:
*.<parameter> apply to all instances
<sid>.<parameter> apply only to <sid>
<sid>.<parameter> takes precedence over *.<parameter>
– Use the current or future *.<parameter> setting for <sid>:
ALTER SYSTEM RESET <parameter> SCOPE=MEMORY SID='<sid>';
– Remove an entry from your SPFILE:
ALTER SYSTEM RESET <parameter> SCOPE=SPFILE SID='<sid|*>';
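The precedence rule for SPFILE entries can be illustrated with a small resolver. This is a model of the rule stated above, not Oracle code: each entry is written sid.parameter=value, and an entry for a specific SID overrides the *. entry for the same parameter. The function name and the parameter values are made up for the example.

```python
"""Resolve the effective SPFILE parameters for one RAC instance.

Models the rule: <sid>.<parameter> takes precedence over *.<parameter>.
"""


def effective_parameters(entries: list[str], sid: str) -> dict[str, str]:
    defaults: dict[str, str] = {}
    specific: dict[str, str] = {}
    for entry in entries:
        # The first "." separates the SID (or "*") from parameter=value.
        scope, _, assignment = entry.partition(".")
        name, _, value = assignment.partition("=")
        name = name.strip().lower()  # parameter names are case-insensitive
        value = value.strip()
        if scope == "*":
            defaults[name] = value
        elif scope.upper() == sid.upper():
            specific[name] = value
        # entries for other SIDs do not apply to this instance
    return {**defaults, **specific}  # instance-specific values win


SPFILE = [
    "*.undo_management='AUTO'",
    "*.undo_retention=900",
    "RAC01.undo_tablespace='UNDOTBS1'",
    "RAC02.undo_tablespace='UNDOTBS2'",
    "RAC02.undo_retention=1800",
]

if __name__ == "__main__":
    print(effective_parameters(SPFILE, "RAC01"))
    print(effective_parameters(SPFILE, "RAC02"))
```

RAC01 keeps the cluster-wide undo_retention of 900, while RAC02 uses its own value of 1800; each instance gets its own undo tablespace.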
Why Does Oracle RAC 10g Have a VIP?
Allows RAC to provide a highly available database to applications and users
During normal operation, it works the same as a host name
On failure, it removes the network timeout from the connection request time: the client fails over immediately to the next address in the list
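The benefit is easiest to see as arithmetic on connection time. The sketch below is a toy model with made-up numbers: a client walks its address list in order; an address whose host is down costs the full TCP connect timeout, while an address whose VIP has relocated to a surviving node fails at once with a TCP reset. The 60-second timeout and the other costs are assumptions; real TCP timeouts depend on operating system settings.

```python
"""Toy model: time for a client to reach a live node when the first address is dead."""

TCP_TIMEOUT_S = 60.0   # assumed OS-level TCP connect timeout
FAST_ERROR_S = 0.01    # assumed cost of receiving an immediate TCP reset
CONNECT_S = 0.05       # assumed cost of a successful connect


def time_to_connect(addresses: list[str], status: dict[str, str]) -> float:
    """Walk the address list in order, as connect-time failover does.

    status maps each address to:
      "up"        - a listener answers; the connection succeeds
      "dead"      - the host is gone; the client waits for the TCP timeout
      "relocated" - a VIP now hosted elsewhere; the client gets a reset at once
    """
    elapsed = 0.0
    for addr in addresses:
        state = status[addr]
        if state == "up":
            return elapsed + CONNECT_S
        elapsed += TCP_TIMEOUT_S if state == "dead" else FAST_ERROR_S
    raise ConnectionError("no address in the list accepted the connection")


if __name__ == "__main__":
    # Node 1 has failed. With static host addresses its IP simply stops answering;
    # with VIPs, racnode1-vip has moved to node 2 and returns a reset.
    static = time_to_connect(["racnode1", "racnode2"], {"racnode1": "dead", "racnode2": "up"})
    vip = time_to_connect(
        ["racnode1-vip", "racnode2-vip"],
        {"racnode1-vip": "relocated", "racnode2-vip": "up"},
    )
    print(f"static IPs: {static:.2f} s, VIPs: {vip:.2f} s")
```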
Oracle RAC 10g VIP
A virtual IP address is required for each node in the cluster
Required for the Oracle Clusterware installation
Should be registered in DNS and be on the same subnet as the public IP address
The IP address and network name must not already be in use
Configuration is managed by VIPCA
OS NIC bonding can be used to provide failover and load balancing across the node's network interfaces
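The same-subnet rule can be checked before running VIPCA. A small sketch using Python's standard ipaddress module; the check_vips helper is hypothetical, racnode1's addresses come from the earlier VIP slide, and racnode2's planned VIP and the /24 netmask are assumptions chosen to show a failure. Whether an address is already in use on the network is not checked here.

```python
"""Check a VIP plan: each VIP on its node's public subnet and used only once."""
import ipaddress


def vip_on_public_subnet(public_ip: str, vip: str, netmask: str) -> bool:
    # strict=False builds the network from a host address plus mask
    network = ipaddress.ip_network(f"{public_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(vip) in network


def check_vips(plan: dict[str, tuple[str, str]], netmask: str) -> list[str]:
    """plan maps node name -> (public IP, VIP); returns a list of problems."""
    problems = []
    seen: set[str] = set()
    for node, (public_ip, vip) in plan.items():
        if not vip_on_public_subnet(public_ip, vip, netmask):
            problems.append(f"{node}: VIP {vip} is not on the public subnet of {public_ip}")
        if vip in seen:
            problems.append(f"{node}: VIP {vip} is assigned to more than one node")
        seen.add(vip)
    return problems


if __name__ == "__main__":
    plan = {
        "racnode1": ("145.1.1.101", "145.1.1.201"),
        "racnode2": ("145.1.1.102", "145.1.2.202"),  # deliberately on the wrong subnet
    }
    for problem in check_vips(plan, "255.255.255.0"):
        print(problem)
```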
How the Oracle RAC VIP Is Different
Accepts connections only when it is on its home node
If the home node fails, the VIP relocates to another node in the cluster only to send a silent error back to the client
Each node has only one active RAC VIP of its own (other VIPs may be present on it after relocating there because of a failure!)
Oracle RAC 10g VIP
Should be listed in listener.ora:
LISTENER_PMHA =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC2))
      (ADDRESS = (PROTOCOL = TCP)(HOST = pmvip1)(PORT = 1525)(IP = FIRST))
      (ADDRESS = (PROTOCOL = TCP)(HOST = 144.25.214.45)(PORT = 1525)(IP = FIRST))
    )
  )
Oracle RAC 10g VIP
Should be used by all client connections:
BRAB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = pmvip1)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = pmvip2)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = brab)
    )
  )
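With LOAD_BALANCE = yes, Oracle Net tries the addresses in random order, and with connect-time failover (on by default when several addresses are listed) it moves on to the next address when one fails. The sketch below models that behavior only; connect and the try_connect callback are hypothetical stand-ins for opening a real connection, and the host names are the VIPs from the BRAB entry.

```python
"""Model of client-side LOAD_BALANCE=yes plus connect-time failover."""
import random
from typing import Callable, Optional


def connect(
    addresses: list[tuple[str, int]],
    try_connect: Callable[[str, int], bool],
    load_balance: bool = True,
    rng: Optional[random.Random] = None,
) -> tuple[str, int]:
    order = list(addresses)
    if load_balance:
        # LOAD_BALANCE = yes: visit addresses in random order to spread connections
        (rng or random).shuffle(order)
    for host, port in order:
        if try_connect(host, port):
            return host, port
        # failover: a refused or reset connection moves on to the next address
    raise ConnectionError("all addresses failed")


if __name__ == "__main__":
    brab = [("pmvip1", 1521), ("pmvip2", 1521)]
    up = {"pmvip1": True, "pmvip2": True}
    rng = random.Random(42)
    # Count how 1000 connections spread across the two VIPs.
    counts = {"pmvip1": 0, "pmvip2": 0}
    for _ in range(1000):
        host, _port = connect(brab, lambda h, p: up[h], rng=rng)
        counts[host] += 1
    print(counts)

    up["pmvip1"] = False  # node 1 down: every connection now lands on pmvip2
    print(connect(brab, lambda h, p: up[h], rng=rng))
```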
Redundant Network Interfaces
Use OS network interface bonding (NIC bonding), or
give Oracle Clusterware an alternative interface to use if the primary interface fails
Use SRVCTL MODIFY NODEAPPS to give Oracle Clusterware a list of interfaces it can use:
srvctl modify nodeapps -n ukdh364 -A 138.2.238.15/255.255.255.0/eth0\|eth1
To see your configuration, use:
oifcfg getif
What Happens When a Node Fails?
The VIP fails over to another node in the cluster
Any connection request to it gets a silent TCP/IP error, and the client fails over immediately to the next TNS address.
When the node restarts, the VIP is relocated back to its home node
What Happens When the Network Interface(s) Fail?
The VIP fails over to another node in the cluster, and the database and ASM instances on the node fail
– Note: with 10.2.0.3, the ASM and database instances do not fail
Any connection request gets a silent TCP/IP error, and the client fails over immediately to the next TNS address.
Once the interface is fixed, reboot the node to return to normal, or run srvctl start instance to bring the VIP, instance, and ASM back online.
VIP Failover
[Figure: the client's connect string lists mydb = x.x.x.201, x.x.x.202. Normally node 1 holds VIP x.x.x.201 (static IP x.x.x.101) and node 2 holds VIP x.x.x.202 (static IP x.x.x.102). When node 1 fails, VIP x.x.x.201 moves to node 2, which answers connection attempts on it with a TCP reset, so the client goes straight on to x.x.x.202]
How to Change the VIP
Stop all node-level applications on the server:
srvctl stop nodeapps -n pmrac1
Change the VIP to the new IP address using srvctl:
srvctl modify nodeapps -n pmrac1 -o $ORACLE_HOME -A 140.86.195.163/255.255.255.0/eth0
Start the node-level applications:
srvctl start nodeapps -n pmrac1
Start ASM:
srvctl start asm -n pmrac1
Start the database instance:
srvctl start instance -d brab -i brab1
Remember to update listener.ora and tnsnames.ora
Tracing the VIP
If you are having problems and do not understand what is happening with the VIP, you can turn on tracing:
– Edit CRS_HOME/bin/racgvip and uncomment the following line (line 45):
#_USR_ORA_DEBUG=1 && export _USR_ORA_DEBUG
– Or, as root, run "crsctl debug log res ora..vip:1", specifying your VIP resource name after "res"
Trace files will be in
CRS_HOME/log//racg/ora..vip.log
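Oracle RAC Load Balancing
Client-side load balancing (Oracle Net Services)
Random
The client does not know the server load; it simply distributes connections randomly across the nodes.
Server-side load balancing (listener)
Load based
The server distributes connections according to each node's load, judged by the length of each node's run queue.
*Do not use this method with connection pools or when connecting through an application server.
Session based
The server distributes connections according to the number of sessions already on each node.
Set the listener parameter PREFER_LEAST_LOADED_NODE_<listener_name>=OFF in listener.ora.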
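The two server-side policies differ only in the metric the listener compares. A toy sketch, not the listener's actual algorithm, that picks the target node either by run-queue length (load based) or by existing session count (session based); the NodeLoad class, node names and metric values are made up.

```python
"""Toy model of listener-side connection load balancing."""
from dataclasses import dataclass


@dataclass
class NodeLoad:
    name: str
    run_queue: float   # CPU run-queue length reported for the node
    sessions: int      # sessions already connected to the node's instance


def pick_node(nodes: list[NodeLoad], policy: str) -> str:
    if policy == "load":
        # load based: send the connection to the node with the shortest run queue
        return min(nodes, key=lambda n: n.run_queue).name
    if policy == "session":
        # session based (PREFER_LEAST_LOADED_NODE_<listener_name>=OFF):
        # balance on the number of existing sessions instead
        return min(nodes, key=lambda n: n.sessions).name
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    cluster = [
        NodeLoad("node1", run_queue=0.5, sessions=400),
        NodeLoad("node2", run_queue=3.0, sessions=100),
    ]
    print(pick_node(cluster, "load"))     # node1: short run queue, many idle sessions
    print(pick_node(cluster, "session"))  # node2: fewer sessions, busier CPU
```

The two policies can disagree, as here: a node with many idle sessions looks busy to the session-based policy but idle to the load-based one.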
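New in 10g: Load Balancing Advisory
Runtime connection load balancing
[Figure: CRM client connection requests go to a JDBC or ODP.NET connection pool (connection cache) in front of Instance 1 (idle), Instance 2 (very busy) and Instance 3 (busy); the pool sends 60% of the requests to the idle instance, 30% to the busy one and 10% to the very busy one]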
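New in 10g: Load Balancing Advisory
Monitors the workload activity of each instance in the cluster, per defined service.
Analyzes the workload level of each node against the defined criteria (service time or throughput).
Publishes each node's workload level to the front end through FAN (Fast Application Notification).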
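Runtime Connection Load Balancing
Currently supported by the JDBC and ODP.NET connection pools
The client connection pool is integrated with the RAC Load Balancing Advisory
When the client application calls getConnection, it is given a connection to the best instance.
The policy is defined by setting the GOAL attribute on the service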
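Runtime connection load balancing can be pictured as a weighted choice: the advisory publishes a percentage per instance and the pool hands out connections to instances in proportion. A sketch under that assumption; AdvisedPool and get_connection are hypothetical names, not the JDBC or ODP.NET API, and the 60/10/30 percentages mirror the split on the earlier slide.

```python
"""Sketch of runtime connection load balancing driven by advisory percentages."""
import random
from collections import Counter
from typing import Optional


class AdvisedPool:
    """Hypothetical pool that routes getConnection-style requests by advice."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self.advice: dict[str, float] = {}
        self.rng = rng or random.Random()

    def on_advisory(self, percentages: dict[str, float]) -> None:
        # In RAC the advice arrives through FAN events; here it is just stored.
        self.advice = dict(percentages)

    def get_connection(self) -> str:
        instances = list(self.advice)
        weights = [self.advice[i] for i in instances]
        # random.choices picks in proportion to the weights; 0% is never chosen
        return self.rng.choices(instances, weights=weights, k=1)[0]


if __name__ == "__main__":
    pool = AdvisedPool(random.Random(7))
    pool.on_advisory({"inst1": 60, "inst2": 10, "inst3": 30})
    print(Counter(pool.get_connection() for _ in range(10_000)))
```

Giving a slow or failed instance a weight of 0 stops new work from reaching it, which matches the advisory's stated goal.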
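Fast Application Notification (FAN) in Oracle Application Server 10g
Oracle 10g Clusterware
– Fast, coordinated recovery without manual intervention
– When an instance fails, Oracle RAC 10g signals 10g JDBC Fast Connection Failover in the application server
– Immediate recovery in the middle tier
– With RAC, recovery time drops from 15 minutes to under 4 seconds
Self-correcting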
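Fast Application Notification (FAN)
JDBC Fast Connection Failover handling
On receiving a DOWN signal from RAC 10g:
– Routes new requests to the instances that are running normally
– Throws an exception if the application is in the middle of a transaction
On receiving an UP signal from RAC 10g:
– Creates new connections to the new instance
– Distributes new work requests evenly across all available instances
[Figure: the JDBC connection cache in the middle tier connects to the database tier, where services 1, 2 and 3 run on instances X, Y and Z]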
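The handling described above can be modeled as a small event handler on a connection cache. This is a conceptual sketch, not the JDBC implicit connection cache, and the class and method names are hypothetical: a DOWN event for an instance removes its connections (an in-flight transaction on one of them would see an exception), and an UP event opens connections to the returning instance so new work spreads across all available instances.

```python
"""Conceptual model of Fast Connection Failover reacting to FAN UP/DOWN events."""
from collections import defaultdict


class ConnectionCache:
    def __init__(self, connections_per_instance: int = 2) -> None:
        self.per_instance = connections_per_instance
        self.conns: dict[str, list[str]] = defaultdict(list)

    def _open(self, instance: str) -> None:
        for i in range(self.per_instance):
            self.conns[instance].append(f"{instance}-conn{i}")

    def handle_event(self, instance: str, status: str) -> list[str]:
        """Apply a FAN event; returns the connections that were invalidated."""
        if status == "DOWN":
            # Mark down and clean up: the instance's connections leave the cache.
            # A transaction in flight on one of them would get an exception.
            return self.conns.pop(instance, [])
        if status == "UP":
            # Open connections to the new instance so new work can use it.
            if instance not in self.conns:
                self._open(instance)
            return []
        raise ValueError(f"unexpected status {status!r}")

    def instances(self) -> list[str]:
        return sorted(self.conns)


if __name__ == "__main__":
    cache = ConnectionCache()
    for inst in ("X", "Y", "Z"):
        cache.handle_event(inst, "UP")
    print(cache.handle_event("Y", "DOWN"))  # ['Y-conn0', 'Y-conn1']
    print(cache.instances())                # ['X', 'Z']
    cache.handle_event("Y", "UP")
    print(cache.instances())                # ['X', 'Y', 'Z']
```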
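Notification Callouts
Users can write callout programs that are invoked when a notification occurs
– Notifications include node up/down, instance up/down, and service up/down
Example uses:
– Send e-mails or pages
– Log status information
– Start or stop programs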
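A callout is an executable that Clusterware runs with the event text as its arguments; in 10g R2 callouts are placed in $ORA_CRS_HOME/racg/usrco. The sketch below assumes the event arrives as an event type followed by key=value pairs (for example NODE VERSION=1.0 host=node1 status=nodedown ...) and appends a line to a log file for down events. The argument format, the status values and the log path are assumptions, so check the FAN event format documented for your release; the e-mail step is left as a comment.

```python
#!/usr/bin/env python3
"""Sketch of a FAN server-side callout: log node/instance/service DOWN events.

Assumed argument format: <EVENT_TYPE> key=value key=value ...
"""
import sys
from datetime import datetime

LOG_FILE = "/tmp/fan_callout.log"  # hypothetical location


def parse_event(argv: list[str]) -> dict[str, str]:
    event = {"event_type": argv[0] if argv else ""}
    for token in argv[1:]:
        key, sep, value = token.partition("=")
        if sep:
            event[key.lower()] = value
    return event


def handle(event: dict[str, str]) -> str:
    line = (
        f"{datetime.now().isoformat(timespec='seconds')} "
        f"{event['event_type']} host={event.get('host', '?')} "
        f"status={event.get('status', '?')}"
    )
    if event.get("status", "").lower() in ("down", "nodedown"):
        with open(LOG_FILE, "a") as log:
            log.write(line + "\n")
        # A real callout could also send an alert e-mail or page here.
    return line


if __name__ == "__main__":
    handle(parse_event(sys.argv[1:]))
```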
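Configuring the Server-Side ONS
1. Each node's ons.config contains:
localport=6100
remoteport=6200
useocr=on
2. Add the cluster nodes and the mid-tier to the ONS configuration in the OCR:
$ racgons add_config node1:6200 node2:6200
$ racgons add_config midtier1:6200
3. On each node, reload the configuration:
$ onsctl reconfig
[Figure: the ONS daemons on Node1 and Node2 read their configuration from the OCR and send events to the ONS on Mid-tier1]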
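Optionally Configure the Client-Side ONS
1. On Mid-tier1, create ons.config:
localport=6100
remoteport=6200
nodes=node1:6200,node2:6200
2. Start ONS on Mid-tier1:
$ onsctl start
[Figure: the ONS on Mid-tier1 exchanges events with the ONS daemons on Node1 and Node2, whose configuration is kept in the OCR]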
JDBC Fast Connection Failover: Overview
[Figure: the ONS daemons on Node1 … Noden publish service or node UP and DOWN events to the ONS on Mid-tier1. The event handler of the JDBC implicit connection cache (ICC) reconnects connections on UP events, and marks connections down and cleans them up on DOWN events. Connections use service names and are load-balanced by the listeners]
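Automatic Workload Management
[Figure, five frames: (1) normal server allocation among Order Entry, Spare and Supply Chain; (2) quarter end: servers are reallocated between Order Entry and Supply Chain; (3) back to the normal allocation; (4) a server fails; (5) the spare server is reassigned to Order Entry]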
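Automatic Workload Management
Defining Services
Create a service for each workload that you want to manage separately
– There are usually only a few
Each service gets a globally unique name
No application changes are required
Specify the service in the TNS connect data
For example, in tnsnames.ora:
CRM =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = MARTINYL-CN)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = CRM)
    )
  )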
自动工作负载管理
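Define allocation rules with DBCA
The rules specify automatic resource allocation:
– Preferred instances under normal conditions
– Available instances in case of failure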
自动工作负载管理
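Enterprise Manager control
Perform service operations:
– Start/stop
– Enable/disable
– Relocate
View service status
– Including the automatic resource allocation rules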
自动工作负载管理
Performance Tracking
Load Balancing Advisory
– The Load Balancing Advisory (LBA) is an advisory for sending
work across RAC instances.
– The LBA advice is available to all applications that send work:
JDBC and ODP connection pools
Connection load balancing
– The LBA advice sends work to where services are executing well
and resources are available:
Relies on service goodness
Adjusts the distribution for nodes of different power, workloads of different priority and shape, and changing demand
Stops sending work to slow, hung, or failed nodes
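JDBC/ODP.NET Runtime Connection Load Balancing: Overview
[Figure: CRM work requests pass through the connection cache to RAC instances Inst1, Inst2 and Inst3; the cache sends 60% of the work to the instance where CRM is not busy, 30% to the one where CRM is busy and 10% to the one where CRM is very busy]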
Session and System Statistics
– Use V$SYSSTAT to characterize the workload.
– Use V$SESSTAT to monitor important sessions.
– V$SEGMENT_STATISTICS includes RAC statistics.
– RAC-relevant statistic groups are:
Global Cache Service statistics
Global Enqueue Service statistics
Statistics for messages sent
– V$ENQUEUE_STATISTICS determines the enqueue with the
highest impact.
– V$INSTANCE_CACHE_TRANSFER breaks down GCS statistics into
block classes.
Index Block Contention: Considerations
Wait events:
– enq: TX - index contention
– gc buffer busy
– gc current block busy
– gc current split
System statistics:
– Leaf node splits
– Branch node splits
– Exchange deadlocks
– gcs refuse xid
– gcs ast xid
– Service ITL waits
[Figure: instances RAC01 and RAC02 contend for the same index block while a block split is in progress]
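Oracle Sequences and Index Contention
[Figure: each index leaf block can contain 500 rows. With a sequence defined as CACHE 50000 NOORDER, RAC01 inserts keys 1…50000 and RAC02 inserts keys 50001…100000, so the two instances work in different leaf blocks]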
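The effect of a large NOORDER cache can be checked with arithmetic. A toy model, assuming monotonically increasing keys, leaf blocks of 500 keys (the number in the figure), and two instances that insert alternately, each taking values from its own sequence cache: with a small cache (20 values per fetch, an assumed comparison case) both instances keep landing in the same rightmost leaf block, while with CACHE 50000 each instance works in its own range of blocks. The Sequence class and helper are hypothetical.

```python
"""Toy model: do two instances insert sequence keys into the same index leaf block?"""

ROWS_PER_LEAF = 500  # keys per index leaf block, as in the figure


class Sequence:
    """A sequence whose instances each cache `cache` values (NOORDER)."""

    def __init__(self, cache: int) -> None:
        self.cache = cache
        self.next_fetch = 1                    # next value handed out to any instance
        self.local: dict[str, list[int]] = {}  # per-instance [next, last]

    def nextval(self, instance: str) -> int:
        nxt, last = self.local.get(instance, [1, 0])
        if nxt > last:
            # cache exhausted: grab the next `cache` values for this instance only
            nxt, last = self.next_fetch, self.next_fetch + self.cache - 1
            self.next_fetch += self.cache
        self.local[instance] = [nxt + 1, last]
        return nxt


def shared_leaf_ratio(cache: int, inserts_per_instance: int = 10_000) -> float:
    """Fraction of concurrent insert pairs that hit the same leaf block."""
    seq = Sequence(cache)
    same = 0
    for _ in range(inserts_per_instance):
        a = seq.nextval("RAC01")
        b = seq.nextval("RAC02")
        if (a - 1) // ROWS_PER_LEAF == (b - 1) // ROWS_PER_LEAF:
            same += 1
    return same / inserts_per_instance


if __name__ == "__main__":
    print(f"CACHE 20:    {shared_leaf_ratio(20):.2f}")
    print(f"CACHE 50000: {shared_leaf_ratio(50000):.2f}")
```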
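Undo Block Considerations
[Figure: index changes made on one instance (SGA1) generate undo there; reads of those blocks on the other instance (SGA2) need that undo, which causes additional interconnect traffic]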
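Concurrent Cross-Instance Calls: Considerations
[Figure: RAC01 truncates Table1 while RAC02 truncates Table2; each truncate makes a cross-instance call so that CKPT on every instance writes that table's dirty blocks from SGA1 and SGA2 (steps 1–4)]
Example: truncating tables in parallel from different instances triggers cross-instance calls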
AWR Snapshots in RAC
[Figure: the MMON process on Inst1 acts as coordinator; at each snapshot time (6:00, 7:00, 8:00, 9:00 a.m.) the MMON process of every instance, Inst1 … Instn, writes the in-memory statistics from its SGA to the AWR tables in SYSAUX]
What Is a Service?
– Is a means of grouping sessions that are doing the same
kind of work
– Provides a single-system image instead of a multiple-
instances image
– Is a part of the regular administration tasks that provide
dynamic service-to-instance allocation
– Is the base for High Availability of connections
– Provides a new performance-tuning dimension
High Availability of Services in RAC
– Services are available continuously with load shared across
one or more instances.
– Additional instances are made available in response to
failures.
– Preferred instances:
Set the initial cardinality for the service
Are the first to start the service
– Available instances are used in response to preferred-
instance failures.
Possible Service Configurations with RAC
[Figure: services AP and GL placed on instances RAC01, RAC02 and RAC03 in three ways: active/spare, active/symmetric and active/asymmetric]
Create Services with the DBCA
The DBCA configures both the Oracle Clusterware resources and the Net Service entries for each service.
Create Services with Enterprise Manager
Create Services with SRVCTL
$ srvctl add service -d PROD -s GL -r RAC02 -a RAC01
$ srvctl add service -d PROD -s AP -r RAC01 -a RAC02
[Figure: AP runs on RAC01 and GL on RAC02; each instance is the available instance for the other's service]
Preferred and Available Instances
$ srvctl add service -d PROD -s ERP \
    -r RAC01,RAC02 -a RAC03,RAC04
[Figure, steps 1–4: ERP starts on the preferred instances RAC01 and RAC02; as preferred instances fail, ERP is started on the available instances RAC03 and RAC04]
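The preferred/available rule can be modeled as a placement function. A simplified sketch, not the Clusterware algorithm: the service runs on every surviving preferred instance, and for each failed preferred instance it is started on one surviving available instance, taken in list order. The function name is hypothetical; the instance names follow the ERP example above, and the model ignores what happens when a failed preferred instance comes back.

```python
"""Simplified model of where a service runs given preferred/available instances."""


def service_placement(
    preferred: list[str], available: list[str], failed: set[str]
) -> list[str]:
    running = [inst for inst in preferred if inst not in failed]
    needed = len(preferred) - len(running)  # keep the service's cardinality
    spares = [inst for inst in available if inst not in failed]
    return running + spares[:needed]


if __name__ == "__main__":
    pref, avail = ["RAC01", "RAC02"], ["RAC03", "RAC04"]
    print(service_placement(pref, avail, set()))               # ['RAC01', 'RAC02']
    print(service_placement(pref, avail, {"RAC02"}))           # ['RAC01', 'RAC03']
    print(service_placement(pref, avail, {"RAC01", "RAC02"}))  # ['RAC03', 'RAC04']
```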
Manage Services
– Use EM or SRVCTL to manage services:
Start: Allow connections
Stop: Prevent connections
Enable: Allow automatic restart and redistribution
Disable: Prevent starting and automatic restart
Relocate: Temporarily change instances on which
services run
Modify: Modify preferred and available instances
Get status information
Add or remove
– Use the DBCA to:
Add or remove
Modify services
Manage Services with Enterprise Manager (EM)
Manage Services: Example
– Start a named service on all preferred instances:
$ srvctl start service -d PROD -s AP
– Stop a service on selected instances:
$ srvctl stop service -d PROD -s AP -i RAC03,RAC04
– Disable a service at a named instance:
$ srvctl disable service -d PROD -s AP -i RAC04
– Set an available instance as a preferred instance:
$ srvctl modify service -d PROD -s AP -i RAC05 -r