Oracle Maximum Availability Architecture &
Best Practices – Technical Overview
Lawrence To, High Availability and MAA Database Team
Joe Meeks, HA/MAA Product Manager
  Agenda

• Oracle Maximum Availability Architecture

• MAA Best Practices – Oracle Database
  • Minimizing Unplanned Outages
  • Minimizing Planned Outages
• Resources




Oracle Maximum Availability Architecture
Best availability at lowest cost

Integrated suite of best-of-breed HA technologies
        - Scalable, active-active, data centric

• Real Application Clusters & Clusterware
  • Fault tolerant server scale-out
• Automatic Storage Management
  • Fault tolerant storage scale-out
• Online Upgrade
  • Upgrade hardware and software online
• Data Guard
  • Fully active failover replica
  • Rolling database upgrades
• Flashback
  • Correct errors by moving back in time
• Online Redefinition
  • Redefine tables online
• Recovery Manager & Oracle Secure Backup
  • Data protection & archival
• Streams
  • Multi-master replication
  • Hub & spoke replication


Oracle’s Integrated HA Solution Set

• Unplanned Downtime
  • System Failures: Real Application Clusters
  • Data Failures: ASM, Flashback, RMAN & Oracle Secure Backup, Data Guard, Streams
• Planned Downtime
  • System Changes: Online Reconfiguration, Online Patching, Rolling Upgrades
  • Data Changes: Online Redefinition
• Oracle MAA Best Practices span all of the above


Integrated Management
Enterprise Manager 11g High Availability Console




MAA – Integrated HA Best Practices


• MAA is a blueprint for achieving HA
  • Correlates HA capabilities to customer requirements
  • Operational best practices
  • Prevent, tolerate, and recover from outages
  • Tested, validated, and documented
    • Database, Storage, Cluster, Network
    • Oracle Enterprise Manager
    • Oracle Application Server
    • Oracle Applications

otn.oracle.com/deploy/availability


MAA OTN
www.oracle.com/technology/deploy/availability/htdocs/maa.htm
www.oracle.com/technology/deploy/availability/demonstrations.html




 Agenda

• Oracle Maximum Availability Architecture
• MAA Best Practices – Oracle Database
  • Minimizing Unplanned Outages
  • Minimizing Planned Outages
• Resources




Server Scalability and High Availability
Oracle RAC
• RAC pools standard low cost servers
• Active – Active configuration
• Great scalability - no idle resources
• Service Management Framework
  • Easily manage resources across a cluster
• High Availability
  • Automatic failover and load balancing
• Runs commercial applications
  • Oracle Applications, SAP, etc.
• Thousands of production customers


http://www.oracle.com/technology/products/database/clustering/index.html

Results Chart by Failure




  MAA Best Practices – Oracle RAC

• Client
    •   Fast Connection Failover (FCF), Fast Application Notification (FAN) over ONS and OCI, JDBC connection pooling
• Server
    •   FAST_START_MTTR_TARGET (11g)
    •   _fast_start_instance_recovery_target (10.2)
    •   CSS in real time
    •   VIP check interval
    •   Asynchronous I/O
    •   Listener throttling can help in some cases
• MAA Best Practices
    •   http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm#i1013583
    •   http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_FastRecoveryOracleClusterwareandRAC.pdf
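
As a minimal sketch of the server-side recovery setting listed above, the statements below set an instance recovery target and compare it with the current estimate. The value of 60 seconds is an assumption for illustration, not a recommendation from the slides, and SCOPE = BOTH assumes the instance uses an spfile.

```sql
-- Assumed target of 60 seconds; choose a value that fits your recovery objective.
ALTER SYSTEM SET FAST_START_MTTR_TARGET = 60 SCOPE = BOTH;

-- Compare the configured target with the current estimate (both in seconds).
SELECT target_mttr, estimated_mttr
FROM   v$instance_recovery;
```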




Storage Protection & Performance
Automatic Storage Management – ASM
• ASM mirrors data across low cost modular storage arrays
   • Automatically remirrors when a disk fails
• Simplifies administration
   • Add/subtract disks online
   • Automatically rebalance I/O
• ASM Oracle 11g enhancements
   • Use mirror to automatically re-read and repair when encountering I/O problems
   • Fast resync of mirror copy upon recovery from transient disk failures –
     uses only changed blocks
   • Rolling Upgrade for ASM instances


 http://www.oracle.com/technology/products/database/asm/index.html

 MAA Best Practices - ASM

• Use clustered ASM to enable the storage GRID
• Use vendor RAID with legacy storage arrays; use ASM redundancy
  with medium/low cost storage arrays
• ASM ORACLE_HOME should be separate from the RDBMS
  ORACLE_HOME to ease planned maintenance
• For maximum protection, use at least 3 Failure Groups for normal
  redundancy and 4 Failure Groups for external redundancy
• Ensure paths to storage feature both multipathing and fault tolerance
• Use two disk groups to ease manageability (DATA and Flash Recovery
  Area)
• Additional MAA Best Practices for ASM
   • http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm#CHDIBCCC
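
A rough sketch of ASM redundancy with failure groups, and of adding a disk online, might look like the following. It runs in the ASM instance. The disk group name, failure group names, disk names, and device paths are all placeholders, and the rebalance power of 4 is an arbitrary choice.

```sql
-- Normal redundancy with three failure groups, one per storage array (paths are placeholders).
CREATE DISKGROUP data NORMAL REDUNDANCY
  FAILGROUP fg1 DISK '/dev/rdsk/array1_disk1' NAME data_a1
  FAILGROUP fg2 DISK '/dev/rdsk/array2_disk1' NAME data_b1
  FAILGROUP fg3 DISK '/dev/rdsk/array3_disk1' NAME data_c1;

-- Add a disk online; ASM rebalances existing data onto it.
ALTER DISKGROUP data
  ADD FAILGROUP fg1 DISK '/dev/rdsk/array1_disk2' NAME data_a2
  REBALANCE POWER 4;

-- Watch rebalance progress.
SELECT operation, state, power, est_minutes
FROM   v$asm_operation;
```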



Protection from Human Error
Oracle Flashback Technologies

• Flashback Revolutionizes Error Recovery
   • Operates on just changed data
   • Time to correct error equals time to make error
      • Minutes instead of hours
   Correction Time = Error Time + f(DB_SIZE)
• Flashback is Easy
   • Single command instead of complex procedure
• Less performance overhead for OLTP and batch
• Great for testing especially when used with restore points
[Chart: Time To Recover (minutes), scale 0 to 80, comparing Traditional Recovery with Flashback]
http://www.oracle.com/technology/deploy/availability/htdocs/Flashback_Overview.htm
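
One way the testing use case with restore points could look is sketched below. It assumes the database is in archive log mode with a flash recovery area configured. The restore point name is illustrative, and SHUTDOWN and STARTUP are SQL*Plus commands rather than SQL statements.

```sql
-- Mark the state to return to.
CREATE RESTORE POINT before_test GUARANTEE FLASHBACK DATABASE;

-- ... run the test or planned change ...

-- Rewind: the database must be mounted, not open.
SHUTDOWN IMMEDIATE
STARTUP MOUNT
FLASHBACK DATABASE TO RESTORE POINT before_test;
ALTER DATABASE OPEN RESETLOGS;

-- Drop the restore point when done so its flashback logs can be reclaimed.
DROP RESTORE POINT before_test;
```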

Error Investigation with Flashback

• Flashback Query
   • Query all data at point in time
   select * from Salary AS OF '12:00 P.M.' where …

• Flashback Version Query
   • See all versions of a row between times
   • See transactions that changed the row
   select * from Salary VERSIONS BETWEEN
     '12:00 PM' and '2:00 PM' where …

• Flashback Transaction Query
   • See all changes made by a transaction
   select * from FLASHBACK_TRANSACTION_QUERY
     where xid = '000200030000002D';
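
The slide statements use shorthand time literals. Written out in full Oracle syntax, the three query types might look like the sketch below. The salary table columns (emp_id, amount), the employee number, and the two-hour window are assumptions for illustration; the transaction ID is the one shown on the slide.

```sql
-- Flashback Query: the row as it was two hours ago.
SELECT emp_id, amount
FROM   salary AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '2' HOUR)
WHERE  emp_id = 1001;

-- Flashback Version Query: every version of the row in the last two hours,
-- with the transaction that created each version.
SELECT versions_xid, versions_starttime, versions_endtime,
       versions_operation, amount
FROM   salary
       VERSIONS BETWEEN TIMESTAMP (SYSTIMESTAMP - INTERVAL '2' HOUR) AND SYSTIMESTAMP
WHERE  emp_id = 1001;

-- Flashback Transaction Query: what one transaction changed, and the SQL to undo it.
SELECT operation, table_name, undo_sql
FROM   flashback_transaction_query
WHERE  xid = HEXTORAW('000200030000002D');
```

How far back these queries can reach depends on how much undo the database retains.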



Error Correction with Flashback

Correct errors at any level
• Flashback Database – restore database to a point in time
• Flashback Table – restore contents of tables to a point in time
• Flashback Transaction – back out a transaction and all subsequent
  conflicting transactions
[Diagram labels: Database, Customer, Order]
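
For the table-level case, a minimal sketch follows. The table name and the 15-minute window are hypothetical.

```sql
-- Flashback Table needs row movement enabled on the table.
ALTER TABLE orders ENABLE ROW MOVEMENT;

-- Return the table contents to their state 15 minutes ago.
FLASHBACK TABLE orders
  TO TIMESTAMP (SYSTIMESTAMP - INTERVAL '15' MINUTE);
```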


 Flashback Database Use Cases

• Data Guard Integration
  • Fast-start Failover Reinstate
  • Snapshot Standby
• Upgrade Fallback
• Logical Failures & Reinstate
• Fast Restore for Testing or Planned Changes




 MAA Best Practices - Flashback

• Enabling Flashback Database
  • http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm#i1014673

• Recovery from human error
  • http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/outage.htm#i1010215
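
Enabling Flashback Database could be sketched as follows, assuming the database already runs in archive log mode. The recovery area size, the +RECO disk group name, and the 24-hour retention target are illustrative values, not recommendations from the linked guide.

```sql
-- Configure the flash recovery area (size first, then location).
ALTER SYSTEM SET DB_RECOVERY_FILE_DEST_SIZE = 100G SCOPE = BOTH;
ALTER SYSTEM SET DB_RECOVERY_FILE_DEST = '+RECO' SCOPE = BOTH;

-- How far back flashback should be able to reach, in minutes.
ALTER SYSTEM SET DB_FLASHBACK_RETENTION_TARGET = 1440 SCOPE = BOTH;

-- In 10g and 11g Release 1 the database must be mounted, not open, for this step.
ALTER DATABASE FLASHBACK ON;

-- Confirm.
SELECT flashback_on FROM v$database;
```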




 Data Corruption Protection
 Oracle-aware Validation, Backup, and Repair
• DB_ULTRA_SAFE parameter (11g)
  • Most comprehensive data corruption detection and prevention
  • DB_BLOCK_CHECKING detects and prevents data block corruptions
  • DB_BLOCK_CHECKSUM detects and prevents (standby only) redo and data
    block corruptions
  • DB_LOST_WRITE_PROTECT - Detect writes lost by the I/O subsystem
     • Best protection when used together with Data Guard standby database
• Data Recovery Advisor (11g)
  • Quickly diagnose and repair data failures
• Oracle Recovery Manager – RMAN
  • Automate backups and management of recovery related files
• Oracle Secure Backup
  • Integrated tape backup and management


 MAA Best Practices – Data Protection

• Set DB_ULTRA_SAFE on primary and standby (11g)
  • Block checking prevents memory and data corruptions. Overhead
    on every block change.
  • Redo and data block checksum detect corruptions on the primary
    and protect the standby. Minimal CPU resource required.
  • Lost write protection detects lost writes on the primary and protects
    physical standby databases from these corruptions. Minimal redo
    increase.
• Use Data Recovery Advisor for non-RAC primary databases
• Use RMAN to detect physical and logical corruptions
• Use Data Guard for best comprehensive corruption protection
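
Setting DB_ULTRA_SAFE might look like the sketch below, run on both primary and standby. The choice of DATA_AND_INDEX over DATA_ONLY is an assumption for illustration; the right level depends on how much block-checking overhead the workload can absorb.

```sql
-- DB_ULTRA_SAFE is a static parameter: set it in the spfile and restart the instance.
-- Documented values are OFF, DATA_ONLY, and DATA_AND_INDEX.
ALTER SYSTEM SET DB_ULTRA_SAFE = DATA_AND_INDEX SCOPE = SPFILE;

-- After the restart, check what the underlying parameters were set to.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('db_ultra_safe', 'db_block_checking',
                'db_block_checksum', 'db_lost_write_protect');
```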




  Data Recovery Advisor - DRA
• An Oracle tool that automatically diagnoses data failures, presents
  repair options, and executes repairs at the user's request
• Determines failures based on symptoms
    • E.g. an “open failed” because datafiles f045.dbf and f003.dbf are missing
    • Failure Information recorded in diagnostic repository (ADR)
    • Flags problems before user discovers them, via automated health
      monitoring
• Intelligently determines recovery strategies
    • Aggregates failures for efficient recovery
    • Presents only feasible recovery options
    • Indicates any data loss for each option
• Can automatically perform selected recovery steps
• First release only supports non-RAC primary databases

         Reduce downtime by eliminating confusion and
               automating detection and repair

Data Recovery Advisor Wizard
[Screenshots not reproduced. The walkthrough covers:]
• View Failures
• Manual Repair
• Recovery Advice
• Summary




Automated Disk Backups
Oracle Recovery Manager
• Fully automatic disk-based backup and recovery
   • Set and Forget
• Nightly incremental backup rolls forward recovery area backup
   • Changed blocks are tracked in production DB or standby DB
   • Full scan is never needed
   • Dramatically faster (20x)
• Low cost ATA disks can be used for recovery area
• Blocks validated during entire backup and recovery process

[Diagram: Database Area, nightly apply of validated incremental into the Flash Recovery Area,
weekly archive to tape]

    http://www.oracle.com/technology/deploy/availability/htdocs/rman_overview.htm
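
The changed-block tracking mentioned above is switched on in the database itself. A minimal sketch, with a placeholder file path:

```sql
-- With Oracle Managed Files the USING FILE clause can be omitted.
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING
  USING FILE '/u01/oradata/orcl/bct.chg';

-- Confirm that tracking is active.
SELECT status, filename
FROM   v$block_change_tracking;
```

With tracking enabled, RMAN incremental backups read the tracking file to find changed blocks instead of scanning every data file.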

 RMAN Enhancements

• Better performance
   • Intra-file parallel backup and restore of single data files >= 1 GB
   • Faster backup compression (ZLIB, ~40% faster)
• Better security
   • Virtual Private Catalog - grant visibility of a subset of registered
     databases in the catalog to specific RMAN users
• Lower space consumption and faster instantiation
   • Duplicate database or create standby database over the network,
     avoiding intermediate staging areas
• Integration with Windows Volume Shadow Copy Services API
   • Allows database to participate in snapshots coordinated by VSS-
     compliant backup management tools and storage products
   • Database is automatically recovered upon snapshot restore via RMAN



 MAA Best Practices - RMAN

• Enable Archive Log Mode
  • http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm#i1006953
• Use a Flash Recovery Area
  • http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm#i1014270
• Configure backup and recovery
  • http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm#i1007374
• Recovering from data corruption (data failures)
  • http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/outage.htm#i1006317
• Use Oracle Secure Backup’s Tape Backup Management Solution
  • http://www.oracle.com/technology/products/secure-backup/index.html


 Data Availability and Disaster Protection
 Oracle Data Guard

• High availability
  • Tolerate outages transparently
  • Recover from outages quickly
  • Address planned maintenance and unplanned events
• Complete data protection
  • Standby data must be isolated from production faults
  • No data should be lost
• Full systems utilization
  • Standby resources should be utilized for productive use
• Straightforward to manage
  • Integrated, reliable, high performance


    Best Failure Protection at Lowest Cost


[Diagram: Production Database linked by Data Guard (synchronous redo shipping,
automatic failover) to a Physical or Logical Standby DB]

   • Synchronous or asynchronous redo shipping
   • Corruptions don’t propagate - Most comprehensive redo and block corruption
     and lost write detection and protection
   • Deploy on low cost servers and storage, no special network components
   • Thousands of production customers


    Zero Data Loss over Longer Distances

Data Guard DR Sweet Spot
• Far enough to avoid regional disaster
• Close enough for zero data loss

[Diagram: distance scale of 100, 200, and 300+ miles, comparing Synchronous Disk
Mirroring with Data Guard: Synchronous Redo Shipping]

• Data Guard redo transport uses an order of magnitude less network messaging
  than disk-based remote mirroring
   • Enables zero data loss at hundreds of miles



 Enhanced Automatic Failover

• Fast-Start Failover supports ASYNC configurations
  • Automatic failover to a standby located 1,000s of miles away
  • Configurable maximum data loss
• Immediate failover for user-configurable health conditions
  • ENABLE FAST_START FAILOVER [CONDITION <value>];
  • Examples: datafile offline, corrupted controlfile, any explicit ORA-xyz
    error (e.g. ORA-1578) . . .
• Apps can request fast-start failover
  • DBMS_DG.INITIATE_FS_FAILOVER
• Integrated with Oracle Cold Cluster Failover
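
An application-initiated failover request through DBMS_DG.INITIATE_FS_FAILOVER might be sketched as below. It assumes fast-start failover is already enabled with an observer running. The condition string is free text chosen by the application, and treating any nonzero return as "not accepted" is a simplification.

```sql
SET SERVEROUTPUT ON
DECLARE
  status BINARY_INTEGER;
BEGIN
  -- Ask the broker to fail over; the string is recorded as the reason.
  status := DBMS_DG.INITIATE_FS_FAILOVER('Application requested failover');
  IF status = 0 THEN
    DBMS_OUTPUT.PUT_LINE('Fast-start failover requested');
  ELSE
    DBMS_OUTPUT.PUT_LINE('Request not accepted, status ' || status);
  END IF;
END;
/
```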



Active Data Guard 11g
Real-time Query

[Diagram: Production Database, continuous redo shipment and apply, Physical Standby
Database serving real-time queries]

• Offload read-only queries to physical standby
• Read-only scalability with standby reader farm or RAC standby
• Offload fast incremental backups to physical standby
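
Opening a physical standby for real-time query could look like the sketch below, run on the standby while it is mounted with Redo Apply running. It assumes the Active Data Guard option is licensed and that standby redo logs exist, which USING CURRENT LOGFILE relies on.

```sql
-- Stop Redo Apply, open read-only, then restart Redo Apply.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE OPEN READ ONLY;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
  USING CURRENT LOGFILE DISCONNECT FROM SESSION;

-- Confirm the standby is open for queries.
SELECT open_mode FROM v$database;
```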

Snapshot Standby
Use Standby Database for Testing

[Diagram: Primary Database (updates) ships redo to a Physical Standby Database (queries),
which is converted to a Snapshot Standby Database (queries and updates)]

• Preserves zero data loss – continuous redo transport while open read-write
• Similar to storage snapshots, but provides continuous DR using same storage
• Can also be done using Data Guard 10g Release 2, with more manual steps
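
In 11g the round trip to a snapshot standby and back might be sketched as follows, run on the mounted physical standby. It assumes a flash recovery area is configured. SHUTDOWN and STARTUP are SQL*Plus commands.

```sql
-- Convert to a read-write snapshot standby. Redo keeps arriving but is not applied.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE CONVERT TO SNAPSHOT STANDBY;
ALTER DATABASE OPEN;

-- ... test against the snapshot standby ...

-- Convert back: changes made during testing are discarded.
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
```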



 MAA Best Practices – Data Guard

• Configure for Oracle Database 10g
  • http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm#i1007026
• Set DB_ULTRA_SAFE for data corruption protection (11g)
  • http://download.oracle.com/docs/cd/B28359_01/server.111/b28281/hafeatures.htm#sthref84
• Optimize transport, apply, and role transitions
  • http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm
     • Data Guard Redo Transport & Network Configuration
     • Data Guard Redo Apply & Media Recovery (physical standby)
     • Data Guard SQL Apply (logical standby)
     • Data Guard Switchover and Failover
     • Data Guard Fast-Start Failover (automatic failover)
     • Data Guard Client Failover for Highly Available Oracle Databases




    Multi-Master Replication
    Oracle Streams

[Diagram: Source Database redo logs, Capture, Propagate, then Apply1 and Apply2 at the
Target Database, with a Transparent Gateway to a Non-Oracle Database]

• All sites active and updatable
• Flexible configurations – n-way, hub & spoke, …
• Database platform / release / schema structure can differ
• HA for custom apps where update conflicts can be avoided or managed

  Agenda

• Oracle Maximum Availability Architecture

• MAA Best Practices – Oracle Database
  • Minimizing Unplanned Outages
  • Minimizing Planned Outages
• Resources




MAA Planned Maintenance Solutions

Activity                                      Oracle Solution                   Downtime

• Add and remove processors and nodes         Dynamic Resource Management       Zero

• Grow and shrink memory                      Automatic Shared Memory           Zero
                                              Management

• Add and remove disks                        Automatic Storage Management      Zero
• Migrate to new storage                      (ASM)
• Rebalance IO
• Move data files
• Rolling upgrade

• Diagnostic patches                          Online Patching                   Zero
• Some one-off patches

• System and hardware upgrades                Real Application Clusters (RAC)   Zero (< 1 min)
• Operating system upgrades                   Oracle Clusterware
• Qualified one-off patches, CPUs
• CRS upgrades

• System, HW and cluster upgrades             Data Guard (physical, logical,    < 1 min
• Migration to ASM, RAC                       snapshot standby databases)
• Migration to some different platforms
  (Windows/Linux and some mixed DG support)
• Patchset or database upgrade
• Testing for development, Q/A, upgrades

• Database upgrades                           Oracle Streams                    < 1 minute
• Cross platform migration
• Cross characterset migrations
• Application upgrades

• Database upgrades                           Transportable Technologies        Dependent on data
• Same endianness and cross endianness                                          file conversion time
  platform migration

• Reorganize and redefine tables and          Online Redefinition               secs
  their attributes
• Add, delete or change column names,
  types and sizes
• Create, rebuild, coalesce, move and
  analyze indexes
• Convert LONG and LONG RAW columns to LOB
• Change table without recompilation
• Reorganize single partition, advanced
  queue and clustered tables, table
  containing ADT

Online Reconfiguration
Scaling on Demand
• CPU
  • Add/remove CPUs on SMP online
• Cluster Nodes
  • Add/remove RAC nodes online
  • Add/remove instances
  • Add/remove listeners
  • Add/remove services
  • No data movement needed
• Memory
  • Grow and shrink shared memory and buffer cache online
  • Auto tuning of memory online
• Disk
  • Add/remove ASM disks online
  • Automatically rebalance
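
For the memory case, an online resize might be sketched as below. The sizes are assumptions, and SCOPE = BOTH assumes an spfile. SGA_TARGET can grow online only up to SGA_MAX_SIZE, which is fixed at instance startup.

```sql
-- Grow the total SGA while the instance is running.
ALTER SYSTEM SET SGA_TARGET = 6G SCOPE = BOTH;

-- Optionally set a floor for the buffer cache; automatic tuning still applies above it.
ALTER SYSTEM SET DB_CACHE_SIZE = 2G SCOPE = BOTH;

-- See how memory is currently distributed among SGA components.
SELECT component, current_size
FROM   v$sga_dynamic_components;
```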


 Online Patching - One-off Patches

• Ability to patch running Oracle executable
    •   No downtime. Online patching is done on the instance level.
    •   No need to do rolling upgrades using RAC / Data Guard
    •   Many one-off patches can be patched online
    •   Great for diagnostic patches
• Supports enabling, disabling, de-installing patches with no downtime
• Integrated with OPatch
    • E.g. determine if a patch can be applied online:
        • opatch query -is_online
• Initially available on Linux (32-bit) and Solaris (64-bit)
• Long term goal is online patching of Critical Patch Updates (CPUs)
• Refer to OpenWorld - MAA Best Practices for Online Patching
    • http://www.oracle.com/technology/deploy/availability/pdf/oracle-openworld-2007/s291525_maa_plannedmaint.pdf



Rolling Patch Update using CRS and RAC

1. Initial RAC configuration: clients on nodes A and B
2. Clients on A, patch B
3. Clients on B, patch A
4. Upgrade complete: clients on A and B

Applies to:
• Oracle patch upgrades, including Critical Patch Updates (CPUs)
• Operating system upgrades
• Hardware upgrades

Service Level Impacts
CRS Rolling Patchset




[Chart not reproduced: service level during a CRS rolling patchset, with the SLO marked]




SQL Apply – Rolling Database Upgrades

1. Initial SQL Apply configuration: clients on A, redo shipped to B, both at version X
2. Upgrade node B to X+1 while redo logs queue on A
3. Run in mixed mode to test: A at X, B at X+1, redo shipping resumed
4. Switchover to B, then upgrade A so that both are at X+1

Applies to:
• Patch set upgrades
• Major release upgrades
• Cluster software & hardware upgrades



Rolling Database Upgrades
Transient Logical Standby

• Start rolling database upgrades with physical standbys
• Temporarily convert physical standby to logical to perform the upgrade
  • Data type restrictions limited to shorter upgrade window
• No need for separate logical standby
• Also possible in 10.2 (more manual steps)

[Diagram sequence: Physical, Logical, Upgrade, Physical]

Leverage existing physical standby databases


 Rolling Database Upgrades
 Streams

• Rolling upgrade with Streams if:
  • Heterogeneous platforms
  • Different charactersets
  • Database rolling upgrade when logical standby is not
    appropriate
  • Application upgrades
  • Use shadow tables and transformations to work around data
    type restrictions




       Streams Rolling Upgrade
       Extended Data Type Support

insert into EMP values (1001, 'Smith', 'Sales', 42, sysdate, 30000, 10, 19);

insert into CUST values (123, 'Acme Corp',
  address_typ('123 Any St', 'New York', 'NY', 10001));

[Diagram labels: Source Database (EMP; CUST with a trigger writing to a CUST log table),
Capture, Propagate, Apply with DML Handler, Upgraded Target Database (EMP, CUST).
Overlaid labels: Physical Standby, Data Guard.]


     Online Redefinition
•   All indexing operations can be done online
      • Create new index, move index, defragment index
•   Tables can be Reorganized & Redefined online (DBMS_REDEFINITION)
      • Table contents are copied to a new table
          • Defragments and allows changing location, table type, partitioning
      • Contents can be transformed as they are copied
          • Can change columns, types, sizes - specified using SQL “Select”
•   Updates and Queries can continue uninterrupted



[Diagram: the Source Table is copied (Copy Table) and transformed into the Result Table.
Continuous queries and updates pass through Update Tracking and Store Updates, then
Transform Updates, into the Result Table. A GUI interface makes it simple.]
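
The DBMS_REDEFINITION flow described on this slide might be sketched as below. The schema APP, the table ORDERS, and the interim table ORDERS_INT are hypothetical. The sketch assumes ORDERS has a primary key and that ORDERS_INT was already created, empty, with the desired new structure and the same column names.

```sql
DECLARE
  num_errors PLS_INTEGER;
BEGIN
  -- Raises an error if the table cannot be redefined online.
  DBMS_REDEFINITION.CAN_REDEF_TABLE('APP', 'ORDERS');

  -- Start copying rows; updates to ORDERS are tracked from here on.
  DBMS_REDEFINITION.START_REDEF_TABLE('APP', 'ORDERS', 'ORDERS_INT');

  -- Clone indexes, triggers, constraints, and grants onto the interim table.
  DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS(
    uname      => 'APP',
    orig_table => 'ORDERS',
    int_table  => 'ORDERS_INT',
    num_errors => num_errors);

  -- Apply tracked updates, then swap the two tables.
  DBMS_REDEFINITION.SYNC_INTERIM_TABLE('APP', 'ORDERS', 'ORDERS_INT');
  DBMS_REDEFINITION.FINISH_REDEF_TABLE('APP', 'ORDERS', 'ORDERS_INT');
END;
/
```

Queries and updates against ORDERS continue throughout; only the final swap takes a brief lock.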


 Online Operations &
 Redefinition Improvements

• Fast ‘add column’ with default value
• Invisible indexes speed application migration and testing
• No recompilation of dependent objects when Online Redefinition
  does not logically affect objects
• Support Online Redefinition for tables with Materialized Views
• Enhanced Online DDL execution
  • DDL operations now wait if underlying resource is busy (configured
    through DDL_LOCK_TIMEOUT parameter)
  • Some DDL operations (add/modify constraint, add column, index
    create/rebuild) require only a shared lock
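
A few of these 11g improvements can be tried with statements like the following. Table, column, and index names are hypothetical, and the 30-second timeout is an arbitrary choice.

```sql
-- Wait up to 30 seconds for a busy resource instead of failing immediately.
ALTER SESSION SET DDL_LOCK_TIMEOUT = 30;

-- Fast add column: a NOT NULL column with a default is recorded in the
-- dictionary rather than written to every existing row.
ALTER TABLE orders ADD (status VARCHAR2(10) DEFAULT 'NEW' NOT NULL);

-- Build an index the optimizer ignores until it is made visible.
CREATE INDEX orders_status_ix ON orders (status) INVISIBLE;

-- Try it in one session first, then expose it to everyone.
ALTER SESSION SET OPTIMIZER_USE_INVISIBLE_INDEXES = TRUE;
ALTER INDEX orders_status_ix VISIBLE;
```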




  Agenda

• Oracle Maximum Availability Architecture

• MAA Best Practices – Oracle Database
  • Minimizing Unplanned Outages
  • Minimizing Planned Outages
• Resources




  Resources

• MAA Demonstrations
 http://www.oracle.com/technology/deploy/availability/demonstrations.html

• MAA Best Practices for High Availability – 10gR2
 http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/toc.htm

• MAA Overview – 11gR1 (detailed best practices to follow)
 http://download.oracle.com/docs/cd/B28359_01/server.111/b28281.pdf

• MAA Best Practice White Papers
 www.oracle.com/technology/deploy/availability/htdocs/maa.htm

• Oracle High Availability
 www.oracle.com/ha




QUESTIONS
 ANSWERS





				