Database Recovery techniques.ppt

Document Sample
Database Recovery techniques.ppt Powered By Docstoc
					Database Recovery

        Cameron Wood
        Adhip Pokharel
         Bryan Reeves
Database Recovery
       Pre-condition : At any given point in time the database is in a
        consistent state
       Condition : Some kind of system failure occurs
       Post-condition : Restore the database to the consistent state that
        existed before the failure

   Database recovery is the process of restoring the database to the
    most recent consistent state that existed just before the failure
   Example:
     If the system crashes before a fund transfer transaction completes
      its execution, then either one or both accounts may have incorrect
      value. Thus, the database must be restored to the state before the
      transaction modified any of the accounts
Failure Classification
   Types of failures
       The database may become unavailable for use due
           Transaction failure: Transactions may fail because of
            incorrect input, deadlock, incorrect synchronization

           System failure: System may fail because of addressing
            error, application error, operating system fault, RAM
            failure, etc

           Media failure: Disk head crash, power disruption, etc
 Transaction Log
          For recovery from any type of failure data values prior to modification
           (BFIM - BeFore Image) and the new value after modification (AFIM –
           AFter Image) are required

          These values and other information is stored in a sequential file called
           Transaction log. A sample log is given below. Back P and Next P point to
           the previous and next log records of the same transaction

T ID          Back P      Next P      Operation       Data item       BFIM           AFIM

T1              0           1           Begin

T1              1           4            Write            X          X = 100         X = 200

T2              0           8            Begin

T1              2           5             W               Y          Y = 50          Y = 100

T1              4           7             R              M           M = 200         M = 200

T3              0           9             R               N          N = 400         N = 400

T1              5          nil           End
Main Recovery Techniques
1. Deferred Update:

     These techniques do not physically update the
      DB on disk until after a transaction reaches its
      commit point

     The updates are recorded in the local
      transaction buffer and in the log file for

     These techniques need only to redo the
      committed transaction and no-undo is needed in
      case of failure (No-Undo/Redo)
Deferred Update example in a single user
Main Recovery Techniques(Cont)
2. Immediate Update:

     The DB may be updated by some operations of a
      transaction before the transaction reaches its commit point

     The updates are recorded in the log which must contain the
      old values (BFIM) and the new values (AFIM)

     These techniques need to undo the operations of the
      uncommitted transactions and redo the operations of the
      committed transactions (Undo/Redo)

     The Undo/No-Redo may be used in special case where all
      operations are recorded in the DB before the transaction
Main Recovery Techniques(Cont)
3. Shadow Update:
 The modified version of a data item does not
  overwrite its disk copy but is written at a separate
  disk location

4. In-Place Update:
   The disk version of the data item is overwritten by the cache
  Main Recovery Techniques(Cont)
Example of Shadow paging:
Data Caching
    Data items to be modified are first stored into
     database cache by the Cache Manager (CM) and
     after modification they are flushed (written) to the

    The flushing is controlled by Modified and Pin-
     Unpin bits

        Pin-Unpin: Instructs the operating system not to flush
         the data item
        Modified: Indicates the AFIM of the data item
Transaction Roll-back (Undo) and
Roll-Forward (Redo)
    To maintain atomicity, a transaction’s operations
     are redone or undone

        Undo: Restore all BFIMs on to disk (Remove all AFIMs)
        Redo: Restore all AFIMs on to disk

    Database recovery is achieved either by
     performing only Undos or only Redos or by a
     combination of the two. These operations are
     recorded in the log as they happen
Write-Ahead Logging(WAL)
   When in-place update (immediate or deferred) is used
    then log is necessary for recovery and it must be available
    to recovery manager. This is achieved by Write-Ahead
    Logging (WAL) protocol. WAL states that:

       For Undo: Before a data item’s AFIM is flushed to the
        database disk (overwriting the BFIM) its BFIM must be
        written to the log and the log must be saved on a stable
        store (log disk)

       For Redo: Before a transaction executes its commit
        operation, all its AFIMs must be written to the log and the
        log must be saved on a stable store
Steal/No-Steal & Force/No-Force
Steal/No-Steal and Force/No-Force
      Possible ways for flushing database cache to database

       1. Steal: Cache can be flushed before transaction commits
       2. No-Steal: Cache cannot be flushed before transaction
       3. Force: Cache is immediately flushed (forced) to disk
       4. No-Force: Cache is deferred until transaction commits

      These give rise to four different ways for handling
          Steal/No-Force (Undo/Redo)
          Steal/Force (Undo/No-redo)
          No-Steal/No-Force (Redo/No-undo)
          No-Steal/Force (No-undo/No-redo)
   Time to time (randomly or under some criteria) the
    database flushes its buffer to database disk to minimize
    the task of recovery. The following steps defines a
    checkpoint operation:

     1.   Suspend execution of transactions temporarily
     2.   Force write modified buffer data to disk
     3.   Write a [checkpoint] record to the log, save the log to disk.
     4.   Resume normal transaction execution
    During recovery redo or undo is required to transactions
     appearing after [checkpoint] record
Recovery Techniques Based on
Deferred Update(No Undo/Redo)
   The idea is to postpone any actual updates to the DB
    until the transaction completes its execution and
    commits (it follows the No-Steal approach)
   The updates are recorded only in the log and in the
   After transaction reaches its commit point and the log
    is force-written to disk, the updates are recorded in
    the disk
   If the transaction fails before commit, no need to
    undo any operations
Recovery Techniques Based on
Immediate Update(Undo/No-Redo)
   In this algorithm AFIMs of a transaction are
    flushed to the database disk under WAL
    before it commits

       For this reason the recovery manager undoes all
        transactions during recovery

       No transaction is redone

       It is possible that a transaction might have
        completed execution and ready to commit but this
        transaction is also undone
ARIES Recovery Scheme
   The ARIES Recovery Algorithm is based on:
       WAL (Write Ahead Logging)

       Repeating history during redo:
           ARIES will retrace all actions of the database system
            prior to the crash to reconstruct the database state
            when the crash occurred

       Logging changes during undo:
           It will prevent ARIES from repeating the completed
            undo operations if a failure occurs during recovery,
            which causes a restart of the recovery process
ARIES Recovery Scheme(Cont)
   The ARIES recovery algorithm consists of three steps:

    1. Analysis: step identifies the dirty (updated) pages in the
       buffer and the set of transactions active at the time of
       crash. The appropriate point in the log where redo is to
       start is also determined

    2. Redo: necessary redo operations are applied

    3. Undo: log is scanned backwards and the operations of
       transactions active at the time of crash are undone in
       reverse order
Recovery in Multidatabase Systems
   A multidatabase system is a special distributed database system
    where one node may be running relational database system under
    UNIX, another may be running object-oriented system under
    Windows and so on

   A transaction may run in a distributed fashion at multiple nodes.
   In this execution scenario the transaction commits only when all these
    multiple nodes agree to commit individually the part of the transaction
    they were executing

   This commit scheme is referred to as “two-phase commit” (2PC)
     If any one of these nodes fails or cannot commit the part of the
       transaction, then the transaction is aborted

   Each node recovers the transaction under its own recovery protocol
Microsoft SQL Server
Data Availability Continuum
     Degrees of protection for information systems:

                        Business Risk      Solution
         Data           Data loss          Redundant data
         High           Downtime of        Redundant
         Availability   database service   system

    .
Database Backups
   Traditional backup types
     Full backup

     Differential backup

     Transaction log backup

   Disk is better than tape
     First backup to disk (separate physical disk volume)

     Detect exceptions encountered during backup

     Verify backup files

     Copy backup files to tape or remote disk

   Data retention policy for backup files
Full Backup
1. Dynamic Full Backup
     Captures the state of the DB at the time
      the backup is started
     System records any activity that takes
      place during the backup
     Even uncommited transactions in the
      transaction log are written to the backup
Full Backup Continued
1.   Static Backup

        Captures the state of the DB at the time
         the backup is started
        Only Active session (SPID) is the one
         used to create the backup
        DB cannot be modified during this time
        Disadvantage is Performance
Differential Backup
 A full database backup must be done
 Creates a copy of only the parts of
  the database that have changed
  since the last full database backup
 Advantage is speed
Transaction Log Backup
   Considers only the changes recorded in the
   Not based on physical pages of the
    database, but rather on logical
    operations…i.e. INSERT, UPDATE, and
   Used with a full database backup to restore
    to a particular point in time
   Make frequent backups of the t-logs to regain
    valuable disk space
File or Filegroup Backup
 Allows you to back up specific
  database files (or filegroups) instead
  of the entire database
 Advantage is speed

 Recommended only when the
  database to be backed up is very
Database Recovery
   Automatic Recovery
        Occurs when I/O, memory, or system errors occur in
         the middle of a query
   Manual Recovery
        Most commonly used when user error occurs
        Allows for Dynamic (Online) restore, which keeps the
         availability of the DB while the restore is
         occurring…only the data that is being restored is
Code Sample of Manual
SQL SERVER Database Recovery


FULL Recovery Model
   Provides complete protection against media
   Allows recovery to any point in time.
   Allows recovery to any log mark... i.e. a
    specific transaction
   Restores all indexes to the DB
   Disadvantage is disk space and time of
Bulk-Logged Recovery Model
 Supports log backups by using
  minimal space in the t-log for certain
  large-scale or bulk operations
 Allows recovery to the end of a t-log
  (last committed transaction)
 Advantage is speed of recovery
SIMPLE Recovery Model
   T-log is truncated whenever a checkpoint
   Recover only from a full DB backup or a
    differential DB backup
   Strategy is simple, recover the DB using
    existing backups and, if differential backups
    exist, apply the most recent one
   Advantages are Performance gain for bulk
    operations and minimal log space required
High Availability

 FailoverClustering
 Using RAID technology.

 Database mirroring

 Log Shipping
High Availability
   Minimize or avoid service downtime
       Whether planned or unplanned
   When components fail,
    service interruption is brief or non-existent
       Automatic failover
   Eliminate single points of failure (as
       Redundant components
       Fault-tolerant servers
SQL Server Failover Clustering
   Most important technology
    with the concept of High
    Availability                     node A                    node B
   OS and DBMS work together
    to provide availability in the
    event of failures
                                              Shared Storage
   Consists of redundant                     · system DBs
    servers, called nodes that                · user DBs
    share an external disk drive              · quorum
   Protects against hardware
RAID Technology
                     RAID 0 – Disk
                     Fastest RAID
                     Disadvantage is no
                      fault tolerance. If
                      one disk fails, none
                      of the data is
   Also known as
   Protects data by
    design by storing
    copy of data on
    another disk
   Slower than RAID 0,
    but more reliable
   Also known as PARITY
   Implemented by calculating
    recovery information about
    data written to disk and writing
    this information to another disk
   Advantage is you only need
    one additional disk to protect
    any number of existing disk
   Disadvantage is
    performance…Additional disk
    I/O operations are required
   Minimum 4 disks.
   This is also called as
    “stripe of mirrors”
   Excellent redundancy
    ( as blocks are
    mirrored )
   Excellent
    performance ( as
    blocks are striped )
Database Mirroring
   Uses two servers, the principal server and
    the mirrored server
   Allows continuous streaming of the t-log from
    the principal server to the mirrored server.
   Unlike failover clustering, mirrored server is
    fully cached and ready to accept workloads
    because of its synchronized state
   It is possible to have up to four mirrored
    backup sets
Log Shipping
   Allows the transaction logs from one
    database to be constantly sent and used by
    another database
   Great way to keep DR read only servers
    synchronized with Production servers,
    because every transaction is shipped to the
    backup server
   Does not support automatic failover, backup
    server must be manually brought on line
Data Recovery Requirements
Oracle Database Recovery
   Backup

   Restore

   Recover
Backing up the Oracle DB
   Backup - A copy of data that can include
    control files and data files, also acts as a
    safeguard against unexpected data loss and
    application error
   If the original data is lost, a backup allows you
    to reconstruct the lost data
   Two types – physical backup and logical
Physical Backup
   Is a copy of the physical Database files.
   Done with either the RMAN or operating
    system utilities
   Done by maintaining a backup of those files
    which make up the physical structure of the
   Can be taken by using operating system
    commands or by using RMAN
   Most reliable technique
Logical Backup
   Extracts logical definitions and data from the
    database to a file
   Contain logical data extracted with the Oracle
    Export utility and stored in a binary file.
   Logical backup means to backup your logical
   You can use logical backups to supplement
    physical backups
On line DB backup code/steps
 Switch to backup mode:
! cp xyzFile1 /backupDir/
  Copy files to secondary storage
 Backup control files:

   Involves copying backup files from secondary
    storage to disk
   This can be done to replace damaged files or
    to copy/move a database to a new location.
   If you manage only Oracle databases of
    release 8.0 or higher, then RMAN is an
    appropriate choice for restoring your DB
   If you manage some Oracle7 releases then
    you must use a non-RMAN method
   Is the process of applying redo logs to the
    database to roll it forward
   You can roll-forward until a specific point-in-
    time (before the disaster occurred).
   Or roll-forward until the last transaction
    recorded in the log files
   The reconstructing of data is achieved
    through media recovery, which refers to the
    various operations involved in restoring,
    rolling forward, and rolling back a backup of
    database files
Recovery Code
   SQL> connect SYS as SYSDBA
    '2001-03-06:16:00:00' USING BACKUP
   RMAN> run {
     set until time to_date('04-Aug-2004
    00:00:00', 'DD-MON-YYYY HH24:MI:SS');
     restore database;
     recover database;
   }
   Stands for “Recovery Manager”, and it is a
    utility provided by Oracle for backing-up,
    restoring and recovering Oracle Databases.
   RMAN is a utility automatically installed with
    the database that can back up any Oracle8 or
    later database
   Most of oracle backup and recovery
    techniques are designed around RMAN
   RMAN can do off-line and on-line database
   The RMAN executable is located in your
    ORACLE_HOME/bin directory
   RMAN has its own syntax and is accessible
    either through a command-line interface or
    though the Oracle Enterprise Manager GUI
   It performs the record keeping of backups,
    archived logs, and its operations using the
    metadata which is stored in the control file, so
    restore and recovery is simplified
   RMAN uses server sessions on the database
    to perform the work of backup and recovery
   RMAN will re-read database blocks until it
    gets a consistent image of it
   RMAN comes with an API that allows it to
    function with a third-party media manager

   It cannot, however, write directly to tape

   The biggest advantage of RMAN is that it
    only backup used space in the database, this
    saves time and money
RMAN command line
   Argument Value                   Description
   ------------------------------------------------------------
   target         quoted-string connect-string for target database
   catalog        quoted-string connect-string for recovery catalog
   nocatalog         none        if specified, then no recovery catalog
   cmdfile        quoted-string name of input command file
   log           quoted-string name of output message log file
   trace         quoted-string name of output debugging message log file
   append            none       if specified, log is opened in append mode
   debug          optional-args activate debugging
   msgno             none        show RMAN-nnnn prefix for all messages
   send          quoted-string send a command to the media manager
   pipe             string     building block for pipe names
   timeout         integer     number of seconds to wait for pipe input
RMAN backup/restore DB
   rman target sys/*** nocatalog
   run {
     allocate channel t1 type disk;
     backup
       format
         (database);
      release channel t1;
   }
User Managed Recovery
   Also called “operating system commands”
   Any strategy in which Recovery Manager
    (RMAN) is not used as the principal backup
    and recovery tool
    Can be either logical or physical
   Strategy is to make periodic backups of data
    files and archived logs with operating system
User Mang. Recovery Steps
   The basic user-managed procedure for
    recovering from a media failure is as follows:
       Restore database file backups with operating
        system commands
       Recover restored data files with the SQL*Plus
        RECOVER statement
       If the database is closed, then open it for normal
        use; if it is open, then bring the recovered
        tablespaces back online
   Used to recover data lost due to user errors
   Data can be recovered when database is in
    open stage
   Flashback recovery is considered an
    alternate to Point-in-time recovery
   Flashback feature can be divided into two
    categories: Logical Level Features and
    Physical Level Features
   Recovery using these features depends on
    retention period
Retrieve/update for Flashback
   This retrieves the record
    TO_TIMESTAMP('2006-04-04 09:30:00', 'YYYY-MM-
    DD HH:MI:SS')
    WHERE name = 'JON';
   This updates and restores the record
   SQL> INSERT INTO employee
      TO_TIMESTAMP('2006-04-04 09:30:00', 'YYYY-MM-
    DD HH:MI:SS')
      WHERE name = 'JON');
Flashback query function
     l_scn     NUMBER;
     l_timestamp TIMESTAMP;
     l_scn    :=
     l_timestamp := SCN_TO_TIMESTAMP(l_scn);
   END;
   /
Restore Points
   Defined with an SCN number.
   Database can be restored to an available
    restore point
   Restore points can also be used in
    conjunction with Flashback Database
   Supported by Oracle Database 10g
Crash / Instance Recovery
   Crash recovery is the recovery of a DB in a
    single-instance configuration
   Instance recovery is the recovery of one
    failed instance by a live instance in an Oracle
    Real Application Clusters configuration
   Oracle performs crash recovery and instance
    recovery automatically after an instance
Crash/Instance Recovery Operations
   Crash and instance recovery recover a
    database to its transaction-consistent state
    just before instance failure
   Crash and instance recovery involve two
    distinct operations: rolling forward the current
    online data files by applying both committed
    and uncommitted transactions contained in
    online redo records, then rolling back
    changes made in uncommitted transactions
    to their original state
Media Failure
   An error can occur when trying to write or
    read a file on disk that is required to operate
    an Oracle database, its called media failure
   This is because there is a physical problem
    reading or writing to files on the storage
   Its the primary concern of a backup and
    recovery strategy, because it typically
    requires restoring some or all database files
    and the application of redo during recovery
Process Failure
   Is a failure in a user, server, or background
    process of a database instance
   The Oracle background process PMON
    detects aborted Oracle processes
   If the aborted process is a user or server
    process, PMON resolves the failure by rolling
    back the current transaction of the aborted
    process and releasing any resources that this
    process was using
   Recovery of the failed user or server process
    is automatic
Automatic Archiving
   Frees you from having to keep track of, and
    archive, filled groups manually
   You can interactively start or stop automatic
    archiving at any time
   The archiver always archives groups in order
   For most DB systems, automatic archiving is
    best cause you don’t have to watch for a
    group to become inactive and available for
Manual Archive
   If a DB runs in ARCHIVELOG mode, then
    you can manually archive the filled groups of
    inactive online redo log files, whether or not
    automatic archiving is enabled or disabled
   If automatic archiving is disabled, then you
    must manually archive filled groups
   If manual archiving is not performed fast
    enough, then database operation can be
    suspended temporarily
ARCHIVELOG mode code
   The following parameters needs to be set for databases
    in ARCHIVELOG mode:
   log_archive_start     = TRUE
   log_archive_dest_1      = 'LOCATION=/arch_dir_name'
   log_archive_dest_state_1 = ENABLE
   log_archive_format     = %d_%t_%s.arc
1.   Database Recovery Techniques
A)   Microsoft SQL Server
B)   Oracle Database Recovery

           ANY QUESTIONS???????

               THANK YOU!!!!!!

Shared By: