					 CO532 Database Systems

                  Lecture 12

             Reliability & Recovery


                     CO532 Lecture 12
2010-03-02                              1
                           Systems DO fail!


- Reliability is achieved by
    - Minimising
        - frequency of failure
        - side-effects
    - Maximising
        - ability to recover




                               Reliability
- Primary reliability – design, implementation
    - Avoid single point of failure
        - Hardware replication
        - Backup power sources
    - Secure, air-conditioned machine rooms
    - Carefully designed operating procedures

- Secondary reliability – operating procedures
    - Rigorously implemented operating procedures
    - Access control
    - Reliable (tested!) backup and recovery mechanisms
      Types of Database System Failures

- Transaction failure, e.g. deadlock
    - handled by transaction rollback and restart

- System failure, e.g. power failure
    - system should recover to a stable (consistent) state on restart

- Media failure, e.g. disk failure
    - loss of stable storage, potentially very serious
    - mitigated by some RAID strategies

- Communications failure, e.g. network losses
    - must be handled in distributed systems


                   Recovery Mechanisms


- A recovery mechanism is a normal part of a DBMS
- A recovery mechanism should:
    - Guarantee consistency of physical storage
    - Ensure the database is in a consistent state
    - Ensure updates are written to stable storage
    - Have minimal impact on normal processing



                   Recovery Mechanisms

- For performance reasons, a typical DBMS installation uses at least three
  disk units (each of which may itself comprise multiple disks):
    - System and DBMS software
    - Live data
    - Transaction logs

- This is also beneficial for recovery
    - With regular backups of a consistent database, the system can
      survive the failure of a single disk unit


                       Log-based Recovery
- Processing model (assumes 2PL):
    - Pages needed to process transactions are read from disk into
      memory buffers
    - Buffer pages are updated
    - A record of each update is written to the log
    - On commit, log records are flushed to disk before locks are released
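
The commit rule in the processing model can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a single-threaded toy engine, exclusive page locks, and in-memory lists standing in for the log tail and the on-disk log. The class `ToyEngine` and all of its fields are hypothetical names, not part of any real DBMS.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    """Toy single-threaded engine; every name here is hypothetical."""
    buffers: dict = field(default_factory=dict)     # page id -> value in memory
    log_tail: list = field(default_factory=list)    # log records still in memory
    stable_log: list = field(default_factory=list)  # log records "on disk"
    locks: dict = field(default_factory=dict)       # page id -> owning transaction

    def update(self, txn, page, value):
        # Growing phase of 2PL: take the lock, then log and apply the update.
        owner = self.locks.setdefault(page, txn)
        if owner != txn:
            raise RuntimeError(f"{page} is locked by {owner}")
        before = self.buffers.get(page)
        self.log_tail.append((txn, "UPDATE", page, before, value))
        self.buffers[page] = value

    def commit(self, txn):
        self.log_tail.append((txn, "COMMIT"))
        # Flush the log to stable storage first ...
        self.stable_log.extend(self.log_tail)
        self.log_tail.clear()
        # ... and only then release the transaction's locks.
        for page in [p for p, owner in self.locks.items() if owner == txn]:
            del self.locks[page]


engine = ToyEngine()
engine.update("T1", "P1", 10)
engine.commit("T1")
```

Because the log is flushed before the locks are released, no other transaction can read T1's updates until the record of T1's commit is safely on stable storage.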

- Buffer pool:
    - Controlled by buffer manager
    - Pages fetched from disk as required
    - Pages freed when no longer required (LRU)
    - Dirty (modified) pages flushed to disk
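
A rough sketch of such a buffer pool, assuming least-recently-used replacement and a plain dictionary standing in for the disk. The class `BufferPool` and its methods are illustrative names only, and the sketch ignores logging entirely.

```python
from collections import OrderedDict


class BufferPool:
    """Toy LRU buffer pool; `disk` is a plain dict standing in for storage."""

    def __init__(self, disk, capacity):
        self.disk = disk
        self.capacity = capacity
        self.pages = OrderedDict()   # page id -> value, least recently used first
        self.dirty = set()           # ids of pages modified since they were fetched

    def _fetch(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)               # now most recently used
        else:
            if len(self.pages) >= self.capacity:
                self._evict()
            self.pages[page_id] = self.disk.get(page_id)  # fetch from disk on demand

    def _evict(self):
        victim, value = self.pages.popitem(last=False)    # least recently used page
        if victim in self.dirty:
            self.disk[victim] = value                     # flush dirty page before freeing
            self.dirty.discard(victim)

    def read(self, page_id):
        self._fetch(page_id)
        return self.pages[page_id]

    def write(self, page_id, value):
        self._fetch(page_id)
        self.pages[page_id] = value
        self.dirty.add(page_id)


disk = {"P1": 1, "P2": 2, "P3": 3}
pool = BufferPool(disk, capacity=2)
pool.write("P1", 100)   # P1 is now dirty in memory; the disk still holds 1
pool.read("P2")
pool.read("P3")         # pool is full, so P1 (least recently used) is evicted
```

Note that the dirty page reaches the disk only when it is evicted, which may be well before or well after the transaction that wrote it commits. This is why the log is needed for recovery.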
                        Log-based Recovery

- Log files typically include:
    - Before- and/or after-images of pages
        - tagged with transaction ID
        - often just a differential record
    - Transaction start, commit and abort records
    - System checkpoints
        - record of a consistent database state
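
One possible in-memory layout for these log records is sketched below. The field names are assumptions made for illustration. In particular, storing the list of active transactions in the checkpoint record is a common convention but is not stated on the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    BEGIN = "BEGIN"
    UPDATE = "UPDATE"
    COMMIT = "COMMIT"
    ABORT = "ABORT"
    CHECKPOINT = "CHECKPOINT"


@dataclass(frozen=True)
class LogRecord:
    kind: Kind
    txn: Optional[str] = None      # transaction ID tag (None for a checkpoint)
    page: Optional[str] = None     # page touched by an UPDATE
    before: Optional[int] = None   # before-image, used to undo
    after: Optional[int] = None    # after-image, used to redo
    active: tuple = ()             # assumed: transactions active at a CHECKPOINT


log = [
    LogRecord(Kind.BEGIN, txn="T1"),
    LogRecord(Kind.UPDATE, txn="T1", page="P1", before=5, after=7),
    LogRecord(Kind.CHECKPOINT, active=("T1",)),
    LogRecord(Kind.COMMIT, txn="T1"),
]
```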
             Log-based Recovery Procedures

- Normal recovery (after normal shutdown)
    - start from the last log record (a checkpoint written at shutdown)
- Warm recovery (after system failure)
    - revert to the last checkpoint in the log
    - apply committed transactions
    - remove the effects of uncommitted transactions
- Cold recovery (after media failure)
    - restore from backup/dump
    - apply log records from the last checkpoint in the dump to reach
      the latest consistent state
             Transaction States at Failure

At failure, transactions are in one of five states:
       a.    committed before last checkpoint
       b.    started before checkpoint, committed before failure
       c.    started before checkpoint, in progress at failure
       d.    started after checkpoint, committed before failure
       e.    started after checkpoint, in progress at failure
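
The five states can be told apart from a transaction's start and commit times alone. The helper below is a small sketch. The function name `classify` and the integer timestamps are made up for the example, and a commit time of `None` means the transaction was still in progress at the failure.

```python
def classify(start, commit, checkpoint, failure):
    """Return the state letter a-e for one transaction.

    `commit` is None if the transaction was still in progress at the failure.
    """
    if commit is not None and commit < checkpoint:
        return "a"                              # committed before last checkpoint
    started_before = start < checkpoint
    committed = commit is not None and commit < failure
    if started_before:
        return "b" if committed else "c"
    return "d" if committed else "e"


checkpoint, failure = 10, 20
examples = {
    "a": (1, 5),      # (start, commit)
    "b": (3, 15),
    "c": (6, None),
    "d": (12, 18),
    "e": (14, None),
}
states = {name: classify(s, c, checkpoint, failure) for name, (s, c) in examples.items()}
```

Transactions in state a need no recovery work, those in states b and d must survive the failure, and those in states c and e must leave no trace.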




[Figure: timeline of transactions a to e, shown relative to the checkpoint and the later failure on a time axis.]

                   Recovery Algorithms


- Undo/Redo algorithm
    - Widely used
    - Allows buffer manager to flush dirty pages before, during or
      after commit
    - Optimal for normal processing, but at the expense of abort and
      recovery



             Undo/Redo Algorithm

1. Create two empty lists: UNDO and REDO
2. Start reading log at last checkpoint
3. Put all active transactions on UNDO list
4. Read log forwards:
       if BEGIN-TRANSACTION found
                add transaction to UNDO list
       if COMMIT found
                remove transaction from UNDO list
                add transaction to REDO list
5. At end,
       Use before-images to undo all on UNDO list
       Use after-images to redo all updates on REDO list
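
The five steps can be sketched in Python as follows. This is a minimal illustration, not a production recovery manager. The tuple layout of the log records, the assumption that the checkpoint record lists the transactions active at that moment, and the function name `recover` are all choices made for this example. The sample log contains one transaction in each of the states a to e.

```python
def recover(log, db):
    """Undo/redo recovery over a toy log; returns the (undo, redo) lists.

    Log records are tuples (hypothetical layout):
      ("CHECKPOINT", [active txns]), ("BEGIN", txn), ("COMMIT", txn),
      ("UPDATE", txn, page, before_image, after_image)
    """
    undo, redo = [], []                                   # step 1
    # Step 2: locate the last checkpoint in the log.
    cp = max(i for i, rec in enumerate(log) if rec[0] == "CHECKPOINT")
    undo.extend(log[cp][1])                               # step 3
    for rec in log[cp + 1:]:                              # step 4
        if rec[0] == "BEGIN":
            undo.append(rec[1])
        elif rec[0] == "COMMIT":
            undo.remove(rec[1])
            redo.append(rec[1])
    # Step 5: undo, scanning backwards so the oldest before-image wins ...
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] in undo:
            db[rec[2]] = rec[3]
    # ... then redo, scanning forwards so the newest after-image wins.
    # (Re-applying updates already on disk is harmless in this toy model.)
    for rec in log:
        if rec[0] == "UPDATE" and rec[1] in redo:
            db[rec[2]] = rec[4]
    return undo, redo


log = [
    ("BEGIN", "a"), ("UPDATE", "a", "P1", 0, 1), ("COMMIT", "a"),
    ("BEGIN", "b"), ("UPDATE", "b", "P2", 0, 2),
    ("BEGIN", "c"), ("UPDATE", "c", "P3", 0, 3),
    ("CHECKPOINT", ["b", "c"]),
    ("COMMIT", "b"),
    ("BEGIN", "d"), ("UPDATE", "d", "P4", 0, 4), ("COMMIT", "d"),
    ("BEGIN", "e"), ("UPDATE", "e", "P5", 0, 5),
]
# Disk contents at the failure: some dirty pages had been flushed, others not.
db = {"P1": 1, "P2": 0, "P3": 3, "P4": 0, "P5": 5}
undo, redo = recover(log, db)
```

For this log the UNDO list ends up holding c and e, and the REDO list holds b and d. Transaction a appears on neither list because it committed before the checkpoint.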

[Figure: the steps of the algorithm shown one at a time on the timeline of transactions a to e, with the checkpoint and the failure marked on the time axis.]

After the redo pass, restart normal processing.

				