Dependable Computing Systems by absences


									                     Resource Managers
                                        Jim Gray
                            Microsoft, Gray @

                                 Andreas Reuter
                 International University,
                  Mon                 Tue           Wed            Thur             Fri

9:00           Overview            TP mons          Log        Files &Buffers      B-tree

11:00            Faults           Lock Theory     ResMgr          COM+          Access Paths

1:30           Tolerance          Lock Techniq   CICS & Inet      Corba          Groupware

3:30           T Models             Queues        Adv TM        Replication     Benchmark

7:00              Party            Workflow      Cyberbrick        Party

Gray & Reuter: Resource Manager                                                             1
              Whirlwind Tour: The Actors
    Resource managers
         –   provide ACID objects (transactional objects)
         –   Use log manager to record changes
         –   Use transaction manager to coordinate multi-RM changes
         –   Use communication manager to make transactional RPCs
                                        Communication   Communication
                                        Manager         Manager
               Resource                                                                    Resource
               Managers                   Transaction    Transaction                       Managers
                                          Manager        Manager

                                           Log            Log
               Objects                                                                Objects
                                           Manager        Manager

                     Volatile Storage                                   Volatile Storage

       Durable Storage                                       Log                           Durable Storage

       Whirlwind Tour: the Application Verbs
   TRID            Begin_Work(context *);        /* begin a transaction                */
   Boolean         Commit_Work(context *);       /* commit the transaction             */
   void            Abort_Work(void);             /* rollback to savepoint zero         */

   savepoint       Save_Work(context *);         /* establish a savepoint             */
   savepoint       Rollback_Work(savepoint);     /*return to savept (savept 0 = abort)*/
   Boolean         Prepare_Work(context *);      /* put transaction in prepared state */
   context         Read_Context(void);           /* return current savepoint context */
   TRID            Chain_Work(context *);        /* end current and start next trans */

   TRID            My_Trid(void);                /* return current transaction identifier*/
   TRID            Leave_Transaction(void);      /* set process trid null, return current */
   Boolean         Resume_Transaction(TRID); /* set process trid to desired trid       */

   tran_status Status_Transaction(TRID);         /* transaction identifier status      */

                       Whirlwind Tour
                Types Of Transaction Executions

 A Simple              A Simple        A Partial       A Persistent Transaction
 Commit                 Abort          Rollback        Surviving A System Restart
  Begin                 Begin         Begin                     Begin
  Action                Action        Action                    Action
  Action                Action        Action                    Action
  Save                  Save          Save                      Save Persistent
  Action                Action        Action       Action       Action
  Save                  Save          Save         Action       Save
  Action                Action        Action       Action       Action
  Action                Action        Action       Save        Restart
  Action                Action        Action       Action                    Action
  Save                  Save          Save         Commit                    Save
  Action                Action        Action                                 Action
  Action                Rollback      Rollback                               Commit

                      Shaded stuff is “undone”

                Whirlwind Tour: the TRID Flow
   Call graph: who calls whom.
   TRIDs flow on all such calls.
   Application is typically root.
   RM can be an application (use a transactional RM to store state)

         Transaction              Application                  Application
                                   Servers                      Servers


                                    Resource                    Resource
                                    Managers                    Managers
 Whirlwind tour Normal (no failure) Transaction
  TM generates the TRID at Begin_Work().
  TM coordinates commit.
  RM joins work, generates log records, allows commit

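   [Figure: message flow for a normal (no failure) transaction]
     Application -> Transaction Manager:       Begin_Work(), returns trans id
     Application -> Resource Manager:          work requests (RM normal functions)
     Resource Manager -> Lock Manager:         lock requests
     Resource Manager -> Transaction Manager:  Join_Work
     Resource Manager -> Log Manager:          log records
     Application -> Transaction Manager:       Commit_Work()
     Transaction Manager -> RM transaction callback functions:
                                               Commit Phase 1?, then Commit Phase 2
     Transaction Manager:                      write commit log record & force log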

             WW tour: The Resource Manager view

   [Figure: the Resource Manager and its interfaces]
     –   Callers invoke the resource manager's own service interface functions
         through rmCall(...) and receive a response.
     –   Transaction Manager functions: Identify, SaveWork, Join, StatusTransaction,
         Leave, Resume.
     –   Transaction management callbacks: Save, Prepare, Commit.
     –   TP monitor: administrative functions and callbacks to install, start, and
         schedule a resource manager.
     –   Invocation of managers through rmCall(...), with their callbacks
         (depends on application).

            WW tour: The Resource manager view
  Boolean Savepoint(LSN *);       /* invoked at tran Save_Work(). Returns RM vote  */
  Boolean Prepare(LSN *);         /* invoked at phase_1. Return vote on commit     */
  void    Commit();               /* called at commit phase 2                      */
  void    Abort();                /* called at failed commit phase 2 or abort      */

  void    UNDO(LSN);              /* Undo the log record with this LSN             */
  void    REDO(LSN);              /* Redo the log record with this LSN             */
  Boolean UNDO_Savepoint(LSN);    /* Vote TRUE if can return to savepoint          */
  void    REDO_Savepoint(LSN);    /* Redo a savepoint.                             */

  void   TM_Startup(LSN);      /* TM restarting. Passes RM ckpt LSN            */
  LSN    Checkpoint(LSN * low_water); /* TM checkpointing, Return RM ckpt LSN,
                               set low water LSN                               */
  Boolean Join_Work(RMID, TRID); /* Become part of a transaction               */

        WW Tour: The Transaction Manager
  Transaction rollback.
     coordinates transaction rollback to a savepoint or abort
     rollbacks can be initiated by any participant.
  Resource manager restart.
     If an RM fails and restarts, TM presents checkpoint anchor & RM undo/redo log
  System restart.
     TM drives local RM recovery (like RM restart)
     TM resolves any in-doubt distributed transactions
   Media recovery.
     TM helps RM reconstruct damaged objects by providing
     archive copies of object + the log of object since archived.
  Node restart.
     Transaction commit among independent TMs when a TM fails.
      WW Tour: When a Transaction Aborts

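   [Figure: message flow when a transaction aborts]
     Application -> Transaction Manager:       Begin_Work(), returns trans id
     Application -> Resource Manager:          work requests (RM normal functions)
     Resource Manager -> Lock Manager:         lock requests
     Resource Manager -> Transaction Manager:  Join_Work
     Resource Manager -> Log Manager:          log records
     Application -> Transaction Manager:       Rollback_Work()
     Transaction Manager:                      read transaction's log records & call Undo
     Transaction Manager -> RM transaction callbacks:
                                               Undo(log record), then Aborted(trans id)
     Transaction Manager:                      write abort record in log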

    At transaction rollback
            TM drives undo of each RM joined to the transaction
    Can be to savepoint 0 (abort) or partial rollback.

              WW tour: the Transaction Manager
                    at Restart/Recovery
   [Figure: Transaction Manager at restart/recovery]
     The Log Manager supplies log records to the Transaction Manager, which:
       finds the checkpoint
       reads the log forward and redoes each op: Redo(log record) calls to the manager
       at end, undoes soft savepoints & transactions: Undo(log record) calls to the manager
            At restart, TM reading the log drives RM recovery.
            Single log scan.
            Single resolver of transactions.
            Multiple logs possible, but more complex/more work.

                     End of Whirl-Wind Tour

                      Resource Manager Concepts:
                         Undo Redo Protocol
   [Figure: the DO-UNDO-REDO protocol]
     DO:    old state               -> new state + log record
     UNDO:  new state + log record  -> old state
     REDO:  old state + log record  -> new state
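
The three transformations can be sketched on a single integer "object". This is only an illustration under stated assumptions: the names demo_log_rec, demo_do, demo_undo, demo_redo and demo_protocol are hypothetical and are not part of any resource manager interface shown on these slides.

```c
#include <assert.h>

/* Hypothetical log record for an integer object: keeps both values. */
typedef struct {
    int old_value;   /* state before the operation (used by UNDO) */
    int new_value;   /* state after the operation (used by REDO)  */
} demo_log_rec;

/* DO: old state -> new state + log record. */
demo_log_rec demo_do(int *state, int new_value)
{
    demo_log_rec rec = { *state, new_value };
    *state = new_value;
    return rec;
}

/* UNDO: new state + log record -> old state. */
void demo_undo(int *state, const demo_log_rec *rec)
{
    *state = rec->old_value;
}

/* REDO: old state + log record -> new state. */
void demo_redo(int *state, const demo_log_rec *rec)
{
    *state = rec->new_value;
}

/* Usage: one update, undone and then redone from its log record. */
void demo_protocol(void)
{
    int state = 10;
    demo_log_rec rec = demo_do(&state, 42);
    assert(state == 42);
    demo_undo(&state, &rec);
    assert(state == 10);
    demo_redo(&state, &rec);
    assert(state == 42);
}
```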

                     Resource Manager Concepts:
                     Transaction UNDO Protocol
   declare cursor for transaction_log
           select rmid, lsn                          /* a cursor on the transaction's log    */
           from      log                             /* it returns the resource manager name */
           where trid = :trid                        /* and record id (log sequence number) */
           descending lsn;                           /* and returns records in LIFO order    */
   void transaction_undo(TRID trid)                  /* Undo the specified transaction.      */
     { int           sqlcode;                        /* event variables set by sql           */
        open cursor transaction_log;                 /* open an sql cursor on the trans log */
        while (TRUE)                                 /* scan trans log backwards & undo each*/
           {                                         /* fetch the next most recent log rec   */
           fetch transaction_log into :rmid, :lsn;   /*                                      */
           if (sqlcode != 0) break;                  /* if no more, trans is undone, end loop*/
           rmid.undo(lsn);                           /* tell RM to undo that record          */
           }                                         /* end of the backward scan loop        */
        close cursor transaction_log;                /* Undo scan is complete, close cursor  */
     };                                              /* return to caller                     */

 • If the UNDO is to a savepoint, the UNDO scan stops at the desired savepoint.
                      Resource Manager Concepts:
                        Restart REDO Protocol
  void log_redo(void)                        /*                                         */
     {declare cursor for the_log             /* declare cursor from log start forward   */
         select rmid, lsn                    /* gets RM id and log record id (lsn)      */
         from      log                       /* of all log records.                     */
         ascending lsn;                      /* in FIFO order                           */
     open cursor the_log;                    /* open an sql cursor on the log table     */
     while (TRUE)                            /* Scan log forward& redo each record.     */
         { fetch the_log into :rmid, :lsn;   /* fetch the next log record               */
         if (sqlcode != 0) break;            /* if no more, then all redone, end loop   */
         rmid.redo(lsn);}                    /* tell RM to redo that record             */
     close cursor the_log;                   /* Redo scan complete, close cursor        */
     };                                      /* return to caller                        */

    Note: REDO forwards, UNDO backwards
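
The two scan directions can be sketched in plain C, with an in-memory array standing in for the SQL cursor on the log. This is a sketch under assumptions: scan_log_rec, scan_undo, scan_redo and scan_demo are hypothetical names, and each record carries old and new values of one integer slot rather than calling back into a resource manager.

```c
#include <assert.h>
#include <stddef.h>

typedef long LSN;
typedef int  TRID;

/* Hypothetical log record: one integer slot updated by one transaction. */
typedef struct {
    LSN  lsn;
    TRID trid;
    int  slot;        /* index of the integer that was updated */
    int  old_value;
    int  new_value;
} scan_log_rec;

/* UNDO one transaction: scan the log backwards (LIFO order). */
void scan_undo(const scan_log_rec *log, size_t n, TRID trid, int *data)
{
    assert(log != NULL && data != NULL);
    for (size_t i = n; i-- > 0; )
        if (log[i].trid == trid)
            data[log[i].slot] = log[i].old_value;
}

/* REDO everything: scan the log forwards (FIFO order). */
void scan_redo(const scan_log_rec *log, size_t n, int *data)
{
    assert(log != NULL && data != NULL);
    for (size_t i = 0; i < n; i++)
        data[log[i].slot] = log[i].new_value;
}

/* Usage: transaction 1 updates slot 0 twice, transaction 2 updates slot 1. */
void scan_demo(void)
{
    scan_log_rec log[] = {
        { 1, 1, 0, 0, 5 },
        { 2, 2, 1, 0, 7 },
        { 3, 1, 0, 5, 9 },
    };
    int data[2] = { 0, 0 };
    scan_redo(log, 3, data);       /* forward scan rebuilds both slots   */
    assert(data[0] == 9 && data[1] == 7);
    scan_undo(log, 3, 1, data);    /* backward scan removes only trid 1  */
    assert(data[0] == 0 && data[1] == 7);
}
```

Scanning backwards matters for the undo: slot 0 must pass through 5 before returning to 0, otherwise the older record's old value would be overwritten by the newer one.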

   [Figure: old state, new state, and the log record applied to each]

  F(F(X)) == F(X): Needed in case restart fails (and restarts)
  Redo(Redo(old_state,log), log) = Redo(new_state,log) = new_state
  Undo(Undo(new_state,log), log) = Undo(old_state,log) = old_state
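
A small sketch of the difference, assuming two hypothetical record formats: a value record (old and new value) whose redo and undo are idempotent, and a delta record whose redo is not. None of these names come from the slides.

```c
#include <assert.h>

typedef struct { int old_value; int new_value; } value_rec;  /* value logging */
typedef struct { int delta; } delta_rec;                      /* delta logging */

/* Value redo/undo ignore the current state, so repeating them is harmless. */
int redo_value(int state, value_rec r) { (void)state; return r.new_value; }
int undo_value(int state, value_rec r) { (void)state; return r.old_value; }

/* Delta redo depends on the current state, so repeating it changes the result. */
int redo_delta(int state, delta_rec r) { return state + r.delta; }

void idempotence_demo(void)
{
    value_rec v = { 10, 15 };
    delta_rec d = { 5 };

    /* Redo(Redo(old_state, log), log) = Redo(old_state, log) = new_state */
    assert(redo_value(redo_value(10, v), v) == redo_value(10, v));
    /* Undo(Undo(new_state, log), log) = Undo(new_state, log) = old_state */
    assert(undo_value(undo_value(15, v), v) == undo_value(15, v));
    /* The delta form is not idempotent: a repeated restart would add 5 twice. */
    assert(redo_delta(redo_delta(10, d), d) != redo_delta(10, d));
}
```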
       Testable State: Can Tell If It Happened.
       IF operation not idempotent AND state not testable
          THEN recovery is impossible
       ELSE for F in {UNDO, REDO}:
          not testable: WHILE (! ACK) F(F(X))
          testable: WHILE ( not desired state) {F(x)}

   [Figure: a test on the state tells whether it is the old state or the new state]
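
One common way to make a state testable is to stamp it with the LSN of the last log record applied to it. The sketch below assumes that convention; tpage, trec and redo_if_needed are hypothetical names. It shows a non-idempotent operation (adding a delta) made safe to repeat because the state can be tested first.

```c
#include <assert.h>
#include <stddef.h>

typedef long LSN;

typedef struct { LSN lsn; int balance; } tpage;   /* state stamped with an LSN   */
typedef struct { LSN lsn; int delta;   } trec;    /* non-idempotent delta record */

/* Redo only if the page does not yet reflect the record.
 * Returns 1 if the record was applied, 0 if the test showed it was not needed. */
int redo_if_needed(tpage *p, const trec *r)
{
    assert(p != NULL && r != NULL);
    if (p->lsn >= r->lsn)          /* test: did it already happen? */
        return 0;
    p->balance += r->delta;
    p->lsn = r->lsn;
    return 1;
}

/* Usage: restart runs the same redo twice; the second pass is a no-op. */
void testable_demo(void)
{
    tpage page = { 0, 100 };
    trec  rec  = { 7, 5 };
    assert(redo_if_needed(&page, &rec) == 1);
    assert(redo_if_needed(&page, &rec) == 0);
    assert(page.balance == 105 && page.lsn == 7);
}
```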

          Real Operations: Can Not Be Undone
    Defer operations until commit is assured.
    Perform as part of Phase 2 of commit
    If must undo for some reason,
            generate compensation log record
            to be processed by some higher authority.
   [Figure: state and log-record transitions for a real operation under DO, Commit,
    UNDO, and REDO, including a compensation log record]

     Example: Communications Session RM
                        Session and Message Recovery Actions
                   Sender                          Receiver

       DO          log message & seqno             establish savepoint
                   send                            log message & seqno
       UNDO        send cancellation               log cancellation message
                   (generates log record)          return to savepoint
       REDO        resend message                  if not duplicate
                                                      <normal DO processing>
                                                   else just acknowledge
       COMMIT      send any deferred (real)
                   messages

       Actions are idempotent (sequence numbers)
       and testable (sequence numbers)

                                  Kinds of Logging
         Physical logging:
         Keep old and new value of container (page, file,...)
         Pro: Simple
               Allows recovery of physical object (e.g. broken page)
         Con: Generates LOTS of log data

         Logical logging:
         Keep call params such that you can compute F(x), F^-1(x)
         Pro: Sounds simple
               Compact log.
         Con: Doesn't work (wrong failure model).
               Operations do not fail cleanly.

                  Sample Physical LOG RECORD
    struct compressed_log_record_for_page_update /*                                  */
            { int opcode;                  /* opcode will say compressed page update*/
            filename fname;                /* name of file that was updated          */
            long     pageno;               /* page that was updated                  */
            long     offset;               /* offset within page that was updated    */
            long     length;               /* length of field that was updated       */
            char     old_value[length];    /* old value of field                     */
            char     new_value[length];    /* new value of field                     */
            };                             /*                                        */
                      Ordinary sequential insert is OK.
                      Update of sorted (B-tree) page:
                              update LSN
                              update page space map
                              update pointer to record
                              insert record at correct spot (move 1/2 the others)
                      Essentially writes whole page (old and new).
                      16KB log records for 100-byte updates.
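
A sketch of how such a record is generated and applied, simplified to one in-memory page with a fixed maximum field size. PAGE_SIZE, MAX_FIELD, phys_rec, phys_log, phys_undo, phys_redo and phys_demo are assumptions for illustration; the opcode, file name, and page number fields of the struct above are left out.

```c
#include <assert.h>
#include <string.h>

enum { PAGE_SIZE = 64, MAX_FIELD = 16 };

/* Simplified physical log record: old and new bytes of one field. */
typedef struct {
    long offset;                  /* offset within page that was updated */
    long length;                  /* length of field that was updated    */
    char old_value[MAX_FIELD];    /* old value of field                  */
    char new_value[MAX_FIELD];    /* new value of field                  */
} phys_rec;

/* DO: capture old bytes, apply new bytes, fill in the log record. */
void phys_log(phys_rec *r, char *page, long offset, const char *bytes, long length)
{
    assert(length >= 0 && length <= MAX_FIELD);
    assert(offset >= 0 && offset + length <= PAGE_SIZE);
    r->offset = offset;
    r->length = length;
    memcpy(r->old_value, page + offset, (size_t)length);
    memcpy(r->new_value, bytes, (size_t)length);
    memcpy(page + offset, bytes, (size_t)length);
}

/* UNDO and REDO just copy the saved bytes back; both are idempotent. */
void phys_undo(char *page, const phys_rec *r)
{
    memcpy(page + r->offset, r->old_value, (size_t)r->length);
}

void phys_redo(char *page, const phys_rec *r)
{
    memcpy(page + r->offset, r->new_value, (size_t)r->length);
}

void phys_demo(void)
{
    char page[PAGE_SIZE];
    phys_rec rec;
    memset(page, '.', sizeof page);
    phys_log(&rec, page, 4, "abc", 3);
    assert(memcmp(page + 4, "abc", 3) == 0);
    phys_undo(page, &rec);
    assert(memcmp(page + 4, "...", 3) == 0);
    phys_redo(page, &rec);
    phys_redo(page, &rec);                 /* repeating redo is harmless */
    assert(memcmp(page + 4, "abc", 3) == 0);
}
```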

                  Sample Logical LOG RECORD

   struct logical_log_record_for_insert   /*                                     */
            { int opcode;                 /* opcode will say insert              */
            filename fname;               /* name of file that was updated       */
            long     length;              /* length of record that was updated   */
            char     record[length];      /* value record                        */
            };                            /*                                     */
   Very compact.
   Implies page update(s) for record (may be many pages long).
   Implies index updates (may be many indices on base table)

                 The trouble with Logical Logging
  Logical logging needs to start UNDO/REDO with an action-consistent state.
  No half completed operations.
  for example: insert (table, record)
           ALL or NONE of the indices should be updated
           when logical UNDO/REDO is invoked.
     Failure model is Page & Message action consistency
           (Lampson /Sturgis model of Chapter 3).
     Actions can fail due to:
           Logic: e.g. duplicate key.
           Limit: ran out of space
           Contention: deadlock
           Media: broken page or session
           System: computer failure/restart

     Making Logical Logging Work: Shadows
   Keep old copy of each page
   Reset page to old copy at abort (no undo log)
   Discard old copy at commit.
   Handles all online failures due to:
         Logic: e.g. duplicate key.
         Limit: ran out of space
         Contention: deadlock
   Problem: forces page locking, only one updater per page.
   What about restart?
   Need to atomically write out all changed pages.
     Making Logical Logging Work: Shadows
  Perform same shadow trick at disc level.
  Keep shadow copy of old pages.
  Write out new pages.
  In one careful write, write out new page root.
  Makes update atomic

   [Figure: A Shadow Update. Old directory and free space bit map over data pages
    A, B, C; new directory and free space bit map over data pages A, C, B.]
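
The shadow trick can be sketched with two directories and a root that names the current one. This is an in-memory illustration under assumptions: shadow_store and the shadow_* functions are hypothetical, each "page" holds one integer, and the single assignment to root stands in for the one careful write of the new page root.

```c
#include <assert.h>
#include <string.h>

enum { NPAGES = 3, NSLOTS = 6 };

typedef struct {
    int slot_data[NSLOTS];   /* simulated disc slots, one int per page */
    int slot_used[NSLOTS];   /* free space bit map                     */
    int dir[2][NPAGES];      /* current directory and its shadow       */
    int root;                /* which directory is current (0 or 1)    */
} shadow_store;

/* Rebuild the free space map from the pages directory d points at. */
static void shadow_rebuild_map(shadow_store *s, int d)
{
    memset(s->slot_used, 0, sizeof s->slot_used);
    for (int p = 0; p < NPAGES; p++)
        s->slot_used[s->dir[d][p]] = 1;
}

void shadow_init(shadow_store *s)
{
    memset(s, 0, sizeof *s);
    for (int p = 0; p < NPAGES; p++)
        s->dir[0][p] = p;
    shadow_rebuild_map(s, 0);
}

/* Start an update: the new directory begins as a copy of the current one. */
void shadow_begin(shadow_store *s)
{
    memcpy(s->dir[1 - s->root], s->dir[s->root], sizeof s->dir[0]);
}

/* Write a new page version into a free slot; the old copy stays untouched.
 * Returns 0 on success, -1 if no free slot is left. */
int shadow_write(shadow_store *s, int page, int value)
{
    assert(page >= 0 && page < NPAGES);
    for (int i = 0; i < NSLOTS; i++)
        if (!s->slot_used[i]) {
            s->slot_used[i] = 1;
            s->slot_data[i] = value;
            s->dir[1 - s->root][page] = i;
            return 0;
        }
    return -1;
}

int shadow_read(const shadow_store *s, int page)
{
    return s->slot_data[s->dir[s->root][page]];
}

/* Commit: one write flips the root, then the old copies are discarded. */
void shadow_commit(shadow_store *s)
{
    s->root = 1 - s->root;
    shadow_rebuild_map(s, s->root);
}

/* Abort: keep the old root and release the slots written by the update. */
void shadow_abort(shadow_store *s)
{
    shadow_rebuild_map(s, s->root);
}
```

Readers going through the current root never see a half-finished update, which is why no undo log is needed; the cost is the extra slots and the directory indirection listed among the cons.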

      Pro: Simple
           Not such a bad deal with non-volatile ram
      Con: page locking
           extra space
           extra overhead (for page maps)
           extra IO
           declusters sequential data

            Compromise Physio-Logical Logging
   Physio-Logical Logging
    Physical to a "page" (physical container)
    Logical within a "page".

   Keep old and new value of container (page, file,...)
    Pro: Simple
        Allows recovery of physical object (e.g. broken page)
    Con: Generates LOTS of log data

                Logical vs Physio-logical Logging
  Insert record r into table A

             Table A                                   Table A

                                  Index B                             Index B

                                  Index C                             Index C

      Logical log record                          Physiological log records
          insert, A, r                                insert, A, page 508, r
                                                      insert, B , page 72, s
                                                      insert, C , page 94, t

           Note: physical log records would be bigger for sorted pages.
                      Physiological Logging Rules
             Complex operations are a sequence of simple operations on pages and messages.

             Each operation is constructed as a mini-transaction:
               lock the object in exclusive mode
               transform the object
               generate an UNDO-REDO log record
               record log LSN in object
               unlock the object.

             Action Consistent Object:
               When object semaphore free, no ops in progress.

             Log Consistency:
               Log contains log records of all complete page/msg actions.

          Physiological Logging Rules
   Online Operation - Only Need the Fix Rule
         Each operation is structured as a mini-transaction.

         Each operation generates an UNDO record.

         No page operation fails with the semaphore set.
          (exception handler must clean up state
          and UNFIX any pages).

         Then Rollback can be
           physical to a page/session/container and
           logical within page/session/container.

           Physiological Logging Rules
    Restart Operation - Need WAL and F@C
          Need Page-Action consistent disc state.
           Pages are action consistent.
           Committed actions can be redone from log.
           Uncommitted actions can be undone from log.

          WAL: Write Ahead Log
           Write undo/redo log records before overwriting disc page
           Only write action-consistent pages

          F@C: Force Log at Commit
           Make transaction log records durable at commit.

                     Physiological Logging Rules
                           WAL and F@C
   WAL: Write Ahead Log
    write page:
      get page semaphore
      copy page
      give page semaphore /* avoids holding semaphore during IO */
      Force_log(Page(LSN)) /*WAL logic, probably already flushed*/
      Write copy to disc.

   WAL gives idempotence and testability.
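
The write-page steps can be sketched as a single-threaded simulation. Everything here is an assumption for illustration: wal_sim, wal_page, wal_log_insert, wal_force_log and wal_write_page are hypothetical names, the log is reduced to two LSN counters, and the page semaphore is omitted because there is only one thread.

```c
#include <assert.h>

typedef long LSN;

typedef struct { LSN lsn; int payload; } wal_page;

typedef struct {
    LSN      volatile_lsn;   /* highest LSN in the volatile log buffer  */
    LSN      durable_lsn;    /* highest LSN already forced to the log   */
    wal_page disc;           /* simulated disc copy of the page         */
} wal_sim;

/* Append a log record to the volatile log; returns its LSN. */
LSN wal_log_insert(wal_sim *w)
{
    return ++w->volatile_lsn;
}

/* Force the log up to and including the given LSN. */
void wal_force_log(wal_sim *w, LSN upto)
{
    assert(upto <= w->volatile_lsn);
    if (w->durable_lsn < upto)
        w->durable_lsn = upto;
}

/* Write a page following WAL: copy, force log to page LSN, then write copy. */
void wal_write_page(wal_sim *w, const wal_page *p)
{
    wal_page copy = *p;                    /* copy taken under the page semaphore */
    wal_force_log(w, copy.lsn);            /* usually already flushed             */
    assert(copy.lsn <= w->durable_lsn);    /* the WAL rule                        */
    w->disc = copy;                        /* write copy to disc                  */
}
```

After wal_write_page returns, the disc copy's LSN is never ahead of the durable log, so any update visible on disc can be undone or redone from the log.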

     At commit phase 1:

                         WAL & F@C in Pictures
   [Figure: LSNs of the four kinds of storage]
     VVlsn: volatile page versions        VLlsn: volatile log records
     DLlsn: durable log records           PVlsn: persistent page versions

     online:   VVlsn = VLlsn
     restart:  DLlsn <= VVlsn
               PVlsn <= DLlsn
     Commit:   commit_lsn <= DLlsn

At restart all volatile memory is reset and must be
reconstructed from persistent memory.
     After restart:  PVlsn <= DLlsn
                     commit_lsn <= DLlsn

                FIX, WAL and F@C assure these assertions

                   The One Bit Resource Manager
    Manages an array of transactional bits (the free space bit map).

    i = get_bit();         /* gets a free bit and sets it                     */

    give_bit(i);           /* returns a free bit (when transaction commits)   */

                  The Bitmap and Its Log Records
    The Data Structure
    struct {                                     /* layout of the one-bit RM data structure   */
        LSN        lsn;                          /* page LSN for WAL protocol                 */
        xsemaphore sem;                          /* semaphore regulates access to the page    */
        Boolean    bit[BITS];                    /* page.bit[i] = TRUE => bit[i] is free      */
        } page;                                  /* allocates the page structure              */

    The Log Records
    struct                                       /* log record format for the one-bit RM      */
        { int index;                             /* index of bit that was updated             */
        Boolean      value;                      /* new value of bit[index]                   */
        } log_rec;                               /* log record used by the one-bit RM         */

    const int rec_size = sizeof(log_rec); /*size of the log record body.                      */

           Page and Log Consistency for 1-Bit RM
         Data is dirty if it reflects an uncommitted transaction update.
         Otherwise, data is clean.

         Page Consistency:
         • No clean free bit has been given to any transaction.
         • Every clean busy bit was given to exactly one transaction.
         • Dirty bits are locked in X mode by the updating transactions.
         • The page.lsn reflects most recent log record for page.
         Log Consistency:
         • Log contains a record for every completed
            mini-transaction update to the page.

    get_bit() & give_bit(i) temporarily violate page consistency.
    Mini-transaction holds semaphore while violating consistency.
    Makes page & log mutually consistent before releasing sem.
    => each mini-transaction observes a consistent page state.

    void give_bit(int i)                                                      /* free a bit          */
       { if (LOCK_GRANTED==lock(i,LOCK_X,LOCK_LONG,0))                        /* Lock bit            */
              { Xsem_get(&page.sem);                                          /* get page sem        */
              page.bit[i] = TRUE;                                             /* free the bit        */
              log_rec.index = i;                                              /* generate log rec    */
              log_rec.value = TRUE;                                           /*saying bit is free   */
              page.lsn = log_insert(log_rec,rec_size);         /*write log rec&update lsn            */
              Xsem_give(&page.sem);}                                          /* page consistent     */
       else                                            /* if lock failed, caller doesn't own bit,    */
               Abort_Work();                           /* in that case abort caller's trans          */
       return; };                                                             /*                     */

   int get_bit(void)                                    /* allocate a bit and return its index     */
       { int i;                                         /* loop variable                           */
       Xsem_get(&page.sem);                             /* get the page semaphore                  */
       for (i = 0; i < BITS; i++)                       /* loop looking for a free bit             */
               {if (page.bit[i])                        /* if bit is free, may be dirty (so locked)*/
                   {if (LOCK_GRANTED == lock(i,LOCK_X,LOCK_LONG,0)) /* lock bit                    */
                          { page.bit[i] = FALSE;        /* got lock on it, so it was free          */
                          log_rec.value = FALSE;        /* generate log rec describing update      */
                          log_rec.index = i;            /*                                         */
                          page.lsn = log_insert(log_rec,rec_size); /* write log rec&update lsn     */
                          Xsem_give(&page.sem);         /* page now consistent, give up sem        */
                          return i; }                   /* return to caller                        */
                   }                                    /* else lock bounce so bit dirty           */
               }                                        /* try next free bit,                      */
       Xsem_give(&page.sem);                            /* if no free bits, give up semaphore      */
       Abort_Work();                                    /* abort transaction                       */
       return -1; }                                     /* returns -1 if no bits are available.    */

                             Compensation Logging

   [Figure: New State (reached by a log record) and Logical Old State
    (reached by a compensation log record)]
          Undo may generate a log record recording undo step
          Makes Page LSN monotonic
          Similar technique was used for Communication Manager
            (session sequence number was monotonic)
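
A minimal sketch of the idea, assuming an in-memory log array and a one-bit-per-slot page. The names `Log`, `LogRec`, `Page`, `log_append`, `do_set` and `undo_rec` are hypothetical stand-ins, not the slides' API, and the fixed-size log has no overflow check. It shows that undoing an operation appends a new (compensation) record, so the page LSN only moves forward.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory stand-ins for the log and the page. */
typedef struct { size_t index; bool value; } LogRec;
typedef struct { LogRec rec[64]; size_t next; } Log;   /* LSN = slot number + 1 */
typedef struct { bool bit[8]; size_t lsn; } Page;

/* Append a record and return its LSN; LSNs only grow. */
static size_t log_append(Log *log, LogRec r)
{ log->rec[log->next] = r;
  return ++log->next; }

/* Forward operation: set a bit, log the new value, stamp the page. */
static void do_set(Log *log, Page *p, size_t i, bool v)
{ p->bit[i] = v;
  p->lsn = log_append(log, (LogRec){ i, v }); }

/* Undo the record at `lsn`: restore the old value and log the undo step
   as a compensation record, so the page LSN moves forward, never back. */
static void undo_rec(Log *log, Page *p, size_t lsn)
{ LogRec r = log->rec[lsn - 1];
  r.value = !r.value;                 /* complement of new value = old value */
  p->bit[r.index] = r.value;
  p->lsn = log_append(log, r); }
```

After one `do_set` followed by `undo_rec` on its LSN, the bit is back to its old value but the log holds two records and the page LSN is 2, not 0.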

                        1-bit RM UNDO Callback

void undo(LSN lsn)                               /* undo a one-bit RM operation           */
   { int             i;                          /* bit index                             */
   Boolean           value;                      /* old bit value from log rec to be undone*/
   log_rec_header header;                        /* buffer to hold log record header      */
   rec_size = log_read_lsn(lsn,header,0,log_rec,big); /* read log rec                     */
   Xsem_get(&page.sem);                          /* get the page semaphore                */
   i = log_rec.index;                            /* get bit index from log record         */
   value = ! log_rec.value;                      /* get complement of new bit value       */
   page.bit[i] = value;                          /* update bit to old value               */
   log_rec.value= value;                         /* make a compensation log record        */
   page.lsn = log_insert(log_rec,rec_size);      /* log it and bump page lsn              */
   Xsem_give(&page.sem);                         /* free the page semaphore               */
   return; }                                     /*                                       */

                    1-bit RM Checkpoint Callback
     LSN checkpoint(LSN * low_water) /* copy 1-page RM state to persistent store*/
       { Xsem_get(&page.sem);            /* get the page semaphore              */
       *low_water = log_flush(page.lsn); /* WAL force up to page lsn, and       */
                                         /*           set low water mark        */
       write(file,page,0,sizeof(page));  /* write page to persistent memory     */
       Xsem_give(&page.sem);             /* give page semaphore                 */
       return NULLlsn; }                 /* return checkpoint lsn (none needed) */
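
The write-ahead-log step in the checkpoint callback can be sketched in isolation. This is a toy model under stated assumptions: `Log`, `Page`, `log_force_to` and `write_page` are hypothetical names, "durable" is just a counter, and the persistent page is a second struct in memory. It only illustrates the ordering: force the log up to the page LSN, then overwrite the persistent copy.

```c
#include <string.h>

typedef unsigned long Lsn;
typedef struct { Lsn durable; int forces; } Log;          /* hypothetical log state */
typedef struct { Lsn lsn; unsigned char bits[8]; } Page;  /* volatile or persistent */

/* Make the log durable up to `lsn`; a no-op if it already is. */
static Lsn log_force_to(Log *log, Lsn lsn)
{ if (lsn > log->durable) { log->durable = lsn; log->forces++; }
  return log->durable; }

/* WAL rule: the durable log must cover the page before the
   persistent copy of the page is overwritten. */
static void write_page(Log *log, const Page *mem, Page *disk)
{ log_force_to(log, mem->lsn);
  memcpy(disk, mem, sizeof *disk); }
```

If the log is already durable past the page LSN, `write_page` costs no extra log force.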

                          1-bit RM REDO Callback
   void redo( LSN lsn)                              /* redo a free space operation            */
      { int             i;                          /* bit index                              */
      Boolean           value;                      /* new bit value from log rec to be redone*/
      log_rec_header header;                        /* buffer to hold log record header       */
      rec_size = log_read_lsn(lsn,header,0,log_rec,big); /* read log record                   */
      i = log_rec.index;                            /* Get bit index                          */
      lock(i,LOCK_X,LOCK_LONG,0);                   /* get lock on the bit (often not needed) */
      Xsem_get(&page.sem);                          /* get the page semaphore                 */
      if (page.lsn < lsn)                           /* if bit version older than log record */
             { value= log_rec.value;                /* then redo the op. get new bit value */
             page.bit[i] = value;                   /* apply new bit value to bit             */
             page.lsn = lsn; }                      /* advance the page lsn                   */
      Xsem_give(&page.sem);                         /* free the page semaphore                */
      return; };                                    /*                                        */
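
The LSN test in redo() is what makes replay safe to repeat. The sketch below isolates that test, assuming a hypothetical in-memory log (`LogRec`, `Page`, `redo_rec` and `replay` are illustrative names, not the slides' API) and no locking or semaphores. Replaying the same log twice applies nothing the second time.

```c
#include <stdbool.h>

typedef struct { unsigned lsn; int index; bool value; } LogRec;  /* hypothetical */
typedef struct { bool bit[8]; unsigned lsn; } Page;

/* Apply a log record only if the page has not already seen it. */
static bool redo_rec(Page *p, const LogRec *r)
{ if (p->lsn >= r->lsn) return false;   /* page already reflects this record */
  p->bit[r->index] = r->value;          /* apply new bit value               */
  p->lsn = r->lsn;                      /* advance the page lsn              */
  return true; }

/* Replay a log in LSN order; returns how many records were applied. */
static int replay(Page *p, const LogRec *log, int n)
{ int applied = 0;
  for (int k = 0; k < n; k++)
      if (redo_rec(p, &log[k])) applied++;
  return applied; }
```

A page that was checkpointed at LSN 2 skips the first two records and applies only the third.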

                       1-bit RM Noise Callbacks
    Boolean prepare(LSN * lsn)                   /* 1-bit RM has no phase 1 work     */
      {*lsn = NULLlsn; return TRUE ;};           /*                                  */

    void Commit(void )                     /* Commit release locks &                 */
       { unlock_class(LOCK_LONG, TRUE, MyRMID()); }; /* return                       */

    void Abort(void )                      /* Abort release all locks &              */
       { unlock_class(LOCK_LONG, TRUE, MyRMID()); }; /* return                       */

    Boolean savepoint(LSN * lsn)                 /* no work to do at savepoint       */
      {*lsn = NULLlsn; return TRUE ;};           /*                                  */

    void UNDO_savepoint(LSN lsn)             /* rollback work or abort transaction   */
       {if (savepoint == 0)                  /* if at savepoint zero (abort)         */
              unlock_class(LOCK_LONG, TRUE, MyRMID()); /* release all locks          */
       };                                    /*                                      */

Model: Complex actions are a sequence of page/message actions.
LSN: Each page carries an LSN and a semaphore.
ReadFix: Read actions get the semaphore in shared mode.
WriteFix: Update actions get the semaphore in exclusive mode,
        generate one or more log records covering the page,
        advance the page LSN to match the highest LSN,
        then give the semaphore.
WAL: log_flush(page.LSN) before overwriting the persistent page.
Force at commit: force all log records up to the commit LSN at commit.
Compensation Logging: Invalidate an undone log record with a
  compensating log record.
Idempotence via LSN: the page LSN makes REDO idempotent.
                                  Two Phase Commit
  Getting two or more logs to agree
  Getting two or more RMs to agree
  Atomically and Durably
  Even in case one of them fails and restarts.
  The TM phases
  Prepare. Invoke each joined RM asking for its vote.
  Decide. If all vote yes, durably write commit log record.
  Commit. Invoke each joined RM, telling it to commit.
  Complete. Write commit completion record when all RMs ACK.
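
The four TM phases can be sketched as one function. This is a toy, single-process model with assumed names throughout: `RM`, `rm_prepare`, `rm_commit`, `rm_abort`, `TmLog`, `tm_log` and `two_phase_commit` are hypothetical, the "log" is an array of strings standing in for durable writes, and the abort path writes no log record here, which a real TM may well do.

```c
#include <stdbool.h>

/* Demo RM: votes a fixed answer and counts the callbacks it receives. */
typedef struct { bool vote; int commits; int aborts; } RM;
static bool rm_prepare(RM *rm) { return rm->vote; }
static void rm_commit(RM *rm)  { rm->commits++; }
static void rm_abort(RM *rm)   { rm->aborts++; }

/* Hypothetical TM log: just remembers the records written, in order. */
typedef struct { const char *rec[4]; int n; } TmLog;
static void tm_log(TmLog *log, const char *r)
{ if (log->n < 4) log->rec[log->n++] = r; }

/* Returns true if the transaction committed, false if it aborted. */
static bool two_phase_commit(RM *rm, int n, TmLog *log)
{ for (int k = 0; k < n; k++)             /* Prepare: ask each RM to vote     */
      if (!rm_prepare(&rm[k]))
          { for (int j = 0; j < n; j++)   /* a "no" vote aborts every RM      */
                rm_abort(&rm[j]);
            return false; }
  tm_log(log, "commit");                  /* Decide: stands in for durable write */
  for (int k = 0; k < n; k++)             /* Commit: tell each joined RM      */
      rm_commit(&rm[k]);
  tm_log(log, "complete");                /* Complete: all RMs have ACKed     */
  return true; }
```

The commit point is the moment the "commit" record is written: before it, any "no" vote aborts; after it, the TM only moves forward.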

         Centralized Case of Two Phase Commit

   Each participant (TM & RM) goes through a
    sequence of states:

       Null -> Active -> Prepared -> Committing -> Committed
               Active or Prepared -> Aborting   -> Aborted

   These generate log records

   Committed                            Aborted
   begin                                begin
   DO rm1                               DO rm1
   DO rm2                               DO rm2
   DO rm2                               DO rm2
   prepare rm2 {locks}                  UNDO rm2
   commit { rm1, rm2}                   UNDO rm2
   complete                             UNDO rm1
                                        UNDO begin { rm1, rm2}

                   Transitions in Case of Restart

           Active state not persistent, others are persistent

                                  For both TM and RM.

           Log records make them persistent (redo)

           TM tries to drive states to the right. (to committed, aborted)

            Null -> Active -> Prepared -> Committing -> Committed
                    Active or Prepared -> Aborting   -> Aborted
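
One way to read "drive states to the right" is as a mapping from the state found at restart to the state restart aims for. The sketch below is an interpretation, not the authors' code: `TxState` and `state_after_restart` are hypothetical names, and two cases are assumptions labeled in the comments (an Active transaction is rolled back because its state was not persistent; a Prepared transaction stays Prepared until the outcome is known).

```c
typedef enum { ST_NULL, ST_ACTIVE, ST_PREPARED, ST_COMMITTING,
               ST_COMMITTED, ST_ABORTING, ST_ABORTED } TxState;

/* State that restart drives a transaction toward, given the state it
   was in at the crash. */
static TxState state_after_restart(TxState at_crash)
{ switch (at_crash) {
    case ST_ACTIVE:     return ST_ABORTED;    /* assumption: not persistent, so undone   */
    case ST_PREPARED:   return ST_PREPARED;   /* assumption: waits for the decision      */
    case ST_COMMITTING: return ST_COMMITTED;  /* finish the commit                       */
    case ST_ABORTING:   return ST_ABORTED;    /* finish the abort                        */
    default:            return at_crash; } }  /* Null, Committed, Aborted: nothing to do */
```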

                      Successful two phase commit
  Message/Call flow from TM to each RM joined to transaction
        Coordinator (TM)                     Participant (RM)
        ----------------                     ----------------
        State: Active                        State: Active
        Local Prepare (lazy)                 Local Prepare
                                             Write Prepare Record In Log (force)
        State: Prepared                      State: Prepared
        Write Commit Record In Log
                        ---- Commit ---->
        Local Commit Work (lazy)             Local Commit Work
                                             Write Completion Record In Log (lazy)
                                             State: Committing
                        <----- Ack ------    Ack when durable
        Write Completion Record In Log (lazy)
        State: Committed                     State: Committed

  If TM and RM share the same log,
     the RM FORCE can piggyback on the TM FORCE
  One IO to commit a transaction (less if commit is grouped)
                        Abort Two Phase Commit

    If RM sends "NO" or no response (timeout), TM starts abort.

    Calls UNDO of each trans log record

    May stop at a savepoint.

    At begin_trans it calls ABORT() callback of each joined RM
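
The abort path above can be sketched as a backward walk over the transaction's log records. This is a toy model with assumed names: `TxRec`, `RecKind`, `RmCalls` and `rollback` are hypothetical, callbacks are replaced by counters, and every RM in the table is assumed to have joined the transaction.

```c
typedef enum { REC_BEGIN, REC_UPDATE, REC_SAVEPOINT } RecKind;
typedef struct { RecKind kind; int rm; int savepoint; } TxRec;   /* hypothetical */
#define N_RM 2
typedef struct { int undone[N_RM]; int aborted[N_RM]; } RmCalls; /* callback counters */

/* Roll back to savepoint `target` (0 = abort the whole transaction).
   Returns the index of the log record where rollback stopped. */
static int rollback(const TxRec *log, int n, int target, RmCalls *calls)
{ for (int k = n - 1; k >= 0; k--)
      { if (log[k].kind == REC_UPDATE)
            calls->undone[log[k].rm]++;          /* UNDO callback of the owning RM   */
        else if (log[k].kind == REC_SAVEPOINT
                 && target != 0 && log[k].savepoint == target)
            return k;                            /* stop at the requested savepoint  */
        else if (log[k].kind == REC_BEGIN)
            { for (int r = 0; r < N_RM; r++)
                  calls->aborted[r]++;           /* ABORT callback of each joined RM */
              return k; } }
  return -1; }                                   /* no begin record in the log       */
```

Rolling back to a savepoint undoes only the records after it and calls no ABORT callback; rolling back to savepoint 0 undoes everything and then notifies each RM.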

                      Distributed two phase commit

   Tracking joined TMs -- the communications manager helps
     Much as TRPC helps in the local case.

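     [Figure: the caller sends (trid, data) through its Communications
      Manager, across a Session, to the callee's Communications Manager.
      Each Communications Manager checks "first time?" for the trid and
      informs its Transaction Manager: Transaction Manager A is told
      "outgoing to B, trid is"; the callee's Transaction Manager is told
      "incoming from A, trid is".]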

   Root TM owes a Prepare/Commit/Abort message to each joined TM.
   Joined TM does "local" commit.
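
The "first time?" check can be sketched as a small table kept by the root TM. All names here (`TmTable`, `Joined`, `tm_note_outgoing`, `tm_owed`) are hypothetical, peers are plain integers, and a full table simply refuses new entries. The communications manager would call `tm_note_outgoing` each time a trid is sent on a session; only the first call per (trid, peer) pair records a join.

```c
#include <stdbool.h>

#define MAX_JOINED 16
typedef struct { long trid; int peer; } Joined;
typedef struct { Joined j[MAX_JOINED]; int n; } TmTable;   /* hypothetical */

/* Called when `trid` is sent to `peer`.
   Returns true only the first time this pair is seen. */
static bool tm_note_outgoing(TmTable *t, long trid, int peer)
{ for (int k = 0; k < t->n; k++)
      if (t->j[k].trid == trid && t->j[k].peer == peer)
          return false;                      /* already joined              */
  if (t->n == MAX_JOINED) return false;      /* sketch: table full, refuse  */
  t->j[t->n++] = (Joined){ trid, peer };
  return true; }

/* Number of joined TMs owed a Prepare/Commit/Abort message for `trid`. */
static int tm_owed(const TmTable *t, long trid)
{ int owed = 0;
  for (int k = 0; k < t->n; k++)
      if (t->j[k].trid == trid) owed++;
  return owed; }
```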
                      Full Transaction State Diagram
  Next section explains how these states are implemented.

   [Figure: full transaction state diagram. Labels in the figure:
    live states (active, save point 0, save point 1, ..., save point n);
    persistent states (persistent save point n, committing, aborting);
    complete states (committed, aborted).]
       Summary of Resource Manager Concepts
       Idempotent, Testable, Real operations
       Logical vs Physical logging
       Shadows to make logical logging work
       Physiological logging
         Fix, WAL, Force-at-commit
         Page/Message/Log consistency
       RM callbacks (the 1-bit resource manager)
         Join, Prepare, Commit, Abort, UNDO, REDO, ....
       Restart REDO/UNDO
       Two phase commit (RM story is simple).
