Transaction in database

					   Outline
               ❏   Introduction
               ❏   Background
               ❏   Distributed DBMS Architecture
               ❏   Distributed Database Design
               ❏   Semantic Data Control
               ❏   Distributed Query Processing
               ❏   Distributed Transaction Management
                    ➠ Transaction Concepts and Models
                    ➠ Distributed Concurrency Control
                    ➠ Distributed Reliability
               ❏   Parallel Database Systems
               ❏   Distributed Object DBMS
               ❏   Database Interoperability
               ❏   Concluding Remarks
Distributed DBMS            © 1998 M. Tamer Özsu & Patrick Valduriez   Page 10-12. 1
 Transaction

         A transaction is a collection of actions that make
         consistent transformations of system states while
         preserving system consistency.
              ➠ concurrency transparency
              ➠ failure transparency
                   [Timeline: Begin Transaction → Execution of Transaction →
                    End Transaction. The database is in a consistent state at
                    Begin and at End, but may be temporarily in an
                    inconsistent state during execution.]
Transaction Example –
A Simple SQL Query


         Transaction BUDGET_UPDATE
         begin
              EXEC SQL UPDATE PROJ
                        SET    BUDGET = BUDGET*1.1
                        WHERE PNAME = "CAD/CAM"
         end.




 Example Database


         Consider an airline reservation example with the
         relations:

              FLIGHT(FNO, DATE, SRC, DEST, STSOLD, CAP)
              CUST(CNAME, ADDR, BAL)
              FC(FNO, DATE, CNAME,SPECIAL)




Example Transaction – SQL Version

    Begin_transaction Reservation
    begin
         input(flight_no, date, customer_name);
         EXEC SQL UPDATE FLIGHT
                      SET         STSOLD = STSOLD + 1
                      WHERE FNO = flight_no AND DATE = date;
         EXEC SQL INSERT
                      INTO        FC(FNO, DATE, CNAME, SPECIAL)
                      VALUES (flight_no, date, customer_name, null);
         output(“reservation completed”)
    end. {Reservation}




Termination of Transactions
       Begin_transaction Reservation
       begin
            input(flight_no, date, customer_name);
            EXEC SQL SELECT          STSOLD,CAP
                             INTO             temp1,temp2
                             FROM             FLIGHT
                             WHERE            FNO = flight_no AND DATE = date;
            if temp1 = temp2 then
                   output(“no free seats”);
                   Abort
            else
                   EXEC SQL      UPDATE FLIGHT
                                 SET    STSOLD = STSOLD + 1
                                 WHERE FNO = flight_no AND DATE = date;
                    EXEC SQL      INSERT
                                  INTO   FC(FNO, DATE, CNAME, SPECIAL)
                                  VALUES (flight_no, date, customer_name, null);
              Commit
              output(“reservation completed”)
         endif
       end. {Reservation}
Example Transaction –
Reads & Writes
        Begin_transaction Reservation
        begin
                input(flight_no, date, customer_name);
                temp ← Read(flight(date).stsold);
                if temp = flight(date).cap then
                begin
                     output(“no free seats”);
                     Abort
                end
                else begin
                     Write(flight(date).stsold, temp + 1);
                     Write(flight(date).cname, customer_name);
                     Write(flight(date).special, null);
                     Commit;
                     output(“reservation completed”)
                end
        end. {Reservation}

Characterization


         I   Read set (RS)
              ➠ The set of data items that are read by a transaction
         I   Write set (WS)
              ➠ The set of data items whose values are changed by
                   this transaction
         I   Base set (BS)
              ➠ RS ∪ WS
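The three sets above can be computed mechanically from a transaction's operation list. A minimal sketch, assuming an illustrative encoding of operations as ("R"/"W", item) pairs (not notation from the slides):

```python
# Sketch: deriving the read, write, and base sets from a transaction's
# operations. The tuple encoding is an illustrative assumption.

def characterize(ops):
    """ops: list of (kind, item) pairs, kind being "R" or "W"."""
    rs = {item for kind, item in ops if kind == "R"}   # read set RS
    ws = {item for kind, item in ops if kind == "W"}   # write set WS
    return rs, ws, rs | ws                             # base set BS = RS ∪ WS

# Operations of the Reservation transaction from the earlier slide:
rs, ws, bs = characterize([("R", "stsold"), ("R", "cap"),
                           ("W", "stsold"), ("W", "cname"), ("W", "special")])
```

For the Reservation transaction this yields RS = {stsold, cap}, WS = {stsold, cname, special}, and BS as their union.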




Formalization
         Let
               ➠ Oij(x) be some operation Oj of transaction Ti operating on
                   entity x, where Oj ∈ {read,write} and Oj is atomic
               ➠ OSi = ∪j Oij
               ➠ Ni ∈ {abort,commit}

         Transaction Ti is a partial order Ti = {Σi, <i} where
          ❶ Σi = OSi ∪ {Ni}
          ❷ For any two operations Oij , Oik ∈ OSi , if Oij = R(x)
            and Oik = W(x) for any data item x, then either
            Oij <i Oik or Oik <i Oij
          ❸ ∀Oij ∈ OSi, Oij <i Ni
Example

           Consider a transaction T:
                      Read(x)
                      Read(y)
                       x ← x + y
                      Write(x)
                      Commit
           Then
                   Σ = {R(x), R(y), W(x), C}
                   < = {(R(x), W(x)), (R(y), W(x)), (W(x), C), (R(x), C), (R(y), C)}
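Conditions ❷ and ❸ of the formalization can be checked mechanically against a candidate relation. A sketch, assuming an illustrative string encoding of operations ("R(x)", etc.); note that the slide's relation also orders R(y) before W(x), which the definition permits but does not require:

```python
# Sketch: verify that a candidate order satisfies the two ordering
# conditions: conflicting read/write operations on the same item must
# be ordered, and every operation must precede the termination action.

def is_valid_order(ops, term, order):
    item = lambda o: o[o.index("(") + 1 : o.index(")")]
    ordered = lambda a, b: (a, b) in order or (b, a) in order
    # Condition 2: R(x) and W(x) of the transaction must be ordered.
    for a in ops:
        for b in ops:
            if a != b and item(a) == item(b) and "W" in (a[0], b[0]):
                if not ordered(a, b):
                    return False
    # Condition 3: every O_ij <_i N_i.
    return all((o, term) in order for o in ops)

order = {("R(x)", "W(x)"), ("R(y)", "W(x)"),
         ("W(x)", "C"), ("R(x)", "C"), ("R(y)", "C")}
valid = is_valid_order(["R(x)", "R(y)", "W(x)"], "C", order)
```

The relation from the example above passes both conditions.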




DAG Representation
       Assume
             < = {(R(x),W(x)), (R(y),W(x)), (R(x), C), (R(y), C), (W(x), C)}




                   [DAG: R(x) → W(x), R(y) → W(x), W(x) → C,
                    R(x) → C, R(y) → C]
Properties of Transactions
         ATOMICITY
              ➠ all or nothing


         CONSISTENCY
              ➠ no violation of integrity constraints


         ISOLATION
               ➠ concurrent changes invisible ⇒ serializable


         DURABILITY
              ➠ committed updates persist


Atomicity

      I   Either all or none of the transaction's operations are
          performed.
      I   Atomicity requires that if a transaction is
          interrupted by a failure, its partial results must be
          undone.
      I   The activity of preserving the transaction's atomicity
          in presence of transaction aborts due to input errors,
          system overloads, or deadlocks is called transaction
          recovery.
      I   The activity of ensuring atomicity in the presence of
          system crashes is called crash recovery.

Consistency


               I   Internal consistency
                    ➠ A transaction which executes alone against a
                      consistent database leaves it in a consistent state.
                    ➠ Transactions do not violate database integrity
                      constraints.
               I   Transactions are correct programs




Consistency Degrees
         I   Degree 0
              ➠ Transaction T does not overwrite dirty data of other
                transactions
              ➠ Dirty data refers to data values that have been
                updated by a transaction prior to its commitment
         I   Degree 1
              ➠ T does not overwrite dirty data of other transactions
              ➠ T does not commit any writes before EOT




Consistency Degrees (cont’d)
         I   Degree 2
              ➠ T does not overwrite dirty data of other transactions
              ➠ T does not commit any writes before EOT
              ➠ T does not read dirty data from other transactions
         I   Degree 3
              ➠ T does not overwrite dirty data of other transactions
              ➠ T does not commit any writes before EOT
              ➠ T does not read dirty data from other transactions
              ➠ Other transactions do not dirty any data read by T
                   before T completes.




Isolation
         I   Serializability
              ➠ If several transactions are executed concurrently,
                   the results must be the same as if they were
                   executed serially in some order.
         I   Incomplete results
              ➠ An incomplete transaction cannot reveal its results
                to other transactions before its commitment.
              ➠ Necessary to avoid cascading aborts.




Isolation Example
         I   Consider the following two transactions:
                   T1: Read(x)            T2: Read(x)
                        x ←x+1                x ←x+1
                        Write(x)              Write(x)
                        Commit                Commit
         I   Possible execution sequences:
                   T1:   Read(x)                       T1:    Read(x)
                   T1:   x ←x+1                        T1:    x ←x+1
                   T1:   Write(x)                      T2:    Read(x)
                   T1:   Commit                        T1:    Write(x)
                   T2:   Read(x)                       T2:    x ←x+1
                   T2:   x ←x+1                        T2:    Write(x)
                   T2:   Write(x)                      T1:    Commit
                   T2:   Commit                        T2:    Commit
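The two sequences above can be replayed over a shared item x to see the difference. A sketch, assuming an illustrative encoding of each step as a (transaction, operation) pair: each transaction reads x into a private copy, increments the copy, and writes it back.

```python
# Sketch: replay an execution sequence over a single shared item x.
def run(schedule, x0=0):
    x, local = x0, {}                  # local: per-transaction private copies
    for txn, op in schedule:
        if op == "read":
            local[txn] = x             # Read(x)
        elif op == "inc":
            local[txn] += 1            # x ← x + 1 (on the private copy)
        elif op == "write":
            x = local[txn]             # Write(x)
    return x

serial = [(1, "read"), (1, "inc"), (1, "write"),
          (2, "read"), (2, "inc"), (2, "write")]
interleaved = [(1, "read"), (1, "inc"), (2, "read"),
               (1, "write"), (2, "inc"), (2, "write")]
```

Starting from x = 0, the serial sequence yields x = 2, while the interleaved one yields x = 1: T2 read x before T1 wrote it, so T1's update is lost.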
SQL-92 Isolation Levels
         Phenomena:
         I Dirty read
               ➠ T1 modifies x, which is then read by T2 before T1
                    terminates; if T1 aborts, T2 has read a value that
                    never existed in the database.
         I   Non-repeatable (fuzzy) read
              ➠ T1 reads x; T2 then modifies or deletes x and
                   commits. T1 tries to read x again but reads a
                   different value or can’t find it.
         I   Phantom
              ➠ T1 searches the database according to a predicate
                   while T2 inserts new tuples that satisfy the
                   predicate.



SQL-92 Isolation Levels (cont’d)
         I   Read Uncommitted
              ➠ For transactions operating at this level, all three
                   phenomena are possible.
         I   Read Committed
              ➠ Fuzzy reads and phantoms are possible, but dirty
                   reads are not.
         I   Repeatable Read
              ➠ Only phantoms possible.
         I   Anomaly Serializable
              ➠ None of the phenomena are possible.




Durability


               I   Once a transaction commits, the system
                   must guarantee that the results of its
                   operations will never be lost, in spite of
                   subsequent failures.
               I   Database recovery




Characterization of Transactions
        Based on
              ➠ Application areas
                   N   non-distributed vs. distributed
                   N   compensating transactions
                   N   heterogeneous transactions
              ➠ Timing
                   N   on-line (short-life) vs batch (long-life)
              ➠ Organization of read and write actions
                   N   two-step
                   N   restricted
                   N   action model
              ➠ Structure
                   N   flat (or simple) transactions
                   N   nested transactions
                   N   workflows
Transaction Structure
        I   Flat transaction
               ➠ Consists of a sequence of primitive operations embraced
                    between begin and end markers.
                   Begin_transaction Reservation
                       …
                   end.
        I   Nested transaction
              ➠ The operations of a transaction may themselves be
                   transactions.
                   Begin_transaction Reservation
                       …
                       Begin_transaction Airline
                             …
                       end. {Airline}
                       Begin_transaction Hotel
                            …
                       end. {Hotel}
                   end. {Reservation}
Nested Transactions
         I   Have the same properties as their parents ➩ may
             themselves have other nested transactions.
         I   Introduces concurrency control and recovery
             concepts to within the transaction.
         I   Types
              ➠ Closed nesting
                   N   Subtransactions begin after their parents and finish before
                       them.
                   N   Commitment of a subtransaction is conditional upon the
                       commitment of the parent (commitment through the root).
              ➠ Open nesting
                   N   Subtransactions can execute and commit independently.
                   N   Compensation may be necessary.
Workflows
      I   “A collection of tasks organized to accomplish some
          business process.” [D. Georgakopoulos]
      I   Types
            ➠ Human-oriented workflows
                   N   Involve humans in performing the tasks.
                   N   System support for collaboration and coordination; but no
                       system-wide consistency definition
            ➠ System-oriented workflows
                   N   Computation-intensive & specialized tasks that can be
                       executed by a computer
                   N   System support for concurrency control and recovery,
                       automatic task execution, notification, etc.
            ➠ Transactional workflows
                   N   In between the previous two; may involve humans, require
                       access to heterogeneous, autonomous and/or distributed
                       systems, and support selective use of ACID properties
Workflow Example

      [Workflow DAG: T1 → T2; T2 → T3 and T2 → T4; T3 → T5 and T4 → T5.
       T1, T2, and T5 access the Customer Database.]
                         T1: Customer request obtained
                         T2: Airline reservation performed
                         T3: Hotel reservation performed
                         T4: Auto reservation performed
                         T5: Bill generated
Transactions Provide…


         I   Atomic and reliable execution in the presence
             of failures

         I   Correct execution in the presence of multiple
             user accesses

         I   Correct management of replicas (if they
             support it)




Transaction Processing Issues

        I   Transaction structure (usually called
            transaction model)
              ➠ Flat (simple), nested

        I   Internal database consistency
              ➠ Semantic data control (integrity enforcement)
                   algorithms

        I   Reliability protocols
              ➠ Atomicity & Durability

              ➠ Local recovery protocols

              ➠ Global commit protocols
Transaction Processing Issues


         I   Concurrency control algorithms
              ➠ How to synchronize concurrent transaction
                   executions (correctness criterion)
              ➠ Intra-transaction consistency, Isolation

         I   Replica control protocols
              ➠ How to control the mutual consistency of replicated
                   data
              ➠ One copy equivalence and ROWA



 Architecture Revisited
       [Diagram: the Distributed Execution Monitor receives
        Begin_transaction, Read, Write, Commit, Abort and returns Results.
        Inside it, the Transaction Manager (TM) exchanges
        scheduling/descheduling requests with the Scheduler (SC). The TM
        communicates with other TMs, the SC with other SCs, and requests
        are passed down to the data processor.]
Centralized Transaction Execution
       [Diagram: user applications issue Begin_Transaction, Read, Write,
        Abort, EOT to the Transaction Manager (TM) and receive results and
        user notifications. The TM passes Read, Write, Abort, EOT to the
        Scheduler (SC); the SC passes scheduled operations to the Recovery
        Manager (RM); results flow back up the same path.]
  Distributed Transaction Execution
       [Diagram: the user application issues Begin_transaction, Read, Write,
        EOT, Abort to its local TM and receives results and user
        notifications (the distributed transaction execution model).
        TM-to-TM communication implements the replica control protocol,
        SC-to-SC communication the distributed concurrency control protocol,
        and each site's RM runs the local recovery protocol.]
Concurrency Control

         I   The problem of synchronizing concurrent
             transactions such that the consistency of the
             database is maintained while, at the same
             time, maximum degree of concurrency is
             achieved.
         I   Anomalies:
              ➠ Lost updates
                    N   The effects of some transactions are not reflected in
                        the database.
              ➠ Inconsistent retrievals
                   N   A transaction, if it reads the same data item more than
                       once, should always read the same value.


Execution Schedule (or History)
         I   An order in which the operations of a set of
             transactions are executed.
         I   A schedule (history) can be defined as a partial
             order over the operations of a set of transactions.

                   T1: Read(x)            T2: Write(x)              T3: Read(x)
                       Write(x)               Write(y)                  Read(y)
                       Commit                 Read(z)                   Read(z)
                                              Commit                    Commit


             H1={W2(x),R1(x), R3(x),W1(x),C1,W2(y),R3(y),R2(z),C2,R3(z),C3}



Formalization of Schedule
          A complete schedule SC(T) over a set of
            transactions T={T1, …, Tn} is a partial order
            SC(T)={ΣT, < T} where

          ❶ ΣT = ∪i Σi , for i = 1, 2, …, n

           ❷ < T ⊇ ∪i < i , for i = 1, 2, …, n

          ❸ For any two conflicting operations Oij, Okl ∈ ΣT,
            either Oij < T Okl or Okl < T Oij



Complete Schedule – Example
    Given three transactions
         T1: Read(x)              T2: Write(x)                        T3: Read(x)
             Write(x)                 Write(y)                            Read(y)
             Commit                   Read(z)                             Read(z)
                                      Commit                              Commit
    A possible complete schedule is given as the DAG
                  [DAG: each transaction's operations in program order
                   (R1(x) → W1(x) → C1; W2(x) → W2(y) → R2(z) → C2;
                   R3(x) → R3(y) → R3(z) → C3), plus ordering edges between
                   conflicting operations of different transactions]
Schedule Definition
          A schedule is a prefix of a complete schedule
          such that only some of the operations and only
          some of the ordering relationships are included.
               T1: Read(x)           T2: Write(x)        T3: Read(x)
                   Write(x)              Write(y)            Read(y)
                   Commit                Read(z)             Read(z)
                                         Commit              Commit
   [Two DAGs: on the left, the complete schedule over T1, T2, T3; on the
    right, a prefix of it containing only the operations R1(x), W2(x),
    R3(x), W2(y), R3(y), R2(z), R3(z) and some of their orderings]
Serial History
         I   All the actions of a transaction occur
             consecutively.
         I   No interleaving of transaction operations.
         I   If each transaction is consistent (obeys
             integrity rules), then the database is
             guaranteed to be consistent at the end of
             executing a serial history.
              T1: Read(x)          T2: Write(x)              T3: Read(x)
                  Write(x)             Write(y)                  Read(y)
                  Commit               Read(z)                   Read(z)
                                       Commit                    Commit

              Hs={W2(x),W2(y),R2(z),C2,R1(x),W1(x),C1,R3(x),R3(y),R3(z),C3}


Serializable History
       I   Transactions execute concurrently, but the net
           effect of the resulting history upon the database
           is equivalent to some serial history.
       I   Equivalent with respect to what?
            ➠ Conflict equivalence: the relative order of
              execution of the conflicting operations belonging to
              unaborted transactions in two histories are the
              same.
            ➠ Conflicting operations: two incompatible
              operations (e.g., Read and Write) conflict if they both
              access the same data item.
                    N   Incompatible operations of each transaction are assumed
                        to conflict; their execution order is not changed.
                   N   If two operations from two different transactions
                       conflict, the corresponding transactions are also said to
                       conflict.
Serializable History
                   T1: Read(x)            T2: Write(x)             T3: Read(x)
                       Write(x)               Write(y)                 Read(y)
                       Commit                 Read(z)                  Read(z)
                                              Commit                   Commit

         The following are not conflict equivalent
                    Hs={W2(x),W2(y),R2(z),C2,R1(x),W1(x),C1,R3(x),R3(y),R3(z),C3}

                    H1={W2(x),R1(x), R3(x),W1(x),C1,W2(y),R3(y),R2(z),C2,R3(z),C3}

         The following are conflict equivalent; therefore
               H2 is serializable.
                    Hs={W2(x),W2(y),R2(z),C2,R1(x),W1(x),C1,R3(x),R3(y),R3(z),C3}

                    H2={W2(x),R1(x),W1(x),C1,R3(x),W2(y),R3(y),R2(z),C2,R3(z),C3}
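Conflict serializability can be tested with a serialization (precedence) graph: draw an edge Ti → Tj whenever an operation of Ti precedes a conflicting operation of Tj; the history is conflict serializable iff the graph is acyclic. The sketch below assumes the slides' history-string notation; the graph construction itself is a standard technique, not spelled out in this excerpt.

```python
import re

def edges(history):
    """Precedence edges (Ti, Tj) for a history of strings like "W2(x)"."""
    ops = [m.groups() for m in (re.match(r"([RW])(\d)\((\w)\)", o)
                                for o in history) if m]
    e = set()
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            # same item, different transactions, at least one write
            if t1 != t2 and x1 == x2 and "W" in (k1, k2):
                e.add((t1, t2))
    return e

def is_serializable(history):
    adj, state = {}, {}
    for a, b in edges(history):
        adj.setdefault(a, []).append(b)
    def cyclic(u):                       # DFS cycle detection
        state[u] = 1
        for v in adj.get(u, []):
            if state.get(v) == 1 or (v not in state and cyclic(v)):
                return True
        state[u] = 2
        return False
    return not any(n not in state and cyclic(n) for n in adj)

H2 = ["W2(x)", "R1(x)", "W1(x)", "C1", "R3(x)", "W2(y)",
      "R3(y)", "R2(z)", "C2", "R3(z)", "C3"]
```

For H2 the graph is T2 → T1, T2 → T3, T1 → T3 — acyclic, confirming that H2 is serializable (equivalent to the serial order T2, T1, T3).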


 Serializability in Distributed DBMS

     I   Somewhat more involved. Two histories have to be
         considered:
           ➠ local histories
           ➠ global history

     I   For global transactions (i.e., global history) to be
         serializable, two conditions are necessary:
           ➠ Each local history should be serializable.
           ➠ Two conflicting operations should be in the same relative
               order in all of the local histories where they appear together.




 Global Non-serializability
                    T1: Read(x)                       T2: Read(x)
                        x ← x + 5                         x ← x * 15
                        Write(x)                          Write(x)
                        Commit                            Commit

      The following two local histories are individually
      serializable (in fact serial), but the two transactions
      are not globally serializable.
                    LH1={R1(x),W1(x),C1,R2(x),W2(x),C2}
                    LH2={R2(x),W2(x),C2,R1(x),W1(x),C1}
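One standard way to test the second condition from the previous slide is to merge each local history's precedence edges and check the union for a cycle: a cycle means no single serial order agrees with all sites. A sketch in the slides' history notation (the merged-graph test is an assumption of this sketch, not spelled out in the excerpt):

```python
import re

def edges(history):
    """Precedence edges (Ti, Tj) for strings like "W1(x)"; Cn is skipped."""
    ops = [m.groups() for m in (re.match(r"([RW])(\d)\((\w)\)", o)
                                for o in history) if m]
    return {(t1, t2)
            for i, (k1, t1, x1) in enumerate(ops)
            for k2, t2, x2 in ops[i + 1:]
            if t1 != t2 and x1 == x2 and "W" in (k1, k2)}

def globally_serializable(local_histories):
    union = set().union(*(edges(h) for h in local_histories))
    adj, state = {}, {}
    for a, b in union:
        adj.setdefault(a, []).append(b)
    def cyclic(u):                        # DFS cycle detection
        state[u] = 1
        for v in adj.get(u, []):
            if state.get(v) == 1 or (v not in state and cyclic(v)):
                return True
        state[u] = 2
        return False
    return not any(n not in state and cyclic(n) for n in adj)

LH1 = ["R1(x)", "W1(x)", "C1", "R2(x)", "W2(x)", "C2"]
LH2 = ["R2(x)", "W2(x)", "C2", "R1(x)", "W1(x)", "C1"]
```

Here LH1 contributes T1 → T2 while LH2 contributes T2 → T1; the union has a cycle, so the execution is not globally serializable even though each local history is serial.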



Concurrency Control
Algorithms
         I   Pessimistic
              ➠ Two-Phase Locking-based (2PL)
                   N   Centralized (primary site) 2PL
                   N   Primary copy 2PL
                   N   Distributed 2PL
              ➠ Timestamp Ordering (TO)
                   N   Basic TO
                   N   Multiversion TO
                   N   Conservative TO
              ➠ Hybrid
         I   Optimistic
              ➠ Locking-based
              ➠ Timestamp ordering-based


Locking-Based Algorithms
         I   Transactions indicate their intentions by
             requesting locks from the scheduler (called lock
             manager).
         I   Locks are either read lock (rl) [also called shared
             lock] or write lock (wl) [also called exclusive lock]
          I   Read locks and write locks conflict (because Read
              and Write operations are incompatible):
                            rl    wl
                   rl      yes    no
                   wl       no    no
         I   Locking works nicely to allow concurrent
             processing of transactions.
Two-Phase Locking (2PL)
      ❶ A transaction locks an object before using it.
     ❷ When an object is locked by another transaction,
       the requesting transaction must wait.
     ❸ When a transaction releases a lock, it may not
       request another lock.
                   [Graph: number of locks vs. time. Locks are obtained
                    during Phase 1 (the growing phase), from BEGIN up to the
                    lock point, and released during Phase 2 (the shrinking
                    phase), from the lock point to END.]
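The three rules above can be sketched as a small lock manager. This is a minimal sketch: class and method names are illustrative, conflicting requests are simply refused (returning False) rather than queued, and deadlock handling is omitted.

```python
# Sketch of a 2PL lock manager: locks before use, conflicting requests
# are refused (a real scheduler would make the requester wait), and no
# new lock may be acquired after the first release.

class TwoPhaseLockManager:
    def __init__(self):
        self.read_locks = {}     # item -> set of transactions holding rl
        self.write_locks = {}    # item -> transaction holding wl
        self.shrinking = set()   # transactions past their lock point

    def lock(self, txn, item, mode):
        """mode is 'rl' (shared) or 'wl' (exclusive); True if granted."""
        if txn in self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        writer = self.write_locks.get(item)
        readers = self.read_locks.get(item, set())
        if mode == "rl":
            if writer not in (None, txn):
                return False                 # wl held by another transaction
            self.read_locks.setdefault(item, set()).add(txn)
        else:
            if writer not in (None, txn) or readers - {txn}:
                return False                 # any other lock conflicts with wl
            self.write_locks[item] = txn
        return True

    def release(self, txn, item):
        self.shrinking.add(txn)              # shrinking phase begins
        self.read_locks.get(item, set()).discard(txn)
        if self.write_locks.get(item) == txn:
            del self.write_locks[item]
```

Strict 2PL differs only in when `release` is called: all locks are held until the transaction commits or aborts.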
 Strict 2PL
               Hold locks until the end.

                    [Figure: number of locks held vs. time. Locks are
                    obtained during the period of data item use and all
                    released together at END, so they are held for the
                    rest of the transaction duration.]
 Centralized 2PL
I   There is only one 2PL scheduler in the distributed system.
I   Lock requests are issued to the central scheduler.
            [Figure: centralized 2PL communication among the data
            processors at participating sites, the coordinating TM,
            and the central site LM:]
            ❶ Coordinating TM → central site LM: Lock Request
            ❷ Central site LM → coordinating TM: Lock Granted
            ❸ Coordinating TM → participating DPs: Operation
            ❹ Participating DPs → coordinating TM: End of Operation
            ❺ Coordinating TM → central site LM: Release Locks
 Distributed 2PL


      I   2PL schedulers are placed at each site. Each
          scheduler handles lock requests for data at that site.
      I   A transaction may read any of the replicated copies
          of item x, by obtaining a read lock on one of the
          copies of x. Writing into x requires obtaining write
          locks for all copies of x.




Distributed 2PL Execution
            [Figure: distributed 2PL communication:]
            ❶ Coordinating TM → participating LMs: Lock Request
            ❷ Participating LMs → participating DPs: Operation
            ❸ Participating DPs → coordinating TM: End of Operation
            ❹ Coordinating TM → participating LMs: Release Locks
Timestamp Ordering
  ❶ Transaction (Ti) is assigned a globally unique
    timestamp ts(Ti).
  ❷ Transaction manager attaches the timestamp to all
    operations issued by the transaction.
  ❸ Each data item is assigned a write timestamp (wts) and
    a read timestamp (rts):
        ➠ rts(x) = largest timestamp of any read on x
         ➠ wts(x) = largest timestamp of any write on x
  ❹ Conflicting operations are resolved by timestamp order.
    Basic T/O:
      for Ri(x)                               for Wi(x)
       if ts(Ti) < wts(x)                      if ts(Ti) < rts(x) or ts(Ti) < wts(x)
       then reject Ri(x)                       then reject Wi(x)
       else accept Ri(x)                       else accept Wi(x)
            rts(x) ← max(rts(x), ts(Ti))            wts(x) ← ts(Ti)
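The Basic T/O accept/reject rules above can be sketched directly. This is an illustrative sketch; the function names and the dictionary-based bookkeeping are assumptions.

```python
rts, wts = {}, {}    # per-item read / write timestamps (default 0)

def read(ts, x):
    """Ri(x): reject if a younger transaction already wrote x."""
    if ts < wts.get(x, 0):
        return "reject"
    rts[x] = max(rts.get(x, 0), ts)
    return "accept"

def write(ts, x):
    """Wi(x): reject if a younger transaction already read or wrote x."""
    if ts < rts.get(x, 0) or ts < wts.get(x, 0):
        return "reject"
    wts[x] = ts
    return "accept"

assert write(5, "x") == "accept"
assert read(3, "x") == "reject"    # ts(Ti) = 3 < wts(x) = 5
assert read(7, "x") == "accept"    # rts(x) becomes 7
assert write(6, "x") == "reject"   # ts(Ti) = 6 < rts(x) = 7
```

A rejected operation causes its transaction to restart with a new, larger timestamp; that restart logic is omitted here.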
Conservative Timestamp
Ordering
               I   Basic timestamp ordering tries to
                   execute an operation as soon as it
                   receives it
                     ➠ aggressive
                    ➠ too many restarts since there is no delaying
               I   Conservative timestamping delays each
                   operation until there is an assurance
                   that it will not be restarted
               I   Assurance?
                    ➠ No other operation with a smaller
                      timestamp can arrive at the scheduler
                    ➠ Note that the delay may result in the
                      formation of deadlocks

 Multiversion Timestamp Ordering
         I   Do not modify the values in the database,
             create new values.
         I   A Ri(x) is translated into a read on one version
             of x.
              ➠ Find a version of x (say xv) such that ts(xv) is the
                   largest timestamp less than ts(Ti).
         I   A Wi(x) is translated into Wi(xw) and accepted if
             the scheduler has not yet processed any Rj(xr)
             such that
                        ts(xr) < ts(Ti) < ts(Tj)


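The version-selection and rejection rules above can be sketched as follows. This is an illustrative sketch; the data structures, names, and sample values are assumptions.

```python
versions = {"x": [(0, "v0"), (5, "v5")]}   # item -> sorted (write ts, value)
reads = []                                  # (reader ts, ts of version read)

def mv_read(ts, x):
    """Ri(x): read the version with the largest write ts below ts(Ti)."""
    w, v = max(wv for wv in versions[x] if wv[0] < ts)
    reads.append((ts, w))
    return v

def mv_write(ts, x, value):
    """Wi(x): reject if some Rj read xr with ts(xr) < ts(Ti) < ts(Tj)."""
    if any(xr < ts < rj for rj, xr in reads):
        return "reject"
    versions[x] = sorted(versions[x] + [(ts, value)])
    return "accept"

assert mv_read(7, "x") == "v5"
assert mv_write(6, "x", "v6") == "reject"   # T7 already read the ts-5 version
assert mv_write(8, "x", "v8") == "accept"
```

The rejected write at timestamp 6 would, if accepted, invalidate T7's read, which should have seen the version written at 6 rather than at 5.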
Optimistic Concurrency Control
Algorithms

       Pessimistic execution

                    Validate → Read → Compute → Write

       Optimistic execution

                    Read → Compute → Validate → Write




 Optimistic Concurrency Control
 Algorithms

          I   Transaction execution model: divide into
              subtransactions, each of which executes at a site
              ➠ Tij: transaction Ti that executes at site j

         I   Transactions run independently at each site
             until they reach the end of their read phases
         I   All subtransactions are assigned a timestamp
             at the end of their read phase
          I   Validation test performed during validation phase.
              If any subtransaction fails the test, all are rejected.


Optimistic CC Validation Test

         ❶ If all transactions Tk where ts(Tk) < ts(Tij)
           have completed their write phase before Tij
           has started its read phase, then validation
           succeeds
                   ➠ Transaction executions in serial order



             Tk:   R    V    W
             Tij:                  R    V    W




Optimistic CC Validation Test

       ❷ If there is any transaction Tk such that ts(Tk)<ts(Tij)
         and which completes its write phase while Tij is in
         its read phase, then validation succeeds if
         WS(Tk) ∩ RS(Tij) = Ø
               ➠ Read and write phases overlap, but Tij does not read data
                   items written by Tk


                     Tk:   R    V    W
                     Tij:            R    V    W




 Optimistic CC Validation Test

    ❸ If there is any transaction Tk such that ts(Tk)< ts(Tij)
     and which completes its read phase before Tij
     completes its read phase, then validation succeeds if
      WS(Tk) ∩ RS(Tij) = Ø and WS(Tk) ∩ WS(Tij) = Ø
          ➠ They overlap, but don't access any common data items.



                     Tk:    R    V    W
                     Tij:      R    V    W




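The three validation cases above can be combined into a single test. This sketch assumes each transaction is summarized by its timestamp, phase boundaries, and read/write sets; all names and the dictionary layout are assumptions, not from the slides.

```python
def validate(tij, others):
    """True iff Tij passes one of the three cases against every
    Tk with ts(Tk) < ts(Tij)."""
    for tk in (t for t in others if t["ts"] < tij["ts"]):
        if tk["write_end"] <= tij["read_start"]:
            continue                                  # case 1: serial order
        if tk["write_end"] <= tij["read_end"]:
            if tk["WS"] & tij["RS"]:
                return False                          # case 2 fails
            continue
        if tk["read_end"] <= tij["read_end"]:
            if tk["WS"] & (tij["RS"] | tij["WS"]):
                return False                          # case 3 fails
            continue
        return False                                  # no case applies
    return True

tij = {"ts": 10, "read_start": 5, "read_end": 9, "RS": {"x"}, "WS": {"y"}}
tk_serial  = {"ts": 1, "write_end": 4, "read_end": 3, "WS": {"x"}}
tk_overlap = {"ts": 2, "write_end": 7, "read_end": 6, "WS": {"x"}}
assert validate(tij, [tk_serial])          # case 1 applies
assert not validate(tij, [tk_overlap])     # case 2: WS(Tk) ∩ RS(Tij) ≠ Ø
```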
Deadlock
       I   A transaction is deadlocked if it is blocked and will
           remain blocked until there is intervention.
       I   Locking-based CC algorithms may cause deadlocks.
       I   TO-based algorithms that involve waiting may cause
           deadlocks.
       I   Wait-for graph
             ➠ If transaction Ti waits for another transaction Tj to release
                   a lock on an entity, then Ti → Tj in WFG.


                            Ti                                        Tj


Local versus Global WFG
     Assume T1 and T2 run at site 1, T3 and T4 run at site 2.
     Also assume T3 waits for a lock held by T4 which waits
     for a lock held by T1 which waits for a lock held by T2
     which, in turn, waits for a lock held by T3.
      Local WFGs
           Site 1:  T1 → T2              Site 2:  T3 → T4

      Global WFG
           T1 → T2 → T3 → T4 → T1   (a cycle: global deadlock)

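The global WFG above contains the cycle T1 → T2 → T3 → T4 → T1, which no local WFG sees on its own. A deadlock detector finds such cycles by depth-first search; a minimal sketch (function name and graph encoding are assumptions):

```python
def find_cycle(wfg):
    """wfg: txn -> set of txns it waits for; DFS with coloring."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    def dfs(t):
        color[t] = GRAY
        for u in wfg.get(t, ()):
            if color.get(u, WHITE) == GRAY:
                return True              # back edge closes a cycle
            if color.get(u, WHITE) == WHITE and dfs(u):
                return True
        color[t] = BLACK
        return False
    return any(color.get(t, WHITE) == WHITE and dfs(t) for t in wfg)

# Global WFG of the example: T1 → T2 → T3 → T4 → T1 is a cycle
assert find_cycle({"T1": {"T2"}, "T2": {"T3"}, "T3": {"T4"}, "T4": {"T1"}})
# Neither local WFG alone contains a cycle
assert not find_cycle({"T1": {"T2"}, "T3": {"T4"}})
```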
Deadlock Management
               I   Ignore
                    ➠ Let the application programmer deal with it, or
                      restart the system
               I   Prevention
                    ➠ Guaranteeing that deadlocks can never occur in
                      the first place. Check transaction when it is
                      initiated. Requires no run time support.
               I   Avoidance
                    ➠ Detecting potential deadlocks in advance and
                       taking action to ensure that deadlock will not
                      occur. Requires run time support.
               I   Detection and Recovery
                    ➠ Allowing deadlocks to form and then finding and
                      breaking them. As in the avoidance scheme, this
                      requires run time support.
Deadlock Prevention
      I   All resources which may be needed by a transaction
          must be predeclared.
          ➠ The system must guarantee that none of the resources will
            be needed by an ongoing transaction.
          ➠ Resources must only be reserved, but not necessarily
            allocated a priori
          ➠ Unsuitability of the scheme in database environment
          ➠ Suitable for systems that have no provisions for undoing
            processes.
      I   Evaluation:
          – Reduced concurrency due to preallocation
          – Evaluating whether an allocation is safe leads to added
            overhead.
          – Difficult to determine (partial order)
          + No transaction rollback or restart is involved.
Deadlock Avoidance

              I    Transactions are not required to request
                   resources a priori.
              I    Transactions are allowed to proceed unless a
                   requested resource is unavailable.
              I    In case of conflict, transactions may be
                   allowed to wait for a fixed time interval.
              I    Order either the data items or the sites and
                   always request locks in that order.
              I    More attractive than prevention in a
                   database environment.

Deadlock Avoidance –
Wait-Die & Wound-Wait Algorithms
      WAIT-DIE Rule: If Ti requests a lock on a data item
      which is already locked by Tj, then Ti is permitted to
      wait iff ts(Ti)<ts(Tj). If ts(Ti)>ts(Tj), then Ti is aborted
      and restarted with the same timestamp.
           ➠ if ts(Ti)<ts(Tj) then Ti waits else Ti dies
           ➠ non-preemptive: Ti never preempts Tj
           ➠ prefers younger transactions
      WOUND-WAIT Rule: If Ti requests a lock on a data
      item which is already locked by Tj , then Ti is
      permitted to wait iff ts(Ti)>ts(Tj). If ts(Ti)<ts(Tj), then
      Tj is aborted and the lock is granted to Ti.
           ➠ if ts(Ti)<ts(Tj) then Tj is wounded else Ti waits
           ➠ preemptive: Ti preempts Tj if it is younger
           ➠ prefers older transactions
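Both rules above reduce to a pure decision on the two timestamps. A minimal sketch (function names and return strings are assumptions); Ti requests a lock held by Tj, and a smaller timestamp means an older transaction:

```python
def wait_die(ts_i, ts_j):
    """Non-preemptive: Ti never preempts Tj."""
    return "wait" if ts_i < ts_j else "die"       # younger requester aborts

def wound_wait(ts_i, ts_j):
    """Preemptive: an older requester aborts (wounds) the holder."""
    return "wound" if ts_i < ts_j else "wait"

assert wait_die(1, 2) == "wait"     # older Ti waits
assert wait_die(2, 1) == "die"      # younger Ti restarts with same timestamp
assert wound_wait(1, 2) == "wound"  # older Ti aborts the younger holder Tj
assert wound_wait(2, 1) == "wait"   # younger Ti waits
```

Because every wait edge points from an older to a younger transaction (wait-die) or vice versa (wound-wait), the WFG can never contain a cycle, so deadlock is impossible.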
Deadlock Detection


                   I   Transactions are allowed to wait freely.
                   I   Wait-for graphs and cycles.
                   I   Topologies for deadlock detection
                       algorithms
                        ➠ Centralized
                        ➠ Distributed
                        ➠ Hierarchical



Centralized Deadlock Detection
        I   One site is designated as the deadlock detector for
            the system. Each scheduler periodically sends its
            local WFG to the central site which merges them to
            a global WFG to determine cycles.
        I   How often to transmit?
              ➠ Too often ⇒ higher communication cost but lower delays
                   due to undetected deadlocks
              ➠ Too infrequently ⇒ higher delays due to deadlocks, but
                   lower communication cost
        I   Would be a reasonable choice if the concurrency
            control algorithm is also centralized.
        I   Proposed for Distributed INGRES
Hierarchical Deadlock Detection
               Build a hierarchy of detectors:

               [Figure: deadlock detector hierarchy. A root detector
               DDox sits above intermediate detectors DD11 and DD14;
               below them are the site-level detectors DD21 (site 1),
               DD22 (site 2), DD23 (site 3), and DD24 (site 4).]
Distributed Deadlock Detection
       I   Sites cooperate in detection of deadlocks.
       I   One example:
            ➠ The local WFGs are formed at each site and passed on to
                   other sites. Each local WFG is modified as follows:
                    ❶ Since each site receives the potential deadlock cycles from
                      other sites, these edges are added to the local WFGs
                    ❷ The edges in the local WFG which show that local
                      transactions are waiting for transactions at other sites are
                      joined with edges in the local WFGs which show that remote
                      transactions are waiting for local ones.
            ➠ Each local deadlock detector:
                    N   looks for a cycle that does not involve the external edge. If it
                        exists, there is a local deadlock which can be handled locally.
                    N   looks for a cycle involving the external edge. If it exists, it
                        indicates a potential global deadlock. Pass on the information
                        to the next site.
Reliability


                   Problem:

                     How to maintain

                           atomicity

                           durability

                     properties of transactions




Fundamental Definitions
        I   Reliability
              ➠ A measure of success with which a system conforms
                   to some authoritative specification of its behavior.
              ➠ Probability that the system has not experienced any
                   failures within a given time period.
              ➠ Typically used to describe systems that cannot be
                   repaired or where the continuous operation of the
                   system is critical.
        I   Availability
              ➠ The fraction of the time that a system meets its
                   specification.
              ➠ The probability that the system is operational at a
                   given time t.
Basic System Concepts
               [Figure: a SYSTEM embedded in its ENVIRONMENT. The
               system contains Components 1, 2, and 3; stimuli arrive
               from the environment and responses are returned to it.
               The external state is visible to the environment, the
               internal state is not.]
Fundamental Definitions
         I   Failure
              ➠ The deviation of a system from the behavior that is
                   described in its specification.
         I   Erroneous state
              ➠ The internal state of a system such that there exist
                   circumstances in which further processing, by the
                   normal algorithms of the system, will lead to a
                   failure which is not attributed to a subsequent fault.
         I   Error
              ➠ The part of the state which is incorrect.
         I   Fault
              ➠ An error in the internal states of the components of
                   a system or in the design of a system.

Faults to Failures



                    Fault --causes--> Error --results in--> Failure




Types of Faults

         I   Hard faults
              ➠ Permanent
              ➠ Resulting failures are called hard failures

         I   Soft faults
              ➠ Transient or intermittent
              ➠ Account for more than 90% of all failures
              ➠ Resulting failures are called soft failures




Fault Classification
                [Figure: fault classification. A permanent fault leads
                to a permanent error; incorrect design and unstable or
                marginal components lead to intermittent errors; an
                unstable environment and operator mistakes lead to
                transient errors. Errors of all three kinds can result
                in a system failure.]
Failures

        [Figure: timeline of a failure cycle. A fault occurs and
        causes an error; MTTD is the mean time to detect the error,
        MTTR the mean time to repair it, and MTBF the mean time
        between successive failures. Multiple errors can occur between
        the error being caused and its repair.]

Fault Tolerance Measures
        Reliability

             R(t) = Pr{0 failures in time [0,t] | no failures at t=0}

             If occurrence of failures is Poisson

             R(t) = Pr{0 failures in time [0,t]}

             Then

                  Pr{k failures in time [0,t]} = e^(-m(t)) [m(t)]^k / k!

             where m(t) is known as the hazard function which gives the
             time-dependent failure rate of the component and is
             defined as

                  m(t) = ∫[0,t] z(x) dx

Fault-Tolerance Measures
         Reliability
              The mean number of failures in time [0,t] can be
              computed as

                   E[k] = Σ(k=0 to ∞) k · e^(-m(t)) [m(t)]^k / k! = m(t)

              and the variance can be computed as

                   Var[k] = E[k²] - (E[k])² = m(t)

              Thus, reliability of a single component is

                   R(t) = e^(-m(t))

              and of a system consisting of n non-redundant
              components as

                   Rsys(t) = Π(i=1 to n) Ri(t)

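The single-component and system reliability formulas above can be checked numerically. This is an illustrative sketch; the function names and the numeric failure-rate values are assumptions.

```python
import math

def component_reliability(m_t):
    """R(t) = e^(-m(t)) for a single component."""
    return math.exp(-m_t)

def system_reliability(m_ts):
    """Rsys(t) = product of Ri(t) over n non-redundant components."""
    r = 1.0
    for m_t in m_ts:
        r *= component_reliability(m_t)
    return r

# With a constant failure rate z(x) = lam, m(t) = lam * t
lam, t = 0.001, 100.0            # hypothetical: 0.001 failures/h over 100 h
assert abs(component_reliability(lam * t) - math.exp(-0.1)) < 1e-12
# Three identical non-redundant components: Rsys = e^(-3*lam*t)
assert abs(system_reliability([lam * t] * 3) - math.exp(-0.3)) < 1e-12
```

Note how adding non-redundant components can only decrease system reliability, since each factor Ri(t) ≤ 1.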
Fault-Tolerance Measures

         Availability
                   A(t) = Pr{system is operational at time t}

              Assume
                   N   Poisson failures with rate λ
                   N   Repair time is exponentially distributed with mean 1/µ

              Then, steady-state availability

                   A = lim(t→∞) A(t) = µ / (λ + µ)




Fault-Tolerance Measures
         MTBF
              Mean time between failures

                   MTBF = ∫[0,∞) R(t) dt

         MTTR
              Mean time to repair

         Availability
                   A = MTBF / (MTBF + MTTR)




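The two availability expressions, µ/(λ+µ) and MTBF/(MTBF+MTTR), agree when MTBF = 1/λ and MTTR = 1/µ. A small numeric sketch (function names and the rate values are assumptions):

```python
def availability_rates(lam, mu):
    """Steady-state availability: µ / (λ + µ)."""
    return mu / (lam + mu)

def availability_times(mtbf, mttr):
    """Availability as MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

# With MTBF = 1/λ and MTTR = 1/µ the two forms are identical:
lam, mu = 0.01, 2.0          # hypothetical failure and repair rates per hour
a1 = availability_rates(lam, mu)
a2 = availability_times(1 / lam, 1 / mu)
assert abs(a1 - a2) < 1e-12
```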
Sources of Failure –
SLAC Data (1985)

        [Pie chart: Operations 57%, Environment 17%, Software 13%,
        Hardware 13%]




       S. Mourad and D. Andrews, “The Reliability of the IBM/XA
       Operating System”, Proc. 15th Annual Int. Symp. on FTCS, 1985.
Sources of Failure –
Japanese Data (1986)
        [Pie chart: Vendor 42%, Application SW 25%, Comm. Lines 12%,
        Environment 11%, Operations 10%]




       “Survey on Computer Security”, Japan Info. Dev. Corp.,1986.

Sources of Failure –
5ESS Switch (1987)

        [Pie chart: Software 44%, Hardware 32%, Operations 18%,
        Unknown 6%]




          D.A. Yaeger. 5ESS Switch Performance Metrics. Proc. Int.
          Conf. on Communications, Volume 1, pp. 46-52, June 1987.

Sources of Failures –
Tandem Data (1985)

        [Pie chart: Software 26%, Maintenance 25%, Hardware 18%,
        Operations 17%, Environment 14%]




            Jim Gray, Why Do Computers Stop and What can be
            Done About It?, Tandem Technical Report 85.7, 1985.

Types of Failures
       I   Transaction failures
             ➠ Transaction aborts (unilaterally or due to deadlock)
             ➠ Avg. 3% of transactions abort abnormally
       I   System (site) failures
             ➠ Failure of processor, main memory, power supply, …
             ➠ Main memory contents are lost, but secondary storage
               contents are safe
             ➠ Partial vs. total failure
       I   Media failures
             ➠ Failure of secondary storage devices such that the
               stored data is lost
             ➠ Head crash/controller failure (?)
       I   Communication failures
             ➠ Lost/undeliverable messages
             ➠ Network partitioning
 Local Recovery Management –
 Architecture
        I   Volatile storage
             ➠ Consists of the main memory of the computer system
                   (RAM).
        I   Stable storage
             ➠ Resilient to failures and loses its contents only in the
               presence of media failures (e.g., head crashes on disks).
             ➠ Implemented via a combination of hardware (non-volatile
               storage) and software (stable-write, stable-read, clean-up)
               components.
        [Figure: Local Recovery Manager architecture. The LRM issues
        Fetch/Flush commands to the Database Buffer Manager, which
        reads and writes pages between the stable database on
        secondary storage and the volatile database buffers in main
        memory.]

Update Strategies


         I   In-place update

               ➠ Each update causes a change in one or more data
                   values on pages in the database buffers

         I   Out-of-place update

               ➠ Each update causes the new value(s) of data item(s)
                   to be stored separate from the old value(s)




In-Place Update Recovery
Information
            Database Log
                    For every action of a transaction, the system must
                    not only perform the action but also write a log
                    record to an append-only file.


             [Figure: an Update Operation transforms the old stable
             database state into the new stable database state and
             appends a record to the Database Log.]



 Logging

        The log contains information used by the
        recovery process to restore the consistency of a
        system. This information may include
              ➠ transaction identifier
              ➠ type of operation (action)
              ➠ items accessed by the transaction to perform the
                   action
              ➠ old value (state) of item (before image)
              ➠ new value (state) of item (after image)
                       …

Why Logging?
         Upon recovery:
              ➠ all of T1's effects should be reflected in the database
                   (REDO if necessary due to a failure)
              ➠ none of T2's effects should be reflected in the
                   database (UNDO if necessary)

        [Figure: timeline from 0 to t. T1 begins and ends before the
        system crash at time t; T2 begins before the crash but has not
        yet ended when the crash occurs.]

REDO Protocol
             [Figure: the REDO operation uses the Database Log to take
             the stable database from its old state to its new state.]

         I   REDO'ing an action means performing it again.
         I   The REDO operation uses the log information
             and performs the action that might have been
             done before, or not done due to failures.
         I   The REDO operation generates the new image.
UNDO Protocol
             [Figure: the UNDO operation uses the Database Log to take
             the stable database from its new state back to its old
             state.]
          I   UNDO'ing an action means to restore the
              object to its before image.
          I   The UNDO operation uses the log information
              and restores the old value of the object.

 When to Write Log Records
 Into Stable Store
        Assume a transaction T updates a page P
        I Fortunate case
             ➠ System writes P in stable database
             ➠ System updates stable log for this update
             ➠ SYSTEM FAILURE OCCURS!... (before T commits)
            We can recover (undo) by restoring P to its old state
            by using the log
        I   Unfortunate case
             ➠ System writes P in stable database
             ➠ SYSTEM FAILURE OCCURS!... (before stable log is
                   updated)
            We cannot recover from this failure because there is
            no log record to restore the old value.
        I   Solution: Write-Ahead Log (WAL) protocol
Write–Ahead Log Protocol
       I   Notice:
             ➠ If a system crashes before a transaction is committed,
                   then all the operations must be undone. Only need the
                   before images (undo portion of the log).
             ➠ Once a transaction is committed, some of its actions
                   might have to be redone. Need the after images (redo
                   portion of the log).

       I   WAL protocol :
             ❶ Before a stable database is updated, the undo portion of
                   the log should be written to the stable log
             ❷ When a transaction commits, the redo portion of the log
                   must be written to stable log prior to the updating of
                   the stable database.
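The two WAL rules can be sketched as follows: the undo (before-image) record reaches the stable log before the stable database is updated, and the redo (after-image) record reaches it before commit is acknowledged. This is a minimal illustrative sketch; all names and the list-based "stable" structures are assumptions.

```python
stable_log = []                      # append-only log (assumed durable)
stable_db = {"P": "old"}             # stable database pages
buffer = {"P": "old"}                # volatile database buffers

def update(txn, page, new_value):
    before_image = buffer[page]
    # WAL rule 1: force the undo record before updating the stable DB
    stable_log.append(("undo", txn, page, before_image))
    buffer[page] = new_value
    stable_db[page] = new_value      # now the stable write is safe
    # Redo record must be stable before the commit is acknowledged
    stable_log.append(("redo", txn, page, new_value))

def commit(txn):
    stable_log.append(("commit", txn))   # WAL rule 2: log forced at commit

update("T1", "P", "new")
commit("T1")
assert stable_log[0] == ("undo", "T1", "P", "old")
assert stable_db["P"] == "new"
```

A real LRM would buffer log records and force them to disk at the two rule boundaries; here every append is treated as already stable.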
 Logging Interface

        [Figure: logging interface. The Local Recovery Manager reads
        and writes log buffers in main memory, which are written to
        the stable log on secondary storage; it also issues
        Fetch/Flush commands to the Database Buffer Manager, which
        moves pages between the stable database and the volatile
        database buffers.]
Distributed DBMS              © 1998 M. Tamer Özsu & Patrick Valduriez                            Page 10-12. 94
Out-of-Place Update
Recovery Information
       I   Shadowing
             ➠ When an update occurs, don't change the old page, but
               create a shadow page with the new values and write it
               into the stable database.
             ➠ Update the access paths so that subsequent accesses
               are to the new shadow page.
              ➠ The old page is retained for recovery.
       I   Differential files
             ➠ For each file F maintain
                   N   a read only part FR
                   N   a differential file consisting of insertions part DF+ and
                       deletions part DF-
                   N   Thus, F = (FR ∪ DF+) – DF-
             ➠ Updates treated as delete old value, insert new value
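The differential-file equation F = (FR ∪ DF+) − DF− can be read directly as set operations. The sketch below models records as set elements; the record names are made up for illustration.

```python
# Illustrative reading of F = (FR ∪ DF+) − DF− with records as set elements.
def current_file(FR, DF_plus, DF_minus):
    """The visible contents of file F."""
    return (FR | DF_plus) - DF_minus

def update(DF_plus, DF_minus, old_rec, new_rec):
    """An update is a delete of the old value plus an insert of the new one."""
    DF_minus.add(old_rec)
    DF_plus.add(new_rec)

FR = {"r1", "r2", "r3"}            # read-only part, never touched
DF_plus, DF_minus = set(), set()
update(DF_plus, DF_minus, "r2", "r2v2")
print(sorted(current_file(FR, DF_plus, DF_minus)))  # ['r1', 'r2v2', 'r3']
```

Note that FR itself is never modified, which is exactly what makes this an out-of-place scheme.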
Execution of Commands


                    Commands to consider:
                      begin_transaction
                      read
                      write
                      commit
                      abort
                      recover
                    These are independent of the LRM's execution strategy.




Execution Strategies
           I   Dependent upon
                   ➠ Can the buffer manager decide to write some of
                     the buffer pages being accessed by a transaction
                     into stable storage or does it wait for LRM to
                     instruct it?
                      N   fix/no-fix decision
                   ➠ Does the LRM force the buffer manager to write
                     certain buffer pages into stable database at the
                     end of a transaction's execution?
                      N   flush/no-flush decision
           I   Possible execution strategies:
                   ➠ no-fix/no-flush
                   ➠ no-fix/flush
                   ➠ fix/no-flush
                   ➠ fix/flush
No-Fix/No-Flush
       I   Abort
            ➠ Buffer manager may have written some of the updated
                   pages into stable database
            ➠ LRM performs transaction undo (or partial undo)
       I   Commit
            ➠ LRM writes an “end_of_transaction” record into the log.
       I   Recover
            ➠ For those transactions that have both a
                   “begin_transaction” and an “end_of_transaction” record
                   in the log, a partial redo is initiated by LRM
            ➠ For those transactions that only have a
                   “begin_transaction” in the log, a global undo is executed
                   by LRM
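The recover step above amounts to a log scan that partitions transactions by whether their "end_of_transaction" record made it to the log. A sketch, using a hypothetical simplified log format of ("begin", tid) / ("end", tid) tuples:

```python
# Sketch of the no-fix/no-flush recovery decision (log format is hypothetical).
def classify_for_recovery(log):
    begun, ended = set(), set()
    for record, tid in log:
        if record == "begin":
            begun.add(tid)
        elif record == "end":
            ended.add(tid)
    redo = begun & ended   # both records present: partial redo
    undo = begun - ended   # begin record only: global undo
    return redo, undo

log = [("begin", "T1"), ("end", "T1"), ("begin", "T2")]
print(classify_for_recovery(log))  # ({'T1'}, {'T2'})
```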
No-Fix/Flush
           I   Abort
                   ➠ Buffer manager may have written some of the
                     updated pages into stable database
                   ➠ LRM performs transaction undo (or partial undo)
           I   Commit
                   ➠ LRM issues a flush command to the buffer
                     manager for all updated pages
                   ➠ LRM writes an “end_of_transaction” record into the
                     log.
           I   Recover
                   ➠ No need to perform redo
                   ➠ Perform global undo

Fix/No-Flush
            I      Abort
                    ➠ None of the updated pages have been written
                      into stable database
                    ➠ Release the fixed pages
            I      Commit
                    ➠ LRM writes an “end_of_transaction” record into
                      the log.
                    ➠ LRM sends an unfix command to the buffer
                      manager for all pages that were previously
                      fixed
            I      Recover
                    ➠ Perform partial redo
                    ➠ No need to perform global undo
Fix/Flush
       I   Abort
            ➠ None of the updated pages have been written into stable
              database
            ➠ Release the fixed pages
       I   Commit (the following have to be done atomically)
            ➠ LRM issues a flush command to the buffer manager for
              all updated pages
            ➠ LRM sends an unfix command to the buffer manager
              for all pages that were previously fixed
            ➠ LRM writes an “end_of_transaction” record into the log.
       I   Recover
            ➠ No need to do anything
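The recovery obligations of the four strategies, collected from the preceding slides into one lookup table (illustrative only; the pair is (redo obligation, undo obligation)):

```python
# (redo obligation, undo obligation) per fix/flush strategy, as on the slides.
RECOVERY_OBLIGATIONS = {
    ("no-fix", "no-flush"): ("partial redo", "global undo"),
    ("no-fix", "flush"):    (None,           "global undo"),
    ("fix",    "no-flush"): ("partial redo", None),
    ("fix",    "flush"):    (None,           None),   # recover does nothing
}
```

The table makes the trade-off visible: fix removes the need for undo, flush removes the need for redo, and fix/flush pays for its trivial recovery with an atomic multi-step commit.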


Checkpoints

             I   Checkpointing simplifies the task of determining which
                 actions of transactions need to be undone or
                 redone when a failure occurs.
             I   A checkpoint record contains a list of active
                 transactions.
             I   Steps:
                    ❶ Write a begin_checkpoint record into the log
                    ❷ Collect the checkpoint data into stable storage
                    ❸ Write an end_checkpoint record into the log
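The three steps above can be sketched as executable pseudocode; the log and the stable-store dictionary are hypothetical in-memory stand-ins.

```python
# The three checkpoint steps as a sketch (names are hypothetical).
def take_checkpoint(log, active_transactions, stable_store):
    log.append(("begin_checkpoint",))                         # step 1
    stable_store["checkpoint"] = sorted(active_transactions)  # step 2: checkpoint data
    log.append(("end_checkpoint",))                           # step 3

log, stable_store = [], {}
take_checkpoint(log, {"T3", "T7"}, stable_store)
print(log)  # [('begin_checkpoint',), ('end_checkpoint',)]
```

At restart, recovery only trusts a checkpoint whose end_checkpoint record is present, which is why the two bracketing log writes matter.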




 Media Failures –
 Full Architecture

              [Figure: the same logging architecture as before (Local
              Recovery Manager, log buffers, database buffers, stable log,
              stable database), extended with an archive database and an
              archive log on secondary storage, written from the stable
              database and the stable log, respectively.]
Distributed Reliability Protocols
         I   Commit protocols
              ➠ How to execute commit command for distributed
                transactions.
              ➠ Issue: how to ensure atomicity and durability?
          I   Termination protocols
               ➠ If a failure occurs, how can the remaining operational
                 sites deal with it?
               ➠ Non-blocking : the occurrence of failures should not force
                 the sites to wait until the failure is repaired to terminate
                 the transaction.
          I   Recovery protocols
               ➠ When a failure occurs, how do the sites where the failure
                 occurred deal with it?
               ➠ Independent : a failed site can determine the outcome of a
                 transaction without having to obtain remote information.
          I   Independent recovery ⇒ non-blocking termination
Two-Phase Commit (2PC)
          Phase 1 : The coordinator gets the participants
           ready to write the results into the database
          Phase 2 : Everybody writes the results into the
           database
               ➠ Coordinator :The process at the site where the
                 transaction originates and which controls the
                 execution
               ➠ Participant :The process at the other sites that
                 participate in executing the transaction
          Global Commit Rule:
               ❶ The coordinator aborts a transaction if and only if at
                 least one participant votes to abort it.
               ❷ The coordinator commits a transaction if and only if
                 all of the participants vote to commit it.
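The Global Commit Rule transcribes directly into a one-line decision function (a sketch; the vote values are hypothetical strings):

```python
# Global Commit Rule: commit iff every participant votes commit.
def global_decision(votes):
    return "commit" if votes and all(v == "commit" for v in votes) else "abort"

print(global_decision(["commit", "commit", "commit"]))  # commit
print(global_decision(["commit", "abort", "commit"]))   # abort
```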
Centralized 2PC

     [Figure: all communication is between the coordinator C and the
     participants P. Phase 1: C sends "ready?" to every participant;
     each replies "yes" or "no". Phase 2: C sends the "commit/abort"
     decision; each participant replies "committed" or "aborted".]
     2PC Protocol Actions

     [Figure: coordinator and participant state machines (INITIAL,
     WAIT/READY, ABORT, COMMIT) with the messages exchanged between them:]
         ❶ The coordinator writes begin_commit in its log, sends PREPARE
           to all participants, and enters WAIT.
         ❷ A participant that is not ready to commit writes abort in its
           log and sends VOTE-ABORT; one that is ready writes ready in its
           log, sends VOTE-COMMIT, and enters READY.
         ❸ On any VOTE-ABORT, the coordinator writes abort in its log and
           sends GLOBAL-ABORT; if all participants send VOTE-COMMIT, it
           writes commit in its log and sends GLOBAL-COMMIT.
         ❹ Depending on the type of message received, a participant writes
           abort or commit in its log, moves to ABORT or COMMIT, and sends
           an ACK.
         ❺ After collecting the ACKs, the coordinator writes
           end_of_transaction in its log.
Linear 2PC

     [Figure: sites 1, 2, …, N in a chain. Phase 1 runs left to right:
     a Prepare message enters site 1, and each site forwards VC or VA to
     its successor. Phase 2 runs right to left: GC or GA propagates back
     from site N to site 1.]

       VC: Vote-Commit, VA: Vote-Abort, GC: Global-commit, GA: Global-abort
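The first phase can be sketched as a fold over the chain: site i forwards VC only if every site up to and including i votes commit; once any site votes abort, VA flows to the end. (The sites and votes below are hypothetical.)

```python
# Sketch of linear 2PC, phase 1: the vote travels down the chain.
def linear_phase1(local_votes):
    """Return the message each site forwards to its successor."""
    forwarded, msg = [], "VC"
    for vote in local_votes:
        if vote == "abort":
            msg = "VA"        # any abort vote turns the rest of the chain to VA
        forwarded.append(msg)
    return forwarded

print(linear_phase1(["commit", "abort", "commit"]))  # ['VC', 'VA', 'VA']
```

The message that reaches site N determines phase 2: GC or GA is then sent back up the chain.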




Distributed 2PC

     [Figure: in phase 1 the coordinator sends "prepare" to all
     participants; each participant then broadcasts its vote-abort/
     vote-commit to the coordinator and to all other participants, so
     every site makes the global-commit/global-abort decision
     independently and no second round of messages is needed.]
 State Transitions in 2PC
     Coordinator:
         ➠ INITIAL: on Commit command, send Prepare, move to WAIT
         ➠ WAIT: on any Vote-abort, send Global-abort, move to ABORT
         ➠ WAIT: on Vote-commit from all, send Global-commit, move to COMMIT
     Participants:
         ➠ INITIAL: on Prepare, either send Vote-abort and move to ABORT,
           or send Vote-commit and move to READY
         ➠ READY: on Global-abort, send Ack, move to ABORT
         ➠ READY: on Global-commit, send Ack, move to COMMIT
 Site Failures - 2PC Termination
                                             COORDINATOR
   I   Timeout in INITIAL
        ➠ Who cares
   I   Timeout in WAIT
        ➠ Cannot unilaterally commit
        ➠ Can unilaterally abort
   I   Timeout in ABORT or COMMIT
        ➠ Stay blocked and wait for the acks
   [Figure: the coordinator's 2PC state diagram (INITIAL, WAIT, ABORT,
   COMMIT), as on the preceding slide.]
 Site Failures - 2PC Termination
                                             PARTICIPANTS
   I   Timeout in INITIAL
        ➠ Coordinator must have failed in INITIAL state
        ➠ Unilaterally abort
   I   Timeout in READY
        ➠ Stay blocked
   [Figure: the participant's 2PC state diagram (INITIAL, READY, ABORT,
   COMMIT), as on the preceding slide.]
 Site Failures - 2PC Recovery
                                             COORDINATOR
 I   Failure in INITIAL
      ➠ Start the commit process upon recovery
 I   Failure in WAIT
      ➠ Restart the commit process upon recovery
 I   Failure in ABORT or COMMIT
      ➠ Nothing special if all the acks have been received
      ➠ Otherwise the termination protocol is involved
   [Figure: the coordinator's 2PC state diagram.]
 Site Failures - 2PC Recovery
                                             PARTICIPANTS
 I   Failure in INITIAL
      ➠ Unilaterally abort upon recovery
 I   Failure in READY
      ➠ The coordinator has been informed about the local decision
      ➠ Treat as timeout in READY state and invoke the termination
        protocol
 I   Failure in ABORT or COMMIT
      ➠ Nothing special needs to be done
   [Figure: the participant's 2PC state diagram.]
2PC Recovery Protocols –
Additional Cases
     Arise due to non-atomicity of log and message send
       actions
      I Coordinator site fails after writing the “begin_commit”
        log record and before sending the “prepare” command
           ➠ treat it as a failure in WAIT state; send “prepare”
               command
     I   Participant site fails after writing “ready” record in
         log but before “vote-commit” is sent
           ➠ treat it as failure in READY state
           ➠ alternatively, can send “vote-commit” upon recovery
     I   Participant site fails after writing “abort” record in
         log but before “vote-abort” is sent
           ➠ no need to do anything upon recovery
2PC Recovery Protocols –
Additional Case
         I   Coordinator site fails after logging its final
             decision record but before sending its decision to
             the participants
               ➠ coordinator treats it as a failure in COMMIT or
                   ABORT state
               ➠ participants treat it as timeout in the READY state
         I   Participant site fails after writing “abort” or
             “commit” record in log but before
             acknowledgement is sent
               ➠ participant treats it as failure in COMMIT or ABORT
                   state
               ➠ coordinator will handle it by timeout in COMMIT or
                   ABORT state
Problem With 2PC

         I   Blocking
              ➠ Ready implies that the participant waits for the
                coordinator
              ➠ If coordinator fails, site is blocked until recovery
              ➠ Blocking reduces availability
         I   Independent recovery is not possible
         I   However, it is known that:
              ➠ Independent recovery protocols exist only for single
                   site failures; no independent recovery protocol exists
                   which is resilient to multiple-site failures.
         I   So we search for these protocols – 3PC


Three-Phase Commit
                   I   3PC is non-blocking.
                    I   A commit protocol is non-blocking iff
                        ➠ it is synchronous within one state
                          transition, and
                        ➠ its state transition diagram contains
                           N   no state which is “adjacent” to both a commit
                               and an abort state, and
                           N   no non-committable state which is “adjacent”
                               to a commit state
                    I   Adjacent: possible to go from one state to
                        another with a single state transition
                   I   Committable: all sites have voted to
                       commit a transaction
                        ➠ e.g.: COMMIT state
 State Transitions in 3PC
     Coordinator:
         ➠ INITIAL: on Commit command, send Prepare, move to WAIT
         ➠ WAIT: on any Vote-abort, send Global-abort, move to ABORT
         ➠ WAIT: on Vote-commit from all, send Prepare-to-commit, move
           to PRE-COMMIT
         ➠ PRE-COMMIT: on Ready-to-commit from all, send Global commit,
           move to COMMIT
     Participants:
         ➠ INITIAL: on Prepare, either send Vote-abort and move to ABORT,
           or send Vote-commit and move to READY
         ➠ READY: on Global-abort, send Ack, move to ABORT
         ➠ READY: on Prepared-to-commit, send Ready-to-commit, move to
           PRE-COMMIT
         ➠ PRE-COMMIT: on Global commit, send Ack, move to COMMIT
Communication Structure

     [Figure: centralized 3PC between coordinator C and participants P.
     Phase 1: C sends "ready?"; participants reply yes/no. Phase 2: C
     sends "pre-commit?" (or "pre-abort?"); participants reply yes/no.
     Phase 3: C sends the commit/abort decision; participants reply
     with an ack.]
Site Failures –
3PC Termination
 Coordinator
   I   Timeout in INITIAL
        ➠ Who cares
   I   Timeout in WAIT
        ➠ Unilaterally abort
   I   Timeout in PRECOMMIT
        ➠ Participants may not be in PRE-COMMIT, but at least in READY
        ➠ Move all the participants to PRECOMMIT state
        ➠ Terminate by globally committing
   [Figure: the coordinator's 3PC state diagram (INITIAL, WAIT,
   PRE-COMMIT, ABORT, COMMIT).]
Site Failures –
3PC Termination
 Coordinator
   I   Timeout in ABORT or COMMIT
        ➠ Just ignore and treat the transaction as completed
        ➠ Participants are either in PRECOMMIT or READY state and can
          follow their termination protocols
   [Figure: the coordinator's 3PC state diagram.]
 Site Failures –
 3PC Termination
  Participants
   I   Timeout in INITIAL
        ➠ Coordinator must have failed in INITIAL state
        ➠ Unilaterally abort
   I   Timeout in READY
        ➠ Voted to commit, but does not know the coordinator's decision
        ➠ Elect a new coordinator and terminate using a special protocol
   I   Timeout in PRECOMMIT
        ➠ Handle it the same as timeout in READY state
   [Figure: the participant's 3PC state diagram (INITIAL, READY,
   PRE-COMMIT, ABORT, COMMIT).]
Termination Protocol Upon
Coordinator Election
   New coordinator can be in one of four states: WAIT,
    PRECOMMIT, COMMIT, ABORT
        ❶ Coordinator sends its state to all of the participants asking
          them to assume its state.
         ❷ Participants “back-up” and reply with appropriate messages,
          except those in ABORT and COMMIT states. Those in these
          states respond with “Ack” but stay in their states.
        ❸ Coordinator guides the participants towards termination:
              N    If the new coordinator is in the WAIT state, participants can be in
                   INITIAL, READY, ABORT or PRECOMMIT states. New
                   coordinator globally aborts the transaction.
              N    If the new coordinator is in the PRECOMMIT state, the
                   participants can be in READY, PRECOMMIT or COMMIT states.
                   The new coordinator will globally commit the transaction.
              N    If the new coordinator is in the ABORT or COMMIT states, at the
                   end of the first phase, the participants will have moved to that
                   state as well.
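The three cases above compress into a lookup keyed on the new coordinator's own state after election. This is only a sketch of the decision, not of the surrounding message exchange; the state and decision names are the slide's, held as hypothetical strings.

```python
# The new coordinator's termination decision, keyed on its own state (sketch).
TERMINATION_DECISION = {
    "WAIT":      "global-abort",    # participants at most in PRECOMMIT: safe to abort
    "ABORT":     "global-abort",
    "PRECOMMIT": "global-commit",   # everyone has at least voted commit
    "COMMIT":    "global-commit",
}

def terminate(new_coordinator_state):
    return TERMINATION_DECISION[new_coordinator_state]

print(terminate("PRECOMMIT"))  # global-commit
```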
Site Failures – 3PC Recovery
 Coordinator
 I   Failure in INITIAL
      ➠ Start the commit process upon recovery
 I   Failure in WAIT
      ➠ The participants may have elected a new coordinator and
        terminated the transaction
      ➠ The new coordinator could be in WAIT or ABORT states, in which
        case the transaction will be aborted
      ➠ Ask around for the fate of the transaction
 I   Failure in PRECOMMIT
      ➠ Ask around for the fate of the transaction
   [Figure: the coordinator's 3PC state diagram.]
Site Failures – 3PC Recovery
 Coordinator

 [Coordinator 3PC state diagram: INITIAL → WAIT (Commit command / Prepare);
  WAIT → ABORT (Vote-abort / Global-abort) or PRE-COMMIT (Vote-commit /
  Prepare-to-commit); PRE-COMMIT → COMMIT (Ready-to-commit / Global commit)]

                 I   Failure in COMMIT or ABORT
                       ➠ nothing special if all the acknowledgements
                         have been received; otherwise the termination
                         protocol is involved
Site Failures – 3PC Recovery
 Participants

 [Participant 3PC state diagram: INITIAL → READY (Prepare / Vote-commit) or
  ABORT (Prepare / Vote-abort); READY → ABORT (Global-abort / Ack) or
  PRE-COMMIT (Prepare-to-commit / Ready-to-commit); PRE-COMMIT → COMMIT
  (Global commit / Ack)]

                 I   Failure in INITIAL
                       ➠ unilaterally abort upon recovery
                 I   Failure in READY
                       ➠ the coordinator has been informed about the
                         local decision
                       ➠ upon recovery, ask around
                 I   Failure in PRE-COMMIT
                       ➠ ask around to determine how the other
                         participants have terminated the transaction
                 I   Failure in COMMIT or ABORT
                       ➠ no need to do anything
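The 3PC recovery rules for both roles amount to a lookup on (role, state at failure). A minimal Python sketch; the state names follow the diagrams, but the action strings and function name are illustrative, not from any real implementation ("ask around" stands for running the termination protocol with the other sites):

```python
# Recovery actions in 3PC, keyed by (role, state the site failed in).
RECOVERY_ACTION = {
    ("coordinator", "INITIAL"):   "start the commit process upon recovery",
    ("coordinator", "WAIT"):      "ask around for the fate of the transaction",
    ("coordinator", "PRECOMMIT"): "ask around for the fate of the transaction",
    ("coordinator", "COMMIT"):    "nothing special if all acks were received",
    ("coordinator", "ABORT"):     "nothing special if all acks were received",
    ("participant", "INITIAL"):   "unilaterally abort upon recovery",
    ("participant", "READY"):     "ask around (coordinator knows the local vote)",
    ("participant", "PRECOMMIT"): "ask around how the others terminated",
    ("participant", "COMMIT"):    "no need to do anything",
    ("participant", "ABORT"):     "no need to do anything",
}

def recovery_action(role, failed_in):
    """Return the recovery action for a site that failed in a given state."""
    return RECOVERY_ACTION[(role, failed_in)]
```

The table makes the symmetry visible: only failures before the global decision force a site to consult the others.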
Network Partitioning
            I   Simple partitioning
                   ➠ Only two partitions
            I   Multiple partitioning
                   ➠ More than two partitions
             I   Formal bounds (due to Skeen):
                    ➠ There exists no non-blocking protocol that is
                      resilient to a network partition if messages are
                      lost when the partition occurs.
                    ➠ There exist non-blocking protocols which are
                      resilient to a single network partition if all
                      undeliverable messages are returned to the sender.
                    ➠ There exists no non-blocking protocol which is
                      resilient to multiple partitions.
Independent Recovery Protocols
for Network Partitioning
         I   No general solution possible
              ➠ allow one group to terminate while the other is
                blocked
              ➠ improve availability
          I   How to determine which group may proceed?
               ➠ The group with a majority
          I   How does a group know if it has a majority?
               ➠ centralized
                    N   whichever partition contains the central site should
                        terminate the transaction
              ➠ voting-based (quorum)
                   N   different for replicated vs non-replicated databases


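The two ways a partition can learn whether it may proceed can be sketched as follows; the function and parameter names are illustrative:

```python
def may_proceed_centralized(group_sites, central_site):
    # Centralized rule: only the partition that contains the
    # central site may terminate the transaction.
    return central_site in group_sites

def may_proceed_voting(group_votes, total_votes):
    # Voting rule: the partition needs a strict majority of the votes.
    return 2 * group_votes > total_votes
```

Since at most one partition can contain the central site, and at most one can hold a strict majority, either rule guarantees that the other group stays blocked.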
Quorum Protocols for
Non-Replicated Databases

              I    The network partitioning problem is
                   handled by the commit protocol.
               I    Every site is assigned a vote Vi.
               I    The total number of votes in the system is V.
               I    Abort quorum Va, commit quorum Vc
                     ➠ Va + Vc > V where 0 ≤ Va , Vc ≤ V
                    ➠ Before a transaction commits, it must obtain
                      a commit quorum Vc
                    ➠ Before a transaction aborts, it must obtain an
                      abort quorum Va


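The quorum rule above can be sketched as a pair of hypothetical helper functions (not from any particular system). Because Va + Vc > V, the commit quorum and the abort quorum cannot both be assembled in disjoint partitions:

```python
def quorums_valid(v_a, v_c, v_total):
    # Va + Vc > V with 0 <= Va, Vc <= V: an abort quorum and a commit
    # quorum can never both form on disjoint sets of sites.
    return 0 <= v_a <= v_total and 0 <= v_c <= v_total and v_a + v_c > v_total

def decide(commit_votes, abort_votes, v_c, v_a):
    # A transaction terminates only once it collects the required quorum;
    # otherwise it stays blocked (e.g. while a partition persists).
    if commit_votes >= v_c:
        return "commit"
    if abort_votes >= v_a:
        return "abort"
    return "blocked"
```

A partition holding neither quorum simply blocks until the network merges, which is exactly the price paid for resilience.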
 State Transitions in
 Quorum Protocols

 [Coordinator: INITIAL → WAIT (Commit command / Prepare);
  WAIT → PRE-ABORT (Vote-abort / Prepare-to-abort) or PRE-COMMIT
  (Vote-commit / Prepare-to-commit); PRE-ABORT → ABORT (Ready-to-abort /
  Global-abort); PRE-COMMIT → COMMIT (Ready-to-commit / Global commit)]

 [Participants: INITIAL → READY (Prepare / Vote-commit or Prepare /
  Vote-abort); READY → PRE-ABORT (Prepare-to-abort / Ready-to-abort) or
  PRE-COMMIT (Prepare-to-commit / Ready-to-commit); PRE-ABORT → ABORT
  (Global-abort / Ack); PRE-COMMIT → COMMIT (Global commit / Ack)]

Quorum Protocols for
Replicated Databases
           I   Network partitioning is handled by the
               replica control protocol.
           I   One implementation:
                   ➠ Assign a vote to each copy of a replicated data
                     item (say Vi) such that Σi Vi = V
                   ➠ Each operation has to obtain a read quorum (Vr)
                     to read and a write quorum (Vw) to write a data
                     item
                   ➠ Then the following rules have to be obeyed in
                     determining the quorums:
                       N   Vr + Vw > V      a data item cannot be read and
                                            written by two transactions
                                            concurrently
                      N   Vw > V/2         two write operations from two
                                           transactions cannot occur
                                           concurrently on the same data item
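The two intersection rules, together with the usual read-latest-version implementation, can be sketched as below. The version-number scheme and the helper names are a common implementation choice, not something the slide mandates:

```python
def rw_quorums_valid(v_r, v_w, v_total):
    # Vr + Vw > V: every read quorum intersects every write quorum,
    #              so a read always sees at least one current copy.
    # Vw > V/2:    any two write quorums intersect, so two writes on
    #              the same item cannot proceed concurrently.
    return v_r + v_w > v_total and 2 * v_w > v_total

def quorum_read(copies, votes, v_r):
    """copies: site -> (version, value). Collect copies until the read
    quorum Vr is reached, then return the most recent value."""
    collected, got = [], 0
    for site, (version, value) in copies.items():
        collected.append((version, value))
        got += votes[site]
        if got >= v_r:
            return max(collected)[1]  # highest version wins
    raise RuntimeError("read quorum not reachable in this partition")
```

Because the read quorum intersects the last write quorum, at least one collected copy carries the highest version number.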
Use for Network Partitioning

           I   Simple modification of the ROWA rule:
                   ➠ When the replica control protocol attempts to read
                     or write a data item, it first checks if a majority of
                     the sites are in the same partition as the site that
                     the protocol is running on (by checking its votes).
                     If so, execute the ROWA rule within that
                     partition.
           I   Assumes that failures are “clean” which
               means:
                   ➠ failures that change the network's topology are
                     detected by all sites instantaneously
                   ➠ each site has a view of the network consisting of
                     all the sites it can communicate with

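A sketch of the modified ROWA rule for writes, assuming each site knows the vote counts and which sites share its partition; all names here are illustrative:

```python
def rowa_write(partition_sites, votes, item, value, db):
    # Modified ROWA: proceed only if this partition holds a majority
    # of all votes; then write every copy reachable in the partition
    # ("write all available").
    in_partition = sum(votes[s] for s in partition_sites)
    if 2 * in_partition <= sum(votes.values()):
        raise RuntimeError("no majority in this partition; operation blocks")
    for site in partition_sites:
        db[site][item] = value
```

Copies in the minority partition are left stale and must be brought up to date when the partitions merge, which is why the "clean failure" assumptions above matter.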
Open Problems

         I   Replication protocols
              ➠ experimental validation
              ➠ replication of computation and communication
         I   Transaction models
              ➠ changing requirements
                   N   cooperative sharing vs. competitive sharing
                   N   interactive transactions
                   N   longer duration
                   N   complex operations on complex data
              ➠ relaxed semantics
                   N   non-serializable correctness criteria



Transaction Model Design Space

 [Design space diagram: vertical axis "Object complexity" (simple data,
  ADT instances, ADT + complex objects, active objects); horizontal axis
  "Transaction structure" (flat, closed nesting, open nesting, mixed)]