Docstoc

485_Concurrency_Notes

Document Sample
485_Concurrency_Notes Powered By Docstoc
					BACS 486                                                                       485_Concurrency_Notes.doc
8/5/2012 8:26 PM

                                          CONCURRENCY NOTES


I. PROBLEMS OF UNCONTROLLED CONCURRENCY


    If a database has multiple users and does not control concurrency, then a number of
problems can occur that leave the database in an inconsistent state. The point of concurrency
control is to make sure that none of these things can happen while still allowing concurrent
shared data access and update.

These problems can be grouped into 3 general categories:
    - Lost Update Problem
    - Temporary Update Problem (a.k.a. uncommitted dependency problem)
    - Incorrect Summary Problem (a.k.a. inconsistent analysis problem)

     All 3 of these problems involve a unit of work called a transaction. In database, a
transaction is a group of statements that should be run in “un-interrupted mode”. This means
that all the statements are completed successfully, or none are completed. This notion is the
basis for the “atomic transaction” principle which is covered in more detail in the recovery
lectures. For now, view transactions as small programs that users submit that are designed to
produce a specific result when executed.

Example: Two bank transactions, one moves $50 from checking to savings and another adds
   $100 to checking.

                  T1                                         T2
         READ checking-balance                      READ checking-balance
         Balance = Balance - 50                     Balance = Balance + 100
         WRITE checking-balance                     WRITE checking-balance
         READ savings-balance
         Balance = Balance + 50
         WRITE savings-balance

    Depending upon how these two transactions are interleaved, uncontrolled concurrency can
give erroneous results.

A. Lost Update Problem

One of the updates can be "lost" (overwritten). For example:1


1
 These concurrency diagrams are used to show how the various steps of each transaction are mixed. Implied in the
diagram is that time passes as you read down the columns.

BACS 485 – Database Concurrency Control                1                       Copyright Jay M. Lightfoot, Ph.D, C.D.P.
                  T1             |               T2
        _________________________|____________________________
         READ checking-balance   |
         Balance = Balance - 50  |
                                 |      READ checking-balance
                                 |      Balance = Balance + 100
         WRITE checking-balance  |
         READ savings-balance    |
                                 |      WRITE checking-balance T1 update
         Balance = Balance + 50  |                              lost
         WRITE savings-balance   |


B. Temporary Update Problem

    This problem occurs when one transaction updates the database and then fails (for some
reason). Before the original value is restored, some other transaction reads the incorrect value
and uses it. If the 2nd transaction updates based on the item then the database is left in an
inconsistent state.

                  T1             |               T2
        _________________________|____________________________
         READ checking-balance   |
         Balance = Balance - 50  |
         WRITE checking-balance  |
                                 |      READ checking-balance
                                 |      Balance = Balance + 100
                                 |      WRITE checking-balance
         READ savings-balance    |

T1 fails and must be restored -->

      The result of this example is that T2 updated incorrectly and the database is inconsistent
(i.e., wrong).

C. Incorrect Summary Problem

     Another problem occurs if one transaction is calculating an aggregate (sum) while another is
updating the same data group. This is a different type of error because it involves a whole set of
data items instead of just one.

                 T1              |              T2
        _________________________|____________________________
                                 |      Sum = 0
                                 |      READ check-balance-1
                                 |      Sum = Sum + Balance


BACS 485 – Database Concurrency Control          2                    Copyright Jay M. Lightfoot, Ph.D, C.D.P.
                                          | ... and so on …
         READ checking-balance-5          |
         Balance = Balance – 50           |
         WRITE check-balance-5            |
                                          |        READ check-balance-5
                                          |        Sum = Sum + Balance
                                          |        READ check-balance-6
                                          |        Sum = Sum + Balance
         READ savings-balance-6           |
         Balance = Balance + 50           |
         WRITE savings-balance-6          |

    T2 reads balance-5 after 50 is subtracted and reads balance-6 before 50 is added, so
incorrect summary result in T2 (off by $50).

    All 3 problems occur in various ways (so these are only one example of how they could
occur). The main point to realize is that uncontrolled concurrent access with update has the
potential for problems.

II. SCHEDULES

     A schedule is a plan for interleaved execution of several transactions in a multi-user system.
In uncontrolled concurrency, the schedules tend to be random (often with bad results). The idea
is to devise a schedule to avoid causing inconsistent database states.

A. Serial Schedules

    Serial schedules ensure database integrity by executing one transaction after the other has
completed. They have low concurrency because it removes the advantage gained by multi-
programming.

Example: T1 transfers $50 from checking to savings while T2 takes 10% of checking and puts it
   into savings. Initial value checking $1000, savings $2000.

               T1                                              T2
     READ checking-balance                READ checking-balance
     Balance = Balance - 50               Temp = Balance * 10%
     WRITE checking-balance               Balance = Balance - Temp
     READ savings-balance                 WRITE checking-balance
     Balance = Balance + 50               READ savings-balance
     WRITE savings-balance                Balance = Balance + Temp
                                          WRITE savings-balance




BACS 485 – Database Concurrency Control            3                  Copyright Jay M. Lightfoot, Ph.D, C.D.P.
SCHEDULE 1:
                 T1            |               T2
      _________________________|____________________________
       READ checking-balance   |
       Balance = Balance - 50  |
       WRITE checking-balance  |
       READ savings-balance    |
       Balance = Balance + 50  |
       WRITE savings-balance   |
                               |      READ checking-balance
                               |      Temp = Balance * 10%
                               |      Balance = Balance - Temp
                               |      WRITE checking-balance
                               |      READ savings-balance
                               |      Balance = Balance + Temp
                               |      WRITE savings-balance
     Checking Value = $855
     Savings Value = $2145

SCHEDULE 2:
                  T1             |               T2
        _________________________|____________________________
                                 |      READ checking-balance
                                 |      Temp = Balance * 10%
                                 |      Balance = Balance - Temp
                                 |      WRITE checking-balance
                                 |      READ savings-balance
                                 |      Balance = Balance + Temp
                                 |      WRITE savings-balance
         READ checking-balance   |
         Balance = Balance - 50  |
         WRITE checking-balance  |
         READ savings-balance    |
         Balance = Balance + 50  |
         WRITE savings-balance   |

     Checking Value = $850
     Savings Value = $2150

    Notice that Schedule 1 and Schedule 2 get different results. That is OK. Serial schedules
can produce different results and still be "correct". Any "serializable schedule" must be
equivalent to one of the possible serial schedules. (For n transactions there are n! possible serial
schedules.)




BACS 485 – Database Concurrency Control           4                    Copyright Jay M. Lightfoot, Ph.D, C.D.P.
B. Non-Serial Schedules

    You must build interleaved transaction schedules to improve concurrency and throughput in
a multi-user system. When transactions are interleaved, the result can be the same as either serial
or non-serial execution. There are far more non-serial schedules than serial ones. Non-serial
schedules can produce bad results (i.e., inconsistent database).

Example: Bad Non-Serial Schedule.

SCHEDULE 3:
                  T1             |               T2
        _________________________|____________________________
         READ checking-balance   |
         Balance = Balance - 50  |
                                 |      READ checking-balance
                                 |      Temp = Balance * 10%
                                 |      Balance = Balance - Temp
                                 |      WRITE checking-balance
                                 |      READ savings-balance
         WRITE checking-balance  |
         READ savings-balance    |
         Balance = Balance + 50  |
         WRITE savings-balance   |
                                 |      Balance = Balance + Temp
                                 |      WRITE savings-balance

     Checking Value = $950
     Savings Value = $2100

     This is not correct because it gains $50 by executing concurrently. The database is wrong
(that is, left in an inconsistent state). This is the "lost update problem".

C. Serializable Schedules

Other non-serial schedules produce correct results.

Example: Good Non-Serial Schedule.

SCHEDULE 4:
                  T1             |              T2
        _________________________|____________________________
         READ checking-balance   |
         Balance = Balance - 50  |
         WRITE checking-balance  |
                                 |      READ checking-balance
                                 |      Temp = Balance * 10%


BACS 485 – Database Concurrency Control         5                    Copyright Jay M. Lightfoot, Ph.D, C.D.P.
                                               |     Balance = Balance - Temp
                                               |     WRITE checking-balance
         READ savings-balance                  |
         Balance = Balance + 50                |
         WRITE savings-balance                 |     READ savings-balance
                                               |     Balance = Balance + Temp
                                               |     WRITE savings-balance

     Checking Value = $855
     Savings Value = $2145

     The trick is to know which non-serial schedules produce "correct" results and which do not.
A schedule is called "serializable" if it produces the same results as a serial schedule regardless
of the initial state of the database. In theory you should be able to test to see if a schedule is
serializable; however, this is not practical in real database systems. Most systems get around this
problem by having each transaction conform to sets of rules (called protocols) that are
guaranteed to produce serializable schedules. In this way you never have to concern yourself
with the schedules, only that the rules are followed in transactions.

III. CONCURRENCY CONTROL MECHANISMS

    To avoid serializability testing, protocols (rules) are used to assure serializable schedules.
Protocols use locks and timestamps. As long as the transactions follow the protocol, you can
ignore schedules and be guaranteed concurrency and database consistency (integrity). There are
several types of locks and timestamps. These are the raw materials that make up the protocols.
They are concurrency control mechanisms.

A. Exclusive Locks (Binary or ‘X’ locks)

    Exclusive locks are the simplest kind. They have two states, locked and unlocked. When a
record is locked then no transactions can access it. When it is unlocked then transactions can
access it (and lock it). Hence, exclusive locks enforce mutual exclusion on the record. Two
simple protocols can be built using exclusive locks.

1. PX/PXC Protocol

Protocol PX: Before reading a record you intend to update, place an XLOCK on it. If the
    transaction is already locked go into wait state. After the update, release the lock. The PX
    protocol is weak by itself and can produce non-serializable schedules.

Example: Problem of PX protocol. T1 transfers $50 from B to A and T2 displays the sum of A
   and B. Lost update problem.

          T1                               T2
         XLOCK(B)                         XLOCK(B)
         READ(B)                          READ(B)


BACS 485 – Database Concurrency Control                6                  Copyright Jay M. Lightfoot, Ph.D, C.D.P.
         B = B – 50                       UNLOCK(B)
         WRITE(B)                         XLOCK(A)
         UNLOCK(B)                        READ(A)
         XLOCK(A)                         UNLOCK(A)
         READ(A)                          DISPLAY(A + B)
         A = A + 50
         WRITE(A)
         UNLOCK(A)

                T1         |            T2
          _________________|__________________
         XLOCK(B)          |
         READ(B)           |
         B = B - 50        |
         WRITE(B)          |
         UNLOCK(B)         |
                           |      XLOCK(B)
                           |      READ(B)
                           |      UNLOCK(B)
                           |      XLOCK(A)
                           |      READ(A)
                           |      UNLOCK(A)
                           |      DISPLAY(A + B) <-- Displays sum $50 too low
         XLOCK(A)          |
         READ(A)           |
         A = A + 50        |
         WRITE(A)          |
         UNLOCK(A)         |

     This is not a serializable schedule because it gets results different from running T1 and T2
serially. To fix this, the PX protocol is augmented to hold onto locks longer (2nd type of
exclusive protocol).

Protocol PXC: Same as PX except hold the XLOCK until end-of-transaction. (Or end of
    program in some implementations).

Example: PXC protocol of above transactions (modified to hold locks).


                T3         |            T4
          _________________|__________________
         XLOCK(B)          |
         READ(B)           |
         B = B - 50        |
         WRITE(B)          |
                           |      XLOCK(B)


BACS 485 – Database Concurrency Control                7              Copyright Jay M. Lightfoot, Ph.D, C.D.P.
         XLOCK(A)                         |    wait...
         READ(A)                          |    wait...
         A = A + 50                       |    wait...
         WRITE(A)                         |    wait...
         UNLOCK(B)                        |    wait...
         UNLOCK(A)                        |    READ(B)
                                          |    XLOCK(A)
                                          |    READ(A)
                                          |    DISPLAY(A + B)
                                          |    UNLOCK(B)
                                          |    UNLOCK(A)

    Note that PXC fixes the lost update problem and produces a serializable schedule, but this
schedule is basically a T1 before T2 serial schedule so the concurrency is very low.

2. Deadlock

     Deadlock is a problem with locking protocols. It is a situation where two (or more)
transactions are each waiting for the other to release a lock. Neither can proceed until one
transaction releases its lock first. Resolving deadlock involves selecting a "victim" transaction
and rolling it back to release its locks. More on this later.

Example: Deadlock from PXC protocol. (Modification of above transactions)

      T1                                   T2
     XLOCK(B)                             XLOCK(A)
     READ(B)                              READ(A)
     B = B - 50                           XLOCK(B)
     WRITE(B)                             READ(B)
     XLOCK(A)                             DISPLAY(A + B)
     READ(A)                              UNLOCK(A)
     A = A + 50                           UNLOCK(B)
     WRITE(A)
     UNLOCK(B)
     UNLOCK(A)

                T1         |            T2
          _________________|__________________
         XLOCK(B)          |
         READ(B)           |
         B = B - 50        |
         WRITE(B)          |
                           |      XLOCK(A)
                           |      READ(A)
                           |      XLOCK(B) <-- Deadlock occurs at this point,
                           |      READ(B)          rest of schedule not executed


BACS 485 – Database Concurrency Control                8              Copyright Jay M. Lightfoot, Ph.D, C.D.P.
                                          |   DISPLAY(A + B)
                                          |   UNLOCK(A)
                                          |   UNLOCK(B)
         XLOCK(A)                         |
         READ(A)                          |
         A = A + 50                       |
         WRITE(A)                         |
         UNLOCK(B)                        |
         UNLOCK(A)                        |

3. Livelock

     Livelock is another lock related problem. A transaction is in a state of livelock if it cannot
proceed for an indefinite period of time while other transactions continue normally. Livelock
occurs because the waiting scheme for locked items is unfair and gives priority to some
transactions over others. It also occurs when deadlock recovery constantly chooses one
transaction as the "victim" for rollback over and over again.

B. Multiple-Mode Locks

    Exclusive locks are too restrictive because they do not recognize that many transactions can
READ records concurrently without harm. Multiple-mode locks allow several levels of lock on
each data item. The common ones are: Exclusive, Shared, Update. This makes the locking
scheme and protocol more complex, but increases the concurrency.

1. Exclusive (Write Lock)

     Exclusive locks are one of the lock options found in multi-mode locking schemes. They act
the same as they do in single mode exclusive lock schemes (i.e., two states, locked and
unlocked...).

2. Shared Lock (Read Lock)

     Shared Locks are used when a transaction will be strictly reading a record. The holder of a
shared lock can read a record but not write it. Shared locks give the reading transaction data
consistency so the record is not changed while they are looking at it. (They notify updating
transactions that the record is in use.) The interaction between shared and exclusive locks can be
expressed by a compatibility matrix. The words XLOCK and SLOCK stand for exclusive lock
and shared lock.

                                              T1
                                 XLOCK        SLOCK       UNLOCK
         T2 XLOCK                  no           no          yes
            SLOCK                  no           yes         yes
            UNLOCK                 yes          yes         yes



BACS 485 – Database Concurrency Control               9                Copyright Jay M. Lightfoot, Ph.D, C.D.P.
    If T1 lock is compatible with T2 lock (i.e., "yes" at intersection) then lock is granted. If
incompatible, then T2 waits.

(a) PS/PSC Protocol

Two protocols can be built using shared and exclusive locks.

PS Protocol: Before reading a record you intend to update place an SLOCK on it. If no
    XLOCKs are present you get access, otherwise you wait. When you decide to update you
    must promote the lock to an XLOCK. If no other SLOCKs you get XLOCK, otherwise you
    wait. After the update or read, release the lock. The PS protocol improves concurrency
    some over PX, but still allows problems.

Example: T1 summing range of accounts while T2 modifying one up by $50. Incorrect summary
   problem.

          T1                               T2
         SUM = 0                          SLOCK(B)
         SLOCK(A)                         READ(B)
         READ(A)                          XLOCK(B)
         SUM = SUM + A                    B = B + 50
         UNLOCK(A)                        WRITE(B)
         SLOCK(B)                         UNLOCK(B)
         READ(B)
         SUM = SUM + B
         UNLOCK(B)
         SLOCK(C)
         READ(C)
         SUM = SUM + C
         UNLOCK(C)

               T1          |             T2
          _________________|__________________
         SUM = 0           |
         SLOCK(A)          |
         READ(A)           |
         SUM = SUM + A     |
         UNLOCK(A)         |
         SLOCK(B)          |
         READ(B)           |
         SUM = SUM + B     |
         UNLOCK(B)         |
                           |      SLOCK(B)
                           |      READ(B)
                           |      XLOCK(B)
                           |      B = B + 50


BACS 485 – Database Concurrency Control          10                    Copyright Jay M. Lightfoot, Ph.D, C.D.P.
                                          |   WRITE(B)
                                          |   UNLOCK(B)
         SLOCK(C)                         |
         READ(C)                          |
         SUM = SUM + C                    |   <-- Sum is $50 low

     This is a non-serializable schedule. Typically PS has problems with deadlock and the three
classic concurrency problems still occur. To fix this, the PSC protocol is augmented to hold onto
locks longer.

PSC Protocol: Same as PS except that you hold all locks until end-of-transaction.

Example: PSC schedule using modified T3 and T4 that hold onto locks.

               T3          |              T4
          _________________|__________________
         SUM = 0           |
         SLOCK(A)          |
         READ(A)           |
         SUM = SUM + A     |
         SLOCK(B)          |
         READ(B)           |
         SUM = SUM + B     |
                           |      SLOCK(B)
                           |      READ(B)
                           |      XLOCK(B)
         SLOCK(C)          |      wait...
         READ(C)           |      wait...
         SUM = SUM + C     |      wait...
         UNLOCK(A)         |      wait...
         UNLOCK(B)         |      wait...
         UNLOCK(C)         |      wait...
                           |      B = B + 50
                           |      WRITE(B)
                           |      UNLOCK(B)

   This fixes the incorrect summary problem and provides slightly better concurrency than
PXC protocol. Still, it is close to a full serial schedule.

3. Update Lock

     Protocol exclusive locks result in low concurrency and shared locks lead to increased
deadlocks. Update Locks are a method used to help fix these problems. An update lock is a
cross between shared and exclusive. It indicates that the transaction MAY update the record. It
is compatible with shared locks, but not with other update locks or exclusive locks.



BACS 485 – Database Concurrency Control                11           Copyright Jay M. Lightfoot, Ph.D, C.D.P.
The compatibility matrix for exclusive, shared, and update locks is:

                                          T1
                                 XLOCK    ULOCK        SLOCK     UNLOCK
            XLOCK                  no       no           no        yes
         T2 ULOCK                  no       no           yes       yes
            SLOCK                  no       yes          yes       yes
            UNLOCK                 yes      yes          yes       yes

(a) PU/PUC Protocol

     The update lock allows two different protocols to be defined. They are PU and PUC.

PU Protocol: Before reading a record you intend to update, place an ULOCK on it. If no
   XLOCKs or ULOCKs are present you get access, otherwise you wait. When you decide to
   update you must promote the lock to an XLOCK. If no other XLOCKs you get it, otherwise
   you wait. After the update or read, release.

    Holders of a ULOCK are guaranteed to eventually get a XLOCK (unless rollback). Since
SLOCKS can coexist, concurrency is improved. Only one XLOCK may ever exist on a data
item. There can be any number of SLOCKS. PU protocol can have the 3 uncontrolled
concurrency problems if you release the locks too soon. Holding the locks longer helps.

PUC Protocol: Same as PU except hold locks until end-of-transaction. Lowers chance of
   deadlock, but decreases concurrency.

Example: Good PUC protocol schedule. T1 reads and sums A and B while T2 updates A by $50.

          T1                               T2
         SLOCK(A)                         ULOCK(A)
         READ(A)                          READ(A)
         SLOCK(B)                         XLOCK(A)
         READ(B)                          A = A + 50
         DISPLAY(A+B)                     WRITE(A)
         UNLOCK(A)                        UNLOCK(A)
         UNLOCK(B)

               T1          |            T2
          _________________|__________________
         SLOCK(A)          |
         READ(A)           |
                           |      ULOCK(A)
                           |      READ(A)
         SLOCK(B)          |
         READ(B)           |
                           |      XLOCK(A)

BACS 485 – Database Concurrency Control           12                   Copyright Jay M. Lightfoot, Ph.D, C.D.P.
         DISPLAY(A+B)                     |    wait...
         UNLOCK(A)                        |    wait...
         UNLOCK(B)                        |    wait...
                                          |    A = A + 50
                                          |    WRITE(A)
                                          |    UNLOCK(A)

    This is a serializable schedule with good concurrency. The ULOCK allows T1 to continue
(because T1 has declared its intention to NOT update). Using update locks does not
automatically free you from problems. PU protocol allows uncontrolled update problems if
locks released too soon. Also allows deadlock. The PUC allows deadlock.

C. Two-Phase Locking Protocol

     The PXC, PSC, and PUC locking protocols are generally implemented automatically
(implicitly) as part of the DML of the database. Consequently they are not major protocols from
a users point of view. The Two-Phase Locking Protocol is a major user protocol. It is one that
is explicit to the transaction. The programmer follows the protocol to ensure serializable
schedules. When you combine a locking mechanism with the 2-phase locking protocol then you
have a guaranteed serializable schedule. It does NOT stop deadlock however. To work, ALL
transactions must follow the protocol.

    The protocol requires that each transaction issue lock and unlock requests in 2 phases: 1)
Growing phase and 2) Shrinking phase. During the growing phase the transaction may obtain
locks but may not release any. During the shrinking phase they may release locks but may not
obtain any new locks.

Example: Two-Phase lock protocol. T1 moves $50 from A to B and T2 sums A and B.

          T1                               T2
         SLOCK(A)                         SLOCK(A)
         READ(A)                          READ(A)
         SLOCK(B)                         SLOCK(B)
         READ(B)                          READ(B)
         XLOCK(A)                         DISPLAY (A + B)
         A = A - 50                       UNLOCK(A)
         WRITE(A)                         UNLOCK(B)
         XLOCK(B)
         B = B + 50
         WRITE(B)
         UNLOCK(A)
         UNLOCK(B)




BACS 485 – Database Concurrency Control                13          Copyright Jay M. Lightfoot, Ph.D, C.D.P.
          T1         |      T2
         ____________|______________
         SLOCK(A)    |
         READ(A)     |
                     |     SLOCK(A)
                     |     READ(A)
         SLOCK(B)    |
         READ(B)     |
                     |     SLOCK(B)
                     |     READ(B)
         XLOCK(A) |
         wait...     |     DISPLAY (A + B)
         wait...     |     UNLOCK(A)
         wait...     |     UNLOCK(B)
         A = A – 50  |
         WRITE(A)    |
         XLOCK(B) |
         B = B + 50  |
         WRITE(B)    |
         UNLOCK(A) |
         UNLOCK(B) |

      This is a good serializable schedule. It has nice concurrency; however, good concurrency is
not the 2-phase lock protocol’s forte. Its main selling point is that it easy to apply and always
works. It is possible to come up with other "correct" schedules for a set of transactions that do
not use 2-phase locking, but you need to impose a structure on the data to do so. Deadlock is
still possible using 2-phase lock protocol. For example:

          T1         |       T2
         ____________|____________
         SLOCK(Y) |
         READ(Y)     |
                     |      SLOCK(X)
                     |      READ(X)
         XLOCK(X) |
          wait...    |      XLOCK(Y)
          wait...    |      wait...
          wait...    |      wait...

     Main point is that 2-phase locking gives you a guaranteed serializable schedule without
testing. There may be other (better) schedules that do not use 2-phase locking, but to find them
you need more information about the transactions OR must impose a structure (ordering) on the
data items in the database.




BACS 485 – Database Concurrency Control        14                   Copyright Jay M. Lightfoot, Ph.D, C.D.P.
D. Multi-Level Locks (Intention Locks)

    So far, all locks are "flat" (single-level). That is, either they apply to a data-item, or a
record, or a file, ... . There is no way to lock a whole group of items (No hierarchy of locks).
Sometimes it would be handy to group several data items together and treat them as one
individual unit. You could place a single lock on this unit and hold it.

     To fix the problem, you need to define a hierarchy of data levels where small ones are
nested within larger ones. You also need levels of locks to hold the various data levels. This is
leading to the locking granularity of the system. You can graphically represent the levels of
data as a tree where each non-leaf node represents the data associated with its descendants. Each
node of the tree can be locked individually. You must start at the root and lock all the ancestors
of a node before that node can be locked explicitly. When you explicitly lock a node then you
also implicitly locks its descendants.

     Shared and exclusive locks are too coarse for a multi-level scheme. When you use an
explicit XLOCK you implicitly XLOCK everything below. New levels of intention locks are
needed for the upper levels of the hierarchy to indicate what you intend to do below. When you
traverse the data graph, you leave an intention lock at each level to warn others what is going on
below. Intention locking protocols dictate how to achieve serializability with these locks.

     Locking granularity is a trade-off. Finer granularity means more concurrency but more
overhead. Coarser granularity means less overhead but bad concurrency. The 5 intention lock
levels are used in intention locking protocols. These are introduced below.

1. S Locks

    The transaction can tolerate concurrent readers, but no updates. The transaction will not
update. This is different from single-level SLOCK because it says all descendent nodes are
shared, so no lower level SLOCKS are needed.

2. X Locks

     The transaction cannot tolerate any concurrent access at all because it may update
something. It says that all lower-level nodes are now XLOCKed implicitly. You do not need to
issue any lower level locks.

3. IS Locks

    The transaction intends to set SLOCKs on records. No updating will be done. Explicit
locking will be done in lower levels of the tree, but only with SLOCKs.

4. IX Locks

    The transaction intends to place XLOCKs on records. It may update data or it may just read.
Explicit locking will be done at lower levels with either XLOCKs or SLOCKs.


BACS 485 – Database Concurrency Control          15                   Copyright Jay M. Lightfoot, Ph.D, C.D.P.
5. SIX Locks

    The transaction can tolerate readers at this level, but no updaters. The transaction may
update individual records (using XLOCKs). The sub-tree rooted at this node is explicitly locked
with SLOCK but some lower levels are explicitly locked using XLOCKs.

6. Compatibility Matrix

The lock compatibility table for intension locks is as follows:

                                                           T1
                                           X     S         IS     SIX             IX          UNLOCKED
            X                             no    no         no      no             no             yes
            S                             no    yes        yes     no             no             yes
         T2 IS                            no    yes        yes    yes             yes            yes
            SIX                           no    no         yes     no             no             yes
            IX                            no    no         yes     no             yes            yes
            UNLOCKED                      yes   yes        yes    yes             yes            yes


The intent locking protocol uses this matrix to guarantee serializable transactions using multiple
granularity locks. It is similar to the 2-phase locking protocol.

E. Major Lock Protocols

        To this point we have been talking about minor lock protocols that are either taken care
of implicitly (i.e., by the operating system/DBMS) or are building blocks for more complex
concepts. What follows are 3 major concurrency protocols that use the locks that were discussed
in the preceding section.

1. Two-Phase Locking

      The Two-Phase Locking Protocol is a major protocol. It was briefly introduced earlier. If
all transactions in the system follow the protocol, then serializable schedules result. This is
important because by following two simple rules you avoid serialization testing. The two-phase
lock protocol does NOT prevent deadlock or livelock.

    The protocol requires that each transaction issue lock and unlock requests in two distinct
phases: Growing phase and shrinking phase. During the growing phase the transaction may
obtain locks but may not release any. During the shrinking phase they may release locks but
may not obtain any new locks. Note, "lock" means any kind of lock (e.g., shared, update, or
exclusive). Any type of lock following this protocol produces a "correct" schedule. If lock
conversion is allowed then promotion can only take place in the growing phase and demotion
can only take place in the shrinking phase.


BACS 485 – Database Concurrency Control               16                Copyright Jay M. Lightfoot, Ph.D, C.D.P.
         This protocol is a natural outgrowth of the PXC, PSC, and PUC record locks. These locks
    showed that it is always better to hold locks till the end of the transaction. When you combine
    this with the idea of collecting all locks you need before processing you get 2-phase locking
    protocol. Most implementations hold locks until end-of-transaction; however the protocol only
    requires that you not acquire new locks after you start releasing them.

    2. Data Ordering Protocols

         If you cannot control the way transactions are written (2-phase lock), then you must impose
    a structure on the data to ensure serializability. A protocol using data ordering can ensure
    serializability without 2-phase locking. To use it you must build a tree of all data in the database.
    Nodes higher in the tree "own" those lower. In order to gain access to a node you must go
    through its parents and put a lock on them.

    A simple tree protocol uses only exclusive locks, but it can get more complex. The rules are as
        follows:
        1. The first lock by T1 may be on any data item
        2. Subsequently, a data item, Q, can be locked by T1 ONLY if the parent
            of Q is currently locked by T1
        3. Data items may be unlocked at any time
        4. A data item that has been locked and unlocked by T1 cannot be
            locked again by T1.

    Advantages of the protocol:
       - Guaranteed serializable schedule
       - No deadlock (no rollbacks required)
       - Unlocking may occur earlier then 2-phase (less wait time)

    Disadvantages of the protocol:
        - More data items need to be locked (more overhead)
        - Not easy to define order on database

3. Intent Locking Protocol

         The intent locking protocol ensures serializability in a multiple granularity database. The
    protocol uses intention locks and is similar to two-phase locking. It enhances concurrency and
    reduces lock overhead. It is particularly well suited when transactions include a mix of: short
    transactions that access only a few data items and long transactions that access a whole file or set
    of files. Note, Deadlock can occur with this protocol.

    Each transaction T1 can lock a node using the following protocol:

         1. The lock compatibility table is observed. This is shown (again) below:




    BACS 485 – Database Concurrency Control          17                   Copyright Jay M. Lightfoot, Ph.D, C.D.P.
                                                                  T1
                                           X           S          IS          SIX             IX          UNLOCKED
            X                             no          no          no           no             no             yes
            S                             no          yes         yes          no             no             yes
         T2 IS                            no          yes         yes         yes             yes            yes
            SIX                           no          no          yes          no             no             yes
            IX                            no          no          yes          no             yes            yes
            UNLOCKED                      yes         yes         yes         yes             yes            yes


     2. The root of the tree is locked first and can be locked in any mode.

     3. A node, Q, can be locked by T1 in S or IS mode ONLY if the parent of
        Q is currently locked by T1 in IS (or stronger) mode

     4. A node, Q, can be locked by T1 in X, SIX, or IX mode ONLY if the
        parent of Q is currently locked by T1 in IX (or stronger) mode.

     5. T1 can lock a node only if it has not previously unlocked any node
        (same as two-phase locking).

     6. T1 can unlock a node Q only if none of the children of Q are
        currently locked by T1.

    Notice that this protocol acquires locks in top-down (root-to-leaf) order and releases them in
bottom-up (leaf-to-root) order. The notion of "stronger lock" follows this hierarchy (from the
compatibility matrix):

                                                  X               Strongest
                                                   |
                                                SIX
                                                   |
                                              -----------
                                          S                 IX
                                              -----------
                                                   |
                                                  IS              Weakest


IV. LOCK PROBLEMS

    Locks have two problems that are not corrected by any protocol: Deadlock and Livelock.
These are not serialization issues, but rather execution issues.




BACS 485 – Database Concurrency Control                      18                     Copyright Jay M. Lightfoot, Ph.D, C.D.P.
A. Deadlock

There are 2 basic plans for handling deadlock:
    1) Deadlock Prevention - Ensure that system will never enter deadlock
    2) Deadlock Detection & Recovery - Catch the problem after it happens and then fix it.

    In both cases transaction(s) may be rolled back and restarted. Rollback means that you undo
everything the transaction has done. You should use prevention if the probability of deadlock is
high. Use detection and recovery when it is low (optimistic outlook).

1. Prevention (avoidance)

     Deadlock prevention schemes generally reduce concurrency (and can cause livelock). There
are 5 basic prevention plans:

     1) Initial lock on everything (Request Rejection)
     2) Order data items in all transactions
     3) Transaction scheduling in advance
     4) Transaction Retry when conflict
          a) Wait-Die
          b) Wound-Wait
     5) Timestamping

2. Detection and Recovery

    If prevention schemes are not used, then the DBMS must use Detection and Recovery. The
"Detection" phase is usually an algorithm that looks at the system to see if deadlock has
occurred. If it has, then it must roll back one (or more) transactions ("Recovery phase"). These
schemes generally increase overhead because the deadlock exists before you catch it (and waste
CPU cycles).

(a) Wait-For Graph (detection)

     Deadlock can be detected in Wait-For Graphs. These graphs logically connect all the
transactions in the system to the other transactions that they are waiting for. The system must
maintain these graphs as locks are added and removed. If a cycle exists, then deadlock is
present. A "cycle detection algorithm" is used to look at the graphs and find cycles.

(b) Rollback (recovery)

After detecting a cycle you must correct it. This requires that you:

     - select a victim to rollback
     - roll it back
     - make sure that livelock does not occur



BACS 485 – Database Concurrency Control         19                     Copyright Jay M. Lightfoot, Ph.D, C.D.P.
B. Livelock (Starvation)

     Livelock occurs when the system keeps selecting the same transaction for rollback. The
transaction may never finish executing. Another case where livelock occurs is in deciding which
transaction gets a lock and which waits in a conflict situation. DBMS needs to provide a "fair"
selection method to keep this from happening.

      For lock selection problem, a First-In-First-Out waiting scheme helps avoid livelock. The
first transaction to request a lock gets it. For victim selection, FIFO also works well to make
sure older transactions have preference. Similar to "wound-wait" scheme for victim selection.
Another way to make victim selection more fair is to base it on transaction priority and raise a
transaction priority each time it is rolled back. This will keep the number of rollbacks to a small
(finite) number.

V. TIMESTAMPS

     Timestamps are the 2nd concurrency control mechanism (after locks) used to avoid
serializability testing. Protocols using timestamps guarantee serializable schedules and do not
use locks, so deadlock cannot occur.

Timestamps have a lot going for them as a concurrency control mechanism:
   - Protocols PX, PXC, PS, PSC, ... 2-phase locking, do not apply.
   - They do not use locks, so DML statements can ignore locking implications.
   - Deadlock cannot occur (no detection or prevention needed).
   - Livelock cannot occur.
   - Lots of promise for distributed database uses.

     A timestamp is a unique identifier associated with each transaction. They can be generated
by the system clock or a counter. The difference between locks and timestamps is that locks
ensure that some serializable schedule is produced. Timestamp protocols ensure that a specific
serializable schedule is produced (based upon the chronological order of the timestamps).

    Protocols using locks resolve conflict at the time they occur. Timestamps resolve conflict
when the transactions are started for execution. When conflict occurs in timestamp protocols,
then transaction(s) are restarted (usually with new timestamp).

There are 3 general protocols built using timestamps:

     1. Timestamp ordering
     2. Optimistic concurrency control
     3. Multi-version timestamp scheme

     Concurrency protocols differ from the PXC, PUC, PSC protocols described earlier in that
the programmer has to make transactions explicitly follow some rules (so it is not implicitly done
for them). The following sections describe the common concurrency protocols used with
timestamps.


BACS 485 – Database Concurrency Control         20                    Copyright Jay M. Lightfoot, Ph.D, C.D.P.
A. Timestamp Ordering

     This protocol uses a unique timestamp associated with each transaction. Typically it is the
system clock or a counter value when the transaction starts. The timestamps determine the
serializability order. If T1 comes before T2 then the system must generate a schedule equivalent
to a serial schedule where T1 comes before T2. To implement, each data item in the database
has two timestamp values:

     1) W-timestamp - the largest timestamp of any transaction that has successfully executed a
            write on this item
     2) R-timestamp - The largest timestamp of any transaction that has successfully executed a
            read on the item.

These are updated with each READ and WRITE. Conflicting reads and writes are executed in
timestamp order using the following rules:

         READ:
         1. If TS(T1) < W-timestamp then this implies that T1 needs to read a value that has
                 already been overwritten, so T1 is rolled back.
         2. If TS(T1) >= W-timestamp then read is executed and R-timestamp is updated.

         WRITE:
          1. If TS(T1) < R-timestamp then this implies that the value to be written by T1 was read
                  by a younger transaction. You cannot write now because it might impact the
                  other transaction. So you must roll back T1.
          2. If TS(T1) < W-timestamp then this implies that T1 is trying to write an obsolete
                  value so ignore T1.
          3. Otherwise write T1 and reset W-timestamp.

When a transaction is rolled back, it is restarted with a new timestamp.

Example: Timestamp order protocol. T1 sums A and B, T2 transfers $50 from A to B. Assume
   TS(T1) < TS(T2).

          T1                               T2
         READ(B)                          READ(B)
         READ(A)                          B = B - 50
         DISPLAY(A + B)                   WRITE(B)
                                          READ(A)
                                          A = A + 50
                                          WRITE(A)




BACS 485 – Database Concurrency Control           21                 Copyright Jay M. Lightfoot, Ph.D, C.D.P.
          T1               |       T2
            _______________|_______________
         READ(B)           |
                           |      READ(B)
                           |      B = B - 50
                           |      WRITE(B)
         READ(A)           |
                           |      READ(A)
         DISPLAY(A + B)    |
                           |      A = A + 50
                           |      WRITE(A)

     This is one possible schedule. It could also be produced by the 2-phase locking protocol,
but that is just coincidence. Some timestamp ordering schedules are not possible under 2-phase
locking.

Advantages of timestamping ordering protocol:
   - Ensures serializability
   - No deadlock

Disadvantages:
    - May produce "cascading rollback"
    - Livelock is possible (called cyclic restart problem)

B. Optimistic Concurrency Control

     The methods covered so far (locks and timestamps) have all assumed that concurrency
problems are likely, so they check before database operations are executed. For example, a
check is done first in locking protocols to determine if the item is currently locked. When most
transactions are read-only, this is extra work that is not needed. Optimistic concurrency
control protocols do no checking while the transaction is executing. Instead, it checks at
transaction COMMIT time. This lets the system run freely without the overhead of locks and
concurrency testing.

     To allow this, all updates are made to working-storage copies of the data (in memory). At
the end of the transaction a validation phase checks whether there were any serializability
violations. If there were, the memory updates are discarded and the transaction is restarted.
There are 3 phases to an optimistic concurrency control protocol:

     1. Read Phase - Read items from database, but any writes are to memory copies.
     2. Validation Phase - Checking is performed to ensure serializability.
     3. Write Phase - If validation successful, update the database with memory values,
            otherwise discard and restart.




BACS 485 – Database Concurrency Control         22                  Copyright Jay M. Lightfoot, Ph.D, C.D.P.
3 different timestamps are associated with each transaction:

     1. Start(T1) - the time T1 started executing
     2. Validate(T1) - the time when T1 started validation phase
     3. Finish(T1) - the time when T1 finished its write phase

In addition, the set of data items read and written by each transaction must be kept.

The validation test for T2 requires that for all transactions Ti with TS(Ti) < TS(T2), one of the
    following conditions must hold:

     1. Finish(Ti) < Start(T2). Since Ti completes before T2 started then serializability must be
             maintained.
     2. The set of data items written by Ti does not intersect with T2
                        AND
      Start(T2) < Finish(Ti) < Validation (T2). Ti completes its write phase before T2 starts its
             validation phase. Ensures that writes do not overlap.

Example: Optimistic Concurrency Protocol. Same transactions from timestamp ordering
   example above.

                     T1    |       T2
            _______________|_______________
               READ(B)     |
                           |      READ(B)
                           |      B = B - 50
                           |      WRITE(B)
                           |      READ(A)
                           |      A = A + 50
                           |      WRITE(A)
               READ(A)     |
         DISPLAY(A + B)    |

    This is one serializable schedule from the optimistic protocol that could not be produced by
2-phase locking or timestamp ordering (but it is still "correct").

Advantages:
   - Ensures serializability
   - No deadlock
   - No cascading rollbacks

Disadvantages:
    - Not applicable unless high read-only frequency




BACS 485 – Database Concurrency Control         23                    Copyright Jay M. Lightfoot, Ph.D, C.D.P.
C. Multi-Version Timestamp Scheme

     The previous protocols ensure serializability by either delaying an operation or aborting the
transaction that issued it. These problems go away if you keep multiple versions of the data. In
a multiversion database, each WRITE(Q) creates a new version of Q and each READ(Q) selects
the proper version of Q to read. DBMS use a multiversion timestamp protocol to do this and
ensure serializability. This protocol is as follows:

Each T1 has a unique timestamp TS(T1). Each data item Q has a sequence of versions (Q1, Q2,
Q3, ..., Qn) associated with it.

Each Qk contains 3 data fields:
    - Content - Value for Qk
    - W-timestamp - Timestamp of the transaction that created Qk
    - R-timestamp - The timestamp of the youngest transaction to
               successfully read Qk.

     Transaction T1 creates a new version of Q each time a WRITE(Q) is issued. The new Qk
has the W-timestamp and R-timestamp of T1. The R-timestamp is updated whenever a younger
transaction reads Qk. The multi-version protocol is:

     1. If T1 issues READ(Q) then the value returned is the content of the version of Q with the
             largest W-timestamp <= TS(T1).

     2. If T1 issues a WRITE(Q) and T1 is older than the R-timestamp of the largest W-
             timestamp <= TS(T1) then rollback. Otherwise create a new version of Qk.

Rule 1 reads the most recent version that came before it in time. Rule 2 forces a transaction to
abort if it is "too late" in doing a write (i.e., some other transaction has already read the value, so
you cannot let the write succeed).

Advantages:
   - No deadlock
   - Read request never fails and never has to wait

Disadvantages:
    - Reads require a write to update the R-timestamp
    - Conflicts are resolved by rollbacks rather than waits, expensive
    - Allows cascading rollbacks

        There are other, more advanced, timestamp protocols; however, they are beyond the
scope of this course and will not be covered. From the examples given above, you should be
able to see the basic differences and advantages of lock and timestamp protocols.




BACS 485 – Database Concurrency Control           24                     Copyright Jay M. Lightfoot, Ph.D, C.D.P.
VI. SQL Concurrency Control

     Next, we look into the concurrency controls built into Oracle 8i SQL. Oracle uses a hybrid
concurrency control and recovery mechanism called multi-version read consistency protocol. In
this scheme, there are no read locks, so a read can never block an update. Instead, the database
guarantees that a user sees a consistent view of the data during a transaction regardless of the
operations that other users perform. Likewise, Oracle makes sure that an executing transaction
cannot read any values that were written to disk after the transaction started.

A. Oracle Levels of Isolation

     In database, the isolation level communicates the degree of interaction from other
transactions allowed during execution of a transaction. The table shown below lists the different
isolation levels and the violations of serializability allowed by each level.

     Isolation Level                  Temporary Update          Non-repeatable2 Read              Phantom Read3
     Read Uncommitted                        yes                        yes                             yes
     Read Committed                          no                         yes                             yes
     Repeatable Read                         no                          no                             yes
     Serializable                            no                          no                             no

        Oracle 8i implicitly implements the READ COMMITTED and the SERIALIZABLE
levels. Both use row-level locking and both wait when a transaction tries to change a row
updated by an uncommitted transaction. Waiting transactions proceed when a blocking
transaction rolls back or aborts. Oracle also supports the READ ONLY isolation level if the
SQL SET TRANSACTION command is issued. The description of these three isolation levels is
given below.

Read Committed – Statement level enforcement of serialization. Means that each statement has
        available only data that was committed before the statement started. This is the default
        isolation level.
Serializable – States that each statement within a transaction can see data as it existed before the
        transaction started. Thus, serialization is enforced at the transaction level.
Read only – Transactions see only data that was fully committed before the transaction began.

B. Multiversion Read Consistency

    The Multiversion read consistency concurrency control in Oracle is implemented primarily
through the rollback segments, the system change number and locks. These topics are briefly
described below. For more complete coverage, go to the Oracle web-site and search for security
and concurrency control papers.

2
  This is a form of the “incorrect summary” problem where a transaction reads the same data field 2 times and gets a
different answer each time because some other transaction has modified it in the meantime.
3
  This is another form of the “incorrect summary” problem. In this case, extra tuples are inserted into a set after a
transaction query has run. In this way, running the same query two times gets a result with extra tuples the second
time.

BACS 485 – Database Concurrency Control                  25                       Copyright Jay M. Lightfoot, Ph.D, C.D.P.
1. Rollback Segments

     Rollback segments are data structures in Oracle used to store information needed to undo
transactions. When transactions are about to change data in a block, the before-image of the data
is written to the rollback segment. In this way, the prior state of the data can be recalled and
restored when needed. There are also redo logs that allow transaction results to be restored (or
rolled forward) in the event of data recovery.

2. System Change Number

     Oracle uses a number called the system change number (SCN) to keep database operations
in chronological order. The SCN is a type of logical timestamp that is stored in the redo log.
When the log is used, the SCN assures that transaction operations are redone in the correct
sequence. In this way, the SCN determines which version of the data to use to within the
transaction.

3. Locks

     Oracle uses implicit locking for all SQL statements. Thus, the programmer never need
explicitly lock resources. The programmer can; however, set explicit locks to change the default
behavior of the system. Locks are set at the lowest level of restrictiveness. This is done to
guarantee data integrity while maintaining data availability. The locks are stored with the data in
the actual data block. Because of this scheme, Oracle does not need to escalate locks. A down
side of this plan is that many rows that are never changes are unnecessarily locked. This reduces
concurrency and increases the chances of a deadlock. Oracle supports the following types of
locks:

         DDL locks – protect schema objects
         DML locks – used to protect base tables and rows
         Internal locks – used to protect data structures
         Internal latches – protects data dictionary entries, data files, tablespaces, and rollback
                    segments
         Distributed locks – protect data in distributed (and parallel) operating environments
         PCM locks – special locks used for parallel cache management.




BACS 485 – Database Concurrency Control            26                    Copyright Jay M. Lightfoot, Ph.D, C.D.P.

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:6
posted:8/6/2012
language:Latin
pages:26