Concurrency Control by wuzhengqin


Concurrency Control

 Purposes
 Approaches
 Protocols & their variants
     two phase locking
        basic, strict 2PL (dynamic) & static (conservative) 2PL
     optimistic methods
        forward validation & backward validation
     time-stamp ordering
        single and multi versions
 CC for distributed database systems
 deadlock and distributed deadlock

 performance and implementation issues

Concurrency Control

 Purposes:
    shared database
    transactions may request the same data item at the same time
       data conflict: transaction T1 is accessing data item x. Before T1 commits,
        another transaction T2 also wants to access x
    serial execution is good for database consistency but bad for performance
       no data conflict
    concurrent execution of transactions is good for performance but may
     result in an inconsistent database
    the effect of an inconsistent update is permanent, irreversible and cumulative
    rules have to be defined to ensure that all the executions are correct
     and the resulting database is consistent (with small overhead)
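The data conflict described above can be seen in a short sketch: two transactions interleave read-modify-write steps on the same item x, and one update is lost (the values and names here are illustrative, not from the slides):

```python
# Lost update: T1 and T2 both read x before either writes it back.
db = {"x": 100}

t1_read = db["x"]        # T1 reads x = 100
t2_read = db["x"]        # T2 reads x = 100 before T1 commits (data conflict)
db["x"] = t1_read + 10   # T1 writes x = 110
db["x"] = t2_read - 20   # T2 writes x = 80, silently discarding T1's update

print(db["x"])           # 80; either serial order would give 90
```

A serial execution (T1 then T2, or T2 then T1) would leave x = 90, which is why rules are needed to make concurrent executions equivalent to some serial one.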

Concurrency Control

 Using serialization graphs
   requires checking of the graphs for each data request
   may need to restart transactions
   involves high overhead in searching the graphs
 Preventive method
   define rules so that all serialization graphs will never be cyclic

Concurrency Control

 Approaches:
    By the scheduler
       when the scheduler receives an operation from the TM, it may:
          • immediately schedule it
          • delay it
          • reject it
    Conservative scheduler
       Blocking: if there is any possibility of risks, it will delay the operation
    Aggressive scheduler
       tends to avoid delaying operations
       Later it may have to restart transactions
    Basic Approaches
       delay (blocking) & restart

Concurrency Control

 Approaches
    Locking
       similar to the management of critical section in operating systems
       blocking is used to resolve data conflicts (lock conflicts) between different transactions
       rules are defined on when to allow a transaction to set a lock
       an operation of a transaction is allowed to execute only after the locks on
        the required data items have been set in the appropriate modes
    Optimistic
       the accessed data items of a transaction are maintained in a set
       data conflict is checked before a transaction is allowed to commit
       if it is non-serializable, one of the conflicting transactions will be restarted
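The commit-time check described above can be sketched as a backward-validation test: the committing transaction's read set is compared against the write sets of transactions that committed while it was running. A minimal sketch (the sets and names are illustrative):

```python
# Write sets of transactions that committed during the validating
# transaction's execution (illustrative data).
committed_write_sets = [{"x"}, {"z"}]

def validate(read_set):
    """Return True if the transaction may commit, False if it must restart."""
    # Any overlap with a committed write set means the transaction may have
    # read a stale value, so the execution may be non-serializable.
    return all(not (read_set & ws) for ws in committed_write_sets)

assert not validate({"x", "y"})   # read x, which a committed txn wrote: restart
assert validate({"y"})            # no overlap: commit allowed
```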

Concurrency Control

   Time-stamp ordering
      each transaction is assigned a unique time-stamp
      data conflict is checked at the execution time of a transaction by comparing
       the time-stamps of the conflicting transactions
      accesses to each data item must follow the time-stamp order of the transactions
      if an access is out of time-stamp order, one of the conflicting transactions
       will be restarted
      the restarted transaction will be assigned a new time-stamp
   Conflicts
      a transaction has written a data item but it has not committed
      other transactions want to read or write the same data item
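The time-stamp checks above can be sketched by keeping the largest read and write time-stamps seen for each item (a simplified sketch of basic time-stamp ordering; the field and function names are illustrative):

```python
# Basic time-stamp ordering test: accept an operation only if it arrives in
# timestamp order relative to conflicting operations already processed.
def to_check(op, ts, item):
    """Return True to accept, False to restart the transaction."""
    rts, wts = item["rts"], item["wts"]   # largest read/write timestamps seen
    if op == "read":
        if ts < wts:                      # a younger txn already wrote the item
            return False                  # restart with a new timestamp
        item["rts"] = max(rts, ts)
        return True
    else:  # write
        if ts < rts or ts < wts:          # out of timestamp order
            return False
        item["wts"] = ts
        return True

x = {"rts": 0, "wts": 0}
assert to_check("read", 5, x)         # T5 reads x: ok, rts becomes 5
assert not to_check("write", 3, x)    # T3 writes x after T5 read it: restart
```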

Concurrency Control

 Two Phase Locking
    a conservative data scheduler
    it maintains database consistency by locking of data items
    the system maintains a lock table
    each data item is defined with a lock in the lock table
    granularity of lock
       a lock may correspond to one or more data items: a record, field, or file
    Before a transaction is allowed to access a data item, it has to set a
     lock corresponding to the data item
    There are two modes for locking
       read mode for read operations (shared locks)
       write mode for write operations (exclusive locks)
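The two lock modes imply a simple compatibility test: shared (read) locks are mutually compatible, while an exclusive (write) lock conflicts with every other lock. A minimal sketch (the function and mode names are my own):

```python
# Lock compatibility for the two modes above: read (shared) and
# write (exclusive).
def compatible(requested, held_modes):
    """Can a lock in `requested` mode coexist with the locks already held?"""
    if requested == "read":
        return all(m == "read" for m in held_modes)
    return len(held_modes) == 0   # write: no other lock may be held

assert compatible("read", ["read", "read"])   # shared locks coexist
assert not compatible("write", ["read"])      # write conflicts with read
assert compatible("write", [])                # write ok on an unlocked item
```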

Concurrency Control

 Compatibility: read locks on the same item are compatible with each
  other; any pair of locks involving a write lock conflicts
 Basic Rules
    growing phase
       when the scheduler receives an operation pi[x] from the TM, it tests if pli[x]
        (p is operation type, l is a lock operation, i is the transaction id) conflicts
        with some qlj[x] that is already set.
       If so, it delays pi[x], forcing Ti to wait until it can set the lock it needs. If
        not, the scheduler sets pli[x], and sends pi[x] to the DM for processing
    Shrinking phase
       once the scheduler has released a lock for a transaction, it may not
        subsequently obtain any more locks for that transaction
    Number of locks
       increases initially and then decreases to none

Basic 2PL

 Basic 2PL
    Rule 1: When the scheduler receives an operation pi[x] from the TM,
     the scheduler tests if pli[x] conflicts with some qlj[x] that is already set
       If so, it delays pi[x], forcing Ti to wait until it can set the lock it needs.
       If not, then the scheduler sets pli[x], and then sends pi[x] to the DM
    Rule 2: Once the scheduler has set a lock for Ti, say pli[x], it may not
     release that lock at least until after the DM acknowledges that it has
     processed the lock’s corresponding operation, pi[x]
    Rule 3: Once the scheduler has released a lock for a transaction, it may
     not subsequently obtain any more locks for that transaction
    A transaction may release locks once it does not want to set any more
     locks, even though it may not have finished all its operations, e.g., Wl[x]
     W[x] R[x] Wu[x] Commit
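Rules 1 and 3 for a single transaction can be sketched as a two-phase lock holder that refuses new locks once any lock has been released (illustrative only; conflict testing and the DM are abstracted away):

```python
# Two-phase rule for one transaction: a growing phase, then a shrinking
# phase during which no new locks may be obtained.
class TwoPhaseTxn:
    def __init__(self):
        self.locks = set()
        self.shrinking = False    # becomes True after the first release

    def acquire(self, item):
        # Rule 3: no new locks once any lock has been released.
        if self.shrinking:
            raise RuntimeError("two-phase rule violated")
        self.locks.add(item)

    def release(self, item):
        # Rule 2 is assumed satisfied: the operation was already processed.
        self.shrinking = True
        self.locks.discard(item)

t = TwoPhaseTxn()
t.acquire("x")
t.release("x")
try:
    t.acquire("y")               # violates Rule 3
except RuntimeError:
    print("two-phase rule enforced")
```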

Basic 2PL

 Rule 1: prevents two transactions from concurrently accessing a
  data item in conflicting modes
 Rule 2: ensures that the DM processes operations on a data item in
  the order that the scheduler submits them
 Rule 3: ensures the two-phase rule
 Note:
    The rules do not specify when the commit/abort operation has to be
     performed relative to the lock release operations
    The lock release operation may be performed before the commit/abort
     operation (NOT Strict and may be unrecoverable or cascading abort)

Why 2 Phase Rule

 Example: NOT 2PL
  H1 = rl1[x] r1[x] ru1[x] wl2[x] w2[x] wl2[y] w2[y] wu2[x] wu2[y] c2 wl1[y] w1[y] wu1[y] c1
  The SG(H1) is T1 → T2 → T1. It is non-serializable
 The problem is that T1 releases its lock on x before it gets the
  lock on y
 If T1 releases its lock on x only after it gets its lock on y, then the history will be
  H2 = rl1[x] r1[x] wl1[y] w1[y] c1 ru1[x] wu1[y] wl2[x] w2[x] wl2[y] w2[y] c2 wu2[x] wu2[y]
  The SG(H2) is T1 → T2. It is serial.

Detailed Mechanism

 Detailed mechanism of the last example
    Initially, neither transaction owns any locks
    The scheduler receives r1[x] from the TM. Accordingly, it sets rl1[x]
     and submits r1[x] to the DM. Then the DM ack the processing of r1[x]
    The scheduler receives w2[x] from the TM. The scheduler cannot set
     wl2[x], which conflicts with rl1[x], so it delays the execution of w2[x]
     by placing it on a queue
    The scheduler receives w1[y] from the TM. It sets wl1[y] and submits
     w1[y] to the DM. Then, the DM acks the processing of w1[y]
    The scheduler receives c1 from the TM, signaling that T1 has
     terminated. The scheduler sends c1 to the DM. After the DM ack
     processing c1, the scheduler releases rl1[x] and wl1[y]

Detailed Mechanism

 The scheduler sets wl2[x] so that w2[x], which has been delayed,
  can now be sent to the DM. Then the DM ack w2[x]
 The scheduler receives w2[y] from the TM. It sets wl2[y] and sends
  w2[y] to the DM. The DM then ack processing w2[y]
 T2 terminates and the TM sends c2 to the scheduler. The scheduler
  sends c2 to the DM. After the DM ack processing c2, the scheduler
  releases wl2[x] and wl2[y]

Strict 2PL Principles

 Basic 2PL
   the lock release time may be before the commit
   cannot prevent cascading abort
   may not be recoverable
    WL1[x] W1[x] WU1[x] RL2[x] R2[x] C1 …. C2

 Strict 2PL
   holds the locks until commitment
    WL1[x] W1[x] C1 WU1[x] RL2[x] R2[x] …. C2
   the commit/abort operation is performed before the lock release

Variants of 2PL (strict)

 Dynamic 2PL
    It differs from the Basic 2PL in that it requires the scheduler to release
     all of a transaction’s locks together, when the transaction terminates
    It can ensure that the execution is a strict execution
 Static 2PL
    It requires a transaction to pre-declare its read-set and write-set
    The scheduler tries to set all of the locks needed by Ti
    If the scheduler can set all the locks, the processing of Ti's db
     operations can start
    Otherwise, none of Ti’s locks will be set
    It inserts Ti's lock requests into a lock queue. Every time the scheduler
     releases a lock, Ti's pending lock requests are checked again
    It is deadlock free
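The all-or-none acquisition step of static 2PL can be sketched with a simple item-to-owner map standing in for the lock table (illustrative; a real scheduler would queue the refused request and re-check it on each lock release):

```python
# Static (conservative) 2PL: set every pre-declared lock atomically, or none.
lock_table = {}

def acquire_all(txn_id, items):
    """Grant all of txn_id's pre-declared locks, or none of them."""
    # If any required lock is held by another transaction, refuse everything.
    if any(lock_table.get(x) not in (None, txn_id) for x in items):
        return False              # the transaction waits in the lock queue
    for x in items:
        lock_table[x] = txn_id
    return True

assert acquire_all("T1", ["x", "y"])       # T1 pre-declares {x, y}: granted
assert not acquire_all("T2", ["y", "z"])   # y is held by T1: T2 waits
```

Because a transaction never waits while already holding locks, the hold-and-wait condition for deadlock cannot arise.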

Comparing Static 2PL & Dynamic 2PL

 Probability of lock conflict
     static 2PL is higher (locks are held for longer)
 Locking overhead
     static 2PL is higher
 Number of locks held at a time
     static 2PL is greater
 Deadlock
     possible in dynamic 2PL but not in static 2PL
 Static 2PL may not be possible for some systems (the read and write
  sets are not always known in advance)

Locking Problem

 Deadlock
    2PL may result in deadlock (for dynamic 2PL)
    probability of deadlock depends on the number of locks required by a
     transaction and the total number of locks, which are locked by other
     transactions, in the system
 Lock conversion problem
    when a transaction tries to strengthen a read lock to a write lock,
     deadlock may occur
    Example:
     T4: r4[x]  w4[x]  c4
     T5: r5[x]  w5[x]  c5
     if T4 and T5 both set read locks on x first, each must wait for the other
      to release its read lock before upgrading to a write lock: deadlock

Correctness of 2PL

 To prove that all histories generated according to the 2PL rules are
  serializable
 1: If oi[x] is in C(H), oli[x] and oui[x] are in C(H), and oli[x] < oi[x] < oui[x]
 2: If pi[x] & qj[x] are conflicting operations in C(H), either pui[x]
  < qlj[x] or quj[x] < pli[x]
 3: If pi[x] & qi[y] are in C(H), pli[x] < qui[y] (every lock of Ti is set
  before any of Ti's locks is released)

Correctness of 2PL

 Lemma 1:
  Let H be a 2PL history, and suppose Ti → Tj is in SG(H). Then, for
  some data item x and some conflicting operations pi[x] & qj[x] in H,
  pui[x] < qlj[x]
 Proof:
    Since Ti → Tj, there must exist conflicting operations pi[x] and qj[x] such
       that pi[x] < qj[x].
    From 1,
       pli[x] < pi[x] < pui[x] and
       qlj[x] < qj[x] < quj[x]
    From 2,
       either pui[x] < qlj[x] or quj[x] < pli[x]; the latter would give qj[x] <
        pi[x], contradicting pi[x] < qj[x]
    Then, pui[x] < qlj[x]

Correctness of 2PL

 Lemma 2:
    Let H be a 2PL history, and let T1 → T2 → ... → Tn be a path in SG(H)
     where n > 1. Then for some data items x and y, and some operations
     p1[x] and qn[y] in H, pu1[x] < qln[y]
 Lemma 3:
    Every 2PL history is serializable
    a cycle T1 → T2 → ... → Tn → T1 would give, by Lemma 2, pu1[x] < ql1[y]
     for some operations of T1, contradicting property 3

Lock Implementation

 Scheduler is called lock manager (LM)
 LM maintains a table of locks and supports the lock operations such
  as Lock/Unlock(transaction-id, data item, mode)
 Lock operations are invoked very frequently, so they must be
  implemented very efficiently
 Lock table is usually implemented as a hash table with the data
  item identifier as key
 An entry in the table for data item x contains a queue header,
  which points to a list of locks on x that have been set and a list of
  locks requests that are waiting
 Since a very large number of data items can be locked, the LM limits
  the size of the lock table by dynamically allocating entries
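The lock-table organization described above (a hash table keyed by data-item id, with a list of granted locks and a wait list per entry) might be sketched like this (illustrative; latching, lock conversion, and entry deallocation are omitted):

```python
from collections import defaultdict, deque

# Lock table: hash map from data-item id to a granted list and a FIFO
# wait list, with entries allocated on demand.
class LockTable:
    def __init__(self):
        self.entries = defaultdict(lambda: {"granted": [], "waiting": deque()})

    def lock(self, txn, item, mode):
        """Try to set a lock; queue the request on conflict."""
        e = self.entries[item]            # dynamic allocation via defaultdict
        if mode == "r" and all(m == "r" for _, m in e["granted"]):
            e["granted"].append((txn, mode))
            return True
        if mode == "w" and not e["granted"]:
            e["granted"].append((txn, mode))
            return True
        e["waiting"].append((txn, mode))  # conflicting request waits
        return False

lt = LockTable()
assert lt.lock("T1", "x", "r") and lt.lock("T2", "x", "r")   # shared locks
assert not lt.lock("T3", "x", "w")        # queued behind the read locks
```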

Implementation of 2PL

 To make the lock release operations more efficient, all the read and
  write locks of a transaction are linked together
 When a transaction commits, all its locks can be released at the
  same time by making one call to the LM
 The lock table should be protected and only be accessed by the LM
 The lock operations must be atomic

Lock Manager

 A lock manager services the operations
   Lock(trans-id, data-item-id, mode)
   Unlock(trans-id, data-item-id)
 It stores locks in a lock table. Lock op inserts [trans-id, mode] in
  the table. Unlock deletes it.

Data Item      List of Locks        Wait List
     x         [T1,r] [T2,r]        [T3,w]
     y         [T4,w]               [T5,w] [T6,r]

Locking Granularity

 Granularity - size of data items to lock
   e.g., files, pages, records, fields
 Coarse granularity implies
   very few locks, so little locking overhead
   must lock large chunks of data, so high chance of conflict,
    so concurrency may be low
 Fine granularity implies
   many locks, so high locking overhead
   locking conflict occurs only when two transactions try to
    access the exact same data concurrently
 High performance TP requires record locking

Hot Spot Techniques

 If each txn holds a lock for t seconds, then the max
  throughput is 1/t txns/second for that lock.
 Hot spot - A data item that’s more popular than others,
  so a large fraction of active txns need it
   summary information (total inventory)
   end-of-file marker in data entry application
   counter used for assigning serial numbers
 Hot spots often create a convoy of transactions. The
  hot spot lock serializes transactions.
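The 1/t bound above can be checked with a one-line calculation (the 50 ms hold time is an illustrative figure):

```python
# If every transaction holds the hot-spot lock for t seconds, at most 1/t
# transactions per second can pass through it, no matter how much
# hardware the system has.
t = 0.05                      # lock hold time: 50 ms (illustrative)
max_throughput = 1 / t
print(max_throughput)         # 20.0 transactions/second through the hot spot
```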

Hot Spot Techniques

  Special techniques are needed to reduce t
    keep the hot data in main memory
    delay operations on hot data till commit time
    use optimistic methods
    batch up operations to hot spot data
    partition hot spot data

Distributed Database & 2PL

 Distributed database
     the database is partitioned at different sites
     transaction execution may require the creation of different processes to
      access the data items at different sites
     one of the processes is called the coordinator and the others are called
      participants
 Local Serializability Vs. Global Serializability
     Site 1: W1[x] R2[x]
     Site 2: W2[y] R1[y]
     SG(S1): T1 → T2
     SG(S2): T2 → T1
     Globally non-serializable

Distributed 2PL

 Approaches:
    distributed
    central
    hybrid
    primary copies (for replicated database)
 Distributed 2PL
    each site has a local lock scheduler
    the local scheduler is responsible for setting the locks at its site
    if the required lock of a transaction is in a remote site, the TM forwards
     the lock request to the remote site
    the scheduler at the remote site will process the lock request and
     operation at that site
    Advantages: fully distributed, fault tolerance, load balancing
    Disadvantages: distributed deadlock
Distributed 2PL

 Centralized 2PL
    only one lock scheduler in the whole system
    the lock scheduler is situated at one of the sites, usually called the central
     site (or primary site)
    the lock requests from all the sites are sent to the central site by their TMs
    the lock scheduler processes the lock requests in the same way as in
     a single-site db system
    after setting the lock, an ack will be sent to the TM that originated
     the lock request
    the TM does not need to send the operation to the central site for processing
    Advantage: no distributed deadlock
    Disadvantages: the central site is a performance bottleneck and a
     single point of failure

Distributed 2PL

 Hybrid 2PL
    combining distributed 2PL and centralized 2PL
    there are several lock schedulers in the system
    each scheduler is responsible for the locking of one or more sites
    Advantages: combining the benefit of distributed and centralized 2PL
    Disadvantages: distributed deadlock is still possible

 Replicated Database
    to reduce the access delay of read operations, different replicas may be
     maintained at different sites in the system
    one of the copies is called the primary copy
    a lock has to be set on the primary copy before the data item is accessed

