Concurrency Control in Distributed Databases


                    Gul Sabah Arif
 Concurrency control is the activity of
  coordinating concurrent accesses to a
  database in a multi-user database
  management system (DBMS).
 Uncontrolled concurrent access can cause several problems:
1. The lost update problem.
2. The temporary update problem.
3. The incorrect summary problem.
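
The lost update problem can be illustrated with a small, deterministic replay of one bad interleaving. This is only a sketch: the item name x, the starting value 100, and the two updates (-30 and +50) are made-up values chosen for illustration.

```python
# Replay of an interleaving in which T1's update is lost.
# All values here are illustrative assumptions, not taken from the slides.
database = {"x": 100}

t1_local = database["x"]        # T1 reads x = 100
t2_local = database["x"]        # T2 reads x = 100, before T1 has written
database["x"] = t1_local - 30   # T1 writes 70
database["x"] = t2_local + 50   # T2 writes 150 and overwrites T1's update

lost_update_result = database["x"]   # value left by the bad interleaving
serial_result = 100 - 30 + 50        # value any serial execution would leave
print(lost_update_result, serial_result)
```

Any serial execution of T1 and T2 leaves the same final value; the interleaved run differs because T2 wrote a value computed from a stale read.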
 Serializability Theory.
Distributed DB Architecture.
 Transaction Manager
 Data Manager
 Scheduler

     [Figures: DDBS Architecture; Processing Operation]
Scheduling Algorithms
 Concurrency control schemes must be modified for use in a
  distributed environment. There are three basic methods for
  transaction concurrency control, plus hybrids that combine them.
   Locking (two-phase locking, 2PL)
   Timestamp ordering
   Optimistic
   Hybrid
Locking Protocols
 Majority Protocol
 Local lock manager at each site administers
  lock and unlock requests for data items stored
  at that site.
 When a transaction wishes to lock an unreplicated
  data item Q residing at site Si, a message is sent
  to Si's lock manager.
   If Q is locked in an incompatible mode, then the
    request is delayed until it can be granted.
   When the lock request can be granted, the lock
    manager sends a message back to the initiator
    indicating that the lock request has been granted.
   In case of replicated data
       If Q is replicated at n sites, then a lock request message must
        be sent to more than half of the n sites in which Q is stored.
       The transaction does not operate on Q until it has obtained a
        lock on a majority of the replicas of Q.
       When writing the data item, the transaction performs writes on all
        replicas.
   Benefit
       Can be used even when some sites are unavailable

   Drawback
       Requires 2(n/2 + 1) messages for handling lock requests, and
        (n/2 + 1) messages for handling unlock requests.
       Potential for deadlock even with a single item, e.g., each of 3
        transactions may hold locks on 1/3 of the replicas of a data item.
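
The majority rule and its message cost can be sketched as follows. This is a minimal illustration rather than a full lock manager: the class ReplicaSite and the functions majority_lock, lock_messages, and unlock_messages are hypothetical names, only exclusive locks are modelled, and releasing partial locks on failure is an assumption of the sketch (in the protocol as described, a request is delayed instead).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplicaSite:
    """Hypothetical site holding one replica of Q and its local lock state."""
    name: str
    available: bool = True
    holder: Optional[str] = None  # transaction holding the lock, if any

    def request_lock(self, txn: str) -> bool:
        if not self.available or self.holder not in (None, txn):
            return False
        self.holder = txn
        return True

    def unlock(self, txn: str) -> None:
        if self.holder == txn:
            self.holder = None


def majority_lock(txn: str, sites: List[ReplicaSite]) -> bool:
    """Succeed once more than half of the n replicas have granted the lock."""
    needed = len(sites) // 2 + 1
    granted = []
    for site in sites:
        if site.request_lock(txn):
            granted.append(site)
            if len(granted) == needed:
                return True
    for site in granted:  # assumption: back off instead of waiting
        site.unlock(txn)
    return False


def lock_messages(n: int) -> int:
    """One request plus one reply for each of the n/2 + 1 sites."""
    return 2 * (n // 2 + 1)


def unlock_messages(n: int) -> int:
    """One unlock message for each of the n/2 + 1 sites."""
    return n // 2 + 1
```

With three replicas and one site down, a transaction can still obtain the two locks it needs, which is the benefit listed above; a second transaction then cannot reach a majority.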
Biased Protocol
 Local lock manager at each site as in majority
  protocol, however, requests for shared locks are
  handled differently than requests for exclusive locks.
 Shared locks. When a transaction needs to lock data
  item Q, it simply requests a lock on Q from the lock
  manager at one site containing a replica of Q.
 Exclusive locks. When a transaction needs to lock
  data item Q, it requests a lock on Q from the lock
  manager at all sites containing a replica of Q.
 Advantage - imposes less overhead on read operations
 Disadvantage - additional overhead on writes
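
The asymmetry between shared and exclusive requests can be sketched as below. The names SiteLocks, biased_read_lock, and biased_write_lock are hypothetical, the reader simply picks the first replica, and a failed write request backs off rather than waiting; these are assumptions made to keep the sketch short.

```python
from typing import List, Optional, Set


class SiteLocks:
    """Hypothetical per-site lock table for a single replicated item Q."""

    def __init__(self) -> None:
        self.shared: Set[str] = set()
        self.exclusive: Optional[str] = None

    def lock_shared(self, txn: str) -> bool:
        if self.exclusive not in (None, txn):
            return False
        self.shared.add(txn)
        return True

    def lock_exclusive(self, txn: str) -> bool:
        other_readers = self.shared - {txn}
        if self.exclusive not in (None, txn) or other_readers:
            return False
        self.exclusive = txn
        return True

    def release_shared(self, txn: str) -> None:
        self.shared.discard(txn)

    def release_exclusive(self, txn: str) -> None:
        if self.exclusive == txn:
            self.exclusive = None


def biased_read_lock(txn: str, replicas: List[SiteLocks]) -> bool:
    """Shared lock: ask the lock manager at one site holding a replica."""
    return replicas[0].lock_shared(txn)


def biased_write_lock(txn: str, replicas: List[SiteLocks]) -> bool:
    """Exclusive lock: ask the lock manager at every site holding a replica."""
    acquired = []
    for site in replicas:
        if not site.lock_exclusive(txn):
            for held in acquired:  # assumption: back off instead of waiting
                held.release_exclusive(txn)
            return False
        acquired.append(site)
    return True
```

A read contacts one lock manager while a write contacts all of them, which is where the cheaper reads and costlier writes come from.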
2 Phase Locking (2PL)
   Centralized 2PL.
   Primary copy 2PL.
   Distributed 2PL.
   Voting 2PL.
 Simulation Models for 2PL
   [Figure: Simulation model of centralized 2PL]
   [Figure: Simulation model of distributed 2PL]
 Secure Time-Stamp Based Concurrency Control
  Protocol For Distributed Databases
   A security level is assigned to each transaction and data item.
   Common instances of totally ordered security levels are Top-Secret
    (TS), Secret (S), Confidential (C), and Unclassified (U).
 System Model
   The system consists of N sites; each site Ni holds a secure database
    that is a partition of the global database distributed across all N sites.
   The secure distributed database is defined as a five-tuple < Dt, Tr, Ts,
    Sc, Lv >
         Dt is the set of data items,
         Tr is the set of distributed transactions,
         Ts is the timestamp,
         Sc is the partially ordered set of security levels,
         Lv is a mapping that assigns a security level to each data item
          and transaction (written Lv(x) and Lv(T)).
    Security level Sci is said to dominate security level Scj if Scj <= Sci.
    The security policy enforces the following restrictions:
    Simple Security Property: A transaction T (subject) is allowed to read a data
     item (object) x only if Lv(x) <= Lv(T).
    Restricted Property: A transaction T is allowed to write a data item x only if
     Lv(x) = Lv(T).
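
The two properties reduce to simple comparisons on security levels. The sketch below assumes the totally ordered levels U < C < S < TS; the names Level, dominates, can_read, and can_write are hypothetical and chosen only for illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    """Totally ordered security levels, lowest to highest (an assumed encoding)."""
    U = 0   # Unclassified
    C = 1   # Confidential
    S = 2   # Secret
    TS = 3  # Top-Secret


def dominates(a: Level, b: Level) -> bool:
    """a dominates b if b <= a."""
    return b <= a


def can_read(txn_level: Level, item_level: Level) -> bool:
    """Simple Security Property: read x only if Lv(x) <= Lv(T)."""
    return item_level <= txn_level


def can_write(txn_level: Level, item_level: Level) -> bool:
    """Restricted Property: write x only if Lv(x) = Lv(T)."""
    return item_level == txn_level
```

Under this encoding, a Secret transaction may read Confidential data but may write only Secret data.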
    System Architecture
    The Global Transaction Manager (GTr) is a software module that translates and
     decomposes a transaction into subtransactions against local schemas and
     coordinates the execution of the subtransactions.
    GTr Layers
1.   Transaction Interface
2.   Authentication Check Layer
3.   Security Level Assignment Layer
4.   Data Manager and Transaction Manager Layer
5.   Data Access Tracker (DAT)

 Local Transaction Manager (LTr)
   Sub transaction interface layer
   Sub Query Manager
   Data Administrator Layer
   Local Database
A Secure Concurrency Control Protocol

Algorithm for write operation on data item x issued by sub transaction Si with Timestamp Tsi :
If ( RTs(x) > Tsi ) {
  Abort( Si );
} ElseIf ( WTs(x) > Tsi ) {
  Ignore( Si );
} ElseIf ( Lv(x) == Lv(Si) ) { /* Lv(x) and Lv(Si) are the security levels of data item x and transaction Si */
  WritelockTo( x );
  Execution( x );
  WTs(x) = Tsi;
  Update DAT to Tsi;
} Else {
  Abort( Si ); /* access denied due to security */
}
Algorithm for read operation on data item x issued by sub transaction Si with Timestamp Tsi :
If ( WTs(x) > Tsi ) {
  Abort( Si );
  Rollback( Si );
} ElseIf ( Lv(x) <= Lv(Si) ) {
  ReadlockTo( x );
  ExecuteOn( x );
  RTs(x) = Tsi;
  Update DAT to Tsi;
} Else {
  Abort( Si );
  Rollback( Si );
}
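
The read and write rules can be tried out with the following sketch. It is an illustration under stated assumptions, not the protocol's implementation: Item, secure_write, and secure_read are hypothetical names, security levels are plain integers, and lock acquisition and the DAT update are omitted. One deliberate deviation: the read rule here keeps RTs(x) as the maximum of its old value and Tsi, the conventional basic timestamp-ordering rule, whereas the pseudocode assigns Tsi directly.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class Item:
    """Hypothetical data item with a security level and read/write timestamps."""
    level: int
    value: Optional[Any] = None
    rts: int = 0  # RTs(x): largest timestamp that has read x
    wts: int = 0  # WTs(x): timestamp of the last write applied to x


def secure_write(item: Item, ts: int, txn_level: int, new_value: Any) -> str:
    if item.rts > ts:
        return "abort"    # a younger transaction already read x
    if item.wts > ts:
        return "ignore"   # a younger write is already in place
    if item.level == txn_level:
        item.value = new_value
        item.wts = ts
        return "ok"
    return "abort"        # access denied due to security


def secure_read(item: Item, ts: int, txn_level: int) -> Tuple[str, Optional[Any]]:
    if item.wts > ts:
        return "abort", None   # x was overwritten by a younger transaction
    if item.level <= txn_level:
        item.rts = max(item.rts, ts)  # assumption: max, not plain assignment
        return "ok", item.value
    return "abort", None       # access denied due to security
```

Returning a status string instead of calling Abort or Rollback keeps the sketch free of side effects, so each branch can be checked in isolation.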
 There are three basic techniques, and each can be used for rw
  synchronization, ww synchronization, or both.
 Schedulers can be centralized or distributed.
 Replicated data can be handled in three ways (Do
  Nothing, Primary Copy, Voting).
 System R*
  Uses a 2PL scheduler for rw and ww synchronization.
  The schedulers are distributed at the data managers (DMs).
  Replication is handled by the do-nothing approach.
 Distributed INGRES
  INGRES uses primary copy for replication.
New Approaches to Concurrency
   Total Ordering
   Total ordering in networking terms describes the property of a network
    guaranteeing that all messages are delivered in the same order across
    all destinations.
   In combination with the concept of transactions, one can make use of
    this property to ensure that transactions are received in the same
    order at all sites — called the ORDER CC technique.
   Algorithm
   Each transaction is initiated by sending its reads and write predeclares
    to the corresponding schedulers as a single atomic action in totally
    ordered fashion.
   Each scheduler stores the received operation requests in a FIFO-type
    queue.
   If a read is at the head of the queue, it is immediately executed.
   The transaction can now issue its write requests in accordance with the
    previously given predeclares.
   Upon commit, the committed values are sent in non-ordered fashion
    to the schedulers, which replace the corresponding predeclare
    statements in the queue with the received committed writes.
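
A single scheduler's queue handling under ORDER might look like the sketch below. The class OrderScheduler and its methods are hypothetical names. Two behaviours are assumptions inferred from the steps above rather than stated in them: an unresolved write predeclare at the head of the queue blocks everything behind it, and a committed write is applied once it reaches the head. Totally ordered delivery is taken as given, so the order of receive calls stands in for the network order.

```python
from collections import deque
from typing import Any, Dict, List, Tuple


class OrderScheduler:
    """Hypothetical single-site scheduler with a FIFO queue of operations."""

    def __init__(self, store: Dict[str, Any]) -> None:
        self.store = store
        self.queue: deque = deque()  # entries: [kind, txn, key, value]
        self.results: Dict[Tuple[str, str], Any] = {}

    def receive(self, txn: str, reads: List[str], write_predeclares: List[str]) -> None:
        """Enqueue one transaction's reads and write predeclares atomically."""
        for key in reads:
            self.queue.append(["read", txn, key, None])
        for key in write_predeclares:
            self.queue.append(["predeclare", txn, key, None])
        self._drain()

    def commit(self, txn: str, values: Dict[str, Any]) -> None:
        """Replace txn's predeclares in the queue with its committed writes."""
        for entry in self.queue:
            if entry[0] == "predeclare" and entry[1] == txn:
                entry[0], entry[3] = "write", values[entry[2]]
        self._drain()

    def _drain(self) -> None:
        # Assumption: a predeclare at the head blocks the rest of the queue.
        while self.queue and self.queue[0][0] != "predeclare":
            kind, txn, key, value = self.queue.popleft()
            if kind == "read":
                self.results[(txn, key)] = self.store.get(key)
            else:
                self.store[key] = value
```

In this sketch a read queued behind another transaction's predeclare is held back until that transaction commits, so it observes the committed value.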
Timestamp Ordering Revisited
   Whenever a network layout, such as an interconnection
    network, provides predictability regarding the time at which
    a message will arrive at its destination, this property
    can be exploited for concurrency control.
 Algorithm
   The transaction manager initiates a transaction by sending
    its reads and write predeclares to the corresponding
    schedulers as a single atomic action.
   This atomic action is assigned a timestamp t, denoting the
    time by which all operations will have arrived at their
    respective schedulers.
   When a scheduler receives an operation o, it can wait
    until time t has arrived before processing it.
   Alternatively, it can process o ahead of time t, causing
    conflicting operations that arrive afterwards, but
    with a lower timestamp, to abort.
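
The conservative option, waiting until time t, can be sketched with a priority queue keyed on the timestamp. TimedScheduler and its methods are hypothetical names, time is an abstract integer supplied by the caller, and the early-processing alternative with aborts is not modelled.

```python
import heapq
from typing import Any, List, Tuple


class TimedScheduler:
    """Hypothetical scheduler that holds each operation until its timestamp t."""

    def __init__(self) -> None:
        self._pending: List[Tuple[int, int, Any]] = []  # min-heap of (t, seq, op)
        self._seq = 0  # tie-breaker so equal timestamps keep arrival order

    def receive(self, timestamp: int, operation: Any) -> None:
        heapq.heappush(self._pending, (timestamp, self._seq, operation))
        self._seq += 1

    def release(self, now: int) -> List[Tuple[int, Any]]:
        """Return, in timestamp order, every operation whose time t has arrived."""
        ready = []
        while self._pending and self._pending[0][0] <= now:
            ts, _, op = heapq.heappop(self._pending)
            ready.append((ts, op))
        return ready
```

Because nothing is processed before its timestamp, operations leave the scheduler in timestamp order regardless of arrival order, at the cost of idle waiting.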
 Performance Comparison
   2PL, the standard technique used for centralised DBMSs,
    proves to perform rather poorly for distributed systems,
    whereas timestamp ordering based protocols in their various
    forms seem to provide the best overall performance.
   2PL and other locking techniques require deadlock
    prevention or detection, which is much more complex
    and costly in a distributed environment.
   Timestamp ordering techniques (TO) avoid deadlocks entirely.
   Basic TO (BTO) usually shows better overall performance in a
    distributed environment than 2PL.
   ORDER outperforms both 2PL and BTO, provided that network
    latency is low and the total ordering algorithm is implemented
    efficiently. For high network latencies, ORDER appears to be a
    rather disadvantageous approach.
   PREDICT shows basically the same advantages ORDER does.
References
 "A Secure Time-Stamp Based Concurrency Control Protocol for
  Distributed Databases", Journal of Computer Science 3 (7): 561-565, 2007.
 "Some Models of a Distributed Database Management
  System with Data Replication", International Conference
  on Computer Systems and Technologies.
 "A Sophisticated Introduction to Distributed Database
  Concurrency Control", Harvard University, Cambridge.
 Silberschatz, "Database System Concepts", McGraw-Hill, 2001.
