Transactions
• Concurrent execution of user programs is essential for good DBMS
  performance
     – Disk access is frequent and slow
     – Want to keep the CPU busy

• A user's program may carry out all sorts of operations on the data, but
  the DBMS is only concerned about what data is read from/written to
  the database

• Thus a transaction is the DBMS's abstract view of a user program: a
  series of reads/writes of database objects
• Users submit transactions, and can think of each transaction as
  executing by itself
• The concurrency is achieved by the DBMS, which interleaves actions
  of the various transactions

• Issues:
     – Interleaving transactions, and
     – Crashes!
                Goal: The ACID properties
• Atomicity: Either all actions are carried out, or none are
• Consistency: If each transaction is consistent, and the database is
  initially consistent, then it is left consistent
• Isolation: Transactions are isolated, or protected, from the effects of
  other scheduled transactions
• Durability: If a transaction completes successfully, then its effects
  persist

Atomicity

• A transaction can
      • Commit after completing its actions, or
      • Abort because of
            • Internal DBMS decision: restart
            • System crash: power, disk failure, …
• An unexpected situation (e.g., being unable to access the disk) or a
  transaction interrupted in the middle could leave the database inconsistent
• DBMS needs to remove the effects of partial transactions to ensure
  atomicity: either all of a transaction's actions are performed or none

• A DBMS ensures atomicity by undoing the actions of partial
  transactions
• To enable this, the DBMS maintains a record, called a log, of all
  writes to the database
• The component of a DBMS responsible for this is called the recovery
  manager
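The undo mechanism can be sketched in a few lines. This is a minimal, hypothetical illustration in Python (UndoLog, log_write, and the in-memory db dict are invented names, not a real DBMS API): the log records a before-image for every write, and aborting a transaction replays those before-images in reverse order.

```python
class UndoLog:
    """Toy undo log: one record per write, holding the before-image."""
    def __init__(self):
        self.records = []  # list of (txn_id, object_id, old_value)

    def log_write(self, txn, obj, old_value):
        # Record the before-image so the write can later be undone.
        self.records.append((txn, obj, old_value))

    def undo(self, txn, db):
        # Undo the transaction's writes in reverse order.
        for t, obj, old in reversed(self.records):
            if t == txn:
                db[obj] = old

db = {"a": 100}
log = UndoLog()
log.log_write("T1", "a", db["a"])
db["a"] = 0            # T1's partial, uncommitted write
log.undo("T1", db)     # T1 aborts: restore the before-image
```

A real recovery manager logs much more (redo information, checkpoints), but the abort path follows this shape.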
Consistency

• Users are responsible for ensuring transaction consistency
     – when run to completion against a consistent database
        instance, the transaction leaves the database consistent
• For example, a consistency criterion is that my inter-account-
  transfer transaction does not change the total amount of
  money in the accounts!

• Database consistency is the property that every transaction
  sees a consistent database instance. It follows from
  transaction atomicity, isolation and transaction consistency



Isolation

• Guarantee that even though transactions may be interleaved,
  the net effect is identical to executing the transactions serially
• For example, if transactions T1 and T2 are executed
  concurrently, the net effect is equivalent to executing
     – T1 followed by T2, or
     – T2 followed by T1
• NOTE: The DBMS provides no guarantee of the effective order
  of execution
 Durability

 • DBMS uses the log to ensure durability
 • If the system crashed before the changes made by a
   completed transaction are written to disk, the log is used to
   remember and restore these changes when the system is
   restarted
 • Again, this is handled by the recovery manager



Schedules

 • A schedule is a list of actions from a set of transactions
      – A well-formed schedule is one where the actions of a
         particular transaction T are in the same order as they
         appear in T
 • For example, if T1 = [R(a), W(a), R(c), W(c)] and T2 = [R(b), W(b)]
      – [RT1(a), WT1(a), RT2(b), WT2(b), RT1(c), WT1(c)] is
         a well-formed schedule
      – [RT1(c), WT1(c), RT2(b), WT2(b), RT1(a), WT1(a)] is
         not a well-formed schedule (T1's actions are reordered)
 • A complete schedule is one that contains an abort or commit
   action for every transaction that occurs in the schedule
 • A serial schedule is one where the actions of different
   transactions are not interleaved
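The well-formedness condition can be checked mechanically. A small Python sketch (is_well_formed and the (txn, action) encoding are illustrative assumptions): for each transaction we track how far through its own action list the schedule has progressed.

```python
def is_well_formed(schedule, transactions):
    """schedule: list of (txn, action) pairs; transactions: txn -> its
    own ordered action list. Well-formed means each transaction's
    actions appear in the schedule in their original order."""
    pos = {t: 0 for t in transactions}
    for txn, action in schedule:
        if transactions[txn][pos[txn]] != action:
            return False          # action out of order for this txn
        pos[txn] += 1
    return True

txns = {"T1": ["R(a)", "W(a)", "R(c)", "W(c)"], "T2": ["R(b)", "W(b)"]}
good = [("T1", "R(a)"), ("T1", "W(a)"), ("T2", "R(b)"),
        ("T2", "W(b)"), ("T1", "R(c)"), ("T1", "W(c)")]
bad = [("T1", "R(c)"), ("T1", "W(c)"), ("T2", "R(b)"),
       ("T2", "W(b)"), ("T1", "R(a)"), ("T1", "W(a)")]
```

Here `good` passes the check and `bad` fails it, matching the two examples above.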
Serialisability

  • A serialisable schedule is a schedule whose effect on any
    consistent database instance is identical to that of some
    complete serial schedule
  • NOTE:
      – All different results assumed to be acceptable
      – It's more complicated when we have transactions that
         abort
      – We'll assume that all 'side-effects' of a transaction are
         written to the database

  • E.g., a serializable schedule:

T1                   T2
R(A)
W(A)
                     R(A)
                     W(A)
R(B)
W(B)
                     R(B)
                     W(B)
                     commit
commit
     Anomalies with interleaved execution
  • Two actions on the same data object conflict if at least one of
    them is a write
  • We'll now consider three ways in which a schedule involving
    two consistency-preserving transactions can leave a
    consistent database inconsistent

  WR conflicts

  • Transaction T2 reads a database object that has been
    modified by T1 which has not committed. Such a read is
    called a dirty read.
  T1 (transfer): debit €100 from account a, then credit €100 to account b
  T2 (interest): read a and b, adding 6% interest to each

T1: R(a),W(a),                  R(b),W(b),C
  T2:          R(a),W(a),R(b),W(b),C
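The lost interest can be seen by replaying the two executions directly. A toy Python sketch (starting balances of €1000 are an illustrative assumption; a plain dict stands in for the database, and there is no locking, which is the point):

```python
# Interleaved schedule: T2 performs a dirty read of T1's debit.
db = {"a": 1000.0, "b": 1000.0}

db["a"] -= 100            # T1: R(a), W(a)  (debit, not yet committed)
db["a"] *= 1.06           # T2: R(a), W(a)  (dirty read of T1's write)
db["b"] *= 1.06           # T2: R(b), W(b), C
db["b"] += 100            # T1: R(b), W(b), C  (credit lands too late)
interleaved_total = db["a"] + db["b"]

# Serial execution: T1 runs to completion, then T2 adds the interest.
db = {"a": 1000.0, "b": 1000.0}
db["a"] -= 100; db["b"] += 100          # T1
db["a"] *= 1.06; db["b"] *= 1.06        # T2
serial_total = db["a"] + db["b"]
```

The interleaved run leaves €2114 in the accounts while the serial run leaves €2120: the 6% interest on the €100 in transit is lost, even though each transaction is consistency-preserving on its own.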
RW conflicts

  • Transaction T2 could change the value of an object
    that has been read by a transaction T1, while T1 is still
    in progress (an unrepeatable read)

T1: R(a),            R(a), W(a), C
   T2:        R(a),W(a),C

WW conflicts

  • Transaction T2 could overwrite the value of an object which
     has already been modified by T1, while T1 is still in progress
  T1: [W(Britney), W(John)] “Set both salaries at £1m”
  T2: [W(John), W(Britney)] “Set both salaries at £2m”

  • But:

     T1: W(Britney),                W(John)
     T2:          W(John), W(Britney)

Serialisability and aborts

  • Things are more complicated when transactions can abort

  T1:R(a), W(a),                   Abort
  T2:          R(a),W(a),R(b),W(b),C
Lock based concurrency control

A DBMS typically uses a locking protocol to achieve serializable
schedules.
A lock is a small bookkeeping object associated with a database
object.
A locking protocol is a set of rules to be followed by each
transaction to ensure that, even though actions of several
transactions might be interleaved, the net effect is identical to
executing all transactions in some serial order.

Strict two-phase locking

  • DBMS enforces the following locking protocol:
        – Each transaction must obtain an S (shared) lock before
          reading, and an X (exclusive) lock before writing
        – All locks held by a transaction are released when the
          transaction completes
        – If a transaction holds an X lock on an object, no other
          transaction can get a lock (S or X) on that object
  • Strict 2PL allows only serializable schedules
  • If two transactions access completely independent parts of the
    database, they concurrently obtain the locks they need and
    proceed on their ways.
  • If two transactions access the same object, and one wants to
    modify it, their actions are effectively ordered serially.
T1       T2

X(A)
R(A)
W(A)
X(B)
R(B)
W(B)
Commit
         X(A)
         R(A)
         W(A)
         X(B)
         R(B)
         W(B)
         commit


T1       T2

S(A)
R(A)
         S(A)
         R(A)
         X(B)
         R(B)
         W(B)
         commit
X(C)
R(C)
W(C)
Commit
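The discipline shown in the tables above can be sketched with ordinary mutexes. A toy Python illustration (Strict2PLTxn and its methods are invented names; it models exclusive locks only, not separate S/X modes): locks are acquired as objects are touched and released only at commit, so a conflicting transaction simply blocks until the first one finishes.

```python
import threading

class Strict2PLTxn:
    """Toy strict-2PL transaction: acquire a lock before each access,
    release everything only at commit (illustrative names)."""
    def __init__(self, lock_table):
        self.lock_table = lock_table   # object name -> threading.Lock
        self.held = []

    def access(self, obj):
        lk = self.lock_table.setdefault(obj, threading.Lock())
        if lk not in self.held:
            lk.acquire()               # blocks while another txn holds it
            self.held.append(lk)

    def commit(self):
        for lk in reversed(self.held): # all locks released at the end
            lk.release()
        self.held.clear()

locks = {}
t1 = Strict2PLTxn(locks)
t1.access("A"); t1.access("B")         # T1 reads/writes A and B
t1.commit()
t2 = Strict2PLTxn(locks)
t2.access("A")                         # only proceeds after T1 commits
t2.commit()
```

Because no lock is released before commit, any interleaving of two such transactions on shared objects is effectively ordered serially, as in the first table above.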
Performance of Locking:

Locking schemes use two mechanisms
  1. Aborting
  2. Blocking

  Blocked transactions may hold locks that force other
  transactions to wait.
  Aborting and restarting a transaction obviously wastes the work
  done so far.



   [Figure: throughput vs. number of active transactions. Throughput
    rises to a peak, then falls off as lock thrashing sets in.]

Throughput can be increased by:

  1. Locking the smallest sized objects possible.
  2. Reducing the time that transactions hold locks.
  3. Reducing hot spots. A hot spot is a database object that is
     frequently accessed and modified, and causes a lot of
     blocking delays. Hot spots can significantly affect
     performance.
Transaction support in SQL

Creating and Terminating Transactions:
A transaction is automatically started when a user executes a
statement that accesses either the database or the catalog, such as a
SELECT query, an UPDATE command, or a CREATE TABLE
statement.
Once a transaction is started, other statements can be executed as
part of this transaction until the transaction is terminated by either
a COMMIT command or a ROLLBACK command.

Savepoint:
A savepoint allows us to identify a point in a transaction and
selectively roll back operations carried out after this point.

  • In a long-running transaction, we may want to define a series
     of savepoints:
           SAVEPOINT <savepoint name>
A subsequent rollback command can specify the savepoint to roll
back to:
           ROLLBACK TO SAVEPOINT <savepoint name>
  • If we define three savepoints A, B, and C in that order, and then
    roll back to A, all operations since A are undone, including the
    creation of savepoints B and C. Savepoint A itself is undone when we
    roll back to it, and we must re-establish it if we wish to be able to
    roll back to it again.
  • Locks obtained after savepoint A will be released when we roll back
    to A.
  • Chained transactions allow us to commit or roll back a transaction and
    immediately initiate another transaction. This is done by using the
    optional keywords AND CHAIN in the COMMIT and ROLLBACK
    statements.
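These commands can be tried directly, for example against SQLite from Python (the accounts table and the savepoint name sp1 are just illustrative):

```python
import sqlite3

# Autocommit mode, so we issue the transaction commands ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO accounts VALUES ('a', 100)")
conn.execute("SAVEPOINT sp1")
conn.execute("INSERT INTO accounts VALUES ('b', 200)")
conn.execute("ROLLBACK TO SAVEPOINT sp1")   # undoes only the insert of 'b'
conn.execute("COMMIT")

rows = conn.execute("SELECT name FROM accounts").fetchall()
# only ('a',) survives: the work before the savepoint was committed
```

The insert of row 'a', done before the savepoint, is preserved by the final COMMIT; only the work after sp1 is rolled back.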
What should we lock?

Consider the following query:
SELECT S.rating, MIN(S.age)
FROM Sailor S
WHERE S.rating = 8
Suppose this query runs as part of transaction T1, and an SQL statement that
modifies the age of a sailor with rating 8 runs as part of T2. What objects
should be locked by the DBMS?

   • Locking the entire table leads to decreased concurrency.
   • To increase concurrency, DBMS could set a shared lock on every row
     with rating=8, and set an exclusive lock on the row for the tuple
     getting modified.
   • DBMS can lock objects at different granularity. Entire tables can be
     locked or row level locks can be asserted. The latter approach offers
     better performance.

Phantom problem:
The DBMS sets shared locks on every existing sailor row with rating=8 for T1.
This does not prevent a transaction T3 from creating a new row with rating=8.
Hence, if T1 retrieves the collection of rating=8 rows twice, it sees different
results even though it does not modify any of these tuples itself.
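The phantom can be illustrated with a toy in-memory sketch in Python (the sailors list and the "locking" step are stand-ins, not a real DBMS): row-level locks cover only the rows that existed when T1 first looked.

```python
# In-memory stand-in for the Sailor table; locking the existing rows
# does nothing to rows inserted afterwards.
sailors = [{"name": "s1", "rating": 8}, {"name": "s2", "rating": 7}]
locked_rows = [s for s in sailors if s["rating"] == 8]   # T1's S-locks

first_read = [s["name"] for s in sailors if s["rating"] == 8]
sailors.append({"name": "s3", "rating": 8})              # T3 inserts a phantom
second_read = [s["name"] for s in sailors if s["rating"] == 8]
# T1's two identical queries now return different row sets
```

Preventing this requires locking the *set* of rows satisfying the predicate (or the index range that covers it), not just the individual rows, which is what the SERIALIZABLE isolation level below does.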
Transaction characteristics in SQL

    • Three characteristics:
    1. Access mode
    2. Diagnostics size
    3. Isolation level
Diagnostics size determines the number of error conditions that can be
recorded.
Access mode can be: a. READ ONLY b. READ WRITE
If the transaction is not allowed to modify the database, the access mode
should be set to READ ONLY; otherwise, READ WRITE.

Isolation level controls the extent to which a given transaction is exposed to
the actions of other transactions executing concurrently.
Transaction Isolation Levels:

   1.   READ UNCOMMITTED
   2.   READ COMMITTED
   3.   REPEATABLE READ
   4.   SERIALIZABLE (highest degree of isolation)



The SERIALIZABLE isolation level ensures that a transaction reads only the
changes made by committed transactions, and avoids the phantom
phenomenon. The transaction obtains locks before reading and writing
objects, including locks on sets of objects that it requires to be unchanged,
and holds them until the end, according to strict 2PL.

REPEATABLE READ: ensures that the transaction reads only the changes
made by committed transactions and that no value read or written by this
transaction is changed by any other transaction until it completes its
execution. This transaction could still experience the phantom problem.
E.g., T examines all sailor records with rating=5 and another transaction
might add a new sailor record, which is missed by T. The locks are obtained
beforehand, but only on individual objects, not on sets of objects.
READ COMMITTED: ensures that T reads only the changes made by
committed transactions, and that no value written by T is changed by any
other transaction until T is complete. However, a value read by T may well
be modified by another transaction while T is still in progress, and T is
exposed to the phantom problem.


READ UNCOMMITTED:
  - Does not obtain shared locks before reading objects.
  - Represents the greatest exposure to uncommitted changes of other
      transactions.
  - Transaction is required to have an access mode of READ ONLY.
  - It is not allowed to write objects.
  - It never makes any lock requests.
The isolation level and access mode can be set using the SET
TRANSACTION command.

Eg: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ
ONLY

When a transaction is started, the default isolation level is SERIALIZABLE
and the default access mode is READ WRITE.
Stealing Frames and Forcing Pages


Stealing Frames:
The changes made by a transaction T to an object O may be written to disk
before T commits, when another transaction wants to bring in a page and the
buffer manager chooses to replace the frame containing O. (For this to
happen, the page must have been unpinned by T.) Such an approach is
called the steal approach.

Forcing Pages: When a transaction commits, all changes it has made to
objects in the buffer pool are immediately forced to disk.

   • If a no-steal approach is used, the recovery manager need not
      undo the changes of an aborted transaction, and if a force approach is
      used, the recovery manager need not redo the changes of a committed
      transaction if there is a subsequent crash.
   • The no-steal approach assumes that all pages modified by ongoing
      transactions can be accommodated in the buffer pool, and in presence
      of large transactions, this assumption is unrealistic.
   • With a no-force approach, the in-memory copy of the page is
      modified in place and written to disk just once, reflecting the
      effects of all the updates made to it, when the page is eventually
      replaced in the buffer pool.
Most systems use a steal, no-force approach. Thus, if a frame is dirty and
chosen for replacement, the page it contains is written to disk even if the
modifying transaction is still active; in addition, pages in the buffer pool that
are modified by a transaction are not forced to disk when the transaction
commits.
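The undo/redo consequences of the four policy combinations can be summarised in a few lines of Python (recovery_needs is an illustrative helper, not a standard API):

```python
def recovery_needs(steal, force):
    """steal    -> uncommitted changes may reach disk -> undo needed
       no-force -> committed changes may miss disk    -> redo needed"""
    return {"undo": steal, "redo": not force}

# no-steal/force: simplest recovery, neither undo nor redo is needed
simplest = recovery_needs(steal=False, force=True)
# steal/no-force: what most systems use, so both undo and redo are needed
common = recovery_needs(steal=True, force=False)
```

This is why practical recovery managers (which pair with steal/no-force buffer management) must implement both undo and redo.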
                   Concurrency control
How does a locking protocol ensure serializability?

Serializability

Two schedules are said to be conflict equivalent if they involve the
same actions of the same transactions and they order every pair of
conflicting actions of two committed transactions in the same way.

A schedule is conflict serializable if it is conflict equivalent to
some serial schedule.


T1          T2          T3

R(A)
            W(A)
            Commit
W(A)
Commit
                        W(A)
                        Commit

[Figure: precedence graph on T1, T2, T3 with edges T1 -> T2, T2 -> T1,
 T1 -> T3 and T2 -> T3; the cycle between T1 and T2 shows this schedule
 is not conflict serializable]

The precedence graph contains:
  1. A node for each committed transaction in S.
  2. An arc from Ti to Tj if an action of Ti precedes and conflicts
     with one of Tj's actions.
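Building the precedence graph and testing it for cycles is mechanical. A Python sketch (the (txn, action, obj) encoding and the function names are illustrative), applied to the schedule R1(A), W2(A), W1(A), W3(A):

```python
def precedence_graph(schedule):
    """schedule: list of (txn, action, obj). Adds an edge Ti -> Tj when
    an action of Ti precedes and conflicts with an action of Tj."""
    edges = set()
    for i, (ti, ai, oi) in enumerate(schedule):
        for tj, aj, oj in schedule[i + 1:]:
            if ti != tj and oi == oj and "W" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    nodes = {n for e in edges for n in e}
    def reachable(a, b, seen=()):
        return any(y == b or (y not in seen and reachable(y, b, seen + (y,)))
                   for x, y in edges if x == a)
    return any(reachable(n, n) for n in nodes)

s = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]
edges = precedence_graph(s)
# edges contains both ("T1", "T2") and ("T2", "T1"), so the graph is cyclic
```

A schedule is conflict serializable exactly when its precedence graph is acyclic, and any topological order of an acyclic graph gives an equivalent serial schedule.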
A variant of Strict 2PL, called Two-Phase Locking (2PL), relaxes the
second rule to allow transactions to release locks before the
commit or abort action. Here the second rule is replaced by:

A transaction cannot request additional locks once it releases any
lock.
Every transaction has a Growing Phase and a Shrinking Phase.
Nonstrict 2PL also ensures acyclicity of the precedence graph.

View serializability

Conflict serializability is sufficient but not necessary for
serializability. A more general condition is view serializability.

Two schedules S1 and S2 are view equivalent if:

   1. If Ti reads the initial value of object A in S1, it must also read
      the initial value of A in S2.
   2. If Ti reads a value of A written by Tj in S1, it must also read
      the value of A written by Tj in S2.
   3. For each data object A, the transaction that performs the final
      write on A in S1 must also perform the final write on A in
      S2.
A schedule is view serializable if it is view equivalent to some
serial schedule. Every conflict serializable schedule is view
serializable.
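The three conditions can be checked programmatically under the simplifying assumption that each read sees the most recent preceding write in the schedule. A Python sketch (function names and the (txn, action, obj) encoding are illustrative):

```python
def view_info(schedule):
    """For each read, record which transaction wrote the value it sees
    (None = initial value); for each object, record the final writer."""
    last_writer, reads_from, final_write = {}, set(), {}
    for txn, action, obj in schedule:
        if action == "R":
            reads_from.add((txn, obj, last_writer.get(obj)))
        else:
            last_writer[obj] = txn
            final_write[obj] = txn
    return reads_from, final_write

def view_equivalent(s1, s2):
    return view_info(s1) == view_info(s2)

# Classic blind-write example: view serializable but not conflict serializable.
interleaved = [("T1", "R", "A"), ("T2", "W", "A"),
               ("T1", "W", "A"), ("T3", "W", "A")]
serial = [("T1", "R", "A"), ("T1", "W", "A"),
          ("T2", "W", "A"), ("T3", "W", "A")]
```

Here `interleaved` is view equivalent to the serial schedule T1, T2, T3 (same initial read by T1, same final write by T3), even though its precedence graph is cyclic; the blind writes make the conflict-based test too strict.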
                        Lock Management

Lock manager:
       keeps track of the locks issued to transactions.
     • Maintains a lock table , a hash table with the data object
        identifier as the key.
     • Maintains a descriptive entry for each transaction in a
        transaction table. From the transaction table a pointer
        points to the list of locks held by the transaction.
     • This list is checked before requesting a lock, to ensure that
        a transaction does not request the same lock twice.


Lock table entries contain:

  • The number of transactions currently holding a lock on the
    object
  • The nature of the lock (shared or exclusive)
  • A pointer to a queue of lock requests
Implementation of lock


Before a transaction T reads or writes a database object O, it must
obtain a shared or exclusive lock on O and must hold on to the lock
until it commits or aborts.
  • If a shared lock is requested, the queue of requests is empty,
       and the object is not currently locked in exclusive mode, the
       lock manager grants the lock and updates the lock table entry
       for the object.
  • If an exclusive lock is requested and no transaction currently
       holds a lock on the object, the lock manager grants the lock
       and updates the lock table entry.
  • Otherwise, the requested lock cannot be immediately granted
       and the lock request is added to the queue of lock requests for
       this object. The transaction requesting the lock is suspended.
  • When a transaction aborts or commits, it releases all its locks.
       When a lock is released, the lock manager updates the lock
       table entry for the object and examines the request at the head
       of the queue. If that request can now be granted, the
       transaction that made it is woken up and given the lock. If
       several requests for a shared lock on the object are at the
       front of the queue, all of these requests can now be granted.

Lock and unlock must be implemented as atomic actions.
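The grant-or-queue decision described above can be sketched in a few lines of Python (LockTable and its entry layout are illustrative; blocking and wakeup are reduced to returning 'granted' or 'queued'):

```python
class LockTable:
    """Toy lock-table logic: grant S if no X holder and no waiters,
    grant X only if there are no holders; otherwise queue the request."""
    def __init__(self):
        # obj -> {"mode": "S"/"X"/None, "holders": set, "queue": list}
        self.entries = {}

    def request(self, txn, obj, mode):
        e = self.entries.setdefault(
            obj, {"mode": None, "holders": set(), "queue": []})
        if mode == "S" and not e["queue"] and e["mode"] != "X":
            e["mode"] = "S"; e["holders"].add(txn); return "granted"
        if mode == "X" and not e["holders"]:
            e["mode"] = "X"; e["holders"].add(txn); return "granted"
        e["queue"].append((txn, mode)); return "queued"

lt = LockTable()
r1 = lt.request("T1", "A", "S")   # granted: object is free
r2 = lt.request("T2", "A", "S")   # granted: S locks are compatible
r3 = lt.request("T3", "A", "X")   # queued: conflicts with the S holders
```

Checking the queue before granting an S lock (even when the mode would be compatible) prevents a stream of readers from starving a waiting writer.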
Latches and Convoys:

Setting a latch before reading or writing a page ensures that the
physical read or write operation is atomic. Latches are unset
immediately after the physical read or write operation is
completed.

A transaction T holding a heavily used lock may be suspended by
the operating system. Until T is resumed, every other transaction
that needs this lock is queued. Such queues, called convoys, tend
to be stable once formed.

Lock conversions

  • A transaction can acquire an exclusive lock on an object for
    which it already holds a shared lock.
  • Eg: a SQL update statement could result in shared locks
    being set on each row in a table. If a row satisfies the
    condition for being updated, an exclusive lock must be
    obtained for that row.
  • A lock upgrade is granted immediately if no other transaction
    holds a shared lock on the object; otherwise the upgrade request
    is inserted at the front of the queue, because the transaction
    already holds a shared lock on the object, and queuing it behind
    another transaction that wants an exclusive lock on the same
    object would cause a deadlock. This approach does not, however,
    prevent deadlocks caused by two conflicting upgrade requests:
    e.g., if two transactions that hold a shared lock on an object
    both request an upgrade to an exclusive lock, this leads to a
    deadlock.
To overcome the disadvantages of lock upgrades, current
implementations use a third type of lock, called the update lock.
By setting an update lock initially, we can prevent conflicts with
other read operations. Once we are sure we need not update the
object, we can downgrade to a shared lock. If we need to update
the object, we must first upgrade to an exclusive lock. This
upgrade does not lead to a deadlock, because no other transaction
can have an upgrade or exclusive lock on the object.
                               Deadlocks
Consider two transactions T1 and T2. T1 sets an exclusive lock on the
object A, T2 sets an exclusive lock on B, T1 requests an exclusive lock on B
and is queued, and T2 requests an exclusive lock on A and is queued.
Such a cycle of transactions waiting for locks to be released is called a
deadlock.

In a typical environment, deadlocks tend to be rare and involve very few
transactions. Hence the DBMS periodically checks for deadlocks.
To detect deadlocks, the lock manager maintains a structure called a wait-for
graph. The nodes correspond to active transactions and there is an arc from
Ti to Tj if Ti is waiting for Tj to release a lock.



T1   T2   T3   T4

S(A)
R(A)
     X(B)
     W(B)
S(B)
          S(C)
          R(C)
     X(C)
               X(B)
          X(A)

[Figure: wait-for graph with edges T1 -> T2, T2 -> T3, T4 -> T2 and
 T3 -> T1, forming the cycle T1 -> T2 -> T3 -> T1]

The DBMS checks the wait-for graph periodically for cycles, which indicate
deadlock. A deadlock is resolved by aborting a transaction that is on a cycle
and releasing its locks. The choice of which transaction to abort can be made
on several criteria: the one with the fewest locks, the one that has done the
least work, the one that is farthest from completion, and so on.
An alternative to maintaining the wait-for graph is to use a timeout
mechanism for detecting deadlocks. If a transaction has been waiting too
long for a lock, we assume that it is in a cycle and abort it.

Deadlock prevention:
Deadlocks can be prevented by giving each transaction a priority and
ensuring that either lower priority transactions are never allowed to wait
for higher priority transactions, or vice versa.
Give each transaction a timestamp when it starts up. The lower the
timestamp, the higher the transaction's priority; that is, the oldest
transaction has the highest priority.

If a transaction Ti requests a lock and transaction Tj holds a conflicting lock,
the lock manager can use one of the following policies:

Wait-die: If Ti has higher priority, it is allowed to wait; else it is aborted.

Wound-wait: If Ti has higher priority, abort Tj; else Ti waits.

In the wait-die scheme, lower priority transactions can never wait for higher
priority transactions. In the wound-wait scheme, higher priority transactions
never wait for lower priority transactions. In either case, no deadlock cycle
develops.
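Both policies reduce to a one-line comparison of timestamps. A Python sketch (function names and return strings are illustrative; a lower timestamp means an older, hence higher priority, transaction):

```python
def wait_die(requester_ts, holder_ts):
    """Nonpreemptive: an older requester waits, a younger one dies."""
    return "wait" if requester_ts < holder_ts else "abort requester"

def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds (aborts) the young holder."""
    return "abort holder" if requester_ts < holder_ts else "wait"

# Old transaction (ts=1) requests a lock held by a young one (ts=5),
# and vice versa:
a = wait_die(1, 5)      # the old requester waits
b = wait_die(5, 1)      # the young requester is aborted
c = wound_wait(1, 5)    # the young holder is aborted
d = wound_wait(5, 1)    # the young requester waits
```

In both schemes all waiting happens in a single direction of priority, so no wait-for cycle can ever form.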

When a transaction is aborted and restarted, it should be given the same
timestamp it had originally. This ensures that each transaction will
eventually become the oldest transaction, and therefore the one with the
highest priority, and will get all the locks it requires.

The wait-die scheme is nonpreemptive: only a transaction requesting a
lock can be aborted. As a transaction grows older and its priority increases,
it tends to wait for more and more younger transactions.

				