Concurrent Union A System for SQL-level Heterogenous Database

W
Description

Secondaries used for venture capital and private equity limited partnership interest of the original investors in the market, because the original investors in the limited partnership intends to invest cash prior to the termination. The original variety of reasons investors may want to sell their shares held by private equity firms, such as the need for cash, change the investment strategy or focus, or need to re-balance the portfolio and so on. For investors, concerned about the sub-unit of the main advantages is that they do not have to as long as the IPO investors to invest in private equity funds.

Shared by: jlhd32
-
Stats
views:
2
posted:
7/1/2011
language:
English
pages:
3
Document Sample
scope of work template
							                         Concurrent Union
      A System for SQL-level Heterogenous Database Replication

                                              Ben Vandiver

                                              May 17, 2005



System Interface
The replication system, called ConcurrentUnion (CU), acts as an intermediary between a database appli-
cation and the database replicas. Sequences of SQL statements, grouped into transactions, are submitted
to the CU, which then forwards them on to the database replicas. Database applications interface with the
CU by opening a network connection for each transaction. SQL statements are issued and the responses
returned. When the transaction is complete, the application sends either a rollback or a commit signal.
At this point, the CU either aborts the transaction or verifies the correctness of the transaction execution
against the replicas and commits the transaction. Any number of transactions may be in progress with the
CU at any time.


System Implementation
The CU divides the database replicas into a primary replica and a set of secondary replicas. The CU first
executes each transaction on the primary replica, stopping after all the statements have been executed, but
before committing the transaction. This will be called the commit point, and is important because all
required locks have been acquired at this point. Furthermore, since the system is interactive, the results
produced by executing each statement on the primary will be sent back to the database application. The CU
knows that the commit point has been reached when all statements have been executed and the database
application sends a commit signal for the transaction. Should errors be encountered in the SQL, these are
detected running against only the primary, and the transaction discarded.
    After executing the transaction until the commit point on the primary, the transaction is executed to the
commit point on the secondary replicas. The results of the executions on each replica are compared, and
if they all match, the transaction is committed on the secondaries and then on the primary. If the results
do not match, the majority result is declared the correct result and the transactions on the replicas which
returned incorrect results are rolled back. If the primary generated incorrect results, the entire transaction
must be aborted and an error reported to the database application. In order to recover from transient faults,
the secondaries attempt to re-execute the transaction immediately. Should this retry fail, then a database
repair operation commences.


Transaction Ordering and Serializability
In order to ensure that the data remains the same on every replica, each replica must execute and commit
the transactions in an equivalent serial order. The primary executes all transactions at the serializable
transaction isolation level, ensuring that it commits transactions in a serial order. Since transactions are


                                                      1
executed on the secondaries while all appropriate locks are held on the primary, two transactions running
concurrently on the secondaries must not interact. Therefore, any commit order for these transactions on
the secondaries is an equivalent serial order.
   The possibly of deadlock in the CU arises because of inter-database lock contention. Transaction A and
B both modify the same row of a table. If A acquires the lock in database 1, but B acquires the lock
in database 2, then neither transaction A or B will reach the commit point on all replicas. The deadlock
detectors in the database do not detect this deadlock because the dependancies that cause it are present in
the CU, not the database.
   If the primary and the secondaries all share the same granularity locking scheme, then no deadlocks will
occur in the CU. Since all transactions are executed to the commit point on the primary before any statements
are issued to the secondaries, the transaction will have acquired all the relevant locks on the primary when
executing on the secondaries. In order for a deadlock to occur in the secondaries, two transactions must
both need the same lock. However, since only one of the transactions could have acquired the lock on the
primary, only one of the transactions would have reached commit point and been able to execute on the
secondaries. This proof sketch can be extended to show that cycles larger than two transactions also will
not penetrate the primary to cause deadlock on the secondaries.
   A transaction serialized by the primary after a previous transaction cannot overrun its predecessor on
the secondaries. If the primary establishes the order A then B, if B could sneak ahead and overrun A on
the secondaries, a deadlock could result. However, the B is not allowed to issue to the secondaries until A
has committed on the primary. Since the primary commits after all the secondaries, A is guaranteed to have
committed on all secondaries before B issues onto them.
   Finer granularity locking by the primary than the secondaries could result in deadlocks in the CU.
Suppose two transactions A and B modify rows I and II, respectively, in some table. If the primary supports
row-level locking, it will pass both transactions through to the secondaries (no conflicts). However, if the
secondaries support page-level locking and rows I and II fall on the same page, then a deadlock could result.
Larger granularity locking by the primary results in less concurrency, but less chance of deadlock.
    Assuming that the primary correctly serializes transactions, any deadlocks in the secondaries are the
artifact of locking granularity. By running the secondaries at a very low isolation level, these deadlocks
could be avoided without loss of correctness due to the serialization provided by the primary. However, if the
primary malfunctions and doesn’t correctly serialize transactions, this scheme results in the CU unknowingly
behaving incorrectly.


Performance
Other than the indirection overhead, the CU performs like a normal database until the commit is issued.
The commit can take quite a long time, as the transaction must be re-run against the secondaries and the
results compared. Assuming no tricks with locks, this process should take about the same time as it took
the primary to execute the transaction. Thus, the total time to complete a transaction is expected to be
about twice normal. Unfortunately, no pipelining can occur due to the requirement of holding the locks on
the primary until the transaction has finished on the secondaries.
    Transactions that don’t interact run concurrently on the primary and then on the secondaries. Concurrent
execution is limited in the presence of writes. Particularly bad is the case where many transactions perform a
long duration read followed by an unrelated write. An optimal scheduler would realize that since no updates
are happening to the tables that source the long duration read, this statement could be executed immediately
on the secondaries without risk of deadlock or incorrect serialization.




                                                      2
Future Thoughts

Incorrect Serialization

Bugs in the locking code that allow simultanious access where none should occur are very hard to detect.
It is impossible to detect this type of error by checking it against a database with larger lock granularity.
The only recourse is to drop down to the data being serialized upon. Extract the primary keys of the
rows/pages/tables being locked and verify that the set is a superset of the rows being modified. However,
the data direct method is both slow and very hard to get accurate enough to matter compared to the expected
rate of lock bugs.


Statement-level Pass Through

Instead of executing the whole transaction on the primary before passing it through to the secondaries, pass
statements on to the secondaries after they finish executing on the primary. When a statement returns a
result, all the appropriate locks must be held. Thus it is suitable to pass it on to the secondaries at this point.
This modification doesn’t really change the deadlock or correctness issues discussed above. It does lift most
of the work performed by the secondaries to be concurrent with the primary execution. Thus, the commit
phase requires only waiting for any straggling secondaries to finish up instead of a full execution. If the
transaction is interactive, the round-trip latency of sending the response and receiving the next statement
may be enough for the secondaries to execute the previous statement. Finally, the result comparison can be
done incrementally, thus allowing the CU to work concurrently as well. This could also allow a fail-faster
scheme where data errors are caught before the whole transaction finishes execution.




                                                        3

						
Related docs
Other docs by jlhd32
Deputy Sheriff _50_ Cooper Standard_
Views: 5  |  Downloads: 0
Body Contouring by Liposculpture
Views: 46  |  Downloads: 0
Hockey Boys 1011 results
Views: 7  |  Downloads: 0
Cartelettes menu Spa - Milan GB.indd
Views: 13  |  Downloads: 0
Rally for Romanoff Denver Invite
Views: 8  |  Downloads: 0
CONSOLIDATED INCOME STATEMENT_3_
Views: 2  |  Downloads: 0