Concurrent Union: A System for SQL-level Heterogeneous Database Replication
Ben Vandiver
May 17, 2005

System Interface

The replication system, called ConcurrentUnion (CU), acts as an intermediary between a database application and the database replicas. Sequences of SQL statements, grouped into transactions, are submitted to the CU, which then forwards them to the database replicas. Database applications interface with the CU by opening a network connection for each transaction. SQL statements are issued and the responses returned. When the transaction is complete, the application sends either a rollback or a commit signal. At this point, the CU either aborts the transaction or verifies the correctness of the transaction execution against the replicas and commits it. Any number of transactions may be in progress with the CU at any time.

System Implementation

The CU divides the database replicas into a primary replica and a set of secondary replicas. The CU first executes each transaction on the primary replica, stopping after all the statements have been executed but before committing the transaction. This will be called the commit point, and it is important because all required locks have been acquired at this point. Furthermore, since the system is interactive, the results produced by executing each statement on the primary are sent back to the database application. The CU knows that the commit point has been reached when all statements have been executed and the database application sends a commit signal for the transaction. Should errors be encountered in the SQL, they are detected while running against only the primary, and the transaction is discarded. After executing the transaction to the commit point on the primary, the transaction is executed to the commit point on the secondary replicas. The results of the executions on each replica are compared, and if they all match, the transaction is committed on the secondaries and then on the primary.
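The flow described above can be sketched as follows. This is a minimal illustration, not the actual CU implementation: the `Replica` class and its `execute`/`commit`/`rollback` methods are hypothetical stand-ins for real database connections, and result comparison is simplified to list equality.

```python
class Replica:
    """Hypothetical stand-in for a database replica connection."""

    def __init__(self, name):
        self.name = name
        self.pending = []  # statements executed but not yet committed

    def execute(self, stmt):
        # A real replica would run the SQL and return rows; here we echo.
        self.pending.append(stmt)
        return ("result", stmt)

    def commit(self):
        self.pending.clear()

    def rollback(self):
        self.pending.clear()


def run_transaction(stmts, primary, secondaries):
    """Execute to the commit point on the primary, then on each secondary,
    compare results, and commit secondaries-first, as described above."""
    primary_results = [primary.execute(s) for s in stmts]  # commit point
    for sec in secondaries:
        sec_results = [sec.execute(s) for s in stmts]
        if sec_results != primary_results:
            sec.rollback()  # a mismatching secondary is rolled back
    for sec in secondaries:
        sec.commit()
    primary.commit()  # primary commits last
    return primary_results
```

Note that committing the secondaries before the primary mirrors the ordering argument made later: a transaction's successors on the primary cannot issue onto the secondaries until it has committed everywhere.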
If the results do not match, the majority result is declared the correct result, and the transactions on the replicas that returned incorrect results are rolled back. If the primary generated incorrect results, the entire transaction must be aborted and an error reported to the database application. To recover from transient faults, the secondaries attempt to re-execute the transaction immediately. Should this retry fail, a database repair operation commences.

Transaction Ordering and Serializability

To ensure that the data remains the same on every replica, each replica must execute and commit the transactions in an equivalent serial order. The primary executes all transactions at the serializable transaction isolation level, ensuring that it commits transactions in a serial order. Since transactions are executed on the secondaries while all appropriate locks are held on the primary, two transactions running concurrently on the secondaries must not interact. Therefore, any commit order for these transactions on the secondaries is an equivalent serial order.

The possibility of deadlock in the CU arises because of inter-database lock contention. Suppose transactions A and B both modify the same row of a table. If A acquires the lock in database 1 but B acquires the lock in database 2, then neither transaction will reach the commit point on all replicas. The deadlock detectors in the databases do not detect this deadlock because the dependencies that cause it are present in the CU, not the database.

If the primary and the secondaries all share the same granularity locking scheme, then no deadlocks will occur in the CU. Since all transactions are executed to the commit point on the primary before any statements are issued to the secondaries, a transaction will have acquired all the relevant locks on the primary when executing on the secondaries. For a deadlock to occur on the secondaries, two transactions must both need the same lock.
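The majority-vote step might be sketched as below; the function name and the representation of a replica's output as a list of per-statement results are illustrative assumptions, not the CU's actual interface.

```python
from collections import Counter

def majority_result(replica_results):
    """Return the result produced by a strict majority of replicas,
    or None if no result achieves a strict majority.

    replica_results: one list of per-statement results per replica.
    """
    counts = Counter(tuple(r) for r in replica_results)
    winner, votes = counts.most_common(1)[0]
    return list(winner) if votes > len(replica_results) // 2 else None
```

If the primary's own result differs from the winner returned here, the whole transaction is aborted, as described above; if a secondary differs, only that secondary is rolled back and retried.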
However, since only one of the transactions could have acquired the lock on the primary, only one of them would have reached the commit point and been able to execute on the secondaries. This proof sketch can be extended to show that cycles larger than two transactions also cannot penetrate the primary to cause deadlock on the secondaries.

A transaction serialized by the primary after a previous transaction cannot overrun its predecessor on the secondaries. If the primary establishes the order A then B, and B could sneak ahead and overrun A on the secondaries, a deadlock could result. However, B is not allowed to issue to the secondaries until A has committed on the primary. Since the primary commits after all the secondaries, A is guaranteed to have committed on all secondaries before B issues onto them.

Finer granularity locking by the primary than the secondaries could result in deadlocks in the CU. Suppose two transactions A and B modify rows I and II, respectively, in some table. If the primary supports row-level locking, it will pass both transactions through to the secondaries (no conflicts). However, if the secondaries support page-level locking and rows I and II fall on the same page, then a deadlock could result. Coarser granularity locking by the primary results in less concurrency but less chance of deadlock.

Assuming that the primary correctly serializes transactions, any deadlocks on the secondaries are an artifact of locking granularity. By running the secondaries at a very low isolation level, these deadlocks could be avoided without loss of correctness, thanks to the serialization provided by the primary. However, if the primary malfunctions and does not correctly serialize transactions, this scheme results in the CU unknowingly behaving incorrectly.

Performance

Other than the indirection overhead, the CU performs like a normal database until the commit is issued.
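The ordering rule above (B may not issue to the secondaries until A has committed on the primary, and the primary commits last) can be sketched with a simple gate. This is a deliberately coarse illustration, assuming a single lock held from secondary-commit through primary-commit; the real CU only needs to order transactions that actually share locks.

```python
import threading

class CommitGate:
    """Sketch of the ordering rule described above: a later transaction
    cannot commit on any replica until its predecessor has committed on
    the primary. Holding one lock across the whole commit phase is the
    simplest (fully serial) realization of that rule."""

    def __init__(self):
        self._gate = threading.Lock()

    def run(self, txn, commit_on_secondaries, commit_on_primary):
        with self._gate:  # no later transaction can overtake inside the gate
            commit_on_secondaries(txn)
            commit_on_primary(txn)  # primary commits last, releasing the gate
```

A production scheduler would replace the single lock with per-lock-set ordering so that non-interacting transactions still commit concurrently, as the Performance section notes.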
The commit can take quite a long time, as the transaction must be re-run against the secondaries and the results compared. Assuming no tricks with locks, this process should take about the same time as it took the primary to execute the transaction. Thus, the total time to complete a transaction is expected to be about twice normal. Unfortunately, no pipelining can occur, because of the requirement that the locks on the primary be held until the transaction has finished on the secondaries.

Transactions that do not interact run concurrently on the primary and then on the secondaries. Concurrent execution is limited in the presence of writes. Particularly bad is the case where many transactions perform a long-duration read followed by an unrelated write. An optimal scheduler would realize that since no updates are happening to the tables that source the long-duration read, this statement could be executed immediately on the secondaries without risk of deadlock or incorrect serialization.

Future Thoughts

Incorrect Serialization

Bugs in the locking code that allow simultaneous access where none should occur are very hard to detect. It is impossible to detect this type of error by checking against a database with larger lock granularity. The only recourse is to drop down to the data being serialized upon: extract the primary keys of the rows/pages/tables being locked and verify that the set is a superset of the rows being modified. However, this data-direct method is both slow and very hard to make accurate enough to matter, compared to the expected rate of locking bugs.

Statement-level Pass Through

Instead of executing the whole transaction on the primary before passing it through to the secondaries, pass each statement on to the secondaries after it finishes executing on the primary. When a statement returns a result, all the appropriate locks must be held; thus it is suitable to pass it on to the secondaries at this point.
This modification does not substantially change the deadlock or correctness issues discussed above. It does shift most of the work performed by the secondaries to be concurrent with the primary's execution. Thus, the commit phase requires only waiting for any straggling secondaries to finish, instead of a full execution. If the transaction is interactive, the round-trip latency of sending the response and receiving the next statement may be enough for the secondaries to execute the previous statement. Finally, the result comparison can be done incrementally, allowing the CU to work concurrently as well. This could also enable a fail-faster scheme, where data errors are caught before the whole transaction finishes execution.
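Statement-level pass-through with incremental comparison might be sketched like this. The `Conn` class and its methods are hypothetical stand-ins for real database connections, and for simplicity the sketch forwards to the secondaries sequentially rather than concurrently; the point it illustrates is the fail-faster check after each statement.

```python
class ResultMismatch(Exception):
    """Raised as soon as a secondary's result diverges from the primary's."""

class Conn:
    """Hypothetical stand-in for a database connection."""

    def __init__(self):
        self.pending = []  # statements executed but not yet committed

    def execute(self, stmt):
        self.pending.append(stmt)
        return ("ok", stmt)  # a real connection would return query results

    def commit(self):
        self.pending.clear()


def passthrough_transaction(stmts, primary, secondaries):
    """Forward each statement to the secondaries as soon as it finishes on
    the primary, comparing results incrementally so a divergence is caught
    before the whole transaction completes (the fail-faster scheme above)."""
    for stmt in stmts:
        expected = primary.execute(stmt)  # locks are held once this returns
        for sec in secondaries:
            if sec.execute(stmt) != expected:
                raise ResultMismatch(stmt)
    for sec in secondaries:  # all results matched: commit secondaries first
        sec.commit()
    primary.commit()
```

In a concurrent version, each secondary would consume statements from its own queue while the primary runs ahead, and the per-statement comparison would happen as results arrive.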