On Choosing a Load-Balancing Algorithm for Parallel Databases with Time Constraints
Luís Fernando Orleans1, Geraldo Zimbrão1, Carlo Emmanoel Tola de Oliveira2
1 COPPE/UFRJ – Computer Science Department – Graduate School and Research in Engineering – Federal University of Rio de Janeiro
2 NCE/UFRJ – Computer Science Department – Federal University of Rio de Janeiro
{lforleans,zimbrao}@cos.ufrj.br, carlo@nce.ufrj.br
1. Motivation
Load balancing is an almost exhausted research theme. Many different strategies have been proposed, most of them targeting different scenarios. As the use of parallel databases has grown in recent years, it is inevitable that some of these load-balancing algorithms will be adopted to distribute incoming transactions among the database servers. The question that remains open is how these strategies perform when applied to a transactional database. Do concurrency, isolation and replication constraints affect their performance? Furthermore, supposing that transactions have to be executed before a deadline, can the load-balancing strategy affect, positively or negatively, the miss rate (number of broken deadlines per unit of time)?
We do not propose a new load-balancing algorithm; our concern is how existing strategies behave when:
1. They are applied in a complex scenario, such as relational databases with ACID transactions; and
2. Submitted tasks (or transactions) have to meet deadlines.
Intuitively, the number of read-only transactions has a great impact on performance, as they require neither replication nor concurrency and isolation guarantees. For such transactions, deadlines can be satisfied through good query optimization, although this still depends on the current load of the system (stressed servers have higher response times). Update transactions, on the other hand, have to satisfy the ACID properties, which is far more complicated. Rows, tables or pages must be locked before a transaction executes in order to guarantee isolation, and data modifications must be replicated on all sites to keep replicas consistent. It is hard to create a model that guarantees deadlines in this scenario, especially when the system is under high load.
2. Experiments
To better understand the implications of ACID properties on load-balancing algorithms, we performed an initial round of experiments using the TPC-C benchmark (http://www.tpc.org/tpcc/) on a fully replicated database with two servers. To create a more realistic scenario, we modified the workload generation: according to related work [Harchol-Balter, M. 2002], typical transactional workloads are heavy-tailed, which is not the case for TPC-C's default workload. We also modified the TPC-C code to simulate an open model, i.e., transactions are sent to the system according to a statistical distribution (e.g. an exponential distribution). For more information about open and closed simulation models, please refer to [Schroeder, B.; Wierman, A.; Harchol-Balter, M. 2006]. Due to space limitations, we do not describe the experimental setup any further; interested readers may refer to [Orleans, L.F. 2007].
2.1. Midas Middleware
The Midas middleware [Orleans, L.F. 2007] is a tool that uses the Proxy design pattern: it intercepts commands sent from an application to a database server and adds extra functionality, such as admission control, replication and load balancing. A simplified class diagram is shown in Figure 1.
Figure 1: Midas class diagram. It uses the Proxy and Strategy design patterns.
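The combination of the Proxy and Strategy patterns described above can be sketched as follows. This is a minimal illustration, not the Midas code: the class names (LoadBalancingStrategy, LeastWorkRemaining, FakeServer, DatabaseProxy), the estimated_cost parameter and the complete() callback are all assumptions introduced here. The proxy exposes the same execute() call an application would issue against a server and delegates the choice of target to a pluggable strategy.

```python
from abc import ABC, abstractmethod


class LoadBalancingStrategy(ABC):
    """Strategy interface: returns the index of the server that should run a transaction."""

    @abstractmethod
    def choose(self, pending_work, estimated_cost):
        raise NotImplementedError


class LeastWorkRemaining(LoadBalancingStrategy):
    """Dynamic policy: pick the server with the least outstanding work (ties go to the lowest index)."""

    def choose(self, pending_work, estimated_cost):
        return min(range(len(pending_work)), key=lambda i: pending_work[i])


class FakeServer:
    """Hypothetical stand-in for a database connection; it only records the commands it receives."""

    def __init__(self, name):
        self.name = name
        self.executed = []

    def execute(self, sql):
        self.executed.append(sql)
        return f"{self.name}: ok"


class DatabaseProxy:
    """Proxy: intercepts commands and forwards each one to the server chosen by the strategy."""

    def __init__(self, servers, strategy):
        self.servers = servers
        self.strategy = strategy
        self.pending_work = [0.0] * len(servers)  # assumed bookkeeping, in arbitrary cost units

    def execute(self, sql, estimated_cost=1.0):
        """Route one command; its cost stays pending until complete() is called."""
        target = self.strategy.choose(self.pending_work, estimated_cost)
        self.pending_work[target] += estimated_cost
        result = self.servers[target].execute(sql)
        return target, result

    def complete(self, target, estimated_cost):
        """Signal that a previously routed command has finished on the given server."""
        self.pending_work[target] -= estimated_cost


if __name__ == "__main__":
    proxy = DatabaseProxy([FakeServer("s0"), FakeServer("s1")], LeastWorkRemaining())
    for name, cost in [("NEW_ORDER", 5.0), ("PAYMENT", 1.0), ("STOCK_LEVEL", 1.0)]:
        target, _ = proxy.execute(name, cost)
        print(name, "->", proxy.servers[target].name)
```

Swapping the load-balancing policy then only requires passing a different LoadBalancingStrategy subclass to the proxy; the application code calling execute() does not change.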
2.2. Load-Balancing and Replication Algorithms
So far, this work has considered three load-balancing algorithms of different natures:
1. Least-Work-Remaining (LWR): takes the utilization of each server into consideration before dispatching a new task. According to [Nelson, Philips 1989] and [Nelson, Philips 1993], this algorithm is supposed to be optimal when the variability of task durations is low, which is not the case in real-world scenarios. We chose it as an example of dynamic load balancing.
2. Size Interval Task Assignment with Equal Load (SITA-E): a size-aware strategy, which means that the sizes (or durations) of transactions must be known a priori. For more information, please refer to [Harchol-Balter, M.; Crovella, M.; Murta, C. 1999]. This algorithm is supposed to perform better when the variability of task durations is high. We chose it to analyze the performance of a size-aware technique with no deadline concerns.
3. On-demand Restriction for Big Tasks (ORBITA): this algorithm is both size- and deadline-aware. For more information, please refer to [Orleans, L.F., Furtado, P.N. 2007a].
To classify transactions as short or long we used a map. As TPC-C has only five different transactions, we could store the pounded (i.e., weighted) mean response time (PMRT) of each one. Hence, each map entry was a pair {transaction name, PMRT}. A transaction whose PMRT was greater than a threshold (1 second, arbitrarily chosen) was classified as long; otherwise it was classified as short.
To ensure replication, we used the primary copy strategy, since [Wiesmann, Schiper 2005] indicates it should be the best choice for eager replication.
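The classification map described above can be sketched as follows. The text does not say how the weighted mean is computed, so this sketch assumes an exponentially weighted moving average with a hypothetical smoothing factor alpha; the class name, the function name, the sample response times and the convention of sending short transactions to server 0 are likewise illustrative. The routing helper only shows how a size-aware dispatcher could consume the classification. It does not reproduce SITA-E's equal-load cutoff computation or ORBITA's deadline handling.

```python
THRESHOLD_SECONDS = 1.0  # threshold from the text (arbitrarily chosen there)


class TransactionClassifier:
    """Keeps one {transaction name: PMRT} entry per transaction type."""

    def __init__(self, threshold=THRESHOLD_SECONDS, alpha=0.2):
        self.threshold = threshold
        self.alpha = alpha  # assumed smoothing weight; not given in the text
        self.pmrt = {}      # transaction name -> weighted mean response time (seconds)

    def record(self, name, response_time):
        """Fold a newly observed response time into the weighted mean for this type."""
        previous = self.pmrt.get(name)
        if previous is None:
            self.pmrt[name] = response_time
        else:
            self.pmrt[name] = (1 - self.alpha) * previous + self.alpha * response_time

    def classify(self, name):
        """Return 'long' if the PMRT exceeds the threshold, 'short' otherwise.

        Transaction types never seen before default to 'short' (an assumption).
        """
        return "long" if self.pmrt.get(name, 0.0) > self.threshold else "short"


def size_aware_route(classifier, name):
    """Two-server size-aware routing: server 0 takes short transactions, server 1 long ones."""
    return 1 if classifier.classify(name) == "long" else 0


if __name__ == "__main__":
    classifier = TransactionClassifier()
    # Made-up response times, for illustration only.
    for name, seconds in [("delivery", 2.0), ("delivery", 1.0), ("stock_level", 0.1)]:
        classifier.record(name, seconds)
    for name in ("delivery", "stock_level"):
        print(name, classifier.classify(name), "-> server", size_aware_route(classifier, name))
```

Because only five transaction types exist, the map stays tiny and a lookup at dispatch time is cheap compared with executing the transaction itself.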
3. Results
The most important metric for this initial study was the throughput of transactions within the deadline (TWD). Figures 2a and 2b show that LWR performs very poorly: its TWD for long transactions is nonexistent. This is caused by the number of locks obtained by transactions to ensure ACID properties. As expected, both SITA-E and ORBITA have better TWDs, but only the latter remains robust when the arrival rate increases (SITA-E's TWD for long transactions drops to zero very quickly as the arrival rate increases). For short transactions, the LWR strategy again performs badly. This happens because long and short transactions are mixed on all servers. Intuitively, long transactions are I/O-bound and generate a large number of locks due to data modifications. For short transactions, SITA-E and ORBITA perform similarly. Interestingly, replication cost seems to have an almost insignificant impact on the server dedicated to executing the short transactions. This can be explained by the combination of two factors:
1. The replication module forwards only the write set of the whole transaction. Thus, its execution on the target server is faster than on the original site;
2. Short transactions have their isolation level set to "READ COMMITTED". Furthermore, if a transaction is read-only, it is marked so that it does not acquire locks.
[Figure 2 plots: two panels, (a) and (b); x-axis: Arrival Rate (Transactions per Second), from 2 to 20; y-axis: Transactions per Minute; series: LWR, SITA-E, ORBITA.]
Figure 2: TWD of long transactions (a) and short transactions (b)
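The TWD metric reported in this section, together with the miss rate defined in Section 1, can be computed from a log of response times. The sketch below assumes a single deadline shared by all transactions in the log and a measurement window expressed in seconds; the function names, the 2-second deadline and the sample values are hypothetical, since the text does not state the deadlines used in the experiments.

```python
def twd_per_minute(response_times, deadline, window_seconds):
    """Throughput of transactions within the deadline, in transactions per minute."""
    met = sum(1 for t in response_times if t <= deadline)
    return met * 60.0 / window_seconds


def miss_rate_per_minute(response_times, deadline, window_seconds):
    """Number of broken deadlines per minute over the same window."""
    missed = sum(1 for t in response_times if t > deadline)
    return missed * 60.0 / window_seconds


if __name__ == "__main__":
    # Made-up response times (seconds) observed during a 120-second window.
    observed = [0.5, 1.2, 3.0, 0.8]
    print("TWD:", twd_per_minute(observed, deadline=2.0, window_seconds=120))
    print("Miss rate:", miss_rate_per_minute(observed, deadline=2.0, window_seconds=120))
```

Computing both values from the same log makes it explicit that a transaction which completes late still consumes server capacity but contributes nothing to TWD.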
4. Early Conclusions and Future Work
This work studies the implications of ACID and time-constraint properties on load-balancing strategies. Preliminary results show that concurrency control, isolation and replication do have a negative impact on performance, but a deeper study of the reasons and of possible solutions is still missing and remains as future work. We also intend to investigate how ORBITA can be made to adapt automatically to actual workloads and arrival rates, and to work on a time-constrained version of LWR and compare the approaches.
References
Harchol-Balter, M. (2002), Task assignment with unknown duration. Journal of the ACM.
Harchol-Balter, M.; Crovella, M.; Murta, C. (1999), On choosing a task assignment policy for a distributed server system. Journal of Parallel and Distributed Computing, v. 59, n. 2, p. 204-228.
Harchol-Balter, M.; Downey, A. (1997), Exploiting process lifetime distributions for dynamic load balancing. ACM Transactions on Computer Systems.
Nelson, R.; Philips, T. (1993), An approximation for the mean response time for shortest queue routing with general interarrival and service times. Performance Evaluation, p. 123-139.
Nelson, R.; Philips, T. (1989), An approximation to the response time for shortest queue routing. Performance Evaluation Review, p. 181-189.
Orleans, L.F.; Furtado, P.N. (2007a), Fair Load-Balancing on Parallel Systems for QoS. ICPP International Conference on Parallel Processing, p. 22.
Orleans, L.F.; Furtado, P.N. (2007b), Optimization for QoS on Web-Service-Based Systems with Tasks Deadlines. ICAS Third International Conference on Autonomic and Autonomous Systems, p. 6.
Orleans, L.F. (2007), "ORBITA: Uma Estratégia de Balanceamento de Carga para Tarefas com Restrições Temporais", M.Sc. Dissertation, NCE/UFRJ.
Schroeder, B.; Wierman, A.; Harchol-Balter, M. (2006), Open Versus Closed: A Cautionary Tale. Network System Design and Implementation, San Jose, CA, p. 239-252.
Wiesmann, M.; Schiper, A. (2005), Comparison of Database Replication Techniques Based on Total Order Broadcast. IEEE Transactions on Knowledge and Data Engineering, v. 17, n. 4, p. 551-566.