ACEEE Int. J. on Network Security, Vol. 01, No. 03, Dec 2010
© 2010 ACEEE, DOI: 01.IJNS.01.03.85

Multilevel Hybrid Cognitive Load Balancing Algorithm for Private/Public Clouds using the concept of Application Metadata

Bharath C. and Spoorthy A. Raman
International Institute of Information Technology, Bangalore, India
Email: {bharath.c, spoorthya.raman}@iiitb.net

Abstract: Cloud computing is an emerging computing paradigm. It aims to share data, resources and services transparently among users of a massive grid. Although the industry has started selling cloud-computing products, research challenges in various areas, such as architectural design, task decomposition, task distribution, load distribution, load scheduling and task coordination, are still unclear. Therefore, we study methods to reason about and model cloud computing as a step towards identifying fundamental research questions in this paradigm. In this paper, we propose a model for load distribution in cloud computing by modeling clouds as cognitive systems and using aspects which depend not only on the present state of the system, but also on a set of predefined transitions and conditions. The entirety of this model is then bundled to cater to the task of job distribution using the concept of application metadata. Later, we draw a qualitative and simulation-based summarization for the proposed model. We finally evaluate the results and draw up a series of key conclusions in cloud computing for future exploration.

Index Terms: Cloud Computing, Load Balancing, Application Metadata.

I. INTRODUCTION AND RELATED WORK

The problem of load balancing has always been challenging, be it in terms of a mechanical machine or in distributed systems and cloud-computing systems [1]. This is attributed to the fact that these systems have polymorphic requirements that span different domains and domain parameters. Cloud computing, in particular, has had a wide range of solutions to this problem, ranging from the strata involving operating systems, to distributed application design, to designing high-level game-theoretic optimizations and even building machines to host computing sub-trees (similar to a remote computing machine) [1]. However, even though these models are different morphologically, they have all had a common goal of increasing economic efficiency and providing supplementary computing power to a cloud of users when it is needed. These methods also have the same problem to sort out, which is to “Optimally Balance and Distribute Loads”.

The classic approach to solving this problem is to use algorithms that analyze the current state of the system, as in the work of Shivaratri et al. [4]. This method has some advantages, but it also has several disadvantages, such as the need for medium or large computing power, and because of these problems its scalability is poor on larger systems. The other approach is to make estimations that predict the future system state and then behave in a pre-coded manner as required, as in [7, 8]. This is a far more efficient solution, but it is rather complicated in terms of its algorithmic complexity and ease of modeling. All said and done, the foundation of this approach is elegant and imposing, and it in fact seems to be the future in this domain, more so because the classical approach can drive a dramatic increase in the need for resources. In other words, the speed-up achieved in the classical approach is very low or worse. This happens because of the behavior of the workstation user, who can decide to change the computing needs at any moment. For this reason remote tasks are unacceptably delayed and might even be considered dead by some elements, which increases the need for the cloud elements to compute the task locally or send it to another lightly loaded station elsewhere. One way of estimating a workstation's load is to use statistical approaches such as load functions, which are obtained through repeated measurements and a large number of a priori experiments. These functions often have a Gaussian aspect, like many other models of natural processes. But the basic flaw in these methods is that they ignore the very cause of the load of a workstation, which is the behavior of the individual user.

Since both types of algorithms mentioned above fail to solve practical problems, suitable variations must be made to them to exploit their capabilities. In this paper, we modify algorithms of the first type, which include continuous monitoring of the system state. In addition to monitoring, there is a set of transition rules specified for the system states based on the type of the inputs received. So, given an input a, when the present state of the system is q1, the system goes to state q2. But a simple transition of this type is not possible without some a priori knowledge of the input a. This knowledge is given in terms of parameters in the application metaheader that accompanies every task to be processed on the cloud. The major parameters included for the efficient working of the system are the number of processor cycles required for the application's execution as observed during the development and testing phases, the total memory occupied by it while executing, and architecture-based metrics. These parameters further help in defining an optimal transition function for the system. Thus, the system is able to think cognitively about what should be done with the load and where and how it must be scheduled to balance the entire system load.

II. SYSTEM MODEL

A. The Application Metaheader Model

The Application Metaheader is a concept conceived by us. It deals with embedding a priori knowledge about the cloud application under consideration. The technique is to affix a compulsory, descriptive and structured metaheader to each incoming cloud application (analogous to the well-known TCP/IP headers). Formally, this technique can be worded as follows.

Let S be the header of an incoming cloud application. S is modeled as a set containing the sub-application parameters:

[equation missing in source]

where the ith element of S denotes the ith of the n sub-application parameters constituting the cloud application. Each element is in turn composed of a set of task-associated data, which can be formulated mathematically as an ordered set:

[equation missing in source]

where the elements are:
- the estimated percentage of processor usage needed for sub-application i, as observed on a given processor architecture (multiple such values are present, one for each processor architecture);
- the estimated memory used by the process, in bytes;
- a description of each sub-application component, along with its length and displacement from the beginning of the code.

The sub-application and application related data are obtained during the development and testing phase of the cloud application. They are affixed to the code before it is run on the cloud system. They form the metaheader since they are the meta-knowledge required to provide information about the application in the system.

B. The Mathematical Model

Starting with the mathematical modeling of the system, the load of the system is defined as follows:

[equation missing in source]

The system load of a cloud computing system is defined as the sum of the loads on each processing node forming the cloud, as determined by the processor activity. The state term of each node is determined by the “on” or “off” state of that node: it takes the value 1 if the node is on and 0 otherwise. The load on each node, according to [9], is given by:

[Equation (2) missing in source]

Now, there are two ways of looking at the load term in the above equation. In the start state of the entire system, its value is zero since there is no load on the system. In this case, the load balancer is expected to do an optimal load scheduling before balancing. On the other hand, when the system is already loaded, the load balancer must schedule the new process as well as balance the load globally across the system. The value t in Equation (2) is a real number (reflecting the situation at a certain moment), and k is termed the fatigue factor and lies in the closed interval [0, 1]. The value k is given by Equation (3):

[Equation (3) missing in source]

Here, k is the fatigue factor. The remaining terms of Equation (3), whose symbols are missing in the source, are: the present performance capacity of the CPU, a complex, statistically determined function of factors such as the age of the CPU, its average fan speed, clock speed and average fetch time; the present temperature and the normal (optimal) temperature rating of the CPU; and the deterioration quotient, which is determined by real-life temperature tests and is provided by the vendor. (It is calculated by running the CPU at a constant 60 °C, then 70 °C; the resulting decrease in CPU life determines the quotient, as explained in [3].) Apart from these, other indicators such as video memory and virtual memory may be considered for a detailed analysis, as depicted in the following relation:

[equation missing in source]

Each of these values depends on the user behavior, which leads to different values of the sub-application parameters for different tasks. Given these relations and functions for the system, the load balancer algorithm must be a function of the following type:

[equation missing in source]

This equation states that, given the present state of the system, an input task with its sub-application metaheader, and the present load on the system, the load balancer acts “cognitively” and moves the system to a new state with a new load.

The above equations illustrate the situation for an incremental load on the system since its start. Now, let us model the case of a decremental load, where the load in the system decreases. This is modeled as another incoming task whose sub-application metaheader values all read zero. Now that we have the values of the metaheader, let us see how the total system load is recalculated.

Consider the load of the system while the sub-application with a given header was executing, and the load contributed by that sub-application. The total load after the sub-application exits is then as follows:

[equation missing in source]

This change is triggered as soon as there is a change in the processor and memory utilization of a particular node.

III. DETAILED ALGORITHM

Now that we have a basic mathematical model of the algorithmic prototype, our next aim is to wire these mathematical models to the working of the proposed algorithm. Our basic objective, as discussed earlier, is to derive a new state for the total load in a cloud by intelligently modeling the transition rules and transition parameters. For this we define three basic criteria of parameterization, namely the current-state-of-system, the affinity-to-work and the possible-system-fatigue-state k, together with the expected-dominant-behavior rule, which maps the objective parameterized values of the load to a more generic, abstract set of three labels: “High”, “Moderate” and “Low”. The importance of the sub-application metaheader is demonstrated by arriving at a suitable value of the affinity-to-work parameter using the load contribution stated in the metaheader. Since the resulting behavior depends on the affinity to work and the incoming application, the decision arrived at can be assumed to be accurate and optimal.
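
The load accounting and the three-label mapping described so far can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Node`, `system_load` and `to_label` are hypothetical, each node's load is taken as a given number rather than computed from Equation (2), and the cut points (below 30% is “Low”, below 60% is “Moderate”, otherwise “High”) are an assumption based on the 30% step per stage listed in Table I.

```python
from dataclasses import dataclass

# Assumed step between labels, in percent (30% per stage, as in Table I).
STEP = 30.0


@dataclass
class Node:
    """One processing node; the field names are hypothetical."""
    on: bool     # "on"/"off" state, counted as 1 or 0
    load: float  # node load in percent, standing in for the per-node load equation


def system_load(nodes):
    """Sum the loads of all nodes, counting a node only while it is on."""
    return sum((1 if n.on else 0) * n.load for n in nodes)


def to_label(percent):
    """Map a utilization in [0, 100] to "Low", "Moderate" or "High"."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("utilization must be between 0 and 100")
    if percent < STEP:
        return "Low"
    if percent < 2 * STEP:
        return "Moderate"
    return "High"


if __name__ == "__main__":
    nodes = [Node(True, 72.0), Node(True, 45.0), Node(False, 90.0)]
    print(system_load(nodes))                         # the "off" node contributes nothing
    print([to_label(n.load) for n in nodes if n.on])  # labels for the active nodes
```

Keeping the labeling in one small function means a different granularity only requires changing `STEP` or the cut points, which matches the paper's remark that the threshold depends on the aggressiveness demanded from the algorithm.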
To begin the modeling process from scratch, let us consider some basic behavioral rules from the user's point of view. These take the form “If the current-state-of-system is a given state, the fatigue is k and the affinity-to-work has a given value, then the favorable behavior would be a given behavior”. These concepts are shown in Figure 1 as a state transition diagram.

Figure 1: A simple state transition diagram for the user behavior model.

Given the user behavior model and the transition rules, a generalization can also be performed on the above model that maps node states 1 to 5 to commonly performed daily computing tasks that would be ideal clients to run on a cloud computing system. These generalizations take into account the broad resource utilization pattern of these ubiquitous tasks. For example, if the system state is such that the processor utilization is high and the memory utilization is also high, we conclude that a heavy, computation-intensive task is being carried out at that node. Similarly, if the memory utilization is moderate and the processor utilization is low, it could be an input/output-intensive task such as document processing. These values of high, low and moderate are obtained as a result of the metaheader knowledge used cognitively to define the system state. Figure 2 shows the state transition diagram of Figure 1 based on the task mapping, using the state parameters and characteristics.

Figure 2: A simple state transition diagram for mapped tasks.

Now that we have modeled the user-based behavior for some basic state clauses, we can extrapolate the user behavior to work in tandem with the system behavior. To model the global system, we must integrate all these basic user behaviors, form a higher-order abstraction and approximate them into a global behavior.

IV. WORKING OF THE MODEL

After the mathematics and the state diagrams behind the proposed model, it becomes important to know how exactly the entire algorithm is supposed to work. We then carry this “ideal” expectation forward to the simulation phase to check the correctness of the model. For the working of this model, the basic input requirements are the values τ_start (start time of work), τ_work (total time of work), etc., along with the dependencies needed to calculate values such as k and Ω. The functions that are binary in behavior take values equivalent to “on” and “off”, which we have modeled as the crisp values 1 and 0. On the other hand, subjective entities such as fatigue and affinity-to-work cannot be modeled as simply 0 and 1 or “low” and “high”. For these we make a simulation-phase assumption that a “low” motivation is considered if the value of the previously defined function satisfies X < X_threshold, where the value of X_threshold depends purely on the aggressiveness and the granularity demanded from the algorithm (for example, for our simulation we have considered X_threshold to be about 30% for each scale jump). It must also be emphasized that all those values and types of activities are arbitrary, and only studying real users in real working situations can produce a valid ecological model.

Now, with the help of our user behavior and system behavior models, we can approximate the loads of the corresponding workstations in the cloud. Three load functions were selected as follows:
i. State, which shows whether the workstation is on or off.
ii. Processor, which shows the processor activity.
iii. Memory, which shows the memory use on the workstation.

These functions can then be defined in a fuzzy manner, as presented in Figure 2, with the help of the types of user behavior already computed. We then apply the previously mentioned functions and mathematical models in a simulated environment to find out how the model works. During the simulation phase a number of key efficiency observations were made. Algorithms in most of the literature use only the instantaneous run-queue length (i.e., the number of jobs being served or waiting for service at the sampling instant) as the load index of a computing node; this index is loosely based on the “do not overload others” intuition. The run-queue length may be a good load index if we assume that all the nodes of the system are homogeneous and the inter-node communication delay is negligible or constant. However, it is not a reliable load indicator in a heterogeneous environment, since it ignores the variations in computing power. We therefore made modifications to these two and decided upon an algorithm based on the accumulated job execution load.

V. SIMULATIONS AND OBTAINED RESULTS

In order to verify our model, an application was designed and implemented using a combination of Perl, NetSim-2, AWK scripting and the Java development platform. In our application the user behavior characteristics are modeled using the following data values:

TABLE I: ASSUMPTIONS DURING SIMULATIONS

Data                            Value
τ_start                         8 ± 2 a.m.
τ_work                          8 ± 1 (in hours)
[label missing in source]       21 (in hours)
X_threshold                     30% (each stage) up to 90%
Number of Computing Elements    5 (Heterogeneous 2+2+1)
[label missing in source]       60 °C (arbitrary)
a                               1 (arbitrarily chosen)

The descriptions of the simulations, along with their corresponding results, are given below.

A. Simulation #1

The first test performed via simulation tested the load-predicting ability of the proposed model. For this we simulated a processor, which is represented by a red line in the graph (Figure 3). We then ran our algorithm simultaneously to predict the load, which is represented by a green line in the graph (Figure 3).

Figure 3: Load Prediction Test

A detailed analysis of the graph gave a positive result with respect to the ability of the model to predict the load. We found that the initial latency of the model in predicting the load jump of the first 100% spike was about 1.7 seconds, and we observed that at a higher load the prediction is usually higher by about 6%, which we felt was a good observation. On the other hand, the downward fall was predicted at a very fast pace (perhaps due to its affinity to sense the fatigue levels). It could predict a downward fall of about 18.67% in about 0.5 seconds.

B. Simulation #2

The second test examined the load balancing ability of the proposed model. The simulation's result is shown in the graph in Figure 4. Here, the red line represents the actual load applied on the cloud, whereas the green line represents the balanced load of the system.

Figure 4: Load Balancing Test

A detailed analysis of the graph shows that the result has been positive. We observed that the system did not allow the load to cross the 85% mark, even though the actual load applied was about 98%. It was also observed that the load was brought down considerably at very high loads (by about 40% when the actual load is about 98%), but when the load is low the balancing capability reduces to less than 5% (perhaps because the model basically concentrates on high-load systems and because we have capped the overload buffer at around 90%).

C. Simulation #3

The final simulation tested how the system behaves as an integration of elements. The results of this test are depicted in the graph shown in Figure 5. The basic testing criterion here was not a normal one. In order to test the integral performance along with the flexibility, we decided to simulate three systems. Two of them (shown as the red and green lines respectively in Figure 5) are “on” from t=0 to t=10 and are switched off (i.e., removed from the cloud) after t=10. The third (shown as a blue line in Figure 5) is “idle” in the interval [0,10] and is then loaded to work at 100% load after t=10.

Figure 5: Integration and Flexibility testing

The result was fully in sync with our expectation. A careful analysis of the graph (Figure 5) shows that even though the third system (blue line) is idle in the interval [0,10], it still shows a positive value of load (perhaps because load was distributed to it to balance the load in the first system, depicted by a red line). After the 10th second we observed that the third system continues to work at 100%, which shows a lack of balancing once the other two systems have left the cloud.

CONCLUSIONS AND FUTURE WORKS

After a series of simulations over the proposed model, one can notice that the amount of resources available for a cloud is not exactly a function of Gaussian predictability. Although the shape of the behavior may seem similar to a Gaussian, it is clear that it is not congruent, and that it depends not only on an “individual system” but also on its behavioral aspect. So, it is important to model these systems along the lines of a behavior model rather than purely statistical models. It was observed that fatigue and behavior prediction played a vital role in this model's success in the simulation tests. So, even though users have different subjective parameters, it is very important to capture the key cognitive aspects of the user and model the cloud as a system of cognitive users, which in turn allows us to predict the future state of the system more easily. The simulation results are in good agreement with this model.

However, it is very important for us to test the system as a whole in a real-world model. This would perhaps need a slightly different approach, more so because of the “bystander” behavior [10] of the cognitive elements in the real world. It is also important to extend this model to more heterogeneity and more load variance, and to make it provide better transient behavior in balancing loads. This would form a major cluster of work that could be done in times to come.

REFERENCES

[1] Ian Foster (Editor), Carl Kesselman, The Grid: Blueprint for a New Computing Infrastructure, The Elsevier Series in Grid Computing, Morgan Kaufmann, 2005.
[2] S. Zhou, et al., “A trace-driven simulation study of dynamic load balancing”, IEEE Transactions on Software Engineering, 14(9), pp. 1327-1341, Sept. 1988.
[3] Xbit Labs, X-bit labs Investigation: Influence of Intel Pentium 4 Core Temperature on CPU Performance. Available at http://www.xbitlabs.com/articles/cpu/display/p4-temp.html.
[4] N. G. Shivaratri, P. Krueger, M. Singhal, “Load distributing for locally distributed systems”, Computer Magazine, 1992, pp. 33-44.
[5] A. Y. Zomaya and Yee-Hwei Teh, “Observations on using genetic algorithms for dynamic load-balancing”, IEEE Transactions on Parallel and Distributed Systems, Volume 12, Issue 9, Sept. 2001, pp. 899-911.
[6] Tony Bourke, Server Load Balancing, O'Reilly, ISBN 0-596-00050-2.
[7] University of Paderborn, Dynamic Load Balancing and Scheduling. Available at http://wwwcs.uni-paderborn.de/cs/ag-monien/RESEARCH/LOADBAL/
[8] SocketPro, Article on Dynamic Load Balancing across Many Real Servers with Disaster Recovery. Available at http://www.udaparts.com/document/Tutorial/TutorialFour.htm.
[9] Domenico Ferrari and Songnian Zhou, “An Empirical Investigation of Load Indices For Load Balancing Applications”, The 12th International Symposium on Computer Performance Modeling, Measurement, and Evaluation, pp. 515-528.
[10] Peter Prevos, Explanation Models for the Bystander Effect in Helping Behaviour, Monash University, Victoria, Australia, Unpublished.