ACEEE Int. J. on Network Security, Vol. 01, No. 03, Dec 2010

                   Multilevel Hybrid Cognitive Load
                Balancing Algorithm for Private/Public
                Clouds using the concept of Application
                               Metadata

                   Bharath C. and Spoorthy A. Raman
    International Institute of Information Technology, Bangalore, India
                   Email: {bharath.c, spoorthya.raman}

Abstract—Cloud computing is an emerging computing paradigm that aims to share data, resources and services transparently among the users of a massive grid. Although the industry has started selling cloud-computing products, research challenges in areas such as architectural design, task decomposition, task distribution, load distribution, load scheduling and task coordination remain open. We therefore study methods to reason about and model cloud computing as a step towards identifying fundamental research questions in this paradigm. In this paper, we propose a model for load distribution on clouds by modeling them as cognitive systems, using aspects that depend not only on the present state of the system but also on a set of predefined transitions and conditions. This model is then applied to the task of job distribution using the concept of application metadata. We then present a qualitative and simulation-based evaluation of the proposed model, and finally draw a series of key conclusions for future exploration in cloud computing.

Index Terms—Cloud Computing, Load Balancing, Application Metadata.

                     I.  INTRODUCTION

    The problem of load balancing has always been challenging, be it for a mechanical machine, a distributed system or a cloud-computing system [1]. This is attributed to the fact that these systems have polymorphic requirements that span different domains and domain parameters. Cloud computing, in particular, has seen a wide range of solutions to this problem, ranging from the operating-system stratum, to distributed application design, to high-level game-theoretic optimizations, and even to building machines that host computing sub-trees (similar to a remote computing machine) [1]. However, even though these models differ morphologically, they all share the common goal of increasing economic efficiency and providing supplementary computing power to a cloud of users when it is needed. They also all face the same problem: to "Optimally Balance and Distribute Loads".
    The classic approach to this problem is to use algorithms that analyze the current state of the system, as in the work of Shivaratri et al. [4]. This method has some advantages, but it also has several disadvantages, such as the need for medium-to-large computing power; as a consequence, its scalability on larger systems is poor. The other approach is to make estimations that somehow predict the future system state and then behave in a pre-coded manner, as in [7, 8]. This is a far more efficient solution, but it is more complicated in terms of algorithmic complexity and ease of modeling. All said and done, the roots of this approach are elegant and imposing, and it in fact seems to be the future in this domain, all the more so because the classical approach can drive a dramatic increase in the need for resources. In other words, the speed-up achieved by the classical approach is either very low or nonexistent. This happens because of workstation-user behavior: the user may change the computing needs at any moment. For this reason, remote tasks are unacceptably delayed and might even be considered dead by some elements, which increases the need for cloud elements to compute the task locally or to send it to another lightly loaded station elsewhere. One way of estimating workstation load is to use statistical approaches such as load functions, which are obtained through repeated measurements and a large number of a priori experiments. These functions often have a Gaussian aspect, like many other models of natural processes. But the basic flaw of these methods is that they ignore the very cause of the load on a workstation, which depends on the behavior of the individual user.
    Since both types of algorithms mentioned above fail to fully solve practical problems, suitable variations must be made to them to realize their capabilities. In this paper, we modify algorithms of the first type, which continuously monitor the system state. In addition to monitoring, a set of transition rules is specified over the system states based on the type of input received: given an input a, when the present state of the system is q1, the system moves to state q2. Such a simple transition is not possible, however, without some a priori knowledge of the input a. This knowledge is given by the parameters of the application metaheader that accompanies every task to be processed on the cloud. The major parameters included for the efficient working of the system are the number of processor cycles required for the application's execution, as observed during its development and testing phases, the total memory occupied by it while executing, and architecture-based metrics.
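The metaheader-driven transition just described can be sketched as follows. This is an illustrative sketch rather than the paper's implementation: the field names, the q1/q2 state labels and the cycle threshold are our own assumptions.

```python
from dataclasses import dataclass

@dataclass
class Metaheader:
    """A priori knowledge shipped with each cloud task (illustrative fields)."""
    cpu_cycles: int    # processor cycles measured during development/testing
    memory_bytes: int  # total memory occupied while executing

def transition(state: str, task: Metaheader) -> str:
    """Toy transition rule: given input task a in state q1, move to q2.

    The rule below is an assumed example: a heavy incoming task observed
    while the system is lightly loaded moves the system to a loaded state.
    """
    if state == "q1" and task.cpu_cycles > 10**9:
        return "q2"   # heavy task arrived: system becomes loaded
    return state      # otherwise remain in the current state

# usage: the metaheader supplies the a priori knowledge the rule needs
a = Metaheader(cpu_cycles=2 * 10**9, memory_bytes=512 * 2**20)
print(transition("q1", a))  # -> q2
```

The point of the sketch is that the decision needs no runtime profiling: everything it consumes arrives in the task's metaheader.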

© 2010 ACEEE                                DOI: 01.IJNS.01.03.85

These parameters further help in defining an optimal transition function for the system. Thus, the system is able to reason cognitively about what should be done with a load, and where and how it must be scheduled to balance the entire system load.

                     II.  SYSTEM MODEL

A. The Application Metaheader Model

    The application metaheader is a concept conceived by us. It deals with embedding a priori knowledge about the cloud application in consideration. The idea is to affix a compulsory, descriptive and structured metaheader to each incoming cloud application (analogous to the well-known TCP/IP headers). Formally, the technique can be worded as follows. Let S be the header of an incoming cloud application. S is modeled as a set containing the sub-application parameters:

        S = { s_1, s_2, ..., s_n }

where s_i denotes the i-th of the n sub-application parameters constituting the cloud application. Each s_i is in turn composed of the following task-associated data, formalized as an ordered set:

        s_i = ( p_i, m_i, d_i )

where:
    p_i - estimated percentage of processor usage needed by sub-application i, as observed on a given architecture (multiple such values are present for different processor architectures);
    m_i - estimated memory utilized by the process, in bytes;
    d_i - description of each sub-application component along with its length and displacement from the beginning of the code.
    The sub-application and application-related data are obtained during the development and testing phase of the cloud application and are affixed to the code before it is run on the cloud system. They form the metaheader, since they constitute the meta-knowledge required to describe the application to the system.

B. The Mathematical Model

    Starting with the mathematical modeling of the system, the load of the system is defined as follows. The system load of a cloud computing system is the sum of the loads on each of the processing nodes forming the cloud, as determined by processor activity:

        L(t) = sum_{i=1..N} a_i * L_i(t),   a_i in {0, 1}

The values of a_i are determined by the "on" or "off" state of the particular node: we associate the value 1 if node i is on and 0 otherwise. The load L_i on each node is given by the load index of [9].
    Now, there are two ways of looking at L_i in the above equation. In the start state of the entire system, the values of L_i will be zero, since there is no load on the system; in this case, the load balancer is expected to do an optimal load scheduling before balancing. On the other hand, when the system is already loaded, the load balancer must schedule the new process as well as balance the load globally across the system. The value t above ranges over the reals (to reflect the situation at a certain moment), and k, termed the fatigue factor, lies in the closed interval [0, 1]. The value k is a function of the following quantities:

        k = f( P, T, T_N, delta )

Here, k is the fatigue factor; P is the present performance capacity of the CPU, which is a complex, statistically determined function of factors such as the age of the CPU, its average fan speed, clock speed, average fetch time, etc.; T and T_N are the present temperature and the normal (optimal) temperature rating of the CPU, respectively; and delta is the deterioration quotient, which is determined by real-life temperature tests and is provided by the vendor. (delta is calculated by running the CPU at a constant 60°C, then 70°C; the resultant decrease in CPU life determines delta, as explained in [3].) Of course, apart from these, other indicators such as video memory, virtual memory and so on may be considered for a more detailed analysis.
    Now, each of these values depends on the user's behavior, leading to different values of the sub-application parameters for different tasks. Given these relations and functions for the system, the load-balancer algorithm must be a function of the following type:

        B( q, S, L ) -> ( q', L' )

This states that, given a present state q of the system, an input task with sub-application metaheader S, and a present load L on the system, the load balancer acts "cognitively" such that it puts the system into state q' with load L'.
    The above illustrates the situation of incremental load on the system since its start. Now let us model the decremental case, where load leaves the system. This is modeled as another incoming task whose sub-application metaheader reads all zeros for the values of p_i and m_i. Given these metaheader values, the total system load is recalculated as follows. Let L be the load of the system while the sub-application with header s_i was executing, and let L_{s_i} be the load contributed by that sub-application. The total load L' after the sub-application in consideration exits will be:

        L' = L - L_{s_i}

This change is triggered as soon as there is a change in the processor and memory utilization of a particular node.

                  III.  DETAILED ALGORITHM

    Since we now have a basic mathematical model of the algorithmic prototype, our next aim is to wire these mathematical models into the working of the proposed algorithm. Our basic objective, as discussed earlier, is to derive a new state for the total load in a cloud by intelligently modeling the transition rules and transition parameters.
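As a sketch of the mathematical model above, the total-load sum and a stand-in fatigue factor might look as follows. The text specifies only the inputs of k (performance capacity, present and optimal temperatures, deterioration quotient) and its range [0, 1]; the combining formula below is an assumption, not the authors' function.

```python
def fatigue_factor(perf_capacity: float, temp_now: float,
                   temp_opt: float, deterioration: float) -> float:
    """Illustrative fatigue factor k in [0, 1].

    Assumed stand-in: running above the optimal temperature, scaled by the
    vendor's deterioration quotient, discounts the performance capacity.
    Only the input list and the clipping to [0, 1] follow the text.
    """
    overheat = max(0.0, temp_now - temp_opt) * deterioration
    k = perf_capacity / (1.0 + overheat)
    return max(0.0, min(1.0, k))

def total_load(node_loads, node_on):
    """System load: sum of per-node loads over nodes that are switched on."""
    return sum(load for load, on in zip(node_loads, node_on) if on)

# usage: three nodes, the third switched off (its a_i is 0)
print(total_load([0.5, 0.75, 0.25], [True, True, False]))  # -> 1.25
print(fatigue_factor(1.0, 70.0, 60.0, 0.1))                # -> 0.5
```

Decremental load then falls out of the same sum: when a sub-application exits, its contribution is simply subtracted from the running total.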


For this, we define three basic criteria of parameterization (the current-state-of-system, the affinity-to-work and the possible system-fatigue state k), together with an expected-dominant-behavior rule, which maps the objective parameterized values of the load to a more generic, abstract set of three labels: "High", "Moderate" and "Low". The importance of the sub-application metaheader is demonstrated by arriving at a suitable value of the affinity-to-work parameter using the load contribution mentioned in the metaheader. Since this value depends on the affinity-to-work and on the incoming application, the resulting decision can be assumed to be accurate and optimal.
    To begin the modeling process from scratch, let us consider some basic behavioral rules from the user's point of view. These take the form "if the current-state-of-system is q, the fatigue is k and the affinity-to-work is w, then the favorable behavior is b". These concepts are shown in Figure 1 as a state transition diagram.

    Figure 1: A simple state transition diagram for the user behavior model.

    Given the user behavior model and the transition rules, a generalization can be performed on the above model that maps the states of nodes 1 to 5 to commonly performed daily computing tasks, which would be the ideal clients to run on a cloud computing system. These generalizations are made keeping in mind the broad resource-utilization patterns of these ubiquitous tasks. For example, if the system state is such that both processor utilization and memory utilization are high, we conclude that a heavy, computation-intensive task is being executed at that node. Similarly, if memory utilization is moderate and processor utilization is low, the task could be input/output-intensive, such as document processing. These values of high, low and moderate are obtained from the metaheader knowledge, used cognitively to define the system state. Figure 2 shows the state transition diagram of Figure 1 based on this task mapping, using the state parameters and characteristics.
    Now that we have modeled the user-based behavior for some basic state clauses, we can extrapolate the user behavior to work in tandem with the system behavior.

    Figure 2: A simple state transition diagram for mapped tasks.

    To model the global system, we must try to integrate all these basic user behaviors, form a higher-order abstraction, and approximate them into a global behavior.

                  IV.  WORKING OF THE MODEL

    Having covered the mathematics and the state diagrams behind the proposed model, it becomes important to know how exactly the entire algorithm is supposed to work. We will carry this "ideal" expectation forward to the simulation phase to check the correctness of the model. The basic inputs required for the working of this model are values such as τ_start (start time of work) and τ_work (total time of work), along with the dependencies needed to calculate values such as k, Ω, etc. Functions that are binary in behavior take values equivalent to "on" and "off", which we model as the crisp values 1 and 0. On the other hand, subjective entities such as fatigue and affinity-to-work cannot be modeled as either 0/1 or "low"/"high". For these, we make a simulation-phase assumption that motivation may be considered "low" if the value of the previously defined function satisfies X < X_threshold, where X_threshold is chosen purely according to the aggressiveness and granularity demanded of the algorithm (for our simulation we considered X_threshold to be at about 30% for each scale jump). It must also be emphasized that all these values and types of activities are arbitrary; only studying real users in real working situations can produce an ecologically valid model.
    Now, with the help of our user-behavior and system-behavior models, we can approximate the loads of the corresponding workstations in the cloud. Three load functions were selected:
     i.  State, which shows whether the workstation is on or off.
    ii.  Processor, which shows the processor activity.
   iii.  Memory, which shows the memory use on the workstation.
    These functions can then be defined in a fuzzy manner, as presented in Figure 2, with the help of the types of behavior already computed. We then apply these functions and the mathematical models in a simulated environment to find out how the model works.
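The crisp-to-fuzzy mapping just described can be sketched as follows. The 30% jump per label follows the simulation assumption in the text, and the task-classification rules mirror the worked examples for Figure 2; the exact cut points and return strings are otherwise our own illustrative choices.

```python
def fuzzy_label(x: float, threshold: float = 0.30) -> str:
    """Map a measured utilization x in [0, 1] to "Low"/"Moderate"/"High".

    One label per 30% scale jump, per the simulation-phase assumption;
    the cut points are illustrative.
    """
    if x < threshold:
        return "Low"
    if x < 2 * threshold:
        return "Moderate"
    return "High"

def classify_node(cpu: float, mem: float) -> str:
    """Guess the dominant task type from the CPU/memory labels (illustrative)."""
    c, m = fuzzy_label(cpu), fuzzy_label(mem)
    if c == "High" and m == "High":
        return "computation-intensive"
    if c == "Low" and m == "Moderate":
        return "I/O-intensive (e.g. document processing)"
    return "mixed"

# usage: the two worked examples from the text
print(classify_node(0.92, 0.88))  # -> computation-intensive
print(classify_node(0.15, 0.45))  # -> I/O-intensive (e.g. document processing)
```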


During the simulation phase, a number of key efficiency observations were made. Algorithms in most of the literature solely use the instantaneous run-queue length (i.e., the number of jobs being served or waiting for service at the sampling instant) as the load index of a computing node, which is loosely based on the "do not overload others" intuition. The run-queue length may be a good load index if we assume that all nodes of the system are homogeneous and that the inter-node communication delay is negligible or constant. However, it is not a reliable load indicator in a heterogeneous environment, since it ignores the variations in computing power. We therefore modified this and decided on an accumulative job-execution-load-based algorithm.

          V.  SIMULATIONS AND OBTAINED RESULTS

    In order to verify our model, an application was designed and implemented using a combination of Perl, NetSim-2, AWK scripting and the Java development platform. In our application, the user behavior characteristics are modeled using the following data values:

                           TABLE I:
           Data                            Value
           τ_start                         8 ± 2 a.m.
           τ_work                          8 ± 1 (in hours)
           -                               21 (in hours)
           X_threshold                     30% (each stage) up to 90%
           Number of computing nodes       5 (heterogeneous, 2+2+1)
           T_N                             60°C (arbitrary)
           a                               1 (arbitrarily chosen)

    The descriptions of the simulations, along with their corresponding results, are given below.

A. Simulation #1
    The first test performed via simulation concerned the load-predicting ability of the proposed model. For this, we simulated a processor, which in the graph (Figure 3) is represented by a red line. We then ran our algorithm simultaneously to predict the load, which in the graph (Figure 3) is represented by a green line.
    A detailed analysis of the graph gave a positive result with respect to the model's ability to predict the load. We found that the initial latency of the model in predicting the load jump of the first 100% spike was about 1.7 seconds, and we observed that at higher loads the prediction is usually higher by about 6%, which we considered a good result. On the other hand, downward falls were predicted at a very fast pace (perhaps due to the model's affinity for sensing fatigue levels): it could predict a downward fall of about 18.67% in about 0.5 seconds.

                  Figure 3: Load Prediction Test

B. Simulation #2
    The second test examined the load-balancing ability of the proposed model. The simulation's result is shown via the graph in Figure 4. Here, the red line represents the actual load applied to the cloud, whereas the green line represents the balanced load of the system.

                  Figure 4: Load Balancing Test

    A detailed analysis of the graph shows that the result has been positive. We observed that the system did not allow the load to cross the 85% mark, even though the actual load applied was about 98%. It was also observed that the load was brought down considerably at very high loads (by about 40% when the actual load is about 98%), but when the load is low, the balancing capability reduces to less than 5% (perhaps because the model concentrates on highly loaded systems, and because we have capped the overload buffer at around 90%).

C. Simulation #3
    The final simulation tested how the system behaves as an integration of elements. The results of this test are depicted in the graph shown in Figure 5. The basic testing criterion here was not a typical one. In order to test the integral performance along with the flexibility, we decided to simulate three systems and run two of them (shown as red and green lines respectively in Figure 5) in such a way that they are "on" from t=0 to t=10 and after t=10 they are


switched off (i.e., removed from the cloud), while the third one is "idle" in the interval [0, 10]; after t=10 we load the third (shown as a blue line in Figure 5) to work at 100% load.

            Figure 5: Integration and Flexibility Testing

    The result was totally in sync with our expectation. A careful analysis of the graph (Figure 5) shows that even though the third system (blue line) is idle in the interval [0, 10], it still shows a positive load value (presumably, load was distributed to it to balance the load on the first system, depicted by the red line). After the 10th second, we observed that the third system continues to work at 100%, showing the absence of balancing once the other nodes have left the cloud.

              CONCLUSIONS AND FUTURE WORK

    After a series of simulations of the proposed model, one can notice that the amount of resources available to a cloud is not exactly a function of Gaussian predictability. Although the shape of the behavior may seem similarly Gaussian, it is clearly not congruent, and it depends not only on the "individual system" but also on its behavioral aspects. It is therefore important to model clouds along the lines of a behavioral model rather than with purely statistical models. We observed that fatigue and behavior prediction played a vital role in this model's success in the simulation tests; so, even though each user has different subjective parameters, it is very important to capture the key cognitive aspects of the user and to model the cloud as a system of cognitive users, which in turn allows us to predict the future state of the system more easily. The simulation results are in good agreement with this model.
    However, it is very important to test the system as a whole in a real-world setting. This would perhaps need a slightly different approach, all the more because of the "bystander" behavior [10] of the cognitive elements in the real world. It is also important to extrapolate this model towards more heterogeneity and more load variance, and to model it to provide better transient behavior in balancing loads. This would form a major cluster of work to be done in times to come.

                         REFERENCES

[1] Ian Foster (Ed.) and Carl Kesselman, The Grid: Blueprint for a New Computing Infrastructure, The Elsevier Series in Grid Computing, Morgan Kaufmann, 2005.
[2] S. Zhou et al., "A trace-driven simulation study of dynamic load balancing", IEEE Transactions on Software Engineering, 14(9), pp. 1327-1341, Sept. 1988.
[3] X-bit Labs, "X-bit labs Investigation: Influence of Intel Pentium 4 Core Temperature on CPU Performance". Available at /p4-temp.html.
[4] N. G. Shivaratri, P. Krueger and M. Singhal, "Load distributing for locally distributed systems", IEEE Computer, 1992, pp. 33-44.
[5] A. Y. Zomaya and Yee-Hwei Teh, "Observations on using genetic algorithms for dynamic load-balancing", IEEE Transactions on Parallel and Distributed Systems, 12(9), Sept. 2001, pp. 899-911.
[6] Tony Bourke, Server Load Balancing, O'Reilly, ISBN 0-596-00050-2.
[7] University of Paderborn, Dynamic Load Balancing and Scheduling. Available at http://wwwcs.uni-
[8] SocketPro, Article on Dynamic Load Balancing across Many Real Servers with Disaster Recovery.
[9] Domenico Ferrari and Songnian Zhou, "An Empirical Investigation of Load Indices for Load Balancing Applications", Proc. 12th International Symposium on Computer Performance Modeling, Measurement, and Evaluation, pp. 515-528.
[10] Peter Prevos, Explanation Models for the Bystander Effect in Helping Behaviour, Monash University, Victoria, Australia, unpublished.
