

                                    Lab 6
                           Dynamic Load Balancing
                              Kenneth Sundberg
                             November 24, 2007

1    Dynamic Load Balancing

Many real-world applications have a dynamic structure that makes static
load balancing either impractical or ineffective. In these situations it
is often desirable to spend some processing time rebalancing the load as
the program runs. Care must be taken that the benefit of the rebalanced
load outweighs this added overhead.

2    Dynamic Task Structure

The task structure used for this lab was a tree-like task graph. Each task
consisted of a random sleep of 1-10 seconds. After completion, each task
had a chance to spawn two new tasks. The first was an anonymous task that
could be performed by any processor. The second was a specific task that
could be performed by only one processor, chosen at random when the task
was created. Every time a task was spawned, the chance of its children
spawning the same type of task was reduced, which guarantees eventual
termination.
   Sixteen processors were used for the timing. Each processor began with
ten anonymous jobs and five jobs specific to that processor. In addition,
processor 0 was given fifty additional anonymous tasks to ensure an initial
load imbalance. Each processor ran two threads: the first was responsible
for communication and the second for computation.
   The computation thread would pull a task off the queue of specific
tasks and sleep for the specified amount of time. Once done sleeping, the
computation thread generated the new tasks and added them to the
appropriate queues. If no specific tasks were available, the computation
thread would draw an anonymous task instead.
   The communication thread had three tasks. The first was to communicate
specific tasks to the appropriate processors as they were generated. The
second was to perform load balancing as detailed later. The final task was
to detect termination.

3     Load Balancing Techniques

This simple task structure was run under three different load balancing
schemes. First it was run with no load balancing to provide a baseline.
Then it was run with Asynchronous Round Robin, and finally with Dimension
Exchange.

3.1    Asynchronous Round Robin (ARR)

In this method each processor kept track of which processor it would ask
for work next, beginning with its immediate successor in terms of
processor number. When a processor ran out of work it made a request, and
the designated processor then sent half of its load (measured in queued
tasks) to the requesting processor. After making a request, the processor
incremented the processor number from which it would next ask for work.
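
The ARR request step can be illustrated with a small single-process sketch.
It is not the code used in the lab: loads are modelled as plain task
counts, and the names make_arr_targets and arr_request are hypothetical.
Rounding the transferred half down and skipping the requester's own number
when the pointer wraps around are assumptions, since the report does not
say how either case was handled. At least two processors are assumed.

```python
def make_arr_targets(num_procs):
    """Each processor first targets its immediate successor (wrapping around)."""
    return [(p + 1) % num_procs for p in range(num_procs)]


def arr_request(loads, targets, requester):
    """Simulate one ARR request made by `requester`, which has run out of work.

    loads[p]   -- number of queued tasks on processor p (modified in place)
    targets[p] -- processor that p will ask next (modified in place)
    Returns (donor, moved): who was asked and how many tasks were transferred.
    """
    n = len(loads)
    donor = targets[requester]

    # Advance the round-robin pointer for the next request.
    # Assumption: a processor never asks itself, so skip its own number.
    nxt = (donor + 1) % n
    if nxt == requester:
        nxt = (nxt + 1) % n
    targets[requester] = nxt

    # The donor sends half of its queued tasks (assumption: rounded down).
    moved = loads[donor] // 2
    loads[donor] -= moved
    loads[requester] += moved
    return donor, moved
```

For example, with loads [0, 8, 3, 1], processor 0 first asks processor 1
and receives 4 tasks; its next request would go to processor 2.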

    Method          Average    St. Dev.
     None           45888.20   708.97
     ARR            44208.90   527.29
 Dim. Exchange      42607.60   761.17

Figure 1: Timing results for different load balancing

Figure 2: Load statistics with no load balancing strategy

3.2    Dimension Exchange
This is a much more intensive method than ARR. At prespecified intervals
(every 1000 seconds), the processors would communicate along the dimensions
of the hypercube. With each communication the two processors involved would
ensure that they had equal load afterwards. Thus, when the load-balancing
step was completed, all of the processors were in balance.
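
One balancing sweep of dimension exchange can be sketched as follows. This
is a sequential simulation under stated assumptions, not the lab's
implementation: loads are integer task counts, the processor count is a
power of two (16 processors form a 4-dimensional hypercube), every counted
task is assumed to be transferable, and when a pair holds an odd total the
lower-numbered processor keeps the extra task. The function name
dimension_exchange is hypothetical.

```python
def dimension_exchange(loads):
    """Return the loads after one full sweep over all hypercube dimensions.

    loads[p] is the number of queued tasks on processor p. In dimension d,
    processor p is paired with the processor whose number differs from p
    only in bit d, and the pair splits its combined load evenly.
    """
    n = len(loads)
    if n == 0 or n & (n - 1):
        raise ValueError("processor count must be a power of two")

    loads = list(loads)  # work on a copy
    dim = 0
    while (1 << dim) < n:
        for p in range(n):
            partner = p ^ (1 << dim)  # neighbour along dimension `dim`
            if p < partner:  # handle each pair exactly once
                total = loads[p] + loads[partner]
                # Assumption: the lower-numbered processor keeps the odd task.
                loads[p] = total - total // 2
                loads[partner] = total // 2
        dim += 1
    return loads
```

Starting from the initial anonymous-task loads described in Section 2 (60
on processor 0 and 10 on each of the other fifteen), one sweep leaves every
processor with 13 or 14 tasks.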

Figure 3: Load statistics using ARR

4     Results
All three methods were run 10 times, and averages and standard deviations
were computed to account for the random nature of the task structure. This
data is shown in Figure 1. In addition, every 100 seconds each processor
output its current load, in terms of queued tasks. After the runs, the
resulting output files were compiled into graphs of the minimum, maximum,
and average loads on the system throughout the process. The graphs for the
three methods are shown in Figures 2, 3, and 4.
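
The two summaries described above could be computed along these lines. The
sketch assumes the per-processor load samples have already been read into a
list of rows, one row per sampling instant, and that the standard deviation
is the sample (n - 1) form; the report specifies neither, and the names
summarize_loads and run_statistics are hypothetical.

```python
import statistics


def summarize_loads(samples):
    """samples[t][p] is the queued-task count on processor p at sample t.

    Returns one (minimum, maximum, average) tuple per sampling instant,
    which are the three curves plotted in the load graphs.
    """
    return [(min(row), max(row), sum(row) / len(row)) for row in samples]


def run_statistics(times):
    """Return (average, standard deviation) of the run times of one method.

    Assumption: sample standard deviation, as given by statistics.stdev.
    """
    return statistics.mean(times), statistics.stdev(times)
```

For instance, summarize_loads([[60, 10, 10, 10]]) gives [(10, 60, 22.5)].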

Figure 4: Load statistics using dimension exchange

5     Conclusions
While the effects of the load balancing techniques are noticeable, they are
not dramatic: relative to the baseline, ARR reduced the average time by
about 3.7% and dimension exchange by about 7.1% (Figure 1). This is in part
due to the nature of the task structure. In the beginning the number of
queued tasks on each processor grows very quickly, until the probability of
spawning new tasks lessens and the average load begins to drop. This means
that for the majority of the time all of the processors are working and
load balancing is not needed.
   The quality of the load balancing is also evident from the load graphs.
ARR does decrease the time taken by the tasks, but it does not begin to
take effect until the end of the run. Its effects are seen in the quickly
dropping maximum load line and the jagged minimum line at the end of the
process. ARR does have the advantage of lower communication overhead and
may be a very appropriate choice for some tasks.
   Dimension exchange had a higher communication overhead, as it performed
load balancing throughout the process. This can be seen in its graph, where
the minimum, average, and maximum lines are indistinguishable. In this
specific case the communication costs were negligible compared to the
computation, as the processors were easily able to keep up with the needed
communication while sleeping. However, if the communication cost were
commensurate with the computation cost, this might no longer be the most
desirable strategy.

