									                                                               (IJCSIS) International Journal of Computer Science and Information Security,
                                                                                                                    Vol. 8, No. 4, July 2010

        LDCP+: An Optimal Algorithm for Static Task
               Scheduling in Grid Systems

            Negin Razavi                                     Safieh Siadat                                   Amir Masoud Rahmani
      Islamic Azad University,                         Islamic Azad University,                              Islamic Azad University,
    Science and Research Branch,                     Science and Research Branch,                          Science and Research Branch,
             Tehran, Iran                                     Tehran, Iran                                         Tehran, Iran
        n.razavi@srbiau.ac.ir                            s.siadat@srbiau.ac.ir                                 rahmani@srbiau.ac.ir

Abstract— After a computational job is designed and realized as a set of tasks, an optimal
assignment of these tasks to the processing elements in a given architecture needs to be
determined. In a Grid system, with heterogeneous processing elements and data transfer time
between them, determining an assignment of tasks to processing elements that optimizes
performance and efficiency is important. In this paper a heuristic algorithm named LDCP+ is
presented, which optimizes the Longest Dynamic Critical Path algorithm (LDCP) presented by
Mohammad L. Daoud and Nawwaf Kharma in 2007. This algorithm is list-based in that it assigns each
task a priority for its execution. The basic features of LDCP+ are task duplication, use of idle
processing-element time, and an optimized version of the priority assignment method used in LDCP.
The LDCP algorithm is executable only under the assumption that the computation costs of tasks are
monotonic; the algorithm presented in this paper frees the scheduling algorithm from this
restriction, and in the case of non-monotonic computation costs LDCP+ has the minimum total finish
time in comparison with other scheduling algorithms such as HEFT and CPOP.

    Keywords- Grid; Static task scheduling; Longest Dynamic Critical Path.

                       I.    INTRODUCTION
    A Grid system is a group of connected computers that can execute parallel programs via a
high-speed interconnection. The efficiency of program parallelism in Grid systems depends on the
methods used to schedule tasks on the available processing elements. The interconnection of
processing elements in a Grid causes an overhead when two tasks assigned to different processing
elements of distinct computers transfer data. Task scheduling in distributed heterogeneous systems
is more complex because each task can have a different execution time on different processing
elements. Scheduling algorithms for a Grid system should therefore consider the execution time of
each task on the different processing elements, and even one incorrect decision can restrict the
system performance to the slowest processing element [2].

    There are two kinds of scheduling algorithms: static scheduling algorithms and dynamic
scheduling algorithms. In static scheduling algorithms all information needed for scheduling, such
as the structure of the parallel application, the execution time of individual tasks and the
communication costs between tasks, must be known; in contrast, this information is unknown in
dynamic task scheduling algorithms.

    Among the different types of scheduling algorithms, HEFT is a scheduling algorithm for
heterogeneous distributed computing systems which consists of two phases: first, cost computation
for each task and task selection; second, processor selection. In the task selection phase the
algorithm sets the computation costs of tasks to their mean values, and this may limit the ability
of the scheduling algorithm to compute the priorities of tasks precisely. The CPOP algorithm has
the same two phases as HEFT but uses different strategies for assigning priorities to tasks and for
processor selection. These two algorithms have been mentioned as optimal algorithms in terms of
total finish time.

    In this paper we present a heuristic list-based algorithm called LDCP+ (an optimized version
of the Longest Dynamic Critical Path algorithm) for static task scheduling in Grid systems with a
limited number of processors, and we compare our scheduling results with other algorithms such as
CPOP, HEFT and LDCP for performance evaluation.

                       II.   RELATED WORKS
    Static task scheduling for Grid systems is, in general, known to be an NP-complete problem
[4, 7, 9], and most of the proposed algorithms are heuristic [1, 2, 3, 4, 7]. One of the most
important classes of heuristic algorithms is list-based algorithms [6]. In such algorithms each
task is assigned a priority, and the three steps of task selection, processor selection and status
update are repeated until all tasks are scheduled. In the task selection phase the unscheduled
task with the highest priority is selected. In the processor selection phase, the selected task is
assigned to the processor that minimizes a predefined cost criterion, such as the schedule length.
Finally, in the status update phase, the status of the system is updated. Examples of list-based
algorithms are: Heterogeneous Earliest Finish Time (HEFT) [9], Critical Path on a Processor
(CPOP) [9], Critical Path on a Cluster (CPOC) [5], Dynamic

                                                                      335                               http://sites.google.com/site/ijcsis/
                                                                                                        ISSN 1947-5500
Level Scheduling (DLS) [8], Modified Critical Path (MCP) [10], Mapping Heuristic (MH) [3], Dynamic
Critical Path (DCP), and Longest Dynamic Critical Path (LDCP) [2].

                    III.     PROBLEM DEFINITION
    In static task scheduling in a Grid system, the execution precedence between tasks is
represented by a Directed Acyclic Graph (DAG). Each DAG is described by a tuple $(T, E)$, where $T$
is a set of $n$ tasks and $E$ is a set of $e$ edges. Each $t_i \in T$ represents a task and each
$e_{i,j} = (t_i, t_j) \in E$ represents the execution precedence between the two tasks connected by
the edge $e_{i,j}$.

    If $(t_i, t_j) \in E$, then the execution of task $t_j \in T$ cannot start before task $t_i$
finishes its execution. For the edge $(t_i, t_j)$, the source task $t_i$ is the parent of the sink
task $t_j$, while $t_j$ is a child of $t_i$. A task with no parents is called an entry task and a
task with no children is called an exit task. Associated with each edge $(t_i, t_j)$ is a value
$d_{i,j}$ that represents the amount of data to be transmitted from task $t_i$ to task $t_j$; in
some cases it also represents the minimum time that task $t_j$ needs to wait before starting after
task $t_i$ finishes its execution.

    A Grid system is represented by a set $P$ of $m$ processors, a set $T$ of $n$ tasks and an
$n \times m$ computation cost matrix $W_{n \times m}$. Each element $w_{i,k} \in W$,
$1 \le i \le n$, $1 \le k \le m$, represents the execution time (computation cost) of task $t_i$ on
processor $p_k$. We make the same assumption as LDCP that all processors are fully connected and
that communications between processors occur via independent communication units [2], so task
execution and data transfer can proceed in parallel. As in LDCP, the data transfer rate between any
two processors on the network is assumed to be fixed and constant. The communication cost is
represented by an $n \times n$ matrix $D_{n \times n}$. $d_{i,j} \in D$ is zero if the two tasks
$t_i$ and $t_j$ are scheduled on the same processor, and is equal to the (non-zero) communication
cost otherwise. A task can start its execution on a processor only when all data from its parents
have become available to that processor. The goal of our algorithm is to assign tasks to processors
in a way that minimizes the total finish time, or the schedule length.

A. Definition 1
    Schedule Length: The maximum execution time of the processors, or the finish time of the final
task after task scheduling, is called the schedule length. A DAG and a computation cost matrix for
two processors are shown in Fig. 1. The schedule length is computed in Fig. 2 and is equal to 23.

        [Figure 1(a): DAG over tasks t0, t1, t2 and t3, with edge costs 2, 1, 3 and 5]

        Task    p0    p1
        t0      6     5
        t1      6     4
        t2      9     8
        t3      7     3

        Figure 1. An example of (a) DAG (b) computation matrix

        p0:   t0 [0, 6]    t2 [6, 15]
        p1:   t1 [0, 4]    Idle    t3 [20, 23]

        Figure 2. Schedule length of the DAG presented in Fig. 1 on two processors

        [Figure 3 graphs: (a) node costs n0 = 6, n1 = 6, n2 = 9, n3 = 7;
                          (b) node costs n0 = 5, n1 = 4, n2 = 8, n3 = 3;
                          edge costs as in Fig. 1(a)]

        Figure 3. Task computation times on each processor, taken from the cost matrix in Fig. 1

    Assigning task priorities: In a Grid system, the efficiency of list-based scheduling algorithms
depends on the method used to assign priorities to tasks.

    In our suggested algorithm LDCP+, if selecting a task in one step of scheduling yields the
minimum schedule length, we assign a high priority to that task. Some basic definitions are used
in the LDCP algorithm, and because LDCP+ is the result of optimizing LDCP, we present this basic
knowledge too.

B. Definition 2
    Critical Path: For a given DAG, the Critical Path (CP) is defined as the path from an entry
task to an exit task for which the sum of the computation costs of tasks and the communication
costs of edges is maximal.

            IV.        LDCP: LONGEST DYNAMIC CRITICAL PATH

A. Definition 3
    Longest Dynamic Critical Path: Given a DAG with $n$ tasks and $e$ edges and a Grid system with
$m$ processors, a DCP during a particular scheduling step is a path of tasks and edges from an
entry task to an exit task. The LDCP is the longest DCP, considering that communication costs
between tasks scheduled on the same processor are assumed to be zero and that the execution
constraints are preserved.

    Fig. 3 represents two dynamic critical paths. The first path, in Fig. 3(a), is composed of
tasks $t_0$, $t_2$ and $t_3$ scheduled on processor $p_0$ and has a length of 29. The second DCP,
in Fig. 3(b), is composed of tasks $t_0$, $t_2$ and $t_3$ scheduled on processor $p_1$ and has a
length of 23. So at the first step of scheduling, the LDCP is composed of tasks $t_0$, $t_2$ and
$t_3$, with a length of 29.
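
    The two DCP lengths quoted above (29 on p0 and 23 on p1) can be checked with a short Python
sketch. The node costs come from the cost matrix in Fig. 1(b). The edge costs on the path
t0, t2, t3 (2 and 5) are an assumption, chosen because they reproduce the reported lengths; the
function name path_length and its same_processor parameter are illustrative only and do not
appear in the paper.

```python
# Computation costs from Fig. 1(b): task -> (cost on p0, cost on p1).
W = {"t0": (6, 5), "t1": (6, 4), "t2": (9, 8), "t3": (7, 3)}

# ASSUMPTION: communication costs on the path t0 -> t2 -> t3, chosen to be
# consistent with the lengths 29 and 23 reported in the text.
EDGE = {("t0", "t2"): 2, ("t2", "t3"): 5}


def path_length(path, k, same_processor=frozenset()):
    """Length of `path` using the computation costs of processor k.

    Edges listed in `same_processor` join two tasks already scheduled on
    the same processor, so their communication cost counts as zero.
    """
    total = sum(W[task][k] for task in path)
    for edge in zip(path, path[1:]):
        if edge not in same_processor:
            total += EDGE[edge]
    return total


path = ["t0", "t2", "t3"]
lengths = [path_length(path, k) for k in range(2)]

# The LDCP is the longest of the per-processor dynamic critical paths.
ldcp_processor = max(range(2), key=lambda k: lengths[k])
```

    Before any task is scheduled every edge on the path contributes its communication cost. Once
t0 and t2 are placed on the same processor, passing {("t0", "t2")} as same_processor drops that
edge from the total, which is what makes the critical path dynamic.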

            V.      LDCP+: THE PROPOSED ALGORITHM
    In the LDCP+ algorithm, each scheduling iteration includes the three phases below:

   1. Task selection phase
   2. Processor selection phase
   3. Status update phase

    These three phases are carried out for each task until the last task is selected for
scheduling.

A. Task Selection Phase
    LDCP+ selects a set of tasks that play the main role in determining the schedule length. In
the first step of this phase, the DAG of each processor is required for scheduling.

  1) Definition 4
    Directed Graph: Given a DAG with $n$ tasks and $e$ edges and a Grid system with $m$ processors
$(p_0, p_1, \ldots, p_{m-1})$, $DAGP_k$ is the directed acyclic graph that corresponds to processor
$p_k$. The computation cost of each task on processor $p_k$ is represented by a number on the
related node of $DAGP_k$.
    $DAGP_0$ is shown in Fig. 3(a) and $DAGP_1$ is shown in Fig. 3(b). These figures relate to the
DAG and the Grid system represented in Fig. 1. Throughout this paper, $t_i$ refers to the $i$-th
task in the directed acyclic graph, and the node $n_i$ identifies task $t_i$ in $DAGP_k$. The
number associated with this node represents the computation cost of task $t_i$ on processor $p_k$.
In each $DAGP_k$, every node is assigned a number named UpwardRank (URank). URanks are used to
determine task priorities in $DAGP_k$.

  2) Definition 5
    URank: The UpwardRank of the $i$-th node ($n_i$) in $DAGP_k$ is defined recursively as

    $$\mathrm{URank}_k(n_i) = w_{i,k} + \max_{n_l \in succ_k(n_i)} \{ c_k(n_i, n_l) + \mathrm{URank}_k(n_l) \} \qquad (1)$$

    where $succ_k(n_i)$ is the set of immediate successors of node $n_i$, $c_k(n_i, n_l)$ is the
communication cost between nodes $n_i$ and $n_l$ in $DAGP_k$, and $w_{i,k}$ is the computation cost
of $t_i$ on processor $p_k$.

  3) Definition 6
    URankSet: Each element of URankSet is defined as

    $$\max \Big\{ \sum_{k=0}^{m-1} \mathrm{URank}_k(n_i) \Big\} \qquad (2)$$

    where $\mathrm{URank}_k(n_i)$ is $\mathrm{URank}(n_i)$ in $DAGP_k$.

  4) Definition 7
    KeyNode: The KeyNode is the node that has the maximum URank in URankSet. The task
corresponding to this node is the task selected by the scheduling algorithm.

  5) Definition 8
    KeyNodeSet: This set includes the KeyNodes that are selected for scheduling. In the first
scheduling iteration it can include several tasks, but in the other iterations it has only one
task.

  6) Definition 9
    Least Execution Time (LET): Least Execution Time is defined as

    $$\min \{ \mathrm{processTime}(p_k) + w_{i,k} + d_{i,j} \} \qquad (3)$$

    where $\mathrm{processTime}(p_k)$ is the time at which the final scheduled task on processor
$p_k$ finishes its execution, $w_{i,k}$ is the computation time of the task corresponding to $n_i$
on processor $p_k$, and $d_{i,j}$ is the communication time between $t_i$ and $t_j$. If both $t_i$
and $t_j$ are scheduled on processor $p_k$, the communication cost between them is assumed to be
zero. After URankSet has been computed, the task chosen by the scheduling algorithm is the task
corresponding to the KeyNode in URankSet. In the first iteration, to obtain the minimum execution
time on the available processing elements, if the number of entry tasks is less than or equal to
the number of processors, all entry tasks are considered KeyNodes; otherwise, as many tasks as
there are processors, those with the maximum URanks, are selected as KeyNodes and placed in
KeyNodeSet. In the following iterations, KeyNodeSet includes only one KeyNode (a set with one
member).

B. Processor Selection Phase
    In this phase, the selected task is assigned to a processor so as to obtain the minimum
schedule length in each scheduling iteration. In LDCP+ this proceeds as follows. As mentioned
above, in the first scheduling iteration KeyNodeSet can contain more than one KeyNode. To optimize
the LDCP algorithm, LDCP+ computes the distinct permutations, over the different processors, of the
tasks whose corresponding KeyNodes are in KeyNodeSet, and the permutation with the minimum average
execution time on the processors becomes the first assignment of tasks to processors. This average
execution time is obtained from

    $$\min \Big\{ \frac{\sum_{k=0}^{m-1} w_{i,k}}{m} \Big\} \qquad (4)$$

    where $i$ indexes the tasks corresponding to the KeyNodes, $w_{i,k} \in W$ and $m$ is the
number of processors. In the following iterations, the only KeyNode available in KeyNodeSet is
selected to be scheduled.

  1) Definition 10
    Idle Space: When there is a gap on a processor between the start time of a task and the end
time of the previous task, that interval is called idle space.
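
    The URank recursion of Eq. (1), the per-node sum of Eq. (2) and the choice of the KeyNode can
be sketched in Python as follows. The computation costs are those of Fig. 1(b). The edge set SUCC
and its communication costs are a hypothetical reading of the example DAG, labelled as an
assumption because the figure is not fully recoverable; the names urank, urank_sum and key_node
are illustrative only. The sketch covers the initial state, before any task has been scheduled,
so every communication cost is counted.

```python
from functools import lru_cache

# Computation costs w[i][k] from Fig. 1(b): task -> (cost on p0, cost on p1).
W = {"t0": (6, 5), "t1": (6, 4), "t2": (9, 8), "t3": (7, 3)}

# ASSUMPTION: hypothetical successor lists with communication costs.
SUCC = {"t0": {"t2": 2}, "t1": {"t2": 1}, "t2": {"t3": 5}, "t3": {}}
M = 2  # number of processors


@lru_cache(maxsize=None)
def urank(task, k):
    """Eq. (1): cost of the task on processor k plus the largest value of
    (communication cost + URank) over its immediate successors."""
    tail = max((c + urank(nxt, k) for nxt, c in SUCC[task].items()), default=0)
    return W[task][k] + tail


def urank_sum(task):
    """Eq. (2), inner sum: URank of the task accumulated over all DAGP_k."""
    return sum(urank(task, k) for k in range(M))


# The KeyNode is the node with the largest accumulated URank.
key_node = max(W, key=urank_sum)
```

    Under these assumptions the entry task t0 has URank 29 in DAGP0 and 23 in DAGP1, matching the
two path lengths discussed for Fig. 3, and it is selected as the KeyNode.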

   2) Definition 11
    Replacement Ability: A task can be placed in an idle space when the parents of that task have
finished before the start time of the task. If any of its parents has been scheduled on a
different processor, the time required for transferring data between the processors must be taken
into account.
    If there is a processor with an idle space and the selected task can be placed in that space
(replacement ability), that processor is selected. At the end of this phase, the LDCP+ algorithm
uses the duplication process to decrease the schedule length. That is, after the processor has been
selected, if the selected task has a parent scheduled on a different processor and the selected
processor has an idle space before the start time of the selected task, then the duplication
process is applied in the idle space (subject to the replacement ability).

  3) Definition 12
    Duplication Process: The duplication process is repeating the execution of a task on other
processors.

C. Status Update Phase
    After the task has been selected and assigned to a processor, the URank corresponding to the
selected task is deleted from URankSet. The finish time of the selected processor is updated after
the task has been assigned to it. The selected task is deleted from the list of unscheduled tasks.

    Establish DAGP_k for all processors in the system, where 0 ≤ k ≤ m − 1
    Calculate URanks for all DAGP_k
    Compute the URankSet
    While there are unscheduled tasks in the task list do
        Find the KeyNode(s) in the URankSet
        Put the KeyNode(s) in KeyNodeSet
        If (it is the first step of scheduling) then
            Choose the processors which have the minimum permutation;
            If (there is any processor with idle time and the task has the replacement ability) then
                Select that processor;
            Else
                Compute the finish time of the selected task on every processor;
                Find and select the processor that minimizes the finish time of the selected task;
            End if
            Duplicate the parent(s) of the selected task if needed;
        End if
        Assign the selected task to the selected processor;
        Update the selected processor time;
        Update the URankSet;
        Update the unscheduled task list;
    End while

                        Figure 4. LDCP+ algorithm
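
    The idle-space test of Definitions 10 and 11 can be illustrated with a small Python sketch.
It is not the authors' implementation: the helper names data_ready_time and earliest_start, and
the representation of a processor as a sorted list of busy intervals, are assumptions made for
illustration. A task may start in an idle space only after the data from all its parents has
arrived, where a parent on a different processor adds its communication cost.

```python
def data_ready_time(parents, proc):
    """Time at which all parent data is available on processor `proc`.

    `parents` is a list of (finish_time, parent_processor, comm_cost);
    the communication cost is zero when the parent ran on `proc`.
    """
    return max((finish + (0 if p == proc else comm)
                for finish, p, comm in parents), default=0)


def earliest_start(ready_time, busy, duration):
    """Earliest start time >= ready_time for a task of length `duration`.

    `busy` is a sorted list of non-overlapping (start, end) intervals
    already occupied on the processor. The task is placed in the first
    idle space it fits into, otherwise after the last busy interval.
    """
    t = ready_time
    for start, end in busy:
        if t + duration <= start:  # fits in the idle space before this interval
            return t
        t = max(t, end)
    return t


# Example consistent with Fig. 2: t2 finishes at 15 on p0 and, assuming a
# communication cost of 5 to t3, t3 (cost 3 on p1) can start on p1 at 20.
ready = data_ready_time([(15, 0, 5)], proc=1)
start = earliest_start(ready, [(0, 4)], 3)
finish = start + 3
```

    Keeping the busy intervals sorted lets the search stop at the first idle space that is large
enough, which corresponds to the replacement ability check before a processor is selected.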
    The LDCP+ algorithm is given in Fig. 4.

                       VI.   CASE STUDY
    In this section, execution results of the CPOP, HEFT and LDCP+ algorithms are compared for the
case of a non-monotonic computation cost matrix. A Grid system composed of three single-processor
computers (m=3), fifteen tasks (n=15), a non-monotonic computation cost matrix and a DAG with
communication costs assigned to the graph edges are shown in Fig. 5, which also presents the
scheduling results for this DAG under the HEFT, CPOP and LDCP+ algorithms. Execution results of
LDCP and LDCP+ are compared for a monotonic computation cost matrix. A Grid system with the
parameters m=2 and n=10, a monotonic computation cost matrix and a DAG with communication costs
assigned to the graph edges are shown in Fig. 6. Fig. 6 also shows the scheduling results for the
DAG presented in Fig. 6(b) under the LDCP and LDCP+ scheduling algorithms.

        [Figure 5(a): task graph]

        Task    p1    p2    p3
        1       14    16    9
        2       13    19    18
        3       11    13    19
        4       13    8     17
        5       12    13    10
        6       13    16    9
        7       7     15    11
        8       5     11    14
        9       18    12    20
        10      21    7     16

                 (a)                     (b)

        [Figure 5(c), (d): schedule charts on processors p1, p2 and p3 over a time axis from 0 to 80]

                 (c)                     (d)


        Step    Selected task    Selected processor
        1       t2               p0
        2       t1               p1
        3       t4               p1
        4       t9               p0
        5       t5               p0
        6       t3               p0
        7       t7               p1
        8       t6               p1
        9       t8               p0
        10      t11              p0
        11      t10              p1

                 (e)                     (f)

Figure 5. Scheduling results for the HEFT, CPOP and LDCP+ algorithms. (a) A graph with 10 tasks.
(b) Graph cost matrix. (c) HEFT scheduling algorithm with schedule length of 80. (d) CPOP algorithm
with schedule length of 89. (e) LDCP+ algorithm with schedule length of 68. (f) Task execution
sequence in the LDCP+ algorithm. Duplicated tasks:

        Step    Selected task    Selected processor
        1       n1               p1
        2       n4               p2
        3       n3               p1
        4       n2               p3
        5       n5               p2
        6       n6               p3
        7       n9               p2
        8       n7               p1
        9       n8               p1
        10      n10              p2

                 (e)                     (f)

Figure 6. Scheduling results for the LDCP and LDCP+ algorithms. (a) A graph with 11 tasks.
(b) Graph cost matrix. (c) Task execution sequence in the LDCP algorithm. (d) LDCP algorithm with
schedule length of 64. (e) Task execution sequence in the LDCP+ algorithm. (f) LDCP+ algorithm with
schedule length of 61.5.

Figure 6 (b). Graph cost matrix:

 task     p0       p1
  t1       4        6
  t2      15       22.5
  t3       4        6
  t4      13       19.5
  t5      10       15
  t6       7       10.5
  t7       8       12
  t8       4        6
  t9      12       18
  t10      6        9
  t11      9       13.5

                 VII. CONCLUSION AND FUTURE WORK

    In Grid systems, task scheduling is an important problem in optimizing heterogeneous distributed systems. This paper proposes a new heuristic scheduling algorithm named LDCP+. It refines the LDCP algorithm and attains shorter schedule lengths by improving three phases: the task selection phase, the processor selection phase, and the status update phase. LDCP+ can schedule tasks in Grid systems whether the cost matrix is monotonic or non-monotonic. Using task duplication to optimize the priorities assigned to tasks, together with using the idle slots of processors, yields shorter schedule lengths than the other scheduling algorithms compared (HEFT, CPOP, and LDCP). In real-time environments, assigning resources such as processors at a specific time is especially important. Further work could improve such algorithms so that they incur less computation cost in these environments.
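The use of idle processor slots mentioned above can be illustrated with a minimal sketch of a generic insertion-based policy: given the intervals during which a processor is already busy, find the earliest time at which a new task fits. This is an illustration under stated assumptions, not the authors' exact LDCP+ procedure. The function name `earliest_slot`, its parameters, and the example timeline are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def earliest_slot(busy: List[Tuple[float, float]], ready: float, cost: float) -> float:
    """Return the earliest start time >= ready at which a task of length
    `cost` fits on a processor whose occupied intervals are `busy`.

    `busy` holds non-overlapping (start, finish) pairs; the gaps between
    them are the processor's idle slots. (Hypothetical helper.)
    """
    start = ready
    for s, f in sorted(busy):
        if start + cost <= s:   # the task fits in the gap before this interval
            return start
        start = max(start, f)   # otherwise try again after this interval ends
    return start                # no gap fits: place the task after the last interval


# Hypothetical timeline for one processor: busy during [0, 15) and [25, 38).
busy_p0 = [(0.0, 15.0), (25.0, 38.0)]

print(earliest_slot(busy_p0, ready=10.0, cost=4.0))   # fits in the idle gap: 15.0
print(earliest_slot(busy_p0, ready=10.0, cost=12.0))  # gap too small: 38.0
```

A scheduler that only appends tasks at the end of a processor's timeline would start both tasks at 38; filling the gap lets the shorter task start at 15, which is how reusing idle slots can shorten the schedule.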

Figure 6 (c). Task execution sequence in the LDCP algorithm:

 step    Selected task    Selected processor
  1          t2                 p0
  2          t1                 p1
  3          t4                 p1
  4          t9                 p0
  5          t5                 p0
  6          t3                 p0
  7          t7                 p1
  8          t6                 p1
  9          t8                 p0
  10         t10                p0
  11         t11                p0

[1]  S. Bansal, P. Kumar, and K. Singh. An improved duplication strategy for scheduling precedence constrained graphs in multiprocessor systems. IEEE Transactions on Parallel and Distributed Systems 14(6), pages 533-544, 2003.
[2]  M. I. Daoud and N. N. Kharma. A high performance algorithm for static task scheduling in heterogeneous distributed computing systems. Journal of Parallel and Distributed Computing 68(4), pages 399-409, 2008.
[3]  H. El-Rewini and T. G. Lewis. Scheduling parallel program tasks onto arbitrary target machines. Journal of Parallel and Distributed Computing 9(2), pages 138-153, 1990.
[4]  E. Ilavarasan, P. Thambidurai, and R. Mahilmannan. Performance effective task scheduling algorithm for heterogeneous computing system. 4th International Symposium on Parallel and Distributed Computing, 0:28-38, 2005.
[5]  J. Kim, J. Rho, J.-O. Lee, and M.-C. Ko. CPOC: Effective static task scheduling for grid computing. In Proceedings of the 2005 International Conference on High Performance Computing and Communications, pages 477-486, 2005.

[7]  Y.-K. Kwok and I. Ahmad. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Computing Surveys 31(4), pages 406-471, 1999.
[8]  Y.-K. Kwok and I. Ahmad. Dynamic critical-path scheduling: An effective technique for allocating task graphs to multiprocessors. IEEE Transactions on Parallel and Distributed Systems 7(5), pages 506-521, 1996.
[9]  G. C. Sih and E. A. Lee. A compile-time scheduling heuristic for interconnection constrained heterogeneous processor architectures. IEEE Transactions on Parallel and Distributed Systems 4(2), pages 175-187,
[10] H. Topcuoglu, S. Hariri, and M.-Y. Wu. Performance-effective and low complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems 13(3), pages 260-274,
