(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 4, July 2010
http://sites.google.com/site/ijcsis/ ISSN 1947-5500

LDCP+: An Optimal Algorithm for Static Task Scheduling in Grid Systems

Negin Razavi, Safieh Siadat, Amir Masoud Rahmani
Islamic Azad University, Science and Research Branch, Tehran, Iran
n.razavi@srbiau.ac.ir, s.siadat@srbiau.ac.ir, rahmani@srbiau.ac.ir

Abstract: After a computational job is designed and realized as a set of tasks, an optimal assignment of these tasks to the processing elements of a given architecture needs to be determined. In a Grid system, with heterogeneous processing elements and data transfer time between them, determining an assignment of tasks to processing elements that optimizes performance and efficiency is important. In this paper a heuristic algorithm named LDCP+ is presented, which optimizes the Longest Dynamic Critical Path algorithm (LDCP) presented by Mohammad I. Daoud and Nawwaf Kharma in 2007. LDCP+ is a list-based algorithm: it assigns each task a priority for its execution. Its basic features are task duplication, the use of idle time on processing elements, and an optimized version of the priority assignment method used in LDCP. LDCP assumes that the computation costs of tasks are monotonic; the algorithm presented in this paper removes this restriction. In the case of non-monotonic computation costs, LDCP+ has the minimum total finish time in comparison with other scheduling algorithms such as HEFT and CPOP.

Keywords: Grid; Static task scheduling; Longest Dynamic Critical Path.

I. INTRODUCTION

A Grid system is a group of connected computers that can execute parallel programs via a high speed interconnection. The efficiency of program parallelism in Grid systems depends on the methods used to schedule tasks on the available processing elements. The interconnection of processing elements in a Grid causes an overhead when two tasks assigned to processing elements of distinct computers transfer data. Task scheduling in distributed heterogeneous systems is more complex because each task can have a different execution time on different processing elements. Scheduling algorithms for a Grid system should therefore consider the execution time of each task on every processing element, and even one incorrect decision can restrict the system performance to that of the slowest processing element [2].

There are two kinds of scheduling algorithms: static and dynamic. In static scheduling algorithms, all information needed for scheduling, such as the structure of the parallel application, the execution time of individual tasks and the communication costs between tasks, must be known in advance. In dynamic task scheduling algorithms this information is unknown.

Among the different scheduling algorithms, HEFT is a scheduling algorithm for heterogeneous distributed computing systems that consists of two phases: first, cost computation for each task and task selection; second, processor selection. In the task selection phase the algorithm sets the computation costs of tasks to their mean values, which may limit its ability to compute task priorities precisely. The CPOP algorithm has the same two phases as HEFT but uses different strategies for assigning priorities to tasks and for processor selection. Both algorithms have been described as optimal with respect to total finish time.

In this paper we present a heuristic list-based algorithm called LDCP+ (an optimized version of the Longest Dynamic Critical Path algorithm) for static task scheduling in Grid systems with a limited number of processors, and we compare its scheduling results with those of CPOP, HEFT and LDCP for performance evaluation.

II. RELATED WORKS

Static task scheduling for Grid systems is in general known to be an NP-complete problem [4, 7, 9], and most static scheduling algorithms are heuristic [1, 2, 3, 4, 7]. One of the most important classes of heuristic algorithms is list-based algorithms [6]. In such algorithms each task is assigned a priority, and three steps (task selection, processor selection and status update) are repeated until all tasks are scheduled. In the task selection phase, the unscheduled task with the highest priority is selected. In the processor selection phase, the selected task is assigned to the processor that minimizes a predefined cost criterion, for example the schedule length. Finally, in the status update phase, the status of the system is updated. Examples of list-based algorithms are Heterogeneous Earliest Finish Time (HEFT) [9], Critical Path on a Processor (CPOP) [9], Critical Path on a Cluster (CPOC) [5], Dynamic Level Scheduling (DLS) [8], Modified Critical Path (MCP) [10], Mapping Heuristic (MH) [3], Dynamic Critical Path (DCP), and Longest Dynamic Critical Path (LDCP) [2].

III. PROBLEM DEFINITION

In static task scheduling in a Grid system, the execution precedence between tasks is represented by a Directed Acyclic Graph (DAG). Each DAG is given by a tuple $(T, E)$, where $T$ is a set of $n$ tasks and $E$ is a set of $e$ edges. Each $t_i \in T$ represents a task and each $e_{i,j} = (t_i, t_j) \in E$ represents the execution precedence between the two tasks connected by the edge $e_{i,j}$.

If $(t_i, t_j) \in E$, then the execution of task $t_j$ cannot start before task $t_i$ finishes its execution. For the edge $(t_i, t_j)$, the source task $t_i$ is a parent of the sink task $t_j$, while $t_j$ is a child of $t_i$. A task with no parents is called an entry task and a task with no children is called an exit task. Associated with each edge $(t_i, t_j)$ is a value $d_{i,j}$ that represents the amount of data to be transmitted from task $t_i$ to task $t_j$; in some cases it also represents the minimum time that task $t_j$ must wait before starting after task $t_i$ finishes its execution.

A Grid system is represented by a set $P$ of $m$ processors, a set $T$ of $n$ tasks and an $n \times m$ computation cost matrix $W_{n \times m}$. Each element $w_{i,k} \in W$, $0 \le i \le n-1$, $0 \le k \le m-1$, represents the execution time (computation cost) of task $t_i$ on processor $p_k$. As in LDCP, we assume that all processors are fully connected and that communication between processors occurs via independent communication units [2], so task execution and data transfer can proceed in parallel. Also as in LDCP, the data transfer rate between any two processors on the network is assumed to be fixed and constant. The communication cost is represented by an $n \times n$ matrix $D_{n \times n}$: $d_{i,j} \in D$ is zero if tasks $t_i$ and $t_j$ are scheduled on the same processor, and equals the (non-zero) communication cost otherwise. A task can start its execution on a processor only when all data from its parents are available to that processor. The goal of our algorithm is to assign tasks to processors in a way that minimizes the total finish time, or schedule length.

A. Definition 1

Schedule Length: The maximum execution time of the processors, or equivalently the finish time of the final task after task scheduling, is called the schedule length. Fig. 1 shows a DAG and a computation cost matrix for two processors. The corresponding schedule is shown in Fig. 2, and its schedule length is 23.

Task   p0   p1
t0      6    5
t1      6    4
t2      9    8
t3      7    3

Figure 1. An example of (a) a DAG with tasks t0 to t3 and edge communication costs 2, 1, 3 and 5; (b) the computation cost matrix (above).

Figure 2. Schedule of the DAG in Fig. 1 on two processors. Processor p0 runs t0 from 0 to 6 and t2 from 6 to 15; processor p1 runs t1 from 0 to 4, is then idle, and runs t3 until 23.

In a Grid system, the efficiency of list-based scheduling algorithms depends on the method used to assign priorities to tasks. In the proposed LDCP+ algorithm, a task receives a high priority if selecting it at a given scheduling step leads to the minimum schedule length. Because LDCP+ is an optimization of LDCP, the basic definitions used by LDCP are restated here.

B. Definition 2

Critical Path: For a given DAG, the Critical Path (CP) is the path from an entry task to an exit task for which the sum of the computation costs of the tasks and the communication costs of the edges is maximal.

IV. LDCP: LONGEST DYNAMIC CRITICAL PATH

A. Definition 3

Longest Dynamic Critical Path: Given a DAG with $n$ tasks and $e$ edges and a Grid system with $m$ processors, the Dynamic Critical Path (DCP) at a particular scheduling step is a path of tasks and edges from an entry task to an exit task.

The LDCP is the longest DCP, where communication costs between tasks scheduled on the same processor are taken as zero and the execution constraints are preserved. Fig. 3 shows two dynamic critical paths. The first, in Fig. 3(a), is composed of tasks $t_0$, $t_2$ and $t_3$ scheduled on processor $p_0$ and has length 29 (6 + 2 + 9 + 5 + 7, counting the edge costs 2 and 5 along the path). The second, in Fig. 3(b), is composed of tasks $t_0$, $t_2$ and $t_3$ scheduled on processor $p_1$ and has length 23 (5 + 2 + 8 + 5 + 3). At the first step of scheduling, the LDCP is therefore composed of tasks $t_0$, $t_2$ and $t_3$, with a length of 29.

Figure 3. Computation time of the tasks on each processor, taken from the cost matrix in Fig. 1: (a) DAGP0 with node costs n0 = 6, n1 = 6, n2 = 9, n3 = 7; (b) DAGP1 with node costs n0 = 5, n1 = 4, n2 = 8, n3 = 3. Both graphs carry the edge costs 2, 1, 3 and 5.

V. LDCP+: THE PROPOSED ALGORITHM

In LDCP+, each scheduling iteration consists of three phases:

1. Task selection phase
2. Processor selection phase
3. Status update phase

These three phases are repeated until the last input task has been selected for scheduling.

A. Task Selection Phase

LDCP+ selects a set of tasks that play the main role in determining the schedule length. The first step of this phase requires a DAG for each processor.

1) Definition 4

Directed Graph: Given a DAG with $n$ tasks and $e$ edges and a Grid system with $m$ processors $(p_0, p_1, \ldots, p_{m-1})$, $DAGP_k$ is the directed acyclic graph that corresponds to processor $p_k$. The computation cost of each task on processor $p_k$ is represented by a number on the corresponding node of $DAGP_k$.

$DAGP_0$ is shown in Fig. 3(a) and $DAGP_1$ in Fig. 3(b). These figures correspond to the DAG and Grid system of Fig. 1. Throughout this paper, $t_i$ refers to the $i$-th task in the directed acyclic graph, and the node $n_i$ identifies task $t_i$ in $DAGP_k$. The number associated with this node is the computation cost of task $t_i$ on processor $p_k$. In each $DAGP_k$, every node is assigned a number called its UpwardRank (URank). URanks are used to determine task priorities in $DAGP_k$.

2) Definition 5

URank: The UpwardRank of the $i$-th node $n_i$ in $DAGP_k$ is defined recursively as

$$ URank_k(n_i) = w_{i,k} + \max_{n_l \in succ_k(n_i)} \{ c_k(n_i, n_l) + URank_k(n_l) \} \qquad (1) $$

where $succ_k(n_i)$ is the set of immediate successors of node $n_i$, $c_k(n_i, n_l)$ is the communication cost between nodes $n_i$ and $n_l$ in $DAGP_k$, and $w_{i,k}$ is the computation cost of $t_i$ on processor $p_k$.

3) Definition 6

URankSet: Each element of URankSet is defined as

$$ \max \left\{ \sum_{k=0}^{m-1} URank_k(n_i) \right\} \qquad (2) $$

where $URank_k(n_i)$ is the URank of $n_i$ in $DAGP_k$.

4) Definition 7

KeyNode: The KeyNode is the node that has the maximum URank in URankSet. The task corresponding to this node is the task selected for scheduling.

5) Definition 8

KeyNodeSet: This set contains the KeyNodes that are selected for scheduling. In the first scheduling iteration it can contain several tasks; in all other iterations it contains only one task.

6) Definition 9

Least Execution Time (LET): The Least Execution Time is defined as

$$ \min \{ processTime(p_k) + w_{i,k} + d_{i,j} \} \qquad (3) $$

where $processTime(p_k)$ is the time at which the last task scheduled on processor $p_k$ finishes its execution, $w_{i,k}$ is the computation time of task $t_i$ on processor $p_k$, and $d_{i,j}$ is the communication time between $t_i$ and $t_j$. If both $t_i$ and $t_j$ are scheduled on processor $p_k$, the communication cost between them is taken as zero.

After URankSet has been computed, the task to be scheduled is the task corresponding to the KeyNode in URankSet. In the first iteration, to obtain the minimum execution time on the available processing elements, if the number of entry tasks is less than or equal to the number of processors, all entry tasks are considered KeyNodes. Otherwise, as many tasks as there are processors, chosen by maximum URank, are selected as KeyNodes and placed in KeyNodeSet. In subsequent iterations, KeyNodeSet contains a single KeyNode.

B. Processor Selection Phase

In this phase, the selected task is assigned to a processor so as to obtain the minimum schedule length in each scheduling iteration. LDCP+ proceeds as follows.

As mentioned above, in the first scheduling iteration KeyNodeSet can contain more than one KeyNode. To improve on LDCP, LDCP+ computes the distinct permutations of the tasks whose KeyNodes are in KeyNodeSet over the different processors, and the permutation with the minimum average execution time on the processors becomes the first assignment of tasks to processors. This average execution time is obtained from

$$ \min \left\{ \frac{\sum_{k=0}^{m-1} w_{i,k}}{m} \right\} \qquad (4) $$

where $i$ ranges over the tasks corresponding to the KeyNodes, $w_{i,k} \in W$, and $m$ is the number of processors. In subsequent iterations, the only KeyNode in KeyNodeSet is selected to be scheduled.

1) Definition 10

Idle Space: When there is a gap on a processor between the start time of a task and the end time of the previous task, that interval is called an idle space.

2) Definition 11

Replacement Ability: A task can be placed in an idle space when the parents of that task have finished before the start time of the task. If any of its parents has been scheduled on a different processor, the time required for transferring data between processors must be taken into account.

If there is a processor with an idle space and the selected task can be placed in that space (replacement ability), that processor is selected. At the end of this phase, LDCP+ uses the duplication process to decrease the schedule length: after the processor has been selected, if the selected task has a parent scheduled on a different processor and the selected processor has an idle space before the start time of the selected task, the parent is duplicated in the idle space (subject to the replacement ability).

3) Definition 12

Duplication Process: The duplication process repeats the execution of a task on other processors.

C. Status Update Phase

After the task has been selected and assigned to a processor, the URank belonging to the selected task is deleted from URankSet. The finish time of the selected processor is updated after the task has been assigned to it, and the selected task is deleted from the list of unscheduled tasks. The LDCP+ algorithm is given in Fig. 4.

    Establish DAGP_k for all processors in the system, where 0 <= k <= m - 1
    Calculate URanks for all DAGP_k
    Compute the URankSet
    While there are unscheduled tasks in the task list do
        Find the KeyNode(s) in the URankSet
        Put the KeyNode(s) in KeyNodeSet
        If (it is the first step of scheduling) then
            Choose the processors which have the minimum permutation;
        Else
            If (there is any processor with idle time and the task has the replacement ability) then
                Select that processor;
            Else
                Compute the finish time of the selected task on every processor;
                Find and select the processor that minimizes the finish time of the selected task;
            End if
            Duplicate the parent(s) of the selected task if needed;
        End if
        Assign the selected task to the selected processor;
        Update the selected processor time;
        Update the URankSet;
        Update the unscheduled task list;
    End while

Figure 4. LDCP+ algorithm

VI. CASE STUDY

In this section, the execution results of the CPOP, HEFT and LDCP+ algorithms are compared for a non-monotonic computation cost matrix. Fig. 5 shows a Grid system composed of three single-processor computers (m=3), ten tasks (n=10), a non-monotonic computation cost matrix and a DAG with communication costs assigned to the graph edges. It also presents the scheduling results for this DAG under the HEFT, CPOP and LDCP+ algorithms.

The execution results of LDCP and LDCP+ are compared for a monotonic computation cost matrix. Fig. 6 shows a Grid system with m=2 and n=11, a monotonic computation cost matrix and a DAG with communication costs assigned to the graph edges. Fig. 6 also shows the scheduling results for the DAG of Fig. 6(a) under the LDCP and LDCP+ scheduling algorithms.

Fig. 5(b), graph cost matrix:

Task   p1   p2   p3
1      14   16    9
2      13   19   18
3      11   13   19
4      13    8   17
5      12   13   10
6      13   16    9
7       7   15   11
8       5   11   14
9      18   12   20
10     21    7   16

Fig. 5(f), task execution sequence in the LDCP+ algorithm:

Step   Selected task   Selected processor
1      n1              p1
2      n4              p2
3      n3              p1
4      n2              p3
5      n5              p2
6      n6              p3
7      n9              p2
8      n7              p1
9      n8              p1
10     n10             p2

Figure 5. Scheduling results for the HEFT, CPOP and LDCP+ algorithms. (a) A graph with 10 tasks. (b) Graph cost matrix. (c) HEFT scheduling algorithm with schedule length of 80. (d) CPOP algorithm with schedule length of 89. (e) LDCP+ algorithm with schedule length of 68. (f) Task execution sequence in the LDCP+ algorithm. Duplicated tasks:

Fig. 6(b), graph cost matrix:

Task   p0   p1
t1      4    6
t2     15   22.5
t3      4    6
t4     13   19.5
t5     10   15
t6      7   10.5
t7      8   12
t8      4    6
t9     12   18
t10     6    9
t11     9   13.5

Fig. 6(c), task execution sequence in the LDCP algorithm:

Step   Selected task   Selected processor
1      t2              p0
2      t1              p1
3      t4              p1
4      t9              p0
5      t5              p0
6      t3              p0
7      t7              p1
8      t6              p1
9      t8              p0
10     t10             p0
11     t11             p0

Fig. 6(e), task execution sequence in the LDCP+ algorithm:

Step   Selected task   Selected processor
1      t2              p0
2      t1              p1
3      t4              p1
4      t9              p0
5      t5              p0
6      t3              p0
7      t7              p1
8      t6              p1
9      t8              p0
10     t11             p0
11     t10             p1

Figure 6. Scheduling results for the LDCP and LDCP+ algorithms. (a) A graph with 11 tasks. (b) Graph cost matrix. (c) Task execution sequence in the LDCP algorithm. (d) LDCP algorithm with schedule length of 64. (e) Task execution sequence in the LDCP+ algorithm. (f) LDCP+ algorithm with schedule length of 61.5.

VII. CONCLUSION AND FUTURE WORK

In Grid systems, task scheduling is an important problem in the optimization of heterogeneous distributed systems. In this paper a new heuristic scheduling algorithm, named LDCP+, is proposed. It optimizes the LDCP algorithm and attains better schedule lengths by improving three phases: task selection, processor selection and status update. LDCP+ can schedule tasks in Grid systems with both monotonic and non-monotonic cost matrices. Using the duplication process, optimizing the assignment of priorities to tasks, and using the idle spaces of processors result in a better schedule length than other scheduling algorithms. In real-time environments, the assignment of resources such as processors within a specific time is important. More work can be done to develop algorithms with lower computation cost for such environments.

REFERENCES

[1] S. Bansal, P. Kumar, and K. Singh. An improved duplication strategy for scheduling precedence constrained graphs in multiprocessor systems. IEEE Transactions on Parallel and Distributed Systems 14(6), pages 533-544, 2003.
[2] M. I. Daoud and N. N. Kharma. A high performance algorithm for static task scheduling in heterogeneous distributed computing systems. Journal of Parallel and Distributed Computing 68(4), pages 399-409, 2008.
[3] H. El-Rewini and T. G. Lewis. Scheduling parallel program tasks onto arbitrary target machines. Journal of Parallel and Distributed Computing 9(2), pages 138-153, 1990.
[4] E. Ilavarasan, P. Thambidurai, and R. Mahilmannan. Performance effective task scheduling algorithm for heterogeneous computing system. 4th International Symposium on Parallel and Distributed Computing, pages 28-38, 2005.
[5] J. Kim, J. Rho, J.-O. Lee, and M.-C. Ko. CPOC: Effective static task scheduling for grid computing. In Proceedings of the 2005 International Conference on High Performance Computing and Communications, pages 477-486, 2005.
[7] Y.-K. Kwok and I. Ahmad. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Computing Surveys 31(4), pages 406-471, 1999.
[8] Y.-K. Kwok and I. Ahmad. Dynamic critical-path scheduling: An effective technique for allocating task graphs to multiprocessors. IEEE Transactions on Parallel and Distributed Systems 7(5), pages 506-521, 1996.
[9] G. C. Sih and E. A. Lee. A compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures. IEEE Transactions on Parallel and Distributed Systems 4(2), pages 175-187, 1993.
[10] H. Topcuoglu, S. Hariri, and M.-Y. Wu. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems 13(3), pages 260-274, 2002.
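
As a supplementary illustration of Definition 5, the URank recursion of Eq. (1) can be sketched in a few lines of Python for the four-task example of Fig. 1. The computation costs are those of Fig. 1(b). The edge list is partly an assumption: only the edges t0->t2 (cost 2) and t2->t3 (cost 5) follow from the path lengths 29 and 23 quoted in Section IV, while the endpoints of the edges with cost 1 and 3 are guessed. The names `W`, `EDGES`, `successors` and `urank` are illustrative and do not come from the paper.

```python
# Computation costs from Fig. 1(b): W[i][k] is the cost of task t_i on p_k.
W = {0: (6, 5), 1: (6, 4), 2: (9, 8), 3: (7, 3)}

# Edge communication costs. ASSUMPTION: only (0, 2) and (2, 3) are implied
# by the text; the endpoints of the edges with cost 1 and 3 are guessed.
EDGES = {(0, 2): 2, (1, 2): 1, (0, 3): 3, (2, 3): 5}


def successors(i):
    """Immediate successors of node n_i with the cost of the connecting edge."""
    return [(child, cost) for (parent, child), cost in EDGES.items() if parent == i]


def urank(i, k):
    """Eq. (1): w_{i,k} plus the largest (edge cost + URank) over successors."""
    tails = [cost + urank(child, k) for child, cost in successors(i)]
    return W[i][k] + (max(tails) if tails else 0)


if __name__ == "__main__":
    for k in (0, 1):
        print(f"DAGP{k}:", {f"n{i}": urank(i, k) for i in W})
```

Under these assumptions the URank of the entry node n0 is 29 in DAGP0 and 23 in DAGP1, which agrees with the two dynamic critical path lengths reported for Fig. 3.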
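
The schedule length of Definition 1 can be checked with a small simulation of the schedule in Fig. 2. The sketch below assumes the same partly guessed edge list as above (only t0->t2 with cost 2 and t2->t3 with cost 5 are implied by the text) and takes the task-to-processor assignment from Fig. 2: t0 and t2 on p0, t1 and t3 on p1. The function name `simulate` is hypothetical, and the sketch appends each task after the last one on its processor, without idle-space insertion or duplication.

```python
# Computation costs from Fig. 1(b): W[task][processor].
W = {0: (6, 5), 1: (6, 4), 2: (9, 8), 3: (7, 3)}

# ASSUMPTION: endpoints of the edges with cost 1 and 3 are guessed.
EDGES = {(0, 2): 2, (1, 2): 1, (0, 3): 3, (2, 3): 5}


def simulate(order, proc_of):
    """Return the finish time of every task for a fixed order and assignment."""
    finish = {}      # finish time per task
    proc_free = {}   # time at which each processor becomes free
    for task in order:
        proc = proc_of[task]
        ready = 0
        for (parent, child), cost in EDGES.items():
            if child == task:
                # Communication is free when parent and child share a processor.
                comm = 0 if proc_of[parent] == proc else cost
                ready = max(ready, finish[parent] + comm)
        start = max(ready, proc_free.get(proc, 0))
        finish[task] = start + W[task][proc]
        proc_free[proc] = finish[task]
    return finish


finish = simulate(order=[0, 1, 2, 3], proc_of={0: 0, 1: 1, 2: 0, 3: 1})
schedule_length = max(finish.values())
print(finish, schedule_length)
```

With this assignment t3 cannot start before the data from t2 arrives at time 20, so p1 stays idle after t1 and the schedule length is 23, as stated for Fig. 2.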
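
Definitions 9 to 11 describe choosing a processor by finish time and, where possible, placing a task in an idle space. A minimal sketch of that idea follows. It is not the authors' implementation: the helper names `earliest_start` and `select_processor`, the representation of a processor as a list of busy intervals, and the sample task (cost 9 on p0 and 8 on p1, with data-ready times 7 and 5) are all assumptions made for illustration. The busy intervals are those of Fig. 2, and duplication is not modelled.

```python
def earliest_start(busy, ready, duration):
    """Earliest start >= ready so that the task does not overlap a busy interval.

    busy is a list of (begin, end) intervals; gaps between them are the
    idle spaces of Definition 10.
    """
    start = ready
    for begin, end in sorted(busy):
        if start + duration <= begin:
            return start             # the task fits in the idle space
        start = max(start, end)      # otherwise try after this interval
    return start


def select_processor(busy_by_proc, ready_by_proc, cost_by_proc):
    """Pick the processor with the smallest finish time for the task."""
    best = None
    for proc in busy_by_proc:
        start = earliest_start(busy_by_proc[proc], ready_by_proc[proc], cost_by_proc[proc])
        finish = start + cost_by_proc[proc]
        if best is None or finish < best[1]:
            best = (proc, finish, start)
    return best


# Busy intervals of the schedule in Fig. 2.
busy = {"p0": [(0, 6), (6, 15)], "p1": [(0, 4), (20, 23)]}
# Hypothetical extra task: data-ready times already include transfer delays.
choice = select_processor(busy, {"p0": 7, "p1": 5}, {"p0": 9, "p1": 8})
print(choice)
```

Here the hypothetical task fits into the idle space on p1 between times 4 and 20, so p1 is chosen with start time 5 and finish time 13, whereas on p0 it could only start at 15.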
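
The first-iteration rule of the processor selection phase (Eq. (4)) can be read as: try every distinct placement of the KeyNode tasks on different processors and keep the one with the smallest average execution time. The sketch below implements that reading, which is an interpretation rather than a confirmed description of the authors' procedure. It uses the cost matrix of Fig. 1(b) with the entry tasks t0 and t1 as KeyNodes; `first_assignment` is a hypothetical name.

```python
from itertools import permutations

# Computation costs from Fig. 1(b): W[task][processor].
W = {0: (6, 5), 1: (6, 4), 2: (9, 8), 3: (7, 3)}


def first_assignment(key_tasks, costs, m):
    """Return (average cost, {task: processor}) minimizing Eq. (4).

    ASSUMPTION: each KeyNode task is placed on a different processor, and
    the sum of the chosen costs is divided by the number of processors m.
    """
    best = None
    for procs in permutations(range(m), len(key_tasks)):
        average = sum(costs[t][p] for t, p in zip(key_tasks, procs)) / m
        if best is None or average < best[0]:
            best = (average, dict(zip(key_tasks, procs)))
    return best


average, assignment = first_assignment([0, 1], W, m=2)
print(average, assignment)
```

For the example, placing t0 on p0 and t1 on p1 gives an average of 5.0, against 5.5 for the swapped placement, and this is the placement of the entry tasks shown in Fig. 2.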