(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 4, July 2010
http://sites.google.com/site/ijcsis/ ISSN 1947-5500

Scheduling of Workflows in Grid Computing with Probabilistic Tabu Search

R. Joshua Samuel Raj
CSE, VV College of Engineering
Tirunelveli, India
joshuasamuelraj@gmail.com

Dr. V. Vasudevan
Prof. & Head/IT, Kalasalingam University
Srivilliputur, India
drvvmca@yahoo.com

Abstract: In a Grid environment, the number of resources and tasks to be scheduled is usually variable and dynamic in nature. This characteristic makes scheduling a complex optimization problem. Scheduling is a key issue that must be solved in grid computing, and a better scheduling scheme can greatly improve efficiency. The objective of this paper is to explore Probabilistic Tabu Search for compute-intensive grid applications, maximizing the Job Completion Ratio and minimizing lateness in job completion, based on a comprehensive understanding of the challenges and the state of the art of current research. Experimental results demonstrate the effectiveness and robustness of the proposed algorithm. Further, a comparative evaluation against other scheduling algorithms such as First Come First Serve (FCFS), Last Come First Serve (LCFS), Earliest Deadline First (EDF) and Tabu Search is plotted.

Key words: grid computing, workflow, Tabu Search, scheduling problem, Probabilistic Tabu Search

INTRODUCTION

Grid Computing, a pioneer technique in harnessing geographically dislocated computer power, has changed the perception of the utility and availability of computer power. It has carved out a new technology that openly ventures and amalgamates a practically unlimited number of computing devices into any grid environment, augmenting the computing capability and providing resolutions to the various tasks within the operational grid environment. It does so by enabling the sharing, selection and aggregation of geographically distributed autonomous resources dynamically at runtime, depending on their availability, capability, performance and cost. This shifts the focus to collaborative environments, federating services and exchanging transactions in a mutual manner to share resources and thereby achieve common goals, enhancing productivity and speeding up progress in much the same way that the Internet did in yesterday's economy, and paving the way for numerous research efforts in grid scheduling mechanisms.

Grid Computing is our greatest hope for delivering computing as a utility to homes and offices. Many large-scale applications such as scientific, engineering and business problems (Hai et al., 2005; Cannataro et al., 2002) are solved effectively using the logical amalgamation of geographically dispersed Grid resources (Bernan et al., 2002). Grid computing, analogous to the pervasive electrical power grid, enables resource sharing and cooperative work among distributed computational sites.

In a grid environment, applications are often described as workflows. A workflow is composed of atomic tasks that are processed in a specific order to fulfill a complicated goal. Generally, grid workflows require intensive computing and process larger data than traditional workflows. Therefore, the performance of grid workflows becomes a critical issue for workflow management systems. One of the most challenging problems is to map each task to a corresponding service instance so as to meet the customers' quality of service (QoS) requirements as well as to achieve high performance of the workflow. This problem is found to be NP-complete. During the course of grid scheduling there are many challenges that require the simultaneous optimization of several incommensurable and competing objectives:

• Unpredictable challenges in Grid resources
• The need for multiple resource types to complete a job
• The need for parallel or concurrent execution of tasks in any workflow

Under the OGSA, the workflow scheduler has to balance several QoS requirements, including makespan and cost. Consequently, many traditional workflow scheduling algorithms, such as Opportunistic Load Balancing, Minimum Completion Time, Min-min, Max-min and Duplex, are not suitable since they only tackle the makespan requirement.

In recent years, a number of studies have focused on scheduling problems involving more than one QoS requirement. The traditional system, namely advance reservations for scheduling the workflows, suffers from problems such as overloading and power failure. The overloading and scheduler failure problems are overcome by a two-level scheduling scheme in which the first level is used for frequent small jobs and the second level for large jobs. The market-oriented approach succeeded in distributed scheduling of workflows, but could not complete more workflows within the deadline. The success ratio of the workflows allotted for mapping to the Grid sites is 30% (Chien et al., 2005) when 30 workflows are scheduled at a time.

Workflows submitted to the Computational Grids by resource consumers carry a budget proposal, client authentication and the requirements for execution, as shown in Fig. 1. The willingness to complete any job is given by resource providers. Hence the Grid schedulers search for solutions in the state space, aiming at high performance both in terms of solution quality and execution speed.

[Fig. 1 form fields: Grid Client's Job Submission; Client Name: Jeny; Password: ******; CPU Power: 30 TFLOPS; Memory: 19 MB; Deadline: 12/09/07; Quality of Service: Best Effort Service; Submit]
Fig. 1: Job submission blueprint

The literature has proposed a grid workflow scheduling algorithm in which cost is optimized with the expectation of minimizing the makespan. The literature has also presented a scheduling approach for economics-driven grids that optimizes the cost under a deadline constraint. In fact, a mixed-integer non-linear programming algorithm was introduced to optimize the cost with consideration of other QoS requirements. As the scale of workflow applications grows, conventional deterministic approaches may fail to give a satisfying solution. Moreover, in the Grid scheduling problem, for most practical applications, any scheduler delivering good-quality planning of jobs would suffice rather than searching for optimality. In fact, in a highly dynamic Grid environment, it is not even possible to define optimality of planning as it is defined in combinatorial optimization. This is because Grid schedulers run as long as the Grid system exists, and thus performance is measured not only for particular applications but also in the long run. It is well known that meta-heuristics are able to compute high-quality feasible solutions in a short time. Therefore, meta-heuristic algorithms have been receiving growing interest due to their powerful global search capability.

Motivated by the above, in this paper we apply the probabilistic Tabu search algorithm to the generalized Grid Scheduling problem. The basic idea behind the algorithm is to use preprocessing operations to arrive at a probability value for each vertex which roughly corresponds to its probability of being included in an optimal solution, and to use such probability values to shrink the size of the neighborhood of solutions to manageable proportions. We report results from computational experiments that demonstrate the superiority of this method over the generic Tabu search method.

PROBLEM DESCRIPTION

The Super Schedule Grid Architecture (SSGA), illustrated with an eight-node Grid environment example, is shown in Fig. 2. This architecture can be used for any practical application in normal grid environments. The setup was tested at the TIFAC Core in Network Engineering under a DST project. The goal of the SSGA is to find the allocation sequence of workflows on each Grid site. Four major entities are involved in this architecture.

• The grid users submit their requests for job completion to the local grid managers.
• All the tasks are received by the grid managers, and the scheduling decision is made on deploying the request to the Intra-Grid schedulers.
• The Intra-Grid schedulers hold up-to-date information on the grid resources that are idle at time t. This information is frequently updated. The smaller jobs can be scheduled within their deadlines by the Intra-Grid schedulers in their respective Administrative Domains. Here scheduling is often dynamic.
• Data-intensive applications with larger jobs require resources worldwide. In that case Inter-Grid schedulers are needed, and their scheduling is often static.

Fig. 2: Super Schedule Grid Architecture

The workflow allocation strategy in a Grid environment differs from the traditional ones. The goal of the Inter-Grid Scheduler is to receive the requests from different Intra-Grid Schedulers and to build a schedule that accommodates as many workflows as possible within their deadlines. The following DAG workflows and the penalty cost for each workflow are considered for experimental purposes.

Fig. 3: DAG workflow model

The duration of each workflow, the penalty cost incurred and the required grid resources are shown in Table 1. The tasks taken for the experiment have predecessors and successors; for example, T2 follows T1, or T2 and T3 are parallel computations once task T1 has executed.

Table 1: Experimental workflows

The workflow models for W1, W2 and W3 are shown in Fig. 3. FCFS maps tasks to the idle Grid sites in order of arrival, so the first task to arrive is served first. EDF executes the task whose absolute deadline is the earliest. Hence it estimates the execution deadline of the individual workflow for any standalone system and schedules such that the workflows that require greater completion time are served first. In EDF the task priorities are not fixed but change depending on the closeness of their absolute deadlines.

The experimental settings consist of workflows with the following assumptions:

• Each workflow received by the Inter-Grid Scheduler consists of a set of tasks T1, T2, T3 and so on.
• The tasks in each workflow follow a Directed Acyclic Graph (DAG) model (Fig. 3).
• The output from a task can be transferred to other tasks as per the DAG model, and all jobs are available at time zero.
• At any time a task can be executed only on a Grid site that is reported to the Inter-Grid scheduler as idle via the Intra-Grid scheduler.
• There is no pre-emption of tasks or workflows.
• The sequential order of workflow allotment changes.

Here we present a scheduling approach for the wide-area problem in which the resources and jobs are dispersed geographically.

PROPOSED METHOD OF PTS

In this study, a PTS heuristic for solving the scientific workflow scheduling problem in the Grid is discussed. The roots of Tabu search go back to the 1970s; it was first presented in its present form by Glover [Glover, 1986]; the basic ideas have also been sketched by Hansen [Hansen, 1986]. Additional efforts of formalization are reported in [Glover, 1989], [de Werra & Hertz, 1989], [Glover, 1990]. Many computational experiments have shown that tabu search has now become an established optimization technique which can compete with almost all known techniques and which, by its flexibility, can beat many classical procedures.

The generic TS is a metaheuristic strategy based on neighborhood search that overcomes local optimality. It works in a deterministic way, trying to model human memory processes. Memory is implemented by the implicit recording of previously seen solutions, using simple but effective data structures. This approach focuses on the creation of a Tabu list of moves that have been performed recently and are forbidden for a certain number of iterations, thereby helping to avoid cycling and promoting search in a diversified space. At each iteration, TS moves to the best solution that is not forbidden and is thus independent of local optima.

The generic TS introduces flexible memory structures articulating strategic restrictions and aspiration levels as a means of exploiting search spaces. TS has the ability to generate solutions of notably high quality, to escape from local minima and to implement an explorative strategy. TS is an iterative procedure for searching for a global optimum of a discrete combinatorial problem. The philosophy of TS is to avoid entrainment in cycles by forbidding or penalizing moves which take the solution, in the next iteration, to points in the solution space previously visited.

In order to improve the efficiency of the exploration process, one needs to keep track not only of local information (like the current value of the objective function) but also of some information related to the exploration process. This systematic use of memory is an essential feature of Tabu search (TS). While most exploration methods keep in memory essentially the value f(i*) of the best solution i* visited so far, TS also keeps information on the itinerary through the last solutions visited. Such information is used to guide the move from i to the next solution j to be chosen in N(i). The role of the memory is to restrict the choice to some subset of N(i) by forbidding, for instance, moves to some neighbor solutions. More precisely, the structure of the neighborhood N(i) of a solution i is in fact variable from iteration to iteration.

The main problem with such a tabu search algorithm is the size of the neighborhood of each solution. Thus generic Tabu search is able to execute only a few iterations within reasonable execution times, and therefore does little to alleviate the complexity of matching a job to the appropriate resource in the shortest time possible. The Probabilistic Tabu search for Grid scheduling addresses this concern.

SOLUTION CONSTRUCTION

The structure of Probabilistic Tabu search is as shown below. The basic idea is to look at only a subset of the neighborhood of each solution, namely the subset that has the maximum likelihood of containing the best tabu and non-tabu neighbors. The belief is that a large enough set of locally optimal solutions collectively contains predominantly those features that are present in globally optimal solutions and rarely contains features that are absent from globally optimal solutions. In this approach, a pre-defined number of starting solutions are chosen from widely separated regions in the sample space and used in local search procedures to obtain a set of locally optimal solutions. These locally optimal solutions are then examined to provide an idea of the probability of each solution being included in an optimal solution. Using this idea, the neighborhood of each solution is searched in a probabilistic manner.

General Scheme of PTS: The structure of the PTS algorithm is formalized as shown below.

Step 0 (Generating Probabilities): Generate a set of s solutions S = {S1, S2, ..., Ss}, using an extension to the local search method to obtain a local optimum. For each solution Si compute the associated probability pi. Go to Step 1.

Step 1 (Initialization): Define all solution elements as non-tabu. Choose an initial solution S, set BestSolution ← S, and set Iteration ← 1. Go to Step 2.

Step 2 (Termination): If a pre-defined termination condition is satisfied, output BestSolution and exit. Else go to Step 3.

Step 3 (Iteration): Consider each neighbor N of S with a probability of (1 − pi)pj, where vi = S \ N and vj = N \ S. If vi or vj is marked 'tabu' then N is a tabu neighbor; otherwise it is a 'non-tabu' neighbor. If the best tabu neighbor considered has a cost lower than the cost of BestSolution, go to Step 4; else replace S by the best non-tabu neighbor considered. Mark the solution elements participating in this move (i.e. the vertex that has left the solution, and the vertex that has entered the solution to form the neighbor) as tabu for the next TENURE moves. If this best non-tabu neighbor is better than BestSolution, replace BestSolution with this neighbor. Set Iteration ← Iteration + 1. Go to Step 2.

Step 4 (Aspiration): Replace BestSolution and S with the tabu neighbor of S. Remove the tabu status for all solution elements. Set Iteration ← Iteration + 1. Go to Step 2.

For every solution move in the TS procedure, the neighborhood solution is evaluated against a Dual Objective Function (DOF) of minimizing the total penalty cost of the chosen workflow sequence and maximizing the number of workflows completed within their deadlines (Job Completion Ratio). In our proposed method, the workflows are created based on the DAG model and the deadline is fixed at 1.5 * Execution time.

RESULTS AND DISCUSSION

The methodology is such that an initial job sequence is selected at random from the set of job sequences, and the dual objective function value of this solution is taken as the best cost. The obtained solution is recorded as the initial step of the Probabilistic Tabu Scheduling mechanism. Later, the set of neighborhood solutions of S is generated, the dual objective function (DOF) is calculated again, and the best cost is replaced if necessary by comparing against the history record.

The comparative increase in the completion of workflows by the PTS dual-objective scheduling mechanism relative to other algorithms such as FCFS, EDF and TS is shown in Fig. 4 and Fig. 5.

[Fig. 4 plot: x-axis: No. of workflows; y-axis: JCR; series: FCFS, EDF, TS, PTS]
Fig. 4: Job completion ratio

It can be seen that PTS outperforms TS in the number of workflow completions. Table 2 lists the penalty cost incurred by the Inter-Grid scheduler for not completing jobs. As per the methodology, PTS outperforms the other scheduling mechanisms under consideration.

[Fig. 5 plot: x-axis: No. of workflows; y-axis: DOF; series: PTS, TS, EDF, FCFS]
Fig. 5: DOF for PTS, TS, EDF and FCFS

Table 2: Penalty cost incurred for the workflow sequence

No. of workflows   FCFS      EDF       TS        PTS
5                  41.34     29        25        20.88
10                 43.75     35.78     29.94     27.63
15                 45.67     42.87     33.78     30.84
20                 56.45     45.78     40.82     37.62
25                 61.45     50.83     51.98     39.652
30                 74.55     58.34     59.674    45.67
35                 84.3      73.46     68.3      50.64
40                 97.55     79.83     74        62.1
45                 100.98    87.67     79.56     75.3
50                 108.3     97.25     85.65     82.5
55                 112.7     106       99.32     89.41
60                 119.5     112.3     106.9     100.26

CONCLUSION AND FUTURE WORK

In this paper, we have applied the probabilistic tabu search algorithm to the Generalized Grid Scheduling problem. In this approach, a pre-defined number of starting solutions are chosen from widely separated regions in the sample space and used in local search procedures to obtain a set of locally optimal solutions. These locally optimal solutions are then examined to provide an idea of the probability of being included in an optimal solution. Using these ideas, the neighborhood of each solution is searched in a probabilistic manner. Our computational experience shows that this probabilistic tabu search method outperforms generic tabu search most of the time. In the near future we plan to combine Probabilistic Tabu search with simulated annealing along with a sharing method to increase efficiency. Similarly, ant colony properties can be included in the existing algorithm for scalability. The procedure can also be suitably modified and applied to any kind of Grid scheduling with a different problem environment, optimizing any number of objectives concurrently.

REFERENCES

E.H.L. Aarts, P.J.M. van Laarhoven, J.K. Lenstra, and N.L.J. Ulder, "A Computational Study of Local Search Algorithms for Job Shop Scheduling", ORSA Journal on Computing 6 (1994), 118-125.

I. Foster and C. Kesselman, The grid: Blueprint for a future computing infrastructure, San Mateo, CA: Morgan Kaufmann, 1999.

M. Maheswaran, et al., "Dynamic mapping of a class of independent tasks onto heterogeneous computing systems", Journal of Parallel and Distributed Computing, Vol. 59, 1999, pp. 107-131.

R. Buyya, D. Abramson, and J. Giddy, "A case for economy grid architecture for service oriented grid computing", 10th Heterogeneous Computing Workshop (HCW' 2001), 2001.

I. Foster, C. Kesselman, S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations", Intl J. Supercomputer Applications, 2001.

H. XiaoShan, S. XiaoHe, "QoS guided min-min heuristic for grid task scheduling", Journal of Comput. Sci. & Technol., Vol. 18, No. 4, 2003, pp. 442-451.

Diptesh Gosh, "A Probabilistic Tabu Search algorithm for the Generalized Minimum Spanning Tree Problem", Indian Institute of Management (Ahmedabad), 2003.

A. A. Mandal, et al., "Scheduling strategies for mapping application workflows onto the grid", in Proceedings of the 14th IEEE International Symposium on High Performance and Distributed Computing (HPDC-14), 2005, pp. 125-134.

J. Yu, R. Buyya, and C.K. Tham, "Cost-based scheduling of scientific workflow applications on utility grids", Proceedings of the 1st International Conference on e-Science and Grid Computing (e-Science' 05), pp. 140-147, 2005.

M.M. López, E. Heymann, M.A. Senar, "Analysis of dynamic heuristics for workflow scheduling on grid systems", in Proceedings of the Fifth International Symposium on Parallel and Distributed Computing (ISPDC'06), IEEE, 2006.

A. Afzal, J. Darlington, A.S. McGough, "QoS-constrained stochastic workflow scheduling in enterprise and scientific grids", The 7th IEEE/ACM International Conference on Grid Computing, 2006, pp. 1-8.

Name: R. Joshua Samuel Raj
Affiliation: Assistant Professor / CSE, VV College of Engineering.
Brief Biographical History:
2005 - Graduated from the Computer Science and Engineering Department of PETEC under Anna University
2007 - Received M.E. degree in Computer Science and Engineering from Jaya College of Engineering under Anna University
2009 - Working towards the Ph.D. degree in the area of Grid scheduling under Kalasalingam University
Main Works: Grid computing, Mobile Adhoc Networking, Multicasting and so forth

Name: V. Vasudevan
Affiliation: Director, Software Technologies Lab, TIFAC Core in Network Engineering, Srivilliputhur, India
Brief Biographical History:
1984 - M.Sc. in Mathematics and worked for several areas towards Representation Theory
1992 - Received his Ph.D. degree in Madurai Kamaraj University
2008 - Project Director for the Software Technologies Group of TIFAC Core in Network Engineering and Head of the Department for Information Technology in Kalasalingam University, Sirivilliputhur, India
Main Works: Grid computing, Agent Technology, Intrusion Detection system, Multicasting and so forth
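As an illustration of the DAG workflow model from PROBLEM DESCRIPTION, where T2 and T3 become parallel computations once T1 has executed, the following minimal Python sketch groups the tasks of one workflow into stages that could run concurrently. The task names and the dependency map are hypothetical, since Table 1 and Fig. 3 are not reproduced in the text. The sketch relies on the standard-library graphlib module (Python 3.9 or later) and is not the authors' implementation.

```python
from graphlib import TopologicalSorter


def parallel_stages(predecessors):
    """Group tasks into stages; each task's predecessors lie in earlier stages."""
    sorter = TopologicalSorter(predecessors)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    stages = []
    while sorter.is_active():
        # Tasks whose predecessors have all been marked as done.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)  # finish the whole stage before asking again
    return stages


# Hypothetical workflow: T2 and T3 depend on T1, and T4 depends on both.
workflow = {"T1": [], "T2": ["T1"], "T3": ["T1"], "T4": ["T2", "T3"]}
stages = parallel_stages(workflow)
print(stages)
```

Each inner list is one stage, so tasks that share a stage have no dependency on each other and could be dispatched to different idle Grid sites at the same time.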
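Steps 0 to 4 of the General Scheme of PTS can be sketched in Python under several assumptions that do not come from the paper. A solution is modelled as a fixed-size set of elements, and a neighbor differs by one element leaving (v_i = S \ N) and one entering (v_j = N \ S). The probability of an element is estimated as its inclusion frequency among the local optima of Step 0 and is clamped to [0.05, 0.95] so that no move becomes impossible. Termination is a fixed iteration budget, and the cost function is a toy additive weight. The names local_search, pts, tenure and starts are hypothetical.

```python
import random


def local_search(start, universe, cost):
    """Swap descent: apply improving one-out/one-in swaps until none is left."""
    sol = set(start)
    improved = True
    while improved:
        improved = False
        for out in sorted(sol):
            for inn in sorted(universe - sol):
                cand = (sol - {out}) | {inn}
                if cost(cand) < cost(sol):
                    sol, improved = cand, True
                    break
            if improved:
                break
    return frozenset(sol)


def pts(universe, initial, cost, starts=8, iters=100, tenure=3, seed=0):
    rng = random.Random(seed)
    universe, s = set(universe), frozenset(initial)
    k = len(s)
    # Step 0: local optima from random starts; p[v] is the inclusion frequency.
    optima = [local_search(rng.sample(sorted(universe), k), universe, cost)
              for _ in range(starts)]
    p = {v: min(0.95, max(0.05, sum(v in o for o in optima) / starts))
         for v in universe}  # clamping is an assumption of this sketch
    # Step 1: every element starts non-tabu; the initial solution is the best.
    best = s
    tabu_until = dict.fromkeys(universe, 0)
    for it in range(1, iters + 1):  # Step 2: stop after a fixed budget
        best_tabu, best_free = None, None
        # Step 3: sample each swap neighbor with probability (1 - p_i) * p_j.
        for vi in sorted(s):
            for vj in sorted(universe - s):
                if rng.random() >= (1 - p[vi]) * p[vj]:
                    continue  # neighbor not considered in this iteration
                n = (s - {vi}) | {vj}
                if tabu_until[vi] >= it or tabu_until[vj] >= it:
                    if best_tabu is None or cost(n) < cost(best_tabu):
                        best_tabu = n
                elif best_free is None or cost(n) < cost(best_free[0]):
                    best_free = (n, vi, vj)
        if best_tabu is not None and cost(best_tabu) < cost(best):
            # Step 4: aspiration, accept the tabu neighbor and clear tabu marks.
            s = best = best_tabu
            tabu_until = dict.fromkeys(universe, 0)
        elif best_free is not None:
            s, vi, vj = best_free
            tabu_until[vi] = tabu_until[vj] = it + tenure  # tabu for TENURE moves
            if cost(s) < cost(best):
                best = s
    return best


# Toy instance: pick 3 of 10 elements with the smallest total weight.
weights = {v: v + 1 for v in range(10)}
universe = set(weights)
initial = {7, 8, 9}


def cost(selection):
    return sum(weights[v] for v in selection)


best = pts(universe, initial, cost, seed=1)
print(sorted(best), cost(best))
```

Because BestSolution is only ever replaced by a cheaper solution, the returned set never costs more than the initial one. Mapping this set model onto workflow sequences and Grid sites is left open here, as the paper does not spell that encoding out.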
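The Dual Objective Function described at the end of SOLUTION CONSTRUCTION can be illustrated with a small Python sketch. It assumes that workflows run back to back on a single resource, are all available at time zero and are never pre-empted, and that each deadline equals 1.5 times the workflow's own execution time. The paper does not state how the two objectives are combined, so the hypothetical function dual_objective simply returns both values. The execution times and penalty costs below are invented for illustration.

```python
def dual_objective(sequence, exec_time, penalty, slack=1.5):
    """Return (workflows completed on time, total penalty) for one sequence."""
    clock = 0.0
    completed, total_penalty = 0, 0.0
    for w in sequence:
        clock += exec_time[w]  # workflows run back to back, no pre-emption
        if clock <= slack * exec_time[w]:  # deadline = 1.5 * execution time
            completed += 1
        else:
            total_penalty += penalty[w]  # a late workflow incurs its penalty
    return completed, total_penalty


# Hypothetical data for three workflows.
exec_time = {"W1": 4, "W2": 2, "W3": 6}
penalty = {"W1": 10, "W2": 5, "W3": 8}

fcfs_order = ["W1", "W2", "W3"]  # arrival order
edf_order = sorted(exec_time, key=lambda w: 1.5 * exec_time[w])  # earliest deadline first

fcfs_result = dual_objective(fcfs_order, exec_time, penalty)
edf_result = dual_objective(edf_order, exec_time, penalty)
print(fcfs_result, edf_result)
```

On this toy data the arrival order completes one workflow on time with a penalty of 13, while the deadline order completes two with a penalty of 8. Dividing the first value by the number of workflows gives a Job Completion Ratio for the sequence.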