Scheduling of Workflows in Grid Computing with Probabilistic Tabu Search

(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No. 4, July 2010
R. Joshua Samuel Raj
CSE, VV College of Engineering
Tirunelveli, India
joshuasamuelraj@gmail.com

Dr. V. Vasudevan
Prof. & Head/IT, Kalasalingam University
Srivilliputur, India
drvvmca@yahoo.com
Abstract:

In a Grid environment the number of resources and tasks to be scheduled is usually variable and dynamic in nature. This characteristic makes scheduling a complex optimization problem. Scheduling is a key issue that must be solved in grid computing, and a better scheduling scheme can greatly improve efficiency. The objective of this paper is to explore Probabilistic Tabu Search for compute-intensive grid applications, so as to maximize the Job Completion Ratio and minimize lateness in job completion, based on a comprehensive understanding of the challenges and the state of the art of current research. Experimental results demonstrate the effectiveness and robustness of the proposed algorithm. Further, a comparative evaluation against other scheduling algorithms such as First Come First Serve (FCFS), Last Come First Serve (LCFS), Earliest Deadline First (EDF) and Tabu Search is plotted.

Key words: grid computing, workflow, Tabu Search, scheduling problem, Probabilistic Tabu Search

INTRODUCTION

Grid Computing, a pioneering technique for harnessing geographically dispersed computing power, has changed the perception of the utility and availability of computing power. It has produced a technology that amalgamates a practically unlimited number of computing devices into a grid environment, augmenting its computing capability and serving the various tasks within the operational grid by enabling the sharing, selection and aggregation of geographically distributed autonomous resources dynamically at runtime, depending on their availability, capability, performance and cost. The focus thereby shifts to collaborative environments that federate services and exchange transactions in a mutual manner to share resources and achieve common goals, enhancing productivity and speeding up progress in much the same way that the Internet did in yesterday's economy, and paving the way for numerous research efforts in grid scheduling mechanisms.

Grid Computing is our greatest hope for delivering computing as a utility to homes and offices. Many large-scale applications such as scientific, engineering and business problems (Hai et al., 2005; Cannataro et al., 2002) are solved effectively using the logical amalgamation of geographically dispersed Grid resources (Bernan et al., 2002). Grid computing, analogous to the pervasive electrical power grid, enables resource sharing and cooperative work among distributed computational sites.

In a grid environment, applications are often described as workflows. A workflow is composed of atomic tasks that are processed in a specific order to fulfill a complicated goal. Generally, grid workflows require far more intensive computing and process larger data than traditional workflows. Therefore, the performance of grid workflows becomes a critical issue for workflow management systems. One of the most challenging problems is to map each task to a corresponding service instance so as to meet the customers' quality of service (QoS) requirements as well as to achieve high performance of the workflow. This problem is found to be NP-complete. During the course of grid scheduling there are many challenges that require the simultaneous optimization of several incommensurable and competing objectives.
• Unpredictable challenges in Grid resources
• The need for multiple resource types to complete a job
• The need for parallel or concurrent execution of tasks in any workflow.

Under the OGSA, the workflow scheduler has to balance several QoS requirements, including makespan and cost. Consequently, many traditional workflow scheduling algorithms, such as Opportunistic Load Balancing, Minimum Completion Time, Min-min, Max-min and Duplex, are not
314 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
suitable since they only tackle the makespan requirement.

In recent years, a number of studies have focused on scheduling problems involving more than one QoS requirement. The traditional system, namely advance reservation for scheduling the workflows, suffers from problems such as overloading and power failure. The overloading and scheduler failure problems are overcome by a two-level scheduling scheme in which the first level is used for frequent small jobs and the second level for large jobs. The market-oriented approach succeeded in distributed scheduling of workflows, but could not complete more workflows within the deadline. The success ratio of the workflows allotted for mapping to the Grid sites is 30% (Chien et al., 2005) when 30 workflows are scheduled at a time.

Workflows submitted to the Computational Grids by resource consumers carry a budget proposal, client authentication and the requirements for their execution, as shown in Fig. 1. The willingness to complete any job is given by resource providers. Hence the Grid schedulers search for solutions in the state space, aiming at high performance both in terms of solution quality and execution speed.

[Figure: "Grid Client's Job Submission" form with the fields Client Name (Jeny), Password (******), CPU Power (30T flods), Memory (19MB), Dead Line (12/09/07), Quality of Service (Best Effort Service) and a Submit button]
Fig. 1: Job submission blueprint

The literature has proposed a grid workflow scheduling algorithm in which cost is optimized with the expectation of minimizing the makespan. The literature has also presented a scheduling approach for economics-driven grids that optimizes the cost under a deadline constraint. In fact, a mixed-integer non-linear programming algorithm was introduced to optimize the cost while considering other QoS requirements. As the scale of workflow applications becomes larger and larger, conventional deterministic approaches may fail to give a satisfying solution. Moreover, in the Grid scheduling problem, for most practical applications, any scheduler delivering good-quality planning of jobs would suffice rather than searching for optimality. In fact, in a highly dynamic Grid environment, it is not even possible to define optimality of planning as it is defined in combinatorial optimization. This is because Grid schedulers run as long as the Grid system exists, and thus the performance is measured not only for particular applications but also in the long run. It is well known that meta-heuristics are able to compute high-quality feasible solutions in a short time. Therefore, meta-heuristic algorithms have been receiving growing interest due to their powerful global search capability.

Motivated by the above, in this paper we apply the probabilistic Tabu search algorithm to the generalized Grid Scheduling problem. The basic idea behind the algorithm is to use preprocessing operations to arrive at a probability value for each vertex, which roughly corresponds to its probability of being included in an optimal solution, and to use such probability values to shrink the size of the neighborhood of solutions to manageable proportions. We report results from computational experiments that demonstrate the superiority of this method over the generic Tabu search method.

PROBLEM DESCRIPTION

The Super Schedule Grid Architecture (SSGA), illustrated with an eight-node Grid environment, is shown in Fig. 2. This architecture can be used for any practical application in normal grid environments. The setup was tested at the TIFAC Core in Network Engineering under a DST project.

The goal of the SSGA is to find the allocation sequence of workflows on each Grid site. Four major entities are involved in this architecture.
• The grid users submit their requests for job completion to the local grid managers.
• All the tasks are received by the grid managers, and the scheduling decision is made on deploying the request to the Intra-Grid schedulers.
• The Intra-Grid schedulers hold updated information on the grid resources that are idle during time t. This information is frequently updated. The smaller jobs can be scheduled within their deadlines by the Intra-Grid schedulers in their respective Administrative Domains. Here scheduling is often dynamic.
• Data-intensive applications with larger jobs require resources worldwide. In that case Inter-Grid schedulers are needed, and their scheduling is often static.

Fig. 2: Super Schedule Grid Architecture

The workflow allocation strategy in a Grid environment differs from the traditional ones. The goal of the Inter-Grid Scheduler is to receive the requests from different Intra-Grid Schedulers and to produce a schedule that accommodates as many workflows as possible completing within their deadlines. The following DAG workflows and the penalty cost for each workflow are considered for experimental purposes.

Fig. 3: DAG workflow model

The duration of each workflow, the penalty cost incurred and the required grid resources are shown in Table 1. The tasks taken for the experiment have predecessors and successors; for example, T1 and T2 run one after the other, or T2 and T3 run in parallel once task T1 has been executed.

Table 1: Experimental workflows

The workflow models for W1, W2 and W3 are shown in Fig. 3. FCFS maps tasks to the idle Grid sites in order of arrival, so the first task to arrive is served first. The EDF algorithm executes the tasks whose absolute deadline is the earliest. It therefore estimates the execution deadline of each individual workflow as on a standalone system and orders the workflows so that the one with the nearest deadline is served first. In EDF the task priorities are not fixed but change depending on the closeness of their absolute deadline.

The experimental setting consists of workflows with the following assumptions:
• Each workflow received by the Inter-Grid Scheduler consists of a set of tasks T1, T2, T3 and so on.
• The tasks in each workflow form a Directed Acyclic Graph (DAG) (Fig. 3).
• The output from a task can be transferred to other tasks as per the DAG model, and all jobs are available at time zero.
• At any time a task can be executed only on a Grid site that is reported to the Inter-Grid scheduler as idle via the Intra-Grid scheduler.
• There is no pre-emption of tasks or workflows.
• The sequential order of workflow allotment may change.

Here we present a scheduling approach for the wide-area problem, in which the resources and jobs are dispersed geographically.

PROPOSED METHOD OF PTS

In this study, a PTS heuristic for solving the scientific workflow scheduling problem in the Grid is discussed. The roots of Tabu search go back to the 1970s; it was first presented in its present form by Glover [Glover, 1986]; the basic ideas were also sketched by Hansen [Hansen 1986]. Additional efforts of formalization are reported in [Glover, 1989], [de Werra & Hertz, 1989], [Glover, 1990]. Many computational experiments have shown that tabu search has become an established optimization technique which can compete with almost all known techniques and which, by its flexibility, can beat many classical procedures.

The generic TS is a metaheuristic strategy based on neighborhood search that overcomes local optimality. It works in a deterministic way, trying to model human memory processes. Memory is implemented by the implicit recording of previously seen solutions, using simple but effective data structures. This approach focuses on the creation of a Tabu list of moves that have been performed recently and are forbidden for a certain number of iterations, thereby helping to avoid cycling and promoting search in a diversified space. At each iteration, TS moves to the best solution that is not forbidden and is thus not held back by local optima.

The generic TS introduces flexible memory structures articulating strategic restrictions and aspiration levels as a means of exploiting search spaces. TS has the ability to generate solutions of notably high quality, to escape from local minima and to implement an explorative strategy. TS is an iterative procedure for searching for a global optimum of a discrete combinatorial problem. The philosophy of TS is to avoid entrainment in cycles by forbidding or penalizing moves that take the solution, in the next iteration, to points in the solution space previously visited.

In order to improve the efficiency of the exploration process, one needs to keep track not only of local information (like the current value of the objective function) but also of some information related to the exploration process. This systematic use of memory is an essential feature of Tabu search (TS). While most exploration methods keep in memory essentially the value f(i*) of the best solution i* visited so far, TS also keeps information on the itinerary through the last solutions visited. Such information is used to guide the move from i to the next solution j to be chosen in N(i). The role of the memory is to restrict the choice to some subset of N(i), for instance by forbidding moves to some neighbor solutions. More precisely, the structure of the neighborhood N(i) of a solution i will in fact vary from iteration to iteration.

The main problem with such a tabu search algorithm is the size of the neighborhood of each solution. Generic Tabu search is therefore able to execute only a few iterations within reasonable execution times, which works against the goal of matching a job to the appropriate resource in the shortest time possible. Probabilistic Tabu search for Grid scheduling addresses this concern.

SOLUTION CONSTRUCTION

The basic idea of Probabilistic Tabu search is to look at only a subset of the neighborhood of each solution, namely the subset that has the maximum likelihood of containing the best tabu and non-tabu neighbors. The belief is that a large enough set of locally optimal solutions collectively contains predominantly those features that are present in globally optimal solutions and rarely contains features that are absent from globally optimal solutions. In this approach, a pre-defined number of starting solutions are chosen from widely separated regions in the sample space and used in local search procedures to obtain a set of locally optimal solutions. These locally optimal solutions are then examined to provide an idea about the probability of each solution element being included in an optimal solution. Using this idea, the neighborhood of each solution is searched in a probabilistic manner.

General Scheme of PTS: The structure of the PTS algorithm is formalized as follows.

Step 0 (Generating Probabilities): Generate a set of s solutions S = {S1, S2, ..., Ss} using an extension to
local search method to obtain a local optimum. For each solution Si compute the associated probability pi. Go to Step 1.

Step 1 (Initialization): Define all solution elements as non-tabu. Choose an initial solution S, set BestSolution ← S, and set Iteration ← 1. Go to Step 2.

Step 2 (Termination): If a pre-defined termination condition is satisfied, output BestSolution and exit. Else go to Step 3.

Step 3 (Iteration): Consider each neighbor N of S with a probability of (1−pi)pj where vi = S \ N and vj = N \ S. If vi or vj is marked 'tabu' then N is a tabu neighbor, otherwise it is a 'non-tabu' neighbor. If the best tabu neighbor considered has a cost lower than the cost of BestSolution, go to Step 4, else replace S by the best non-tabu neighbor considered. Mark the solution elements participating in this move (i.e. the vertex that has left the solution, and the vertex that has entered the solution to form the neighbor) as tabu for the next TENURE moves. If this best non-tabu neighbor is better than BestSolution, replace BestSolution with this neighbor. Set Iteration ← Iteration + 1. Go to Step 2.

Step 4 (Aspiration): Replace BestSolution and S with the tabu neighbor of S. Remove the tabu status for all solution elements. Set Iteration ← Iteration + 1. Go to Step 2.

For every solution move in the TS procedure, the neighborhood solution is evaluated against a Dual Objective Function: minimizing the total penalty cost of the chosen workflow sequence and maximizing the number of workflows completed within their deadline (Job Completion Ratio). In our proposed method, the workflows are created based on the DAG model and the deadline is fixed at 1.5 * Execution time.

RESULTS AND DISCUSSION

The methodology is as follows. An initial job sequence is selected at random from the set of job sequences, and the value of the dual objective function for this solution is taken as the best cost. The obtained solution is recorded as the initial step of the Probabilistic Tabu Scheduling mechanism. Later, the set of neighborhood solutions of S is generated, the dual objective function (DOF) is calculated again, and the best cost is replaced if necessary by the best cost found in the history record.

The comparative increase in the completion of workflows by the PTS dual objective scheduling mechanism relative to other algorithms such as FCFS, EDF and TS is shown in Fig. 4 and Fig. 5.

[Figure: job completion ratio (JCR) plotted against NO. OF WORK FLOWS (0 to 70) for FCFS, EDF, TS and PTS]
Fig. 4: Job completion ratio

It can be seen that PTS outperforms TS in the number of workflow completions. Table 2 lists the penalty cost incurred by the Inter-Grid scheduler for not completing jobs. As per the methodology, PTS outperforms the other scheduling mechanisms considered.

[Figure: DOF plotted against NO. OF WORKFLOWS for PTS, TS, EDF and FCFS]
Fig. 5: DOF for PTS, TS, EDF and FCFS

Table 2: Penalty cost incurred for the workflow sequence

No. of workflows   FCFS      EDF       TS        PTS
 5                 41.34     29        25        20.88
10                 43.75     35.78     29.94     27.63
15                 45.67     42.87     33.78     30.84
20                 56.45     45.78     40.82     37.62
25                 61.45     50.83     51.98     39.652
30                 74.55     58.34     59.674    45.67
35                 84.3      73.46     68.3      50.64
40                 97.55     79.83     74        62.1
45                 100.98    87.67     79.56     75.3
50                 108.3     97.25     85.65     82.5
55                 112.7     106       99.32     89.41
60                 119.5     112.3     106.9     100.26
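To make the comparison in Table 2 easier to read, the relative penalty reduction achieved by PTS can be computed directly from the reported values. The short Python sketch below only performs arithmetic on the numbers transcribed from Table 2; the names `TABLE_2` and `relative_reduction` are illustrative and do not come from the paper.

```python
# Penalty costs transcribed from Table 2.
# Key: number of workflows; value: (FCFS, EDF, TS, PTS).
TABLE_2 = {
    5: (41.34, 29.0, 25.0, 20.88),
    10: (43.75, 35.78, 29.94, 27.63),
    15: (45.67, 42.87, 33.78, 30.84),
    20: (56.45, 45.78, 40.82, 37.62),
    25: (61.45, 50.83, 51.98, 39.652),
    30: (74.55, 58.34, 59.674, 45.67),
    35: (84.3, 73.46, 68.3, 50.64),
    40: (97.55, 79.83, 74.0, 62.1),
    45: (100.98, 87.67, 79.56, 75.3),
    50: (108.3, 97.25, 85.65, 82.5),
    55: (112.7, 106.0, 99.32, 89.41),
    60: (119.5, 112.3, 106.9, 100.26),
}


def relative_reduction(baseline, value):
    """Fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline


for n, (fcfs, edf, ts, pts) in TABLE_2.items():
    # Print the reduction of PTS against each of the other three schedulers.
    print(f"{n:>2} workflows: PTS vs TS {relative_reduction(ts, pts):6.1%}, "
          f"vs EDF {relative_reduction(edf, pts):6.1%}, "
          f"vs FCFS {relative_reduction(fcfs, pts):6.1%}")
```

For the first row, for instance, the reduction against TS is (25 - 20.88) / 25, that is about 16.5%. In every row of the table the PTS value is the smallest of the four.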
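The general scheme of PTS (Steps 0 to 4) can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the paper does not specify how workflow sequences are encoded as solution elements, so the sketch assumes a toy problem in which a solution is a set of k elements chosen from a universe and a neighbor is obtained by exchanging one element. The element probabilities are estimated as the frequency of each element in the local optima of Step 0, and they are clamped to [0.1, 0.9]; both choices, as well as all function and parameter names, are assumptions made for illustration.

```python
import random


def local_search(start, universe, cost):
    """Descend by best single-element exchanges until no exchange improves."""
    current = set(start)
    while True:
        best, best_cost = None, cost(current)
        for out in current:
            for inn in universe - current:
                cand = (current - {out}) | {inn}
                c = cost(cand)
                if c < best_cost:
                    best, best_cost = cand, c
        if best is None:
            return current
        current = best


def probabilistic_tabu_search(universe, k, cost, starts=8, tenure=2,
                              iterations=40, seed=0):
    rng = random.Random(seed)
    universe = set(universe)
    items = sorted(universe)

    # Step 0: estimate how often each element appears in local optima.
    optima = [local_search(rng.sample(items, k), universe, cost)
              for _ in range(starts)]
    # Assumption of this sketch: clamp to [0.1, 0.9] so that no exchange
    # has a consideration probability of exactly zero.
    p = {e: min(0.9, max(0.1, sum(e in o for o in optima) / starts))
         for e in items}

    # Step 1: all elements start non-tabu; choose an initial solution.
    current = set(rng.sample(items, k))
    best, best_cost = set(current), cost(current)
    tabu_until = {}  # element -> last iteration in which it is still tabu

    # Step 2: the termination condition here is a fixed iteration budget.
    for it in range(1, iterations + 1):
        best_tabu = best_free = None
        # Step 3: consider the exchange out -> inn with probability
        # (1 - p[out]) * p[inn].
        for out in sorted(current):
            for inn in sorted(universe - current):
                if rng.random() >= (1 - p[out]) * p[inn]:
                    continue
                cand = (current - {out}) | {inn}
                move = (cost(cand), out, inn, cand)
                if max(tabu_until.get(out, 0), tabu_until.get(inn, 0)) >= it:
                    if best_tabu is None or move[0] < best_tabu[0]:
                        best_tabu = move
                elif best_free is None or move[0] < best_free[0]:
                    best_free = move
        if best_tabu is not None and best_tabu[0] < best_cost:
            # Step 4 (aspiration): accept the tabu neighbor, clear tabu status.
            best_cost, _, _, current = best_tabu
            best = set(current)
            tabu_until.clear()
        elif best_free is not None:
            # Move to the best non-tabu neighbor and mark both elements tabu.
            c, out, inn, current = best_free
            tabu_until[out] = tabu_until[inn] = it + tenure
            if c < best_cost:
                best, best_cost = set(current), c
    return best, best_cost


# Toy usage: choose 4 of 12 elements with minimum total (integer) weight.
weights = {e: (7 * e) % 11 + 1 for e in range(12)}


def total_weight(chosen):
    return sum(weights[e] for e in chosen)


best, best_cost = probabilistic_tabu_search(range(12), k=4, cost=total_weight)
print(sorted(best), best_cost)
```

To apply the same loop to workflow scheduling, the cost function would be replaced by the dual objective function and the exchange move by a move on workflow sequences; how the probabilities are attached to such moves is left open here, since the text does not state it.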
CONCLUSION AND FUTURE WORK

In this paper, we have applied the probabilistic tabu search algorithm to the Generalized Grid Scheduling problem. In this approach, a pre-defined number of starting solutions are chosen from widely separated regions in the sample space and used in local search procedures to obtain a set of locally optimal solutions. These locally optimal solutions are then examined to provide an idea about the probability of each solution element being included in an optimal solution. Using these ideas, the neighborhood of each solution is searched in a probabilistic manner. Our computational experience shows that this probabilistic tabu search method outperforms generic tabu search most of the time.

In the near future we plan to combine Probabilistic Tabu search with simulated annealing, along with a sharing method, to increase efficiency. Similarly, ant colony properties can be included in the existing algorithm for scalability. The procedure can also be suitably modified and applied to any kind of Grid scheduling with a different problem environment, optimizing any number of objectives concurrently.

REFERENCES

E.H.L. Aarts, P.J.M. van Laarhoven, J.K. Lenstra, and N.L.J. Ulder, "A Computational Study of Local Search Algorithms for Job Shop Scheduling", ORSA Journal on Computing 6, (1994) 118-125.

I. Foster and C. Kesselman, The grid: Blueprint for a future computing infrastructure, San Mateo, CA: Morgan Kaufmann, 1999.

M. Maheswaran, et al., "Dynamic mapping of a class of independent tasks onto heterogeneous computing systems", Journal of Parallel and Distributed Computing, Vol. 59, 1999, pp. 107-131.

R. Buyya, D. Abramson, and J. Giddy, "A case for economy grid architecture for service oriented grid computing", 10th Heterogeneous Computing Workshop (HCW' 2001), 2001.

I. Foster, C. Kesselman, S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations", Intl J. Supercomputer Applications, 2001.

H. XiaoShan, S. XiaoHe, "QoS guided min-min heuristic for grid task scheduling", Journal of Comput. Sci. & Technol., Vol. 18, No. 4, 2003, pp. 442-451.

Diptesh Ghosh, "A Probabilistic Tabu Search algorithm for the Generalized Minimum Spanning Tree Problem", Indian Institute of Management (Ahmedabad), 2003.

A. A. Mandal, et al., "Scheduling strategies for mapping application workflows onto the grid", in Proceedings of the 14th IEEE International Symposium on High Performance and Distributed Computing (HPDC-14), 2005, pp. 125-134.

J. Yu, R. Buyya, and C.K. Tham, "Cost-based scheduling of scientific workflow applications on utility grids", Proceedings of the 1st International Conference on e-Science and Grid Computing (e-Science' 05), pp. 140-147, 2005.

M.M. López, E. Heymann, M.A. Senar, "Analysis of dynamic heuristics for workflow scheduling on grid systems", in Proceedings of the Fifth International Symposium on Parallel and Distributed Computing (ISPDC'06), IEEE, 2006.

A. Afzal, J. Darlington, A.S. McGough, "QoS-constrained stochastic workflow scheduling in enterprise and scientific grids", The 7th IEEE/ACM International Conference on Grid Computing, 2006, pp. 1-8.

Name:
R. Joshua Samuel Raj

Affiliation:
Assistant Professor / CSE
VV College of Engineering.

Brief Biographical History:
2005 - Graduated from the Computer Science and Engineering Department of PETEC under Anna University
2007 - Received M.E. degree in Computer Science and Engineering from Jaya College of Engineering under Anna University
2009 - Working towards the Ph.D. degree in the area of Grid scheduling under Kalasalingam University

Main Works:
Grid computing, Mobile Adhoc Networking, Multicasting and so forth

Name:
V. Vasudevan

Affiliation:
Director, Software Technologies Lab, TIFAC Core in Network Engineering, Srivilliputhur, India

Brief Biographical History:
1984 - M.Sc. in Mathematics and worked in several areas towards Representation Theory
1992 - Received his Ph.D. degree from Madurai Kamaraj University
2008 - Project Director for the Software Technologies Group of TIFAC Core in Network Engineering and Head of the Department for Information Technology in Kalasalingam University, Sirivilliputhur, India

Main Works:
Grid computing, Agent Technology, Intrusion Detection system, Multicasting and so forth