									                                               (IJCSIS) International Journal of Computer Science and Information Security,
                                               Vol. 8, No. 4, July 2010

    Scheduling of Workflows in Grid Computing
          with Probabilistic Tabu Search
                R. Joshua Samuel Raj                                               Dr. V. Vasudevan
            CSE, VV College of Engineering                             Prof. & Head/IT, Kalasalingam University
                  Tirunelveli, India                                              Srivilliputur, India

Abstract:

In a Grid environment, the number of resources and tasks to be scheduled is usually variable and dynamic in nature. This characteristic makes scheduling a complex optimization problem. Scheduling is a key issue that must be solved in grid computing, and a better scheduling scheme can greatly improve efficiency. The objective of this paper is to explore Probabilistic Tabu Search for compute-intensive grid applications, in order to maximize the Job Completion Ratio and minimize lateness in job completion, based on a comprehensive understanding of the challenges and the state of the art of current research. Experimental results demonstrate the effectiveness and robustness of the proposed algorithm. Further, a comparative evaluation against other scheduling algorithms such as First Come First Serve (FCFS), Last Come First Serve (LCFS), Earliest Deadline First (EDF) and Tabu Search is plotted.

Key words: grid computing, workflow, Tabu Search, scheduling problem, Probabilistic Tabu Search

INTRODUCTION

Grid Computing, a pioneering technique for harnessing geographically dispersed computing power, has changed the perception of the utility and availability of that power. It has produced a technology that amalgamates a very large number of computing devices into a grid environment, augmenting computing capability and resolving the various tasks within the operational grid by enabling the sharing, selection and aggregation of geographically distributed autonomous resources dynamically at runtime, depending on their availability, capability, performance and cost. The focus thereby shifts to collaborative environments that federate services and exchange transactions in a mutual manner to share resources and achieve common goals, enhancing productivity and speeding up progress in much the same way that the Internet did in yesterday's economy, and paving the way for numerous research efforts in grid scheduling mechanisms.

Grid Computing is our greatest hope for delivering computing as a utility to homes and offices. Many large-scale applications such as scientific, engineering and business problems (Hai et al., 2005; Cannataro et al., 2002) are solved effectively using the logical amalgamation of geographically dispersed Grid resources (Bernan et al., 2002). Grid computing, analogous to the pervasive electrical power grid, enables resource sharing and cooperative work among distributed computational sites.

In a grid environment, applications are often described as workflows. A workflow is composed of atomic tasks that are processed in a specific order to fulfill a complicated goal. Generally, grid workflows require more intensive computation and process larger data than traditional workflows. Therefore, the performance of grid workflows becomes a critical issue for workflow management systems. One of the most challenging problems is to map each task to a corresponding service instance so as to meet the customers’ quality of service (QoS) requirements as well as to achieve high performance of the workflow. This problem has been shown to be NP-complete. Grid scheduling poses many challenges that require the simultaneous optimization of several incommensurable and competing objectives:
     • Unpredictability of Grid resources
     • The need for multiple resource types to complete a job
     • The need for parallel or concurrent execution of tasks in a workflow

Under the Open Grid Services Architecture (OGSA), the workflow scheduler has to balance several QoS requirements, including makespan and cost. Consequently, many traditional workflow scheduling algorithms, such as Opportunistic Load Balancing, Minimum Completion Time, Min-min, Max-min and Duplex, are not

                                                                                          ISSN 1947-5500

suitable since they only tackle the makespan requirement.

In recent years, a number of studies have focused on scheduling problems involving more than one QoS requirement. The traditional system, namely advance reservation for scheduling workflows, suffers from problems such as overloading and power failure. The overloading and scheduler failure problems are overcome by a two-level scheduling scheme in which the first level is used for frequent small jobs and the second level for large jobs. The market-oriented approach succeeded in distributed scheduling of workflows, but could not complete more workflows within the deadline. The success ratio of the workflows allotted for mapping to the Grid sites is 30% (Chien et al., 2005) when 30 workflows are scheduled at a time.

Workflows submitted to the Computational Grids by resource consumers carry a budget proposal, client authentication and the requirements for their execution, as shown in Fig 1. The willingness to complete any job is given by resource providers. Hence the Grid schedulers search for solutions in the state space aiming at high performance, both in terms of solution quality and execution speed.

                 Grid Client’s Job Submission

      Client Name                    Jeny
      Password                       ******
      CPU Power                      30 Tflops
      Memory                         19MB
      Dead Line                      12/09/07
      Quality of Service             Best Effort Service

                            Submit

Fig. 1: Job submission blueprint

The literature has proposed a grid workflow scheduling algorithm in which cost is optimized with the expectation of minimizing the makespan. The literature has also presented a scheduling approach for economics-driven grids that optimizes cost under a deadline constraint. In fact, a mixed-integer non-linear programming algorithm was introduced to optimize cost while considering other QoS requirements. As the scale of workflow applications grows, conventional deterministic approaches may fail to give a satisfying solution. Moreover, in the Grid scheduling problem, for most practical applications, any scheduler that delivers good-quality planning of jobs suffices, so there is no need to search for optimality. In fact, in a highly dynamic Grid environment it is not even possible to define optimality of planning as it is defined in combinatorial optimization. This is because Grid schedulers run as long as the Grid system exists, and thus performance is measured not only for particular applications but also in the long run. Meta-heuristics are well known to compute high-quality feasible solutions in a short time. Therefore, meta-heuristic algorithms have been receiving growing interest due to their powerful global search capability.

Motivated by the above, in this paper we apply the probabilistic Tabu search algorithm to the generalized Grid Scheduling problem. The basic idea behind the algorithm is to use preprocessing operations to arrive at a probability value for each vertex, which roughly corresponds to its probability of being included in an optimal solution, and to use these probability values to shrink the neighborhood of solutions to manageable proportions. We report results from computational experiments that demonstrate the superiority of this method over the generic Tabu search method.

PROBLEM DESCRIPTION

The Super Schedule Grid Architecture (SSGA) is shown in Fig 2 with an example eight-node Grid environment. This architecture can be used for practical applications in normal grid environments. The setup was tested at the TIFAC Core in Network Engineering under a DST project.

The goal of the SSGA is to find the allocation sequence of workflows on each Grid site. Four major entities are involved in this architecture.
    •    The grid users submit their requests for job completion to the local grid managers.
    •    The grid managers receive all the tasks and make the scheduling decision by deploying the requests to the Intra-Grid schedulers.


    •    The Intra-Grid schedulers hold up-to-date information on the grid resources that are idle at time t. This information is updated frequently. Smaller jobs can be scheduled within their deadlines by the Intra-Grid schedulers in their respective Administrative Domains. Here scheduling is often dynamic.
    •    Data-intensive applications with larger jobs require resources worldwide. In that case Inter-Grid schedulers are needed, and their scheduling is often static.

Fig 2: Super Schedule Grid Architecture

The workflow allocation strategy in a Grid environment differs from the traditional ones. The goal of the Inter-Grid Scheduler is to receive the requests from the different Intra-Grid Schedulers and to produce a schedule that lets as many workflows as possible complete within their deadlines. The following DAG workflows and the penalty cost for each workflow are considered for experimental purposes.

Fig 3: DAG workflow model

The duration of each workflow, the penalty cost incurred and the required grid resources are shown in Table 1. The tasks used in the experiment have predecessors and successors: for example, T1 and T2 run in sequence, or T2 and T3 run as parallel computations once task T1 has executed.

Table 1: Experimental work flows

The workflow models for W1, W2 and W3 are shown in Fig. 3. FCFS maps tasks to the idle Grid sites in order of arrival, so the first task to arrive is served first. The EDF algorithm executes the task whose absolute deadline is the earliest. Hence it estimates the execution deadline of the individual workflow for any standalone system and schedules such that the workflows that require greater completion time are served first. In EDF the task priorities are not fixed but change depending on the closeness of their absolute deadline.

The settings of the experiment consist of workflows with the following assumptions:
    •    Each workflow received by the Inter-Grid Scheduler consists of a set of tasks T1, T2, T3 and so on.


    •    The tasks in each workflow form a Directed Acyclic Graph (DAG), as in Fig. 3.
    •    The output from a task can be transferred to other tasks as per the DAG model, and all jobs are available at time zero.
    •    At any time a task can be executed only on a Grid site that is reported to the Inter-Grid scheduler as idle via an Intra-Grid scheduler.
    •    There is no pre-emption of tasks or workflows.
    •    The sequential order of workflow allotment changes.

Here we present a scheduling approach for the wide-area problem in which the resources and jobs are dispersed geographically.

PROPOSED METHOD OF PTS

In this study, a PTS heuristic for the scientific workflow scheduling problem in the Grid is discussed. The roots of Tabu search go back to the 1970's; it was first presented in its present form by Glover [Glover, 1986]; the basic ideas have also been sketched by Hansen [Hansen 1986]. Additional efforts of formalization are reported in [Glover, 1989], [de Werra & Hertz, 1989], [Glover, 1990]. Many computational experiments have shown that tabu search has become an established optimization technique which can compete with almost all known techniques and which, by its flexibility, can beat many classical procedures.

The generic TS is a metaheuristic strategy based on neighborhood search that overcomes local optimality. It works in a deterministic way, trying to model human memory processes. Memory is implemented by the implicit recording of previously seen solutions, using simple but effective data structures. This approach focuses on the creation of a Tabu list of moves that have been performed recently and are forbidden for a certain number of iterations, thereby helping to avoid cycling and promoting search in a diversified space. At each iteration, TS moves to the best solution that is not forbidden, and is thus not trapped by local optima.

The generic TS introduces flexible memory structures, articulating strategic restrictions and aspiration levels as a means of exploiting search spaces. TS is able to generate solutions of notably high quality because it can escape from local minima and implement an explorative strategy. TS is an iterative procedure for searching for a global optimum of a discrete combinatorial problem. The philosophy of TS is to avoid entrainment in cycles by forbidding or penalizing moves which take the solution, in the next iteration, to points in the solution space previously visited.

In order to improve the efficiency of the exploration process, one needs to keep track not only of local information (like the current value of the objective function) but also of some information related to the exploration process. This systematic use of memory is an essential feature of Tabu search (TS). While most exploration methods keep in memory essentially the value f(i*) of the best solution i* visited so far, TS also keeps information on the itinerary through the last solutions visited. Such information is used to guide the move from i to the next solution j to be chosen in N(i). The role of the memory is to restrict the choice to some subset of N(i), for instance by forbidding moves to some neighbor solutions. More precisely, the structure of the neighborhood N(i) of a solution i will in fact vary from iteration to iteration.

The main problem with such a tabu search algorithm is the size of the neighborhood of each solution. Generic Tabu search can therefore execute only a few iterations within reasonable execution times, which hampers matching each job to an appropriate resource in the shortest possible time. The Probabilistic Tabu search for Grid scheduling addresses this concern.

SOLUTION CONSTRUCTION

The structure of Probabilistic Tabu search is as shown below. The basic idea is to look at only a subset of the neighborhood of each solution, namely the subset that has the maximum likelihood of containing the best tabu and non-tabu neighbors. The belief is that a large enough set of locally optimal solutions collectively contains predominantly those features that are present in globally optimal solutions and rarely contains features that are absent from globally optimal solutions. In this approach, a pre-defined number of starting solutions are chosen from widely separated regions in the sample space, and used in local search procedures to obtain a set of locally optimal solutions. These locally optimal solutions are then examined to provide an idea about the probability of each solution element being included in an optimal solution. Using this idea, the neighborhood of each solution is searched in a probabilistic manner.

General Scheme of PTS: The structure of the PTS algorithm is formalized as shown below.

Step 0 (Generating Probabilities): Generate a set of s solutions S = {S1, S2, . . . , Ss} using an extension to


local search method to obtain a local optimum. For each solution Si compute the associated probability pi. Go to Step 1.

Step 1 (Initialization): Define all solution elements as non-tabu. Choose an initial solution S, set BestSolution ← S, and set Iteration ← 1. Go to Step 2.

Step 2 (Termination): If a pre-defined termination condition is satisfied, output BestSolution and exit. Else go to Step 3.

Step 3 (Iteration): Consider each neighbor N of S with a probability of (1−pi)pj where vi = S \ N and vj = N \ S. If vi or vj is marked ‘tabu’ then N is a tabu neighbor, otherwise it is a ‘non-tabu’ neighbor. If the best tabu neighbor considered has a cost lower than the cost of BestSolution, go to Step 4, else replace S by the best non-tabu neighbor considered. Mark the solution elements participating in this move (i.e. the vertex that has left the solution, and the vertex that has entered the solution to form the neighbor) as tabu for the next TENURE moves. If this best non-tabu neighbor is better than BestSolution, replace BestSolution with this neighbor. Set Iteration ← Iteration + 1. Go to Step 2.

Step 4 (Aspiration): Replace BestSolution and S with the tabu neighbor of S. Remove the tabu status for all solution elements. Set Iteration ← Iteration + 1. Go to Step 2.

For every solution move in the TS procedure, the neighborhood solution is evaluated against a Dual Objective Function (DOF): minimizing the total penalty cost of the chosen workflow sequence and maximizing the number of workflows completed within their deadline (Job Completion Ratio). In our proposed method, the workflows are created based on the DAG model and the deadline is fixed at 1.5 * Execution time.

RESULTS AND DISCUSSION

The methodology is such that an initial job sequence is selected at random among the set of job sequences, and the dual objective function value of this solution is taken as the best cost. The obtained solution is recorded as the initial step for the Probabilistic Tabu Scheduling mechanism. Later, the set of neighborhood solutions of S is generated, the dual objective function (DOF) is calculated again, and the best cost in the history record is replaced if necessary.

The comparative increase in the completion of workflows by the PTS dual objective scheduling mechanism, relative to other algorithms such as FCFS, EDF and TS, is shown in Fig 4 and Fig 5.

Fig 4: Job completion ratio (y-axis: JCR; x-axis: number of workflows)

It can be seen that PTS outperforms TS in the number of workflow completions. In Table 2, the penalty cost incurred by the Inter-Grid scheduler for not completing jobs is tabulated. As per the methodology, PTS outperforms the other scheduling mechanisms under consideration.

Fig 5: DOF for PTS, TS, EDF and FCFS (x-axis: number of workflows)

Table 2: Penalty cost incurred for the workflow sequence

 No of workflows      FCFS       EDF        TS         PTS
        5             41.34      29         25         20.88
       10             43.75      35.78      29.94      27.63
       15             45.67      42.87      33.78      30.84
       20             56.45      45.78      40.82      37.62
       25             61.45      50.83      51.98      39.652
       30             74.55      58.34      59.674     45.67
       35             84.3       73.46      68.3       50.64
       40             97.55      79.83      74         62.1
       45            100.98      87.67      79.56      75.3
       50            108.3       97.25      85.65      82.5
       55            112.7      106         99.32      89.41
       60            119.5      112.3      106.9      100.26
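The comparison in Table 2 can be made quantitative with a few lines of arithmetic. The following is a minimal sketch in Python that copies the penalty costs from Table 2 and computes the relative reduction achieved by PTS against each other algorithm. The names TABLE_2, reduction_percent, pts_reductions and pts_is_cheapest_everywhere are illustrative and do not come from the paper.

```python
# Penalty costs copied from Table 2; columns are FCFS, EDF, TS, PTS.
ALGORITHMS = ("FCFS", "EDF", "TS", "PTS")
TABLE_2 = {
    5: (41.34, 29.0, 25.0, 20.88),
    10: (43.75, 35.78, 29.94, 27.63),
    15: (45.67, 42.87, 33.78, 30.84),
    20: (56.45, 45.78, 40.82, 37.62),
    25: (61.45, 50.83, 51.98, 39.652),
    30: (74.55, 58.34, 59.674, 45.67),
    35: (84.3, 73.46, 68.3, 50.64),
    40: (97.55, 79.83, 74.0, 62.1),
    45: (100.98, 87.67, 79.56, 75.3),
    50: (108.3, 97.25, 85.65, 82.5),
    55: (112.7, 106.0, 99.32, 89.41),
    60: (119.5, 112.3, 106.9, 100.26),
}


def reduction_percent(baseline, pts):
    """Relative penalty-cost reduction of PTS against a baseline, in percent."""
    return 100.0 * (baseline - pts) / baseline


def pts_reductions(table=TABLE_2):
    """Map each workflow count to the PTS reduction against FCFS, EDF and TS."""
    result = {}
    for count, row in table.items():
        pts = row[-1]
        result[count] = {
            name: reduction_percent(cost, pts)
            for name, cost in zip(ALGORITHMS[:-1], row[:-1])
        }
    return result


def pts_is_cheapest_everywhere(table=TABLE_2):
    """True if PTS has the lowest penalty cost in every row of the table."""
    return all(row[-1] == min(row) for row in table.values())


if __name__ == "__main__":
    for count, reductions in pts_reductions().items():
        line = ", ".join(f"{name}: {value:.1f}%" for name, value in reductions.items())
        print(f"{count:>2} workflows -> PTS reduction vs {line}")
    print("PTS cheapest in every row:", pts_is_cheapest_everywhere())
```

Reading the rows this way also shows that TS is not uniformly better than EDF in Table 2 (at 25 and 30 workflows its penalty cost is slightly higher), whereas PTS has the lowest penalty cost in every row.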

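Returning to Step 3 of the general scheme, the probabilistic filtering of the neighborhood can be sketched as follows. This is a minimal illustration, assuming a solution is a set of elements and a neighbor is formed by swapping one element out (vi) and one element in (vj), as the wording of Step 3 suggests. The function name consider_neighbors and its parameters are hypothetical, and the probabilities p are assumed to have been estimated already in Step 0.

```python
import random


def consider_neighbors(solution, candidates, p, tabu, rng=random):
    """Sample the swap neighborhood of `solution` as described in Step 3.

    solution   -- set of elements currently in S
    candidates -- elements outside S that may enter the solution
    p          -- dict mapping each element to its Step 0 probability
    tabu       -- set of elements currently marked tabu
    rng        -- object with a random() method (assumption: random.Random)

    Returns (tabu_moves, non_tabu_moves), each a list of (v_out, v_in).
    """
    tabu_moves, non_tabu_moves = [], []
    for v_out in sorted(solution):
        for v_in in sorted(candidates):
            # Neighbor N = (S \ {v_out}) U {v_in} is considered with
            # probability (1 - p_i) * p_j, where i = v_out and j = v_in.
            if rng.random() < (1.0 - p[v_out]) * p[v_in]:
                move = (v_out, v_in)
                if v_out in tabu or v_in in tabu:
                    tabu_moves.append(move)
                else:
                    non_tabu_moves.append(move)
    return tabu_moves, non_tabu_moves


if __name__ == "__main__":
    # Hypothetical probabilities: "a" is unlikely to be optimal, "c" is likely.
    p = {"a": 0.0, "b": 1.0, "c": 1.0, "d": 0.0}
    moves = consider_neighbors({"a", "b"}, {"c", "d"}, p, tabu={"c"},
                               rng=random.Random(0))
    print(moves)
```

Elements that are likely to belong to an optimal solution are rarely swapped out and often swapped in, so far fewer neighbors have to be evaluated per iteration than in generic Tabu search. The remaining parts of Step 3 (choosing the best sampled neighbor, the aspiration test and the TENURE bookkeeping) are not covered by this sketch.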

CONCLUSION AND FUTURE WORK

In this paper, we have applied the probabilistic tabu search algorithm to the Generalized Grid Scheduling problem. In this approach, a pre-defined number of starting solutions are chosen from widely separated regions in the sample space, and used in local search procedures to obtain a set of locally optimal solutions. These locally optimal solutions are then examined to provide an idea about the probability of being included in an optimal solution. Using these ideas, the neighborhood of each solution is searched in a probabilistic manner. Our computational experience shows that this probabilistic tabu search method outperforms generic tabu search most of the time.

In the near future we plan to combine Probabilistic Tabu search with simulated annealing, along with a sharing method, to increase efficiency. Similarly, ant colony properties can be included in the existing algorithm for scalability. The procedure can also be suitably modified and applied to any kind of Grid scheduling with a different problem environment, optimizing any number of objectives concurrently.

REFERENCES

E.H.L. Aarts, P.J.M. van Laarhoven, J.K. Lenstra, and N.L.J. Ulder, “A Computational Study of Local Search Algorithms for Job Shop Scheduling”, ORSA Journal on Computing, Vol. 6, 1994, pp. 118-125.

I. Foster and C. Kesselman, The grid: Blueprint for a future computing infrastructure, San Mateo, CA: Morgan Kaufmann, 1999.

M. Maheswaran, et al., “Dynamic mapping of a class of independent tasks onto heterogeneous computing systems”, Journal of Parallel and Distributed Computing, Vol. 59, 1999, pp.

R. Buyya, D. Abramson, and J. Giddy, “A case for economy grid architecture for service oriented grid computing”, 10th Heterogeneous Computing Workshop (HCW’ 2001), 2001.

I. Foster, C. Kesselman, S. Tuecke, “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”, Intl J. Supercomputer Applications, 2001.

H. XiaoShan, S. XiaoHe, “QoS guided min-min heuristic for grid task scheduling”, Journal of Comput. Sci. & Technol., Vol. 18, No. 4, 2003, pp. 442-451.

Diptesh Ghosh, “A Probabilistic Tabu Search algorithm for the Generalized Minimum Spanning Tree Problem”, Indian Institute of Management (Ahmedabad), 2003.

A. A. Mandal, et al., “Scheduling strategies for mapping application workflows onto the grid”, in Proceedings of the 14th IEEE International Symposium on High Performance and Distributed Computing (HPDC-14), 2005, pp. 125-134.

J. Yu, R. Buyya, and C.K. Tham, “Cost-based scheduling of scientific workflow applications on utility grids”, Proceedings of the 1st International Conference on e-Science and Grid Computing (e-Science’ 05), pp. 140-147, 2005.

M.M. López, E. Heymann, M.A. Senar, “Analysis of dynamic heuristics for workflow scheduling on grid systems”, in Proceedings of the Fifth International Symposium on Parallel and Distributed Computing (ISPDC’06), IEEE, 2006.

A. Afzal, J. Darlington, A.S. McGough, “QoS-constrained stochastic workflow scheduling in enterprise and scientific grids”, The 7th IEEE/ACM International Conference on Grid Computing, 2006, pp. 1-8.


R. Joshua Samuel Raj

Affiliation:
Assistant Professor / CSE
VV College of Engineering.

Brief Biographical History:
2005 - Graduated from the Computer Science and Engineering Department of PETEC under Anna University
2007 - Received M.E. Degree in Computer Science and Engineering from Jaya College of Engineering under Anna University
2009 - Working towards the Ph.D. degree in the area of Grid scheduling under Kalasalingam University

Main Works:
Grid computing, Mobile Adhoc Networking, Multicasting and so forth


Name:
V. Vasudevan
Director, Software Technologies Lab, TIFAC Core in Network Engineering,
Srivilliputhur, India

Brief Biographical History:
1984 - M.Sc in Mathematics and worked for several areas towards Representation Theory
1992 - Received his Ph.D. degree in Madurai Kamaraj University
2008 - The Project Director for the Software Technologies Group of TIFAC Core in Network Engineering and Head of the Department for Information Technology in Kalasalingam University, Srivilliputhur, India

Main Works:
Grid computing, Agent Technology, Intrusion Detection system, Multicasting and so forth
