

									                                   gUse White Paper

gUse is an easy-to-use, highly flexible and scalable co-operative Grid application
development and enactment infrastructure. It connects the developers and end users of
Grid applications with computational resources of different technologies, and enables
the design, integration and submission of sophisticated (layered and parameter sweep
enabled) workflows that run in parallel across classic Grids (GT2, GT4, LCG, gLite, …),
individual Web services, clusters, and possibly the local resources of gUse itself.

gUse is the successor of the successful, parameter sweep enabled P-GRADE portal and
inherits its merits:

      user friendliness, by the “Learn once – use forever and everywhere” principle;
      expandability, by the selected Portlet technology, which enables the plugging in
       of user-defined custom services;
      simplicity, by hiding the different technologies of the backend Grid middleware
       and resources;
      graphical user interface, facilitating a natural overview of the objects used in
       the design and in the workflow submission process;
      flexibility, to add new (Grid) middleware to its backend;
      comfort, by the implemented standard services (Certificate Management, Grid
       Information Query, Grid Resource Connection Management, and Grid File and
       Catalogue Management);
      connectivity, by accepting workflows whose parts run in parallel in different
       Grids, even ones built on different technologies.

However, gUse is much more than a simple portal:

   1. gUse no longer has the former “unintelligent” interface between a simple Condor
      DAG interpreter and the job submission systems of the backend. Instead, it has
      its own workflow enactor, which is capable, among other things, of brokering
      (scheduling and distributing jobs among the available and suitable resources).
   2. This workflow enactor enables the abstract, hierarchical definition and the
      embedded call of workflows; even the recursive embedding of workflows
      is possible.
   3. The workflow enactor makes it possible to start workflows not only directly
      by the user but also by an external event (described in WSDL) issued by an
      automatic external system, or by a crontab-like schedule table set
      by the user.
   4. The versatility and scalability of the managed backend systems (middleware) is
      facilitated by an array of built-in submitters. All submitters have a common
      standard WS interface toward the workflow enactor. In addition to the built-in
      standard backend systems (GT2, GT4, LCG2, gLite, Condor, GEMLCA, Axis, …),
      the administrator can add custom submitters to the infrastructure at run time.
      These submitters may be either user defined, for example to exploit a local
      resource, or standard ones that balance the load of existing submitters.
5. The gUse workflow enactor facilitates a much more flexible parameter sweep
    solution than that of the P-GRADE portal. Instead of the whole workflow, the single
    job is the basis for multiplying submissions. Consequently, within a workflow
    the number of submissions of a separate job (or of a group of connected jobs) can
    be determined independently of the submissions of other jobs (or groups). As a
    result, the computational load can be reduced to the necessary minimum.
    A series of instruments supports the basically data-driven execution of
    workflows: Input File containers; threshold numbers associated with the input
    ports of jobs; cross product / dot product relation switches connecting input
    ports; generator / collector jobs (enabling the writing / reading of several
    files during one job submission); and user-programmed logic control over the
    input ports of jobs (conditionally excluding them from workflow elaboration).
6. The palette of job-level activities has been expanded, beyond the already
    mentioned call of embedded workflows, by the call of Web services and by the
    call of GEMLCA-interfaced legacy systems. Even the group of classical binary
    job executables has been extended by the possibility of calling the Java JVMs
    and the MPI interpreters of the target systems.
7. The error-resistant, stable and scalable performance of gUse is supported
    by its distributed modular structure. As discussed in the case of the
    submitters, the whole system is a set of independent services that is pluggable
    at run time and, in the case of bottlenecks, can be multiplied. The services
    communicate with each other by a standard XML protocol. It is therefore up to
    the administrator to put the whole system on a common host or to distribute it
    among several machines.
8. In addition to the common input files (uploaded from the local machine of the
    Portal user or reached from a remote Grid environment), a job can be fed by a
    direct value or by the result set of an online SQL query of a remote database.
9. The workflow developer is supported by a log book that stores each change of the
    workflow. The logging is subdivided into job histories.
10. Perhaps the most important novelty of gUse is its emphasis on supporting
    the different needs of developers and of end users.
11. Experienced developers receive additional support to make their code
    reusable through new concepts that express the hierarchical grouping of code.
    Graphs are introduced to define the common topology of a group of workflows by
    identifying jobs and the input/output connections of jobs.
    Concrete Workflows define the semantics of workflows over a given Graph.
    Templates inherit the part of the semantics that must be preserved in a different
    Additionally, templates seem to be a natural place for the documentation and
    for other kinds of verbal end user support (in the form of user-digestible
    labels).
    An Application is the end product of the developer. It is a packaged, directly
    applicable, complete solution that gives the end user at most a limited freedom
    to modify it (input file associations, command line parameters of jobs, resource
    selection). An Application, which may contain several workflows because the
    embedded workflows to be called must be included as well, is exported and
    stored in a common repository accessible to all users.
12. The Repository stores not only full Applications. The members of the developer
    community may also help each other by publishing tested Graphs, Concrete
    Workflows and Templates, or (horribile dictu) by sending unfinished
    Applications (consisting of several Graphs, Concrete Workflows and Templates),
    called Projects.
13. The end user imports the Application and configures it with the help of an
    automatically generated, simplified Web page form, which obtains its building
    parameters from the Template(s) belonging to the given Application.
14. Upon workflow submission, a new (and hopefully running) Workflow Instance is
    generated from an Application that the end user has imported and completed.
    The same happens if a developer submits a full Workflow. The
    Workflow Instance inherits all the definitions of the workflow and additionally
    contains the runtime state and output (result) information.
    It follows that the user can change the permitted configuration of a workflow at
    run time, and this may influence all jobs in all instances (except, of course,
    those currently running or already terminated).
    Another consequence is that the call of an embedded workflow is implemented by
    creating a separate Workflow Instance belonging to the called Workflow.
15. The concepts of the workflow instance and of job-level parameter sweeping are
    reflected by the new, hierarchically structured visualization and
    manipulation of activated elements. In hierarchically descending order, the
    Workflow, the Workflow Instance, the Job set (with more than one element when
    multiple input files force multiple submissions of the same job) and the Jobs
    can be visualized by reading their states, messages and outputs.
    Suspending and killing/resuming of workflows can be done at the workflow
    instance level. In conclusion, in contrast to the old P-GRADE portal, every
    item of a workflow ever produced can be fetched and visualized
    on a common interface.
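
As an illustration of the cross product / dot product relation mentioned in point 5,
the following minimal Python sketch shows how the number of submissions of a single job
could follow from the file sets attached to two of its input ports. The function name,
the port contents and the file names are hypothetical and are not taken from gUse. How
gUse treats file sets of unequal size under the dot product relation is not described
in this paper, so the sketch simply assumes that pairing stops at the shorter set.

```python
from itertools import product


def job_submissions(port_a, port_b, relation):
    """Return the input combinations, one tuple per submission of the job.

    relation == "cross": every file of port_a is combined with every file of port_b.
    relation == "dot":   files are paired by position (assumption of this sketch:
                         pairing stops at the shorter set).
    """
    if relation == "cross":
        return list(product(port_a, port_b))
    if relation == "dot":
        return list(zip(port_a, port_b))
    raise ValueError(f"unknown relation: {relation!r}")


# Hypothetical input file sets attached to two input ports of one job.
port_a = ["a0.dat", "a1.dat", "a2.dat"]
port_b = ["b0.dat", "b1.dat", "b2.dat"]

cross = job_submissions(port_a, port_b, "cross")
dot = job_submissions(port_a, port_b, "dot")

print(len(cross))  # 9 submissions: 3 x 3 combinations
print(len(dot))    # 3 submissions: one per position
```

With three files on each port, the cross product relation leads to nine submissions of
this one job and the dot product relation to three, while the other jobs of the
workflow are not multiplied at all.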

                          The static organization of the gUse

gUse has a 3-tier structure (see Figure 1):

   1. The user interface contains two parts: the Graph Editor, used to define the
      static skeleton of workflows, and an HTML interface, interpretable by a web
      browser, that gives access to the portlets of the system. The portlets enable
      the configuration, submission, observation and harvesting of workflows, and
      provide auxiliary services such as the information system, certificate
      management, remote file access management, resource management and workflow
      repository management.
      The Graph Editor is a Web Start application callable from the workflow
      management portlet. This means that it downloads itself to, and runs on, the
      user’s desktop machine on demand.

2. The middle layer comprises the gUse infrastructure, hosted on one or more
   server machines, where the implementation of portlet activities is mapped to
   a sequence of web service calls. These services are responsible for the
   handling of several databases (users, user proxy certificates, user source code
   and the history of its changes, runtime states and results of workflow
   instances), for resource management and for the management of submitted
   The set of submitters forms the most important back end of the middle
   layer, connecting the gUse infrastructure with the job submission systems of
   the different Grids.

3. The Grid and the middleware on the Grid compose the third, “remote” layer of
   the structure. It includes the job submission, certificate and VOMS
   management, remote data management and information system management systems,
   which may differ from each other in their technology. This layer is
   maintained and operated by external vendors / organizations.
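
The role of the submitters in the middle layer can be pictured with the following
minimal Python sketch: every submitter offers the same small interface to the enactor,
and a new submitter can be registered while the system is running. All class, method
and middleware key names are hypothetical. gUse exposes its submitters as web services,
which this in-process sketch does not attempt to reproduce.

```python
from abc import ABC, abstractmethod


class Submitter(ABC):
    """Common interface (hypothetical) that the enactor sees for every backend."""

    @abstractmethod
    def submit(self, job_name: str) -> str:
        """Send the job to the backend and return a backend job id."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Report the state of a previously submitted job."""


class LocalSubmitter(Submitter):
    """Toy backend that marks every job as finished immediately."""

    def __init__(self):
        self._jobs = {}

    def submit(self, job_name):
        job_id = f"local-{len(self._jobs)}"
        self._jobs[job_id] = "finished"
        return job_id

    def status(self, job_id):
        return self._jobs.get(job_id, "unknown")


class SubmitterRegistry:
    """Maps a middleware key to a submitter; entries can be added at run time."""

    def __init__(self):
        self._submitters = {}

    def register(self, middleware, submitter):
        self._submitters[middleware] = submitter

    def submit(self, middleware, job_name):
        if middleware not in self._submitters:
            raise KeyError(f"no submitter registered for {middleware!r}")
        return self._submitters[middleware].submit(job_name)

    def status(self, middleware, job_id):
        return self._submitters[middleware].status(job_id)


registry = SubmitterRegistry()
registry.register("local", LocalSubmitter())  # plugged in while "running"
job_id = registry.submit("local", "job1")
print(job_id, registry.status("local", job_id))
```

Because the enactor side only talks to the common interface, adding a further backend
in this sketch means writing one more Submitter subclass and registering it under a
new key.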

                      Figure 1 Configuration of gUse
             (from the point of view of workflow management)

The details of the Workflow manager of Tier 2

        The Portal, embedded in GridSphere, contains the interpretations of the UIF
         operations on the server side (Tier 2 + Tier 3) of the system.
        WF storage, implemented by a MySQL database, contains the main part of
         the definition of workflows. It also contains the logbook recording the
         time-stamped changes the user has made during the workflow definition process.
         Moreover, it contains the runtime identifiers and states of the instances of
         submitted workflows.
         The storage is subdivided by users and by object types where the main base
         object types are GRAPH, CONCRETE WORKFLOW, TEMPLATE and
        File Storage, implemented by a traditional Unix file system, contains the
         input data files uploaded by the user and the files of executable code that
         perform the activities of the Workflow Jobs when the jobs are delivered
         to remote Grid resources for execution.
         The file system also contains the possible “local” output files of terminated
         Workflow Jobs.
         A file is a local output file of a job if its content will be
         consumed by an input port of a subsequent job with the help of the WF
         enactor, or if the file is regarded as a result file of the workflow and the
         user wants to receive and download it as part of the packed result of a
         Workflow Instance.
         It must be noted that the names of input data files and files of executables
         are referenced in the WF storage.
        The WF enactor is the central workflow interpreter. It maps a submitted
         workflow instance to the component jobs to be submitted.
         During this process it investigates the availability of each needed input file
         of the job, and checks optional user defined conditions prohibiting the job
         The parameter sweep ability of gUse is also ensured by the WF enactor,
         which determines the next group of associated input files to be fed to the
         subsequent job submission.
        Jobs are submitted through a mapping (optionally a two-level one) backed by
         submitters. The dedicated submitters adapt the job description to the
         technically different job submission forms, and communicate with the
         remote system via the corresponding protocol in order to send the job, to
         report back its state, and to fetch its result upon termination.
         The gUse internal BROKER is able to perform an optimization on user
         demand if the job is able to run on more than one type of Grid resource.
         A special submitter serves jobs that contain no executable code but the
         invocation of an existing Web Service. At present only the SOAP protocol
         is built in, but the structure of gUse makes it possible to plug in other
         WS call communication protocols.
         A special type of remote service is the case when the GEMLCA infrastructure
         is needed to start a legacy application.
         A different special case is when the user wants to use the infrastructure of
         gUse itself as the destination of a job submission. It is an ideal choice,
         first of all for test purposes, to check the semantics of a workflow while
         excluding the Grid-imposed time delays and the errors due to bad networks,
         firewall problems, or nonexistent or expired certificates.
        The repository is a store of tested and reusable workflow solutions
         accessible to the whole user community.
        IS is the central manager of the distributed gUse system. It orchestrates the
         cooperation of the above-mentioned functions, which are implemented by
         independent services. Moreover, it produces reports (not shown in Figure 1) for
         the administrator of gUse and ensures the seamless run-time insertion of a
         host and the subsequent redistribution of services in case of a bottleneck.
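
The data-driven behaviour of the WF enactor described above can be sketched in a few
lines of Python: a job becomes submittable once all of its input files are available
and no user-defined condition prohibits it. This is a minimal illustration under
stated assumptions, not the gUse implementation. The Job structure, the function names
and the file names are hypothetical, and submission is replaced by immediately marking
the outputs of a job as available.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Job:
    name: str
    inputs: List[str]
    outputs: List[str] = field(default_factory=list)
    # Optional user-defined condition; returning False prohibits the job.
    condition: Optional[Callable[[Set[str]], bool]] = None


def ready_jobs(jobs, available, done):
    """Jobs not yet run whose inputs all exist and whose condition allows them."""
    result = []
    for job in jobs:
        if job.name in done:
            continue
        if not all(f in available for f in job.inputs):
            continue
        if job.condition is not None and not job.condition(available):
            continue
        result.append(job)
    return result


def enact(jobs, initial_files):
    """Run jobs round by round until no further job becomes submittable."""
    available = set(initial_files)
    done = []
    while True:
        batch = ready_jobs(jobs, available, set(done))
        if not batch:
            break
        for job in batch:
            # A real enactor would hand the job to a submitter here.
            done.append(job.name)
            available.update(job.outputs)
    return done


jobs = [
    Job("C", ["a.out", "b.out"], ["c.out"]),
    Job("A", ["in.dat"], ["a.out"]),
    Job("B", ["in.dat"], ["b.out"]),
    Job("D", ["c.out"], condition=lambda files: False),  # always prohibited
]
order = enact(jobs, ["in.dat"])
print(order)  # ['A', 'B', 'C']
```

Job D never runs in this example: its input file becomes available after C has
finished, but its user-defined condition excludes it from the elaboration of the
workflow.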
