Exploration of Embedded System Architectures


									         The Globus Project


               Adam Belloum
Computer Architecture & Parallel Systems group
          University of Amsterdam
           adam@science.uva.nl
         Globus requirements

•   Security
•   Global name space
•   Fault tolerance
•   Accommodating heterogeneity
•   Binary management and application provisioning
•   Multilanguage support
•   Persistence
•   Extensibility
•   Site autonomy
•   Complexity management
     Globus design principles

• Provide a toolkit
   – from which users can pick and choose
• Focus on low-level functionality,
   – facilitating high-level tools (general usability)
• Use standards whenever possible
   – for both interfaces and implementations
• Emphasize the identification and definition of
   – protocols and services first,
   – and APIs and software development kits next
• Provide open-source community development
• Provide immediate usefulness
• Do not provide a virtual machine abstraction
  Globus architectural details

• Globus started out with the bottom-up
  premise that:
  – a grid must be constructed as a set of tools
    developed from user requirements.
• This architecture is based on composing tools
  from a kit.
  – much of the initial design time was spent
    determining the user requirements for which grid
    tools could be built.
  Globus architectural details

• Resource manager to start jobs
  – assuming users had procured accounts beforehand on
    all of the machines on which they could possibly run
• Tool and API for transferring files from one
  machine to another
  – used for binary and data transfer
• Tools for procuring credentials and certificates
• Service for collecting resource information about
  machines on a grid.
  Globus architectural details

• The designers of Globus believe that
  – New services/tools must be added to the existing
    set such that users can combine available tools to
    get the work done.


• Much of the later development in Globus has
  been directed
  – at composing these tools in order to achieve a
    specific goal.
  Globus architectural details

• The communications module provides network-
  aware communications messaging capabilities.
  – The implementation of the communications module
    in the Globus Toolkit was called Nexus.

• The resource location and allocation module
  provides mechanisms for
  – expressing application resource requirements
  – identifying resources that meet the requirements
  – scheduling resources after they have been located.
   Globus architectural details

• The authentication module provides means to verify the
  identity of both humans and resources.
   – GSSAPI hides the underlying authentication technique: Kerberos
     (centralized authentication system) or SSL (PKI)

• The information service module provides a uniform
  mechanism for obtaining information about the metasystem.
   – Known as the Metacomputing Directory Service (MDS), which
     builds on the LDAP API.

• The data access module is responsible for providing
  remote access to persistent storage, such as files.
  Globus architectural details

• The grid fabric layer provides the resources to
  which grid protocols mediate access.

• The connectivity layer defines the core
  communication and authentication protocols
  required for grid-specific network transactions.
  – this layer includes the Grid Security Infrastructure
    (GSI).
   Globus architectural details

• The resource layer defines protocols for the
   – secure negotiation, initiation, monitoring, control, accounting,
     and payment of sharing operations on individual resources.
  In the GT these protocols include:
   – the Grid Resource Information Protocol (GRIP), a resource
     information protocol;
   – the Grid Resource Registration Protocol (GRRP), used to
     register resources with the Grid Index Information Servers;
   – the Grid Resource Access and Management (GRAM) protocol,
     used to allocate and monitor resources; and
   – GridFTP, which is used for data access.
• The collective layer is used to coordinate access to
  multiple resources, which, in terms of the GT, refers to
  MetaComputing Directory Service (MDS), supported by
  GRRP and GRIP.
• Finally, grid applications sit at the very top.
   Globus architectural details

• With respect to grids, a bottom-up approach tends to
   – result in early successes simply because the approach targets
     immediate user requirements.
   – the implementation of a new service or tool can be quick, since
     much of the complexity of the underlying substrate is abstracted
     away.


• The risk is that this approach may not
   – scale as the number of tools or services increases, since an
     increasing number of pairwise protocols is necessary to
     ensure that the tools compose seamlessly;
   – accommodate changing requirements.
               Layered Architecture

• Applications
• High-level Services and Tools
   – GlobusView, Testbed Status, DUROC, MPI, MPI-IO, CC++,
     Nimrod/G, globusrun
• Core Services
   – Nexus, GRAM, Metacomputing Directory Service, Globus Security
     Interface, Heartbeat Monitor, Gloperf, GASS
• Local Services
   – Condor, MPI, TCP, UDP, LSF, Easy, NQE, AIX, Irix, Solaris
            Security in Globus




Overview of Security Standards in the Grid, CSE 225 High Performance and
Computational Grids Spring 2000 kwalsh@ucsd.edu
       Globus Security Requirements

•   Single sign-on
•   Protection of credentials
•   Interoperability with local security solutions
•   Exportability
•   Uniform credentials/certification infrastructure
•   Support for secure group communication
•   Support for multiple implementations
     Globus Security Assumptions

• The grid consists of multiple trust domains
• The resource pool and the user population are large and dynamic
• Must interoperate with local security solutions
   – local security policies differ
• Authentication must be exportable
   – cannot directly or indirectly require use of bulk privacy
• Uniform credentials/certification
   – a user is associated differently with each site it has access to
   – the processes used in a computation, and their access
     control, are dynamic
   Globus Security Infrastructure (GSI)


• Provides authentication and data integrity
  – Data signing (not encryption) services for
    Unix/Windows client/server programs
• Uses an X.509 PKI
• The GSI library is layered on top of SSLeay
• Performs X.509 certificate handling & SSL
  protocol.
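Security in Globus (8)

[Figure: GSI credential flow between a user and two sites]
• The user's credential is assigned to a "user proxy" (assignment of
  credentials to "user proxies"), which holds the Globus credential and
  gives single sign-on via a "grid-id".
• Mutual user-resource authentication takes place between the user proxy
  and each site.
• Site 1 and Site 2 each run GRAM and GSI in front of their processes;
  GRAM performs the mapping to local ids.
• Processes at the two sites use authenticated interprocess communication.
• GSI at Site 1 sits on Kerberos tickets, GSI at Site 2 on public key
  certificates.
• GSSAPI: multiple low-level mechanisms.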
        Security in Globus Standards

• Standards subscribed to:
   –   Generic Security Services (GSS) RFC 2078
   –   Secure Socket Layer (SSL)
   –   Public Key Cryptography based on X.509 certificates
   –   Kerberos
             Security in Globus (2)

Security requirements addressed by each technology standard (x = addressed):

Requirement      SSH  PGP  SSL/X.509 PKI  Kerberos  DCE  IPSec  VPN
Authentication         x         x           x       x     x     x
Authorization     x    x         x           x       x     x     x
Assurance              x         x           x       x     x     x
Accounting                       x                   x
Audit                            x                   x
Integrity              x         x           x       x     x     x
Confidentiality   x    x         x           x       x     x     x
Resource Management
                 Introduction

• The GT resource management components
  include a set of service components:
  – Globus Resource Allocation Manager
     • GRAM
  – Dynamically-Updated Request Online Coallocator
     • DUROC
  – Globus Architecture for Reservation and Allocation
     • GARA
   What is the role of the GRAM


• GRAM is designed to provide
  – a single common protocol and API for requesting and
    using remote system resources,
  – by providing a uniform, flexible interface to local job
    scheduling systems.

• GRAM provides a simple authorization mechanism
  based on GSI identities and a mechanism to map
  GSI identities to local user accounts.
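
A minimal sketch of this identity mapping, assuming a map file that lists a quoted GSI identity followed by a local account name. The entries and the helper names `parse_gridmap` and `authorize` are invented for illustration and are not Globus functions.

```python
import shlex

# Made-up entries: a quoted identity, then the local account it maps to.
GRIDMAP = '''
"/O=Grid/OU=Example/CN=Alice Example" alice
"/O=Grid/OU=Example/CN=Bob Example" bob
'''


def parse_gridmap(text):
    """Return a dict from GSI identity to local user name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        identity, local_user = shlex.split(line)
        mapping[identity] = local_user
    return mapping


def authorize(mapping, identity):
    """Simple authorization: only mapped identities get a local account."""
    return mapping.get(identity)


mapping = parse_gridmap(GRIDMAP)
print(authorize(mapping, "/O=Grid/OU=Example/CN=Alice Example"))
```

An identity that is absent from the map yields `None`, which a caller would treat as "not authorized".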
   The main feature of the GRAM

• GRAM reduces the number of mechanisms
  required for using remote resources.

• This capability is consistent with the
  "hourglass" role played by most of the
  GT’s components:

    – GRAM is the neck of the hourglass, with
      applications and higher-level services
      (resource brokers or metaschedulers) above
      it and local control and access mechanisms
      below it.

    – Both sides need to work only with GRAM, so
      the number of interactions, APIs, and
      protocols that need to be used is greatly
      reduced.


                                                   http://www.globus.org/mds/
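                      GRAM Components (GT2)

[Figure: GRAM components and their interactions; the figure marks a site
boundary between the client side and the resource side]
• Client
   – MDS client API calls to locate resources, sent to the
     MDS Grid Index Info Server
   – MDS client API calls to get resource info, sent to the
     MDS Grid Resource Info Server
   – GRAM client API calls to request resource allocation and process
     creation, sent through the Globus Security Infrastructure to the
     Gatekeeper
   – GRAM client API state change callbacks, returned to the client
• Gatekeeper
   – creates the Job Manager
• Job Manager
   – parses the request with the RSL Library
   – sends the request to the Local Resource Manager, which allocates
     and creates the processes
   – monitors and controls the processes
• MDS Grid Resource Info Server
   – queries the current status of the resource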
 Resource Specification Language
• Much of the power of GRAM is in the RSL
• Common language for specifying job requests
   – GRAM service translates this common language into
     scheduler-specific language
• GRAM service constrains RSL to a conjunction of
  (attribute=value) pairs
   – E.g. &(executable="/bin/ls")(arguments="-l")
• GRAM service understands a well-defined set of
  attributes
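
A minimal sketch of how a client might assemble such a conjunction. The helper `rsl_conjunction` is hypothetical and not part of any Globus API; it covers only the quoted `(attribute="value")` form and rejects values containing a double quote rather than guessing at RSL quoting rules.

```python
def rsl_conjunction(attributes):
    """Build an RSL conjunction from a list of (name, value) pairs.

    Hypothetical helper for illustration; order of the pairs is preserved.
    """
    parts = []
    for name, value in attributes:
        text = str(value)
        if '"' in text:
            # Quoting rules are out of scope for this sketch.
            raise ValueError("value must not contain a double quote")
        parts.append('({}="{}")'.format(name, text))
    return "&" + "".join(parts)


request = rsl_conjunction([("executable", "/bin/ls"), ("arguments", "-l")])
print(request)
```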
         A Co-allocation Multirequest
        +( & (resourceManagerContact=
               "flash.isi.edu:2119/jobmanager-lsf:/O=Grid/…/CN=host/flash.isi.edu")
             (count=1)
             (label="subjob A")
             (executable=my_app1)
           )
         ( & (resourceManagerContact=
               "sp139.sdsc.edu:2119:/O=Grid/…/CN=host/sp097.sdsc.edu")
             (count=2)
             (label="subjob B")
             (executable=my_app2)
           )
• The two subjobs use different resource managers, different counts,
  and different executables.
DUROC: Dynamically-Updated Request
                Online Coallocator
• Simultaneous allocation of a resource set
   – Handled via optimistic co-allocation based on free
     nodes or queue prediction
   – And advance reservations

• In the GT2:
     globusrun co-allocates specific multi-requests
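
A minimal sketch of how a multirequest of the kind passed to globusrun could be put together from subjobs. The function names are hypothetical and the contact strings are placeholders, not real resource manager contacts.

```python
def rsl_subjob(contact, count, label, executable):
    """Return one '&' conjunction describing a single subjob."""
    return ('( &(resourceManagerContact="{}")'
            '(count={})(label="{}")(executable={}) )'
            ).format(contact, count, label, executable)


def rsl_multirequest(subjobs):
    """Join subjob conjunctions under the '+' multirequest operator."""
    return "+" + "".join(subjobs)


# Placeholder hosts: each subjob targets a different resource manager.
multi = rsl_multirequest([
    rsl_subjob("host-a.example.org:2119/jobmanager-lsf", 1, "subjob A", "my_app1"),
    rsl_subjob("host-b.example.org:2119", 2, "subjob B", "my_app2"),
])
print(multi)
```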
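                  Job Manager Files

[Figure: files handled by the Job Manager]
• The Client contacts the Gatekeeper, which starts the Jobmanager.
• X509_USER_PROXY: the user proxy (UP) held by the Gatekeeper and the
  Jobmanager.
• GASS_CACHE: holds the staged executable (EXE), the staged stdin, and
  the job's stdout and stderr.
• Submission: the Jobmanager passes a scheduler description
  (Exe=x, Args=y, Env=z) that starts the JOB.
• Monitoring: the job status is monitored and published through the GRIS.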
            GRAM in the GT3

• GT3 GRAM integrates with the various
  metaschedulers and resource brokers.

• GRAM does not provide accounting and billing
  features.
  – It is assumed that these features, if needed, are
    supplied by local management mechanisms such as a
    queuing system or scheduler.
             GRAM in the GT3

• GRAM allows jobs to be run remotely
   – using a set of WSDL/OGSI client interfaces for
     submitting, monitoring, and terminating a job.


• Job requests
   – are written by the user in the Resource Specification
     Language
   – and processed by the ManagedJobService as part of the
     job request.
      Master Hosting Environment (MHE)

[Figure: GT3 GRAM components; the numbers 1 to 16 in the original figure
label the interactions between them]
• Master Hosting Environment (MHE)
   – Master Manager Factory Service (Master)
   – Service Data Aggregator
   – Virtual Host Environment Redirector
   – Start UHE (gridmap) and Launch UHE (setuid)
• User Hosting Environment (UHE)
   – Managed Job Factory Service (MJFS)
   – Managed Job Service (MJS)
   – File Stream Factory Service (stdout)
   – File Stream Service (FSS)
   – Grid Resource Identity Mapper (GRIM)
• Resource Information Provider Service (RIPS)
• Scheduling System
• Host System

                                      http://www-unix.globus.org/developper/resource-management.html
Globus Information Service
      GT3 Information Services

• What is GT3 Information Services?
  – A Grid service that provides information about Grid
    resources

  – A modular Java component framework for OGSA
     • that service developers can use to implement various information
       management solutions for GT3-compatible OGSA Services
       and Service Data
GT2 vs. GT3 Information Services

• Components
   – MDS → Index Service, Service Data Providers and Aggregators,
     Query and Notification Framework
   – GRAM Reporter → Resource Information Provider Service
• Data Format
   – LDIF → XML
• Data Source
   – GLUE providers → GLUE providers, Service Instance
• Query Mechanisms
   – LDAP → XPath, XQuery
                       Service Data

• A Grid service instance maintains a set of service data
  elements (SDE)
   – Declared via an extended XSD element declaration, placed
     in a WSDL portType
   – Includes basic introspection information, interface-specific
     data, and application state
• Pull and push models for information query
   – GridService::FindServiceData operation
      • Pull: queries this information via extensible query language
   – NotificationSource::Subscribe
      • Push: Subscribe to notification of changes to information
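
A toy sketch contrasting the pull and push models. `ToyGridService` and its method names are Python stand-ins chosen for this illustration; they are not the OGSI operations themselves.

```python
class ToyGridService:
    """Toy stand-in for a Grid service instance holding service data."""

    def __init__(self):
        self._sde = {}          # SDE name -> current value
        self._subscribers = {}  # SDE name -> list of callbacks

    def find_service_data(self, name):
        """Pull model: the client asks for the current value by name."""
        return self._sde.get(name)

    def subscribe(self, name, callback):
        """Push model: the client registers to be told about changes."""
        self._subscribers.setdefault(name, []).append(callback)

    def set_service_data(self, name, value):
        """Update an SDE and notify every subscriber of that SDE."""
        self._sde[name] = value
        for callback in self._subscribers.get(name, []):
            callback(name, value)


service = ToyGridService()
received = []
service.subscribe("cpu-load", lambda name, value: received.append((name, value)))
service.set_service_data("cpu-load", 0.42)

pulled = service.find_service_data("cpu-load")
```

The subscriber learns of the change without asking, while the pulling client sees it only when it next queries.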
             Why Service Data?

• Discovery often requires instance-specific,
  perhaps dynamic information
• Service data offers a general solution
   – Every service must support some common service data,
     and may support any additional service data desired
   – Not just meta-data, but also instance state
• Part of GT MDS-2 model contained in OGSI
   – Defines standard data model, and query
   – Complements soft-state registration and notification
       OGSI ServiceData Model

• ServiceData for self-description
   – Model service with properties!
      • Fine-grained view of resource functionality
      • Data scoped by service instance


   – Domain-dependent state
      • Service discovery/monitoring information
      • Stateful properties of service, e.g. cpu-load
Components
           Basic Components


• Service Data Provider Components

• Service Data Aggregation Components

• Registry Components
          Service Data Provider
              Components
• Service Data Provider components
   – provide a standard mechanism for dynamic generation
     of service data via external programs


• External provider programs
   – can be the core providers that are part of GT3 or
   – can be user-created, custom providers
 In Detail: Service Data Providers

• Service Data Provider interfaces are designed to
  support execution in either
   – synchronous (“pull”) mode
   – asynchronous (“push”) mode

• A valid provider
   – is composed of any Java class which implements at
     least one of three predefined Java Interfaces
   – generates a compatible form of XML output as the
     result of its execution
 In Detail: Service Data Providers

• Provider Interfaces
   – SimpleDataProvider
       • synchronous provider which produces XML output in the form of a
         Java OutputStream
   – DOMDataProvider
       • synchronous extension of SimpleDataProvider which can also
         produce XML output in the form of a Java org.w3c.dom.Document
   – AsyncDataProvider
       • asynchronous version of SimpleDataProvider that sends its output to
         a specified callback Object; the provider implementer and the
         provider caller are assumed to have agreed on the callback
         interface at compile time
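
A Python analogue of the three provider styles, as a sketch only. The class and method names below are invented for this illustration and do not reproduce the signatures of the GT3 Java interfaces.

```python
import io
import xml.dom.minidom
from abc import ABC, abstractmethod


class SimpleProvider(ABC):
    """Synchronous provider: writes XML to an output stream."""

    @abstractmethod
    def run(self, out):
        ...


class DomProvider(SimpleProvider):
    """Synchronous provider that can also return a parsed document."""

    def run_dom(self):
        buffer = io.StringIO()
        self.run(buffer)
        return xml.dom.minidom.parseString(buffer.getvalue())


class AsyncProvider(SimpleProvider):
    """Provider that hands its output to a callback agreed on in advance."""

    def run_async(self, callback):
        # Called directly here for brevity; a real asynchronous provider
        # would invoke the callback later, for example from another thread.
        buffer = io.StringIO()
        self.run(buffer)
        callback(buffer.getvalue())


class HostNameProvider(DomProvider, AsyncProvider):
    """Example provider emitting a fixed, made-up host record."""

    def run(self, out):
        out.write("<Host><Name>node01</Name></Host>")


provider = HostNameProvider()
document = provider.run_dom()
host_name = document.getElementsByTagName("Name")[0].firstChild.data
delivered = []
provider.run_async(delivered.append)
```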
   In Detail: Provider Interfaces

• SimpleDataProvider
  – Basic interface which all service data providers must
    implement
   In Detail: Provider Interfaces

• DOMDataProvider
  – Generic interface for XML service data providers that
    are capable of emitting an org.w3c.dom.Document
    object at runtime
   In Detail: Provider Interfaces

• AsyncDataProvider
  – Asynchronous version of provider interface
       In Detail: GT3 Providers
• AsyncDocumentProvider
  – An asynchronous version of a generic XML document
    provider
• ScriptExecutionProvider
  – ServiceDataProvider that provides a generic way to
    execute scripts which produce XML documents
• HostScriptProvider
  – Constructs Host service data from the output of
    multiple scripts
      In Detail: GT3 Providers

• ForkInfoProvider
  – ServiceDataProvider which monitors local system PIDs
• PBSInfoProvider
  – ServiceDataProvider which queries PBS for queue
    information
• SimpleSystemInformationProvider
  – Basic MDS GRIS-style sensor which emits system
    information in XML, with state managed directly as an
    XML Document using JDOM (JDK 1.3 compatible)
     In Detail: Provider Manager

• Provider execution
   – is handled by the ServiceDataProviderManager class, which
     schedules and manages provider execution as Java TimerTasks


• ServiceDataProviderManager
   – uses an XML-based configuration file to load and link installed
     Service Data Providers during runtime through standard Java
     reflection methods

• Configuration file
   – $GLOBUS_LOCATION/etc/indexservice.providers
   – $GLOBUS_LOCATION/etc/rips.providers
    In Detail: Provider Manager

• Configuration entry for the provider in a
  configuration file
   – enables your provider for execution by the Provider
     Manager
   – publishes the existence of your provider to clients


• Required attribute in the configuration entry
   – the “class” attribute, which is simply the fully qualified
     Java class name
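
A sketch of configuration-driven loading by class name, using Python's `importlib` in place of Java reflection. The XML element names are assumptions (only the "class" attribute comes from the slides), and the entries name standard-library classes purely so the sketch runs on its own.

```python
import importlib
import xml.etree.ElementTree as ET

# Hypothetical configuration layout; real provider files may differ.
CONFIG = """
<providers>
  <provider class="collections.OrderedDict"/>
  <provider class="io.StringIO"/>
</providers>
"""


def load_class(qualified_name):
    """Resolve a fully qualified class name to the class object."""
    module_name, _, class_name = qualified_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def load_providers(config_text):
    """Instantiate one object per <provider> entry, keyed by class name."""
    providers = {}
    for entry in ET.fromstring(config_text).findall("provider"):
        name = entry.get("class")
        if name is None:
            raise ValueError("provider entry lacks the required class attribute")
        providers[name] = load_class(name)()
    return providers


providers = load_providers(CONFIG)
print(sorted(providers))
```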
 In Detail: Custom Data Handlers

• The default data processing behavior of the Provider Manager
   – take the logical XML document result of a provider’s execution
   – wrap it in a new SDE
   – and then add it to the Service’s ServiceDataSet


• We can override the default data processing logic in the
  Provider Manager
   – by specifying the “handler” attribute in the Provider’s configuration file
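
A sketch of the default processing path and an override, assuming a service data set modelled as a plain dictionary. All names here, including `uppercase_handler`, are invented for illustration.

```python
def default_handler(service_data_set, provider_name, xml_result):
    """Default behaviour: wrap the result in a new SDE and add it to the set."""
    service_data_set[provider_name] = {"name": provider_name, "value": xml_result}


def uppercase_handler(service_data_set, provider_name, xml_result):
    """Made-up custom handler that transforms the result before storing it."""
    service_data_set[provider_name] = {"name": provider_name,
                                       "value": xml_result.upper()}


def process_result(service_data_set, provider_name, xml_result, handler=None):
    """Use the configured handler if one is given, else the default."""
    (handler or default_handler)(service_data_set, provider_name, xml_result)


sde_set = {}
process_result(sde_set, "host", "<load>1</load>")
process_result(sde_set, "queue", "<jobs>3</jobs>", handler=uppercase_handler)
```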
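             In Detail: Mechanisms

[Figure: provider mechanisms]
• A Service or User calls enumProvider and executeProvider on the
  Provider Manager.
• The Provider Manager runs the Information Providers:
  SimpleDataProvider, DomDataProvider, AsyncDataProvider.
• Results can be passed to a Custom Data Handler.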
           Basic Components


• Service Data Provider Components

• Service Data Aggregation Components

• Registry Components
        Service Data Aggregation
              Components
• ServiceDataAggregator components
   – provide a reusable mechanism for handling subscription,
     notification, and updating of
       • locally stored copies of service data which is generated by other
         services


• By using the ServiceDataAggregator class in your service
  code
   – service data from both locally executing information providers and
     other OGSA service instances can be aggregated into any given
     service
In Detail: Service Data Aggregator

• ServiceDataAggregator component
   – is used to perform server-side notification subscription
     management


• Key additional feature of ServiceDataAggregator
   – notification data that is processed by the
     deliverNotification() function
      • is actually copied and stored locally as an SDE, which includes
        creation timestamp, TTL and source metadata
In Detail: Service Data Aggregator

• Aggregated SDEs are
   – organized by SDE QName
   – stored in an array which is returned as an
     org.gridforum.ogsa.ServiceDataSetType to FindServiceData name
     queries


• Originator field (GSH type) of the OGSA ServiceDataType
   – is used as the “primary key”
       • to differentiate like-named entries from each other
       • to identify the source of the data itself
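
A sketch of an aggregator store that keeps one copy per (QName, originator) pair. The class name, method names, and the `gsh://` handles are invented for this illustration.

```python
import time


class ToyAggregator:
    """Keeps local copies of service data delivered by other services."""

    def __init__(self):
        self._entries = {}  # (qname, originator) -> stored SDE copy

    def deliver_notification(self, qname, originator, value, ttl, now=None):
        """Store a copy with creation timestamp, TTL and source metadata."""
        created = time.time() if now is None else now
        self._entries[(qname, originator)] = {
            "qname": qname, "originator": originator,
            "value": value, "created": created, "ttl": ttl,
        }

    def find_by_name(self, qname):
        """Return every stored SDE with this QName, one per originator."""
        return [self._entries[key] for key in sorted(self._entries)
                if key[0] == qname]


agg = ToyAggregator()
agg.deliver_notification("cpu-load", "gsh://site-a/host", 0.2, ttl=60, now=100.0)
agg.deliver_notification("cpu-load", "gsh://site-b/host", 0.9, ttl=60, now=101.0)
agg.deliver_notification("cpu-load", "gsh://site-a/host", 0.3, ttl=60, now=102.0)
loads = agg.find_by_name("cpu-load")
```

The third notification replaces the first because both share the same originator, so the like-named entries from the two sites stay distinct.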
  In Detail: AggregatorPortType

• addSubscription and removeSubscription
             In Detail: Mechanisms

[Figure: aggregator mechanisms]
• A Service or User calls addSubscription and removeSubscription on the
  Aggregator.
• Grid Services send data to the Aggregator through deliverNotification.
           Basic Components


• Service Data Provider Components

• Service Data Aggregation Components

• Registry Components
           Registry Components

• Registry components
   – maintain a set of available peer Grid Service Handles
   – provide soft-state cataloging of a set of Grid Services
      • i.e., the registry of services is periodically updated with
        existence notification messages, and any existing entries that
        fail to refresh within the timeout period are eventually expired


• Registries
   – can be used to support query or other operations that
     may apply to one or more services in a set
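
A sketch of soft-state cataloging: entries survive only while they keep refreshing. The class and the explicit `now` arguments are assumptions made so the behaviour is easy to follow; they are not the GT3 registry API.

```python
class SoftStateRegistry:
    """Registry whose entries expire unless refreshed within `timeout`."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._last_seen = {}  # service handle -> time of last notification

    def register_service(self, handle, now):
        """Record or refresh a service on an existence notification."""
        self._last_seen[handle] = now

    def unregister_service(self, handle):
        """Remove a service explicitly; unknown handles are ignored."""
        self._last_seen.pop(handle, None)

    def expire(self, now):
        """Drop every entry that has not refreshed within the timeout."""
        stale = [handle for handle, seen in self._last_seen.items()
                 if now - seen > self.timeout]
        for handle in stale:
            del self._last_seen[handle]

    def services(self):
        return sorted(self._last_seen)


registry = SoftStateRegistry(timeout=30)
registry.register_service("service-a", now=0)
registry.register_service("service-b", now=0)
registry.register_service("service-a", now=25)  # refresh before the timeout
registry.expire(now=40)
alive = registry.services()
```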
  In Detail: RegistrationPortType

• registerService and unregisterService
In Detail: Mechanisms

[Figure: registry mechanisms]
• Grid Services call registerService and unregisterService on the
  Registry.
Logical structure of the Index Service

[Figure: logical structure of the Index Service]
• The User obtains GSHs and Service Data from the Index Service.
• Collective layer: the Index Service combines
   – a Registry Mechanism, fed by Existence Notification Messages
   – an Aggregator Mechanism (caching here), fed by Notification Messages
   – a Provider Mechanism, fed by a Java Provider
• Resource layer: Grid Services, each with its own Service Data.
Globus Data Management services
Reliable File Transfer Service
                          Overview

• The Reliable File Transfer Service (RFT) is an OGSA-based
  service that provides interfaces for
   – controlling and monitoring third-party file transfers using GridFTP
     servers.
   – The client controlling the transfer is hosted inside a grid service,
     so it can be managed using the soft-state model and queried using
     the ServiceData interfaces available to all grid services.


• It is essentially a reliable and recoverable version of the
  GT2 globus-url-copy tool and more.
     Prerequisites and Dependencies


• The Prerequisites to RFT are:
    – GridFTP Server with a Host Certificate
    – PostgreSQL

• PostgreSQL is used to store the state of the transfer to
  allow for restart after failures.
    – The interface to PostgreSQL is JDBC so any DBMS that supports
      JDBC can be used.

Note: GT3 was tested with PostgreSQL version 7.3.2, and the instructions
  provided to set up the database apply to that version.
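
A sketch of persisting transfer state so a transfer can resume after a failure. SQLite from the Python standard library stands in for PostgreSQL over JDBC, and the table layout, column names, and URLs are invented; the real RFT schema is not shown in these slides.

```python
import sqlite3


def open_state(path=":memory:"):
    """Open the state store and create the (hypothetical) transfer table."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS transfer ("
               "source TEXT, destination TEXT, bytes_done INTEGER, "
               "PRIMARY KEY (source, destination))")
    return db


def record_progress(db, source, destination, bytes_done):
    """Persist the latest restart point for one transfer."""
    db.execute("INSERT OR REPLACE INTO transfer VALUES (?, ?, ?)",
               (source, destination, bytes_done))
    db.commit()


def restart_offset(db, source, destination):
    """Return the offset to resume from, or 0 for a new transfer."""
    row = db.execute("SELECT bytes_done FROM transfer "
                     "WHERE source = ? AND destination = ?",
                     (source, destination)).fetchone()
    return row[0] if row else 0


db = open_state()
src, dst = "gsiftp://a.example.org/data", "gsiftp://b.example.org/data"
record_progress(db, src, dst, 4096)
record_progress(db, src, dst, 8192)
resume_at = restart_offset(db, src, dst)
fresh = restart_offset(db, "gsiftp://a.example.org/other", dst)
```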
   Prerequisites and Dependencies

• GridFTP performs the actual file transfer.
• GridFTP server can only be run on Unix or Linux.



• There are 2 ways to get GridFTP:
   – Packaged with the core GT3 Final installation
   – As part of the Globus Toolkit 2.4 distribution
     Prerequisites and Dependencies

1.   PostgreSQL Setup
2.   Configure and Run a GridFTP Server
3.   RFT Grid Service Setup
4.   Build the GAR from Source Distribution




                              www-unix.globus.org/toolkit/reliable_transfer.html
      Service Data Elements for RFT


•   Version: version of RFT

•   FileTransferProgress: SDE that denotes the percentage of the file that has been transferred

•   FileTransferRestartMarker: SDE for the last restart marker of a particular
    transfer

•   FileTransferJobStatusElement: SDE for the status of a particular transfer

•   FileTransferStatusElement: SDE that denotes the status of all the transfers in the
    request

•   GridFTPRestartMarkerElement: SDE for the restart marker of the transfer

•   GridFTPPerfMarkerElement: SDE for the performance marker of the transfer
The Replica Location Service
     The Replica Location Service

• The replica location service (RLS) maintains and provides
  access to mapping information from logical names for data
  items to target names.

• The distributed RLS is intended to replace the centralized
  Globus replica catalog available in earlier releases of
  GT2.x.

• The distributed RLS provides higher performance,
  reliability and scalability.
        Replica Location Service

• Replication of data items can reduce access
  latency, improve data locality, and increase
  robustness, scalability and performance for
  distributed applications.

• An RLS typically does not operate in isolation, but
  functions as one component of a data grid
  architecture.
         Replica Location Service

• Consistent local state maintained in Local Replica Catalogs
  (LRCs).
   – Local catalogs maintain mappings between arbitrary logical file
     names (LFNs) and the physical file names (PFNs) associated with
     those LFNs on its storage system(s).

• Collective state with relaxed consistency maintained in
  Replica Location Indices (RLIs).
   – Each RLI contains a set of mappings from LFNs to LRCs. A
     variety of index structures can be defined with different
     performance characteristics, simply by varying the number of RLIs
     and amount of redundancy and partitioning among the RLIs.
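
A sketch of the two-level lookup with one index over two local catalogs. The class names, LFNs, and PFNs are made up; this is not the RLS client API.

```python
class LocalReplicaCatalog:
    """Maps logical file names (LFNs) to physical file names (PFNs)."""

    def __init__(self, name):
        self.name = name
        self._mappings = {}

    def add(self, lfn, pfn):
        self._mappings.setdefault(lfn, []).append(pfn)

    def lookup(self, lfn):
        return list(self._mappings.get(lfn, []))

    def lfns(self):
        return set(self._mappings)


class ReplicaLocationIndex:
    """Maps LFNs to the LRCs that hold mappings for them."""

    def __init__(self):
        self._index = {}

    def update_from(self, lrc):
        """Record that this LRC knows about each of its LFNs."""
        for lfn in lrc.lfns():
            self._index.setdefault(lfn, set()).add(lrc)

    def find_replicas(self, lfn):
        """Two-step query: ask the index for LRCs, then each LRC for PFNs."""
        pfns = []
        for lrc in sorted(self._index.get(lfn, ()), key=lambda c: c.name):
            pfns.extend(lrc.lookup(lfn))
        return pfns


lrc_a, lrc_b = LocalReplicaCatalog("a"), LocalReplicaCatalog("b")
lrc_a.add("lfn://run42", "gsiftp://a.example.org/run42")
lrc_b.add("lfn://run42", "gsiftp://b.example.org/run42")
rli = ReplicaLocationIndex()
rli.update_from(lrc_a)
rli.update_from(lrc_b)
replicas = rli.find_replicas("lfn://run42")
```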
          Replica Location Service

• Soft state maintenance of RLI state.
    – LRCs send information about their state to RLIs using soft state
      protocols. State information in RLIs times out and must be periodically
      refreshed by soft state updates.

• Compression of state updates.
    – Optional compression uses Bloom filters to summarize the content of an LRC
      before sending a soft state update to an RLI node.

• Membership and partitioning information maintenance.
    – The current RLS implementation maintains static information about the
      LRCs and RLIs participating in the distributed system.

    – As new implementations of the RLS are developed, they will use OGSA
      mechanisms for registration of services and for service lifetime management.
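
Soft state and Bloom filter compression can be illustrated together. The sketch below is an assumption-laden toy, not the RLS code: `BloomFilter`, `SoftStateIndex`, `summarize`, the filter size, the hash construction, and the 30-second timeout are all hypothetical choices. Time is passed in explicitly as `now` so the behavior is easy to follow.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit set; may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def summarize(lfns):
    """Build the compressed summary an LRC would send instead of its full LFN list."""
    summary = BloomFilter()
    for lfn in lfns:
        summary.add(lfn)
    return summary


class SoftStateIndex:
    """Hypothetical RLI that forgets an LRC unless the LRC keeps refreshing."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._entries = {}  # LRC name -> (summary, time the update arrived)

    def receive_update(self, lrc_name, summary, now):
        self._entries[lrc_name] = (summary, now)

    def candidates(self, lfn, now):
        """Return LRCs whose unexpired summary may contain the LFN."""
        live = []
        for name, (summary, received) in self._entries.items():
            if now - received <= self.timeout and lfn in summary:
                live.append(name)
        return sorted(live)


index = SoftStateIndex(timeout=30)
index.receive_update("site-a", summarize(["lfn://run42/events"]), now=0)
index.receive_update("site-b", summarize(["lfn://run43/events"]), now=0)
index.receive_update("site-a", summarize(["lfn://run42/events"]), now=25)  # refresh

# At t=40 only site-a is still live; site-b's state timed out at t=30.
print(index.candidates("lfn://run42/events", now=40))
```

The trade-off is that a Bloom filter answers "possibly present" rather than "present", so an RLI may occasionally direct a query to an LRC that does not hold the LFN. The LRC itself remains authoritative.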
    Relationship to Earlier Globus Replica
            Management Software

•   The RLS is intended to replace replica management tools available in GT2.X,
    including:
     – the Replica Catalog API
     – the Replica Management API.

•   The RLS differs from these earlier components in several important ways.

     – As a distributed system, the RLS is designed to provide reliability (by avoiding
       single points of failure), load balancing, performance, and scalability.

     – The RLS implementation is based on open source relational database technology.

     – The RLS separates replication information from other types of metadata.

          • The RLS does not include information about logical collections, but assumes such
            information is stored in a separate metadata service.
The GridFTP Protocol and
        Software
            What is GridFTP ?

• GridFTP is a high-performance, secure, reliable
  data transfer protocol optimized for high-
  bandwidth wide-area networks.

• The GridFTP protocol is based on FTP, the
  highly popular Internet file transfer protocol.
               Protocol Features

•   GSI security on control and data channels
•   Multiple data channels for parallel transfers
•   Partial file transfers
•   Third-party (direct server-to-server) transfers
•   Authenticated data channels
•   Reusable data channels
•   Command pipelining
                      Protocol Features

•   Grid Security Infrastructure (GSI) and Kerberos support:
     – Robust and flexible authentication, integrity, and confidentiality features are
       critical when transferring or accessing files.

•   Third-party control of data transfer:
     – In order to manage large data sets for large distributed communities, it is necessary
       to provide third-party control of transfers between storage servers.

•   Parallel data transfer:
     – On wide-area links, using multiple TCP streams can improve aggregate bandwidth
       over using a single TCP stream.

•   Striped data transfer:
     –   Partitioning data across multiple servers can further improve aggregate bandwidth.
         GridFTP supports striped data transfers through extensions defined in the Grid
         Forum draft.
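
The idea behind parallel transfer can be sketched as follows. This is a simplified illustration and not the GridFTP wire format: worker threads stand in for TCP streams, an in-memory buffer stands in for the file, and `split_into_blocks` and `parallel_copy` are hypothetical names. The point it shows is that each block carries its own offset, so blocks can arrive in any order and still land in the right place.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(size, block_size):
    """Return (offset, length) pairs that cover [0, size) without gaps."""
    return [(off, min(block_size, size - off)) for off in range(0, size, block_size)]


def parallel_copy(source, num_streams, block_size):
    """Copy `source` (bytes) into a new buffer using `num_streams` worker threads.

    Each worker stands in for one TCP stream. Blocks may complete in any
    order; the offset tells the receiver where each block belongs.
    """
    dest = bytearray(len(source))
    blocks = split_into_blocks(len(source), block_size)

    def send(block):
        offset, length = block
        dest[offset:offset + length] = source[offset:offset + length]

    with ThreadPoolExecutor(max_workers=num_streams) as pool:
        list(pool.map(send, blocks))  # consume results so worker errors surface
    return bytes(dest)


data = bytes(range(256)) * 40  # 10,240 bytes of sample data
assert parallel_copy(data, num_streams=4, block_size=1000) == data
```

A striped transfer follows the same pattern, except that the blocks are spread across several servers rather than several streams to one server.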
                      Protocol Features

•   Partial file transfer:
     – GridFTP introduces new FTP commands to support transfers of regions of a file.

•   Support for reliable data transfer:
     – Reliable transfer is important for many applications that manage data. Fault
       recovery methods for handling transient network failures, server outages, etc., are
       needed.

•   Manual control of TCP buffer size:
     – This is a critical parameter for achieving maximum bandwidth with TCP/IP. The
       protocol also has support for automatic buffer size tuning.

•   Integrated Instrumentation:
     – The protocol calls for restart and performance markers to be sent back. It is not
       specified how often, and this is something we intend to address shortly.
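
Partial file transfer and restart markers combine naturally: if the receiver knows which byte ranges have already arrived, a restarted transfer only needs to request the rest. The sketch below assumes a hypothetical representation in which received ranges are `(start, end)` pairs with `end` exclusive. It does not reproduce the marker syntax used on the wire.

```python
def missing_ranges(total_size, received):
    """Return the byte ranges of [0, total_size) not covered by `received`.

    `received` is a list of (start, end) pairs with `end` exclusive, in any
    order and possibly overlapping, as might be accumulated from restart markers.
    """
    gaps = []
    cursor = 0  # everything before this offset is known to have arrived
    for start, end in sorted(received):
        if start > cursor:
            gaps.append((cursor, min(start, total_size)))
        cursor = max(cursor, end)
        if cursor >= total_size:
            break
    if cursor < total_size:
        gaps.append((cursor, total_size))
    return gaps


# A 100-byte file for which bytes 0-29 and 50-69 have arrived.
print(missing_ranges(100, [(50, 70), (0, 30)]))  # [(30, 50), (70, 100)]
```

Each returned gap corresponds to one partial-transfer request, so a restart never resends data that already arrived.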
    What Does “GridFTP” Mean?

•   GridFTP Protocol:
    – This refers to the wire protocol used and is defined by a draft technical
      specification submitted to the Global Grid Forum.


• The Globus Toolkit V2.0 GridFTP Server (GT2GridFTP):
    – This system is the widely used open source wu-ftpd FTP server code base
      extended to support the GridFTP protocol extensions.
    – GT2GridFTP is distributed with the Globus Toolkit.


• The GridFTP family of tools: the term “GridFTP” is used to refer to
  the entire family of GridFTP tools distributed with the Globus Toolkit:
  the GridFTP server, client tools, client library, control library, etc.
                       Implementation

•   The Globus implementation of the GridFTP protocol takes the form of two
    APIs and corresponding libraries:
     –   globus_ftp_control
     –   globus_ftp_client.

•   Besides supporting the protocol features described above, the APIs also
    include interfaces for adding software "plug-ins".

•   In addition to Globus software libraries, we have also implemented
     – an API/library (globus_gass_copy)
     – a command-line tool (globus-url-copy) that integrates GridFTP, HTTP, and local
       file I/O to enable secure transfers using any combination of these protocols.

•   Globus has adapted a popular FTP server package (Washington University's
    wu-ftpd) to support a majority of the GridFTP protocol features (GSI security,
    parallel transfer, third-party transfer, partial file transfer).
     Availability of GridFTP

• Our data grid software is currently available to the
  public as components of the Globus Toolkit 2.0
  release.

• Prior to the GT2.X release, the software was tested
  and evaluated for more than a year by several
  external project teams who are using our
  technologies to build data grids for their own use.
GASS: Global Access to Secondary
            Storage
    Requirements for a Grid I/O Service

•   Uniform data access
•   Diverse data sources
•   Dynamic resource set
•   Support for streaming I/O
•   Little or no program modification
•   Support for programmer-directed performance
    optimization



             Joseph Bester et al. “GASS: A Data Movement and Access Service for Wide Area Computing”
           GASS Architecture

•   Common Grid File Access Patterns
•   Default Data Movement Strategies
•   Specialized Data Movement Strategies
•   GASS Operation
•   Integration with the Globus Toolkit




      Common Grid File Access
            Patterns
•   Read-only access
•   Write-shared access
•   Append-only access
•   Unrestricted read/write




[Figure: six file access patterns, each labeled with a pair of operations]
•   Read-only access to constant data, reading the entire file (READ / READ)
•   Write access to the entire file with multiple writers; last writer wins (WRITE / WRITE)
•   Append-only access with multiple writers (APPEND / APPEND)
•   Concurrent write and read access (WRITE / READ)
•   Concurrent write access to the same file (WRITE / WRITE)
•   Read-only access to part of the file (READ / READ)
      Default Data Movement
            Strategies
• GASS addresses bandwidth management
  issues by providing a file cache: a “local”
  secondary storage

• By default, data is moved into and out of this
  cache when files are opened and closed
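
The default strategy can be sketched as a small cache that moves data in on open and out on close. This is a toy model under stated assumptions, not the GASS implementation: `FileCache` and `CachedFile` are hypothetical classes, the remote store is just a dictionary keyed by URL, and the example URLs are made up.

```python
class FileCache:
    """Hypothetical local cache in front of a remote store (a dict of URL -> bytes)."""

    def __init__(self, remote):
        self.remote = remote
        self.local = {}       # URL -> bytearray held in the cache
        self.fetch_count = 0  # number of transfers from the remote store

    def open(self, url, mode="r"):
        if mode == "r" and url not in self.local:
            self.local[url] = bytearray(self.remote[url])  # move data in on open
            self.fetch_count += 1
        elif mode == "w":
            self.local[url] = bytearray()  # start a fresh local copy
        return CachedFile(self, url, mode)


class CachedFile:
    """Handle returned by FileCache.open; all I/O touches only the local copy."""

    def __init__(self, cache, url, mode):
        self.cache, self.url, self.mode = cache, url, mode

    def read(self):
        return bytes(self.cache.local[self.url])

    def write(self, data):
        self.cache.local[self.url] += data

    def close(self):
        if self.mode == "w":
            # move data out on close
            self.cache.remote[self.url] = bytes(self.cache.local[self.url])


remote = {"https://example.org/input.dat": b"remote bytes"}
cache = FileCache(remote)

f = cache.open("https://example.org/input.dat")
first = f.read()
f.close()
cache.open("https://example.org/input.dat").close()  # second open is served from the cache

out = cache.open("https://example.org/output.dat", mode="w")
out.write(b"result")
out.close()  # the new file reaches the remote store only now
```

Confining remote traffic to open and close means reads and writes in between run at local speed, at the cost of other readers not seeing a write until the writer closes the file.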




[Figure: processes, two caches, and four servers: GASS-server, http-server, ftp-server, HPSS-server]
                 GASS Operation

• Grid applications access remote files using GASS by
  opening and closing the files with specialized open
  and close calls
   –   globus_gass_open()
   –   globus_gass_fopen()
   –   globus_gass_close()
   –   globus_gass_fclose()


Note: the GASS open and close calls act like their
  standard Unix I/O counterparts, except that a URL
  rather than a file name is used to specify the location
  of the file data.

 Integration With Globus Toolkit

• The availability of GASS services has made it
  straightforward to extend the GRAM API:

  – Allow both executables and standard input,
    output, and error streams to be named by URLs

  – GASS mechanisms are used to
     • fetch the URL-named executable into the cache.
     • fetch standard input, and redirect standard output and error.


