Hall D The Grid _ Web Services


									Grid Computing

      Chip Watson
      Jefferson Lab

 Hall B Collaboration Meeting
               What is the Grid?
Some wax philosophical, and say it is an unlimited
  capacity for computing. Like the power grid, you
  just plug in and use it, don’t care who provides it.
Difficulty: “metering” your use of resources, and
  charging for them. We aren’t there yet.

Simpler view: it is a large computer center, with a
  geographically distributed file system and batch system.
This view assumes you have a right to use each
  piece of the distributed system, subject to perhaps
  local accounting constraints.
          Key Aspects of the Grid

Data Grid: Location independent file system.
   If you know the “logical name” of a data set, you can
    find it. (Normal access controls apply).
   Files can migrate around the grid to optimize usage,
    and may exist in multiple locations.

Computational Grid: Submit a job to “the grid”.
   You describe the requirements of your job, and grid
    middleware finds an appropriate place to run it.
   Jobs can be batch, or even interactive.

        Other Important Aspects
Single Sign-On
  You “log on” to the grid once, and you can use the
  distributed resources for a certain period of time
  (sort of like the AFS file system)

  Analog: all day metro ticket

       Distributed Computing Model
In the “old” model, a lab has a large computer center,
provisioned for all demanding data storage, analysis and
simulation requirements.
In the “current” model, only a fraction resides at the lab.
   already widely used in HEP experiments
   large experiment may enlist a major computing partner site, e.g.
    IN2P3 for BaBar
In the “new” model, many sites large and small participate.
   Some sites may be special based upon capacity or specialized
    capabilities (such as robotic storage).
   LHC will use a 3 tier model, with a large central facility (tier 0),
    distributing data to moderately large national centers (tier 1), which in
    turn service small nodes (tier 2)

       What is a reasonable distribution for Hall D???
      Why desert a working model?
1. Easier to get additional funds
     State matching funds
     Also: NSF or other funding agency
2. Easier to involve students
     Room full of computers more attractive than account
     on a machine 1000 km away
3. Opportunity for innovation
     Easier to play with local machine than to get root
     access on machine 1000 km away

   Case Study:

The Lattice Portal

 A prototype virtual
computer center for
   Jefferson Lab
(under development)
1. Components of the virtual computer center
  •   Data management
  •   Batch system
  •   Interactive system
2. Architectural components
  •   Information Services using XML (Java servlets)
        Replica Catalog
        Data Grid Server (file cache & transfer agent)
        Batch Server
  •   Authentication using X.509, SSL
  •   Java client packages

             A Virtual Computer Center:
                 Data Management
Global Logical File System            (possibly constrained to a project)
   1.   Logical names for files (location independent)
   2.   Active files possibly cached at multiple sites
   3.   Inactive files in off-line storage (tape silo, multi-site)
Data Grid Node
   1.   Manages a cache of logical files, perhaps using multiple file
        servers, NFS exports files locally
   2.   Maps logical name to local (physical) file name
   3.   Supports file transfers between managed and unmanaged
        storage, and between grid nodes (queuing of transfer requests)
Replica Catalog
   1.   Tracks which logical files exist at which data grid nodes
   2.   Contains some file meta-data to allow file selection by
        attributes as well as by name
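
The split between the Replica Catalog and a Data Grid Node can be sketched in a few lines. This is only an illustration of the two lookups described above, not the Jefferson Lab implementation: the class names, method names, file paths, and attribute names (`energy_gev`, `target`) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalog:
    """Tracks which logical files exist at which data grid nodes."""
    replicas: dict = field(default_factory=dict)   # logical name -> set of node names
    metadata: dict = field(default_factory=dict)   # logical name -> attribute dict

    def add_replica(self, logical_name, node, **attributes):
        self.replicas.setdefault(logical_name, set()).add(node)
        self.metadata.setdefault(logical_name, {}).update(attributes)

    def locate(self, logical_name):
        """Return the nodes holding a copy, sorted for stable output."""
        return sorted(self.replicas.get(logical_name, ()))

    def select(self, **wanted):
        """Select logical files by attributes instead of by name."""
        return sorted(
            name for name, attrs in self.metadata.items()
            if all(attrs.get(k) == v for k, v in wanted.items())
        )


@dataclass
class DataGridNode:
    """Maps logical names to local (physical) file names in a managed cache."""
    name: str
    cache_root: str
    cached: set = field(default_factory=set)

    def physical_path(self, logical_name):
        if logical_name not in self.cached:
            raise FileNotFoundError(f"{logical_name} is not cached at {self.name}")
        return self.cache_root.rstrip("/") + "/" + logical_name.lstrip("/")


# Hypothetical example data
catalog = ReplicaCatalog()
catalog.add_replica("/halld/run001.dat", "jlab", energy_gev=6.0, target="LH2")
catalog.add_replica("/halld/run001.dat", "fsu")
catalog.add_replica("/halld/run002.dat", "jlab", energy_gev=4.0, target="LH2")

jlab = DataGridNode("jlab", "/cache/grid", cached={"/halld/run001.dat"})
```

The catalog answers "where is it?" and "which files match?", while only the node knows the physical path, which keeps logical names location independent.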
                            In picture form…

[Figure: a ClientProgram (with its library) contacts the MetaData Catalog on
 the MetaData Catalog Host, the ReplicaCatalog on the Replica Catalog Host,
 and the DataGridServer and FileServer on the File Host.]

1.   Get file names from meta data (energy, target, magnet settings)
2.   Contact replica catalog to locate desired file. Get referral to a Data
     Grid Server
3.   Get file state (on disk), additional info, referral to transfer agent
4.   Get the file (parallel streams)
         A Virtual Computer Center:
                  Batch System
Global Queue(s)
  A user submits a batch job, perhaps using a web
  interface, to the virtual computer center (a.k.a. meta-
  facility). Based upon the locations of the executable,
  the input data files, and the necessary compute
  resources, the job is assigned to a particular compute
  grid node (cluster of machines).

Compute Grid Node
  Set of co-located compute resources managed by a
  batch system. Typically co-located with a data grid
  node. E.g. Jefferson Lab’s Computer Center.
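
One way the global queue might choose a compute grid node is sketched below. The ranking rule (prefer the node that already holds the most input files, then the one with the most free CPUs) is an assumption chosen for illustration; the slides do not specify the actual policy, and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeNode:
    name: str
    free_cpus: int
    memory_gb: int
    local_files: frozenset  # logical names cached at the co-located data grid node


@dataclass(frozen=True)
class Job:
    cpus: int
    memory_gb: int
    input_files: tuple


def assign(job, nodes):
    """Pick a node that satisfies the job's requirements and already holds
    the most input files; return None when no node qualifies."""
    eligible = [n for n in nodes
                if n.free_cpus >= job.cpus and n.memory_gb >= job.memory_gb]
    if not eligible:
        return None
    # Prefer data locality, then the node with the most free CPUs.
    return max(eligible,
               key=lambda n: (len(n.local_files & set(job.input_files)), n.free_cpus))


# Hypothetical sites and files
nodes = [
    ComputeNode("jlab", 16, 64, frozenset({"a.dat", "b.dat"})),
    ComputeNode("mit", 64, 128, frozenset({"a.dat"})),
    ComputeNode("fsu", 2, 8, frozenset({"a.dat", "b.dat"})),
]
job = Job(cpus=4, memory_gb=16, input_files=("a.dat", "b.dat"))
chosen = assign(job, nodes)
```

A real scheduler would also weigh the location of the executable, queue lengths, and transfer cost; this sketch only shows the shape of the decision.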

             A Virtual Computer Center:
                  Interactive System
Conventional remote login is expected to be less
   common, as all capabilities are remotely accessible.

Interactive Services
   1.   ssh login to machine of desired architecture and
        operating system
   2.   interactive access to small clusters for serial and
        parallel jobs (or fast turnaround on local batch system)

As with any distributed system, there are many
  ways to construct a meta-facility or grid:
     CORBA (distributed object system)
     DCOM (Windows only)
     Custom protocols over TCP/IP or UDP/IP
     Grid Middleware
        Globus (from ANL)
        Legion (UVA)

     Web Services
                       . . . or some combination of the above
           What are Web Services?

Web Services are functions or methods that can
 be accessed across the web.

Think of this as a “better” RPC (remote procedure call)
  system. Why better?
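
To make the RPC analogy concrete, here is a minimal sketch of a method call wrapped in a SOAP envelope and unpacked again on the receiving side. The envelope namespace is the standard SOAP 1.1 one; the service namespace `urn:example:datagrid`, the method name `list`, and the `path` parameter are assumptions invented for this example.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:datagrid"                      # hypothetical service namespace


def build_request(method, **params):
    """Wrap a method call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Server side: recover the method name and parameters."""
    envelope = ET.fromstring(xml_text)
    call = envelope.find(f"{{{SOAP_ENV}}}Body")[0]
    method = call.tag.split("}", 1)[1]
    return method, {child.tag: child.text for child in call}


request = build_request("list", path="/halld/raw")
method, params = parse_request(request)
```

Because the call is plain text XML, either side can be written in any language with an XML parser.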

            Why Web Services ?
Use of industry standards
Support for many languages
   Compiled and scripted
Self describing protocols
   easier management of versioning, evolution
Support for authentication
Strong Industry Support:
   Microsoft’s .NET initiative
   SUN’s ONE (Open Net Environment)
   IBM contributions to Apache / SOAP
                A three tier web services architecture

[Figure: three tiers joined by authenticated connections]
  Clients:        Web Browser, Application
  Web services:   XML to HTML servlet, Web Services on Web Servers,
                  Grid Service
  Local Backend:  (batch, file, etc.) Storage system, Batch system,
                  Grid resources, e.g. Condor

               Web Services Details:
                         Data Grid
Replica Catalog & Data Grid Node
   List – contents of a directory
   Navigate – to another directory, or follow a soft link
   Mkdir – make a new directory
   Link – make a new link
   Delete – a logical file, directory, or link
   Properties – set / retrieve properties of a file, directory, or
      link (including protection / access control)
Replica Catalog specific
   Create – a new logical file
   Add/Remove/Select/Delete replica – manipulate references to
     where file is stored
Data Grid Node specific
   Allocate – space for an incoming file
   Copy – a file to/from unmanaged space, or another grid node
   Locate – get reference to physical file for transfer or local
     access
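
The shared operations above behave much like a small file-system API. The sketch below is a toy in-memory version, assuming a flat dictionary keyed by absolute logical path; it is not the actual service interface, and the method names and example paths are hypothetical.

```python
class LogicalNamespace:
    """Toy in-memory logical file tree keyed by absolute path."""

    def __init__(self):
        self.entries = {"/": {"type": "dir", "props": {}}}

    def _require_parent(self, path):
        parent = path.rstrip("/").rsplit("/", 1)[0] or "/"
        if self.entries.get(parent, {}).get("type") != "dir":
            raise FileNotFoundError(parent)

    def mkdir(self, path):
        self._require_parent(path)
        self.entries[path] = {"type": "dir", "props": {}}

    def create(self, path):
        """Replica Catalog specific: register a new logical file."""
        self._require_parent(path)
        self.entries[path] = {"type": "file", "props": {}, "replicas": set()}

    def link(self, path, target):
        self._require_parent(path)
        self.entries[path] = {"type": "link", "props": {}, "target": target}

    def navigate(self, path):
        """Resolve a path, following soft links."""
        seen = set()
        while self.entries[path]["type"] == "link":
            if path in seen:
                raise RuntimeError("link loop")
            seen.add(path)
            path = self.entries[path]["target"]
        return path

    def list_dir(self, path):
        prefix = self.navigate(path).rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.entries
                      if p != "/" and p.startswith(prefix)
                      and "/" not in p[len(prefix):])

    def delete(self, path):
        if self.entries[path]["type"] == "dir" and self.list_dir(path):
            raise OSError("directory not empty")
        del self.entries[path]

    def add_replica(self, path, node):
        self.entries[path]["replicas"].add(node)


# Hypothetical usage
ns = LogicalNamespace()
ns.mkdir("/halld")
ns.create("/halld/run001.dat")
ns.add_replica("/halld/run001.dat", "jlab")
ns.link("/current", "/halld")
```

Deleting a logical file here only removes the catalog entry; removing the physical copies would be the job of the Data Grid Nodes.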
                 Web Services Details:
                        Batch System
User Job Operations
          Resource requirements (CPU, memory, disk, net, …)
          Dependencies on other jobs / events
          Executables, libraries, etc., input files, output files
  Suspend / Resume
  List – by queue, owner, site, …
  View allocation, usage
Operator Operations
  On systems, queues, jobs
  On quota / allocation system

      Technology Choice: XML + …
   Self describing data (contains meta data)
       Facilitates heterogeneous systems
   Robust against evolution
    (no fragile versioning that distributed object systems encounter)
       New server generates additional tags which are ignored by old client
       New client detects absence of new tags & knows it is talking to an old
       server (and/or supplies defaults)
   Capable of defining all key concepts and operations for both
    client-server and client-portal communications
   XML – eXtensible Markup Language
   SOAP – Simple Object Access Protocol (~modern rpc system)
   WSDL – Web Services Description Language (~idl)
   UDDI – Universal Description, Discovery and Integration
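
The "robust against evolution" point can be shown with two versions of a reply. In this sketch the tag names (`name`, `size`, `checksum`, `pinned`) and the default values are assumptions invented for the example, not the project's actual XML vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical replies from an old and a new server
OLD_SERVER_REPLY = "<file><name>run001.dat</name><size>1024</size></file>"
NEW_SERVER_REPLY = ("<file><name>run001.dat</name><size>1024</size>"
                    "<checksum>abc123</checksum><pinned>true</pinned></file>")


def old_client_parse(xml_text):
    """An old client reads only the tags it knows; extra tags are ignored."""
    root = ET.fromstring(xml_text)
    return {"name": root.findtext("name"), "size": int(root.findtext("size"))}


def new_client_parse(xml_text):
    """A new client supplies defaults when the newer tags are absent."""
    root = ET.fromstring(xml_text)
    info = old_client_parse(xml_text)
    info["checksum"] = root.findtext("checksum")  # None when talking to an old server
    info["pinned"] = root.findtext("pinned", default="false") == "true"
    info["server_is_old"] = root.find("checksum") is None
    return info
```

Neither client needs a version number in the protocol: the presence or absence of tags carries that information.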
  Technology Choices: Java Servlets
Java Advantages
  1.   Rapid code development
  2.   No memory leaks
  3.   Easy to use interface to SQL databases, XML libraries
  4.   Rich library of low level components (containers, etc.)
Web + Servlet Advantages
  1.   Java (see above)
  2.   Scalability (see e-commerce)
  3.   Modular web services
       –   One servlet can invoke another, e.g. to translate XML to HTML
Minor Web Inconvenience
  1.   Asynchronous notification of clients of web services

     PPDG Collaboration: JLAB/SRB
Web services as a general portal to a variety of back end storage
systems (JLAB, SRB, …)
    And other services – batch
Project should define the abstractions at the web services level;
define all metadata for interacting with a storage system
    Define XML to describe digital objects and collections/directories (ALL)
         Metadata to describe logical namespace of the grid (SRB, JLAB, GridFTP)
         Standard structure for organizing as XML
    Define (WSDL ?) operations of browse, query, manage (ALL)
         Listing files available through interface,
         Caching, replication, pinning, staging, resource allocation, etc
Back-end implementations
    JASMine (JLAB)
    SRB (SDSC)
    (SRM, Globus)
Implement demonstration web services client (JLAB)
    Web services clients should be able to interact with any of these
                              JLAB mss - JASMine

Tape storage system
• 12000 slot STK silos
• 8 Redwood, 10 9940, 10 9840 drives
• 7 Data movers ~ 300 GB buffer each
• Software – JASMine

JASMine managed mass storage sub-systems
• 2 TB Farm cache
• 15 TB Experiment cache pools
• 0.5 TB LQCD cache pool

JASMine features
• Stand alone cache
     Pluggable policies
• Implemented in Java
• Distributed, scalable
• Pluggable security
     Authentication &
     To be integrated with GSI
• Scheduling of drives
• Can manage tape, tape and disk, or disk alone
Example – demo client

           Similar to a graphical ftp client, but:
                  Each half can attach to a grid
                       Cache – managed filesystem
                       User’s home directory
                       Other file systems at web server
                       Replica catalog
                       Local mss if it is separate from
                       replica system
                  Can move files in/out of managed storage
                  Negotiates compatible protocols
                   between grid nodes
                       E.g., http, SRB, gridFTP, ftp,
                       bbftp, JASMine, etc
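
Protocol negotiation between two grid nodes can be as simple as picking the first protocol both sides support. This sketch assumes each node advertises an ordered preference list and that the source's order wins; the actual negotiation rule is not described in the slides.

```python
def negotiate(source_protocols, destination_protocols):
    """Return the first protocol in the source's preference order that the
    destination also supports, or None when the nodes share no protocol."""
    supported = set(destination_protocols)
    for protocol in source_protocols:
        if protocol in supported:
            return protocol
    return None


# Hypothetical preference lists
agreed = negotiate(["gridftp", "bbftp", "http"], ["ftp", "http", "bbftp"])
```

When no common protocol exists, the client could fall back to relaying the file through the web server.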

          Technologies Employed
Apache web server
Tomcat servlet engine, SOAP libraries
Java Server Pages (JSP)
XML data format
XSL style sheets for presentation
X.509 certificate authentication
   Web interface to a simple certificate authority to issue
    certificates valid within the meta-facility (signed by
    Jefferson Lab)

                            Data Grid
Capabilities planned:
   Replicated data (multi-site), global tree structured name space
    (like Unix file system)
   Replica catalog, replicated to multi-site
     using MySQL as back end, probably using MySQL’s ability to replicate
       the catalog (fault tolerance)
   Browse by attributes as well as by name
   Parallel file transfers
        bbftp, gridftp, …
        Jpars – 100% java parallel file transfers (w/ 3rd party, authen.)
   Drag-n-drop between sites
   Policy based replication (auto migrate between sites)
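
The idea behind parallel file transfers is to split a file into byte ranges and move the ranges concurrently. The sketch below does this with threads on local files as a stand-in for network streams; it is not bbftp, gridftp, or Jpars, and the function names and the default of four streams are assumptions.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(size, streams):
    """Split [0, size) into at most `streams` contiguous (offset, length) ranges."""
    chunk = max(1, -(-size // streams))  # ceiling division
    return [(start, min(chunk, size - start)) for start in range(0, size, chunk)]


def copy_range(src, dst, offset, length):
    """Copy one byte range; each stream uses its own file handles."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        fout.write(fin.read(length))


def parallel_copy(src, dst, streams=4):
    size = os.path.getsize(src)
    with open(dst, "wb") as fout:
        fout.truncate(size)  # pre-allocate so ranges can be written in any order
    with ThreadPoolExecutor(max_workers=streams) as pool:
        futures = [pool.submit(copy_range, src, dst, off, length)
                   for off, length in byte_ranges(size, streams)]
        for future in futures:
            future.result()  # re-raise any error from a stream


# Small self-check on a temporary file
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "src.bin")
    target = os.path.join(workdir, "dst.bin")
    payload = bytes(range(256)) * 100 + b"x"
    with open(source, "wb") as f:
        f.write(payload)
    parallel_copy(source, target, streams=4)
    with open(target, "rb") as f:
        copied_ok = f.read() == payload
```

Over a WAN, multiple streams help fill a high-latency link that a single TCP connection would leave underused.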

   Browse contents of a prototype disk cache / tape
    storage file system
   Move files between managed and unmanaged
    storage on data node
   Move files (including entire directories) between
    desktop and data node
   Displays if file is currently in disk cache
   Can request move from tape to disk (not released)
   3rd party file transfers (between 2 servers)

                Near Term
Convert from raw XML to SOAP (this month)
Deploy disk cache manager to FSU & MIT
Abstract disk-to-tape migration of current system
to use WAN site-to-site migration of files;
wrapping, e.g. gridftp or other parallel transfer


Grid Capabilities are starting to emerge
Jefferson Lab will have a functioning data
grid in FY02
Jefferson Lab will have a functioning
meta-facility in FY03

