Hall D The Grid _ Web Services


Grid Computing
Chip Watson
Jefferson Lab
Hall B Collaboration Meeting
1-Nov-2001
What is the Grid?
Some wax philosophical and describe it as unlimited
computing capacity: like the power grid, you
just plug in and use it without caring who provides it.
Difficulty: “metering” your use of resources and
charging for it. We aren’t there yet.
Simpler view: it is a large computer center, with a
geographically distributed file system and batch
system.
This view assumes you have a right to use each
piece of the distributed system, subject perhaps to
local accounting constraints.
Key Aspects of the Grid
Data Grid: Location-independent file system.
If you know the “logical name” of a data set, you can
find it. (Normal access controls apply).
Files can migrate around the grid to optimize usage,
and may exist in multiple locations.
Computational Grid: Submit a job to “the grid”.
You describe the requirements of your job, and grid
middleware finds an appropriate place to run it.
Jobs can be batch, or even interactive.
Other Important Aspects
Single Sign-On
You “log on” to the grid once, and you can use the
distributed resources for a certain period of time
(sort of like the AFS file system)
Analog: all day metro ticket
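The single sign-on idea can be illustrated with a credential that carries its own expiry time, so each resource can check it locally instead of asking for a password again. This is a minimal sketch under assumed names (`GridCredential`, `isValidAt`); it models only the time window, not the X.509 certificates and signatures a real grid credential would rely on.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical single sign-on credential: valid for a fixed period after
// logon, like an all-day metro ticket. Models only the time window.
final class GridCredential {
    private final String subject;    // who signed on (assumed free-form name)
    private final Instant issuedAt;  // when the user logged on to the grid
    private final Duration lifetime; // how long the sign-on stays usable

    GridCredential(String subject, Instant issuedAt, Duration lifetime) {
        this.subject = subject;
        this.issuedAt = issuedAt;
        this.lifetime = lifetime;
    }

    String subject() { return subject; }

    Instant expiresAt() { return issuedAt.plus(lifetime); }

    // Each resource checks the window locally; no second login is needed
    // until the credential expires.
    boolean isValidAt(Instant now) {
        return !now.isBefore(issuedAt) && now.isBefore(expiresAt());
    }
}
```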
Distributed Computing Model
In the “old” model, a lab has a large computer center,
provisioned for all demanding data storage, analysis and
simulation requirements.
In the “current” model, only a fraction of the computing resides at the lab.
already widely used in HEP experiments
a large experiment may enlist a major computing partner site, e.g.
IN2P3 for BaBar
In the “new” model, many sites large and small participate.
Some sites may be special based upon capacity or specialized
capabilities (such as robotic storage).
LHC will use a three-tier model, with a large central facility (tier 0),
distributing data to moderately large national centers (tier 1), which in
turn serve smaller nodes (tier 2).
What is a reasonable distribution for Hall D?
Why desert a working model?
1. Easier to get additional funds
State matching funds
Also: NSF or other funding agencies
2. Easier to involve students
A room full of computers is more attractive than an account
on a machine 1000 km away
3. Opportunity for innovation
Easier to experiment on a local machine than to get root
access on a machine 1000 km away
Case Study:
The Lattice Portal
A prototype virtual
computer center for
Jefferson Lab
(under development)
Contents
1. Components of the virtual computer center
• Data management
• Batch system
• Interactive system
2. Architectural components
• Information Services using XML (Java servlets)
Replica Catalog
Data Grid Server (file cache & transfer agent)
Batch Server
• Authentication using X.509, SSL
• Java client packages
A Virtual Computer Center:
Data Management
Global Logical File System (possibly constrained to a project)
1. Logical names for files (location independent)
2. Active files possibly cached at multiple sites
3. Inactive files in off-line storage (tape silo, multi-site)
Data Grid Node
1. Manages a cache of logical files, perhaps using multiple file
servers; exports files locally via NFS
2. Maps logical name to local (physical) file name
3. Supports file transfers between managed and unmanaged
storage, and between grid nodes (queuing of transfer
requests)
Replica Catalog
1. Tracks which logical files exist at which data grid nodes
2. Contains some file meta-data to allow file selection by
attributes as well as by name
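A minimal in-memory sketch of the replica catalog's two jobs, tracking which data grid nodes hold each logical file and selecting files by attribute rather than by name, is shown below. The class, method names and attribute keys are hypothetical; a production catalog would sit on a database rather than in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory replica catalog: logical file -> data grid nodes
// holding a copy, plus a few metadata attributes per logical file.
final class ReplicaCatalog {
    private final Map<String, Set<String>> replicas = new TreeMap<>();           // sorted by logical name
    private final Map<String, Map<String, String>> attributes = new HashMap<>();

    // Create a new logical file with its metadata; it has no replicas yet.
    void create(String logicalName, Map<String, String> attrs) {
        if (replicas.containsKey(logicalName)) {
            throw new IllegalArgumentException("already exists: " + logicalName);
        }
        replicas.put(logicalName, new LinkedHashSet<>());
        attributes.put(logicalName, new HashMap<>(attrs));
    }

    // Record that a copy of the logical file now exists at a data grid node.
    void addReplica(String logicalName, String node) {
        Set<String> nodes = replicas.get(logicalName);
        if (nodes == null) throw new IllegalArgumentException("unknown file: " + logicalName);
        nodes.add(node);
    }

    void removeReplica(String logicalName, String node) {
        Set<String> nodes = replicas.get(logicalName);
        if (nodes != null) nodes.remove(node);
    }

    // Which data grid nodes can serve this logical file? Empty if none or unknown.
    Set<String> locate(String logicalName) {
        return Collections.unmodifiableSet(
                replicas.getOrDefault(logicalName, Collections.emptySet()));
    }

    // Select logical files whose attributes include every key/value in the query,
    // e.g. all files taken with a given target.
    List<String> select(Map<String, String> query) {
        List<String> result = new ArrayList<>();
        for (String name : replicas.keySet()) {
            if (attributes.get(name).entrySet().containsAll(query.entrySet())) {
                result.add(name);
            }
        }
        return result;
    }
}
```

For example, after `create("/expt/run1000/raw.evt", ...)` with a `target` attribute, a client could call `select` on that attribute and then `locate` on each result to find a data grid node to contact.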
In picture form…
[Figure: a client program (library) interacting with the MetaData Catalog (MetaData Catalog Host), the Replica Catalog (Replica Catalog Host), and the Data Grid Server and File Server (File Host).]
1. Get file names from the metadata (energy, target, magnet settings).
2. Contact the replica catalog to locate the desired file; get a referral to a Data Grid Server.
3. Get the file state (on disk), additional info, and a referral to the transfer agent.
4. Get the file (parallel streams).
A Virtual Computer Center:
Batch System
Global Queue(s)
A user submits a batch job, perhaps using a web
interface, to the virtual computer center (a.k.a. meta-
facility). Based upon the locations of the executable,
the input data files, and the necessary compute
resources, the job is assigned to a particular compute
grid node (cluster of machines).
Compute Grid Node
Set of co-located compute resources managed by a
batch system. Typically co-located with a data grid
node. E.g. Jefferson Lab’s Computer Center.
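The placement decision described above can be sketched as a filter-then-rank step: discard compute grid nodes that cannot meet the job's resource requirements, then prefer the node whose co-located data grid node already holds the most input files. The class and field names, and this particular ranking rule, are assumptions for illustration; the slides do not specify a scheduling policy.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical description of a compute grid node and what its
// co-located data grid node currently holds.
final class ComputeNode {
    final String name;
    final int freeCpus;
    final int memoryMb;
    final Set<String> localFiles; // logical file names cached at this site

    ComputeNode(String name, int freeCpus, int memoryMb, Set<String> localFiles) {
        this.name = name;
        this.freeCpus = freeCpus;
        this.memoryMb = memoryMb;
        this.localFiles = localFiles;
    }
}

// Hypothetical job request: resource requirements plus input files.
final class JobRequest {
    final int cpus;
    final int memoryMb;
    final List<String> inputFiles;

    JobRequest(int cpus, int memoryMb, List<String> inputFiles) {
        this.cpus = cpus;
        this.memoryMb = memoryMb;
        this.inputFiles = inputFiles;
    }
}

final class GlobalQueue {
    // Filter out nodes that cannot meet the requirements, then pick the node
    // that already holds the most input files (least data to move).
    // Returns empty if no node qualifies.
    static Optional<ComputeNode> assign(JobRequest job, List<ComputeNode> nodes) {
        return nodes.stream()
                .filter(n -> n.freeCpus >= job.cpus && n.memoryMb >= job.memoryMb)
                .max(Comparator.comparingLong(n -> localInputs(job, n)));
    }

    private static long localInputs(JobRequest job, ComputeNode node) {
        return job.inputFiles.stream().filter(node.localFiles::contains).count();
    }
}
```

Ranking by data locality is one reasonable choice when input files are large; a real meta-facility would likely also weigh queue length and allocation.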
Virtual Computer Center:
Interactive
Conventional remote login is expected to be less
common, as all capabilities are remotely accessible.
Nevertheless…
Interactive Services
1. ssh login to machine of desired architecture and
operating system
2. interactive access to small clusters for serial and
parallel jobs (or fast turnaround on local batch
system)
Implementation?
As with any distributed system, there are many
ways to construct a meta-facility or grid:
CORBA (distributed object system)
DCOM (Windows only)
Custom protocols over TCP/IP or UDP/IP
Grid Middleware
Globus (from ANL)
Legion (UVA)
Web Services
. . . or some combination of the above
What are Web Services?
Web Services are functions or methods that can
be accessed across the web.
Think of this as a “better” RPC (remote procedure call)
system. Why better?
Why Web Services ?
Use of industry standards
HTTP, HTTPS, XML, SOAP, WSDL, UDDI, …
Support for many languages
Compiled and scripted
Self-describing protocols
easier management of versioning, evolution
Support for authentication
Strong Industry Support:
Microsoft’s .NET initiative
Sun’s ONE (Open Net Environment)
IBM contributions to Apache / SOAP
A three-tier web services architecture
[Figure: web browsers and applications connect over authenticated connections to a web server (portal) that hosts an XML-to-HTML servlet and web services; these call grid services, web services on remote web servers, and local backend services (batch, file, etc.) in front of grid resources (e.g. Condor), the storage system, and the batch system.]
Web Services Details:
Data Grid
Replica Catalog & Data Grid Node
List – contents of a directory
Navigate – to another directory, or follow a soft link
Mkdir – make a new directory
Link – make a new link
Delete – a logical file, directory, or link
Properties – set / retrieve properties of a file, directory, or
link (including protection / access control)
Replica Catalog specific
Create – a new logical file
Add/Remove/Select/Delete replica – manipulate references to
where file is stored
Data Grid Node specific
Allocate – space for an incoming file
Copy – a file to/from unmanaged space, or another grid node
Locate – get a reference to the physical file for transfer or local
access
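The namespace operations above (Mkdir, Link, List, Navigate, Delete) can be sketched against an in-memory tree keyed by logical path. This is an illustration only: the class and method names are assumptions, Properties and replica handling are left out, and a real service would expose these as web service calls rather than local methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory logical namespace covering Mkdir, Link, List,
// Navigate and Delete. Paths are absolute, e.g. "/expt/run1000".
final class LogicalNamespace {
    private enum Kind { DIR, FILE, LINK }

    private final Map<String, Kind> kinds = new TreeMap<>();         // sorted by path
    private final Map<String, String> linkTargets = new TreeMap<>(); // link path -> target path

    LogicalNamespace() { kinds.put("/", Kind.DIR); }

    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    // New entries need an existing parent directory and a free name.
    private void checkNew(String path) {
        if (kinds.get(parentOf(path)) != Kind.DIR) {
            throw new IllegalArgumentException("no such directory: " + parentOf(path));
        }
        if (kinds.containsKey(path)) {
            throw new IllegalArgumentException("already exists: " + path);
        }
    }

    void mkdir(String path) { checkNew(path); kinds.put(path, Kind.DIR); }

    void createFile(String path) { checkNew(path); kinds.put(path, Kind.FILE); }

    void link(String path, String target) {
        checkNew(path);
        kinds.put(path, Kind.LINK);
        linkTargets.put(path, target);
    }

    // List: immediate children of a directory, in sorted order.
    List<String> list(String dir) {
        List<String> children = new ArrayList<>();
        for (String p : kinds.keySet()) {
            if (!p.equals("/") && parentOf(p).equals(dir)) children.add(p);
        }
        return children;
    }

    // Navigate: resolve a path, following soft links, with a hop limit to break cycles.
    String navigate(String path) {
        String current = path;
        int hops = 0;
        while (kinds.get(current) == Kind.LINK) {
            if (++hops > 8) throw new IllegalStateException("too many links: " + path);
            current = linkTargets.get(current);
        }
        if (!kinds.containsKey(current)) throw new IllegalArgumentException("not found: " + current);
        return current;
    }

    // Delete: a file, a link, or an empty directory (never the root).
    void delete(String path) {
        if (path.equals("/")) throw new IllegalArgumentException("cannot delete root");
        if (kinds.get(path) == Kind.DIR && !list(path).isEmpty()) {
            throw new IllegalStateException("directory not empty: " + path);
        }
        kinds.remove(path);
        linkTargets.remove(path);
    }
}
```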
Web Services Details:
Batch System
User Job Operations
Submit
Resource requirements (CPU, memory, disk, net, …)
Dependencies on other jobs / events
Executables, libraries, etc., input files, output files
…
Cancel
Suspend / Resume
List – by queue, owner, site, …
View allocation, usage
Operator Operations
On systems, queues, jobs
On quota / allocation system
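The user job operations imply a small lifecycle in which Submit puts a job in a queue and Cancel, Suspend and Resume move it between states. The states and allowed transitions below are an assumed model for illustration, not one defined in the slides.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical job states for the batch web service.
enum JobState { QUEUED, RUNNING, SUSPENDED, DONE, CANCELLED }

final class BatchJob {
    // Allowed transitions; DONE and CANCELLED are terminal.
    private static final Map<JobState, Set<JobState>> ALLOWED = new EnumMap<>(JobState.class);
    static {
        ALLOWED.put(JobState.QUEUED, EnumSet.of(JobState.RUNNING, JobState.SUSPENDED, JobState.CANCELLED));
        ALLOWED.put(JobState.RUNNING, EnumSet.of(JobState.SUSPENDED, JobState.DONE, JobState.CANCELLED));
        ALLOWED.put(JobState.SUSPENDED, EnumSet.of(JobState.QUEUED, JobState.RUNNING, JobState.CANCELLED));
        ALLOWED.put(JobState.DONE, EnumSet.noneOf(JobState.class));
        ALLOWED.put(JobState.CANCELLED, EnumSet.noneOf(JobState.class));
    }

    private final String owner;                // used by List-by-owner queries
    private JobState state = JobState.QUEUED;  // Submit places the job in a queue
    private JobState beforeSuspend;            // where Resume returns to

    BatchJob(String owner) { this.owner = owner; }

    String owner() { return owner; }
    JobState state() { return state; }

    private void moveTo(JobState next) {
        if (!ALLOWED.get(state).contains(next)) {
            throw new IllegalStateException(state + " -> " + next + " not allowed");
        }
        state = next;
    }

    void start()  { moveTo(JobState.RUNNING); }  // triggered by the batch system
    void finish() { moveTo(JobState.DONE); }     // triggered by the batch system
    void cancel() { moveTo(JobState.CANCELLED); }

    void suspend() {
        JobState previous = state;
        moveTo(JobState.SUSPENDED);
        beforeSuspend = previous;
    }

    void resume() {
        if (state != JobState.SUSPENDED) throw new IllegalStateException("not suspended");
        moveTo(beforeSuspend);
    }
}
```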
Technology Choice: XML + …
Advantages
Self describing data (contains meta data)
Facilitates heterogeneous systems
Robust against evolution
(no fragile versioning that distributed object systems encounter)
New server generates additional tags which are ignored by old client
New client detects absence of new tags & knows it is talking to an old
server (and/or supplies defaults)
Capable of defining all key concepts and operations for both
client-server and client-portal communications
Technologies
XML – eXtensible Markup Language
SOAP – Simple Object Access Protocol (~modern RPC system)
WSDL – Web Services Description Language (~IDL)
UDDI – Universal Description, Discovery and Integration
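The evolution argument can be made concrete: a new client reads a tag that an old server may not send and falls back to a default when it is absent, while unknown tags from a newer server are simply never read. This sketch uses the JDK's DOM parser; the `<file>`, `<name>`, `<size>` and `<onDisk>` element names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical client-side view of one file entry returned by a server.
final class FileEntry {
    final String name;
    final long size;
    final boolean onDisk; // assumed newer field; old servers do not send it

    FileEntry(String name, long size, boolean onDisk) {
        this.name = name;
        this.size = size;
        this.onDisk = onDisk;
    }

    static FileEntry parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        Element root = doc.getDocumentElement();
        String name = text(root, "name", null);
        long size = Long.parseLong(text(root, "size", "0"));
        // Old server: <onDisk> is absent, so detect that and supply a default.
        boolean onDisk = Boolean.parseBoolean(text(root, "onDisk", "false"));
        // Tags this client does not know (from a newer server) are never looked up.
        return new FileEntry(name, size, onDisk);
    }

    // Text of the first element with this tag, or the default if there is none.
    private static String text(Element parent, String tag, String dflt) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? dflt : nodes.item(0).getTextContent().trim();
    }
}
```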
Technology Choices: Java Servlets
Java Advantages
1. Rapid code development
2. Garbage collection (no manual memory management, so far fewer memory leaks)
3. Easy to use interface to SQL databases, XML libraries
4. Rich library of low level components (containers, etc.)
Web + Servlet Advantages
1. Java (see above)
2. Scalability (see e-commerce)
3. Modular web services
– One servlet can invoke another, e.g. to translate XML to HTML
Minor Web Inconvenience
1. No built-in asynchronous notification of web service clients (HTTP is request/response)
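The XML-to-HTML translation step mentioned above can be sketched with the JDK's built-in XSLT support (`javax.xml.transform`). The servlet plumbing is omitted since it needs a servlet container such as Tomcat, and the `<listing>`/`<file name=...>` input format and the class name are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical XML-to-HTML step: in a servlet this would run on the XML
// returned by another web service before replying to a browser.
final class XmlToHtml {
    // Minimal stylesheet: render each <file> in a <listing> as an HTML list item.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/listing'>"
        + "<ul><xsl:for-each select='file'><li><xsl:value-of select='@name'/></li></xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String render(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```

Keeping the presentation in a stylesheet means the same XML service can feed both browsers (through this step) and programmatic clients (directly).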
PPDG Collaboration: JLAB/SRB
Web services as a general portal to a variety of back end storage
systems (JLAB, SRB, …)
And other services – batch
Project should define the abstractions at the web services level;
define all metadata for interacting with a storage system
Define XML to describe digital objects and collections/directories (ALL)
Metadata to describe logical namespace of the grid (SRB, JLAB, GridFTP
attributes…)
Standard structure for organizing as XML
Define (WSDL?) operations of browse, query, manage (ALL)
Listing files available through interface,
Caching, replication, pinning, staging, resource allocation, etc.
Back-end implementations
JASMine (JLAB)
SRB (SDSC)
(SRM, Globus)
Implement demonstration web services client (JLAB)
Web services clients should be able to interact with any of these
JLAB mss - JASMine
Features
• Stand-alone cache manager
• Pluggable policies
• Implemented in Java
• Distributed, scalable
• Pluggable security
  Authentication & authorization
  To be integrated with GSI
• Scheduling of drives
• Can manage tape, tape and disk, or disk alone
JASMine managed mass storage sub-systems
• Tape storage system
  • 12000 slot STK silos
  • 8 Redwood, 10 9940, 10 9840 drives
  • 7 data movers, ~300 GB buffer each
  • Software – JASMine
• 15 TB experiment cache pools
• 2 TB farm cache
• 0.5 TB LQCD cache pool
Example – demo client
Similar to a graphical ftp client, but:
Each half can attach to a grid
node:
Cache – managed filesystem
User’s home directory
Other file systems at web server
Replica catalog
Local mss if it is separate from
replica system
Can move files in/out of managed
store
Negotiates compatible protocols
between grid nodes
E.g., http, SRB, gridFTP, ftp,
bbftp, JASMine, etc
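Protocol negotiation between two grid nodes can be as simple as walking one side's preference list and taking the first protocol the other side also supports. The sketch below assumes that rule and uses plain protocol-name strings; the slides do not say how the demo client actually ranks protocols.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical negotiation: walk the source node's protocols in its order of
// preference and pick the first one the destination node also supports.
final class ProtocolNegotiator {
    static Optional<String> negotiate(List<String> sourcePreferred, Set<String> destinationSupported) {
        for (String protocol : sourcePreferred) {
            if (destinationSupported.contains(protocol)) {
                return Optional.of(protocol);
            }
        }
        return Optional.empty(); // no common protocol: transfer not possible directly
    }
}
```

For instance, a source preferring gridftp, then bbftp, then http, talking to a destination that offers only http and bbftp, would settle on bbftp.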
Technologies Employed
Apache web server
Tomcat servlet engine, SOAP libraries
Java Server Pages (JSP)
XML data format
XSL style sheets for presentation
X.509 certificate authentication
Web interface to a simple certificate authority to issue
certificates valid within the meta-facility (signed by
Jefferson Lab)
Data Grid
Capabilities planned:
Replicated data (multi-site), global tree structured name space
(like Unix file system)
Replica catalog, replicated to multiple sites
using MySQL as the back end, probably using MySQL’s ability to replicate
the catalog (fault tolerance)
Browse by attributes as well as by name
Parallel file transfers (bbftp, gridftp, …)
Jpars – 100% Java parallel file transfers (with third-party transfers and authentication)
Drag-n-drop between sites
Policy based replication (auto migrate between sites)
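Parallel file transfers typically split a file into byte ranges and move each range over its own stream. The slides do not describe how bbftp, gridftp or Jpars partition a file, so the even split below (and the `ByteRange`/`ParallelStreams` names) are assumptions; each stream would then seek to its `offset` and copy `length` bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical contiguous byte range assigned to one transfer stream.
final class ByteRange {
    final long offset;
    final long length;

    ByteRange(long offset, long length) {
        this.offset = offset;
        this.length = length;
    }

    @Override public String toString() { return offset + "+" + length; }
}

final class ParallelStreams {
    // Split fileSize bytes into at most 'streams' contiguous ranges whose
    // lengths differ by at most one byte; empty ranges are not produced.
    static List<ByteRange> split(long fileSize, int streams) {
        if (streams < 1) throw new IllegalArgumentException("need at least one stream");
        List<ByteRange> ranges = new ArrayList<>();
        long base = fileSize / streams;  // every stream gets at least this many bytes
        long extra = fileSize % streams; // the first 'extra' streams get one more byte
        long offset = 0;
        for (int i = 0; i < streams; i++) {
            long length = base + (i < extra ? 1 : 0);
            if (length == 0) break;      // more streams than bytes
            ranges.add(new ByteRange(offset, length));
            offset += length;
        }
        return ranges;
    }
}
```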
Status
Prototype
Browse contents of a prototype disk cache / tape
storage file system
Move files between managed and unmanaged
storage on data node
Move files (including entire directories) between
desktop and data node
Displays whether a file is currently in the disk cache
Can request move from tape to disk (not released)
Soon
3rd party file transfers (between 2 servers)
Near Term
Convert from raw XML to SOAP (this month)
Deploy disk cache manager to FSU & MIT
(4Q01)
Generalize the current system’s disk-to-tape migration
to WAN site-to-site migration of files,
wrapping e.g. gridftp or another parallel transfer tool
(1Q02)
Conclusions
Grid Capabilities are starting to emerge
Jefferson Lab will have a functioning data
grid in FY02
Jefferson Lab will have a functioning
meta-facility in FY03