CHEP 2003 Summary
Grid Architecture, Infrastructure, & Middleware
Monitoring & Security
Andrew Hanushevsky
Stanford Linear Accelerator Center
Legal Disclaimer
This summary is from one perspective
It is not representative of any particular view
Other than the presenter’s
This summary is not warranted for any purpose whatsoever
Participants assume all direct and indirect (consequential or inconsequential) damages
Do you want to stay?
March 25, 2003 2: CHEP 2003
Grid Deployment
Track I talks referenced grid “deployment”
Deployment has many meanings
Minimally, if you have it working, it had better be usable
Is it production ready?
Production Grids
LCG Experience Suggests It Is Difficult
Packaging, Installation, Configuration, & Validation Issues
“These issues (and more) make the difference between the research project ending with a demo and the product to be used for production.”
-- Zdenek Sekera
Assume LCG (T#184) interpretation of production
Harsh, but we need a benchmark
What is “production quality”?
It is all of the following in no particular order:
availability 24 x 7
performance
stability, robustness
user friendliness
maintainability
user support
From LCG T#184
So Where Are We?
Let’s take a look at the presented “grid” projects in alphabetical order
From Grid to Grid-Like
Disclaimer!
This is not representative of all such projects
AliEn (M#253)
Distributed environment with Grid interface
SASL (includes GSI)
EDG compatible authentication
Distributed RDBMS-based file catalog
Condor-like job scheduling
Attempts to unify grid infrastructures
Adopted by MammoGrid (M#66)
Amanda (M#110)
Ostensibly production ready
Condor + Bypasses + Local Tools (Grid Navigator)
Uses central s/w and data repositories
Runs a specific application software suite
Plan to integrate Globus middleware as it matures
DIRAC (M#253)
Distributed environment
Essentially a roll-your-own grid-like solution
Interface to EDG now in test
EDG stability considered problematic
Successfully deployed on 17 sites
EDG
Workload Management WP1 (M#132 & 137)
Deployed for 18 months
Still pre-production stage
Various problems in reliability & scalability
Numerous improvements planned
DAGMan integration
Grid Accounting
Resource reservation & co-allocation
• Globus GARA Approach
EDG (continued)
Data Management WP2 (T#249 & 490)
Basic use cases satisfied
Not proven in a “real user environment”
Pre-production
Numerous additions planned
Logical collection
Enhanced security
Authorization and delegation
OGSA direction with future compliance
NorduGrid (M#109)
Modified/Extended Globus + EDG RLS
Pre-production stage
Additional EDG integration as stability improves
Web Services (OGSA) plans
SAM (T#335)
Successful for D0 and CDF
Work under way to integrate with grid middleware
Production D0 release of SAMGrid (JIM+Condor-G) scheduled for April
One of the arguably successful grid-like projects
Largely dealing with data management issues
STAR (T#442)
Distributed environment
Essentially a roll-your-own grid-like solution
Interface to Condor-G
Uses LBL HRM/DRM
Successful (but limited) deployment
NERSC & BNL
Storage Resource Broker (T#211)
Successful deployment across multiple fields
Work under way to integrate with Globus data management
One of the arguably successful grid-like projects
Limited to data management
The Successes
Few projects have achieved “production” status
Those which have are focused and grid-like
SAM, SRB
Soon to follow: AliEn, DIRAC, & STAR
It is not clear why this is so
Historical timeline?
Immediate need for results?
Funding model?
Grid protocols in flux (e.g., Globus 2 vs Globus 3)?
Open software/collaboration issues?
Sociological phenomena?
Time will tell….
Fortunately many plan to integrate with the “standard” grid
The Fast Trackers
These projects have only incorporated some grid middleware
Amanda & NorduGrid
Are we entering the OSI model of development?
Many difficult issues have been avoided, but….
Pick and choose from a bag of protocols & tools
This does not bode well for interoperability
The Simmering
“These” projects have embraced the grid
EDG (parallels and derivatives)
Adopted the long range view (2 or more years)
Problems not being avoided
Will this be to the benefit of the HEP community?
Depends on your view of next generation computing
It seems that all projects are hedging their bets
You wonder where we would be if all the hundreds of current FTEs were focused on making this model really work
State of Security
Three dominant themes
Private Key Management
KCA (T#422), VSC etc. (T#81)
Virtual Organization Management
VOMS (T#317) & GUMS (T#363)
Authorization (a.k.a. Access Control)
GACL (T#190), SAZ (T#423), Akenti (T#426), CAS (T#441, 518)
Security Convergence
Other than X.509 there is little common ground
But, does there need to be any common ground?
Key management is a matter of trust policy
VO administration is a site or multi-lateral prerogative
Authorization is largely a local issue
It seems that if you can agree on the credentials (i.e., X.509 + endorsements), the rest is relegated to collaboration policy irrespective of implementation
This appears to be the direction
Even if it’s not obvious at the moment
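The split described on the Security Convergence slide, with shared credentials and local decisions, might be sketched as follows. This is only an illustration under stated assumptions: the `Credential` fields, the site names, and the policy layout are all hypothetical, and no actual X.509 validation is performed (the credential is assumed to have been verified already).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    """Hypothetical, already-verified credential: an X.509 subject plus a VO endorsement."""
    subject_dn: str  # subject distinguished name from the certificate
    vo: str          # virtual organization named in the endorsement


# Hypothetical local policies: each site decides for itself what to accept.
SITE_POLICIES = {
    "site_a": {"allowed_vos": {"atlas", "cms"}, "banned_dns": set()},
    "site_b": {"allowed_vos": {"cms"}, "banned_dns": {"/O=Grid/CN=Mallory"}},
}


def authorize(cred: Credential, policy: dict) -> bool:
    """Apply one site's local policy to a commonly agreed credential."""
    if cred.subject_dn in policy["banned_dns"]:
        return False
    return cred.vo in policy["allowed_vos"]


alice = Credential("/O=Grid/CN=Alice", "atlas")
for site, policy in SITE_POLICIES.items():
    print(site, authorize(alice, policy))
```

The same credential is accepted at one site and refused at the other; only the credential format is shared, which is the point the slide makes.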
Grid Monitoring
There is much activity
Much of it overlapping
BOSS (M#84), GMA (M#403), GridMonitor (M#321), MonALISA (M#103), PerfMC (M#522), & R-GMA (M#407)
Some convergence
Minimum set of events
Format (XML, yet no “lingua franca” agreement)
This is an area to watch!
GGF is likely the stomping ground for agreement
The Ultimate Highlights
Virtual Data
XML
Distributed File Systems
Job Scheduling
Peer to Peer Computing
“The” Award
The Innovation Most At Risk
Virtual Data (T#106 & 114)
Great concept at the mercy of technology
The OptIPuter is the menace. Consider….
Unlimited bandwidth
Ever decreasing storage costs
Constant software changes
Sociological problems of capturing the processing path
Together these may make VD untenable
Things to Watch For I
XML
This is rapidly becoming the common syntax
Yet little effort in developing a common language
Assumption, perhaps misguided, that WSDL repositories will address the problem
• Diamonds (iKnow) architecture (Java RMI + JINI)
Distributed Grid File Systems
Minimal data movement with global access
AlienFS (R#254)
There are many others that were not presented
Things to Watch For II
Job to Data Scheduling
Algorithms to place a job near the data
Minimize data movement
Peer To Peer Computing
Marxist scheduling aiming for 100% utilization
Not yet addressed by current grid architectures
Ad hoc protocols
Subversive in that this may be the “real” next thing
Augernome (R#293)
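The job-to-data scheduling idea from Things to Watch For II (place a job near the data, minimize data movement) can be sketched roughly as below. This is a minimal sketch and not any presented project's algorithm: the `Site` fields, the dataset names and sizes, and the tie-break rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical site description."""
    name: str
    datasets: frozenset  # names of datasets with a local replica
    free_slots: int      # idle job slots


def place_job(required, sites, sizes_gb):
    """Return the site with a free slot that needs the least data moved in, or None."""
    def transfer_cost(site):
        # Only datasets missing at the site have to be moved there.
        return sum(sizes_gb[d] for d in set(required) - site.datasets)

    candidates = [s for s in sites if s.free_slots > 0]
    if not candidates:
        return None
    # Least data movement first; ties go to the site with more free slots.
    return min(candidates, key=lambda s: (transfer_cost(s), -s.free_slots))


sites = [
    Site("A", frozenset({"run1", "run2"}), 4),
    Site("B", frozenset({"run2"}), 10),
    Site("C", frozenset({"run1", "run2", "run3"}), 0),
]
sizes_gb = {"run1": 50.0, "run2": 20.0, "run3": 5.0}
print(place_job({"run1", "run2"}, sites, sizes_gb).name)  # prints "A"
```

Site C holds every dataset but has no free slot, so it is skipped; a real scheduler would also have to weigh queue times, network cost, and replica freshness, none of which are modeled here.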
Summarizer’s Award
The project that makes innovative yet practical use of existing grid protocols
Grid Brick (R#493)
Parallel ROOT-based query using Globus scheduling
Uncomplicated and practical needs-based approach
It’s so obvious you wonder why you didn’t do it first
Load balancing and fault tolerance to be explored
It works within a standard grid environment!
Conclusions
Grid efforts are still meandering
Great for innovation
Dismal for standardization
Security is a bright spot
Rapid convergence on authentication issues
Authorization is more fuss than fury
There is a light at the end of the tunnel
Monitoring situation is disappointing
The need is recognized but no agreement on how to proceed
Cross grid monitoring is in serious jeopardy