Grid Computing &
Elementary Particle Physics
brought to you by
Dr. Rüdiger Berlich
April 16th, 2004, Sankt Augustin / NEC, ruediger@berlich.de
Agenda
Who am I
Past and present work
Ph.D. thesis
Grid Computing & Particle Physics
Motivation and Problem Definition
Conflicting Visions ?
Software Framework & Grid Development
Large Computing Resources
Conclusion
Q&A
Who am I ?
Past and Present Work
Physicist – Diploma at Ruhr-Universität Bochum / Germany
Recognition of hadronic splitoffs using Neural Networks
Crystal Barrel experiment / CERN / Geneva
Aleph experiment (Max Planck Institute, Munich / Germany)
Member of SuSE Linux AG until July 2001
Engineer + international support, authoring of manual
Technical Manager (Support) of US Office, Oakland / CA
Managing Director UK office
Ph.D. thesis at Ruhr-Universität Bochum
“Application of Evolutionary Strategies to automated
parametric optimisation studies in physics research”
Permanently based at Forschungszentrum Karlsruhe
(Grid Computing and eScience)
Completed “Magna Cum Laude” in January 2004
Postdoctoral position at RUB / FZK
Ph.D. thesis
Signal in histogram resulting from cuts
Varying cuts can lead to improved signal
With figure of merit = f(cuts): accessible to computerized maximization techniques
Often used: squared significance, $S^2 = N^2 / (N + 2 B_0)$
Problem : Testing quality of cuts in particle physics
requires the processing of huge amounts of data for
each set of cuts
[Figure: "events" pass through analysis / cuts and yield a histogram with signal; optimization of the cuts gives e.g. reduced background]
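The idea of treating the figure of merit as a function of the cuts can be sketched in a few lines of Python. This is a toy, not the thesis code: the "events" are random numbers, only a single cut on one variable is scanned, and N and B_0 are assumed here to be the signal and background counts that survive the cut (the slide does not define them).

```python
import random

def squared_significance(n_signal, n_background):
    """S^2 = N^2 / (N + 2*B_0); N and B_0 are ASSUMED to be signal/background counts."""
    denom = n_signal + 2 * n_background
    return n_signal ** 2 / denom if denom > 0 else 0.0

def figure_of_merit(cut, signal, background):
    """Count the events passing 'value > cut' and return S^2 for that cut."""
    n = sum(1 for x in signal if x > cut)
    b = sum(1 for x in background if x > cut)
    return squared_significance(n, b)

rng = random.Random(42)
# Toy data (purely illustrative): signal peaks near 2.0, background near 0.0.
signal = [rng.gauss(2.0, 0.5) for _ in range(1000)]
background = [rng.gauss(0.0, 1.0) for _ in range(10000)]

# Brute-force scan: every evaluation re-processes every event.
cuts = [i * 0.1 for i in range(-20, 41)]
best_cut = max(cuts, key=lambda c: figure_of_merit(c, signal, background))
no_cut = figure_of_merit(-1e9, signal, background)
best = figure_of_merit(best_cut, signal, background)
print(f"no cut: S^2 = {no_cut:.1f}; best cut at {best_cut:.1f}: S^2 = {best:.1f}")
```

Each call to figure_of_merit loops over the full data set, which is the cost the slide refers to: with several cuts and real event samples a brute-force scan quickly becomes impractical.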
Ph.D. thesis / EVA
Implementation of Evolutionary Strategies and Genetic Algorithms
Parallel execution on SMP (POSIX threads), clusters and Grid (through MPICH) possible
In MPI mode: data exchange through XML
Seamless parallelization: no change of user code required
[Class diagram: evaMember, evaDouble, evaBit, evaBitset<N>, evaIndividual<T>, evaPopulation<T>, evaMPIPopulation<S,T> and evaPthreadPopulation<S,T>, built on STL vector<T*>]
Serial mode allows easy debugging
Implemented in C++
Derivative of STL vector class,
hence fully templatized
Open Source
Interface to ROOT
In idealised environment : reduction
of compute time from close to 6 hours
to under 3 minutes (1+128 ES)
Almost linear speedup
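A (1+128) evolution strategy with exchangeable serial and parallel evaluation might look roughly like the sketch below. It is written in Python for brevity and does not reproduce EVA's C++ interface; the function names, the toy fitness function and the use of a thread pool are all assumptions made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def fitness(x):
    """Toy objective to minimise (sphere function); stands in for an expensive analysis."""
    return sum(v * v for v in x)

def one_plus_lambda_es(parent, n_children, generations, sigma, evaluate_all, rng):
    """(1+lambda) ES: the parent survives unless a mutated child is better."""
    best, best_fit = parent, fitness(parent)
    for _ in range(generations):
        children = [[v + rng.gauss(0.0, sigma) for v in best]
                    for _ in range(n_children)]
        fits = evaluate_all(children)      # the only step that is parallelised
        i = min(range(n_children), key=fits.__getitem__)
        if fits[i] < best_fit:
            best, best_fit = children[i], fits[i]
    return best, best_fit

def serial_eval(children):
    """Serial mode: easy to debug."""
    return [fitness(c) for c in children]

def threaded_eval(children, workers=8):
    """Parallel mode; threads only pay off if fitness() releases the GIL or waits on I/O."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, children))

start = [5.0, -3.0, 2.0, 4.0]
serial = one_plus_lambda_es(start, 128, 30, 0.3, serial_eval, random.Random(1))
threaded = one_plus_lambda_es(start, 128, 30, 0.3, threaded_eval, random.Random(1))
print(serial[1], threaded[1])
```

Only the evaluation function is swapped between the two runs, so the optimisation loop itself stays unchanged. For scale: going from close to 6 hours (about 360 minutes) to under 3 minutes is a factor of roughly 120 with 128 children evaluated concurrently.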
Ph.D. thesis / EVA
Variation of 4 parameters
Significant improvement of
figure of merit even for hand optimised
selection
Through parallelisation: reduction of compute
time from 38 hours to under 3 hours
High number of local minima
Increasing frequency and amplitude
Relationship to Grid
Original thought: Make EVA a Grid application
but: comes for free (e.g.: submit mpd as Grid job, with
program in input sandbox; use MPICH-G2; ...)
GridKa (Karlsruhe Cluster – 4500 CPUs in final stage)
BaBar Grid activities
D-Grid / EGEE
Freelance journalistic activities (Grid series in Linux Magazin,
Linux User & Developer / UK, ...)
Talks at Linux Tag, UKUUG, DSY, ...
Grid Computing &
Elementary Particle Physics
Motivation :
In particle physics :
Got many “events”
Here : From BaBar
experiment
Collision of electrons
and positrons
Need to store and
analyse all available
data
How much data is
there ?
Motivation :
Moore's Law : Processing Power doubles every 18 months*)
Demand for computing power grows faster than available resources
*) slow-down expected in about 10 years (www.wired.com)
Motivation :
Future data sources at CERN: ATLAS, CMS, LHCb, ...
Motivation :
Expect a data rate of 40 Gb/s for all 4 LHC experiments, equivalent to 10 petabytes per year for all experiments.
This is comparable to a 16 sqm room, filled every year with DVDs containing the data.
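The DVD comparison can be checked with a rough back-of-the-envelope calculation. The disc capacity, disc dimensions, square packing and decimal petabyte used below are assumptions, not figures from the slide; only the 10 petabytes per year and the 16 sqm are taken from it.

```python
import math

PB = 10 ** 15                   # decimal petabyte (assumption)
data_per_year = 10 * PB         # from the slide
dvd_capacity = 4.7e9            # bytes, single-layer DVD (assumption)
dvd_diameter = 0.12             # metres (assumption)
dvd_thickness = 0.0012          # metres, bare disc without case (assumption)
floor_area = 16.0               # square metres, from the slide

n_dvds = math.ceil(data_per_year / dvd_capacity)
# Square packing: one stack of discs per 12 cm x 12 cm of floor.
stacks = int(floor_area / dvd_diameter ** 2)
stack_height = n_dvds / stacks * dvd_thickness

# Average rate needed to write 10 PB spread over a full calendar year.
seconds_per_year = 365 * 24 * 3600
avg_rate_gbit = data_per_year * 8 / seconds_per_year / 1e9

print(f"{n_dvds} DVDs in {stacks} stacks, each about {stack_height:.1f} m high")
print(f"average over a full year: {avg_rate_gbit:.1f} Gbit/s")
```

Under these assumptions the stacks come out at a little over 2 m, i.e. about room height, so the comparison holds for bare discs. The same numbers also show that 10 petabytes spread evenly over a year is only a few Gbit/s, so the 40 Gb/s quoted on the slide cannot be a year-round average.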
Motivation :
Analysis in PP must keep up with integrated Luminosity
But : Need for computing power rises faster than Moore's Law
Possible solutions :
a) Find someone to pay for additional computers and/or
b) use existing, distributed resources.
Solution b) needs standardised framework for distributed computing
Problem applies to many areas, beyond the boundaries
of Particle Physics.
“New” research discipline, sometimes
called GRID Computing
Motivation :
Using existing, distributed resources is a good idea anyway
Single countries can't afford multi-billion dollar investments
into particle physics anymore
Might still be willing to donate a portion of overall expenses
But: want to spend money locally !
More willing to invest in local compute centres than into resources
elsewhere ...
Vision:
Need to bundle existing or newly established computing resources
to form a coherent, standardized ensemble capable of running very
large scale jobs (particle physics, weather simulation, biology, ...)
Two people with another vision :
Carl Kesselman and Ian Foster
“The Grid: Blueprint for a New Computing Infrastructure”
Jointly they can be considered to be the “fathers of Grid Computing”
The Vision (a la Foster & Kesselman)
Computing Power from a plug in the wall
The world is your computer
Analogy : Power Grid
Seamless exchange of computing power
Logical extension of the World Wide Web ?
"The Web on Steroids"
A hype (which has good and bad sides ...)
Takes distributed computing to a new level
⇒ Different people mean different things
when talking about “The Grid”
The Vision (common grounds + “definition”)
"When the network is as fast as the computer's
internal links, the machine disintegrates across the
net into a set of special purpose appliances"
(Gilder Technology Report, June 2000)
"Grid technologies and infrastructures can be defined
as supporting the sharing and coordinated use of
diverse resources in dynamic, distributed 'virtual
organisations' " (OGSA white paper)
Limitations of distributed computing
Ping: round-trip time on LAN and WAN
Already in the range of, say, an old MFM hard drive. And bandwidth beats it by several orders of magnitude ...
Limitations of distributed computing
High-speed network connection (the faster, the more data
can be exchanged between participating nodes).
"Speed" means :
a) High Bandwidth - Scales ! No longer a limiting factor.
b) Low Latency - Doesn't scale ! (cross-North-America latency ca. 20 msec)
[Figure: network link between San Francisco and New York; latency 20 msec; bandwidth measured in packets/s]
Latency limits possible application types !
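Why latency rather than bandwidth limits the application types can be illustrated with a simple cost model: the time for one message is a fixed latency plus the message size divided by the bandwidth. The sketch below uses the 20 msec from the slide and treats it as a one-way delay, which is an assumption; the message sizes and bandwidths are illustrative.

```python
def transfer_time(size_bytes, bandwidth_bit_s, latency_s):
    """Time for one message: fixed latency plus size divided by bandwidth."""
    return latency_s + size_bytes * 8 / bandwidth_bit_s

LATENCY = 0.020   # 20 msec from the slide, taken as one-way delay (assumption)

for bandwidth in (1e8, 1e9, 1e10):   # 100 Mbit/s, 1 Gbit/s, 10 Gbit/s
    small = transfer_time(1_000, bandwidth, LATENCY)            # 1 kB control message
    large = transfer_time(1_000_000_000, bandwidth, LATENCY)    # 1 GB data file
    print(f"{bandwidth:.0e} bit/s: 1 kB in {small * 1e3:.2f} ms, 1 GB in {large:.2f} s")

# A program that waits for a reply after every message is bounded by the
# round trip alone, whatever the bandwidth.
max_round_trips_per_s = 1 / (2 * LATENCY)
print(f"at most {max_round_trips_per_s:.0f} synchronous exchanges per second")
```

Buying more bandwidth shortens the bulk transfer almost proportionally but leaves the small message at about 20 msec. Tightly coupled programs that exchange many small messages therefore do not benefit, while jobs that ship large data sets and then compute independently do.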
Requirements of distributed computing
High Bandwidth / Low Latency connections are needed !
One current focus of research and development
From "local GRIDs" to the "World Wide GRID"
GEANT, TeraGrid
Common middleware framework needed; Standards !!
⇒ Joint GRID efforts needed and possible
Middleware initiatives related to P.P.
Of course: Globus 2 (versions 3, 4, ... not too relevant at the moment)
The European DataGrid (EDG; project finished successfully in March 2004)
From the Data Grid Project Presentation :
"The main goal of the DataGrid is to develop and test
the technological infrastructure that
will enable the implementation of scientific collaboratories
where researchers and scientists will perform their
activities regardless of geographical location."
LCG2 (based on EDG)
EDG terminated March 2004 – EGEE will take over
GriPhyN (Grid Physics Network) - American counterpart to EDG
AliEn (Alice Environment)
(Cross Grid -> Interactive Applications based on EDG framework)
Software Framework
The Globus Toolkit 2 - www.globus.org
Base technology for the majority of Grid projects
Globus is based on the I-WAY project (“Information Wide
Area Year”), a test bed of 17 US research institutions
Effort led by Foster and Kesselman (Argonne Lab and USC)
Layer between Operating System and GRID application
Takes care of many aspects of GRID computing including
the communication between the participating program
fragments, authentication, security, ...
"Protocol layer"
Software Framework - Globus
Basic functionality : submit job -> like batch submission
Does not contain a Resource Broker (see EDG)
The Globus Toolkit includes among other components :
GSI (Grid Security Infrastructure) : Authentication
GASS (Global Access to Secondary Storage)
Uses RSL (Resource Specification Language) to specify resources
(min. size of memory, OS, etc.)
When GRID computing becomes more mature, more
services will migrate from the middleware into the
Operating System. Needed for seamless integration !
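As an illustration of the RSL mentioned above, a job request is a conjunction of (attribute=value) pairs. The small Python helper below only builds such a string; the helper itself is hypothetical, the attribute values are made up, and the quoting is deliberately simplified (values with spaces are wrapped in double quotes, embedded quotes are rejected).

```python
def to_rsl(attributes):
    """Render a dict as an RSL-style conjunction: &(name=value)(name=value)..."""
    def quote(value):
        text = str(value)
        if '"' in text:
            raise ValueError("embedded double quotes are not handled in this sketch")
        return f'"{text}"' if " " in text else text
    return "&" + "".join(f"({name}={quote(value)})"
                         for name, value in attributes.items())

job = {
    "executable": "/bin/hostname",
    "count": 1,                  # number of processes
    "minMemory": 64,             # minimum memory, in megabytes
    "stdout": "hostname.out",
}
rsl = to_rsl(job)
print(rsl)
```

The resulting string is what would be handed to the submission tool; this sketch does not contact any Grid service.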
Software Framework – Globus 2
1.) Job transmission to server via
HTTP as an RSL document
2.) Server forks jobmanager, hands
over RSL document
3.) jobmanager parses RSL, checks
the job requirements
4.) jobmanager distributes the
job to local resources in cluster
5.) jobmanager sends a unique job
id (URI) to the client
6.) The client can use the URI to
cancel the job, when needed, or
gain status information
Courtesy
Dr. Harald Kornmayer
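The six steps can be mimicked with a small in-memory model. This is a toy under stated assumptions, not Globus code: the class names Gatekeeper and JobManager, the example.org URI and the simplified RSL parsing are all invented for illustration, and nothing is sent over HTTP.

```python
import re
import uuid

class JobManager:
    """Hypothetical stand-in for the per-job manager (steps 3 to 5)."""
    def __init__(self, rsl, free_nodes):
        # Step 3: parse the RSL document and check the job requirements.
        self.spec = dict(re.findall(r'\((\w+)=([^()]*)\)', rsl))
        if "executable" not in self.spec:
            raise ValueError("RSL must name an executable")
        count = int(self.spec.get("count", 1))
        if count > len(free_nodes):
            raise ValueError("not enough free nodes")
        # Step 4: distribute the job to local resources.
        self.nodes = [free_nodes.pop() for _ in range(count)]
        self.state = "ACTIVE"
        # Step 5: unique job id in the form of a URI (made-up host name).
        self.uri = f"https://gatekeeper.example.org/{uuid.uuid4()}"

class Gatekeeper:
    """Hypothetical server: accepts RSL (step 1), starts a job manager (step 2)."""
    def __init__(self, nodes):
        self.free_nodes = list(nodes)
        self.jobs = {}

    def submit(self, rsl):
        manager = JobManager(rsl, self.free_nodes)
        self.jobs[manager.uri] = manager
        return manager.uri

    def status(self, uri):        # step 6: status information via the URI
        return self.jobs[uri].state

    def cancel(self, uri):        # step 6: cancel via the URI
        manager = self.jobs[uri]
        if manager.state == "ACTIVE":
            self.free_nodes.extend(manager.nodes)
            manager.nodes = []
            manager.state = "CANCELLED"

server = Gatekeeper(["node1", "node2", "node3"])
job_uri = server.submit("&(executable=/bin/hostname)(count=2)")
print(server.status(job_uri))
server.cancel(job_uri)
print(server.status(job_uri))
```

The client only ever holds the URI; all job state lives with the job manager on the server side.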
Software Framework – Globus 2
Three major services
Resource Management
Information Service
Data Management
Courtesy
Dr. Harald Kornmayer
Globus 3: Software Framework
[Figure: data-mining example. A user application ("I want to create a personal database containing data on E. coli metabolism") interacts with a community registry, a compute service provider (miner factory, miner) and storage service providers (database factories, database services, two bio databases):
1.) Find datamining service and possible storage locations
2.) Handles for miner and database factories
3.) Create miner service with lifetime 10; create database service with lifetime 1000
4.) Create instance
5.) Query
6.) Keepalive; results]
Most PP-based Grid Computing still based on GT2
Globus 4 (Web service / WSRF): source of confusion in P.P.
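The factory, lifetime and keepalive notions from the Globus 3 figure can be sketched as a soft-state model: an instance disappears unless its client keeps renewing it. The classes below are hypothetical and the exact semantics (a keepalive resets the lifetime, time is counted in arbitrary units) are assumptions; the figure itself only gives the lifetimes 10 and 1000.

```python
class ServiceInstance:
    """Hypothetical service instance with a soft-state lifetime."""
    def __init__(self, name, lifetime, now):
        self.name = name
        self.lifetime = lifetime
        self.expires_at = now + lifetime

    def alive(self, now):
        return now < self.expires_at

    def keepalive(self, now):
        """Extend the lifetime, but only while the instance still exists."""
        if not self.alive(now):
            return False
        self.expires_at = now + self.lifetime
        return True

class Factory:
    """Hypothetical factory that creates instances and drops expired ones."""
    def __init__(self, kind):
        self.kind = kind
        self.instances = []

    def create(self, lifetime, now):
        inst = ServiceInstance(f"{self.kind}-{len(self.instances) + 1}", lifetime, now)
        self.instances.append(inst)
        return inst

    def sweep(self, now):
        self.instances = [i for i in self.instances if i.alive(now)]

miner_factory = Factory("miner")
database_factory = Factory("database")
miner = miner_factory.create(lifetime=10, now=0)          # lifetimes as in the figure
database = database_factory.create(lifetime=1000, now=0)

miner.keepalive(now=8)        # client still interested: expiry moves from 10 to 18
miner_factory.sweep(now=20)   # no further keepalive: the miner is gone
database_factory.sweep(now=20)
print(len(miner_factory.instances), len(database_factory.instances))
```

The appeal of this scheme is that a crashed or disconnected client needs no explicit clean-up: its short-lived instances simply expire.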
Software Framework – EDG
Joint three-year project of European Union
Goal: Development of methods for the transparent distribution
of data and programs
Needed in particle physics, biology (genome project), earth observation ...
21 members, 15 compute centers (2-32 CPUs, up to
1 Terabyte of mass storage)
LCG based on EDG2 – accesses resources with thousands of CPUs, e.g.
GridKa at Forschungszentrum Karlsruhe
Major new component: resource broker
Project was finished in March 2004. Successor will be EGEE
(“Enabling Grids for eScience in Europe” - Edinburgh NeSC, kick-off
next week in Cork / Ireland)
Software Framework – EDG
Compute centers
(sites) offer resources
(Storage Elements SE
or Compute Elements CE)
Each site publishes its
status in an information
index
Information about
available data is stored
in the Replica Location
Server (RLS)
Courtesy Marcus Hardt
EDG – Virtual Organisations
Users and resources
belong to Virtual
Organisations
Users can only access
resources that belong
to their own VO
Courtesy Marcus Hardt
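The access rule for Virtual Organisations amounts to a membership check. The sketch below is a minimal model with made-up user and host names; real EDG authorisation relies on certificates and VO membership services, which are not modelled here.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualOrganisation:
    name: str
    users: set = field(default_factory=set)
    resources: set = field(default_factory=set)

def may_access(user, resource, vos):
    """A user may use a resource only if some VO contains both."""
    return any(user in vo.users and resource in vo.resources for vo in vos)

# Hypothetical VOs, users and hosts.
vos = [
    VirtualOrganisation("atlas", {"alice"}, {"ce01.example.org", "se01.example.org"}),
    VirtualOrganisation("cms", {"bob"}, {"ce02.example.org"}),
]

print(may_access("alice", "ce01.example.org", vos))   # same VO
print(may_access("alice", "ce02.example.org", vos))   # resource belongs to another VO
```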
European DataGrid – JDL + RB
Jobs are sent to the Resource Broker by the user
Job specifications are encoded in the “Job Description Language” (JDL)
Executable = "./myexecutable";
Arguments = "--config small-file.cfg --data BIG-file.dat";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"small-file.cfg"};
OutputSandbox = {"std.out","std.err","run.log"};
InputData = {"LF:BIG-file.dat"};
ReplicaCatalog = "ldap://rls.example.org:12345/lc=WPsixCollection,\
rc=WPsixRC,dc=testbed,dc=fzk,dc=de";
DataAccessProtocol = {"gridftp"};
Courtesy Marcus Hardt
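What a Resource Broker does with such a job description can be sketched as a matchmaking step: compare the job's input data with the published site information and the replica locations, then pick a Compute Element. The ranking used here (prefer sites whose Storage Element already holds the input file, then the number of free CPUs) and all host names are assumptions for illustration, not the actual EDG broker logic.

```python
def choose_compute_element(job, sites, replica_catalog):
    """Pick a Compute Element; prefer sites that already hold the input data."""
    wanted = set(job.get("InputData", []))
    candidates = [s for s in sites if s["free_cpus"] > 0]
    if not candidates:
        return None

    def score(site):
        local_files = sum(1 for lfn in wanted
                          if site["storage_element"] in replica_catalog.get(lfn, ()))
        return (local_files, site["free_cpus"])

    return max(candidates, key=score)["compute_element"]

# A subset of the JDL attributes above, as a Python dict.
job = {"Executable": "./myexecutable", "InputData": ["LF:BIG-file.dat"]}

# Hypothetical site status, as it might be published in an information index.
sites = [
    {"compute_element": "ce.site-a.example.org",
     "storage_element": "se.site-a.example.org", "free_cpus": 120},
    {"compute_element": "ce.site-b.example.org",
     "storage_element": "se.site-b.example.org", "free_cpus": 8},
]

# Hypothetical replica locations: logical file name -> Storage Elements.
replica_catalog = {"LF:BIG-file.dat": {"se.site-b.example.org"}}

print(choose_compute_element(job, sites, replica_catalog))
```

Sending the job to the data rather than the data to the job is the point of the ranking: the small files travel in the sandboxes, while the big file is read from a nearby replica.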
The AliEn Toolkit (“Alice Environment”)
Grid + OpenSource
Pure Open Source project, started as part of ALICE collaboration (CERN)
Small development team (very different from EDG)
Pragmatic approach (what do we have, how can we make it work)
3 million lines of code (cf. Linux kernel: ca. 5.5 million LOC)
99 % of the code taken from publicly available packages, mostly Perl
Only about 1 % of the code had to be developed in addition
Similar functionality to EDG framework
Based on WebServices (SOAP, XML)
Used in other projects, e.g. MammoGrid (UK), a breast cancer database
See http://alien.cern.ch
The AliEn Toolkit / Grid + OpenSource
Large Compute Resources for P.P.
At Forschungszentrum Karlsruhe :
GridKa Cluster
Tier-1 centre for LCG and other experiments
4500 CPUs in the final stage
Research in fast interconnects (Infiniband), ROOT toolkit, ...
The LHC multi-Tier Computing Model
[Figure: tier hierarchy. Tier 0: CERN. Tier 1: UK (RAL), USA (Fermi, BNL), France (IN2P3), Italy (CNAF), Germany (FZK), ... Tier 2: university and lab computing centres (Uni-CCs, Lab-CCs). Tier 3: institute computers. Tier 4: desktops. Universities (Uni a to Uni e) and labs (Lab i, Lab x, Lab y, Lab z) are grouped into virtual organizations (CMS, ATLAS, LHCb) and working groups (α, β, γ).]
Tier 0 Centre at CERN
GridKa Machine Hall
Also: world's first totally water-cooled rack ...
1 Gbit/s between Karlsruhe and CERN
GridKa planned Resources
[Chart: planned CPU (kSI95), disk and tape (TByte) capacity for 2002 to 2009, across LCG Phase I, Phase II and Phase III; axes run from 0 to 8000 and from 0 to 4000. Annotated data point: 680 CPUs, 160 TB disk, 250 TB tape.]
End of June 2003 – delivery of another 70 TB
Software Framework / FZK
Distributed / cluster filesystems ? (Only partially related)
NFS, OpenAFS, OpenGFS, PVFS, PVF, PNFS, LegionFS
Avaki (based on LegionFS), Lustre, AliEn-FS
GPFS
Conclusion
Still early days, but ...
Huge investments into infrastructure
Many, partially incoherent Grid initiatives
Far from standardisation
Globus 2 still in use, will remain so for a while ...
Requirements of LHC and other particle physics experiments enforce some sort of GRID-oriented solution in P.P.
Real need starts with LHC (when ??)
Research needed to start early to ensure efficient
usage of billion dollar investments
Please feel free to ask questions !
I appreciate the continuous interest and support by the
German Federal Ministry of Education and Research, BMB+F.