Benefits of the MAGIC Grid
Status report of an EGEE generic application
Harald Kornmayer, Ariel Garcia (Forschungszentrum Karlsruhe)
Toni Coarasa (Max-Planck-Institut für Physik, München)
Enabling Grids for E-sciencE
Ciro Bigongiari (INFN, Padua)
Esther Accion, Gonzalo Merino, Andreu Pacheco, Manuel Delfino (PIC, Barcelona)
Mirco Mazzucato (CNAF/INFN Bologna)
in cooperation with the MAGIC collaboration
H. Kornmayer, MAGIC-GRID Status report, EGAAP meeting, Athens, 21st April 2005
Outline
• Introduction
• What kind of MAGIC?
• The idea of a MAGIC Grid
• Grid added value
• Expectations vs. reality?
• Data challenges
• Experience
• Conclusion and Outlook
Introduction: The MAGIC Telescope
• Ground-based Air Cherenkov Telescope
• Gamma rays: 30 GeV to TeV
• La Palma, Canary Islands (28° North, 18° West)
• 17 m diameter
• In operation since autumn 2003 (still in commissioning)
• Collaborators: IFAE Barcelona, UAB Barcelona, Humboldt U. Berlin, UC Davis, U. Lodz, UC Madrid, MPI München, INFN / U. Padova, U. Potchefstroom, INFN / U. Siena, Tuorla Observatory, INFN / U. Udine, U. Würzburg, Yerevan Physics Inst., ETH Zürich
Physics goals:
• Origin of VHE gamma rays
• Active Galactic Nuclei
• Supernova Remnants
• Unidentified EGRET sources
• Gamma-Ray Bursts
Ground-based γ-ray astronomy
[Figure: a gamma ray entering the atmosphere creates a particle shower; its Cherenkov light forms an image of the particle shower in the telescope camera (annotations: ~10 km, ~1°, ~120 m). For comparison: GLAST (~1 m²)]
• From the camera image: reconstruct the arrival direction and energy, and reject the hadron background
MAGIC – Why the Grid?
MAGIC is an international collaboration
• Partners distributed all over Europe
• The amount of data can NOT be handled by one partner only (up to 200 GB per night)
• Access to data and computing needs to be more efficient
• MAGIC will build a second telescope
Analysis is based on Monte Carlo simulations
• CORSIKA code
• CPU consuming
  • 1 night of hadronic background needs 20000 days on 70 computers
• Lowering the threshold of the MAGIC telescope requires new methods based on MC simulations
• More CPU power needed!
Developments - Requirements
• MAGIC needs a lot of CPU to simulate the hadronic background to explore the energy range 10 GeV – 100 GeV
• MAGIC needs a coordinated effort for the Monte Carlo production
• MAGIC needs an easily accessible system (Where are the data from run_1002 and run_1003?)
• MAGIC needs a scalable system (as MAGIC II will come in 2007)
• MAGIC needs the possibility to access data from other experiments (HESS, VERITAS, GLAST, PLANCK(?)) for multi-wavelength campaigns
The infrastructure idea
• Use three national Grid centers
  • CNAF, PIC, GridKa
  • All of them are EGEE members
  • They run the central services
• Connect MAGIC resources to enable collaboration
  • (Get resources for free!)
• 2 subsystems
  • MC (Monte Carlo)
  • Analysis
• Start with MC first!!
Development – MC Workflow
Example request: “I need 1.5 million hadronic showers with energy E, direction (theta, phi), ... as background sample for the observation of the Crab Nebula.”
• Run the MAGIC Monte Carlo Simulation (MMCS) in many parallel jobs and register the output data
• Simulate the telescope geometry with the reflector program for all interesting MMCS files and register the output data
• Simulate the response of the MAGIC camera for all interesting reflector files and register the output data
• Simulate the starlight background for a given position in the sky and register the output data
• Merge the shower simulation and the starlight simulation and produce a Monte Carlo data sample
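The MC chain above forms a small dependency graph. The following Python sketch uses the standard-library graphlib module to group the steps into stages that could run in parallel. The step names and the dependency edges are our reading of the slide, in particular the assumption that the starlight simulation does not depend on the shower simulation. Each node stands for a whole step, although in production the MMCS step alone is split into many parallel Grid jobs.

```python
from graphlib import TopologicalSorter

# Hypothetical step names for the MC chain; each key lists the steps
# whose registered output it consumes (an assumed reading of the slide).
MC_CHAIN = {
    "mmcs": set(),                     # air-shower simulation (MMCS)
    "reflector": {"mmcs"},             # telescope geometry / mirrors
    "camera": {"reflector"},           # camera response
    "starlight": set(),                # starlight background for one sky position
    "merge": {"camera", "starlight"},  # final Monte Carlo data sample
}


def stages(graph):
    """Group steps into stages; all steps within one stage are independent."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    result = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # sorted only for a stable output order
        result.append(ready)
        ts.done(*ready)
    return result


if __name__ == "__main__":
    for i, stage in enumerate(stages(MC_CHAIN), start=1):
        print(f"stage {i}: {', '.join(stage)}")
```

Under these assumptions the sketch yields four stages: MMCS and the starlight simulation first, then reflector, then camera, and finally the merge.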
Implementation
3 main components:
• Meta data base: bookkeeping of the requests, their jobs and the data
• Requestor: the user defines the parameters by inserting the request into the meta data base
• Executor: creates Grid jobs by checking the meta data base frequently (cron) and generating the input files
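A minimal sketch of the Requestor/Executor split, assuming an SQLite meta data base with hypothetical `requests` and `jobs` tables. The Requestor only inserts a row; one Executor pass, as cron would start it, splits each new request into jobs of a fixed, hypothetical size and records one input-file name per job. Writing the actual input files and submitting them to the Grid are left out.

```python
import sqlite3


def open_db(path=":memory:"):
    """Create the (hypothetical) request and job tables of the meta data base."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS requests (
        id INTEGER PRIMARY KEY,
        n_showers INTEGER NOT NULL,
        energy_gev REAL NOT NULL,
        theta_deg REAL NOT NULL,
        phi_deg REAL NOT NULL,
        status TEXT NOT NULL DEFAULT 'new')""")
    db.execute("""CREATE TABLE IF NOT EXISTS jobs (
        id INTEGER PRIMARY KEY,
        request_id INTEGER NOT NULL REFERENCES requests(id),
        input_file TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'created')""")
    return db


def request(db, n_showers, energy_gev, theta_deg, phi_deg):
    """Requestor: the user only inserts a request row."""
    cur = db.execute(
        "INSERT INTO requests (n_showers, energy_gev, theta_deg, phi_deg) "
        "VALUES (?, ?, ?, ?)",
        (n_showers, energy_gev, theta_deg, phi_deg))
    db.commit()
    return cur.lastrowid


def execute_pending(db, showers_per_job=1000):
    """Executor: one polling pass. Splits every new request into job records."""
    created = []
    rows = db.execute(
        "SELECT id, n_showers FROM requests WHERE status = 'new'").fetchall()
    for req_id, n in rows:
        n_jobs = -(-n // showers_per_job)  # ceiling division
        for k in range(n_jobs):
            name = f"mmcs_{req_id:04d}_{k:05d}.input"  # hypothetical naming scheme
            db.execute("INSERT INTO jobs (request_id, input_file) VALUES (?, ?)",
                       (req_id, name))
            created.append(name)
        # Mark the request so the next cron pass does not split it again.
        db.execute("UPDATE requests SET status = 'jobs_created' WHERE id = ?",
                   (req_id,))
    db.commit()
    return created
```

With 1000 showers per job, a request for 1.5 million showers becomes 1500 job records; a second Executor pass finds nothing new, so a frequently running cron job does not duplicate work.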
Grid added value
Expectations vs. reality:
• Collaboration (-)
  • Complex software, a limited number of supported operating systems and a limited number of batch systems make the integration of new sites of MAGIC collaborators difficult
  • The final integration of one cluster (SUSE, SGE batch system, AFS, firewall) took too long (9 months)
• Speed-up of MC production (+)
  • The reliable infrastructure and the good support from many sites made that possible! Many thanks to sk, bg, pl, uk, gr, it, es, de, …
  • The service offered was good overall
    • with problems whenever new releases appeared (every time! :–( )
    • with problems in keeping a sustainable configuration (for VO, replica service, …)
  • The central services run by EGEE were stable!
Grid added value II
Expectations vs. reality II:
• Persistent storage (+)
  • of Monte Carlo data
    – Some problems during the first runs (too many small files on a tape system is equivalent to /dev/null). We learnt that lesson!
  • of observation data
    – Started the automated production transfer of real observation data from La Palma to PIC, Barcelona, in November 2001
    – 3.2 TB of real data are now available on the Grid
• Improvements of data availability (?)
  – Replica mechanisms need to be tested!
  – Measurements are needed in the future!
  – Ongoing work!
Grid added value III
Expectations vs. reality III:
• Cost reduction (-)
  • Additional implementations were necessary (-)
    – MAGIC implemented its own prototype meta data base system: to monitor the status of many jobs of a mass production, and to check the “status” of a job (see later!)
    – MAGIC implemented its own rudimentary workflow system: nothing was available at the beginning
  • GGUS definitely reduced the costs (+)
    – MAGIC Grid participants appreciated the support structure of the GGUS portal
  • Every new middleware release forced (-)
    – a downtime of the system
    – a customization of the system
Data challenges
Past experience: three MMCS data challenges
– Mar/Apr 2005: 10% failure
– July 2005: 3.9% failure
– Sept 2005: 3.4% failure
Last data challenge: December until today
– Successful: 13500 (MMCS output registered in the Grid, data available)
– FAILED: 4567
[Figure: job status breakdown: Done (Success): 2830, Done (Failed): 249, Aborted: 930, Waiting: 473, Scheduled: 86, Submitted: 9]
Improvements:
– Underlying middleware
– Operation of services
– Data management
– Additional checks
– Many lessons learnt
Useless status of jobs
• Data storage site is selected by the JDL
    ...
    OutputData = { [ OutputFile = "data/cer012345";
                     LogicalFileName = "lfn:mmcs_cer012345";
                     StorageElement = "castorgrid.pic.es"; ],
                   ... };
• The WMS should register the file automatically on the Grid!
• BUT: if storing or registering the output fails (RLS service down, SE not available, ...)
  • the WMS still reports the status as “Done (Success)”
  • “Done (Success)” has NO meaning for the output data specified in the JDL!
• A more sophisticated system is necessary for a production system!
  • We developed our own! (As every VO does?)
  • Can we get a WMS that takes data output into account?
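The kind of check a VO ends up writing can be sketched as follows: treat “Done (Success)” only as a precondition, and confirm each output declared in the JDL in the replica catalogue before the job counts as successful. `has_replica` is a hypothetical hook standing in for whatever catalogue query is available; the derived status strings are illustrative, not middleware states.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class OutputSpec:
    """One entry of the JDL OutputData list (only the fields checked here)."""
    lfn: str
    storage_element: str


def effective_status(wms_status: str,
                     outputs: Iterable[OutputSpec],
                     has_replica: Callable[[str, str], bool]) -> str:
    """Combine the WMS job status with a catalogue check of every declared output.

    has_replica(lfn, se) is a hypothetical hook that should return True if the
    logical file name has a registered replica on the given storage element.
    """
    if wms_status != "Done (Success)":
        return wms_status  # nothing to verify: the job itself did not finish well
    missing = [o.lfn for o in outputs if not has_replica(o.lfn, o.storage_element)]
    if missing:
        return "Done (Output missing: " + ", ".join(missing) + ")"
    return "Done (Data available)"
```

Keeping the catalogue lookup behind a callable lets the same bookkeeping logic run against a real catalogue client or, as in a test, against an in-memory set.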
Missing VO support in WMS
• Mass production is managed by one member of the VO: the VO production manager
  • No need to be a Grid expert!
  • Every job is assigned to him exclusively!
    edg-job-submit --vo magic mmcs_012345.jdl
• NO other member of the VO can get information
  • about the status of the job
    edg-job-status https://theUniqueIdentifierOfTheJob
  • about the stdout/stderr of the job
    edg-job-get-output https://theUniqueIdentifierOfTheJob
• The basic commands MUST have more VO support!
Meta data base
• The output data files should be stored and registered on the Grid!
• But the files are only useful if “content describing” information can be attached to them!
  • “From storage to knowledge!”
  • “From Grid to e-Science”
• We implemented a “separate” meta data base that links this information to the file URI
• One extensible framework for replica and meta data services would be nice!
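A minimal sketch of such a separate meta data base, assuming SQLite and a free-form key/value attribute table keyed by the logical file name. The attribute names (`run`, `particle`, `source`) are hypothetical; the point is that a question like “Where are the data from run_1002?” becomes a query instead of a search.

```python
import sqlite3


def open_catalog():
    """Files are identified by their LFN; attributes are free-form key/value pairs."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE files (lfn TEXT PRIMARY KEY);
        CREATE TABLE attributes (
            lfn   TEXT NOT NULL REFERENCES files(lfn),
            key   TEXT NOT NULL,
            value TEXT NOT NULL,
            PRIMARY KEY (lfn, key));
    """)
    return db


def register(db, lfn, **attrs):
    """Link content-describing attributes to a file's logical name."""
    db.execute("INSERT INTO files (lfn) VALUES (?)", (lfn,))
    db.executemany("INSERT INTO attributes (lfn, key, value) VALUES (?, ?, ?)",
                   [(lfn, k, str(v)) for k, v in attrs.items()])
    db.commit()


def find(db, **attrs):
    """Return the sorted LFNs whose attributes match all given key/value pairs."""
    lfns = None
    for k, v in attrs.items():
        rows = db.execute(
            "SELECT lfn FROM attributes WHERE key = ? AND value = ?", (k, str(v)))
        hits = {row[0] for row in rows}
        lfns = hits if lfns is None else lfns & hits
    return sorted(lfns or [])
```

Values are stored as text, so `run=1002` matches because both sides are converted with `str()`; numeric range queries would need typed columns.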
Workflows
• The MAGIC Monte Carlo system is a good example of a scientific workflow
  • 1000 jobs can be started in parallel (embarrassingly parallel!)
• MAGIC looked for a middleware tool which supports workflows
  • using a standard workflow description
  • with support for self-recovery of failed jobs
    – 3% of jobs “fail”: 30 out of 1000
    – Without this feature NO workflow will succeed!
• There are tools around
  • but we need something like a “best practice guide” for one tool!
  • We don’t want to program it on our own on top of the meta data base!
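Why self-recovery is essential follows from simple arithmetic if one assumes that each attempt fails independently with probability 3%: a 1000-job workflow without retries completes with probability 0.97^1000, about 6 × 10^-14, while allowing up to four attempts per job raises this to about 0.999. Real Grid failures are often correlated (a site or service down for hours), so this model is optimistic. The resubmission loop below is a hypothetical sketch, not an existing middleware tool.

```python
def p_all_succeed(n_jobs: int, p_fail: float, max_attempts: int = 1) -> float:
    """Probability that all n_jobs finish, assuming each attempt fails
    independently with probability p_fail and a job may be tried
    up to max_attempts times in total."""
    p_job_fails = p_fail ** max_attempts  # every attempt of this job failed
    return (1.0 - p_job_fails) ** n_jobs


def run_with_retries(jobs, run_once, max_attempts=4):
    """Resubmit failed jobs until they succeed or max_attempts is reached.

    run_once(job) is a hypothetical callable returning True on success
    (including verified output registration). Returns the jobs that still failed.
    """
    pending = list(jobs)
    for _ in range(max_attempts):
        if not pending:
            break
        pending = [job for job in pending if not run_once(job)]
    return pending
```

`p_all_succeed(1000, 0.03)` and `p_all_succeed(1000, 0.03, max_attempts=4)` reproduce the two probabilities quoted above.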
Experience – reliability
2005: three different data challenges
• March/April: 10.4% failed jobs
• July: 3.8% failed jobs
• September: 3.1% failed jobs
• The EGEE infrastructure became more reliable!
Mass production:
• Started in December after a training of users at FZK
[Figure: fraction of failed jobs (failed/all, 0 to 1) per week of the mass production, annotated “Christmas in Spain”, “New year” and “LCG 2.7”]
• There is always a reason for failure!
• Deployment is a challenge too!
EGEE – MAGIC Grid
• MAGIC Grid is reality
• Production of MC using MAGIC Grid resources started in December!
  – We plan to ask (temporarily) for more CPUs for stress testing!
• The MAGIC collaboration will put its real data on the Grid
• The challenges for computing will increase with the second telescope
MAGIC Grid – future prospects
• MAGIC is a good example
  • to do e-Science
  • to use the e-Infrastructure
  • to exploit Grid technology
• What about a “GRID” of different VHE gamma-ray observatories?
“Towards a virtual observatory for VHE γ-rays”
[Figure: VHE gamma-ray observatories worldwide: MAGIC (EU), VERITAS (US), HESS (EU/Africa), CANGAROO (AUS/JP)]