Enabling Grids for E-sciencE
EGEE and the future of Grid Infrastructures
International Symposium on Grid Computing 2007
Academia Sinica, Taipei, 26-29 March 2007
Bob Jones, EGEE-II Project Director, CERN
www.eu-egee.org
INFSO-RI-508833
eScience
• Science is becoming increasingly digital and must deal with growing amounts of data and computational needs
• Simulations get ever more detailed
– Nanotechnology: design of new materials from the molecular scale
– Modelling and predicting complex systems (weather forecasting, river floods, earthquakes)
– Decoding the human genome
• Experimental science uses ever more sophisticated sensors to make precise measurements
– Need high statistics
– Huge amounts of data
– Serves user communities around the world
High Energy Physics
Large Hadron Collider (LHC):
• One of the most powerful instruments ever built to investigate matter
• 40 million particle collisions per second
• 4 experiments: ALICE, ATLAS, CMS, LHCb
• ~15 petabytes/year from the 4 experiments
• First beams in 2007
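To put the ~15 petabytes/year figure in perspective, the minimal sketch below converts it to an average sustained data rate. It assumes decimal units (1 PB = 10^15 bytes) and a 365-day year; the function name is illustrative and is not part of any LHC or EGEE software.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average sustained rate in MB/s (decimal units) for a yearly data volume."""
    bytes_per_year = petabytes_per_year * 1e15  # assumption: 1 PB = 10**15 bytes
    return bytes_per_year / SECONDS_PER_YEAR / 1e6


if __name__ == "__main__":
    # Yearly volume quoted on the slide for the 4 experiments combined.
    print(f"{average_rate_mb_per_s(15):.0f} MB/s")
```

Under these assumptions the average works out to a little under 500 MB/s, sustained around the clock; actual rates during data taking would be higher, since an accelerator does not run all year.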
HEP track today
[Photo labels: Mont Blanc (4810 m); Downtown Geneva]
In silico drug discovery
• Diseases such as HIV/AIDS, SARS, bird flu, etc. are a threat to public health due to worldwide exchanges and circulation of persons
• Grids open new perspectives for in silico drug discovery
– Reduced cost and an accelerating factor in the search for new drugs
International collaboration is required for:
• Early detection
• Epidemiological watch
• Prevention
• Search for new drugs
• Search for vaccines
[Map legend: avian influenza; bird casualties]
Presentation by Ying-Ta WU & Hurng-Chun LEE in the life sciences track on Wednesday
WISDOM
http://wisdom.healthgrid.org/
Mini Workshop on Thursday
Medical image processing: analysing tumours
• Pharmacokinetics: contrast agent diffusion study
– Co-registration of a time series of volumetric medical images to analyse the evolution of the diffusion of contrast agents
• Computational costs
– 20 patients: 2623 hours (co-registration + parametric image)
– Using a 20-processor computing farm: 146 hours
– Using the Grid: <20 hours
[Chart: Sequential vs. HPC vs. Grid]
If you have enough resources: 20 x 12 = 240 computers (EGEE has >30,000)
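The timings quoted above imply the following speedups relative to sequential execution. This is a small sketch using only the three figures from the slide; because the grid time is given as an upper bound (<20 hours), the grid speedup it computes is a lower bound. All names are illustrative.

```python
SEQUENTIAL_HOURS = 2623.0  # 20 patients, sequential run (from the slide)
FARM_HOURS = 146.0         # 20-processor computing farm (from the slide)
GRID_HOURS_UPPER = 20.0    # grid run took less than this (from the slide)
FARM_PROCESSORS = 20


def speedup(baseline_hours: float, parallel_hours: float) -> float:
    """Classic speedup: baseline wall time divided by parallel wall time."""
    return baseline_hours / parallel_hours


farm_speedup = speedup(SEQUENTIAL_HOURS, FARM_HOURS)
# Parallel efficiency: speedup divided by the number of processors used.
farm_efficiency = farm_speedup / FARM_PROCESSORS
# Lower bound only, since the slide reports "<20 hours" for the grid.
grid_speedup_min = speedup(SEQUENTIAL_HOURS, GRID_HOURS_UPPER)

if __name__ == "__main__":
    print(f"farm speedup:      {farm_speedup:.1f}x")
    print(f"farm efficiency:   {farm_efficiency:.0%}")
    print(f"grid speedup:    > {grid_speedup_min:.0f}x")
```

The farm comes out at roughly 18x on 20 processors (about 90% efficiency), while the grid run is at least about 131x faster than the sequential one.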
Example: Determining earthquake mechanisms
• Seismic software application determines epicentre, magnitude, mechanism
• Analysis of Indonesian earthquake (28 March 2005)
– Seismic data within 12 hours after the earthquake
– Analysis performed within 30 hours after the earthquake occurred
– Results: not an aftershock of the December 2004 earthquake; different location (different part of the fault line, further south); different mechanism
• Rapid analysis of earthquakes is important for relief efforts
Earth Science & Astronomy track today
Peru, June 23, 2001 Mw=8.4
Sumatra, March 28, 2005 Mw=8.5
Bioinformatics
GPS@: bioinformatics portal
– Web portal: http://gpsa.ibcp.fr/
– Access to up-to-date sequence and 3D-structure databanks (EMBL, GenBank, SWISSPROT, etc.)
– Tens of legacy bioinformatics codes
• Convenient, easy-to-use interface with access to well-known databanks
• Uses grid resources to analyse the sequences
Data, Data, Data
Slide by Carole Goble
Main trend
The size of data an organization owns, manages, and depends on is dramatically increasing:
– Ownership cost of storage capacity goes down
– Data generated and consumed goes up
– Network capacity goes up
– Distributed computing technology matures and is more widely adopted
How e-Infrastructures help e-Science
• e-Infrastructures provide easier access for
– Small research groups
– Scientists from many different fields
– Remote and still developing countries
• To new technologies
– Produce and store massive amounts of data
– Transparent access to millions of files across different administrative domains
– Low-cost access to resources
– Mobilise large amounts of CPU & storage on short notice (PC clusters)
– High-end facilities (supercomputers)
• And help to find new ways to collaborate
– Develops applications using distributed complex workflows
– Eases distributed collaborations
– Provides new ways of community building
– Gives easier access to higher education
EGEE
Flagship grid infrastructure project co-funded by the European Commission
Now in its 2nd phase with 91 partners in 32 countries
Objectives
• Provide a large-scale, production-quality grid infrastructure for e-Science
• Attract new resources and users from industry as well as science
• Maintain and further improve the gLite grid middleware
Applications on EGEE
• Multitude of applications from a growing number of domains
– Astrophysics
– Computational Chemistry
– Earth Sciences
– Financial Simulation
– Fusion
– Geophysics
– High Energy Physics
– Life Sciences
– Multimedia
– Material Sciences
– …
Keynote by Luigi Fusco on Wednesday
Book of abstracts: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-005.pdf
Production Usage Status
[Chart: number of sites, Apr 2004 to Dec 2006; y-axis 0 to 250]
• ~17.5 million jobs run (6450 cpu-years) in 2006
• Workloads of the non-HEP VOs are now significant: approaching 8-10K jobs per day and 1000 cpu-months/month
– One year ago this was the overall scale of work for all VOs
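The 2006 totals quoted above can be turned into per-job and per-day averages. The sketch below uses only the two totals from the slide (17.5 million jobs, 6450 cpu-years) and assumes a 365-day year and that "cpu-years" means accumulated CPU time of the jobs; variable names are illustrative.

```python
JOBS_2006 = 17.5e6        # total jobs run in 2006 (from the slide)
CPU_YEARS_2006 = 6450.0   # total CPU time consumed in 2006 (from the slide)
DAYS_PER_YEAR = 365       # assumption: non-leap year
HOURS_PER_YEAR = DAYS_PER_YEAR * 24

# Average CPU time consumed by a single job, in hours.
avg_cpu_hours_per_job = CPU_YEARS_2006 * HOURS_PER_YEAR / JOBS_2006

# Average number of jobs completed per day across all VOs.
avg_jobs_per_day = JOBS_2006 / DAYS_PER_YEAR

# 6450 cpu-years delivered within one calendar year is equivalent to
# this many CPUs being busy around the clock.
avg_busy_cpus = CPU_YEARS_2006

if __name__ == "__main__":
    print(f"average CPU time per job: {avg_cpu_hours_per_job:.1f} h")
    print(f"average jobs per day:     {avg_jobs_per_day:,.0f}")
    print(f"average busy CPUs:        {avg_busy_cpus:,.0f}")
```

Under these assumptions a typical job used a little over 3 CPU-hours and the infrastructure handled just under 48,000 jobs per day on average. By the same reasoning, 1000 cpu-months/month corresponds to about 1000 CPUs kept continuously busy.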
[Chart: number of CPUs, Apr 2004 to Dec 2006; y-axis 0 to 40000]
Grid operations & management track on Thursday
EGEE Middleware Distribution
• gLite
– Exploit experience and existing components from VDT (Condor, Globus), EDG/LCG, and others
– Develop a lightweight stack of generic middleware useful to EGEE applications (HEP and Life Sciences are pilot applications)
– Pluggable components: cater for different implementations
– Follow SOA approach, WS-I compliant where possible
– Focus is on re-engineering and hardening
– Business-friendly open source license (moving to Apache-2)
Tutorial held yesterday
Grid Middleware
[Diagram: three layers, top to bottom]
Applications
Higher-Level Grid Services: Workload Management, Replica Management, Visualization, Workflow, Grid Economies, ...
Foundation Grid Middleware: Security model and infrastructure, Computing (CE) and Storage Elements (SE), Accounting, Information and Monitoring
• Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware
• Higher-Level Grid Services are meant to help users build their computing infrastructure but should not be mandatory
• Foundation Grid Middleware will be deployed on the EGEE infrastructure
– Must be complete and robust
– Should allow interoperation with other major grid infrastructures
– Should not assume the use of Higher-Level Grid Services
gLite Grid Middleware Services
[Diagram: gLite service groups]
Access: CLI, API
Security: Authorization, Auditing, Authentication
Information & Monitoring: Information & Monitoring, Application Monitoring
Data Management: Metadata Catalog, Storage Element, File & Replica Catalog, Data Movement, Accounting
Workload Management: Job Provenance, Computing Element, Package Manager, Workload Management
Site Proxy
Overview paper: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf
Grid of Grids - from Local to Global
[Diagram labels: National, Campus, Community]
OSG sites
Keynote by Ruth Pordes on Wednesday
32 Virtual Organizations (participating groups)
– 3 with >1000 jobs max. (all particle physics)
– 3 with 500-1000 max. (all outside physics)
– 5 with 100-500 max. (particle, nuclear, and astro physics)
The DEISA supercomputing environment
(21,900 processors and 145 Tf in 2006, more than 190 Tf in 2007)
• IBM AIX super-cluster
– FZJ-Julich, 1312 processors, 8.9 teraflops peak
– RZG-Garching, 748 processors, 3.8 teraflops peak
– IDRIS, 1024 processors, 6.7 teraflops peak
– CINECA, 512 processors, 2.6 teraflops peak
– CSC, 512 processors, 2.6 teraflops peak
– ECMWF, 2 systems of 2276 processors each, 33 teraflops peak
– HPCx, 1600 processors, 12 teraflops peak
• BSC, IBM PowerPC Linux system (MareNostrum), 4864 processors, 40 teraflops peak
• SARA, SGI ALTIX Linux system, 416 processors, 2.2 teraflops peak
• LRZ, Linux cluster (2.7 teraflops) moving to SGI ALTIX system (5120 processors and 33 teraflops peak in 2006, 70 teraflops peak in 2007)
• HLRS, NEC SX8 vector system, 646 processors, 12.7 teraflops peak
• Systems interconnected with a dedicated 1 Gb/s network (currently upgrading to 10 Gb/s) provided by GEANT and NRENs
V. Alessandrini, IDRIS-CNRS (EGEE Workshop on Management of Rights in Production Grids, Paris, June 19th, 2006)
National Research Grid Infrastructure (NAREGI) 2003-2007
• Petascale Grid Infrastructure R&D for Future Deployment
– $45 mil (US) + $16 mil x 5 (2003-2007) = $125 mil total
– Hosted by National Institute of Informatics (NII) and Institute of Molecular Science (IMS)
– PL: Ken Miura (Fujitsu/NII); Sekiguchi (AIST), Matsuoka (Titech), Shimojo (Osaka-U), Aoyagi (Kyushu-U)…
– Participation by multiple (>= 3) vendors: Fujitsu, NEC, Hitachi, NTT, etc.
– Follow and contribute to GGF standardization, esp. OGSA
Keynote by Satoshi Matsuoka on Thursday
[Diagram: Focused “Grand Challenge” Grid Apps Areas: Nanotech Grid Apps (“NanoGrid”, IMS, ~10 TF); Biotech Grid Apps (BioGrid, RIKEN); Other Apps (other institutions). National Research Grid Middleware R&D: Grid Middleware; Grid and Network Management. Grid R&D Infrastructure, 15 TF-100 TF; SuperSINET. Participants shown: NEC, Osaka-U, Titech, AIST, Fujitsu, U-Tokyo, U-Kyushu, Hitachi]
Interoperability
• Interoperability between e-Infrastructures is essential to provide services to global user communities
• The “Grid-Interoperability-Now” (GIN) group within the Open Grid Forum is providing a good environment for practical developments
• Experience shows this work is most successful when it is driven by the needs of user communities
Middleware & interoperability track on Wednesday & Thursday
Collaborating e-Infrastructures
TWGRID
Potential for linking ~80 countries
Middleware Standards
Slide by Dave Snelling
Middleware Concepts
Slide by Dave Snelling
Co-located with OGF 20
www.eu-egee.org
The Future of Grids
• Increasing the number of infrastructure users by increasing awareness
– Dissemination and outreach
– Training and education
– Grids offer new opportunities for collaborative work
Education track on Wednesday
• Increasing the number of applications by improving application support and middleware functionality
– Increase stability, scalability, and usability
– Major efforts needed particularly on VO management, security infrastructure, data management, and job management
– High-level grid middleware extensions
• Increasing the grid infrastructure
– Increase manageability of Grid services
– Reducing the cost of operation
– Ensuring interoperability between infrastructures
• Protecting user investments
– Better involvement of industry
– Move towards a sustainable grid infrastructure
Industry & Government track today
Sustainability: Beyond EGEE-II
• Need to prepare for permanent Grid infrastructure
– Ensure reliable and adaptive support for all sciences
– Independent of short project funding cycles
– Infrastructure managed in collaboration with national grid initiatives
Presentation by Dieter Kranzlmueller today
EGEE’07 Conference
Building Bridges…
• Between science and business
• Between users and infrastructures
• Between countries
• Between scientific disciplines
• Between projects
• Etc.
http://www.eu-egee.org/egee07
Summary
• Grids are all about sharing – they are a means of working with groups around the world
– Today we have a window of opportunity to move grids from research prototypes to permanent production systems (as networks did a few years ago)
• Interoperability is key to providing the level of support required for our user communities
• EGEE operates the world’s largest multi-disciplinary grid infrastructure for scientific research
– In constant and significant production use
• Need to prepare for the long term
– EGEE, collaborating projects, national grid initiatives and user communities are working to define a model for a sustainable grid infrastructure that is independent of short project cycles
www.eu-egee.org