					HEP, Grids and the Networks
   They Depend Upon

 Shawn McKee (smckee@umich.edu)
        January 29, 2004
       University of Hawaii
                   Acknowledgements
• Disclaimer: This talk will be an overview from a
  physicist who is a grid/network user, rather than a
  computer scientist who is a grid/network expert!
• Much of this talk was constructed from various
  sources. I would like to thank:
   –   Rob Gardner (U Chicago)     – Sylvain Ravot (Caltech)
   –   Harvey Newman (Caltech)     – Charles Severance (Michigan)
                                   – Les Cottrell (SLAC)
   –   Ian Foster (U Chicago/ANL) – The Globus Team
   –   Jennifer Schopf (GriPhyN/ANL)


       January 29, 2004   U Hawaii - Shawn McKee - University of Michigan Physics   2
                               Outline
• The Grid: Overview & Definitions
    – Example Grid Uses
• HEP Motivations for the Grid
    – LHC Experiments and their scope
    – ATLAS as an example
•   Grid/Network Efforts at Michigan
•   Pending proposals
•   Grid Software
•   Future and Conclusions



              What is “The Grid”?

• There are many answers and interpretations
• The term was originally coined in the mid-1990s
  (in analogy with the power grid) and can be
  described as follows:
    “The grid provides flexible, secure, coordinated
    resource sharing among dynamic collections of
    individuals, institutions and resources (virtual
    organizations: VOs)”


                  Grid Perspectives
• User's Viewpoint:
  – A virtual computer which minimizes time to
    completion for my application while transparently
    managing access to inputs and resources
• Programmer's Viewpoint:
  – A toolkit of applications and APIs which provide
    transparent access to distributed resources
• Administrator's Viewpoint:
  – An environment to monitor, manage and secure access
    to geographically distributed computers, storage and
    networks.


                     The Grid Vision
“Resource sharing & coordinated problem
solving in dynamic, multi-institutional
virtual organizations”
 – On-demand, ubiquitous access to computing,
   data, and services
 – New capabilities constructed dynamically and
   transparently from distributed services
 “When the network is as fast as the computer's
  internal links, the machine disintegrates across
 the net into a set of special purpose appliances”
                         (George Gilder)
                 The Grid Problem
• Flexible, secure, coordinated resource sharing
  among dynamic collections of individuals,
  institutions, and resources
   From “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”

• Enable communities (“virtual organizations”)
  to share geographically distributed resources as
  they pursue common goals -- assuming the
  absence of…
   –   central location,
   –   central control,
   –   omniscience,
   –   existing trust relationships.
      Elements of the Problem
• Resource sharing
   – Computers, storage, sensors, networks, …
   – Sharing always conditional: issues of trust, policy,
     negotiation, payment, …
• Coordinated problem solving
   – Beyond client-server: distributed data analysis,
     computation, collaboration, …
• Dynamic, multi-institutional virtual orgs
   – Community overlays on classic org structures
   – Large or small, static or dynamic

Example Grid Projects
A Brain is a Lot of Data! (Mark Ellisman, UCSD)

And comparisons must be made among many

We need to get to one micron to know the location of every cell. We’re just now
   starting to get to 10 microns – Grids will help get us there and further
        Mathematicians Solve NUG30
• Looking for the solution to the
  NUG30 quadratic assignment
  problem:
  $$\min_{p} \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, b_{p(i)p(j)}$$
• An informal collaboration of
  mathematicians and computer
  scientists
• Condor-G delivered 3.46E8 CPU
  seconds in 7 days (peak 1009
  processors) in U.S. and Italy (8 sites)
• Permutation shown on the slide:
  14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,
  13,26,17,30,6,20,19,8,18,7,27,12,11,23
        MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
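
As a rough illustration of the objective on this slide, the sketch below evaluates the quadratic assignment cost for a permutation and brute-forces a toy instance. The function names and the 3x3 matrices are made up for illustration and are not NUG30 data. The last line works out the average processor count implied by the quoted CPU seconds.

```python
from itertools import permutations


def qap_cost(a, b, p):
    """Return sum_i sum_j a[i][j] * b[p[i]][p[j]] for permutation p."""
    n = len(p)
    return sum(a[i][j] * b[p[i]][p[j]] for i in range(n) for j in range(n))


def brute_force_qap(a, b):
    """Try every permutation; only feasible for tiny n (n = 30 has 30! candidates)."""
    n = len(a)
    return min(permutations(range(n)), key=lambda p: qap_cost(a, b, p))


# Toy 3x3 instance (hypothetical numbers, not NUG30 data).
flow = [[0, 5, 2],
        [5, 0, 3],
        [2, 3, 0]]
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]

best = brute_force_qap(flow, dist)
best_cost = qap_cost(flow, dist, best)

# Average processor count implied by the slide: CPU seconds / wall-clock seconds.
avg_processors = 3.46e8 / (7 * 24 * 3600)
```

Exhaustive search is hopeless at n = 30, which is why the real computation needed a smarter algorithm spread over many machines. The average of roughly 570 processors is consistent with the quoted peak of 1009.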
         Network for Earthquake
         Engineering Simulation
 • NEESgrid: national
   infrastructure to couple
   earthquake engineers with
   experimental facilities,
   databases, computers, &
   each other
 • On-demand access to
   experiments, data streams,
   computing, archives,
   collaboration

NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
              Grids at NASA: Aviation Safety
• Wing Models
   – Lift capabilities
   – Drag capabilities
   – Responsiveness
• Stabilizer Models
   – Deflection capabilities
   – Responsiveness
• Airframe Models
• Human Models
   – Crew capabilities: accuracy, perception, stamina, reaction times, SOPs
• Engine Models
   – Thrust performance
   – Reverse thrust performance
   – Responsiveness
   – Fuel consumption
• Landing Gear Models
   – Braking performance
   – Steering capabilities
   – Traction
   – Dampening capabilities
  Grids and Industry: Early Examples
• Entropia: Distributed computing (BMS, Novartis, …)
• Butterfly.net: Grid for multi-player games
      Data Grids for High Energy Physics
• CERN/Outside Resource Ratio ~1:2; Tier0/(Σ Tier1)/(Σ Tier2) ~1:1:1
• Into the Online System: ~PByte/sec
• Online System to Tier 0+1: ~100-400 MBytes/sec
• Tier 0+1: Offline Farm, CERN Computer Ctr, ~25 TIPS
• Link into Tier 1: 10+ Gbits/sec
• Tier 1: France, UK, Italy, BNL Center
• Tier 2: Tier2 Centers
• Link into Tier 3: ~10+ Gbps
• Tier 3: Institutes (~0.25 TIPS), with a physics data cache
• Link into Tier 4: 100 - 10000 Mbits/sec
• Tier 4: Workstations
• Physicists work on analysis “channels”; each institute has ~10 physicists
  working on one or more channels
ATLAS version from Harvey Newman’s original
                    Broader Context
• “Grid Computing” has much in common with
  major industrial thrusts
   – Business-to-business, Peer-to-peer, Application
     Service Providers, Storage Service Providers,
     Distributed Computing, Internet Computing…
• Sharing issues not adequately addressed by
  existing technologies
   – Complicated requirements: “run program X at site Y
     subject to community policy P, providing access to
     data at Z according to policy Q”
   – High performance: unique demands of advanced &
     high-performance systems
                      Why Now?
• Moore’s law improvements in computing
  produce highly functional end systems
• The Internet and burgeoning wired and
  wireless networks provide universal connectivity
• Changing modes of working and problem
  solving emphasize teamwork, computation
• Network exponentials produce dramatic
  changes in geometry and geography

     Living in an Exponential World
        (1) Computing & Sensors
Moore’s Law: transistor count doubles every 18 months

[Figure: Magnetohydrodynamics, star formation]
 Living in an Exponential World:
            (2) Storage
• Storage density doubles every 12 months
• Dramatic growth in online data (1 petabyte = 1000
  terabytes = 1,000,000 gigabytes)
   –   2000        ~0.5 petabyte
   –   2005        ~10 petabytes
   –   2010        ~100 petabytes
   –   2015        ~1000 petabytes?
• Transforming entire disciplines in physical and,
  increasingly, biological sciences; humanities next?


                    Network Exponentials
           • Network vs. computer performance
                – Computer speed doubles every 18 months
                – Network speed doubles every 9 months
                – Difference = order of magnitude per 5 years
           • 1986 to 2000
                – Computers: x 500
                – Networks: x 340,000
           • 2001 to 2010
                – Computers: x 60
                – Networks: x 4000
Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American
(Jan-2001) by Cleo Vilett, source Vinod Khosla, Kleiner Perkins Caufield & Byers.
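
The figures on this slide that follow directly from the doubling periods can be checked with a few lines of arithmetic. This is a minimal sketch: `growth_factor` is a hypothetical helper name, and 2001 to 2010 is taken to be 9 years. The 1986 to 2000 figures are historical observations and are not reproduced here.

```python
def growth_factor(years, doubling_months):
    """Multiplicative improvement after `years`, given a doubling period in months."""
    return 2 ** (years * 12 / doubling_months)


# Five-year comparison: computers double every 18 months, networks every 9.
computers_5y = growth_factor(5, 18)     # roughly 10x
networks_5y = growth_factor(5, 9)       # roughly 100x
ratio_5y = networks_5y / computers_5y   # roughly 10x, one order of magnitude

# 2001 to 2010, taken here as 9 years.
computers_9y = growth_factor(9, 18)     # 2**6, which the slide rounds to x 60
networks_9y = growth_factor(9, 9)       # 2**12, which the slide rounds to x 4000
```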
                          The Network
• As can be seen in the previous transparency, it can be
  argued it is the evolution of the network which has been
  the primary motivator for the Grid.
• Ubiquitous, dependable worldwide networks have
  opened up the possibility of tying together
  geographically distributed resources
• The success of the WWW for sharing information has
  spawned a push for a system to share resources
• The network has become the “virtual bus” of a virtual
  computer.
• More on this later…
  Motivation for the Grid
(and therefore the network)
     A HEP Perspective
                   The Problem




                   The Solution




      Four LHC Experiments: The
     Petabyte to Exabyte Challenge
   ATLAS, CMS, ALICE, LHCb
Higgs + New particles; Quark-Gluon Plasma; CP Violation




 Data stored           ~40 Petabytes/Year and UP;
 CPU                       0.30 Petaflops and UP
  0.1 Exabytes (2007) to 1 Exabyte (~2012 ?) for the LHC Experiments
  (1 EB = 10^18 Bytes)
          How Much Data is Involved?
[Figure: Level 1 Rate (Hz), 10^2 to 10^6, versus Event Size (bytes), 10^4 to 10^7.
 Experiments plotted: LEP, UA1, NA49, H1, ZEUS, ALICE, CDF/D0, TeV II, KLOE, HERA-B, CMS, ATLAS, LHCb.
 Annotations: High Level-1 Trigger (1 MHz); High No. Channels, High Bandwidth (500 Gbit/s);
 High Data Archive (PetaByte).
 Source: Hans Hoffman, DOE/NSF Review, Nov 00]
                                 ATLAS
• A Toroidal LHC Apparatus
• Collaboration
  – 150 institutes
  – 1850 physicists
• Detector
  –   Inner tracker
  –   Calorimeter
  –   Magnet
  –   Muon
• United States ATLAS
  – 29 universities, 3 national labs
  – 20% of ATLAS

            Discovery Potential for SM
                  Higgs Boson
• Good sensitivity over the full mass range from ~100 GeV to ~1 TeV
• For most of the mass range at least two channels are available
• Detector performance is crucial: b-tag, leptons, γ, E resolution,
  γ / jet separation, ...
[Figure axis label: S/√B]
         H → γγ




                      ATLAS




H → ZZ* → e+ e− μ+ μ−
     Data Flow from ATLAS




                                                     ATLAS: 9 PB/y
                                                       ~ one million PC hard drives!

Data Flow Analysis by V. Lindenstruth




            Data Intensive
          Computing and Grids
• The term “Data Grid” is often used
   – Unfortunate as it implies a distinct infrastructure,
     which it isn’t; but easy to say
• Data-intensive computing shares numerous
  requirements with collaboration,
  instrumentation, computation, …
   – Security, resource mgt, info services, etc.
• Important to exploit commonalities as very
  unlikely that multiple infrastructures can be
  maintained
• Fortunately this seems easy to do!

 Data Intensive Issues Include …
• Harness [potentially large numbers of] data,
  storage, network resources located in distinct
  administrative domains
• Respect local and global policies governing what
  can be used for what
• Schedule resources efficiently, again subject to
  local and global constraints
• Achieve high performance, with respect to both
  speed and reliability
• Catalog software and virtual data

         Examples of
Desired Data Grid Functionality
•   High-speed, reliable access to remote data
•   Automated discovery of “best” copy of data
•   Manage replication to improve performance
•   Co-schedule compute, storage, network
•   “Transparency” wrt delivered performance
•   Enforce access control on data
•   Allow representation of “global” resource
    allocation policies
                 HEP Data Analysis

• Raw data
   – hits, pulse heights
• Reconstructed data (ESD)
   – tracks, clusters…
• Analysis Objects (AOD)
   – Physics Objects
   – Summarized
   – Organized by physics topic
• Ntuples, histograms,
  statistical data

                     Production Analysis
• Data chain: Trigger System → Data Acquisition → Level 3 trigger → Raw data
  (with Trigger Tags) → Reconstruction → Event Summary Data (ESD) and Event Tags
• Inputs to the data chain: Run Conditions, Calibration Data
• Monte Carlo chain: Physics Models → Monte Carlo Truth Data → Detector Simulation →
  MC Raw Data → Reconstruction → MC Event Summary Data and MC Event Tags
• Coordination required at the collaboration and group levels
                     Physics Analysis
• Tier 0,1 (collaboration wide): ESD and Event Tags feed Event Selection
• Tier 2 (analysis groups): Analysis Objects, Calibration Data and Raw Data feed
  Analysis Processing
• Tier 3, 4 (physicists): PhysicsObjects and StatObjects feed Physics Analysis
A Model Architecture for Data Grids
• Application sends an Attribute Specification to the Metadata Catalog
• Metadata Catalog returns a Logical Collection and Logical File Name
• Replica Catalog maps the logical name to Multiple Locations
• Replica Selection picks the Selected Replica, using Performance Information &
  Predictions from MDS and NWS
• Data moves over GridFTP (Control Channel and Data Channel)
• Replica Location 1: Disk Array
• Replica Location 2: Disk Cache, Tape Library
• Replica Location 3: Disk Cache
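
The replica selection step in this architecture can be sketched as follows. This is only an illustration under stated assumptions: the `Replica` class, the catalog layout, the site names and URLs, and the bandwidth forecasts are all hypothetical, and the code does not use any real Globus, MDS or NWS API.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    site: str                  # hypothetical site name
    url: str                   # hypothetical physical location of the file
    predicted_mb_per_s: float  # bandwidth forecast, assumed to come from a service such as NWS


def resolve(logical_name, replica_catalog):
    """Map a logical file name to all known physical replicas."""
    return replica_catalog.get(logical_name, [])


def select_replica(logical_name, replica_catalog, size_mb):
    """Pick the replica with the smallest predicted transfer time."""
    candidates = resolve(logical_name, replica_catalog)
    if not candidates:
        raise LookupError(f"no replica registered for {logical_name!r}")
    return min(candidates, key=lambda r: size_mb / r.predicted_mb_per_s)


# Hypothetical catalog contents for one logical file.
catalog = {
    "lfn:run42.esd": [
        Replica("site-a", "gsiftp://site-a.example.org/data/run42.esd", 12.0),
        Replica("site-b", "gsiftp://site-b.example.org/data/run42.esd", 45.0),
        Replica("site-c", "gsiftp://site-c.example.org/data/run42.esd", 30.0),
    ]
}

chosen = select_replica("lfn:run42.esd", catalog, size_mb=2000)
```

A real selector would also weigh storage latency (a tape library versus a disk cache) and access policy. Predicted bandwidth alone is used here to keep the sketch short.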
Who is working on the Grid?

       HEP Perspective
           HENP Related Data Grid
                 Projects
Funded Projects
  Project        Region  Agency  Funding          Period
  PPDG I         USA     DOE     $2M              1999-2001
  GriPhyN        USA     NSF     $11.9M + $1.6M   2000-2005
  EU DataGrid    EU      EC      €10M             2001-2004
  PPDG II (CP)   USA     DOE     $9.5M            2001-2004
  iVDGL          USA     NSF     $13.7M + $2M     2001-2006
  DataTAG        EU      EC      €4M              2002-2004
  GridPP         UK      PPARC   >$15M?           2001-2004
Many national projects of interest to HENP
  – Initiatives in US, UK, Italy, France, NL, Germany, Japan, …
  – EU networking initiatives (Géant, SURFNet)
  – US Distributed Terascale Facility ($53M, 12 TFL, 40 Gb/s network)
  – ETF (Extended Terascale Facility) ($20M, ? TFL, 10 Gb/s network)
Grid Physics Network (GriPhyN)
     Enabling R&D for advanced data grid systems,
     focusing in particular on the Virtual Data concept

• Users: Production Team, Individual Investigator, Other Users
• Interactive User Tools
• Virtual Data Tools; Request Planning and Scheduling Tools;
  Request Execution Management Tools
• Resource Management Services; Security and Policy Services;
  Other Grid Services
• Transforms, Raw data source
• Distributed resources (code, storage, computers, and network)
• Experiments: ATLAS, CMS, LIGO, SDSS
     Virtual Data in
         Action
• Data request may
    –   Compute locally
    –   Compute remotely
    –   Access local data
    –   Access remote data
• Scheduling based on
    – Local policies
    – Global policies
    – Cost
• More on this later
[Figure labels: Fetch item; Major facilities, archives; Regional facilities, caches;
 Local facilities, caches]
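
One way to picture the decision on this slide is a planner that filters the four options by policy and then takes the cheapest. The sketch below is an assumption-laden toy: the `Option` class, the cost numbers and the single `allowed` flag standing in for local and global policy checks are all hypothetical and do not reflect the GriPhyN implementation.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str      # one of the four ways to satisfy the request
    cost: float    # estimated cost in arbitrary units (hypothetical)
    allowed: bool  # combined outcome of local and global policy checks


def plan_request(options):
    """Return the cheapest option that policy allows, or None if none is allowed."""
    permitted = [o for o in options if o.allowed]
    return min(permitted, key=lambda o: o.cost, default=None)


options = [
    Option("compute locally", cost=120.0, allowed=True),
    Option("compute remotely", cost=45.0, allowed=False),          # blocked by a policy
    Option("access local data", cost=float("inf"), allowed=True),  # no local copy exists
    Option("access remote data", cost=60.0, allowed=True),
]

choice = plan_request(options)
```

Here the cheapest option overall is ruled out by policy, so the planner falls back to fetching the remote copy.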
  iVDGL: A Global Grid Laboratory
 “We propose to create, operate and evaluate, over a
 sustained period of time, an international research
 laboratory for data-intensive science.”
                             From NSF proposal, 2001
• International Virtual-Data Grid Laboratory
   – A global Grid laboratory (US, Europe, Asia, South America, …)
   – A place to conduct Data Grid tests “at scale”
   – A mechanism to create common Grid infrastructure
   – A laboratory for other disciplines to perform Data Grid tests
   – A focus of outreach efforts to small institutions
• U.S. part funded by NSF (2001-2006)
   – $13.7M (NSF) + $2M (matching)
        Transatlantic Interoperability

• iVDGL-2 Milestone (November 02)
   – iGOC
   – Outreach
   – ATLAS: ANL, BNL, BU, HU, IU, LBL, UM, OU, UTA
   – CMS: CIT, UCSD, UF, FNAL
   – CS Research: ANL, UC, UCB, IU, ISI, NU, UW
   – LIGO: CIT, PSU, UTB, UWM
   – SDSS/NVO: UC, FNAL, JHU
   – DataTAG: CERN, INFN, UK PPARC, U of A
     TeraGrid: NCSA, ANL, SDSC, Caltech




www.teragrid.org
                Initial TeraGrid Design
  Site      Processors       Peak            Nodes   RAID storage
  Caltech   384 McKinley     1.5 Teraflops   96      125 TB
  ANL       384 McKinley     1.5 Teraflops   96      125 TB
  SDSC      768 McKinley     3 Teraflops     192     250 TB
  NCSA      2024 McKinley    8 Teraflops     512     250 TB
Sites interconnected by a DWDM Optical Mesh
             NSFNET 56 Kb/s Site
                Architecture
Bandwidth in terms of burst data transfer and user wait time.
Figure labels: VAX, Fuzzball; distances range from across the room to across the country.

  Payload   Rate         Wait time
  1024 MB   4 MB/s       256 s (4 min)
  1024 MB   1 MB/s       1024 s (17 min)
  1024 MB   .007 MB/s    150,000 s (41 hrs)
2002 Cluster-WAN Architecture


Figure labels: n x GbE (small n), OC-12, OC-48 Cloud.

  Payload   Path                 Rate        Wait time
  1 TB      Across the room      0.5 GB/s    2000 s (33 min)
  1 TB      Across the country   78 MB/s     13k s (3.6 h)
        Distributed Terascale Cluster
                                            Interconnect

Figure labels: n x GbE (large n), OC-192, Big Fast Interconnect.

  Payload                          Rate      Wait time
  10 TB (shown at both ends)       5 GB/s    2000 s (33 min)
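
The wait times quoted on the three architecture slides are payload size divided by sustained rate. The sketch below reproduces them, assuming decimal units (1 MB = 10^6 bytes) and taking the 56 Kb/s NSFNET link as 56,000 bits per second. `transfer_seconds` is a hypothetical helper that ignores protocol overhead and latency.

```python
def transfer_seconds(size_bytes, rate_bytes_per_s):
    """Ideal transfer time: payload size divided by sustained rate."""
    return size_bytes / rate_bytes_per_s


MB, GB, TB = 10**6, 10**9, 10**12

# NSFNET era: 1024 MB payload.
t_room_nsfnet = transfer_seconds(1024 * MB, 4 * MB)         # 256 s
t_country_nsfnet = transfer_seconds(1024 * MB, 56_000 / 8)  # 56 Kb/s is 7000 bytes/s

# 2002 cluster-WAN: 1 TB payload.
t_room_2002 = transfer_seconds(1 * TB, 0.5 * GB)            # 2000 s
t_country_2002 = transfer_seconds(1 * TB, 78 * MB)          # slide rounds to 13k s

# Distributed terascale cluster: 10 TB payload.
t_terascale = transfer_seconds(10 * TB, 5 * GB)             # 2000 s
```

The cross-country NSFNET figure comes out slightly under the slide's rounded 150,000 s. The point of the slides survives the rounding: a payload 10,000 times larger moves across the terascale interconnect in about the same time that 1 GB once took to cross a LAN.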
Grid and Network Related Work at
            Michigan
                       MGRID Center

• Central core of technical staff (new hires – 3 FTE)
• Faculty and staff from participating units
• Executive committee from participating units and
  the provost office
• Collaborative grid research and development with
  technical staff from participating units
• Primary goal: develop and deploy an Institutional
  Grid for the University of Michigan


MGrid Research Project Partners
• College of LS&A (Physics) (www.lsa.umich.edu)
• Center for Information Technology Integration
  (www.citi.umich.edu)
• Michigan Center for
  BioInformatics (www.ctaalliance.org)
• Visible Human Project (vhp.med.umich.edu)
• Center for Advanced Computing
  (cac.engin.umich.edu)
• Mental Health Research Institute
  (www.med.umich.edu/mhri)
• ITCom (www.itcom.itd.umich.edu)
• School of Information (si.umich.edu)
                       MGRID: Goals
• Focused on enabling and extending existing
  projects to deliver a usable, powerful grid
  infrastructure (Grid filesystem, collaborative tools,
  AAA, user portals, integrated middleware)
• Provide participating units knowledge, support
  and a framework to deploy Grid technologies
• Provide test bench for existing and emerging Grid
  technologies
• See our website, http://www.mgrid.umich.edu
  for more information


          Internet2 HENP Network SIG
                    http://henp.internet2.edu
• To help ensure that the required
    –   National and international network infrastructures
    –   Standardized tools and facilities for high performance and
        end-to-end monitoring and tracking, and
    –   Collaborative systems
• are developed and deployed in a timely manner,
  and used effectively to meet the needs of the US LHC and
  other major HENP Programs, as well as the general needs of
  our scientific community.
• To carry out these developments in a way that is broadly
  applicable across many fields, within and beyond the scientific
  community
•    Co-Chairs: S. McKee (Michigan), H. Newman (Caltech); With
    thanks to R. Gardner and J. Williams (Indiana)
           USATLAS Data Grid Testbed
[Figure: map of testbed sites and the networks connecting them.]
• Sites: U Michigan, Boston University, UC Berkeley, LBNL-NERSC,
  Argonne National Laboratory, Brookhaven National Laboratory,
  University of New Mexico, University of Oklahoma, Indiana University,
  Southern Methodist University, University of Texas at Arlington
• Network labels: ESnet, MREN, NPACI, Abilene, CalREN, NTON
• Legend: HPSS sites
Michigan Grid Testbed Layout




           Introduction to Grid3
• Grid3 is a coordinated project between US LHC
  experiments (US ATLAS, US CMS), grid projects
  (iVDGL, GriPhyN, PPDG), and computing projects
  (LIGO, SDSS, BTeV)
• Purpose of Grid3 is to build a multi-experiment
  multi-VO grid environment
   − Test the infrastructure and services for production
     and analysis of scientific experiments
   − Provide a platform for technology demonstrators




Grid3 is supported by the National Science Foundation and the
Department of Energy
                 The Grid3 Project
• Grid3 is running at 28 sites
• The peak processor count is ~2800 CPUs
• There are 6 virtual organizations (VO)
       –   SDSS
       –   ATLAS
       –   iVDGL
       –   USCMS
       –   LIGO (now LIGO Scientific Collaboration, LSC)
       –   BTeV
•     There are currently 11 applications
•     Resources are dynamically rolled in/out
•     Applications are dynamically installed
•     Grid3 provides a base for a persistent grid




                      NMI Overview

• New NSF Program - Anticipated (by the
  NSF) to be “the next NSFnet”
• Initial funding of ~$12 million in 3-year
  awards in 3 categories: System Integrator,
  Service Provider, Application Developer
• Program expected to grow substantially
  over time


             NMI Press Release




            Michigan NMI Testbed
Michigan has been selected as an “unsponsored” NMI testbed
  member.
    – Shawn McKee is technical lead, Victor Wong is admin. lead
Goals are to:
• Develop and release a first version of GRIDS and Middleware
  software
• Develop security and directory architectures, mechanisms and best
  practices for campus integration
• Put in place associated support and training mechanisms
• Develop partnership agreements with external groups focused on
  adoption of software
• Put in place a communication and outreach plan
• Develop a community repository of NMI software and best
  practices

  ATLAS Collaboratory Project

     A U-M project to develop and use new
 collaboratory tools in support of training and
      research in global scientific projects

• Homer A. Neal, Jeremy Herr, Shawn
  McKee, Steven Goldfarb (Physics)
• Charles Severance (UM Media Union)
• Giosué Vitaglione (Telecom Italia)


                         WLAP Server

• Ann Arbor server: http://www.wlap.org
  – Dell PowerEdge 2500 dual 1 GHz Pentium III
  – 180 GB of RAID5 disk storage
  – Triple hot-swappable power supplies and UPS
  – 100 Mb/s network connection
  – Linux w/ Apache server
  – 300 Web Presentations
  – Originally a mirror, now the primary server.
  – Includes new Michigan content.

[Figure: UM-WLAP: Visits/Week. Series: visits, visits > 1 min, visits > 5 min.
Vertical axis: 0 to 450. Horizontal axis: dates from 6/17/01 to 3/17/02.
Annotations: GEANT4 Web Lectures announced, USA Today article, Christmas,
server down, network down.]
  Trying to Address Problems
Pending Proposals: GECSR and
           UltraLight
 GEC-SR -- Global Grid Enabled Collaboratory for Scientific Research
• Medium ITR submitted in Feb 2003, not funded (but close).
• Submitted to NSF MPS/PHYS PIF program…program delayed 1
   year…resubmit to ITR again
• Research and Extend the envelope for the human-human
   interactions through the Grid
• PI: Homer Neal (University of Michigan),
   –   Persistent Collaboration: on the desktop, in small and large conference rooms, in halls
       and in Virtual Control Rooms;
   –   A flexible, extensible structure of hierarchical (role-based), openly persistent, and ad
       hoc peer groups, extending the concepts in Virtual Organization management
   –   A "Language of Access" and Control: to people, meetings, casual conversations;
       scheduled and "on demand".
   –   Human-System-Human as well as Human-Human interactions for Collaborative
       Work: to make the ensemble of human-to-human interactions, and joint interactions
       between humans and the system, consistent and tolerable.
   –   Evaluation, Evolution and Optimization of the System using proven, effective,
       iterative evaluation methods
   –   Collaborative agent-based decision support. For data-intensive transactions; system
       and humans decide what to do "together".
                                   GECSR
•   Will build on:
    –    VRVS from Caltech
    –    CHEF CompreHensive collaborative Framework, used by
         NEESGrid, from Michigan
    –    WLAP - recording and web playback of audio, video and
         PowerPoint slides at the University of Michigan.
    –    HEPBook collaborative notebook from Fermilab
    –    Pervasive Collaborative Computing Environment(PCCE), from
         LBNL,
    –    MonaLisa monitoring and information system from Caltech.


•   4-year program of work:
    –    Delivering New Collaborative Capability
    –    Augmenting Dynamic Collaboration Capability
    –    Extending Dynamic Collaboration
[Figure: portal architecture diagram. Labels: PORTAL; Native Portlet;
Applet Portlet; JAVA Web Start Portlet; Desktop Launch Portlet; Access Portlet;
Grid And Other Services; Application Which Supports Grid Services and/or
Clarens; Clarens.]
                     What is UltraLight?
• UltraLight is a program to explore the integration of cutting-edge
  network technology with the grid computing and data infrastructure of
  HEP/Astronomy
• The program intends to explore network configurations from common
  shared infrastructure (current IP networks) thru dedicated optical paths
  point-to-point.
• A critical aspect of UltraLight is its integration with two driving
  application domains in support of their national and international
  eScience collaborations: LHC-HEP and eVLBI-Astronomy
• The Collaboration includes:
    –   Caltech
    –   Florida Int. Univ.
    –   MIT
    –   Univ. of Florida
    –   Univ. of Michigan
    –   UC Riverside
    –   BNL
    –   FNAL
    –   SLAC
    –   UCAID/Internet2


     UltraLight Network: PHASE I

• Implementation
  via “sharing” with
  HOPI/NLR

• MIT not yet
  “optically”
  coupled


   UltraLight Network: PHASE III

• Move into
  production
• Optical switching
  fully enabled
  amongst primary
  sites
• Integrated
  international
  infrastructure
Identifying the pieces of the LHC Computing
       Infrastructure - and where they fit




Globus V3/V4…
                          “Web Services”
• Increasingly popular standards-based framework for
  accessing network applications
   – W3C standardization
   – Broad industry support
   – Independent of implementation technology
• WSDL: Web Services Description Language
   – Interface Definition Language for Web services
• SOAP: Simple Object Access Protocol
   – XML-based RPC protocol
   – Common WSDL target
• Many more specifications in the pipeline
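
To make the SOAP bullet concrete, here is a minimal sketch of an XML-based RPC request wrapped in a SOAP 1.1 envelope, built with the Python standard library only. The service namespace, the operation name getJobStatus and the jobId parameter are hypothetical and chosen for illustration; they do not come from Globus or from any real grid service.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:grid-job-service"  # hypothetical service namespace


def build_soap_request(operation, params):
    """Return a SOAP 1.1 envelope (as a string) calling `operation` with `params`."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The RPC call is a single element inside Body, named after the operation.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        arg = ET.SubElement(call, f"{{{SERVICE_NS}}}{name}")
        arg.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_soap_request(xml_text):
    """Extract (operation, params) from an envelope built by build_soap_request."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = list(body)[0]
    operation = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


# Hypothetical operation and parameter, for illustration only.
request_xml = build_soap_request("getJobStatus", {"jobId": 42})
```

In a real deployment the operation and message shapes would be taken from the service's WSDL description rather than written by hand, and the envelope would be sent over a transport such as HTTP.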


OGSI Grid Service Specification

• Defines WSDL conventions and GSDL
  extensions
  – For describing and structuring services
  – Working with W3C WSDL working group to drive
    GSDL extensions into WSDL
• Defines fundamental interfaces (using WSDL)
  and behaviors that define a Grid Service
  – A unifying framework for interoperability &
    establishment of total system properties


                               Architecture Framework
[Figure: layered architecture stack, top to bottom:]
• Applications
• System Management Services; Grid Services
• Open Grid Services Architecture (OGSA)
• OGSI – Open Grid Services Infrastructure
• Web Services
• OGSA Enabled: Security, Workflow, Database, File Systems, Directory, Messaging
• OGSA Enabled: Servers, Storage, Network
• Side labels: Integrated Functionality, Autonomic Capabilities, Professional Services
Back to Virtual Data and Grid
         Toolkits…
                Virtual Data Queries
• A query for events:
   – Really means asking whether an input data sample, corresponding to a
     set of calibrations, methods, and perhaps Monte Carlo history,
     matches a set of criteria
• It is vital to know, for example:
   – What data sets already exist, and in which formats (ESD,
     AOD, Physics Objects)? If none exist, can they be materialized?
   – Was this data calibrated optimally?
   – If I want to recalibrate a detector, what is required?
• Methods:
   – Virtual data catalogs and APIs
   – Data signatures
• Interface to Event Selector Service
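
One way to picture the "data signatures" and "virtual data catalogs" bullets is the sketch below. It assumes that a signature is simply a hash over the recipe that produces a data product (transformation name, version, calibration tag and the signatures of its inputs). That choice, the VirtualDataCatalog class and every name in it are assumptions made for illustration; they are not the GriPhyN or ATLAS design.

```python
import hashlib
import json


def data_signature(transformation, version, calibration_tag, input_signatures):
    """Return a hex digest identifying how a data product was (or would be) made."""
    recipe = {
        "transformation": transformation,
        "version": version,
        "calibration": calibration_tag,
        "inputs": sorted(input_signatures),  # input order should not matter
    }
    # Canonical JSON so the same recipe always hashes to the same value.
    canonical = json.dumps(recipe, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class VirtualDataCatalog:
    """Toy in-memory catalog mapping signatures to file locations (hypothetical)."""

    def __init__(self):
        self._entries = {}

    def register(self, signature, location):
        self._entries[signature] = location

    def lookup(self, signature):
        # None means the product has not been materialized yet.
        return self._entries.get(signature)


# Hypothetical transformation names, versions and calibration tags.
raw_sig = data_signature("daq", "1", "none", [])
esd_sig = data_signature("reconstruct", "7.0", "calib-2003-11", [raw_sig])
esd_recalibrated_sig = data_signature("reconstruct", "7.0", "calib-2004-01", [raw_sig])

catalog = VirtualDataCatalog()
catalog.register(esd_sig, "/store/esd/run1.root")
```

With this scheme the question "was this data calibrated optimally?" becomes a lookup: the recalibrated recipe has a different signature, so the catalog reports that its output does not exist yet and would need to be materialized.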
              Virtual Data Scenario
• A physicist issues a query for events
   – Issues:
        • How expressive is this query?
        • What is the nature of the query?
        • What language (syntax) will be supported for the query?
   – Algorithms are already available in local shared libraries
   – For ATLAS, an Athena service consults an ATLAS Virtual Data
     Catalog or Registry Service
• Three possibilities
   – File exists on local machine
        • Analyze it
   – File exists in a remote store
        • Copy the file, then analyze it
   – File does not exist
        • Generate, reconstruct, analyze; possibly done remotely, then copied
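
The three possibilities above can be sketched as a single lookup function. This is a minimal illustration, not the Athena or Chimera implementation: two local directories stand in for the local machine and the remote store, and resolve_dataset, the file names and the generate callback are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def resolve_dataset(name, local_dir, remote_dir, generate):
    """Return (path, how) for dataset `name`, materializing it if needed.

    local_dir and remote_dir stand in for a local disk and a remote store;
    `generate` is a callable that writes the dataset to a given path.
    """
    local_path = Path(local_dir) / name
    if local_path.exists():
        return local_path, "local"       # case 1: analyze it directly

    remote_path = Path(remote_dir) / name
    if remote_path.exists():
        shutil.copy(remote_path, local_path)
        return local_path, "copied"      # case 2: copy the file, then analyze

    generate(local_path)
    return local_path, "generated"       # case 3: generate/reconstruct first


def make_events(path):
    """Placeholder for generation and reconstruction (hypothetical)."""
    path.write_text("generated events")


with tempfile.TemporaryDirectory() as local, tempfile.TemporaryDirectory() as remote:
    (Path(remote) / "run2.aod").write_text("remote events")
    results = [
        resolve_dataset("run2.aod", local, remote, make_events)[1],  # only in remote store
        resolve_dataset("run2.aod", local, remote, make_events)[1],  # now cached locally
        resolve_dataset("run3.aod", local, remote, make_events)[1],  # exists nowhere
    ]
```

A real system would also weigh the cost of copying against the cost of regenerating remotely; this sketch always prefers an existing copy.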

                                 Chimera Application:
                           Sloan Digital Sky Survey Analysis
• Question: size distribution of galaxy clusters?
• Approach: Chimera Virtual Data System + iVDGL Data Grid (many CPUs)
[Figure: galaxy cluster size distribution, log-log plot. Horizontal axis:
Number of Galaxies (1 to 100). Vertical axis: 1 to 100000.]
Programs as Community Resources:
  Data Derivation and Provenance
• Most [scientific] data are not simple
  “measurements”; essentially all are:
   – Computationally corrected/reconstructed
   – And/or produced by numerical simulation
• And thus, as data and computers become ever larger
  and more expensive:
   – Programs are significant community resources
   – So are the executions of those programs
• Management of the transformations that map
  between datasets is an important problem

                 Motivations (1)
[Diagram: three linked entities, Data, Transformation and Derivation.
Link labels: created-by (Data, Transformation); consumed-by/generated-by
(Data, Derivation); execution-of (Transformation, Derivation).]
• “I’ve come across some interesting data, but I need to understand the
  nature of the corrections applied when it was constructed before I can
  trust it for my purposes.”
• “I’ve detected a calibration error in an instrument and want to know
  which derived data to recompute.”
• “I want to search an ATLAS event database for events with certain
  characteristics. If a program that performs this analysis exists, I
  won’t have to write one from scratch.”
• “I want to apply a jet analysis program to millions of events. If the
  results already exist, I’ll save weeks of computation.”
                 Motivations (2)
• Data track-ability and result audit-ability
   – Universally sought by GriPhyN applications
• Repair and correction of data
   – Rebuild data products (cf. “make”)
• Workflow management
   – A new, structured paradigm for organizing, locating,
     specifying, and requesting data products
• Performance optimizations
   – Ability to re-create data rather than move it
• And others, some we haven’t thought of
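
The "repair and correction" motivation, and the earlier question of which derived data to recompute after a calibration error, amounts to a walk over recorded derivations. The sketch below assumes each derivation stores the transformation it executed plus the data it consumed and generated; the record layout and all dataset names are hypothetical, not the Chimera schema.

```python
from collections import deque

# Each derivation records the transformation it executed, the data it
# consumed, and the data it generated (all names here are hypothetical).
DERIVATIONS = [
    {"transformation": "reconstruct", "inputs": ["raw.run1", "calib.v1"], "outputs": ["esd.run1"]},
    {"transformation": "summarize", "inputs": ["esd.run1"], "outputs": ["aod.run1"]},
    {"transformation": "reconstruct", "inputs": ["raw.run2", "calib.v2"], "outputs": ["esd.run2"]},
    {"transformation": "jet_analysis", "inputs": ["aod.run1"], "outputs": ["jets.run1"]},
]


def stale_products(bad_input, derivations):
    """Return every data product derived, directly or indirectly, from bad_input."""
    stale = set()
    queue = deque([bad_input])
    while queue:
        current = queue.popleft()
        for derivation in derivations:
            if current in derivation["inputs"]:
                for product in derivation["outputs"]:
                    if product not in stale:
                        stale.add(product)
                        queue.append(product)  # follow the chain downstream
    return stale


# A calibration error in calib.v1 invalidates everything built on it.
to_recompute = sorted(stale_products("calib.v1", DERIVATIONS))
```

As with make, only the products downstream of the changed input are rebuilt; esd.run2 used a different calibration and is left alone.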

The Future of e-Science…
Current Programs
• U.K. eScience
  program
• EU 6th Framework
• U.S. Committee on
  Cyberinfrastructure
• Japanese Grid
  initiative




                U.S. Cyberinfrastructure:
                   Recommendations
•   New INITIATIVE to revolutionize science and engineering research at NSF and
    worldwide to capitalize on new computing and communications opportunities.
    21st Century Cyberinfrastructure includes supercomputing, but also massive storage,
    networking, software, collaboration, visualization, and human resources
     – Current centers (NCSA, SDSC, PSC) are a key resource for the INITIATIVE
     – Budget estimate: $1 Billion/year (continuing)
•   An INITIATIVE OFFICE with a highly placed, credible leader empowered to
     – Initiate competitive, discipline-driven path-breaking applications within NSF of
       cyberinfrastructure which contribute to the shared goals of the INITIATIVE
     – Coordinate policy and allocations across fields and projects. Participants across NSF
       directorates, Federal agencies, and international e-science
     – Develop high quality middleware and other software that is essential and special to
       scientific research
     – Manage individual computational, storage, and networking resources at least 100x
       larger than individual projects or universities can provide.



              Distributed Computing
               Problem Evolution
• Past-present: O(10^2) high-end systems; Mb/s networks;
  centralized (or entirely local) control
   – I-WAY (1995): 17 sites, week-long; 155 Mb/s
   – GUSTO (1998): 80 sites, long-term experiment
   – NASA IPG, NSF NTG: O(10) sites, production
• Present: O(10^4-10^6) data systems, computers; Gb/s
  networks; scaling, decentralized control
   – Scalable resource discovery; restricted delegation; community
     policy; Data Grid: 100s of sites, O(10^4) computers; complex
     policies
• Future: O(10^6-10^9) data, sensors, computers; Tb/s
  networks; highly flexible policy, control
 The Globus View of the Future:
All Software is Network-Centric
 • We don’t build or buy “computers” anymore,
   we borrow or lease required resources
    – When I walk into a room, need to solve a problem,
      need to communicate
 • A “computer” is a dynamically, often
   collaboratively constructed collection of
   processors, data sources, sensors, networks
    – Similar observations apply for software



                       And Thus …
• Reduced barriers to access mean that we do much
  more computing, and more interesting
  computing, than today => Many more
  components (& services); massive parallelism
• All resources are owned by others => Sharing
  (for fun or profit) is fundamental; requires trust,
  policy, negotiation, payment
• All computing is performed on unfamiliar
  systems => Dynamic behaviors, discovery,
  adaptivity, failure handling all critical


        Future of the Grid for HEP?
• Grid Optimist
   – Best thing since the WWW. Don’t worry, the grid will solve
     all our computational and data problems! Just click “Install”
• Grid Pessimist
   – The grid is “merely an excuse by computer scientists to milk
     the political system for more research grants so they can write
     yet more lines of useless code” [The Economist, June 21,
     2001]
   – “A distraction from getting real science done” [McCubbin]
• Grid Realist
   – The grid can solve our problems, because we design it to! We
     must work closely with the developers as it evolves, providing
     our requirements and testing their deliverables in our
     environment.
                            Conclusions
• Networks form the critical basis for the future of e-Science
• LHC Physics will depend heavily on globally distributed
  resources => the NETWORK is critical!
• LHC Computing Model adopted by CERN
   – Strong endorsement of a multi-tiered hierarchy of distributed
     resources
   – This model will rely on grid software and network infrastructure
     to provide efficient, easy access for physicists (and other
     scientists)
   – This is a new platform for physics analysis and e-Science
• Like the web, if the grid is going to happen, it will be
  pushed forward by HENP experiments
           For More Information…
• Globus Project™
  – www.globus.org
• Grid Forum
  – www.gridforum.org
• HENP Internet2 SIG
  – henp.internet2.edu
• MGRID
  – mgrid.grid.umich.edu
• Questions?
