                                             Grid Computing &
                                      Elementary Particle Physics

                                             brought to you by
                                             Dr. Rüdiger Berlich



April 16th, 2004, Sankt Augustin / NEC                                ruediger@berlich.de
              Agenda
                 Who am I
                      Past and present work
                      Ph.D. thesis
                 Grid Computing & Particle Physics
                      Motivation and Problem Definition
                      Conflicting Visions ?
                      Software Framework & Grid Development
                      Large Computing Resources
                 Conclusion
                 Q&A




                                            Who am I ?




      Past and Present Work
          Physicist – Diploma at Ruhr-Universität Bochum / Germany
               Recognition of hadronic split-offs using Neural Networks
               Crystal Barrel experiment / CERN / Geneva
               Aleph experiment (Max Planck Institute, Munich / Germany)
          Member of SuSE Linux AG until July 2001
               Engineer and international support; authoring of the manual
               Technical Manager (Support) of US Office, Oakland / CA
               Managing Director UK office
          Ph.D. thesis at Ruhr-Universität Bochum
               “Application of Evolutionary Strategies to automated
                parametric optimisation studies in physics research”
               Permanently based at Forschungszentrum Karlsruhe
               (Grid Computing and eScience)
               Completed “Magna Cum Laude” in January 2004
               Postdoctoral position at RUB / FZK

    Ph.D. thesis
     Signal in histogram resulting from cuts
     Varying cuts can lead to improved signal
     With figure of merit == f(cuts) :
     Accessible to computerized maximization techniques
     Often used : "Significance 2" : S₂ = N² / (N + 2·B₀)
     Problem : Testing quality of cuts in particle physics requires the
     processing of huge amounts of data for each set of cuts




     [Figure: „Events“ → Analysis / Cuts → Histogram with signal → Optimization → e.g. reduced background]
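
As a rough illustration of the optimisation problem (not code from the thesis; the event sample and the cut grid are made up), a scan over a single cut that keeps the value maximising S₂ could look like this in C++:

    #include <iostream>
    #include <vector>

    // Toy event: one discriminating variable plus a truth flag.
    // In the real analysis each evaluation of f(cuts) means re-processing
    // huge amounts of data, which is what makes the optimisation expensive.
    struct Event { double x; bool isSignal; };

    // "Significance 2" from the slide: S2 = N^2 / (N + 2*B0),
    // with N = accepted signal events and B0 = accepted background events.
    double significance2(double n, double b0) {
        return (n + 2.0 * b0 > 0.0) ? n * n / (n + 2.0 * b0) : 0.0;
    }

    int main() {
        // Hypothetical sample; real studies process millions of events per cut.
        std::vector<Event> events = {
            {0.20, false}, {0.40, false}, {0.55, true}, {0.60, false},
            {0.70, true},  {0.75, true},  {0.80, true}, {0.90, true}
        };

        double bestCut = 0.0, bestS2 = -1.0;
        for (double cut = 0.0; cut <= 1.0; cut += 0.05) {   // brute-force scan of one cut
            double n = 0.0, b0 = 0.0;
            for (const Event& e : events)
                if (e.x > cut) (e.isSignal ? n : b0) += 1.0;
            const double s2 = significance2(n, b0);
            if (s2 > bestS2) { bestS2 = s2; bestCut = cut; }
        }
        std::cout << "best cut: x > " << bestCut << ", S2 = " << bestS2 << "\n";
        return 0;
    }

With several cuts the search space grows quickly, which motivates the use of evolutionary strategies in the thesis.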
    Ph.D. thesis / EVA
     Implementation of Evolutionary Strategies and Genetic Algorithms
     Parallel execution on SMP (POSIX threads), clusters and Grid (through MPICH) possible
     In MPI mode : data exchange through XML
     Seamless parallelization – no change of user code required
     Serial mode allows easy debugging
     Implemented in C++
     Derivative of the STL vector class, hence fully templatized
     Open Source
     Interface to ROOT
     In an idealised environment : reduction of compute time from close to
     6 hours to under 3 minutes (1+128 ES)
     Almost linear speedup

     [Class diagram: evaMember types (evaBit, evaDouble, evaBitset<N>, ...),
      evaIndividual<T> (based on STL vector<T*>), evaPopulation<T>,
      evaMPIPopulation<S,T>, evaPthreadPopulation<S,T>]
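
To make the idea concrete, here is a minimal, self-contained (1+λ) evolution strategy in C++. It is only a sketch of the concept, not EVA's actual interface; the toy fitness function stands in for the figure of merit f(cuts), and the λ independent evaluations per generation are the part that EVA can farm out to POSIX threads or MPI processes.

    #include <cstddef>
    #include <iostream>
    #include <limits>
    #include <random>
    #include <vector>

    // Toy fitness function to be maximised; in the thesis this role is played
    // by the figure of merit f(cuts), whose evaluation requires processing data.
    double fitness(const std::vector<double>& x) {
        double sum = 0.0;
        for (double v : x) sum -= v * v;          // optimum at the origin
        return sum;
    }

    int main() {
        std::mt19937 rng(42);
        std::normal_distribution<double> gauss(0.0, 0.1);

        const std::size_t lambda = 128;           // offspring per generation, cf. the (1+128) ES
        std::vector<double> parent(4, 1.0);       // four parameters, as in the thesis study
        double parentFitness = fitness(parent);

        for (int generation = 0; generation < 200; ++generation) {
            std::vector<double> bestChild;
            double bestChildFitness = -std::numeric_limits<double>::infinity();

            // The lambda evaluations are independent of each other - this loop is
            // what can be distributed over threads or MPI processes.
            for (std::size_t i = 0; i < lambda; ++i) {
                std::vector<double> child = parent;
                for (double& v : child) v += gauss(rng);   // Gaussian mutation
                const double f = fitness(child);
                if (f > bestChildFitness) { bestChildFitness = f; bestChild = child; }
            }

            // "Plus" selection: the parent survives unless an offspring is better.
            if (bestChildFitness > parentFitness) {
                parent = bestChild;
                parentFitness = bestChildFitness;
            }
        }
        std::cout << "best fitness after 200 generations: " << parentFitness << "\n";
        return 0;
    }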
    Ph.D. thesis / EVA
     Variation of 4 parameters
     Significant improvement of the figure of merit even compared with a
     hand-optimised selection
     Through parallelisation : reduction of compute time from 38 hours to
     under 3 hours

     [Figure: quality surface with a high number of local minima of
      increasing frequency and amplitude]
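
For intuition only: a purely hypothetical 1-D function with this character (many local minima whose frequency and amplitude grow away from the optimum; not the actual quality function from the thesis) could be written as:

    #include <cmath>
    #include <cstdio>

    // Hypothetical illustration: a parabola overlaid with oscillations whose
    // frequency and amplitude both grow with |x|, giving ever denser and deeper
    // local minima away from the global optimum at x = 0. Gradient-based methods
    // tend to get trapped here; evolution strategies with an adapted mutation
    // width can still escape.
    double ruggedTestFunction(double x) {
        return x * x + std::fabs(x) * std::sin(x * x);
    }

    int main() {
        for (double x = -5.0; x <= 5.0; x += 0.5)
            std::printf("f(%+.1f) = %8.3f\n", x, ruggedTestFunction(x));
        return 0;
    }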
    Relationship to Grid

       Original thought: Make EVA a Grid application
          but: comes for free (e.g.: submit mpd as Grid job, with
          program in input sandbox; use MPICH-G2; ...)
       GridKa (Karlsruhe Cluster – 4500 CPUs in final stage)
       BaBar Grid activities
       D-Grid / EGEE
       Freelance journalistic activities (Grid series in Linux Magazin,
       Linux User & Developer / UK, ...)
       Talks at LinuxTag, UKUUG, DESY, ...




                                        Grid Computing &
                    Elementary Particle Physics




 Motivation :
 In particle physics :
  Got many “events”
  Here : From BaBar
   experiment
  Collision of electrons
   and positrons
  Need to store and
   analyse all available
   data
  How much data is
   there ?


      Motivation :
         Moore's Law : Processing Power doubles every 18 months*)
         Demand for computing power grows faster than available resources
      *) slow-down expected in about 10 years (www.wired.com)




         Motivation :
          Future data sources at CERN

          [Figure: detectors of the coming LHC experiments - ATLAS, CMS, LHCb, ...]
Motivation :
    Expect a data rate of 40 Gb/s for all 4 LHC experiments,
    equivalent to roughly 10 petabytes of data per year
    for all experiments.
    This is comparable to a 16 m² room, filled every year
    with DVDs containing the data.
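    For scale (a back-of-the-envelope check, assuming roughly 4.7 GB per
    single-layer DVD): 10 petabytes correspond to about two million DVDs per year.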




 Motivation :
    Analysis in PP must keep up with integrated Luminosity
    But : Need for computing power rises faster than Moore's Law
    Possible solutions :
    a) Find someone to pay for additional computers and/or
    b) use existing, distributed resources.
    Solution b) needs standardised framework for distributed computing
    Problem applies to many areas, beyond the boundaries
    of Particle Physics.

 “New” research discipline, sometimes
 called GRID Computing

 Motivation :
    Using existing, distributed resources is a good idea anyway
    Single countries can't afford multi-billion dollar investments
     into particle physics anymore
    Might still be willing to donate a portion of overall expenses
    But: want to spend money locally !
    More willing to invest in local compute centres than into resources
    elsewhere ...


 Vision:
 Need to bundle existing or newly established computing resources
 to form a coherent, standardized ensemble capable of running very
 large scale jobs (particle physics, weather simulation, biology, ...)

              Two people with another vision :
                      Carl Kesselman and Ian Foster

   [Photos of Carl Kesselman and Ian Foster, and the cover of their book
    "The Grid - Blueprint for a New Computing Infrastructure"]

   Jointly they can be considered to be the "fathers of Grid Computing".
            The Vision                       (à la Foster & Kesselman)

              Computing Power from a plug in the wall
              The world is your computer
              Analogy : Power Grid
              Seamless exchange of computing power
              Logical extension of the World Wide Web ?
              "The Web on Steroids"
              A hype (which has good and bad sides ...)
              Takes distributed computing to a new level



           ⇒ Different people mean different things
                  when talking about “The Grid”
               The Vision                   (common grounds + “definition”)

                  "When the network is as fast as the computer's

               internal links, the machine disintegrates across the
               net into a set of special purpose appliances"
               (Gilder Technology Report, June 2000)
                 "Grid technologies and infrastructures can be defined
               as supporting the sharing and coordinated use of
               diverse resources in dynamic, distributed 'virtual
               organisations' " (OGSA white paper)



     Limitations of distributed computing
     Ping : round-trip times on LAN and WAN are already in the range of, say,
     an old MFM hard drive, and network bandwidth beats such a drive by
     several orders of magnitude ...




   Limitations of distributed computing
     High-speed network connection (the faster, the more data
     can be exchanged between participating nodes).
     "Speed" means :
    a) High Bandwidth - Scales ! No longer a limiting factor.
    b) Low Latency - Doesn't scale ! (Cross-North-America latency ca. 20 msec)

    [Figure: San Francisco <-> New York over the network - bandwidth
     (#packets/s) vs. ca. 20 msec latency]

     Latency limits possible application types !
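
A small back-of-the-envelope model makes the asymmetry concrete (the 20 ms round trip and the 1 Gbit/s link are illustrative assumptions, not measurements from the talk):

    #include <cstdio>

    // Time for n sequential request/response exchanges of 'bytes' each:
    //   t = n * (round_trip_latency + bytes / bandwidth)
    // Chatty, synchronous exchanges are dominated by latency no matter how
    // much the bandwidth improves; bulk transfers are dominated by bandwidth.
    int main() {
        const double latency   = 0.020;       // assumed 20 ms round trip coast to coast
        const double bandwidth = 1e9 / 8.0;   // assumed 1 Gbit/s link, in bytes per second

        const double smallMsg  = 1e3;         // 1 kB control message
        const double bulk      = 1e9;         // 1 GB bulk transfer
        const int    exchanges = 1000;        // sequential request/response pairs

        std::printf("1000 x 1 kB exchanges : %.2f s (almost pure latency)\n",
                    exchanges * (latency + smallMsg / bandwidth));
        std::printf("1 x 1 GB transfer     : %.2f s (almost pure bandwidth)\n",
                    latency + bulk / bandwidth);
        return 0;
    }

Tightly coupled applications that need many small, synchronous exchanges therefore map poorly onto a wide-area Grid, while loosely coupled, data-parallel jobs map well.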

   Requirements of distributed computing
      High Bandwidth / Low Latency connections are needed !
          One current focus of research and development
          From "local GRIDs" to the "World Wide GRID"
          GEANT, TeraGrid
      Common middleware framework needed; Standards !!


     ⇒ Joint GRID efforts needed and possible



    Middleware initiatives related to P.P.
   Of course: Globus 2 (3,4,... not too relevant a.t.m.)
    The European DataGrid (project finished successfully in March 2004)
    From the Data Grid Project Presentation :
   "The main goal of the DataGrid is to develop and test
    the technological infrastructure that
    will enable the implementation of scientific collaboratories
    where researchers and scientists will perform their
    activities regardless of geographical location."
    LCG2 (based on EDG)
    EDG terminated March 2004 – EGEE will take over
     GriPhyN (Grid Physics Network) - American counterpart to EDG
    AliEn (Alice Environment)
      (CrossGrid -> Interactive Applications based on the EDG framework)

             Software Framework
             The Globus Toolkit 2 - www.globus.org
              Base technology for the majority of Grid projects
              Globus is based on the I-WAY project (“Information Wide
               Area Year”), a test bed of 17 US research institutions
              Effort led by Foster and Kesselman (Argonne Lab and USC)
              Layer between Operating System and GRID application
              Takes care of many aspects of GRID computing including
              the communication between the participating program
              fragments, authentication, security, ...
              "Protocol layer"



     Software Framework - Globus
       Basic functionality : submit job -> like batch submission
       Does not contain a Resource Broker (see EDG)


    The Globus Toolkit includes among other components :
      GSI (Grid Security Infrastructure) : Authentication
      GASS (Global Access to Secondary Storage)
      Uses RSL (Resource Specification Language) to specify resources
      (min. size of memory, OS, etc.)
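
      For illustration, a minimal GT2 RSL job description might look roughly
      like the sketch below; the attribute set shown (executable, count,
      maxMemory, stdout/stderr) follows GRAM RSL conventions, but treat the
      exact values as assumptions rather than a tested example:

          & (executable = /bin/hostname)
            (count     = 1)
            (maxMemory = 256)
            (stdout    = hostname.out)
            (stderr    = hostname.err)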
    When GRID computing becomes more mature, more
    services will migrate from the middleware into the
     Operating System. Needed for seamless integration !


             Software Framework – Globus 2
    1.) Job transmission to the server via HTTP as an RSL document
    2.) The server forks a jobmanager and hands over the RSL document
    3.) The jobmanager parses the RSL and checks the job requirements
    4.) The jobmanager distributes the job to local resources in the cluster
    5.) The jobmanager sends a unique job id (URI) to the client
    6.) The client can use the URI to cancel the job, when needed, or to
        obtain status information

    Courtesy Dr. Harald Kornmayer


         Software Framework – Globus 2




                                            Three major services
                                             Resource Management
                                             Information Service
                                             Data Management




                                            Courtesy
                                            Dr. Harald Kornmayer



Globus 3: Software Framework
    [Figure: OGSA-style data-mining scenario -
     1.) a user application ("I want to create a personal database containing
         data on E. coli metabolism") finds a data-mining service and possible
         storage locations in a community registry
     2.) it receives handles for miner and database factories
     3.) it asks the miner factory (at a compute service provider) to create a
         miner service with lifetime 10, and the database factory (at a storage
         service provider) to create a database service with lifetime 1000
     4.) the factories create the service instances
     5.) the miner and database services query two bio databases
     6.) keepalive messages flow while the results are returned]

Most PP-based Grid Computing still based on GT2
Globus 4 (Web service / WSRF): source of confusion in P.P.
Software Framework – EDG
  Joint three-year project funded by the European Union
  Goal: Development of methods for the transparent distribution
  of data and programs
  Needed in particle physics, biology (genome project), earth observation ...
  21 members, 15 compute centers (2-32 CPUs, up to
  1 Terabyte of mass storage)
  LCG based on EDG2 – accesses resources with thousands of CPUs, e.g.
  GridKa at Forschungszentrum Karlsruhe
  Major new component: resource broker
  Project was finished in March 2004. Successor will be EGEE
  (“Enabling Grids for eScience in Europe” - Edinburgh NeSC, kick-off
    next week in Cork / Ireland)


   Software Framework – EDG
     Compute centers (sites) offer resources : Storage Elements (SE)
     or Compute Elements (CE)
     Each site publishes its status in an information index
     Information about available data is stored in the Replica Location
     Service (RLS)

     Courtesy Marcus Hardt
    EDG – Virtual Organisations


     Users and resources belong to Virtual Organisations (VOs)
     Users can only access resources that belong to their own VO




                                            Courtesy Marcus Hardt
    European DataGrid – JDL + RB
       Jobs are sent to the Resource Broker by the user
      Job specifications are encoded in the “Job Description Language” (JDL)

    Executable         = "./myexecutable";
    Arguments          = "--config small-file.cfg --data BIG-file.dat";
    StdOutput          = "std.out";
    StdError           = "std.err";
    InputSandbox       = {"small-file.cfg"};
    OutputSandbox      = {"std.out","std.err","run.log"};
    InputData          = {"LF:BIG-file.dat"};
    ReplicaCatalog     = "ldap://rls.example.org:12345/lc=WPsixCollection,\
                          rc=WPsixRC,dc=testbed,dc=fzk,dc=de";
    DataAccessProtocol = {"gridftp"};
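
    On an EDG / LCG-2 User Interface such a file would then typically be handed
    to the Resource Broker with the edg-job-submit command and followed up with
    edg-job-status and edg-job-get-output (command names as used in EDG 2.x;
    mentioned here for orientation only).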


                                                                                 Courtesy Marcus Hardt
The AliEn Toolkit (“Alice Environment”)
Grid + OpenSource
 Pure Open Source project, started as part of ALICE collaboration (CERN)
 Small development team (very different from EDG)
 Pragmatic approach (what do we have, how can we make it work)
 3 million lines of code (cf. Linux kernel : ca. 5.5 million LOC)
 99 % of the code taken from publicly available packages, mostly Perl
 Only about 1 % of the code had to be developed in addition
 Similar functionality to EDG framework
 Based on WebServices (SOAP, XML)
 Used in other projects, e.g. MammoGrid (UK), a breast cancer database
 See http://alien.cern.ch


   The AliEn Toolkit / Grid + OpenSource




           Large Compute Resources for P.P.
           At Forschungszentrum Karlsruhe :
              GridKa Cluster
                  Tier-1 centre for LCG and other experiments
                   4500 CPUs in the final stage
              Research in fast interconnects (Infiniband), ROOT toolkit, ...




    The LHC multi-Tier Computing Model

    [Figure: the LHC multi-tier computing model - the Tier-0 centre at CERN feeds
     Tier-1 centres (e.g. RAL / UK, Fermilab and BNL / USA, IN2P3 / France,
     CNAF / Italy, FZK / Germany), which in turn serve Tier-2 centres (university
     and laboratory computing centres), Tier-3 resources (institute computers) and
     Tier-4 desktops; virtual organisations (ATLAS, CMS, LHCb, ...) and working
     groups span the tiers.]
                                            GridKa Machine Hall




        Also : world's first totally water-cooled rack ...
        1 Gbit/s between Karlsruhe and CERN

                                        GridKa planned Resources

    [Chart: planned GridKa resources for 2002-2009 (LCG Phase I, II, III) -
     CPU capacity in kSI95 (scale up to 4000) and disk and tape storage in
     TByte (scale up to 8000); status at the time: 680 CPUs, 160 TB disk,
     250 TB tape]
                        End of June 2003 – delivery of another 70 TB




     Software Framework / FZK
     Distributed / cluster filesystems ?      (Only partially related)

        NFS, OpenAFS, OpenGFS, PVFS, PVF, PNFS, LegionFS
       Avaki (based on Legion FS), Lustre, AliEn-FS
       GPFS




   Conclusion
   Still early days, but ...
      Huge investments into infrastructure
      Many, partially incoherent Grid initiatives
      Far from standardisation
      Globus 2 still in use, will remain so
      for a while ...
      The requirements of the LHC and other
      particle physics experiments
      enforce some sort of Grid-oriented
      solution in P.P.
      Real need starts with LHC (when ??)
      Research needs to start early to ensure efficient
      usage of billion-dollar investments

                  Please feel free to ask questions !



                I appreciate the continuous interest and support by the
             German Federal Ministry of Education and Research, BMB+F.



