					Particle Physics Computing
          The Grid
          David Colling
       e-Science Lecturer
         Particle Physics
     Imperial College London
Apologies and Acknowledgements
  • I was rather unwell earlier in the
  week and so this presentation may not be
  quite as polished as it should be

  • I have “borrowed” slides from several of
  my friends and colleagues including Les
  Robertson, Ian Bird and Roger Jones
                    What is the LHC?
• LHC will collide beams of protons at an energy of 14 TeV

• Using the latest superconducting technologies, it will
  operate at about -270ºC, just above the absolute zero of
  temperature

• With its 27 km circumference, the accelerator will be the
  largest superconducting installation in the world

• Four detectors constructed and operated by international
  collaborations of thousands of physicists and engineers

LHC is due to switch on in 2008.
The largest terrestrial scientific endeavour ever undertaken.
Four experiments, with detectors as ‘big as cathedrals’:
ALICE, ATLAS, CMS, LHCb
 ATLAS/CMS - general purpose detectors, covering:
  • SUSY
  • Electroweak
  • QCD
  • B physics
  • Heavy Ion
  • Extra dimensions
     ALICE and LHCb are more directed
          in their physics goals:
  • ALICE: Heavy Ion Physics
  • LHCb: B physics
        The Challenge
The CMS detector at LHC:
  • Collaboration of 1700
  • Weighs 12000 tonnes
  • Has to survive radiation levels only seen in nuclear
    reactors, bombs and stars
  • The tracker alone has 12,000,000 data channels
  • Several Petabytes/second
  • Greater data rate than all the telephones in the world
The Challenge
                              Selectivity: 1 in 10^13

                              Like looking for 1 person in
                              a thousand world populations

                              Or for a needle in 20 million haystacks

                     Interesting interactions:
                     1 in 10 Trillion!
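As a quick sanity check on these analogies, the snippet below compares 10^13 with a thousand world populations and works out how many straws each of the 20 million haystacks would need to hold. It is a back-of-envelope sketch: the world population of about 6.7 billion (roughly the 2008 figure) is an assumption and does not come from the slide.

```python
# Back-of-envelope check of the selectivity analogies.
# ASSUMPTION: world population of ~6.7 billion (roughly 2008); not from the slide.
WORLD_POPULATION = 6_700_000_000

selectivity = 10**13             # "1 in 10^13"
ten_trillion = 10 * 10**12       # "1 in 10 Trillion"

# "1 person in a thousand world populations"
people_in_1000_worlds = 1000 * WORLD_POPULATION
analogy_ratio = people_in_1000_worlds / selectivity   # 1.0 would be exact

# "a needle in 20 million haystacks": straws per haystack implied by 10^13
straws_per_haystack = selectivity // 20_000_000

print(f"1000 world populations: {people_in_1000_worlds:.1e} people")
print(f"analogy / selectivity:  {analogy_ratio:.2f}")
print(f"straws per haystack:    {straws_per_haystack:,}")
```

The population analogy agrees with 10^13 to within a factor of about 1.5, which is all a slide analogy needs.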

                 The Challenge

Each event is complicated.

                 The Challenge
Although the events are filtered, data accumulate at ~15 PetaBytes/year

Equivalent to writing a CD every 2 seconds

A CD stack holding 1 year of LHC data would be ~20 Km high
(for comparison, Mt. Blanc is 4.8 Km; 50 CD-ROMs = 35 GB make a 6 cm stack)
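The CD figures can be reproduced from the numbers on this slide. The sketch below assumes decimal units (1 PB = 10^15 bytes), data arriving evenly over a calendar year, and derives CD capacity and thickness from "50 CD-ROM = 35 GB" in a 6 cm stack.

```python
PB = 1e15
GB = 1e9
SECONDS_PER_YEAR = 365.25 * 24 * 3600

data_per_year = 15 * PB            # "~15 PetaBytes/year"
cd_capacity = 35 * GB / 50         # "50 CD-ROM = 35 GB" -> 0.7 GB per CD
cd_thickness_m = 0.06 / 50         # 50 CDs make a 6 cm stack -> 1.2 mm per CD

cds_per_year = data_per_year / cd_capacity
seconds_per_cd = SECONDS_PER_YEAR / cds_per_year       # ASSUMPTION: even over the year
stack_height_km = cds_per_year * cd_thickness_m / 1000

print(f"CDs per year:  {cds_per_year:.2e}")
print(f"one CD every   {seconds_per_cd:.1f} s")
print(f"stack height:  {stack_height_km:.0f} km")
```

With these assumptions the result is roughly one CD every 1.5 s and a stack of about 26 km, the same order of magnitude as the slide's rounded figures.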
             LHC data (simplified)
Per experiment:
• 40 million collisions per second

• After filtering, 100 collisions of interest per second

• A Megabyte of digitised information for each collision

• 1 billion collisions recorded/year

With processed data and simulation, we will accumulate 15 PetaBytes
of new data each year (= 1% of world annual information production).

Each collision is independent, so the processing is simplified and
easily parallelised.

For scale:
  1 Megabyte (1MB)              A digital photo
  1 Gigabyte (1GB) = 1000MB     A DVD movie
  1 Terabyte (1TB) = 1000GB     World annual book production
  1 Petabyte (1PB) = 1000TB     10% of the annual production by LHC
  1 Exabyte (1EB) = 1000 PB     World annual information production
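The per-experiment numbers fit together as simple arithmetic. This sketch uses only the slide's figures (decimal units assumed) to derive the trigger rejection factor, the recording bandwidth and the raw data volume per year; 1 billion recorded collisions at 100 per second also imply about 10^7 seconds of data taking per year.

```python
MB = 1e6
PB = 1e15
SECONDS_PER_DAY = 86_400

collision_rate_hz = 40e6        # "40 million collisions per second"
recorded_rate_hz = 100          # "100 collisions of interest per second"
event_size_bytes = 1 * MB       # "a Megabyte ... for each collision"
recorded_per_year = 1e9         # "1 billion collisions recorded/year"

rejection_factor = collision_rate_hz / recorded_rate_hz       # keep 1 in this many
recording_rate = recorded_rate_hz * event_size_bytes          # bytes per second
raw_pb_per_year = recorded_per_year * event_size_bytes / PB   # raw data only
live_seconds = recorded_per_year / recorded_rate_hz           # implied data-taking time

print(f"rejection factor: 1 in {rejection_factor:,.0f}")
print(f"recording rate:   {recording_rate / MB:.0f} MB/s")
print(f"raw data:         {raw_pb_per_year:.1f} PB/year")
print(f"data taking:      {live_seconds:.0e} s (~{live_seconds / SECONDS_PER_DAY:.0f} days)")
```

The roughly 1 PB of raw data per experiment is only part of the picture; the 15 PB per year quoted for the LHC also includes processed data and simulation.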
             The bottom line...

• Need ~100K CPUs by the end of 2008, growing each year

• Need tens of PB of storage each year

  This is too much for any single computing centre to
  cope with, so we decided to use the Grid
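Because each collision is independent, a large sample can be cut into pieces and sent to many sites at once. A minimal sketch, assuming a simple policy of sharing events in proportion to each site's CPU count; the site names and capacities are hypothetical, and the experiments use their own workload tools rather than anything like this function.

```python
def split_events(n_events: int, capacities: dict[str, int]) -> dict[str, int]:
    """Share n_events among sites in proportion to their CPU counts.

    Integer arithmetic plus largest-remainder rounding keeps the total exact.
    """
    total = sum(capacities.values())
    shares = {site: n_events * cpus // total for site, cpus in capacities.items()}
    remainders = {site: n_events * cpus % total for site, cpus in capacities.items()}
    leftover = n_events - sum(shares.values())
    # Hand the few leftover events to the sites with the largest remainders
    for site in sorted(capacities, key=remainders.get, reverse=True)[:leftover]:
        shares[site] += 1
    return shares


# Hypothetical sites and CPU counts, for illustration only
sites = {"site_a": 5000, "site_b": 3000, "site_c": 2000}
print(split_events(1_000_001, sites))
```

Since no event depends on any other, the split can follow available capacity alone; no site ever needs to wait for another.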
               A very simplified view of how
                 the middleware works

This is the world without Grids. Sites are not identical:
  • Different Storage
  • Different Files
  • Different Usage Policies
The user is confused and unhappy.

So let's introduce some grid infrastructure… Security (VOMS) and
an information system.
Each Site consists of:
  • A compute element
  • A storage element

So now the user knows what machines are out there and can
communicate with them… however, where to submit the job is too
complex a decision for the user alone. What is needed is an
automated system: the Workload Management System, WMS (Resource
Broker), which uses the information system and the Replica
Location Service (RC) to decide the execution location.

The user describes the job in JDL and submits it, together with
its Input Sandbox, to the WMS:

    glite-wms-job-submit myjob.jdl

    JobType = "Normal";
    Executable = "$(CMS)/exe/sum.exe";
    InputData = "LF:testbed0-00019";
    ReplicaCatalog = "ldap:// INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
    DataAccessProtocol = "gridftp";
    InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
    OutputSandbox = {"sim.err", "test.out", "sim.log"};
    Requirements = other.GlueHostOperatingSystemName == "linux" &&
                   other.GlueHostOperatingSystemRelease == "Red Hat 6.2" &&
                   other.GlueCEPolicyMaxWallClockTime > 10000;
    Rank = other.GlueCEStateFreeCPUs;

Job progress is recorded by the Logging & Bookkeeping service, and
the user retrieves the Output Sandbox with:

    glite-wms-job-get-output <dg-job-id>

Now a happy user.
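The Requirements and Rank expressions in the example JDL drive the matchmaking. This is a toy model only: the real WMS evaluates ClassAd expressions against Glue schema attributes published in the information system, whereas the Python below hard-codes the same three requirements and the free-CPU rank over a hypothetical list of compute elements (names and values invented for illustration).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComputingElement:
    """Hypothetical stand-in for a CE's published Glue attributes."""
    name: str
    os_name: str          # GlueHostOperatingSystemName
    os_release: str       # GlueHostOperatingSystemRelease
    max_wallclock: int    # GlueCEPolicyMaxWallClockTime
    free_cpus: int        # GlueCEStateFreeCPUs


def requirements(ce: ComputingElement) -> bool:
    # Same three conditions as the JDL Requirements expression
    return (ce.os_name == "linux"
            and ce.os_release == "Red Hat 6.2"
            and ce.max_wallclock > 10000)


def rank(ce: ComputingElement) -> int:
    # Rank = other.GlueCEStateFreeCPUs: more free CPUs is better
    return ce.free_cpus


def match(ces: list[ComputingElement]) -> Optional[ComputingElement]:
    """Keep CEs that satisfy Requirements, then pick the highest Rank."""
    candidates = [ce for ce in ces if requirements(ce)]
    return max(candidates, key=rank) if candidates else None


# Hypothetical compute elements
ces = [
    ComputingElement("ce1.example.org", "linux", "Red Hat 6.2", 20000, 12),
    ComputingElement("ce2.example.org", "linux", "Red Hat 6.2", 5000, 300),
    ComputingElement("ce3.example.org", "linux", "Red Hat 6.2", 40000, 85),
]
print(match(ces).name)   # ce3.example.org
```

ce2 has the most free CPUs but fails the wall-clock requirement, so it is filtered out before ranking; keeping Requirements (hard constraints) separate from Rank (preference) is what makes that possible.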
                   A Hierarchical Distributed Computing Solution
                     (2008 numbers for CMS; ATLAS numbers are similar)

  Level 1 trigger (~PBytes/sec) → High Level Trigger → ~250 MBytes/sec → CERN
   • One bunch crossing per 25 ns
   • 150 triggers per second
   • Each event is ~1 MByte

Tier 0: CERN Computer Centre (7.3 MSI2K)
   • Accept data from HLT
   • First pass reconstruction
   • Data custody

Tier 1: Regional Centres: CNAF (Italian), IN2P3 (French), RAL (UK)
   • Data custody
   • Skimming
   • Calibration
   • Hosting real and MC data
   • Hosting complete copy of AOD
   • Reprocessing data

Tier 2: NorthGrid, ScotGrid, SouthGrid, London Tier 2
        (0.5 MSI2K: Brunel, Imperial, QMUL, RHUL)
   • Physics data analysis
   • MC simulation
   • Some calibration

A physicist sitting anywhere specifies their analysis job and the data
they need, and submits the job to the Grid infrastructure. This
infrastructure discovers the locations of all replicas of the required
data and chooses the best location to execute the job.

Later the user retrieves the output from the job without ever having to
know where the job actually ran.
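The key step in this workflow is that the location of the data, not the user, decides where the job runs. A minimal sketch, assuming a hypothetical replica catalogue that maps a logical file name to the sites holding a copy, and a hypothetical table of free CPUs per site; the job goes to the replica site with the most free CPUs. Real brokering weighs many more factors.

```python
# Hypothetical replica catalogue: logical file name -> sites holding a replica
REPLICA_CATALOGUE = {
    "LF:dataset-A": ["Tier1-X", "Tier2-Y"],
    "LF:dataset-B": ["Tier2-Z"],
}

# Hypothetical free CPUs per site, as an information system might report them
FREE_CPUS = {"Tier1-X": 120, "Tier2-Y": 400, "Tier2-Z": 900}


def choose_site(lfn: str) -> str:
    """Run next to the data: pick the replica site with the most free CPUs."""
    replicas = REPLICA_CATALOGUE.get(lfn, [])
    if not replicas:
        raise LookupError(f"no replica registered for {lfn}")
    return max(replicas, key=lambda site: FREE_CPUS.get(site, 0))


print(choose_site("LF:dataset-A"))   # Tier2-Y (Tier2-Z has more free CPUs but no replica)
```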
                            The London Tier 2
• Five institutes, nine sites (some dedicated and some shared)

• Agreed to share resources via the

• Agreed to support each other’s

• Agreed to act as a distributed Tier 2 for the LHC experiments

• Encourages collaboration on non-Particle Physics Grid Projects
      The efficient running of the London Tier 2 is London’s
       baseline contribution to the LHC computing project.
      This is the equivalent of maintaining the subdetector
        that you have built. Everything else builds on this.
              So where are we now?
monitoring and analysis, as well as a series of grand challenges and

(Figure: job run time, log scale)
               Where the CPU hours
                    are spent
• Distribution of job run time (h), weighted by job run time (h)

  (Figure; annotated feature: LHCb pilot jobs)
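Weighting each job by its own run time turns a histogram of job counts into one of CPU hours, which is why a few long jobs can dominate such a plot even when short jobs are far more numerous. A minimal sketch, assuming decade-wide bins on a log scale; the run times are made up.

```python
import math
from collections import defaultdict


def cpu_hours_by_decade(run_times_h: list[float]) -> dict[int, float]:
    """Histogram of job run time in decade bins (10^k <= t < 10^(k+1)),
    with each job weighted by its run time, so bins hold CPU hours."""
    hist: dict[int, float] = defaultdict(float)
    for t in run_times_h:
        if t > 0:                       # log10 is undefined for t <= 0
            hist[math.floor(math.log10(t))] += t
    return dict(sorted(hist.items()))


# Hypothetical run times in hours: several short jobs, one long one
run_times = [0.05, 0.2, 0.5, 2.0, 3.0, 30.0]
hist = cpu_hours_by_decade(run_times)
for k, hours in hist.items():
    print(f"[{10.0**k:g} h, {10.0**(k + 1):g} h): {hours:.2f} CPU hours")
```

Here the single 30-hour job accounts for more CPU hours than the other five jobs put together.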

              CE Jobs Distribution
(Figure: number of jobs per CE, plotted against an ordered list of sites)

                   Using more sites
                   than anybody
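A plot like this is essentially a count of jobs per compute element, sorted in descending order. A minimal sketch, assuming job records are available as (job id, CE) pairs; the record format and CE names are hypothetical.

```python
from collections import Counter


def jobs_per_ce(job_records: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Count jobs per compute element, busiest first (the 'ordered list of sites')."""
    return Counter(ce for _job_id, ce in job_records).most_common()


# Hypothetical job records: (job id, CE the job ran on)
records = [
    ("job1", "ce-a"), ("job2", "ce-b"), ("job3", "ce-a"),
    ("job4", "ce-c"), ("job5", "ce-a"), ("job6", "ce-b"),
]
ranking = jobs_per_ce(records)
print(ranking)
print(f"jobs spread over {len(ranking)} CEs")
```

The length of this list is the number of sites a VO actually used, which is the quantity behind "using more sites than anybody".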

Distribution of Job Efficiency (running time/total time) for each LHC VO.

                  So different usage patterns
                     do make a difference
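Job efficiency is defined on the slide as running time divided by total time. A minimal sketch of computing it per VO, assuming each job record carries the VO name and both times in seconds (the record layout and values are hypothetical); exactly what counts as "total time" depends on the monitoring system that produced the plot.

```python
from collections import defaultdict
from statistics import mean


def efficiency_by_vo(jobs: list[dict]) -> dict[str, list[float]]:
    """Efficiency = running time / total time, grouped by VO."""
    per_vo: dict[str, list[float]] = defaultdict(list)
    for job in jobs:
        if job["total_s"] > 0:          # skip records with no timing information
            per_vo[job["vo"]].append(job["running_s"] / job["total_s"])
    return dict(per_vo)


# Hypothetical job records
jobs = [
    {"vo": "cms", "running_s": 3600, "total_s": 4000},
    {"vo": "cms", "running_s": 1800, "total_s": 3600},
    {"vo": "lhcb", "running_s": 900, "total_s": 3600},
    {"vo": "atlas", "running_s": 0, "total_s": 0},
]
eff = efficiency_by_vo(jobs)
for vo, values in eff.items():
    print(vo, f"mean efficiency {mean(values):.2f} over {len(values)} jobs")
```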


 Including Brunel

Monitoring (continued)

  So where are we now?
4.5 PB of test data moved by CMS in CCRC/Feb on > 340 links

CCRC (Common Computing Readiness Challenge) Phase I
  • Phase I complete (Feb)
  • Very successful
  • Transferred >4PB (above baseline)
  • Measured CERN tape staging - 2GB/s
  • Other Tier1 staging largely acceptable
  • UK passed all metrics
  • Only problem - not really combined
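To get a feel for what 4.5 PB in a month means, the sketch below converts it into an average rate. It assumes, as a simplification, that the transfers were spread evenly over all 29 days of February 2008 and over exactly 340 links. Real traffic was burstier, so peak rates were higher than this average, and since there were more than 340 links the per-link figure is an upper bound.

```python
PB = 1e15
GB = 1e9
MB = 1e6
SECONDS_PER_DAY = 86_400

moved_bytes = 4.5 * PB     # "4.5 PB of test data moved by CMS"
days = 29                  # ASSUMPTION: spread over all of February 2008 (leap year)
links = 340                # slide says "> 340 links", so per-link rate is an upper bound

avg_rate = moved_bytes / (days * SECONDS_PER_DAY)   # bytes per second
per_link = avg_rate / links

print(f"average aggregate rate: {avg_rate / GB:.2f} GB/s")
print(f"average per link:       {per_link / MB:.1f} MB/s")
```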
                          CCRC Phase II
   Phase II - try again in context of CSA08 and measure competing

       - Transfer tests now focus on latency for complete dataset

       - Large analysis challenge - standard CRAB jobs and use of the
       CRAB analysis server

       - Importance of regular LCG tests increased - now monitored
       for all CMS sites

For CMS at least, things seem to be on course
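Measuring latency for a complete dataset means waiting for the last file, not the average one. A minimal sketch, assuming we know when the transfer was requested and when each file arrived; the timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def dataset_latency(requested_at: datetime, arrivals: list[datetime]) -> timedelta:
    """Time from the transfer request until the LAST file of the dataset arrives.

    The dataset is only usable for analysis once every file is present,
    so the slowest file sets the latency.
    """
    if not arrivals:
        raise ValueError("no files have arrived yet")
    return max(arrivals) - requested_at


# Hypothetical timestamps for a three-file dataset
requested = datetime(2008, 5, 5, 8, 0)
arrivals = [
    datetime(2008, 5, 5, 9, 10),
    datetime(2008, 5, 5, 11, 45),
    datetime(2008, 5, 5, 10, 30),
]
mean_file_latency = sum((a - requested for a in arrivals), timedelta()) / len(arrivals)
print("dataset latency:  ", dataset_latency(requested, arrivals))
print("mean file latency:", mean_file_latency)
```

The mean per-file latency looks much better than the dataset latency, which is why the Phase II tests focus on the latter.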
               A Grid for all subjects
Within Europe EGEE is developing a Grid infrastructure for all scientific
endeavours. Particle Physics has been the driving force behind this but
the Grid that is being produced is already being used by many different
disciplines including:
• biomedical, bioinformatics and medical research, etc.
• plasma physics and fusion
• earth observation
• Digital Holography (centred at Brunel)
• Condensed Matter Theory
• Imaging
• the list goes on and on.
          Particle Physics gave the world the Web; it is
          now encouraging the world to use the Grid
    The Current middleware is…

 All available under an open source license (modified BSD)
approved by the Open Source Initiative

 New software/middleware produced will be under the
same license.


     We have a Grid of over 300 sites

     We are building a Grid capable of delivering the requirements of the LHC
       • LHC Grid not working would mean no physics from LHC
       • LHC Grid not working would mean $Billions wasted
       • We cannot afford to fail

     This Grid will provide part of a global e-Science infrastructure for all
     Science, not just particle physics.

     We know that we are getting there because we are testing it and users are using it

