EPP Grid Activities
AusEHEP Wollongong
Nov 2004
Grid Anatomy
• What are the essential components
              2nd Generation Grid (Globus)
   – CPU Resources + Middleware (software – common interface)
   – Data Resources + Middleware
      • replica catalogues; unifying many data sources
   – Authentication Mechanism
      • Certificates (Globus GSI), Certificate Authorities
   – Virtual Organisation Information Services
      • Grid consists of VOs!? users + resources participating in a VO
      • Who is a part of what research/effort/group
      • Authorisation for resource use
   – Job Scheduling, Dispatch, and Information Services
               3rd Generation Grid
   – Collaborative Information Sharing Services
      • Documentation & Discussion (web, wiki,…)
      • Meetings & Conferences (video conf., AccessGrid…)
      • Code & Software (CVS, CMT, PacMan…)
      • Data Information (Meta Data systems)
2nd Generation
• Accessible resources for Belle/ATLAS
  – We have access to ~120 CPUs (over 2 GHz)
     • APAC, AC3, VPAC, ARC
     • currently 50% Grid accessible
     • Continuing to encourage HPC
       facilities to install middleware
  – We have access to ANUSF
    petabyte storage facility
     • Will request ~100 TB for Belle
       data.
  – SRB (Storage Resource
    Broker)
     • Replica catalogue federating
       KEK/Belle, ANUSF, Melbourne EPP data storage
  – Used to participate in Belle’s 4×10⁹-event MC
    production during 2004
2nd Generation
• SRB (Storage Resource Broker)
  – Globally accessible virtual file system
  – Domains of storage resources
     • eg. ANUSF domain contains the ANU petabyte
      storage facility and disk on Roberts in Melbourne
  – Federations of Domains
     • eg. ANUSF and KEK are federated

      $ Scd /anusf/home/ljw563.anusf    # change to a home collection
      $ Sls -l                          # list its contents
      $ Sget datafile.mdst              # fetch a file to local disk
      $ Scd /bcs20zone/home/srb.KEK-B   # move into the federated KEK zone
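The replica-catalogue idea behind SRB can be sketched as follows (an illustration only, not the SRB API; the host names and URLs are invented for the example): one logical file name maps to physical copies held in several federated storage domains.

```python
# Illustrative sketch of a replica catalogue (NOT the SRB API):
# one logical name maps to physical copies in federated domains.
# Host names and paths below are invented for the example.
catalogue = {
    "/anusf/home/ljw563.anusf/datafile.mdst": [
        ("ANUSF", "gsiftp://anusf.example/pb/datafile.mdst"),
        ("Melbourne", "gsiftp://roberts.example/data/datafile.mdst"),
    ],
}

def resolve(logical_name, preferred_domain):
    """Return a physical replica URL, preferring the caller's domain."""
    replicas = catalogue[logical_name]
    for domain, url in replicas:
        if domain == preferred_domain:
            return url
    return replicas[0][1]  # no local copy: fall back to any replica
```

A client in the Melbourne domain gets the local copy; a client in any other domain falls back to the first registered replica.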
Grid Anatomy
• What are the essential components
   – CPU Resources + Middleware
   – Data Resources + Middleware
        • replica catalogues; unifying many data sources
   –   Authentication Mechanism
        • Globus GSI, Certificate Authorities
   –   Virtual Organisation Information Services
        • Grid consists of VOs!? users + resources participating in a VO
        • Who is a part of what research/effort/group
        • Authorisation for resource use
   –   Job Scheduling, Dispatch, and Information Services
                3rd Generation Grid
    –   Collaborative Information Sharing Services
        • Documentation & Discussion (web, wiki,…)
        • Meetings & Conferences (AccessGrid…)
        • Code & Software (CVS, CMT, PacMan…)
        • Data Information (Meta Data systems)
3rd Generation Solutions
• NorduGrid -> ARC (Advanced Resource Connector)
   –   Nordic Countries plus others like Australia
   –   We’ve used this for ATLAS DC2
   –   Globus 2.4 based middleware
   –   Stable, patched, and redesigned collection of existing
       middleware (Globus, EDG)
• Grid 3 Middleware -> VDT
   – US based coordination between iVDGL, GriPhyN, PPDG
   – Globus 2.4 based middleware
• LHC Computing Grid (LCG) <- EDG -> EGEE
   – Multiple Tiers: CERN T0, Japan/Taiwan T1, Australia T2 ?
   – Regional Operations Centre in Taiwan
   – Substantial recent development – needs to be looked at once
       again!
3rd Generation Solutions
• Still a lot of development going on.
  – data aware job scheduling is still developing
  – VO systems are starting to emerge
  – meta-data infrastructure is basic
• Deployment is still a difficult task.
  – prescribed system/OS only
Grid Anatomy
• What are the essential components
   – CPU Resources + Middleware
   – Data Resources + Middleware
        • replica catalogues; unifying many data sources
   –   Authentication Mechanism
        • Globus GSI, Certificate Authorities
   –   Virtual Organisation Information Services
        • Grid consists of VOs!? users + resources participating in a VO
        • Who is a part of what research/effort/group
        • Authorisation for resource use
   –   Job Scheduling, Dispatch, and Information Services
   –   Collaborative Information Sharing Services
        • Documentation & Discussion (web, wiki,…)
        • Meetings & Conferences (AccessGrid…)
        • Code & Software (CVS, CMT, PacMan…)
        • Data Information (Meta Data systems)
Virtual Organisation Systems
• Now there are 3 systems available
   – EDG/NorduGrid LDAP based VO
   – VOMS (VO Membership Service) from LCG
   – CAS (Community Authorisation Service) from Globus
• In 2003 we modified NorduGrid VO software for use with the Belle
  Demo Testbed, SC2003 HPC Challenge (world’s largest testbed)
   – More useful for rapid Grid deployment than above systems.
    – Accommodates Resource Owners’ security policies
       • resource organisations are part
         of the community
       • their internal security policies are
         frequently ignored/by-passed
    – Takes CAs into account
       • certificate authorities are a part
         of the community
       • a VO should be able to list CAs
         who they trust to sign certificates
   – Compatible with existing Globus
   – Might be of use/interest to the
     Australian Grid community?
   – GridMgr (Grid Manager)
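The VO model above can be sketched as a toy authorisation check (every name below is hypothetical; this is not the NorduGrid, VOMS, or CAS API): access is granted only when the VO trusts the signing CA, the user is a VO member, and the resource owner's own policy admits the user.

```python
def authorised(user, ca, resource, vo):
    """Toy check: the VO must trust the signing CA, the user must be
    a VO member, and the resource owner's own policy must admit the
    user (owner policies are consulted, never by-passed)."""
    return (ca in vo["trusted_cas"]
            and user in vo["members"]
            and vo["resource_policies"][resource](user))

# Hypothetical VO description (all names invented for the example).
vo = {
    "trusted_cas": {"APACGrid CA"},
    "members": {"ljw"},
    "resource_policies": {"vpac": lambda u: u != "blocked"},
}
```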
Virtual Organisation Systems
• How do VOs manage internal priorities?
   – This problem has not yet become apparent!
   – This has been left up to local resource settings.
       • For non VO resources, changes would require allocation or
         configuration renegotiation.
    – CAS is the only VO middleware to address this
       • done by VOs specifying policies allowing/denying access to
         resources
       • local resource priorities are not taken into account
       • difficult to predict the effect
• VO-managed job queue
    – centrally managed VO priorities, independent of
      locally managed resource priorities
   – resource job consumers “pull” jobs from the queue
       • the VO decides and can change which jobs are run first
   – results of prototype testing – “fair-share” system could be
     used
       • users/groups are allocated a target fraction of all resources
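The fair-share pull model above can be sketched as follows (assumed semantics, for illustration only: each owner has a target fraction of all resources, and the consumer pulls the job whose owner is furthest below target).

```python
def next_job(queue, usage, targets):
    """Pull the queued job whose owner is furthest below their
    fair-share target fraction of resources consumed so far.
    usage maps owner -> resources consumed; targets maps
    owner -> target fraction (summing to 1)."""
    total = sum(usage.values()) or 1  # avoid division by zero
    def deficit(job):
        owner = job["owner"]
        return usage.get(owner, 0) / total - targets[owner]
    return min(queue, key=deficit)
```

Because the queue is VO-managed, changing the targets re-prioritises all pending jobs without touching any local resource configuration.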
Grid Anatomy
• What are the essential components
   – CPU Resources + Middleware
   – Data Resources + Middleware
        • replica catalogues; unifying many data sources
   –   Authentication Mechanism
        • Globus GSI, Certificate Authorities
   –   Virtual Organisation Information Services
        • Grid consists of VOs!? users + resources participating in a VO
        • Who is a part of what research/effort/group
        • Authorisation for resource use
    –   Job Scheduling, Dispatch, and Information Services
   –   Collaborative Information Sharing Services
        • Documentation & Discussion (web, wiki,…)
        • Meetings & Conferences (AccessGrid…)
        • Code & Software (CVS, CMT, PacMan…)
        • Data Information (Meta Data systems)
Data Grid Scheduling
• Task -> Job1, Job2 ...
• Job1 -> input replica 1, input replica 2 ...
• Job1 + Input -> CPU resource 1 ...



• How do you determine what+where is best?
Data Grid Scheduling
• What’s the problem?
   – Try to schedule wisely
      • free resources, close to input data, fewer failures
   – Some resources are inappropriate
      • need to parse and check job requirements and resource info
         (RSL - Resource Specification Language)
   – Job failure is common
      • error reporting is minimal
      • need multiple retries for each operation
      • need to try other resources in case of resource failure
      • eventually we stop and mark a job as BAD
   – What about firewalls
      • some resources have CPUs which cannot access data
• Schedulers
   – Nimrod/G (parameter sweep, not Data Grid)
   – GridBus Scheduler (2003, 2004 aided them towards SRB)
   – GQSched (prototype developed in 2002, used in 2003 demos)
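The retry behaviour described above (minimal error reporting, multiple retries, fall over to other resources, eventually mark the job BAD) can be sketched as a toy dispatcher; it is not the API of any of the schedulers listed.

```python
def dispatch(job, resources, submit, retries_per_resource=2):
    """Toy dispatcher: retry each resource a few times (failures are
    common and error reporting is minimal), move on to the next
    resource on persistent failure, and finally mark the job BAD."""
    for resource in resources:
        for attempt in range(retries_per_resource):
            if submit(job, resource):
                return resource  # job accepted here
    return "BAD"  # every resource failed repeatedly
```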
Data Grid Scheduling
•   GQSched (Grid Quick Scheduler)
     – Idea is based around the Nimrod model (user driven parameter sweep dispatcher)
     – Addition of sweeps over data files and collections
     – Built in 2002 as a demonstration to computer scientists of simple data grid scheduling
     – Simple tool familiar to Physicists
         • Shell script, Environment parameters
     – “Data Grid Enabled”
         • Seamless access to data catalogues and Grid storage systems
             – Protocols – GSIFTP, GASS (and non-Grid protocols also HTTP, HTTPS, FTP)
             – Catalogues – GTK2 Replica Catalog, SRB (currently testing)
     – Scheduling based on metrics for CPU Resource – Data Resource combinations
         • previous failures of job on resource
         • “nearness” of physical file locations (replicas)
         • resource availability
     – Extra features
         • Pre- and Post-processing for preparation/collation of data and job status checks
         • Creation and clean-up of unique job execution area
         • Private network “friendly” staging of files for specific resources (3 stage jobs)
         • Automatic retry and resubmit of jobs
         • Reporting of file access errors and job errors
         • Merging of RSL requirements for Resources and Jobs
         • Automatic checking and creation of Grid proxy
         • File globbing for Globus file staging, GSIFTP, GTK2 Replica Catalog, SRB
     – Future features
         • Scheduling based on optimised output storage location
         • Dynamic parameter sweep and RSL specification (eg. splitting file processing via meta-data)
         • Job profile reporting (in XML format, partially tested)
         • Automatic proxy renewal or notification of expiry (for long term jobs)
         • “Job Submission Service mode” for ongoing tasks such as Belle MC production?
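The scheduling metrics named above (previous failures of the job on a resource, nearness of physical replicas, resource availability) can be combined into a single score; the sketch below is illustrative only, with arbitrary weights, not GQSched's actual metric.

```python
def score(job, resource, failures, distance, free_slots):
    """Lower is better. Penalise past failures of this job on this
    resource, distance to the nearest input replica, and having no
    free CPU slots. The weights are arbitrary, for illustration."""
    return (10 * failures.get((job, resource), 0)
            + distance[resource]
            + (0 if free_slots[resource] > 0 else 100))

def best_resource(job, resources, failures, distance, free_slots):
    """Pick the CPU resource with the best (lowest) combined score."""
    return min(resources,
               key=lambda r: score(job, r, failures, distance, free_slots))
```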
Grid Scheduling
• $ gqsched myresources myscript.csh
 #!/bin/csh -f
 #:Param MYFILE GridFile srb:/anusf/home/ljw563.anusf/proc1/*.mdst

 #:StageIn $MYFILE
 #:StageIn recon.conf ; event.conf
 #:StageIn particle.conf

 echo "Processing Job $JOBID on $MYFILE on host "`hostname`
 basfexec -v b20020424_1007 << EOF
 path create main
 module register user_ana
 path add_module main user_ana
 initialize
 histogram define somehisto.hbook
 process_event $MYFILE 1000 $EVENTSKIP
 terminate
 EOF
 echo Finished JobID $JOBID .

 #:StageOut output.mdst srb:/anusf/home/ljw563.anusf/procout1/
 #:StageOut myana.hbook myana.${JOBID}.hbook
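The `#:` directive style used in the script above could be parsed as sketched below (an illustration of the parsing step only, not the real GQSched implementation): each directive line yields a name and its argument string.

```python
import re

def parse_directives(script_text):
    """Collect '#:Name args' directive lines from a job script.
    Ordinary script lines (including the '#!' shebang) are ignored.
    A sketch of the parsing step only, not the real GQSched."""
    directives = []
    for line in script_text.splitlines():
        m = re.match(r"\s*#:(\w+)\s+(.*)", line)
        if m:
            directives.append((m.group(1), m.group(2).strip()))
    return directives
```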
Grid Anatomy
• What are the essential components
   – CPU Resources + Middleware
   – Data Resources + Middleware
        • replica catalogues; unifying many data sources
   –   Authentication Mechanism
        • Globus GSI, Certificate Authorities
   –   Virtual Organisation Information Services
        • Grid consists of VOs!? users + resources participating in a VO
        • Who is a part of what research/effort/group
        • Authorisation for resource use
   –   Job Scheduling, Dispatch, and Information Services
   –   Collaborative Information Sharing Services
        • Documentation & Discussion (web, wiki,…)
        • Meetings & Conferences (AccessGrid…)
        • Code & Software (CVS, CMT, PacMan…)
        • Data Information (Meta Data systems)
Meta-Data System
• Advanced Meta-Data Repository
   – Advanced = Above and beyond file/collection
     oriented meta-data
   – Data oriented queries…
       • List the files resulting from task X.
       • Retrieve the list of all simulation data of
          event type X.
        • How can file X be regenerated? (if lost or expired)
    – Other queries we can imagine…
        • What is the status of job X ?
        • What analyses similar to X have been undertaken?
        • What tools are being used for X analysis?
        • Who else is doing analysis X or using tool Y ?
        • What are the typical parameters used for
          tool X ? And for analysis Y ?
        • Search for data skims (filtered sets) that
          are supersets of my analysis criteria.
    [Diagram: example meta-data tree]
        Experiment (Just Belle!)
          General Analysis – Description (CP Violation, Charm studies, ???)
            Specific Analysis – Description (B -> D*D*Ks)
              Interaction ? (Optional)
              Decay Chain ? (Optional)
              Task Type (Simulation, Filtering, Analysis)
                Task (Owner, date, status) – Description
                  Application – Parameters
                    Job – Output + Files
Meta-Data System
• XML
  – some great advantages
     • natural tree structure
     • strict schema, data can be validated
     • powerful query language (XPath)
     • format is very portable
     • information readily transformable (XSLT)
  – some real disadvantages
     • XML databases are still developing, not scalable
     • XML DBs are based on lots of documents of the same type
     • Would need to break tree into domains, query becomes difficult
• LDAP
  – compromise
      • natural tree structure
      • loose schema but well defined
      • reasonable query features, not as good as XML
      • very scalable (easily distributed and mirrored)
      • information can be converted to XML with little effort if necessary
      • structure/schema is easily accessible and describes itself!
      • might be a way to deal with schema migration (meta-data structure is
        dynamic; need to preserve old information)
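The LDAP compromise above can be sketched in a few lines (a hypothetical layout, not the deployed schema: tree nodes become `ou=` components, leaves become attributes): a nested meta-data tree is flattened into LDAP-style `(dn, attributes)` entries.

```python
def to_entries(tree, base="dc=root"):
    """Flatten a nested meta-data tree into LDAP-style (dn, attrs)
    entries: dict-valued keys become child nodes (ou=...), other
    values become attributes of the current entry. Hypothetical
    layout for illustration, not the deployed schema."""
    attrs = {k: v for k, v in tree.items() if not isinstance(v, dict)}
    entries = [(base, attrs)]
    for key, value in tree.items():
        if isinstance(value, dict):
            entries += to_entries(value, f"ou={key},{base}")
    return entries
```

Because each entry carries its own DN and attribute names, the structure stays self-describing, and old entries can simply be left in place when the schema evolves.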
Meta-Data System
• Components
  – Navigation, Search, Management of MD
  – Task/Job/Application generated MD
  – Merging and Uploading MD
   [Diagram: user-supplied and software-generated meta-data merged into an LDAP Server]
      User supplied:      Belle → CKM Group → Description / Specific Analysis
                            (Description, Interaction ?, Decay Chain ?, Task Type)
      Software generated: Task → Experiment, Description,
                            Application (Parameters), Job (Output + Files)
Meta-Data System
• Navigation and Creation via Web
   [Screenshot: web interface for meta-data navigation and creation]
• Search is coming
How to use it all together?
• Getting set up
   – Certificate from a recognised CA (VPAC)
   – Accounts on each CPU/storage resource
       • ANUSF storage, VPAC, ARC (UniMelb), APAC
    – Install required software on resources (eg. BASF)
   – Your certificate in the VO system
• Running jobs
  – Find SRB input files, set up output collection
  – Convert your scripts to GQSched scripts
  – Run GQSched to execute jobs
• Meta Data
  – Find/Create a context for your tasks (what you are currently
     doing)
   – Submit this with your job, or store with output
   – Merge context + output meta-data, then upload
    – NOT COMPLETE – need auto-generated MD from BASF/jobs
