Physics with SAM-Grid
Stefan Stonjek
University of Oxford
CHEP 2003
CHEP 2003 25th March 2003 San Diego Stefan Stonjek
Outline
• Components of SAM-Grid
• Job submission
• Example
• Problems
• Outlook
• Summary
Components of SAM-Grid (global)
• JIM (Job and Information Management system)
– Frontend and glue
• Condor-G for global submission and brokering
• GRAM protocol to transfer job to execution site
• Authentication via GSI (Grid Security Infrastructure)
Components of SAM-Grid (local)
• Data handling with SAM (Sequential data Access via Meta-data)
• Local job submission to
– CDF: CAF (Central Analysis Facility)
  • FBS: Farm batch system
  • Kerberized tools to write data back to FNAL
– DØ: OpenPBS
• Output data handling via SAM
Submission to the Grid
• User must provide:
– Grid proxy
– Job description file
– “tar” file with executable and configuration files
• GUI exists (generates “jdf” file, “tar” file and submits)
• Submission via JIM to Condor
• Submit to the site with the most files of the required dataset already present
– New Condor-MMS feature: execution of external code when negotiating the matches
– Here: calls SAM to check for the presence of input data at different locations
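The brokering rule above (prefer the site that already holds the most files of the requested dataset) can be illustrated with a minimal sketch. This is not the Condor-MMS or SAM interface: the function name `rank_sites`, the site names, and the in-memory catalog are hypothetical stand-ins for the external call that asks SAM where the input data is.

```python
from typing import Dict, Optional, Set


def rank_sites(dataset_files: Set[str],
               site_catalog: Dict[str, Set[str]]) -> Optional[str]:
    """Return the site holding the most files of the dataset.

    `site_catalog` is a hypothetical stand-in for the information that the
    broker would obtain from SAM during match negotiation.
    Returns None when no site is known.
    """
    best_site = None
    best_count = -1
    # Iterate in sorted order so ties are broken deterministically by name.
    for site in sorted(site_catalog):
        count = len(dataset_files & site_catalog[site])
        if count > best_count:
            best_site, best_count = site, count
    return best_site


if __name__ == "__main__":
    # Hypothetical dataset contents and site caches, for illustration only.
    dataset = {"f1", "f2", "f3", "f4"}
    catalog = {
        "site_a": {"f1"},
        "site_b": {"f1", "f2", "f3"},
        "site_c": {"f9"},
    }
    print(rank_sites(dataset, catalog))  # prints site_b
```

A real broker would likely weigh other criteria as well (queue load, site policy); the sketch only covers the data-locality criterion named on the slide.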
Local Submission
• Job is transferred as “tar” file via the GRAM protocol and then submitted to the local batch system
• Different local batch systems are possible
– Need adaptor for submission and job status information
– Supported at the moment: FBS, PBS, LSF, Condor
• Queues have to be the same at all sites
– Problem: job should not stop in the middle of an input file
– User has to limit amount of input relative to CPU time in queue (needs to know queue CPU time) or provide CPU time per event (difficult)
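The queue-limit problem above comes down to simple arithmetic: a job should take only as many whole input files as fit in the queue's CPU limit. The sketch below assumes the user knows the CPU time per event and the queue limit. The function name, the file names, and the numbers are hypothetical and not part of SAM-Grid.

```python
from typing import List, Tuple


def files_within_cpu_limit(files: List[Tuple[str, int]],
                           cpu_sec_per_event: float,
                           queue_cpu_limit_sec: float) -> List[str]:
    """Pick leading input files whose total estimated CPU time fits the queue.

    `files` is a list of (file name, number of events) pairs. Only whole
    files are selected, so the job never stops in the middle of a file.
    """
    selected: List[str] = []
    used = 0.0
    for name, n_events in files:
        cost = n_events * cpu_sec_per_event  # estimated CPU seconds for this file
        if used + cost > queue_cpu_limit_sec:
            break  # the next file would exceed the queue limit
        selected.append(name)
        used += cost
    return selected


if __name__ == "__main__":
    # Hypothetical numbers: 1 CPU second per event, 2500 s queue limit.
    inputs = [("file_a", 1000), ("file_b", 1000), ("file_c", 1000)]
    print(files_within_cpu_limit(inputs, 1.0, 2500.0))
```

With these illustrative numbers, the third file would push the estimate to 3000 s, so only the first two files are selected.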
SAM-Grid Architecture
Job Description File
• executable = ./run-job.sh
• sam_dataset = jbot0g
• input_sandbox_tgz = inbox.tgz
• output_sandbox = sam@sam.fnal.gov:/www/output.tgz
• email = stonjek@fnal.gov
• job_type = caf
• caf_job_type = sam
• caf_initial_section = 1
• caf_final_section = 1
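The entries above follow a plain `key = value` layout. As an illustration only, such a file could be read into a dictionary as sketched below. This is not the JIM parser: the function name `parse_jdf` is hypothetical, and the handling of blank lines and `#` comments is an assumption, since the slide does not specify the full syntax.

```python
from typing import Dict


def parse_jdf(text: str) -> Dict[str, str]:
    """Parse `key = value` lines into a dictionary (hypothetical helper).

    Assumption: blank lines and lines starting with '#' are ignored.
    """
    entries: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split at the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'key = value' line: {raw!r}")
        entries[key.strip()] = value.strip()
    return entries


if __name__ == "__main__":
    # Values copied from the job description file shown on the slide.
    example = """\
executable = ./run-job.sh
sam_dataset = jbot0g
input_sandbox_tgz = inbox.tgz
job_type = caf
"""
    jdf = parse_jdf(example)
    print(jdf["sam_dataset"], jdf["job_type"])
```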
Data Handling (SAM)
• If needed, SAM transfers the files to the local site
• SAM translates dataset name to list of files
• Selection can be based on physics metadata
• File transfer and delivery is transparent for the user
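The two steps named above (dataset name to file list, then transfer only where needed) can be sketched as follows. This is a toy model, not the SAM API: `plan_delivery`, the dictionary catalog, and the file names are hypothetical. Only the dataset name `jbot0g` comes from the job description file shown earlier.

```python
from typing import Dict, List, Set, Tuple


def plan_delivery(dataset: str,
                  catalog: Dict[str, List[str]],
                  local_cache: Set[str]) -> Tuple[List[str], List[str]]:
    """Resolve a dataset name and find which of its files must be transferred.

    Returns (all files of the dataset, files missing at the local site).
    `catalog` is a hypothetical stand-in for the SAM metadata lookup.
    """
    files = list(catalog[dataset])
    missing = [f for f in files if f not in local_cache]
    return files, missing


if __name__ == "__main__":
    # Hypothetical file names; the dataset name is taken from the slides.
    catalog = {"jbot0g": ["run1.root", "run2.root", "run3.root"]}
    cached = {"run2.root"}
    files, missing = plan_delivery("jbot0g", catalog, cached)
    print(len(files), "files in dataset,", len(missing), "to transfer")
```

The user-facing job only sees the full file list; which files had to be fetched stays hidden, which is the transparency the slide refers to.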
Job Monitoring
• Monitoring via a Web page
• Job is identified by a global job ID
• Decentralized approach
– Several independent web-servers possible
Layout of the Example
(shown at Supercomputing 2002, November 2002)
• Submission via command line
• Broker to one site
• Transfer via GRAM protocol
• Local job submission by CAF
• Job monitoring via Web
• Transfer of results via kerberized rcp to FNAL web server
Grid Map
• CDF
– Kyungpook National University, Korea
– Rutgers State University, New Jersey, US
– Rutherford Appleton Laboratory, UK
– Texas Tech, Texas, US
– University of Toronto, Canada
• DØ
– Imperial College, London, UK
– Michigan State University, Michigan, US
– University of Michigan, Michigan, US
– University of Texas at Arlington, Texas, US
Physics
• Standard CDF analysis job submitted via SAM-Grid and executed somewhere
• J/ψ → µ+ µ-
(Plot axes: z0(µ1), z0(µ2))
Problems (Security)
• Firewalls (different settings at different sites)
– Block all incoming connections to unprivileged ports
– Cancel idle TCP/IP connections
– Communication problems, in particular for remote execution
• Authentication (ssh, kerberos, GSI, ...)
– FNAL allows just kerberized access
• Different and local policies
• Problem: How to write back the data?
• Grid Security Infrastructure (GSI) might help
Problems (Private Networks)
• Already problems with SAM
• Worker node can contact outside world
• Outside world cannot call back
• Problem if a long time passes between the call from the worker and the response from the outside
• IPv6 might be a solution
Outlook
• Deploy SAM-Grid to further locations
• Develop SAM-Grid towards production readiness
Summary
• Several new tools and protocols were used to form a Grid-enabled environment to do physics
• SAM-Grid is able to use Grid technology to perform real-world physics analysis