

									 First steps in the CMS FrameWork

                                CMS SW

Dr. Emanuele Simili – Friday 1st July 2554
Chulalongkorn University, Bangkok (Thailand)
۞   The CMS experiment           (a very brief description)

۞   Data infrastructure          (Tier0,1,2, etc.)

۞   Data format                  (basic concepts)

۞   User experience              (installation & setup)

۞   Exploring the Data           (EDM tools, event display)

۞   User interface(s)            (FWLite, full FrameWork)

۞   Future Plan: Heavy Ions (example: simulation)

* This is a summary of my understanding … correct me when wrong!
** References will be provided at the bottom of every slide
                          CMS @ LHC
CMS @ LHC (2)
                   Data Infrastructure
The computational infrastructure is organized in Tiers:
Tier-0 (T0): a single site, CERN.
Data are received from the CMS detector and "repacked" into physics
streams of events. Reconstruction algorithms are run, and the RAW and
RECO data are exported to the Tier-1 sites.
Tier-1 (T1): 7 sites (FNAL, RAL, INFN, …).
Used for large-scale, centrally organized activities (event reconstruction,
data storage); they can send data to and receive data from the Tier-2 sites.
Tier-2 (T2): numerous "small" centres at universities.
Substantial CPU resources for user analysis, calibration studies, and
Monte Carlo production.
Tier-3 (T3): users …
Data Infrastructure (2)
                  CMS Data Formats
RAW (raw output): full event information from the Tier-0 (i.e. from
CERN), containing 'raw' detector information (detector hits, deposited
energy, etc.). RAW is not used directly for analysis.
RECO (reconstructed events): the output of first-pass
processing by the Tier-0. This layer contains very detailed reconstructed
physics objects. It can be used for analysis, but is still too big.
AOD (Analysis Object Data): a 'distilled' version of the RECO event
information, expected to be used for most analyses. AOD provides
a trade-off between event size and complexity, optimizing flexibility
and speed for any type of analysis.
PAT(s) (Physics Analysis Toolkit): a subset of the reconstructed data
optimized for a specific need, defined by the POGs (Physics Object
Groups). Small and fast to analyse, but not universal.
                   Modular Structure
The CMS software has a modular architecture: each piece of
software performs a specific action on the 'data' (as in
an assembly line) and is loaded only when needed.

   [Diagram: input → ED Producer → ED Filter → ED Analyzer → output]
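
As a rough illustration of the assembly-line idea, the sketch below chains three plain Python functions playing the roles of a producer, a filter and an analyzer. It is a toy model and not CMSSW code: the function names, the dictionary used as an "event" and the minimum track count are all assumptions made for this example.

```python
# Toy model of a producer -> filter -> analyzer chain (NOT the CMSSW API).
# An "event" here is just a dict; all names below are hypothetical.

def produce_track_count(event):
    """Producer role: adds a new product to the event."""
    event["nTracks"] = len(event["tracks_pt"])
    return event

def keep_event(event, min_tracks=2):
    """Filter role: decides whether the event continues down the chain."""
    return event["nTracks"] >= min_tracks

def analyze(event, summary):
    """Analyzer role: reads the event and fills a summary, without modifying the event."""
    summary["events"] += 1
    summary["tracks"] += event["nTracks"]

def run_path(events):
    summary = {"events": 0, "tracks": 0}
    for event in events:
        event = produce_track_count(event)
        if not keep_event(event):
            continue  # a failed filter stops the chain for this event
        analyze(event, summary)
    return summary

# Made-up input: three events with a list of track pT values each
toy_events = [
    {"tracks_pt": [0.5, 1.2, 3.4]},
    {"tracks_pt": [0.3]},
    {"tracks_pt": [2.0, 2.5]},
]
summary = run_path(toy_events)
print(summary)
```

The second toy event has a single track, so the filter rejects it and the analyzer only sees the other two.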

Data in the Event are uniquely identified by four quantities:
C++ class        E.g., edm::PSimHitContainer or reco::TrackCollection.
module label     label assigned to the module that created the data.
instance label   label assigned to object from within the module (default: “ ”).
process name     the process name as set in the job that created the data.
These are the 4 columns shown by edmDumpEventContent (see next…).
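
To make the role of the four identifiers concrete, here is a small sketch that treats them as a lookup key. It only mimics the idea in plain Python: the `ProductKey` type, the `find` helper and the process names "RECO" and "RERECO" are hypothetical and chosen for illustration.

```python
from collections import namedtuple

# Hypothetical key type mirroring the four identifiers listed above.
ProductKey = namedtuple(
    "ProductKey",
    ["cpp_class", "module_label", "instance_label", "process_name"],
)

# Toy catalogue of products; labels and process names are illustrative only.
products = {
    ProductKey("reco::TrackCollection", "generalTracks", "", "RECO"):
        "tracks from RECO",
    ProductKey("reco::TrackCollection", "generalTracks", "", "RERECO"):
        "tracks from a re-reconstruction",
}

def find(products, cpp_class, module_label, instance_label="", process_name=None):
    """Return all products matching the identifiers.

    process_name=None matches any process, so the same class and labels
    can return several products made by different jobs.
    """
    return [
        value
        for key, value in products.items()
        if key.cpp_class == cpp_class
        and key.module_label == module_label
        and key.instance_label == instance_label
        and (process_name is None or key.process_name == process_name)
    ]

both = find(products, "reco::TrackCollection", "generalTracks")
only_reco = find(products, "reco::TrackCollection", "generalTracks",
                 process_name="RECO")
print(len(both), only_reco)
```

Leaving the process name open returns both entries, which is why the process name is needed to pin down a product uniquely.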
                       Data Provenance
It can be crucial to track the provenance of the data (e.g., corrected
calibration, faulty modules). In CMS this is done at file level: the history
of which piece of code did what is stored by means of the 4
identifiers above (class, module, instance, process).

Provenance of a file can be quite complicated (e.g., RECO) …

C.Jones, et al. (FNAL, Cornell U.) – Presentation for CHEP 2009
         CMS Software Ingredients
The software framework that has been built to support
the CMS experiment is called CMS SW. It builds on:

   Operating System: Scientific Linux CERN 5 (SLC5),
   which is built on Scientific Linux 5 (SL5), which is in turn built on
   Red Hat Enterprise Linux 5 (RHEL5).
   Programming Language(s): C++ & Python (the latter
   serves just for the configuration scripts).
   Analysis Framework: ROOT (the well-known, object-oriented
   physics analysis framework).
   Online Repository: CVS (Concurrent Versions System).
The CMS software CMS SW can be downloaded from
the CERN CVS repository and compiled locally:
setenv CVSROOT
cvs login (password: 98passwd)
cvs co -r CMSSW_4_2_4 CMSSW (this will check-out the whole CMSSW)
scram b (compilation may take a long time)

For more info:
   Check the online documentation or … ask Phat

However, if you work on … or …, this step is not necessary: everything
is installed and ready to run.
If you work on … or on …, everything is installed already (thanks Phat!).
You just need to set up your working environment:
ssh (or:
cd ~/scratch0 (or whatever …)
scram list (to check what releases of CMSSW are available)
scram project CMSSW CMSSW_4_2_4 (or short: cmsrel CMSSW_4_2_4)
cd CMSSW_4_2_4/src

And configure ROOT (add this to your rootlogon.C):
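TString cmsswbase = getenv("CMSSW_BASE");
if (cmsswbase.Length() > 0) {  // CMSSW environment is set: load FWLite (standard CMS WorkBook setup)
  gSystem->Load("libFWCoreFWLite.so"); AutoLibraryLoader::enable();
}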
                             EDM Tools
There are simple tools to explore the event content without an
analysis script, or even ROOT: the EDM Tools.
edmDumpEventContent file.root
Lists the content of a data file (shows which products exist in the file).
edmProvDump file.root
Prints out all the tracked parameters that were used to create the file;
Event Setup producers (ESSources and ESModules) are also shown.
edmFileUtil file.root
One-line summary of the file content (run, lumi, n. of events, size).
Try it out by yourself ...

   EDM - Event Data Model (e.g., edmDump, edm::Handle<…>) .
   ES - Event Setup (e.g., ESSource, ESModules) .
                          Event Display
FireWorks is the CMS Event Display. It is invoked from shell:
cmsShow reco.root
                              FWLite
The simplest way to look at CMS data is by using ROOT and the
light framework FWLite:
  root -l
    .L vCount.C

   * remember to configure ROOT

This macro (vCount.C) doesn't do much … it loops over the events
and counts the number of reconstructed vertices …
                              FWLite (2)
Another way to use the light framework is from the shell:
   cvs co PhysicsTools/FWLite/
   nedit PhysicsTools/FWLite/bin/ (or

   scram b
   (or FWLiteLumiAccess)
                              FWLite (3)
And the resulting histograms are saved in a ROOT file …


                                           Z0 → μ+ μ-

This example used an enhanced Z0 → μ+ μ- sample to plot the
invariant mass of the μ pair with very loose cuts.
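
For reference, the invariant mass of a muon pair can be computed from each muon's (pT, η, φ) using m² = (E1 + E2)² − |p1 + p2|². The sketch below is a standalone Python illustration of that formula; it is not the code behind the plot above, and the function names and input values are made up for the example.

```python
import math

MUON_MASS = 0.10566  # GeV, muon rest mass

def four_vector(pt, eta, phi, mass=MUON_MASS):
    """Convert (pT, eta, phi, mass) to (E, px, py, pz), all in GeV."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz

def invariant_mass(mu1, mu2):
    """Invariant mass of two muons, each given as a (pT, eta, phi) tuple."""
    e1, px1, py1, pz1 = four_vector(*mu1)
    e2, px2, py2, pz2 = four_vector(*mu2)
    m2 = ((e1 + e2) ** 2
          - (px1 + px2) ** 2
          - (py1 + py2) ** 2
          - (pz1 + pz2) ** 2)
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors

# Two back-to-back muons at eta = 0 with pT = 45.6 GeV each (made-up values)
mass = invariant_mass((45.6, 0.0, 0.0), (45.6, 0.0, math.pi))
print(round(mass, 1))
```

With these made-up inputs the pair has no net momentum, so the mass is just the sum of the two energies, about 91.2 GeV.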
                       Full Framework
All this is done on:
cmsrel CMSSW_4_2_4
cd CMSSW_4_2_4/src
mkedanlzr DemoAnalyzer     (creates a folder structure containing all ingredients to run analysis)

Among other things, the mkedanlzr script has generated:
The first is a Python config file, containing file names etc.
The second is the actual Analyzer code (the real meat).
cd DemoAnalyzer
scram b

The analysis runs over the file(s) specified and writes the output to a
ROOT file (file names are specified in
               Full Framework (2)
The first thing we can try is to loop over the track collection and simply
plot η, pT and φ of all reconstructed tracks.

                   pT > 200 MeV                    |η| < 2.6

                                                 -π < φ < π

In this way we can immediately see the CMS detector acceptance …
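
The three plotted quantities follow directly from a track's momentum components: pT = √(px² + py²), φ = atan2(py, px) and η = asinh(pz / pT), which is equivalent to −ln tan(θ/2). The sketch below is a plain Python illustration, using the ranges seen in the plots as if they were cuts. The helper names and the sample momenta are assumptions; in the analyzer itself these values would come from the track objects.

```python
import math

def kinematics(px, py, pz):
    """Return (pT, eta, phi) for a momentum vector in GeV. Assumes pT > 0."""
    pt = math.hypot(px, py)
    phi = math.atan2(py, px)   # always in [-pi, pi]
    eta = math.asinh(pz / pt)  # same as -ln(tan(theta / 2))
    return pt, eta, phi

def in_acceptance(pt, eta, pt_min=0.2, eta_max=2.6):
    """Apply the ranges seen in the plots: pT > 200 MeV and |eta| < 2.6."""
    return pt > pt_min and abs(eta) < eta_max

# Made-up momenta (px, py, pz) in GeV
tracks = [(1.0, 0.0, 0.5), (0.1, 0.05, 0.0), (0.3, 0.3, 10.0)]
accepted = [t for t in tracks if in_acceptance(*kinematics(*t)[:2])]
print(accepted)
```

Of the three made-up tracks, the second fails the pT requirement and the third is too far forward, so only the first survives.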
               Full Framework (3)
After finding the outer radius of the
reconstructed tracks …
                                        7 cm < r < 115 cm
… we can plot a 3D histogram of the
last track hits, and see the actual
shape of the CMS inner tracker.
                  Python Config Files
The standard way to run any job in CMS SW is:
A config file consists of the following parts (data members of cms.Process):
   a source (MC generator or stored file),
   a collection of modules to run, along with customized settings,
   an output module, to create a ROOT file which stores the data,
   a path which lists, in order, the modules to be run.
A standalone config. file can be used, or a collection of config.
fragments, already available in CMS SW. *

→ A Python config file can be created without speaking any Python!
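
A minimal sketch of a config file with the four parts listed above might look as follows. It can only be run with cmsRun inside a CMSSW environment. The process name, the file names and the number of events are placeholders chosen for this example; DemoAnalyzer is the analyzer created earlier with mkedanlzr.

```python
# Minimal cmsRun configuration sketch; needs a CMSSW environment to run.
import FWCore.ParameterSet.Config as cms

process = cms.Process("Demo")  # process name (placeholder)

# 1) source: here a stored file (placeholder file name)
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring("file:input.root"))
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(10))

# 2) modules to run, with their settings
process.demo = cms.EDAnalyzer("DemoAnalyzer")

# 3) output module, writing the data to a ROOT file (placeholder file name)
process.out = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("output.root"))

# 4) path: the modules to run, in order; output modules go in an EndPath
process.p = cms.Path(process.demo)
process.e = cms.EndPath(process.out)
```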
              Python Config Files (2)
* In CMS SW a configuration file can be composed from existing
configuration fragments ( … -s […] --conditions […]
--filein file:"" --fileout:"" -n […] --no_exec ).
The flag --no_exec tells the tool to produce the config file without running it.
Many configuration fragments are available in CMS SW, to satisfy
almost any need of simulation, reconstruction, analysis ...

Another way to create a file is via the graphical interface:
                  Configuration Editor
                              Heavy Ions
First attempt: simulate 5 heavy ion events within CMS SW (luckily I got
some help from the Heavy-Ions forum on the CMS Hypernews).
I used one of the standard generators (Hydjet) with minimum bias:
Simulation (takes a few hours on -s GEN,SIM,DIGI,L1,DIGI2RAW,HLT:HIon
 -n 5 --geometry DB --conditions auto:starthi --relval 500,5
 --datatier 'GEN-SIM-DIGI-RAW-HLTDEBUG' --eventcontent FEVTDEBUGHLT
 --scenario HeavyIons --no_exec

Reconstruction (takes about 1 hour): hiReco -s RAW2DIGI,RECO -n -1 --scenario HeavyIons
 --conditions FrontierConditions_GlobalTag,MC_42_V12::All --datatier 'GEN-SIM-RECO'
 --eventcontent=RECODEBUG --filein=file:hiSim.root --fileout=file:hiReco.root --no_exec

Output of the simulation for 5 events (RAW): ~250 MB.
Output of the reconstruction for 5 events (RECO): ~120 MB.
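
The two commands above share the same pattern: a name, a list of steps (-s), a number of events (-n), conditions, a few extra options and --no_exec. The sketch below assembles such a command line in plain Python. The tool name cmsDriver.py is an assumption, since the command name did not survive on the slide, and `build_command` is a hypothetical helper. All flag values are copied from the reconstruction command.

```python
import shlex

def build_command(fragment, steps, n_events, conditions, extra=None,
                  tool="cmsDriver.py"):
    """Assemble the argument list for a config-generating command.

    `tool` is an assumption (the command name is missing on the slide);
    `extra` maps additional flags to values, with None for bare flags.
    """
    args = [tool, fragment, "-s", ",".join(steps), "-n", str(n_events),
            "--conditions", conditions]
    for flag, value in (extra or {}).items():
        args.append(flag if value is None else "%s=%s" % (flag, value))
    args.append("--no_exec")  # only write the config file, do not run it
    return args

reco = build_command(
    fragment="hiReco",
    steps=["RAW2DIGI", "RECO"],
    n_events=-1,
    conditions="FrontierConditions_GlobalTag,MC_42_V12::All",
    extra={
        "--scenario": "HeavyIons",
        "--filein": "file:hiSim.root",
        "--fileout": "file:hiReco.root",
    },
)
print(" ".join(shlex.quote(arg) for arg in reco))
```

Keeping the arguments as a list, and quoting them only when printing, avoids shell-quoting mistakes if the command is later passed to a subprocess call.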
                    Heavy Ion Event

H.I. events are slightly different from pp …
E.g., the track collection is hiSelectedTracks
instead of generalTracks.
Heavy Ion Event (2)
After a few weeks of full immersion, I am starting to make sense of the
CMS software framework. It won't take long before I can produce
some results, which will be about …

   If there is no opposition, I will keep working on Heavy-Ions …

   My 1st goal is to implement the Event Plane analysis on CMS
   (the time frame could be about 2 full weeks).

   The longer-term goal is to interface with the Heavy-Ions community
   in CMS and take care of some specific topic.

In the meantime, Mo and I can work together and help each other to
   better understand the intricate workings of CMS SW.

Please give comments on the above.
                                         Thanks for your attention!
still a long way to go …
… but there's a dazzling light at
    the end of the tunnel!
