IGUANA Architecture Plans (PowerPoint)

					                          CMS Data Analysis
                Current Status and Future Strategy


                                                      ACAT 2002


                                     On behalf of CMS Collaboration

                                     Lassi A. Tuura
                                     Northeastern University, Boston




http://iguana.cern.ch   June, 2002                     Lassi A. Tuura, Northeastern University
                                        Overview
 The Context — CMS Analysis Today
 Data Analysis Environment Architecture
      Overview
      COBRA
      IGUANA
      GRID/Production
 Tomorrow and Beyond
      Leveraging current frameworks in the Grid-enriched analysis environment
      Clarens client-server prototype
      Other prototype activities




                                        Context
 Challenges:
       Complexity
       Geographic Dispersion
       Direct Access To Data
       Migration from Reconstruction to Trigger
 Environments:
       Real-Time Event Filter, Online Monitoring
       Pre-emptive Simulation, Reconstruction, Analysis
       Interactive Statistical Analysis

                          Current CMS Production
[Figure: production data-flow diagram. Recoverable box labels:]
 Pythia; HEPEVT Ntuples; CMSIM (GEANT3); Zebra files with HITS
 ORCA/COBRA ooHit Formatter; OSCAR/COBRA (GEANT4)
 Objectivity Database (appears several times)
 ORCA/COBRA Digitization (merge signal and pile-up)
 ORCA User Analysis; Ntuples or Root files; IGUANA Interactive Analysis

       Complexity of Production 2002
   Number of Regional Centers                                             11
   Number of Computing Centers                                            21
   Number of CPUs                                                      ~1000
   Largest Local Center                                             176 CPUs
   Number of Production Passes for each Dataset                          6-8
     (including analysis group processing done by production)
   Number of Files                                                   ~11,000
   Data Size (not including fz files from Simulation)                   17TB
   File Transfer by GDMP and by Perl scripts over scp/bbcp     7TB toward T1
                                                               4TB toward T2




                                      Interactive Analysis
[Figure: annotated view of an interactive analysis session. Callouts:]
 Emacs used to edit a CMS C++ plugin to create and fill histograms
 Lizard Qt plotter
 OpenInventor-based display of selected event
 ANAPHE histogram extended with pointers to CMS events
 Python shell with Lizard & CMS modules
 Most of analysis is done using NTUPLEs in PAW, some in ROOT

    Behind the Scenes: Frameworks
[Figure: framework overview diagram. Recoverable labels:]
 Data Browser; Generic analysis Tools; Analysis job wizards; Detector/Event Display; Federation wizards
 ORCA; OSCAR; FAMOS; COBRA; Objy tools; CMS tools
 GRID; Distributed Data Store & Computing Infrastructure
 Captions: "Consistent User Interface"; "Coherent basic tools and mechanisms"

                           Frameworks Dissected
[Figure: layered architecture diagram. Layers from top to bottom:]
 Specific Frameworks / Grid-Uploadable Physics modules: Event Filter, Reconstruction Algorithms, Physics Analysis, Data Monitoring
 Generic Application Framework: Calibration Objects, Event Objects, Configuration Objects, (Grid-aware) Data-Products
 Adapters and Extensions
 Basic Services: ODBMS, GEANT 3/4, CLHEP, PAW Replacement, C++ Standard Library + Extension Toolkits



                       Framework Design Basis
 Several frameworks provide the environment together
      Open: No central framework with all functionality
        – Frameworks are designed to be extensible
        – … and to collaborate with other software
      Coherent: User sees “final” smooth interface
        – Achieved by integrating the frameworks together
        – … but the user does not do this work him/herself !
      Design applied at both framework and object design level
 Successfully applied in many parts of CMS software
      Applications, persistency; sub-frameworks; visualisation; …
      No loss of usability, functionality or performance
      Has made it easy to integrate directly with many existing tools
 This is nothing novel: it is part of the standard risk-mitigation
  strategy of any modern industrial solution
                              Frameworks: COBRA
[Figure: the framework overview diagram from "Behind the Scenes: Frameworks", repeated to introduce the COBRA part of the talk.]

                    COBRA: Main Components
 Push- and pull-mode execution—and any mixture
        Reconstruction-on-demand is a key concept in COBRA
        Detector-centric reconstruction—push data from event
        Reconstruction-unit-centric reconstruction—pull/create data as needed
 Event data and related structures
        Basic support for commonly needed objects (hits, digis, containers, …)
 Application environments
        Basic application frameworks, various semi-specialised applications
        Lots of error-handling and recovery code (automatic recovery after crash, …)
 Meta data: a key component
        Data chunking, system and user collections, data streams, file management,
         job concepts, configuration and setup records, redirected navigation after
         reprocessing, …
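
The pull-mode idea above (create data only when something asks for it) can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not COBRA code: the `Event` class, `register_producer`, `get`, and the product names are all hypothetical.

```python
"""Hypothetical sketch of reconstruction-on-demand (pull mode); not the COBRA API."""


class Event:
    def __init__(self, raw_hits):
        self._producers = {}                  # product name -> function that creates it
        self._products = {"hits": raw_hits}   # products that already exist
        self.calls = []                       # records which producers actually ran

    def register_producer(self, name, func):
        self._producers[name] = func

    def get(self, name):
        # Pull mode: build the product only on first request, then cache it.
        if name not in self._products:
            self.calls.append(name)
            self._products[name] = self._producers[name](self)
        return self._products[name]


def make_clusters(event):
    # Toy "reconstruction": keep hits above a threshold.
    return [h for h in event.get("hits") if h > 1.0]


def make_tracks(event):
    # Depends on clusters, which are pulled (and built) on demand.
    return {"n_clusters": len(event.get("clusters"))}


event = Event(raw_hits=[0.5, 1.5, 2.5])
event.register_producer("clusters", make_clusters)
event.register_producer("tracks", make_tracks)

tracks = event.get("tracks")        # runs the track producer, which pulls clusters
tracks_again = event.get("tracks")  # served from the cache; nothing reruns
print(tracks, event.calls)
```

Nothing is computed until `get` is called, and each product is built at most once per event. A push-mode framework would instead run every producer as each event arrives.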


                           COBRA: Main Strengths
 Algorithms in plug-ins
      “Publish-yourself-plug-ins”—self-describing data producers
 Strong meta-data facilities
      Reconstruction-on-demand matches data product concept very well
        – Grid virtual data products concept really just an extension
      Convenient mapping of data products to chunks: files, containers, …
      Scatter / gather: decompose jobs, gather data
        – One logical job can be chopped into many physical processes; we still
          know it is logically the same job no matter which process it is running in
 Adapts automatically to many environments without special
  configuration: interactive, batch, farm, stand-alone, trigger, …
      Through appropriate use of enabling techniques (transactions, locking, refs)
      No data post-processing required
      Well-matched to production tools (IMPALA)
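
The scatter/gather point above can be illustrated with a small sketch: one logical job is split into several physical pieces, each piece carries the logical job's identity, and the outputs are merged afterwards. The names `PhysicalJob`, `scatter`, `run`, and `gather`, and the job identifier, are hypothetical and are not taken from COBRA or IMPALA.

```python
"""Hypothetical sketch of scatter/gather job decomposition; not the COBRA API."""
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalJob:
    logical_id: str   # identity of the logical job, shared by all pieces
    part: int
    events: range


def scatter(logical_id, n_events, n_parts):
    """Split one logical job into at most n_parts contiguous event ranges."""
    step = -(-n_events // n_parts)   # ceiling division
    return [PhysicalJob(logical_id, i, range(lo, min(lo + step, n_events)))
            for i, lo in enumerate(range(0, n_events, step))]


def run(job):
    """Toy payload: each process returns its output tagged with the logical id."""
    return {"logical_id": job.logical_id, "part": job.part, "sum": sum(job.events)}


def gather(results):
    """Merge partial outputs after checking they belong to one logical job."""
    ids = {r["logical_id"] for r in results}
    if len(ids) != 1:
        raise ValueError("results come from different logical jobs")
    return {"logical_id": ids.pop(), "sum": sum(r["sum"] for r in results)}


jobs = scatter("digitisation-pass-1", n_events=10, n_parts=3)
merged = gather([run(j) for j in jobs])
print(len(jobs), merged)
```

Each piece keeps the shared `logical_id`, so the gather step can tell that outputs from different processes belong to the same logical job.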

[Figure: persistency architecture diagram, untitled. Recoverable labels:]
 Object Access; Meta Data; DDL Source Processing
 Storage Manager; Transaction Manager; Catalog Manager; Schema Manager
 Lock Server; Page Server; File I/O; C++ Binding
 MSS, Grid & Farm Interface
 Objectivity


[Figure: the persistency architecture diagram again (Object Access, Meta Data, Storage/Transaction/Catalog/Schema Managers, Lock and Page Servers, File I/O, C++ Binding, DDL Source Processing, MSS/Grid/Farm Interface, Objectivity), with added labels near Object Access: Queries; Refs & Navigation; Cache Management.]


[Figure: the persistency architecture diagram again (Object Access, Meta Data, Storage/Transaction/Catalog/Schema Managers, Lock and Page Servers, File I/O, C++ Binding, DDL Source Processing, MSS/Grid/Farm Interface, Objectivity), with added labels near Meta Data: Collections; Configurations (Data Sets); Object Naming; Run Resume & Crash Recovery.]


[Figure: the persistency architecture diagram again (Object Access, Meta Data, Storage/Transaction/Catalog/Schema Managers, Lock and Page Servers, File I/O, C++ Binding, DDL Source Processing, MSS/Grid/Farm Interface, Objectivity), with added labels near the MSS, Grid & Farm Interface: File Size Control; System Management; Farm Management.]

                            Frameworks: IGUANA
[Figure: the framework overview diagram from "Behind the Scenes: Frameworks", repeated to introduce the IGUANA part of the talk.]

       User Interface and Visualisation
 IGUANA: a generic toolkit for user interfaces and visualisation
      Builds on existing high-quality libraries (Qt, OpenInventor, Anaphe, …)
      Used to implement specific visualisation applications in other projects
 Main technical focus: provide a platform that makes it easy to
  integrate GUIs as a coherent whole, to provide application
  services and to visualise any application object
      Many categories / layers: GUI gadgets & support, application environment,
       data visualisers, data representation methods, control panels, …
      Designed to integrate with and into other applications
      Virtually everything is in plug-ins (can still be statically linked)
[Figure: plug-in architecture diagram. Recoverable labels: Component Database; Plug-In Cache (stacked); Plug-In (stacked); Object Factory (stacked); Attached; Unattached.]
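
One way to read the plug-in labels on this slide is as a registry that knows what each plug-in provides without loading it, and attaches the plug-in only when an object is first requested. The sketch below illustrates that reading in Python. It is an assumption-laden illustration, not the IGUANA API: the `PlugIn` and `ComponentDatabase` classes, the interpretation of "attached" as "loaded", and the type names are all hypothetical.

```python
"""Hypothetical sketch of a plug-in registry with on-demand loading; not the IGUANA API."""


class PlugIn:
    def __init__(self, name, provides, loader):
        self.name = name
        self.provides = provides   # object type names this plug-in advertises
        self._loader = loader      # callable returning {type name: constructor}
        self.factories = None      # None while the plug-in is unattached

    @property
    def attached(self):
        return self.factories is not None

    def attach(self):
        # Stand-in for loading a shared library; done at most once.
        if not self.attached:
            self.factories = self._loader()


class ComponentDatabase:
    def __init__(self):
        self._by_type = {}         # cached description: type name -> plug-in

    def add(self, plugin):
        for type_name in plugin.provides:
            self._by_type[type_name] = plugin

    def create(self, type_name, *args):
        plugin = self._by_type[type_name]   # the lookup needs no loaded code
        plugin.attach()                     # load only when an object is wanted
        return plugin.factories[type_name](*args)


db = ComponentDatabase()
histo = PlugIn("histograms", ["Histogram1D"], lambda: {"Histogram1D": list})
viewer = PlugIn("viewers", ["TwigBrowser"], lambda: {"TwigBrowser": dict})
db.add(histo)
db.add(viewer)

h = db.create("Histogram1D", [1, 2, 3])
print(h, histo.attached, viewer.attached)
```

Only the plug-in that was actually used becomes attached; the other stays described in the database but unloaded.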
              Illustration: 3D Visualisation
[Figure: browser layout diagram. Recoverable labels: 3D Browser and Twig Browser, each above a "QMDIShell Browser Site"; a "QMainWindow Browser Site" below both.]




                          IGUANA GUI Integration
[Figure: GUI integration diagram. Recoverable labels: Integration; Action; "Visualise Results, Modify Objects, Further Interaction".]



                             Tomorrow and Beyond
 Leverage the current frameworks on the grid
      Many native COBRA concepts match well with grid
        – (Virtual) data products ~ reconstruction-on-demand
        – Recording and matching configuration and setup information
        – Production interfaces: catalogs, redirection, MSS hooks
        – Scatter/gather job decomposition, production environment
      COBRA-based applications can be encapsulated for distributed analysis
      IGUANA already separates application objects, model and viewer
        – Many possibilities for introducing distributed links
      IGUANA+COBRA provides a platform for a coherent, well-integrated
       interface no matter where the code runs and data comes and goes
        – Both have loads of knobs and hooks for integration
 Aiming at adapting the existing software where possible
      Adapt and work within CMS software (COBRA, ORCA, …) and
       existing analysis tools (ROOT, Lizard, …)—don’t replace them
       Prototypes: Clarens Web Portals
 Grid-enabling the working environment for physicists' data analysis
 Communication with clients via the commodity XML-RPC protocol
       Implementation independence
 Server implemented in C++: access to the CMS OO analysis toolkit
 Server provides a remote API to Grid tools
       The Virtual Data Toolkit: Object collection access
       Data movement between tier centres using GSI-FTP
       CMS analysis software (ORCA/COBRA)
       Security services provided by the Grid (GSI)
       No Globus needed on client side, only certificate
[Figure: client-server diagram. Recoverable labels: Service; Clarens; Web Server; http/https; RPC; Client.]
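
To show what "commodity XML-RPC" means in practice, the sketch below encodes a call, decodes it on the "server" side, and returns an encoded response, using only Python's standard `xmlrpc.client` module. The method name `catalog.list_collections` and its return values are hypothetical and are not part of the Clarens API. The transport is left out: over http/https the request string would be the body of a POST.

```python
"""Sketch of an XML-RPC exchange using only the Python standard library.
The method name and its return value are hypothetical, not the Clarens API."""
import xmlrpc.client


# Server side: a table of callable methods (stand-in for the real service).
def list_collections(dataset):
    return [f"{dataset}/collection-{i}" for i in range(2)]


METHODS = {"catalog.list_collections": list_collections}


def handle(request_xml):
    """Decode a request, call the named method, and encode the response."""
    params, method = xmlrpc.client.loads(request_xml)
    result = METHODS[method](*params)
    return xmlrpc.client.dumps((result,), methodresponse=True)


# Client side: encode the call as XML and hand it straight to the handler.
request_xml = xmlrpc.client.dumps(
    ("demo-dataset",), methodname="catalog.list_collections"
)
response_xml = handle(request_xml)
(collections,), _ = xmlrpc.client.loads(response_xml)
print(collections)
```

Because both sides exchange plain XML, the client and server can be written in different languages. That is the implementation independence the slide refers to.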

  Prototypes: Clarens Web Portals…
[Figure: tiered architecture diagram. Recoverable labels, top to bottom:]
 Tier 0/1/2: Production system and data repositories; TAG and AOD extraction/conversion/transport services
 Tier 1/2: ORCA analysis farm(s) (or distributed "farm" using grid queues); PIAF/Proof/.. type analysis farm(s); RDBMS based data warehouse(s)
 Between Tier 1/2 and Tier 3/4/5: Data extraction Web service(s); Query Web service(s)
 Tier 3/4/5: Local analysis tool (Lizard/ROOT/…) with Tool plugin module; Local disk; Web browser; User
 Legend: Production data flow; TAGs/AODs data flow; Physics Query flow

                                        Other Prototypes
 Tag database optimisation
      Fast sample selection is crucial
      Various models already tried
      Experimenting with RDBMS
 MOP: distributed job
  submission system
      Allows submission of CMS
       production jobs from a central
       location, run on remote locations,
       and return results
        – Job Specification: IMPALA
        – Replication: GDMP
        – Globus GRAM
        – Job Scheduling: Condor-G
          and local systems
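
As a rough illustration of the tag database idea above (fast sample selection with an RDBMS), the sketch below stores a few per-event summary values in SQLite and selects events with a cut. The schema, column names, and cut values are invented for illustration and do not describe the CMS tag format.

```python
"""Hypothetical sketch of tag-based event selection in an RDBMS (SQLite here).
The schema and cut values are invented for illustration."""
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tags (
        event_id   INTEGER PRIMARY KEY,  -- reference back to the full event
        n_muons    INTEGER,
        missing_et REAL
    )
""")
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?)",
    [(1, 0, 12.0), (2, 2, 45.5), (3, 1, 80.0), (4, 2, 5.0)],
)
# An index on a frequently cut variable can help selection speed on large tables.
conn.execute("CREATE INDEX idx_tags_muons ON tags (n_muons)")

selected = [row[0] for row in conn.execute(
    "SELECT event_id FROM tags WHERE n_muons >= ? AND missing_et > ? "
    "ORDER BY event_id",
    (2, 20.0),
)]
print(selected)
conn.close()
```

Only the small tag rows are scanned; the selected `event_id` values would then be used to fetch the full events from the main store.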