H.G. Essel, J. Adamczewski, B. Kolb, M. Stockmeier



GSI, Oct 2005                Hans G. Essel DAQ Control           1
                 CMS: blueprint for clustered DAQ




[Figure: DAQ staging]




TTC     Timing, Trigger and Control              FU    Filter Unit
TPD     Trigger Primitive Data                   FFN   Filter Farm Network
aTTS    asynchronous Trigger Throttle System     EVM   Event Manager
D2S     Data to Surface                          RCN   Readout Control Network
FRL     Frontend Readout Link                    BCN   Builder Control Network
RU      Readout Unit                             DCN   Detector Control Network
BU      Builder Unit                             DSN   DAQ Service Network

                   CMS DAQ: requirements

   •       Communication and Interoperability
            –     Transmission and reception within and across subsystem boundaries regardless of the protocols used
            –     Addition of protocols without a need for modifications in the applications
   •       Device Access
            –     Access to custom devices for configuration and readout
            –     Access to local and remote devices (bus adapters) without the need for modifications in applications
   •       Configuration, control and monitoring
            –     Make parameters of built-in or user defined types visible and allow their modification
            –     Allow the coordination of application components (define their states and modes)
            –     Allow the inspection of states and modes
            –     Provide services for recording structured information
                     •       Logging, error reporting
                     •       Interface to persistent data stores (preferably without the need to adapt the applications)
                     •       Publish all information to interested subscribers
            –     Device allocation, sharing and concurrent access support


   •       Maintainability and Portability
            –     Allow portability across operating system and hardware platforms
            –     Support data access across multiple bus systems
            –     Allow addition of new electronics without changes in user software
            –     Provide memory management functionality to
                     •       Improve robustness
                     •       Give room for efficiency improvements
            –     Application code shall be invariant with respect to the physical location and the network
            –     Possibility to factorise out re-usable building blocks
   •       Scalability
            –     Overhead introduced by the software environment must be constant for each transmission operation and small with respect to the underlying communication hardware, in order not to introduce unpredictable behaviour
            –     Allow applications to take advantage of additional resource availability
   •       Flexibility
            –     Allow the applications to use multiple communication channels concurrently
            –     Addition of components must not decrease the system’s capacity



                CMS XDAQ



      •   XDAQ is a framework targeted at data processing clusters
           – Can be used for general purpose applications
           – Has its origins in the I2O (Intelligent IO) specification
      •   The programming environment is designed as an executive
           – A program that runs on every host
           – User applications are C++ programmed plug-ins
           – Plug-ins are dynamically downloaded into the executives
           – The executive provides functionality for
                  •   Memory management
                  •   Systems programming
                         queues, tasks, semaphores, timers
                  •   Communication
                         Asynchronous peer-to-peer communication model
                         Incoming events (data, signals, …) are demultiplexed to callback functions of application components
      •   Services for configuration, control and monitoring
      •   Direct hardware access and manipulation services
      •   Persistency services
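The executive model described above can be illustrated with a minimal C++ sketch. It is not the XDAQ API: `Executive`, `Component`, `Counter`, `bind`, `dispatch` and the event code are hypothetical names, "loading a plug-in" is reduced to handing over an already constructed object, and events carry an opaque string payload. What it does show is the central idea: components register callbacks, and the executive demultiplexes each incoming event to the callback bound to its code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event: the executive sees only a numeric code and an
// opaque payload; interpreting the payload is up to the component.
struct Event {
    std::uint16_t code;
    std::string payload;
};

// Base class for plug-in application components (hypothetical).
class Component {
public:
    virtual ~Component() = default;
};

// Minimal executive: owns the loaded components and demultiplexes each
// incoming event to the callback bound to its event code.
class Executive {
public:
    using Callback = std::function<void(const Event&)>;

    // Stand-in for loading a plug-in: the executive takes ownership of an
    // already constructed component (real plug-ins are loaded at run time).
    Component& load(std::unique_ptr<Component> c) {
        assert(c != nullptr);
        components_.push_back(std::move(c));
        return *components_.back();
    }

    // A component registers interest in one event code.
    void bind(std::uint16_t code, Callback cb) { listeners_[code] = std::move(cb); }

    // Deliver an event; returns false if no component listens for its code.
    bool dispatch(const Event& e) const {
        auto it = listeners_.find(e.code);
        if (it == listeners_.end()) return false;
        it->second(e);
        return true;
    }

private:
    std::vector<std::unique_ptr<Component>> components_;
    std::map<std::uint16_t, Callback> listeners_;
};

// Example component that binds one callback at construction time.
class Counter : public Component {
public:
    static constexpr std::uint16_t kDmaDone = 0x10;  // hypothetical event code

    explicit Counter(Executive& exec) {
        exec.bind(kDmaDone, [this](const Event& e) { onDmaDone(e); });
    }

    int received = 0;
    std::size_t bytes = 0;

private:
    void onDmaDone(const Event& e) {
        ++received;
        bytes += e.payload.size();
    }
};
```

Usage: construct a `Counter` with the executive, pass it to `load()`, then call `dispatch()`. An event whose code nobody has bound is reported by a `false` return rather than silently dropped.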




          XDAQ Availability


                                  http://cern.ch/xdaq

         OS               CPU                         Description
         Linux (RH)       x86                         Baseline implementation
         Mac OS X         PPC G3, G4                  no HAL, no raw Ethernet peer transport
         Solaris          Sparc                       no HAL, no raw Ethernet peer transport
         VxWorks          PPC 603, Intel x86          no GM

       Current version:         1.1
       Next releases:           V 1.2 in October 2002 (Daqlets)
                                V 1.3 in February 2003 (HAL inspector)

       Change control:          Via sourceforge: http://sourceforge.net/projects/xdaq
       Version control:         CVS at CERN
       License:                 BSD style

          XDAQ: References


          J. Gutleber, L. Orsini, “Software Architecture for Processing Clusters based on I2O”, Cluster Computing, the Journal of Networks, Software and Applications, Baltzer Science Publishers, 5(1):55-64, 2002 (draft version at http://cern.ch/gutleber, or contact me)

          The CMS Collaboration, “CMS, The Trigger/DAQ Project”, Chapter 9, “Online software infrastructure”, CMS TDR-6.2, in print (contact me for a draft); also available at http://cmsdoc.cern.ch/cms/TDR/DAQ/

          G. Antchev et al., “The CMS Event Builder Demonstrator and Results with Myrinet”, Computer Physics Communications 2189, Elsevier Science North-Holland, 2001 (contact Frans.Meijers@cern.ch)

          E. Barsotti, A. Booch, M. Bowden, “Effects of various event building techniques on data acquisition architectures”, Fermilab note FERMILAB-CONF-90/61, USA, 1990




                XDAQ event driven communication


•   Dynamically loaded application modules (from URL, from file)
•   Inbound/Outbound queue (pass frame pointers, zero-copy)
•   Homogeneous frame format




[Figure: event flow on one computer. A readout component generates a DMA completion event, and a peer transport receives messages from the network; the executive framework demultiplexes these incoming events to the listener application component, which implements the callback function foo( ).]
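The zero-copy queueing in the bullets above can be sketched as follows, assuming a hypothetical `Frame` type in which every message carries the same small header (target Tid and function code) followed by an opaque payload. The queue moves ownership of heap-allocated frames with `std::unique_ptr`, so only the pointer passes through the inbound or outbound queue and the payload buffer is never copied. `Frame`, `FramePtr` and `FrameQueue` are illustrative names; the fields do not reproduce the actual I2O frame layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical homogeneous frame: every message carries the same small
// header (target Tid and function code) followed by an opaque payload.
struct Frame {
    std::uint16_t targetTid = 0;  // addressee of the frame
    std::uint8_t function = 0;    // selects the callback at the receiver
    std::vector<std::uint8_t> payload;
};

using FramePtr = std::unique_ptr<Frame>;

// Inbound or outbound queue that transfers ownership of frames instead of
// copying them: only the pointer moves through the queue.
class FrameQueue {
public:
    void push(FramePtr f) {
        assert(f != nullptr);
        q_.push_back(std::move(f));
    }

    // Returns an empty pointer when the queue is empty.
    FramePtr pop() {
        if (q_.empty()) return nullptr;
        FramePtr f = std::move(q_.front());
        q_.pop_front();
        return f;
    }

    std::size_t size() const { return q_.size(); }

private:
    std::deque<FramePtr> q_;
};
```

Because ownership moves with the pointer, the consumer that pops a frame sees the very same payload buffer the producer filled; releasing it is automatic when the last `FramePtr` goes out of scope.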
                    XDAQ: I2O peer operation for clusters


•      Application component  (I2O: device)
•      Processing node        (I2O: IOP)
•      Controller node        (I2O: host)
•      Homogeneous communication
         –   frameSend for local, remote and host targets
         –   single addressing scheme (Tid)
•      Application framework

[Figure: two executives, each hosting an application, a messaging layer, a peer transport agent and a peer transport, exchanging I2O message frames.]
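The single addressing scheme can be sketched under stated assumptions: `MessagingLayer`, `Route` and the two delivery callbacks are hypothetical stand-ins for the messaging layer and the peer transport agent, and frames are plain strings. Application code only calls `frameSend()` with a Tid; the messaging layer decides whether the frame goes straight to a local application or is handed to a peer transport for a remote host.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

using Tid = std::uint16_t;

// Where a Tid lives: in this executive, or on a remote host behind a
// peer transport.
struct Route {
    bool local;
    std::string host;  // only used for remote targets
};

// Hypothetical messaging layer: applications address frames by Tid only;
// whether the target is local or remote is resolved here.
class MessagingLayer {
public:
    using Deliver = std::function<void(Tid, const std::string&)>;
    using Transport = std::function<void(const std::string&, Tid, const std::string&)>;

    MessagingLayer(Deliver local, Transport remote)
        : local_(std::move(local)), remote_(std::move(remote)) {}

    void addRoute(Tid tid, Route r) {
        assert(r.local || !r.host.empty());  // remote routes need a host
        routes_[tid] = std::move(r);
    }

    // Same call for local and remote targets; false for an unknown Tid.
    bool frameSend(Tid tid, const std::string& frame) {
        auto it = routes_.find(tid);
        if (it == routes_.end()) return false;
        if (it->second.local)
            local_(tid, frame);                    // direct delivery
        else
            remote_(it->second.host, tid, frame);  // via peer transport
        return true;
    }

private:
    Deliver local_;
    Transport remote_;
    std::map<Tid, Route> routes_;
};
```

Keeping the local/remote decision in this one place is what lets application code stay invariant with respect to physical location, one of the CMS DAQ requirements listed earlier.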
                 XDAQWin client




Configuration tree: XML based configuration of an XDAQ cluster
Daqlet window: Daqlets are Java applets that can be used to customize the configuration, control and monitoring of all components in the configuration tree
                  XDAQ: component properties




Component Properties: allows the inspection and modification of components’ exported parameters.
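Exporting parameters for inspection and modification can be sketched with a small name-based registry. This is a hypothetical illustration, not XDAQ's parameter mechanism: `ExportedParameters` stores a getter and a setter per name, values travel as text (assuming a client such as XDAQWin exchanges them as strings), and only `int` parameters are supported.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical registry through which a component exports parameters so
// that a client can list, inspect and modify them by name.
class ExportedParameters {
public:
    // Export an int member. Accessors are stored, not copies, so reads
    // always reflect the component's current value. The referenced
    // variable must outlive the registry.
    void exportInt(const std::string& name, int& ref) {
        assert(get_.count(name) == 0 && "parameter exported twice");
        get_[name] = [&ref] { return std::to_string(ref); };
        set_[name] = [&ref](const std::string& v) { ref = std::stoi(v); };
    }

    std::vector<std::string> names() const {
        std::vector<std::string> out;
        for (const auto& entry : get_) out.push_back(entry.first);
        return out;
    }

    // Both throw std::out_of_range for an unknown name.
    std::string get(const std::string& name) const { return get_.at(name)(); }
    void set(const std::string& name, const std::string& value) { set_.at(name)(value); }

private:
    std::map<std::string, std::function<std::string()>> get_;
    std::map<std::string, std::function<void(const std::string&)>> set_;
};
```

Unknown names raise `std::out_of_range` from `std::map::at`, and non-numeric input to `set()` raises `std::invalid_argument` from `std::stoi`, so a client request can be rejected cleanly instead of corrupting the component's state.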

                BTeV: a 20 THz real-time system



•    Input: 800 GB/s (2.5 MHz)
•    Level 1
     –   Lvl 1 processing: 190 µs
         (one bunch crossing every 396 ns)
     –   528 “8 GHz” G5 CPUs
         (factor of 50 event reduction)
     –   high performance interconnects
•    Level 2/3:
     –   Lvl 2 processing: 5 ms
         (factor of 10 event reduction)
     –   Lvl 3 processing: 135 ms
         (factor of 2 event reduction)
     –   1536 “12 GHz” CPUs, commodity networking
•    Output: 200 MB/s (4 kHz) = 1-2 Petabytes/year
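The figures on this slide can be cross-checked with simple arithmetic. The aggregate clock rate of the two CPU farms is consistent with the “20 THz” in the slide title; the only assumption added here is the common rule of thumb of about 10^7 s of data taking per year.

```latex
\begin{aligned}
528 \times 8\,\mathrm{GHz} + 1536 \times 12\,\mathrm{GHz} &= 4224\,\mathrm{GHz} + 18432\,\mathrm{GHz} \approx 22.7\,\mathrm{THz} \\
\frac{800\,\mathrm{GB/s}}{2.5\,\mathrm{MHz}} &= 320\,\mathrm{kB} \text{ per input event} \\
\frac{200\,\mathrm{MB/s}}{4\,\mathrm{kHz}} &= 50\,\mathrm{kB} \text{ per accepted event} \\
200\,\mathrm{MB/s} \times 10^{7}\,\mathrm{s/year} &= 2 \times 10^{15}\,\mathrm{B/year} = 2\,\mathrm{PB/year}
\end{aligned}
```

The last line matches the upper end of the quoted 1-2 Petabytes/year; fewer seconds of data taking per year would give the lower end.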




                BTeV: The problem


•    Monitoring, Fault Tolerance and Fault Mitigation are crucial
     – In a cluster of this size, processes and daemons are constantly hanging/failing without
         warning or notice
•    Software reliability depends on
     – Physics detector-machine performance
     – Program testing procedures, implementation, and design quality
     – Behavior of the electronics (front-end and within the trigger)
•    Hardware failures will occur!
     – one to a few per week
•    Given the very complex nature of this system where thousands of events are simultaneously and
     asynchronously cooking, issues of data integrity, robustness, and monitoring are critically
     important and have the capacity to cripple a design if not dealt with at the outset…

     BTeV [needs to] supply the necessary level of “self-awareness” in the trigger system.

RTES: Real Time Embedded System




                BTeV: RTES goals


•    High availability
     – Fault handling infrastructure capable of
         Accurately identifying problems (where, what, and why)
         Compensating for problems (shifting the load, changing thresholds)
         Automated recovery procedures (restart / reconfiguration)
         Accurate accounting
         Extensibility (capturing new detection/recovery procedures)
         Policy driven monitoring and control
•    Dynamic reconfiguration
     – adjust to potentially changing resources
•    Faults must be detected/corrected ASAP
     – semi-autonomously
         with as little human intervention as possible
     –   distributed & hierarchical monitoring and control
•    Life-cycle maintainability and evolvability
     – to deal with new algorithms, new hardware and new versions of the OS
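A minimal C++17 sketch of semi-autonomous fault detection, assuming heartbeats as the detection mechanism (the slides do not specify one): each supervised process reports periodically, a process silent for longer than a timeout is treated as hung and restarted, and once a restart budget is used up the fault is escalated to the next level instead. `HeartbeatMonitor`, the timeout and the restart budget are hypothetical, illustrative choices.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical local monitor: supervised processes report heartbeats; a
// process silent for longer than the timeout is treated as hung and
// restarted, until its restart budget is used up, after which the fault
// is escalated to the next level of the hierarchy.
class HeartbeatMonitor {
public:
    enum class Action { Restart, Escalate };

    HeartbeatMonitor(Clock::duration timeout, int maxRestarts)
        : timeout_(timeout), maxRestarts_(maxRestarts) {
        assert(timeout > Clock::duration::zero());
    }

    // Record a sign of life from a process.
    void beat(const std::string& proc, Clock::time_point now) { last_[proc] = now; }

    // Decide what to do for every process that has gone silent.
    std::map<std::string, Action> check(Clock::time_point now) {
        std::map<std::string, Action> actions;
        for (auto& [proc, seen] : last_) {
            if (now - seen <= timeout_) continue;
            int& restarts = restarts_[proc];
            if (restarts < maxRestarts_) {
                ++restarts;
                actions[proc] = Action::Restart;
                seen = now;  // fresh timeout window for the restarted process
            } else {
                actions[proc] = Action::Escalate;
            }
        }
        return actions;
    }

private:
    Clock::duration timeout_;
    int maxRestarts_;
    std::map<std::string, Clock::time_point> last_;
    std::map<std::string, int> restarts_;
};
```

Usage: call `beat()` whenever a process reports in and `check()` periodically. Passing the current time explicitly keeps the policy testable without real sleeps.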




                RTES deliverables


A hierarchical fault management system and toolkit:

     –   Model Integrated Computing
         •  GME (Generic Modeling Environment) system modeling tools
                –   and application specific “graphic languages” for modeling system configuration, messaging, fault
                    behaviors, user interface, etc.
     –   ARMORs (Adaptive, Reconfigurable, and Mobile Objects for Reliability)
         •      Robust framework for detection and reaction to faults in processes
     –   VLAs (Very Lightweight Agents for limited resource environments)
         •      To monitor/mitigate at every level
                –   DSP, Supervisory nodes, Linux farm, etc.




          RTES Development


• The Real Time Embedded System Group
  – A collaboration of five institutions,
    • University of Illinois
    • University of Pittsburgh
    • Syracuse University
    • Vanderbilt University (PI)
    • Fermilab
• NSF ITR grant ACI-0121658
• Physicists and Computer Scientists/Electrical Engineers at
  BTeV institutions




                RTES structure


[Figure: RTES structure. Design and analysis: modeling (resource, reconfigure, algorithms, behavior, fault), synthesis, and analysis (performance, diagnosability, reliability), with feedback. Runtime: Experiment Control Interface; Global Operations Manager and Global Fault Manager; Region Operations Mgr and Region Fault Mgr; Local Oper. Mgr and Local Fault Mgr with ARMOR/Linux and ARMOR/DSP supervising trigger and time trigger algorithms; the levels are connected by the Logical Control Network and the Logical Data Net.]
                                                                             L2,3/CISC/RISC                                                                      L1/DSP                                                                     Fault Mgr
                                                                                                                                                                                                                                            Fault Mgr

                                                                                                                                                                                                                                 ARMOR/Linux
                                                                                                                                                                                                                                 ARMOR/Linux

                     Soft                                         Real Time                                                                                                                 Hard

                  GME: data type modeling




• Modeling of data types and structures
• Configuration of marshalling/demarshalling interfaces for communication
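The idea of generating marshalling code from a modeled data type can be sketched as follows. This is a minimal illustration, not GME output: the field list stands in for a modeled type, and the type name `TRIGGER_RECORD` and its fields are hypothetical. Python's `struct` module packs the fields in network byte order.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical modeled data type: ordered (field name, struct format code) pairs.
# In the slide's workflow such a description would come from the GME model.
TRIGGER_RECORD: List[Tuple[str, str]] = [
    ("event_id", "I"),   # unsigned 32-bit
    ("timestamp", "Q"),  # unsigned 64-bit
    ("channel", "H"),    # unsigned 16-bit
    ("energy", "f"),     # 32-bit float
]


def make_codec(fields: List[Tuple[str, str]]):
    """Build marshal/demarshal functions for a modeled type (network byte order, no padding)."""
    packer = struct.Struct("!" + "".join(code for _, code in fields))
    names = [name for name, _ in fields]

    def marshal(record: Dict[str, object]) -> bytes:
        return packer.pack(*(record[n] for n in names))

    def demarshal(data: bytes) -> Dict[str, object]:
        return dict(zip(names, packer.unpack(data)))

    return marshal, demarshal, packer.size


marshal, demarshal, size = make_codec(TRIGGER_RECORD)

# Usage: a round trip through the wire format.
rec = {"event_id": 42, "timestamp": 123456789, "channel": 7, "energy": 1.5}
wire = marshal(rec)
```

Generating both directions from one description keeps sender and receiver consistent, which is the point of modeling the types once.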




                RTES: GME modeling environment




  Fault handling
  Process dataflow
  Hardware Configuration




                RTES: GME fault mitigation modeling language (1)




[Figure: GME model panels A, B and C]
•    Configuration of the ARMOR infrastructure (A)
•    Modeling of fault mitigation strategies (B)
•    Specification of the communication flow (C)




                RTES: GME fault mitigation modeling language (2)


[Figure: FMML model (behavior aspect) → Translator → ARMOR microkernel. The translator emits two custom elements: a fault-tolerant element, generated code for a state machine that switches between NOMINAL and FAULT states (switch (cur_state), next_state = FAULT, next_state = NOMINAL), and a communication element, a generated C++ callback class (class armorcallback0 : public Callback) that assembles messages.]




 •      Model translator generates fault-tolerant strategies and communication flow strategy from FMML models
 •      Strategies are plugged into ARMOR infrastructure as ARMOR elements
 •      ARMOR infrastructure uses these custom elements to provide customized fault-tolerant protection to the
        application
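A generated fault-tolerant element of the kind shown in the figure can be sketched as a two-state machine. This is a hypothetical illustration, not translator output: the latency input, the 100 ms threshold and the recorded actions are assumptions chosen only to make the NOMINAL/FAULT transitions concrete.

```python
from enum import Enum, auto


class State(Enum):
    NOMINAL = auto()
    FAULT = auto()


class FaultToleranceElement:
    """Hypothetical custom element: a NOMINAL/FAULT state machine like the one an
    FMML translator might emit. The latency threshold is an illustrative assumption."""

    def __init__(self, max_latency_ms: float = 100.0):
        self.state = State.NOMINAL
        self.max_latency_ms = max_latency_ms
        self.actions = []  # mitigation actions taken, in order

    def on_event(self, latency_ms: float) -> State:
        if self.state is State.NOMINAL:
            if latency_ms >= self.max_latency_ms:
                self.state = State.FAULT       # enter fault handling once
                self.actions.append("mitigate")
        elif self.state is State.FAULT:
            if latency_ms < self.max_latency_ms:
                self.state = State.NOMINAL     # condition cleared, recover
                self.actions.append("recovered")
        return self.state


# Usage: the ARMOR infrastructure would feed observations into the element.
element = FaultToleranceElement()
```

Keeping the mitigation logic in an explicit state machine means an action fires on the transition rather than on every faulty observation.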




                ARMOR

•    Adaptive Reconfigurable Mobile Objects of Reliability:
      –   Multithreaded processes composed of replaceable building blocks
      –   Provide error detection and recovery services to user applications
•    A hierarchy of ARMOR processes forms the runtime environment:
      –   System management, error detection, and error recovery services are distributed across ARMOR processes.
      –   The ARMOR runtime environment is itself self-checking.
•    3-tiered ARMOR support of user applications:
      –   Completely transparent and external support
      –   Enhancement of standard libraries
      –   Instrumentation with ARMOR API

•    ARMOR processes designed to be reconfigurable:
      –   Internal architecture structured around event-driven modules called elements.
      –   Elements provide functionality of the runtime environment, error-detection capabilities, and recovery policies.
      –   Deployed ARMOR processes contain only elements necessary for required error detection and recovery
          services.
•    ARMOR processes resilient to errors by leveraging multiple detection and recovery mechanisms:
      –   Internal self-checking mechanisms to prevent failures from occurring and to limit error propagation.
      –   State protected through checkpointing.
      –   Detection and recovery of errors.
•    ARMOR runtime environment fault-tolerant and scalable:
      –   1-node, 2-node, and N-node configurations.
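The element-based structure described above can be sketched as an event dispatcher that routes each event to the elements registered for it and checkpoints its state afterwards. This is a minimal sketch under stated assumptions: the element classes, event names and the in-memory checkpoint are hypothetical and do not reproduce the ARMOR API.

```python
import copy
from typing import Dict, List


class Element:
    """Hypothetical replaceable building block reacting to named events."""
    handles: tuple = ()

    def handle(self, event: str, payload: dict, state: dict) -> None:
        raise NotImplementedError


class CrashDetector(Element):
    handles = ("process_exit",)

    def handle(self, event, payload, state):
        state.setdefault("crashed", []).append(payload["pid"])


class RestartPolicy(Element):
    handles = ("process_exit",)

    def handle(self, event, payload, state):
        state["restarts"] = state.get("restarts", 0) + 1


class ArmorProcess:
    """Contains only the elements it was deployed with; state is checkpointed."""

    def __init__(self, elements: List[Element]):
        self.state: dict = {}
        self._checkpoint: dict = {}
        self.routes: Dict[str, List[Element]] = {}
        for el in elements:
            for ev in el.handles:
                self.routes.setdefault(ev, []).append(el)

    def dispatch(self, event: str, payload: dict) -> None:
        for el in self.routes.get(event, []):
            el.handle(event, payload, self.state)
        self._checkpoint = copy.deepcopy(self.state)  # checkpoint after each event

    def restore(self) -> None:
        """Roll back to the last checkpoint, e.g. after detecting corrupted state."""
        self.state = copy.deepcopy(self._checkpoint)


armor = ArmorProcess([CrashDetector(), RestartPolicy()])
armor.dispatch("process_exit", {"pid": 17})
```

Swapping in a different recovery policy only means constructing the process with a different element list; the dispatcher itself does not change.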




                  ARMOR system: basic configuration




[Figure: ARMOR processes and their daemons on several nodes, connected by the network]
•    Execution ARMOR: oversees the application process (e.g. the various Trigger Supervisors/Monitors)
•    Daemons: detect ARMOR crash and hang failures
•    Fault Tolerant Manager (FTM): highest-ranking manager in the system
•    Heartbeat ARMOR: detects and recovers from FTM failures
•    ARMOR processes provide a hierarchy of error detection and recovery. ARMORs are protected through checkpointing and internal self-checking.
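Heartbeat-based detection, as used by the Heartbeat ARMOR and the daemons, can be sketched as a timeout check over the last time each watched process reported. This is a hypothetical sketch: the timeout value, process names and the `restart` callback are assumptions, and time is passed in explicitly so the logic can be checked without sleeping.

```python
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """A watched process counts as crashed or hung when no heartbeat
    arrived within `timeout` seconds."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def beat(self, name: str, now: float) -> None:
        self.last_seen[name] = now

    def failed(self, now: float) -> List[str]:
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


def recover(monitor: HeartbeatMonitor, now: float, restart: Callable[[str], None]) -> None:
    """Restart every failed process and treat it as alive again."""
    for name in monitor.failed(now):
        restart(name)
        monitor.beat(name, now)


monitor = HeartbeatMonitor(timeout=2.0)
monitor.beat("FTM", 0.0)
monitor.beat("ExecARMOR", 0.0)
monitor.beat("ExecARMOR", 1.5)
```

A missed heartbeat covers both crash and hang failures, since a hung process stops reporting just like a dead one.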



                  EPICS overview


EPICS is a set of software components and tools to develop control systems.
The basic components are:

                OPI (clients)
                     – Operator Interface. This is a UNIX- or Windows-based workstation which
                         can run various EPICS tools (MEDM, ALH, OracleArchiver).
                IOC (server)
                     – Input Output Controller. This can be a VME/VXI-based chassis containing a
                         Motorola 68xxx processor, various I/O modules, and VME modules that
                         provide access to other I/O buses such as GPIB and CANbus.
                LAN (communication)
                     – Local area network. This is the communication network which allows the
                         IOCs and OPIs to communicate. EPICS provides a software component,
                         Channel Access, which provides network transparent communication
                         between a Channel Access client and an arbitrary number of Channel
                         Access servers.
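The network transparency of Channel Access, where a client asks for a named variable without knowing which server holds it, can be sketched in-process. This is not the Channel Access API: the `Server` and `Client` classes and the PV names are hypothetical, and the search loop only stands in for the network search that Channel Access performs.

```python
from typing import Dict, List, Optional


class Server:
    """Stand-in for an IOC hosting named process variables (hypothetical)."""

    def __init__(self, name: str, pvs: Dict[str, float]):
        self.name = name
        self.pvs = pvs

    def has(self, pv: str) -> bool:
        return pv in self.pvs

    def read(self, pv: str) -> float:
        return self.pvs[pv]


class Client:
    """Resolves a PV name by asking all servers, then caches the owner so later
    reads go straight to it. The caller never names a server."""

    def __init__(self, servers: List[Server]):
        self.servers = servers
        self.owner: Dict[str, Server] = {}

    def _search(self, pv: str) -> Optional[Server]:
        for s in self.servers:  # stands in for the network-wide name search
            if s.has(pv):
                return s
        return None

    def get(self, pv: str) -> float:
        if pv not in self.owner:
            s = self._search(pv)
            if s is None:
                raise KeyError(pv)
            self.owner[pv] = s
        return self.owner[pv].read(pv)


client = Client([Server("ioc1", {"HV:1": 1500.0}), Server("ioc2", {"TEMP:3": 21.5})])
```

Because lookup is by name only, servers can be added without changing client code.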




                    Hierarchy in a flat system




[Figure: one client above two segment IOCs, each of which connects to further IOCs running tasks]
•      IOCs
         – One IOC per standard CPU (Linux, Lynx, VxWorks)
•      Clients
         – On Linux (Windows)
•      Agents
         – Segment IOCs that are also clients
Name space architecture!
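How a flat set of IOCs gains hierarchy through agents and a structured name space can be sketched as prefix routing. This is a hypothetical sketch: the `segment:node:task` naming convention and the class names are assumptions for illustration, not the GSI name space.

```python
from typing import Dict, List


class Ioc:
    """An IOC serving the status of its local tasks."""

    def __init__(self, name: str, tasks: Dict[str, str]):
        self.name = name
        self.tasks = tasks  # task name -> status

    def status(self, path: str) -> str:
        return self.tasks[path]

    def names(self) -> List[str]:
        return sorted(self.tasks)


class Agent(Ioc):
    """A segment IOC that is also a client of lower-level IOCs."""

    def __init__(self, name: str, children: List[Ioc]):
        super().__init__(name, {})
        self.children = {c.name: c for c in children}

    def status(self, path: str) -> str:
        child, _, rest = path.partition(":")  # first name component selects the child
        return self.children[child].status(rest)

    def names(self) -> List[str]:
        return sorted(f"{c}:{n}" for c, ioc in self.children.items() for n in ioc.names())


def query(segments: Dict[str, Agent], path: str) -> str:
    """Client side: route a full name to the responsible segment agent."""
    seg, _, rest = path.partition(":")
    return segments[seg].status(rest)


node1 = Ioc("node1", {"readout": "running", "collector": "stopped"})
node2 = Ioc("node2", {"readout": "running"})
segments = {"seg1": Agent("seg1", [node1, node2])}
```

Every component stays an IOC, so the system is flat in kind, while the names carry the hierarchy.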




                Local communication (node)




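[Figure: a node with several tasks. Each task has a command thread, a working thread and a message thread. Commands arrive from the IOC, status is kept in a status segment in memory, tasks communicate intertask, and messages are sent out.]
•    Commands are handled by threads
•    Execution may be in the working thread
•    The message thread may not be needed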
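The split between a command thread and a working thread can be sketched with two queues. This is a hypothetical sketch: the command strings and the dict standing in for the status segment are assumptions, and the optional message thread is omitted, as the slide suggests it may not be needed.

```python
import queue
import threading


class Task:
    """Hypothetical node task: the command thread receives commands and hands
    execution to the working thread; a dict stands in for the status segment."""

    def __init__(self):
        self.commands: queue.Queue = queue.Queue()
        self.work: queue.Queue = queue.Queue()
        self.status = {"state": "idle", "done": 0}
        self.lock = threading.Lock()
        self.cmd_thread = threading.Thread(target=self._command_loop, daemon=True)
        self.work_thread = threading.Thread(target=self._work_loop, daemon=True)
        self.cmd_thread.start()
        self.work_thread.start()

    def _command_loop(self) -> None:
        while True:
            cmd = self.commands.get()
            if cmd == "exit":
                self.work.put(None)  # tell the working thread to stop
                return
            self.work.put(cmd)       # execution happens in the working thread

    def _work_loop(self) -> None:
        while True:
            job = self.work.get()
            if job is None:
                return
            with self.lock:          # update the shared status segment
                self.status["state"] = job
                self.status["done"] += 1

    def snapshot(self) -> dict:
        with self.lock:
            return dict(self.status)

    def stop(self) -> None:
        self.commands.put("exit")
        self.cmd_thread.join()
        self.work_thread.join()


task = Task()
task.commands.put("start_acquisition")
task.commands.put("stop_acquisition")
task.stop()
```

Keeping the command thread free of long-running work means the task can always accept the next command, for example an exit request.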



                MBS node and monitor IOC




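[Figure: MBS node and monitor IOC. The IOC (external control) sends text commands to the Dispatcher, which passes them on to the tasks. The Statusserver delivers the status segment on request, and the Messageserver forwards text messages asynchronously.]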
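The three server roles on the MBS side can be sketched as follows. This is a hypothetical illustration, not MBS code: the `task verb args` command syntax, the task and message names, and the use of plain callbacks in place of asynchronous delivery are all assumptions made for brevity.

```python
import copy
from typing import Callable, Dict, List


class StatusServer:
    """Answers status requests with a snapshot of the shared status segment."""

    def __init__(self, segment: dict):
        self.segment = segment

    def request(self) -> dict:
        return copy.deepcopy(self.segment)


class MessageServer:
    """Forwards text messages to every subscriber as they are produced."""

    def __init__(self):
        self.subscribers: List[Callable[[str], None]] = []

    def subscribe(self, fn: Callable[[str], None]) -> None:
        self.subscribers.append(fn)

    def publish(self, text: str) -> None:
        for fn in self.subscribers:
            fn(text)


class Dispatcher:
    """Parses a text command 'task verb [args...]' and hands it to the named task."""

    def __init__(self, tasks: Dict[str, Callable[[str, List[str]], None]], messages: MessageServer):
        self.tasks = tasks
        self.messages = messages

    def command(self, line: str) -> None:
        parts = line.split()
        if len(parts) < 2:
            self.messages.publish(f"malformed command: {line!r}")
            return
        task, verb, *args = parts
        if task not in self.tasks:
            self.messages.publish(f"unknown task {task}")
            return
        self.tasks[task](verb, args)


segment = {"running": False, "events": 0}
messages = MessageServer()


def readout(verb: str, args: List[str]) -> None:
    if verb == "start":
        segment["running"] = True
        messages.publish("readout started")


status = StatusServer(segment)
dispatcher = Dispatcher({"readout": readout}, messages)
log: List[str] = []
messages.subscribe(log.append)  # the monitor IOC as message subscriber
```

Returning a copy from the status server keeps a slow monitor from observing, or corrupting, the live segment.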




                Screenshot: FOPI




                Kind of conclusion




       •    RTES: Very big and powerful. Not readily available!
             – Big collaboration
             – Fully modelled and simulated using GME
             – ARMORs for maximum fault tolerance and control
       •    XDAQ: Much smaller. Installed at GSI.
             – Dynamic configurations (XML)
             – Fault tolerance?
       •    EPICS: From accelerator controls community. Installed at GSI
             – Maybe best known
             – No fault tolerance
             – Not very dynamic



