LHCb Trigger and Data Acquisition System

Management of the LHCb Online Network Based on SCADA System

        Guoming Liu*†, Niko Neufeld†

        * University of Ferrara, Italy
        † CERN, Geneva, Switzerland

 • Introduction to the LHCb Online system
 • LHCb online network
 • Network management based on SCADA system
 • Summary

ICALEPCS2009               Guoming Liu
               LHCb online system

 • LHCb is one of the large particle physics experiments on the LHC
 • The Online system is one of the infrastructures of LHCb,
   providing IT services for the entire experiment
 • Three major components:
      - Data Acquisition (DAQ)
          Transfers the event data from the detector front-end
          electronics to permanent storage
      - Timing and Fast Control (TFC)
          Provides the fast clock and drives all stages of the data readout of
          the LHCb detector between the front-end electronics and the
          online processing farm
      - Experiment Control System (ECS)
          Controls and monitors all parts of the experiment
                LHCb online system

[Figure: LHCb online system diagram. Labels: detector sub-systems VELO, ST, OT, RICH, ECal, HCal, Muon, each with front-end electronics (FEE) and a readout board; L0 trigger, LHC clock and TFC system; MEP request; readout network (event building); switches; CPUs of the HLT farm and the MON farm; CASTOR; Experiment Control System (ECS). Legend: event data; timing and fast control signals; control and monitoring data.]

               LHCb Online Network

 • Two dedicated networks:
      - Control network: general-purpose network for experiment control
          Connects all the Ethernet devices in LHCb
      - Data network: dedicated to data acquisition
          Performance critical

               LHCb Online Network

   Two geographic parts: surface and underground
     Connected by two 10G links

                LHCb Online Network

[Figure: network layout on the surface. Labels: core CTRL routers with CTRL access switches (~100); core DAQ router with DAQ access switches (~50).]

          Network Monitoring System based on SCADA

 • Motivation
      - This large network needs sophisticated monitoring
      - Coherent integration into the LHCb ECS
      - Homogeneous interfaces for the non-expert shift crew

 • Commercial network management software?
      - Expensive
      - Integration?

           Network Monitoring System: Architecture
 • Supervisory layer
     - PVSS II: commercial SCADA system
     - JCOP: Joint Controls Project for the LHC experiments
 • Front-end processes:
     - SNMP
     - sFlow
     - syslog
 • Data communication
     - DIM: Distributed Information Management

[Figure: architecture diagram. Labels: DIM; SNMP / sFlow / Syslog.]

          Network Monitoring System: FSM
 • All behaviors are modeled as Finite State Machines (FSM)
 • Hierarchical structure: status and commands are propagated through the tree
 • Device Units:
     - Device description
     - Device access
     - Based on PVSS II datapoints: alarm handling, archiving, trending, etc.
 • Control Units
     - Abstract behavior
     - Represent the associated sub-tree
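
The hierarchical propagation described on this slide can be illustrated with a small Python sketch. It is not the PVSS II / JCOP FSM toolkit: the class names, the state names (`OK`, `WARNING`, `ERROR`), the "worst state wins" rule, and the direction of propagation (status upwards, commands downwards) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical state names, ranked from best (0) to worst (2).
SEVERITY = {"OK": 0, "WARNING": 1, "ERROR": 2}


@dataclass
class DeviceUnit:
    """Leaf node: holds the state of one monitored device (e.g. a switch)."""
    name: str
    state: str = "OK"

    def command(self, cmd: str) -> List[str]:
        # A real device unit would act on the device; here we only record it.
        return [f"{self.name}:{cmd}"]


@dataclass
class ControlUnit:
    """Inner node: summarizes the sub-tree below it."""
    name: str
    children: list = field(default_factory=list)

    @property
    def state(self) -> str:
        # Status propagates upwards: report the worst state among the children.
        states = [child.state for child in self.children]
        return max(states, key=SEVERITY.__getitem__, default="OK")

    def command(self, cmd: str) -> List[str]:
        # Commands propagate downwards to every child, in order.
        delivered = []
        for child in self.children:
            delivered.extend(child.command(cmd))
        return delivered


sw1 = DeviceUnit("switch-1")
sw2 = DeviceUnit("switch-2", state="WARNING")
top = ControlUnit("network", [ControlUnit("DAQ-network", [sw1, sw2])])
print(top.state)               # WARNING
print(top.command("refresh"))  # ['switch-1:refresh', 'switch-2:refresh']
```

Computing the control-unit state on demand keeps the sketch short; an event-driven system would more likely push state changes upwards as they occur.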

           Network Monitoring System

The major monitored items:
 • Physical topology
      - Discovery of the network topology based on the Link Layer
        Discovery Protocol (LLDP)
      - Discovery of the network nodes based on the information in
        the switches (ARP, MAC forwarding table)
 • Traffic
      - Octet / packet counters
      - Discard / error counters
 • Switch status: CPU/memory, temperature, power supply, ...
 • Data paths for DAQ
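
As an illustration of LLDP-based topology discovery, the sketch below merges per-switch neighbour tables into a set of physical links. The input format, the function name `build_links`, and the device and port names are hypothetical; how the neighbour tables are actually read from the switches is not shown here.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# One LLDP neighbour entry as seen from a switch:
# (local_port, remote_device, remote_port). Hypothetical format.
Neighbour = Tuple[str, str, str]
Endpoint = Tuple[str, str]  # (device, port)


def build_links(tables: Dict[str, List[Neighbour]]) -> Set[FrozenSet[Endpoint]]:
    """Merge per-device neighbour tables into a set of undirected links."""
    links: Set[FrozenSet[Endpoint]] = set()
    for device, neighbours in tables.items():
        for local_port, remote_device, remote_port in neighbours:
            # A frozenset makes the link direction-independent, so the same
            # cable reported by both ends is stored only once.
            links.add(frozenset({(device, local_port),
                                 (remote_device, remote_port)}))
    return links


tables = {
    "core-1": [("1/1", "access-1", "49"), ("1/2", "access-2", "49")],
    "access-1": [("49", "core-1", "1/1")],
    # access-2 did not answer; its link is still known from core-1.
}
links = build_links(tables)
print(len(links))  # 2
```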

               Network Monitoring Snapshot(1): Topology

          Network Monitoring Snapshot(2): traffic
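
Traffic rates are usually derived from the difference between two readings of a monotonically increasing octet or packet counter. The sketch below shows that arithmetic, including a single wrap-around of a fixed-width counter. The function name and the choice of counter width are assumptions; the slides do not say how the rates in the snapshot are computed.

```python
def counter_rate(prev: int, curr: int, interval_s: float, bits: int = 64) -> float:
    """Average rate (units per second) between two counter readings.

    Assumes the counter wrapped at most once between the two readings.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 1 << bits
    # Modular subtraction gives the right delta even if the counter wrapped.
    delta = (curr - prev) % modulus
    return delta / interval_s


# 32-bit counter that wrapped: 6 units up to the wrap, then 10 more.
print(counter_rate(4294967290, 10, 2.0, bits=32))  # 8.0
# No wrap: 1000 octets in 4 seconds.
print(counter_rate(5000, 6000, 4.0))               # 250.0
```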

               Summary

 • The network management system has been implemented
   based on the commercial SCADA system PVSS II and the
   JCOP framework

 • It provides sophisticated monitoring of the network items that
   are essential for our operation, e.g. switch status and traffic

 • It also provides a homogeneous operation interface and an
   intuitive display

 • Currently only monitoring is provided; some control
   commands for the switches remain to be integrated

   Thanks for your attention!

           NMS Architecture: front-end processes

 • SNMP: Simple Network Management Protocol
      - Used for general network monitoring and configuration
 • sFlow:
      - A sampling mechanism to capture traffic data
      - Hardware-based
      - Two kinds of sFlow samples: flow samples and counter samples
      - Used on the core switch to collect traffic counters:
        SNMP is too slow there and consumes too much CPU/memory
 • Syslog: event notification messages
      - Three distinct parts: priority, header and message
      - The priority part encodes both the facility and the severity
        of the message
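
To make the priority part concrete: in the BSD syslog format the message starts with `<PRI>`, where PRI = facility × 8 + severity. The sketch below decodes that field. The function name and the sample message (host `sw-01`, a fan failure) are invented for illustration and are not taken from the LHCb system.

```python
import re
from typing import Tuple

# Severity names by numeric value (0 is the most severe).
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]

_PRI_RE = re.compile(r"^<(\d{1,3})>")


def decode_pri(message: str) -> Tuple[int, int, str]:
    """Split a raw syslog message into (facility, severity, remainder)."""
    match = _PRI_RE.match(message)
    if match is None:
        raise ValueError("no <PRI> field at the start of the message")
    pri = int(match.group(1))
    if pri > 191:  # 23 facilities * 8 + 7 is the largest valid value
        raise ValueError(f"PRI out of range: {pri}")
    facility, severity = divmod(pri, 8)
    return facility, severity, message[match.end():]


facility, severity, rest = decode_pri("<187>Oct 11 22:14:15 sw-01 fan 2 failed")
print(facility, SEVERITIES[severity])  # 23 error
```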
               Network Monitoring: hardware/system

 • Syslog can collect some information not covered by SNMP

 • A syslog server is set up to receive the syslog messages
   from the network devices and parse them.
   Alarm information:
      - Hardware: temperature, fan status, power supply status
      - System: CPU, memory, login authentication, etc.

 • All messages with a priority higher than warning
   are sent to PVSS for further processing
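
A minimal sketch of the filtering rule follows. It assumes that "priority higher than warning" means a syslog severity numerically below 4 (error, critical, alert, emergency), since lower values are more severe in syslog. The `send` callback is a placeholder; the actual transport to PVSS is not modeled.

```python
from typing import Callable, Iterable, Tuple

WARNING = 4  # syslog severity value for "warning"


def more_severe_than_warning(severity: int) -> bool:
    """True for error (3), critical (2), alert (1) and emergency (0)."""
    return severity < WARNING


def forward_alarms(messages: Iterable[Tuple[int, str]],
                   send: Callable[[str], None]) -> int:
    """Pass qualifying messages to `send`; return how many were forwarded."""
    sent = 0
    for severity, text in messages:
        if more_severe_than_warning(severity):
            send(text)
            sent += 1
    return sent


# Hypothetical parsed messages: (severity, text).
parsed = [(3, "sw-01 fan 2 failed"),
          (4, "sw-02 temperature rising"),
          (6, "sw-03 user login"),
          (1, "sw-04 power supply 1 down")]
forwarded = []
count = forward_alarms(parsed, forwarded.append)
print(count, forwarded)
```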

                 Network Monitoring: IP routing

 • Monitoring the status of the routing using "ping" / "arping"

 • Three stages for the DAQ:
     1. From the readout boards to the HLT farm
     2. From the HLT farm to the LHCb online storage
     3. From the online storage to CERN CASTOR

[Figure: LHCb online system diagram (detector sub-systems, FEE, readout boards, readout network, switches, HLT farm, MON farm, CASTOR) with the three DAQ stages marked 1, 2 and 3. Legend: event data; timing and fast control signals; control and monitoring data.]
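
A reachability check per DAQ stage could look roughly like the sketch below. It assumes a Linux host with the iputils `ping` command (`-c` for the packet count, `-W` for the reply timeout in seconds). The stage names and host names are hypothetical, and the probe is injectable so the logic can be exercised without touching the network.

```python
import subprocess
from typing import Callable, Dict, List


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Return True if a single ICMP echo request to `host` is answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_stages(stages: Dict[str, List[str]],
                 probe: Callable[[str], bool] = ping_once) -> Dict[str, bool]:
    """A stage is healthy only if every host on its path answers the probe."""
    return {stage: all(probe(host) for host in hosts)
            for stage, hosts in stages.items()}


# Hypothetical stage definitions; a fake probe stands in for the network.
stages = {
    "readout-to-hlt": ["hlt-node-01", "hlt-node-02"],
    "hlt-to-storage": ["storage-01"],
    "storage-to-castor": ["castor-gw"],
}
unreachable = {"storage-01"}
status = check_stages(stages, probe=lambda host: host not in unreachable)
print(status)
```

Passing `probe=ping_once` (the default) would run the real command; an `arping`-based probe could be substituted in the same way.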
