A TCP/IP transport layer for the DAQ of the CMS Experiment

Miklos Kozlovszky
for the CMS TriDAS collaboration

CERN
European Organization for Nuclear Research

ACAT03 - December 2003
CMS & Data Acquisition

Collision rate                  40 MHz
Level-1 maximum trigger rate    100 kHz
Average event size              ~1 MByte
No. of In-Out units             1000
Readout network bandwidth       ~1 Terabit/s
Event filter computing power    ~5 x 10^6 MIPS
Data production                 ~TByte/day

[Figure: CMS DAQ architecture - Detector Frontend, Level-1 Trigger, Readout Systems, Event Builder Networks, Event Manager, Run Control, Filter Systems, Computing Services]
Building the events

Event builder:
Physical system interconnecting data sources with data destinations. It has to move all of an event's data fragments to a single destination.

[Figure: NxM event builder - event data fragments (sources 1..512) are stored in separate physical memory systems; full event data are stored in one physical memory system associated with a processing unit]

512 data sources for 1 MByte events
~1000s of HLT processing nodes
XDAQ Framework

• Distributed DAQ framework developed within CMS.
• Construct homogeneous applications for heterogeneous processing clusters.
• Multi-threaded (important to take advantage of SMP efficiently).
• Zero-copy message passing for the event data.
• Peer-to-peer communication between the applications:
   • I2O for data transport, and SOAP for configuration and control.
• Hardware and transport independence.

[Figure: XDAQ software stack - Processing, Util/DDM and Sensor readout applications on top of XDAQ; PCI, TCP and HTTP transports over Ethernet and Myrinet; OS and device drivers below. The TCP transport is the subject of this presentation.]
TCP/IP Peer Transport Requirements

• Reuse old, "cheap" Ethernet for DAQ
• Transport layer requirements:
  – Reliable communication
  – Hide the complexity of TCP
  – Efficient implementation
  – Simplex communication via sockets
  – Configurable
     • Support for blocking and non-blocking I/O
Implementation of the non-blocking mode

• Pending Queues
  – Thread-safe PQ management
  – One PQ for each destination
  – Independent sending through sockets
• A single "select" call is used both to receive packets and to send the queued (blocked) data.

[Figure: XDAQ application with per-destination pending queues (#1..#n) feeding Framesend and Select]
Communication via the transport layer

[Figure: layered view - Applications (XDAQ) on top of the XDAQ Executive (XDAQ Framework); Receiver Object(s) with Input SAP(s) and Sender Object(s) with Output SAP(s) around ptATCP and its ptATCPPort(s) (Peer Transport Layer); OS driver(s) below, serving NICs (FE, GE, 10GE). Arrows denote creation of objects, sending, receiving, and other communication.]
Throughput optimisation

• Operating system tuning (kernel options + buffers)
• Jumbo frames
• Transport protocol options
• Communication techniques
  – Blocking vs. non-blocking I/O
  – Single/multi-rail
  – Single/multi-thread
  – TCP options (e.g. Nagle algorithm)
  – ...

[Figure: single-rail vs. multi-rail connectivity between App 1 and App 2]
Test network

Cluster size:   8x8
CPU:            2x Intel Xeon (2.4 GHz), 512 KB cache
I/O system:     PCI-X: 4 buses (max. 6)
Memory:         two-way interleaved DDR, 3.2 GB/s (512 MB)
NICs:           1 Intel 82540EM GE
                1 Broadcom NetXtreme BCM5703x GE
                1 Intel Pro 2546EB GE (2-port)
OS:             Linux RedHat 2.4.18-27.7 (SMP)
Switches:       1 BATM T6 multi-layer Gigabit switch (medium range)
                2 Dell PowerConnect 5224 (medium range)
Event building on the cluster

[Figure: throughput per node (MB/s) vs. fragment size (bytes), showing the 1 Gbps link bandwidth, the 8x8 EVB (P4, e1000, PowerConnect 5224) and a 32x32 EVB (P3, AceNIC, FastIron 8000); the working point is marked]

Conditions:
• XDAQ + Event Builder
  – No Readout Unit inputs
  – No Builder Unit outputs
  – No Event Manager
• PC: dual P4 Xeon
• Linux 2.4.19
• NIC: e1000
• Switch: PowerConnect 5224
• Standard MTU (1500 bytes)
• Each BU builds 128 events
• Fixed fragment sizes

Result:
For fragment size > 4 kB:
• Throughput/node ~100 MB/s, i.e. 80% utilisation of the 1 Gbps link (125 MB/s)
Two-rail Event Builder measurements

Test case: bare Event Builder (2x2)
• No RU inputs
• No BU outputs
• No Event Manager

Options:
• Non-blocking TCP
• Jumbo frames (MTU 8000)
• Two rail
• One thread

RU working point (16 kB):
Throughput/node = 240 MB/s, i.e. 95% of the available bandwidth
Conclusions

• Achieved 100 MB/s per node in an 8x8 configuration (single rail).
• Improvements seen with two-rail, non-blocking I/O and jumbo frames: over 230 MB/s obtained in a 2x2 configuration.
• High CPU load.
• We are also studying other networking and traffic-shaping options.

								