Trace Generation to Simulate Large Scale Distributed Applications

Olivier Dalle, Emilio P. Mancini   Mar. 8th, 2012
Outline
               • Introduction

               • The trace collection

               • The hierarchical architecture

               • The components

               • An example

               • Conclusion
O.Dalle, E.P. Mancini - Trace generation for large scale distributed applications   Mar. 8th, 2012 - 2
Introduction
• Most distributed systems, such as Grids, offer massively parallel
    but loosely coupled resources: an accurate model of the
    application can help the scheduling decisions

• Simulators of parallel and distributed applications need an
    accurate model of application behavior, but the size of the
    traces for long-running parallel applications tends to explode
Introduction
• One solution is to buffer the data locally and gather it after
    the end of the program (post-mortem), but this raises
    scalability issues

• We need to minimize the perturbation: the instrumentation
    competes with the application for the system’s resources.


Introduction
• A distributed application is composed of a set of cooperating
     tasks

• The connections between them are in general not homogeneous

• Networks may present some hierarchy, e.g. fat trees, multiple
     switch hops ...

• Can we exploit that hierarchy for trace generation and
     instrumentation?

The Trace Collection: a Simplified Schema

[Figure: four nodes (Node 1 to Node 4), each running an application alongside a collector / post-processor: a node collector on Nodes 1 and 4, the main collector on Node 2, and a local collector on Node 3. The nodes show their CPU cores and GPUs and are linked through a hierarchy of gateways / switches.]

The classical computational cluster execution model:
• Several tasks on several nodes (e.g., MPI)
The Trace Collection: a Simplified Schema

[Figure: four nodes (Node 1 to Node 4), each running an application and a collector / post-processor (node, main, local, node), connected through a hierarchy of gateways / switches.]

We need to measure some parameters on each task,
collect local data, and gather them.

The Trace Collection: a Simplified Schema

[Figure: the four-node collector hierarchy (Node 1 to Node 4, linked through gateways / switches), annotated as follows]
• In a Grid it is common to have a low-quality connecting link
  between the V.O. sites
• In HPC the bandwidth of the upper levels is shared among more
  hosts than that of the lower levels

   We gather the data hierarchically, using local collectors
   that may decimate or pre-process the data locally.
   We exploit the locality principle.
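
The hierarchical gathering described above can be sketched as a reduction over a tree. This is only an illustration, not the authors' implementation: the `Summary` type, the `gather` function, and the dictionary layout of the tree are all hypothetical names chosen for the example. Each collector reduces its own raw samples to a small summary and merges the summaries of the collectors below it, so only summaries cross the slower upper links.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Compact pre-processed view of many raw samples (hypothetical format)."""
    count: int = 0
    total: float = 0.0
    peak: float = 0.0

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.peak = max(self.peak, value)

    def merge(self, other: "Summary") -> None:
        self.count += other.count
        self.total += other.total
        self.peak = max(self.peak, other.peak)


def gather(node: dict) -> Summary:
    """Summarize a node's own samples, then merge its children's summaries."""
    summary = Summary()
    for value in node.get("samples", []):
        summary.add(value)
    for child in node.get("children", []):
        summary.merge(gather(child))
    return summary


# Hypothetical two-level hierarchy: one main collector, two node collectors.
tree = {
    "name": "main",
    "children": [
        {"name": "node1", "samples": [1.0, 3.0]},
        {"name": "node2", "samples": [2.0, 6.0, 4.0]},
    ],
}
result = gather(tree)
print(result)
```

In this sketch the main collector receives two summaries instead of five raw samples; a real deployment would choose the summary according to what the simulator needs.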
The Trace Collection

[Figure: sequence diagram among the Simulator, the Management Unit, the Application, the Sensors, and the Collectors, with these messages]
   a. Starts
   b. Starts with instrumentation
   c. Estimate overhead
   d. Event
   e. Event’s data
   f. Post processing
   g. Propagation
   h. Gathers data
   i. Post processing
   j. Traces

1. Infrastructure initialization
   a. Environment update (e.g., LD_PRELOAD)
   b. Middleware launcher (e.g., mpiexec, qsub …)
2. Execution with instrumentation
   a. Overhead estimation
   b. Events’ measurement
3. Data collection
4. Processing and Propagation
   a. Decimation
   b. Compression
   c. Buffering
   d. …
5. Trace generation
   a. Data collection
   b. Post-processing
   c. Simulator’s trace generation




The architecture
[Figure: the architecture. A management unit, made of a launching unit and a traces management unit, links the simulator, storage, and analysis to the collectors hierarchy and the trace files. On each node a sensor instruments the application and the operating system and feeds a collector made of a client/server, a buffer, and a post-processor.]


The sensors

The sensors:

• Instrument the application’s tasks

• Compute the instrumentation’s overhead

• Collect the raw data

• Send them to the first level collectors

The sensors

• We assume the system to be heterogeneous

• Every sensor makes an overhead analysis

• Then it propagates the information to the management unit
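
One possible form of the overhead analysis is a calibration loop that measures how long the probe itself takes on the local machine, so that the value can be reported and later subtracted from the measured durations. The slides do not say how the analysis is done; `estimate_probe_overhead` and `corrected` are hypothetical helpers written under that assumption.

```python
import time


def estimate_probe_overhead(samples: int = 10000) -> float:
    """Mean cost in seconds of one start/stop timestamp pair.

    The loop overhead is included, so this slightly overestimates the probe.
    """
    start = time.perf_counter()
    for _ in range(samples):
        time.perf_counter()  # timestamp taken when an event starts
        time.perf_counter()  # timestamp taken when an event ends
    return (time.perf_counter() - start) / samples


def corrected(duration: float, overhead: float) -> float:
    """Remove the estimated probe cost from a measured duration."""
    return max(0.0, duration - overhead)


overhead = estimate_probe_overhead()
print(f"estimated probe overhead: {overhead:.2e} s")
```

Because the system is assumed heterogeneous, each sensor would run such a calibration on its own node rather than share a single value.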




The collectors

• The collectors gather data from sensors and from other
  collectors

• Buffer incoming data

• Process collected data before sending them to upper levels
   • Decimation
   • Compression
   • …
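
A minimal sketch of such a collector, assuming events are JSON-serializable values and that decimation means keeping one event out of every `keep_every`. The `Collector` class, its parameters, and the JSON plus zlib wire format are assumptions made for the example, not details taken from the slides.

```python
import json
import zlib


class Collector:
    """Buffers events, then decimates and compresses them before sending up."""

    def __init__(self, keep_every=2, capacity=4, parent=None):
        self.keep_every = keep_every  # decimation factor
        self.capacity = capacity      # buffered events that trigger a flush
        self.parent = parent          # upper-level collector, None at the top
        self.buffer = []
        self.stored = []              # compressed batches kept by the top level

    def push(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        kept = self.buffer[::self.keep_every]                       # decimation
        payload = zlib.compress(json.dumps(kept).encode("utf-8"))  # compression
        self.buffer = []
        if self.parent is not None:
            self.parent.receive(payload)
        else:
            self.stored.append(payload)

    def receive(self, payload):
        """Accept a compressed batch from a lower-level collector."""
        for event in json.loads(zlib.decompress(payload)):
            self.push(event)


top = Collector(keep_every=1, capacity=100)
node = Collector(keep_every=2, capacity=4, parent=top)
for i in range(8):
    node.push(i)
print(top.buffer)
```

Here the node collector forwards half of its events in two compressed batches, and the top collector keeps everything it receives until it is flushed.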


The Management Unit

• Launches the collector daemons

• Launches the application

• Gathers the data from the top collector

• Converts and stores the data in the required format

• Managed with scripts or graphical interface
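
To illustrate the launching step, the sketch below assembles an `mpiexec` command line that preloads a sensor library through the launcher's `-env` option, which is how the slides' own example passes LD_PRELOAD. The function name and every path are hypothetical, and nothing is executed: the command is only built and printed.

```python
import shlex


def build_launch_command(mpiexec, sensor_lib, application, tasks):
    """Return the argument list that starts the instrumented application."""
    return [
        mpiexec,
        "-n", str(tasks),                  # number of MPI tasks
        "-env", "LD_PRELOAD", sensor_lib,  # preload the sensor library
        application,
    ]


# Hypothetical installation paths, used only for illustration.
cmd = build_launch_command(
    "/opt/mpi/bin/mpiexec.hydra",
    "/opt/dt/libdt_sensor.so",
    "/opt/bench/bench",
    tasks=4,
)
line = shlex.join(cmd)
print(line)
```

Building the command as a list, rather than as a single string, avoids quoting problems if it is later handed to `subprocess.run`.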




An Example of Data Collection

•    We are interested in analyzing the I/O of a parallel
     synthetic benchmark

•    We want to check the overhead

•    The benchmark is an MPI application of n tasks

•    Every task runs on a different node and writes
     random data to the local file system




An Example of Data Collection

We use the management unit to:

1. Create a hierarchical schema

      <?xml version="1.0" encoding="UTF-8"?>
      <dtracer>
       <collector host="127.0.0.1" desc="Local">
        <collector host="192.168.56.101" desc="Node 1">
          <task cmd="hostname"/>
        </collector>
        <collector host="192.168.56.102" desc="Node 2">
          <task cmd="hostname"/>
        </collector>
       </collector>
      </dtracer>

2. Create the MPI launch scripts

      $MPIDIR/mpiexec.hydra \
        -env LD_PRELOAD \
        $DTDIR/libdt_sensor.so \
        $HOME/bench/bench

3. Launch the collectors and the instrumented application
   (mpiexec, qsub, …)

4. Collect the results

[Figure: collector hierarchy. The Management Unit, connected to the Simulator, Storage, and Analysis, sits above a Main Collector, two Local Collectors, four Node Collectors, and four Sensors.]

[Figure: time in ms (0 to 700) against write size in Bytes (16 to 256K) for fwrite and instrumented fwrite.]
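
A schema of this kind can be read with a few lines of standard-library code. The sketch below copies the `dtracer` XML from the slide and walks the nested `collector` elements to recover each collector's depth, description, host, and tasks. The `walk` helper and the interpretation of nesting as the collector hierarchy are assumptions, since the slides do not document the schema.

```python
import xml.etree.ElementTree as ET

# Schema copied from the slide.
SCHEMA = """<?xml version="1.0" encoding="UTF-8"?>
<dtracer>
 <collector host="127.0.0.1" desc="Local">
  <collector host="192.168.56.101" desc="Node 1">
    <task cmd="hostname"/>
  </collector>
  <collector host="192.168.56.102" desc="Node 2">
    <task cmd="hostname"/>
  </collector>
 </collector>
</dtracer>
"""


def walk(element, depth=0):
    """List (depth, desc, host, task commands) for every nested collector."""
    rows = []
    for collector in element.findall("collector"):
        tasks = [task.get("cmd") for task in collector.findall("task")]
        rows.append((depth, collector.get("desc"), collector.get("host"), tasks))
        rows.extend(walk(collector, depth + 1))
    return rows


rows = walk(ET.fromstring(SCHEMA.encode("utf-8")))
for depth, desc, host, tasks in rows:
    print("  " * depth + f"{desc} ({host}) tasks={tasks}")
```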




Conclusion
 Collecting large traces in distributed systems may
 perturb the application’s execution.

 We presented a system that efficiently collects
 traces at run-time or post-mortem.

 We use a hierarchical schema matching the
 network links’ capacity, with distributed buffering
 and processing.

 Future improvements will include the automatic
 discovery of the network topology.

Thank you

						