
SYNTHESIS of PIPELINED SYSTEMS
for the CONTEMPORANEOUS
EXECUTION of PERIODIC
and APERIODIC TASKS with HARD
REAL-TIME CONSTRAINTS

Paolo Palazzari
Luca Baldini
Moreno Coli

ENEA – Computing and Modeling Unit
University “La Sapienza” – Electronic Engineering Dep’t
                                                          1
Outline of Presentation
   Problem statement
   Asynchronous events
   Mapping methodology
   Searching space
   Optimization by RT-PSA Algorithm
   Results
   Conclusions
                                       2
    Problem Statement
   We want to synthesize a synchronous
    pipelined system which executes both the
    task PSy , sustaining its throughput, and
    m mutually exclusive tasks PAs1, PAs2, …,
    PAsm whose activations are randomly
    triggered and whose results must be
    produced within a prefixed time.


                                           4
          Problem Statement
   We represent the tasks as Control Data Flow Graphs
    (CDFG) G = (N, E)

N = {n1, n2, …, nN}: operations of the task

            
E = { (ni, nj) | ni, nj ∈ N, nj is data/ctrl dependent on ni }
          (data and control dependencies)
                                                         5
Problem Statement
   Aperiodic tasks, characterized by random
    execution requests and called asynchronous
    to mark the difference with the synchronous
    nature of periodic tasks, are subjected to
    Real-Time constraints (RTC), collected in the
    set RTCAs = {RTCAs1, RTCAs2, ..., RTCAsm},
    where RTCAsi contains the RTC on the ith
    aperiodic task.
   Input data for the synchronous task PSy arrive
    with frequency fi = 1/Dt, where Dt is the period
    characterizing PSy.
                                               6
  Problem Statement
We present a method to determine
 The target architecture: a (nearly) minimum
  set of HW devices to execute all the tasks
  (synchronous and asynchronous);
 The feasible mapping onto the architecture:
  the allocation and the scheduling on the HW
  resources so that PSy is executed sustaining its
  prefixed throughput and all the mutually
  exclusive asynchronous tasks PAs1, PAs2, …,
  PAsm satisfy the constraints in RTCAs.
                                                7
Problem Statement
   The adoption of a parallel system can
    be mandatory when Real Time
    Constraints are computationally
    demanding
   The iterative arrival of input data makes
    pipeline systems a very suitable
    solution for the problem.

                                           8
         Problem Statement
   Example of a pipeline serving the
    synchronous task PSy
[Timing diagram: six overlapped iterations of a 10-stage pipeline (S1 … S10), a new iteration starting every DATA INTRODUCTION INTERVAL DII = 2 clock cycles; time axis t (ut) from 0 to 1000, Tck = 50 ut.]
                                                                                                                                                           9
     Problem Statement

   Sk = (k-1)Tck and Sk = kTck

   In a pipeline with L stages, SL denotes the
    last stage.

   DII = Dt/Tck


                                            10
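The stage timing and the definition DII = Dt/Tck above can be sketched as follows (Python; the values Dt = 150 ut and Tck = 50 ut are the ones used later in the Results slides):

```python
# Stage/DII timing for the pipelined system (sketch).

def stage_interval(k, Tck):
    """Stage S_k occupies the time interval [(k-1)*Tck, k*Tck)."""
    return ((k - 1) * Tck, k * Tck)

def data_introduction_interval(Dt, Tck):
    """DII = Dt / Tck: a new input enters the pipeline every DII clock cycles."""
    assert Dt % Tck == 0, "Dt must be a multiple of Tck"
    return Dt // Tck

print(stage_interval(1, 50))                  # stage S1
print(data_introduction_interval(150, 50))
```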
      Problem Statement

   We assume the absence of
    synchronization delays due to control or
    data dependencies:

      Throughput of the pipeline system = 1/DII (one result every DII clock cycles).



                                            11
Outline of Presentation
   Problem statement
   Asynchronous events
   Mapping methodology
   Searching space
   Optimization by RT-PSA Algorithm
   Results
   Conclusions
                                       12
    Asynchronous events

We assume the asynchronous tasks
to be mutually exclusive, i.e. the
activation of only one asynchronous task
can be requested between two
successive activations of the periodic
task


                                           13
         Asynchronous events
    In red the asynchronous service requests in a pipelined
    system.
[Timing diagram as before (DII = 2): the asynchronous service requests, marked in red, arrive at instants t0A1 … t0A5; the corresponding service starts are marked (I0A1) … (I0A5). Time axis t (ut) from 0 to 1000.]
                                                                                                                                                                  14
       Asynchronous events
   As for the synchronous task, we represent the
    asynchronous tasks

{PAs1, PAs2, ..., PAsm}

through a set of CDFGs

ASG = {AsG1(NAs1, EAs1), ..., AsGm(NAsm, EAsm)}

                                                    15
       Asynchronous events
   We consider a unique CDFG made up by
    composing the graph of the periodic task with
    the m graphs of the aperiodic tasks:

    G(N, E) = SyG(NSy, ESy) ∪ AsG1(NAs1, EAs1) ∪
              AsG2(NAs2, EAs2) ∪ … ∪ AsGm(NAsm, EAsm)




                                                            16
       Asynchronous events
   Aperiodic tasks are subjected to Real-Time
    constraints (RTC):
    RTCAsi = { (SLAsi, Di) | PAsi execution must finish by Di }

   As all RTC must be respected, the mapping
    function M has to define a scheduling so that

    SLAsi − Di ≤ 0   ∀ RTCAsi ∈ RTCAs
                                                        17
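The constraint above amounts to a simple slack check over all aperiodic tasks (Python sketch; the finish times and deadlines below are hypothetical illustration values):

```python
# Check that every aperiodic task finishes by its deadline:
# S_L_Asi - D_i <= 0 for all RTC_Asi in RTC_As.

def rtc_satisfied(finish_times, deadlines):
    """finish_times[i] = completion time S_L^Asi of task i;
    deadlines[i] = deadline D^i; both in the same time unit."""
    return all(s - d <= 0 for s, d in zip(finish_times, deadlines))

# Hypothetical example with three aperiodic tasks:
print(rtc_satisfied([280, 240, 350], [300, 250, 350]))  # all met
print(rtc_satisfied([310, 240, 350], [300, 250, 350]))  # first deadline missed
```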
Outline of Presentation
   Problem statement
   Asynchronous events
   Mapping methodology
   Searching space
   Optimization by RT-PSA Algorithm
   Results
   Conclusions
                                       18
    Mapping methodology
In order to develop a pipeline system
implementing G,

a HW resource rj = D(nj)
and
a time step Sk = S(nj)

must be associated to each nj ∈ N
                                               19
     Mapping methodology
   We must determine the mapping function
    M: N → UR × S

   UR is the set of the used HW resources (each rj
    is replicated kj times),

    UR = { rj^1, rj^2, ..., rj^kj | rj ∈ R } = {UR1, UR2, ..., URp}
       = { r1^1, r1^2, ..., r1^k1, r2^1, r2^2, ..., r2^k2, ..., rp^1, rp^2, ..., rp^kp }
                                                                       20
    Mapping methodology
   rj = D(ni) is the HW resource on which
    ni will be executed

   S(ni) is the stage of the pipeline, or the
    time step, in which ni will be executed



                                                 21
    Mapping methodology
We search for the mapping function M’
which, for a given DII:

 Respects all the RTC
 Uses a minimum number ur of resources
 Gives the minimum pipeline length for the
  periodic task

                                        22
       Mapping methodology
   The mapping is determined by solving the
    following minimization problem:
       C(M') = min_M C(M),   C(M) = C1(M) + C2(M) + C3(M)


 C1(M) is responsible for the fulfillment of all the RTC

 C2(M) minimizes the used silicon area

 C3(M) minimizes the length of the pipeline.


                                                                   23
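A minimal sketch of such an additive cost function (Python; the weights and the penalty form are hypothetical tuning choices, not the deck's exact definition of C1, C2, C3):

```python
# Additive cost C(M) = C1(M) + C2(M) + C3(M) (sketch).
# C1 penalizes RTC violations, C2 the used silicon area,
# C3 the pipeline length. w1, w2, w3 are hypothetical weights.

def cost(finish_times, deadlines, used_area, pipeline_length,
         w1=1000.0, w2=1.0, w3=1.0):
    c1 = w1 * sum(max(0, s - d) for s, d in zip(finish_times, deadlines))
    c2 = w2 * used_area
    c3 = w3 * pipeline_length
    return c1 + c2 + c3

# A feasible mapping pays only area + schedule length:
print(cost([280, 240], [300, 250], used_area=120, pipeline_length=20))
```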
     Mapping methodology
   While searching for a mapping of G, we
    force the response to aperiodic tasks to
    be synchronous with the periodic task

   The execution of an aperiodic task,
    requested at a generic time instant t0, is
    delayed till the next start of the pipeline
    of the periodic task.

                                                  24
    Mapping methodology
[Same timing diagram (DII = 2): each asynchronous request, arriving at a generic instant t0Ai, is served starting from the next pipeline start (I0Ai).]
                                                                                                                                                                    25
    Mapping methodology
In a pipelined system with DII = 1:

 the used resource set is maximum
 the execution time of each AsGi on the
  pipeline is minimum

A lower bound for the execution time of AsGi is
given by the lowest execution time of the longest
path of AsGi: LpAsi is such a lower bound,
expressed in number of clock cycles
                                               27
Mapping methodology
Maximum value allowed for DII,
 compatible with all the RTCAsiRTCAs:

    LpAsi · Tck gives the minimal execution
     time for AsGi

    The deadline associated with AsGi is Di.

                                             28
     Mapping methodology
Maximum value allowed for DII,
 compatible with all the RTCAsiRTCAs
 (continued):

    In the worst case the request for the
     aperiodic task is sensed immediately after a
     pipeline start; the aperiodic task will then
     begin to be executed DII·Tck seconds after
     the request, at the next start of the pipeline.
                                                  29
       Mapping methodology
Maximum value allowed for DII, compatible
 with all the RTCAsiRTCAs (continued):

    A necessary condition to match all the
     RTCAsi ∈ RTCAs is that the lower bound of the
     execution time of each asynchronous task
     must be smaller than the associated
     deadline diminished by the DII delay, i.e.
     Di ≥ DII·Tck + LpAsi·Tck ,  ∀ i = 1, 2, ..., m
                                                      30
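This necessary condition can be checked directly for a candidate DII (Python sketch; the deadlines come from the Results slides, while the longest-path bounds Lp are hypothetical illustration values):

```python
# Necessary condition for a candidate DII:
# D_i >= DII*Tck + Lp_Asi*Tck for every aperiodic task i.

def dii_admissible(DII, Tck, deadlines, Lp):
    """deadlines[i] = D_i (in ut); Lp[i] = longest-path lower bound
    of task i, expressed in clock cycles."""
    return all(D >= DII * Tck + lp * Tck for D, lp in zip(deadlines, Lp))

# Deadlines 300, 250, 350 ut (from the Results); Lp values are made up.
print(dii_admissible(3, 50, [300, 250, 350], [2, 2, 3]))  # admissible
print(dii_admissible(4, 50, [300, 250, 350], [2, 2, 3]))  # violates task 2
```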
    Mapping methodology
    Combining the previous relations with a
     congruence condition between the
     period of the synchronous task (Dt) and
     the clock period (Tck), we obtain the set
     DIIp which contains all the admissible
     DII values.



                                            31
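A plausible reading of this construction, sketched in Python: the congruence condition is taken here to mean that DII must divide Dt/Tck (so the synchronous period is a whole number of introduction intervals); the Lp values are hypothetical. With the numbers of the Results slides this reproduces DIIp = {1, 3}:

```python
# Enumerate admissible DII values (sketch, under the stated assumptions).

def dii_candidates(Dt, Tck, deadlines, Lp):
    n = Dt // Tck
    ok = []
    for DII in range(1, n + 1):
        if n % DII != 0:
            continue  # congruence with the period Dt of the synchronous task
        if all(D >= DII * Tck + lp * Tck for D, lp in zip(deadlines, Lp)):
            ok.append(DII)  # necessary RTC condition holds for every task
    return ok

# Dt = 150 ut, Tck = 50 ut, deadlines from the Results; Lp made up.
print(dii_candidates(150, 50, [300, 250, 350], [2, 2, 3]))
```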
      Mapping methodology
Steps of the Mapping methodology:

   A set of allowed values of DII is determined
   A sufficient HW resource set UR0 is determined
   At the end of the optimization process the number of
    used resources ur could be less than ur0 if
    mutually exclusive nodes are contained in the
    graph

                                                  32
     Mapping methodology
Steps of the Mapping methodology (continued):

   An initial feasible mapping M0 is determined; SL0
    is the last time step needed to execute P by using
    M0.
   Starting from M0, we use the Simulated
    Annealing algorithm to solve the minimization
    problem

     C(M') = min_M C(M),   C(M) = C1(M) + C2(M) + C3(M)
                                                            33
Outline of Presentation
   Problem statement
   Asynchronous events
   Mapping methodology
   Searching space
   Optimization by RT-PSA Algorithm
   Results
   Conclusions
                                       34
    Searching space
   In order to represent a mapping function
    M we adopt the formalism based on the
    Allocation Tables t(M)
    t(M) is a table with ur rows and
     DII vertical sectors OSi with i = 1, 2, ..., DII
    Each OSi contains the time steps Si+kDII (k = 0,
     1, 2, ...), which will be overlapped during
     the execution of P
                                              35
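The sector that a given time step falls into can be sketched as (Python):

```python
# Each vertical sector OS_i of the allocation table collects the
# time steps S_{i + k*DII}, k = 0, 1, 2, ... -- the steps that
# overlap during the pipelined execution.

def sector_of(step, DII):
    """Return i such that time step S_step belongs to sector OS_i."""
    return ((step - 1) % DII) + 1

# With DII = 3, steps S1, S4, S7 share OS1; S2, S5, S8 share OS2; etc.
print([sector_of(s, 3) for s in (1, 4, 7, 2, 5, 8, 3, 6, 9)])
```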
Searching space
   Each node is assigned to a cell of t(M),
    i.e. it is associated to an HW resource
    and to a time step.

   For example, we consider the 23-node
    graph AsG1


                                          36
    Searching space

                                         AsG1

[Figure: the 23-node graph AsG1; nodes 1–8 execute on resource type A, nodes 9–16 on type C, nodes 17–23 on type A.]
                                                                                                37
     Searching space
         For DII=3, a possible mapping M is described
          through the following t(M)
            OS1               OS2               OS3
       S1   S4   S7      S2   S5   S8      S3   S6   S9
 A1    n1    -    -      n6    -    -     n17    -    -
 A2    n2    -    -      n7    -    -     n18    -    -
 A3    n3    -    -      n8    -    -       -  n21    -
 A4    n4    -    -       -  n19    -       -  n22    -
 A5    n5    -    -       -  n20    -       -    -  n23
 C1     -  n15    -      n9    -    -     n12    -    -
 C2     -  n16    -     n10    -    -     n13    -    -
 C3     -    -    -     n11    -    -     n14    -    -
                                                          38
  Searching space
An allocation table t(M) must respect both

1. Causality condition

And the

2. Overlapping condition

                                         39
    Searching space
   We define the Ω searching space over
    which minimization of C(M) must be
    performed.
   Ω is the space containing all the feasible
    allocation tables:
     Ω = { t(M) | t(M) is a feasible mapping };
    t(M) ∉ Ω  ⇒  t(M) is not feasible.

                                             40
Searching space
   We can write the minimization problem
    in terms of the cost associated to the
    mapping M represented by the
    allocation table:

     C[t(M')] = min_{t(M) ∈ Ω} C[t(M)]



                                        41
Searching space
   We solve the problem by using a
    Simulated Annealing (SA) algorithm
   SA requires the generation of a
    sequence of points belonging to the
    searching space; each point of the
    sequence must be close, according to a
    given measure criterion, to its
    predecessor and to its successor.
                                        42
    Searching space
    As Ω consists of allocation tables, we have
     to generate a sequence of allocation tables

     t(Mi) ∈ Neigh[t(Mi-1)]

where Neigh[t(M)] is the set of the allocation
  tables adjacent to t(M) according to
  some adjacency criteria
                                              43
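The Simulated Annealing loop over such a neighborhood structure can be sketched generically (Python; the neighbor generator, cost function and cooling schedule below are hypothetical placeholders standing in for Neigh[t(M)], C[t(M)] and the RT-PSA schedule):

```python
import math
import random

# Generic Simulated Annealing skeleton (sketch).
# 'neighbor' plays the role of Neigh[t(M)]: it returns a point of the
# searching space adjacent to the current one; 'cost' evaluates C[t(M)].

def simulated_annealing(start, cost, neighbor,
                        T0=100.0, alpha=0.95, steps=2000, seed=0):
    rng = random.Random(seed)
    cur, cur_c = start, cost(start)
    best, best_c = cur, cur_c
    T = T0
    for _ in range(steps):
        cand = neighbor(cur, rng)
        cand_c = cost(cand)
        # Accept improvements always; worse points with prob exp(-dC/T).
        if cand_c <= cur_c or rng.random() < math.exp(-(cand_c - cur_c) / T):
            cur, cur_c = cand, cand_c
            if cur_c < best_c:
                best, best_c = cur, cur_c
        T *= alpha  # geometric cooling
    return best, best_c

# Toy usage: minimize (x - 3)^2 over the integers, neighbors are x +/- 1.
sol, c = simulated_annealing(0, lambda x: (x - 3) ** 2,
                             lambda x, r: x + r.choice((-1, 1)))
print(sol, c)
```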
     Searching space
    Searching space connection:

Theorem 2. The Ω searching space is
connected under the adopted adjacency
conditions.



                                      44
Outline of Presentation
   Problem statement
   Asynchronous events
   Mapping methodology
   Searching space
   Optimization by RT-PSA Algorithm
   Results
   Conclusions
                                       45
    Optimization by RT-PSA
    Algorithm

   We start from a feasible allocation table
    t(M0)

    We rely on the optimization algorithm
     to find the desired mapping M


                                                46
    Optimization by RT-PSA
    Algorithm

   We iterate over all the allowed values of
    DII

   The final result of the whole optimization
    process will be the allocation table
    characterized by minimal cost.


                                            47
Outline of Presentation
   Problem statement
   Asynchronous events
   Mapping methodology
   Searching space
   Optimization by RT-PSA Algorithm
   Results
   Conclusions
                                       48
     Results

   In order to illustrate the results
    achievable through the presented RT-
    PSA algorithm, we consider the
    following graphs



                                      49
Results
          SyG


                (1,A),   (2,B),   (3,B),   (4,A),   (5,B),
                (6,C), (7,A), (8,C), (9,E), (10,A),
                (11,C), (12,E), (13,B), (14,C), (15,E),
                (16,A), (17,B), (18,E), (19,C), (20,A),
                (21,C), (22,E), (23,A), (24,B), (25,C),
                (26,B), (27,B), (28,E), (29,A), (30,B),
                (31,E), (32,C), (33,B), (34,A), (35,C),
                (36,E), (37,B), (38,A), (39,E), (40,A),
                (41,B), (42,A), (43,C), (44,B), (45,E),
                (46,E), (47,B), (48,B), (49,A), (50,C).




                                                             50
    Results
                                          AsG1

[Figure: the 23-node graph AsG1 again; nodes 1–8 on resource type A, nodes 9–16 on type C, nodes 17–23 on type A.]
                                                                                                51
Results
           AsG2

[Figure: the 25-node graph AsG2 (image not recoverable from the extraction).]
                 52
Results
           AsG3

[Figure: the 28-node graph AsG3 (image not recoverable from the extraction).]
                 53
        Results
   We have
    N = NSy + NAs1 + NAs2 + NAs3 = 50 + 23 + 25 + 28 = 126
    r1 = A,   r2 = B,   r3 = C,   r4 = E

   The execution times and resource areas are
    T(A) = 10ut,     Ar(A) = 10us
    T(B) = 20ut,     Ar(B) = 10us
    T(C) = 30ut,     Ar(C) = 13us
    T(E) = 40ut,     Ar(E) = 15us
    Tr = 5ut,        Ar(mr) = 1us
                                                           54
     Results
   The input data interarrival period is Dt = 150ut
   We fix the pipeline clock cycle Tck = 50ut
   RTC are
    RTCAs1 = 300ut
    RTCAs2 = 250ut
    RTCAs3 = 350ut.

   The set of DII possible values is DIIp = {1, 3}

                                                      55
Results
    Results for DII = 3

DII = 3      Cost Function     ur    LSy    Fulfilled RTC
Starting     2667.942692       37     12          1
Final           3.999681       29     20          3
                                                56
Results
   Results for DII = 1


DII = 1      Cost Function     ur    LSy    Fulfilled RTC
Starting        9.554314      104     10          3
Final           7.799348       79     18          3



                                                    57
Outline of Presentation
   Problem statement
   Asynchronous events
   Mapping methodology
   Searching space
   Optimization by RT-PSA Algorithm
   Results
   Conclusions
                                       58
    Conclusions

   We presented an algorithm to optimize
     the mapping, onto a dedicated pipeline
    system, of a periodic task PSy and m
    mutually exclusive aperiodic tasks PAs1,
    PAs2, … PAsm subjected to real time (RT)
    constraints


                                               59
      Conclusions
   The algorithm, while searching for a mapping
    which satisfies all RT constraints of the aperiodic
    tasks, tries to minimize the number of HW
    resources needed to implement the system as
     well as the length of the schedule.

    The mapping optimization is formulated as a
    minimization problem that has been solved
    through the Simulated Annealing algorithm.
                                                     60
     Conclusions
   Mappings are represented through
    allocation tables.
     The searching space, as well as the adjacency
     criteria on it and a cost function evaluating
     the quality of a mapping, have been
     defined.
   We demonstrated that the searching space
    containing all the feasible mappings is
    connected.
                                               61
Remarks

luca.baldini@ieee.org
palazzari@casaccia.enea.it




                             62

								