Modeling and Analysis of Markov Chains Using Decision Diagrams

W
Document Sample
scope of work template
							Modeling and Analysis of Markov Chains
       Using Decision Diagrams


                          Gianfranco Ciardo
           Department of Computer Science and Engineering
                   University of California at Riverside
                        Riverside, CA 92521, USA
                           ciardo@cs.ucr.edu


   (partially based on work with Andrew S. Miner and Paul L. E. Grieco)
2
                                                    Outline
    • Background on CTMCs
       ◦ Continuous–time Markov chains and high–level CTMC models
       ◦ Matrix–vector description and operations for CTMC solution
       ◦ Sparse matrices
    • Structured models
       ◦ High–level CTMC models structured into submodels
       ◦ State space and state indexing: potential vs. actual indices
       ◦ Multiway decision diagrams (MDDs)
    • Decision–diagram–based CTMC encodings
       ◦ Multiterminal multiway decision diagrams (MTMDDs)
       ◦ Kronecker descriptors
       ◦ Matrix diagrams (MXDs)
       ◦ Edge–valued multiway decision diagrams (EVMDDs)
    • Solution algorithms
       ◦ Vector–matrix multiplication with Kronecker–based encodings
       ◦ Jacobi vs. Gauss–Seidel style iterations
Background on CTMCs
4
                                           Discrete-state models



A discrete state model is fully specified by:

    • a potential state space S                                                     the “type” of the states

    • a set of initial states S init ⊆ S                           often there is a single initial state sinit
                                           b
    • a next-state function N : S → 2S                naturally extended to sets:   N (X ) =      i∈X   N (i)




The state space S of the model is the smallest set containing S init and satisfying:
    • the recursive definition                                              i ∈ S ∧ j ∈ N (i) ⇒ j ∈ S
    • or the fixed-point equation                                                         X = X ∪ N (X )


               S = S init ∪ N (S init ) ∪ N 2 (S init ) ∪ N 3 (S init ) ∪ · · · = N ∗ (S init )
5
                     Definition of continuous-time Markov chain


A stochastic process {X(t)    : t ≥ 0} is a collection of r.v.’s indexed by a time parameter t

We say that X(t) is the state of the process at time t

The possible values X(t) can ever assume for any t is (a subset of) the state space S



{X(t) : t ≥ 0} over a discrete S is a continuous-time Markov chain (CTMC) if
     Pr {X(tn+1 )     = in+1 | X(tn ) = in ∧ X(tn−1 ) = in−1 ∧ . . . ∧ X(t0 ) = i0 }

                             = Pr {X(tn+1 ) = in+1 | X(tn ) = in }
for any times 0   ≤ t0 ≤ . . . ≤ tn−1 ≤ tn ≤ tn+1 and states {i0 , . . . , in−1 , in , jn+1 } ⊆ S



                                       Markov property:
      “given the present state, the future is independent of the past”
        “the most recent knowledge about the state is all we need”
6
                             Markov chain description and analysis

A continuous-time Markov chain (CTMC) {X(t)            : t ≥ 0} with state space S is described by
    • its infinitesimal generator                Q = R − diag(R · 1) = R − diag(h)−1 ∈ R|S|×|S|
    • its initial probability vector                                                           π(0) ∈ R|S|

where
    • R is the transition rate matrix:                 R[i, j] is the rate of going to state j when in state i
    • h is the expected holding time vector :                                      h[i] = 1/     j∈S   R[i, j]
    • π(0)[i] = Pr {chain is in state i at time 0, i.e., initially}



Transient probability vector     π(t) ∈ R|S| :                                   π(t)[i] = Pr {X(t) = i}
                                   dπ(t)
    • π(t) is the solution of            = π(t) · Q with initial condition π(0)
                                    dt



Steady-state probability vector        π ∈ R|S| :                           π[i] = limt→∞ Pr {X(t) = i}
    • π is the solution of π · Q = 0 subject to            i∈S   π[i] = 1              (Q must be ergodic)
7
                                          Storing a CTMC explicitly

We need to store:
    • R       a real matrix of size |S| × |S|
    • h       a real vector of size |S|


We focus on R, for which we can employ sparse storage:
    • Requires memory proportional to η(R), the number of nonzeros in R, instead of |S|2
    • Allows iterative numerical methods to run in time proportional to η(R), instead of |S|2

                                           0    1   2    3     4
                    α
          0                 1         0         α   β                      0      1   α    2        β
                    γ
                ε                     1     γ                              1      0   γ

          β         4                 2                 α                  2      3   α

                        δ             3             γ         δ            3      2    γ   4    δ
                    α
          2                 3         4    ε                               4      0    ε
                    γ
8
                       Exploiting common entries with explicit storage

We can map the real values in R to (small) integer indices:
    • Causes a small runtime overhead
    • Reduces memory by a constant factor if R contains several entries with the same value
    • If all nonzero entries have the same value (e.g., 1), this is essentially the reachability graph



                   α
           0                1         0      1   α    2   β            0       10 2 1
                   γ
               ε                      1      0   γ                     1       0 2

          β        4                  2      3   α                     2       3 0

                        δ             3      2    γ   4   δ            3       2 2 43
                   α
           2                3         4      0    ε                    4       0 4
                   γ
                                                                           0    1     2     3     4
                                                                           α    β     γ     δ      ε
Structured models
10
                              Structured discrete-state models



A structured discrete state model is specified by

 • a potential state space S = SK × · · · × S1 = ×K≥k≥1 Sk
     ◦ the “type” of the (global) state
     ◦ Sk is the (discrete) local state space for submodel k
 • a set of initial states S init ⊆ S
     ◦ often there is a single initial state sinit
 • a set of events E defining a disjunctively-partitioned next-state function
                       b
     ◦ N α : S → 2S                       j ∈ Nα (i) iff state j can be reached by firing event α in state i
                     b
     ◦ N : S → 2S is defined by N (i) =               α∈E   Nα (i)
     ◦ we can extend N to take sets of states as argument N (X ) =              i∈X   N (i)
     ◦ α is enabled in i iff Nα (i) = ∅, otherwise it is disabled
     ◦ i is absorbing, or a trap, or dead iff N (i) = ∅
                         Petri nets and their state space S : finite case
11




                                                                   e fires            p

                                                                                          a
                                                                                  q               s


                p                                  p                          b       c       d                         p




                                                                        es




                                                                                                      d
                                                                     fir




                                                                                                          fir
                     a                                 a                                                                    a




                                                                                                           es
                                                                    c
                                                                         es
            q                s                 q               s                  r           t                     q               s




                                                                        fir
                                                                                          e




                                                                     b
                                 a fires                                              p
        b        c       d                 b       c       d                                                    b       c       d




                                                                                                          s
                                                                                                          e
                                                                                                      fir
                                                                                          a




                                                                                                      c
             r           t                     r           t         d            q               s                 r           t




                                                                                                          es
                                                                        fir
                     e                                 e                 es                                                 e




                                                                                                      fir
                                                                                                      b
                                                                              b       c       d


                                                                                  r           t
                                                                                          e

                             init                                  (N + 1)(N + 2)(2N + 3)
If the initial state is s           = (N , 0, 0, 0, 0), S contains                        states
                                                                             6
12
                                               State indexing


Let the size of the k th local state space be nk    = |Sk | and map Sk to {0, 1, . . . , nk −1}


S contains |S| = nK · nK−1 · · · n1 states, but not all of them are actually reachable


A potential indexing of a (global) state i   = (iK , . . . , i1 ) is simply the mixed-base value of i:
                                                                         

                                   ψ(i) =               i k           nl 
                                               K≥k≥1           k>l≥1




An actual indexing for the reachable states is instead harder to define, store, and compute

                                ψ : S → {0, 1, . . . , |S|−1} ∪ {null}



              in reality, we often have no a priori knowledge of Sk
                                  Explicit generation of S and R
13




 Explore(in: S init , N ; out: S, R, ψ) is
  1.   n ← 0;                                                                            state indices start at 0
  2.   S ← ∅;                                                              S contains the states explored so far
  3.   U ← S init ;                                             U   contains the unexplored states known so far
                      init
  4.   for each i ∈ S      do
  5.        ψ(i) ← n++;                                    assign to i the next available index and increment n
  6.   end for
  7.   while U = ∅ do
  8.       choose a state i in U and move it from U to S ;
  9.       for each event α ∈ E and each state j ∈ Nα (i) do
 10.            if j ∈ S ∪ U then                                 search to determine whether j is a new state
 11.                 ψ(j) ← n++;                           assign to j the next available index and increment n
 12.                 U ← U ∪ {j};                                                    remember to explore j later
 13.            end if;
 14.            R[ψ(i), ψ(j)] ← R[ψ(i), ψ(j)] + λα (i)∆α (i, j);                            ψ is used to index R
 15.       end for;
 16.   end while;


ψ : S → {0, ..., |S|−1} ∪ {null} is a state indexing function                              (e.g., discovery order)
λα (i) is the rate at which event α fires in state i
∆α (i, j) is the probability that, if event α fires in state i, the next state is j
14
             (Quasi–reduced ordered) multiway decision diagrams
 • Nodes are organized into K + 1 levels
      ◦ Level K contains only one root node
      ◦ Levels K −1 through 1 contain one or more nodes, NO DUPLICATES
      ◦ Level 0 contains only the two terminal nodes, 0 and 1 (false and true).
 • For k > 0, a node at level k has |Sk | arcs pointing to nodes at level k−1

                     S4 = {0, 1, 2, 3}                      0 1 2 3

                     S3 = {0, 1, 2}          0 1 2              0 1 2           0 1 2

                     S2 = {0, 1}             0 1           0 1        0 1           0 1

                     S1 = {0, 1, 2}          0 1 2              0 1 2           0 1 2

                                                       0                    1
                                                                                                           
         0      1    1   1    1   1     2    2    2        2     2     3       3    3    3   3   3   3   3 
     S=   2      0    0   1    1   2     0    0    1        1     2     0       1    2    2   2   2   2   2
         1      0    1   0    1   1     0    1    0        1     1     1       1    0    0   0   1   1   1 
          0      0    0   0    0   0     0    0    0        0     0     0       0    0    1   2   0   1   2

[Kam et al. 1998] defined fully–reduced ordered MDDs as an interface to BDDs
15
       How to define and store the indexing function ψ using MDDs
                     S4 = {0, 1, 2, 3}                   0 1 2 3

                     S3 = {0, 1, 2}         0 1 2           0 1 2           0 1 2

                     S2 = {0, 1}            0 1         0 1       0 1           0 1

                     S1 = {0, 1, 2}         0 1 2           0 1 2           0 1 2

                                                    0                   1
                                                                                                          
         0     1   1    1   1    1    2   2    2       2     2   3         3    3    3     3    3   3   3 
     S=   2     0   0    1   1    2    0   0    1       1     2   0         1    2    2     2    2   2   2
         1     0   1    0   1    1    0   1    0       1     1   1         1    0    0     0    1   1   1 
          0     0   0    0   0    0    0   0    0       0     0   0         0    0    1     2    0   1   2

To compute the index of a state, use edge–value offsets:                          19 0 1 2 3
                                                                                     0 1 6 11
 • Sum the offsets found on the corresponding path:
                                                                                1 2 5 0 1 2 8 0 1 2
              ψ(2, 1, 1, 0) = 6 + 2 + 1 + 0 = 9                                   0   0 2 4   0 1 2

 • A state is unreachable if the path is not complete:                           2 0 1          1 1 6 0 1
                                                                                   0 1            0   0 3
              ψ(0, 2, 0, 0) = 0 + 0 + ? + ? = null
                                                                                          1 0     3 0 1 2
      lexicographic, not discovery, order!!!                                                0       0 1 2
16
                State indexing options: potential ψ vs. actual ψ


Once we know S :
                                           b
 • We can store the original N : S → 2S or its restriction N : S → 2S
 • We can store R : S × S → R or R : S × S → R
 • We can choose algorithms that use π : S → R or π : S → R



With strictly explicit methods: using actual R and π works best



With (even just partially) implicit methods, there are tradeoffs
 • Storing π instead of π is often unavoidable if we employ a full vector
 • Symbolic storage of R is much cheaper than that of R in terms of memory requirements
 • However, using R in conjunction with π complicates indexing...
 • ...at the very least, it forces us to store ψ : S → {0, 1, . . . , |S| − 1} ∪ {null}, hence S
17
     Saturation: an iteration strategy based on the model structure

MDD node p at level k is saturated if the set of states it encodes is a fixed point w.r.t.
any α s.t. Top(α) ≤ k                    (thus, all nodes below p are also saturated)



 • build the K -level MDD encoding of S init             (if |S init |   = 1, there is one node per level)
 • saturate each node at level 1: fire in them all events α s.t. Top(α) = 1
 • saturate each node at level 2: fire in them all events α s.t. Top(α) = 2
   (if this creates nodes at level 1, saturate them immediately upon creation)

 • saturate each node at level 3: fire in them all events α s.t. Top(α) = 3
   (if this creates nodes at levels 2 or 1, saturate them immediately upon creation)

 • ...
 • saturate the root node at level K : fire in it all events α s.t. Top(α) = K
   (if this creates nodes at levels K −1, K −2, . . . , 1, saturate them immediately upon creation)




                states are not discovered in breadth-first order
18
            Solution requirements: SMART vs. NuSMV (800MHz P-III)

Time and memory to generate S using saturation in SMART vs. breadth–first iterations in NuSMV


                             Final memory (kB)    Peak memory (kB)          Time (sec)
     N            |S|         SMART   NuSMV       SMART   NuSMV          SMART    NuSMV
     Dining Philosophers:    K=N
     50     2.23×1031            18     10,800        22       10,819     0.15         5.9
    200     2.47×10125           74     27,155        93       72,199     0.68    12,905.7
 10,000     4.26×106269       3,749         —      4,686           —    877.82          —
     Slotted Ring Network:   K=N
       10   8.29×109              4      5,287        28       10,819     0.13         5.5
       15   1.46×1015            10      9,386        80       13,573     0.39     2,039.5
      200   8.38×10211        1,729         —    120,316           —    902.11          —
     Round Robin Mutual Exclusion:    K =N +1
       20   4.72×107             18      7,300        20        7,306     0.07         0.8
      100   2.85×1032           356     16,228       372       26,628     3.81     2,475.3
      300   1.37×1093         3,063         —      3,109           —    140.98          —
     Flexible Manufacturing System:   K = 19
       10   4.28×106             16      1,707        26       11,238     0.05         9.4
       20   3.84×109             55     14,077       101       31,718     0.20     1,747.8
      250   3.47×1026        25,507         —     69,087           —    231.17          —
Decision–diagram–based CTMC encodings
20
                     Multiterminal multiway decision diagrams


From BDDs to MDDs: allow multiway choices at each nonterminal node, not just binary choices

From BDDs to MTBDDs: allow multiple terminal nodes, not just 0 and 1

From BDDs to MTMDDs combine both generalizations


We can use a quasi–reduced MTMDD to encode a real matrix A          :S ×S →R
 • Nodes are organized into 2K + 1 levels
     ◦ Variables {iK , iK−1 , ..., i1 , jK , jK−1 , ..., j1 } are mapped onto {2K, 2K −1, ..., 1}
     ◦ Let v(l) be the variable corresponding to level l ∈ {2K, 2K −1, ..., 1}
     ◦ Level 2K contains only one root node
     ◦ Levels 2K −1 through 1 contain one or more nodes, no duplicate nodes allowed
     ◦ Level 0 contains as many nodes as the different entries in A
 • A node at level l > 0, with v(l) = ik or jk , has |Sk | arcs pointing to nodes at level l−1


A[i, j] = x ⇔ path labelled {iK , iK−1 , ..., i1 , jK , jK−1 , ..., j1 } leads to node x at level 0
21
                   MTMDDs encoding of the transition rate matrix



In principle, the mapping v can be any permutation of the 2K variables

In practice, the order (iK , jK , iK−1 , jK−1 , ..., i1 , j1 ) is usually the best choice
(we still need to decide a good order for the K “from-to” pairs)




When using MTMDDs to store the transition rate matrix, we have a choice:

 • Store R : S × S → R                                     (note that   R[i, j] = 0 if i ∈ S and j ∈ S )
     ◦ a natural choice if we use a compositional approach
 • Store R : S × S → R                   (actually, S   × S → R, but R[i, j] = 0 if i ∈ S or j ∈ S )
     ◦ usually requires more MTMDD nodes
     ◦ can be built by enumerating the entries explicitly and storing them implicitly in an MTMDD
     ◦ or by zeroing the rows corresponding to S \ S in the MTMDD encoding of R
       i.e., premultiplying R by a filtering diagonal matrix F[i, i] = 1 if i ∈ S , 0 if i ∈ S \ S
22
                                             An example of MTMDD

S4 : {p1,p0} ≡ {0,1}         S3 : {q 0 r0,q 1 r0,q 0 r1} ≡ {0,1,2}    S2 : {s0,s1} ≡ {0,1}            S1 : {t0,t1} ≡ {0,1}

                     p
                                                                      R     0 1
                 a
         q               s                                        0 1                      0 1


                                                       0 1 2               0 1 2                      0 1 2
     b       c       d
                                             0 1 2               0 1 2     0 1 2           0 1 2              0 1 2

         r           t
                 e                       0 1            0 1                 0 1               0 1             0 1

                                     0 1             0 1        0 1       0 1        0 1      0 1         0 1        0 1


                                     0 1                0 1                 0 1               0 1             0 1

                                 0 1        0 1      0 1        0 1       0 1        0 1      0 1             0 1

                                       µa                  µc                   µb               µd             µe
23
                             Definition of Kronecker product




Given K matrices Ak    ∈ Rnk ×nk , their Kronecker product is

                              A=             Ak ∈ RnK ···n1 ×nK ···n1
                                    K≥k≥1

where
 • A[i, j] = AK [iK , jK ] · AK−1 [iK−1 , jK−1 ] · · · A1 [i1 , j1 ]
 • using the mixed-base numbering scheme (indices start at 0)

          i = (...((iK ) · nK−1 + iK−1 ) · nK−2 · · · ) · n1 + i1 =             ik ·           nl
                                                                        K≥k≥1          k>l≥1




               nonzeros:          η                Ak        =           η(Ak )
                                        K≥k≥1                    K≥k≥1
24
                             Kronecker product by example
                                                                                      
                                                                     b00   b01   b02
                                   a00    a01                                         
Given the real matrices   A=                        and    B =  b10
                                                                          b11   b12   
                                                                                       
                                   a10    a11
                                                                 b20       b21   b22

                                                                                               
                                          a00 b00 a00 b01 a00 b02    a01 b00 a01 b01 a01 b02
                                                                                               
                                         a00 b10 a00 b11 a00 b12    a01 b10 a01 b11 a01 b12    
                                                                                               
                                                                                               
                 a00 B    a01 B          a00 b20 a00 b21 a00 b22    a01 b20 a01 b21 a01 b22    
                                                                                               
     A⊗B=                          =                                                           
                 a10 B    a11 B          a10 b00 a10 b01 a10 b02    a11 b00 a11 b01 a11 b02    
                                                                                               
                                                                                               
                                         a10 b10 a10 b11 a10 b12    a11 b10 a11 b11 a11 b12    
                                                                                               
                                          a10 b20 a10 b21 a10 b22    a11 b20 a11 b21 a11 b22


Kronecker product expresses contemporaneity or synchronization


If A and B are the transition probability matrices of independent discrete-time Markov chains
⇒ A ⊗ B is the transition probability matrix of their composition
25
           Kronecker-consistent decomposition of a CTMC model

A decomposition of a discrete-state model describing a CTMC is Kronecker-consistent if:

 • the potential transition rate matrix R is additively partitioned                   R=        α∈E   Rα

 • S = SK × · · · × S1 , a global state i consists of K local states                 i = (iK , . . . , i1 )

 • and, most importantly, we can multiplicatively partition each Rα , that is, we can write
     λα (i) = λK,α (iK ) · · · λ1,α (i1 )   and      ∆α (i, j) = ∆K,α (iK , jK ) · · · ∆1,α (i1 , j1 )



                               Rα = RK,α ⊗ · · · ⊗ R1,α

We encode the potential transition rate matrix R with |E| × K small matrices          Rk,α ∈ Rnk ×nk



for stochastic Petri nets with transition rates depending on at most one
      place, any partition of the places into K subsets is consistent
                 (even with inhibitor, reset, or probabilistic arcs)
26
      Kronecker description of the transition rate matrix of a CTMC

 • Parallel composition of K submodels with overall event set E (synchronizing vs. local)
 • Global state i is a K -tuple (iK , ..., i1 ) of local states           S ⊆ S = SK × · · · × S1
 • Transition rate matrix R = R[S, S] where R =                           Rk,α
                                                              α∈E K≥k≥1
                     
                      λk,α (ik ) · ∆k,α (ik , jk ) if α is in submodel k
                     
 • Rk,α [ik , jk ] =   1                            if α is not in submodel k and ik = jk
                     
                     
                       0                            if α is not in submodel k and ik = jk



                 encode a huge R with K                   · |E| “small” matrices
“On the stochastic structure of parallelism and synchronisation models for distributed algorithms”
Plateau (SIGMETRICS 1985)



      factor K slowdown, still needs a probability vector of size |S|
“Complexity of memory-efficient Kronecker operations with applications to the solution of Markov mod-
els” Buchholz, Ciardo, Donatelli, Kemper (INFORMS J. Comp., 2000)
27
                           Kronecker encoding of R                     (K = 5)
S5 = ?                  S4 = ?                S3 = ?                   S2 = ?                S1 = ?


      EVENTS →
L              0                                                   0
E    R5,a: ?           I            I          I         R5,e: ?                                 p
V
               1                                                   0
E
L              0         0         0                                                         a
     R4,a: ?     R4,b: ?   R4,c: ?             I             I                       q               s
S              0         0         1
↓
                             0         0                           0
          I        R3,b: ?     R3,c: ?         I         R3,e: ?
                             1         0                           1             b       c       d
               0                                     0
     R2,a: ?           I            I      R2,d: ?           I
               0                                     1                               r           t
                                                                                             e
                                                     0          0
          I            I            I      R1,d: ?     R1,e: ?
                                                     0         10



              we determine a priori from the model whether Rk,α = I
28
          Kronecker encoding of R               =             α∈{a,b,c,d,e}    5≥k≥1 Rk,α

S5 : {p1,p0}≡{0,1} S4 : {q 0,q 1}≡{0,1} S3 : {r0,r1}≡{0,1} S2 : {s0,s1}≡{0,1} S1 : {t0,t1}≡{0,1}

      EVENTS →
L             a
           0 γ5                                                         0 0
E    R5,a:              I                I           I          R5,e:    e                      p
V
           0 0                                                          γ5 0
E             a            b
L          0 γ4         0 γ4       0 0                                                      a
     R4,a:        R4,b:      R4,c: c                 I                  I           q               s
S          0 0          0 0       γ4 0
↓                                    c
                        0 0       0 γ3                                  0 0
          I       R3,b: b   R3,c:                    I          R3,e:    e
                       γ3 0       0 0                                   γ3 0    b       c       d
              a
           0 γ2                                       0 0
     R2,a:              I                I   R2,d:     d                I
           0 0                                       γ2 0                           r           t
                                                          d                                 e
                                                     0   γ1             0 0
          I             I                I   R1,d:              R1,e:    e
                                                     0 0                γ1 0


                           a    a    a
R[00i3 0i1 , 11i3 1i1 ] = γ5 · γ4 · γ2
                                             a         a
Not a canonical representation: changing to γ5 ·2 and γ4 /2 would describe the same R
29
                         Kronecker encoding of R                       (K = 4)
S4 = ?                         S3 = ?                          S2 = ?                              S1 = ?



      EVENTS →                                                                                         p
L
E    R4,a :?        I             I              I         R4,e :?
V                                                                                                  a
E                                                                                         q                s
L
S    R3,a :?     R3,b :?       R3,c :?           I         R3,e :?
↓
                                                                                      b        c       d
     R2,a :?        I             I          R2,d :?           I


       I            I             I          R1,d :?       R1,e :?                         r           t
                                                                                                   e




The matrices for b and c differ only at level 3: we can merge them into a single (local) event l


       we determine automatically from the model whether Rk,α = I
30
              Kronecker encoding of R                  =        α∈{a,l,d,e}           4≥k≥1 Rk,α

S4 : {p1,p0} ≡ {0,1}   S3 : {q 0 r0,q 1 r0,q 0 r1} ≡ {0,1,2}   S2 : {s0,s1} ≡ {0,1}    S1 : {t0,t1} ≡ {0,1}


      EVENTS →
             0 µa                                                0 0                                     p
L    R4,a :      4       I                      I         R4,e: e
E            0 0                                                µ4 0
V                                                                                              a
E              a
            0 µ3 0      0 0 0                                   0 00                        q                s
L
S    R3,a:0 0 0 R3,l:0 0 µc 
                              3                 I        R3,e: 0 0 0
                           b
            0 0 0       0 µ3 0                                 µe 0 0
                                                                 3
↓
                                                                                        b        c       d
              0   µa
                   2                            0 0
      R2,a:                 I           R2,d:                   I
              0 0                               µd 0
                                                 2
                                                                                             r           t
                                              0 µd
                                                 1              0 0                                  e
              I             I           R1,d:             R1,e: e
                                              0 0              µ1 0




We have merged b and c into a single (local) event l
         The matrix R encoded by the Kronecker descriptor (K                                 = 4)
31




                                           00   00   00   00   00   00   11   11   11   11   11   11
                                           00   00   11   11   22   22   00   00   11   11   22   22
                      p                    00   11   00   11   00   11   00   11   00   11   00   11
                                           01   01   01   01   01   01   01   01   01   01   01   01
                                           •                                        •   •     •   •
                                   0000•   ··   ··   ··   ··   ··   ··   ··   ··   ··   a·   ··   ··
                  a                0001    ··   ··   ··   ··   ··   ··   ··   ··   ··   ·a   ··   ··
         q                s        0010    ·d   ··   ··   ··   ··   ··   ··   ··   ··   ··   ··   ··
                                   0011    ··   ··   ··   ··   ··   ··   ··   ··   ··   ··   ··   ··
                                   0100    ··   ··   ··   ··   c·   ··   ··   ··   ··   ··   ··   ··
                                   0101    ··   ··   ··   ··   ·c   ··   ··   ··   ··   ··   ··   ··
                                   0110    ··   ··   ·d   ··   ··   c·   ··   ··   ··   ··   ··   ··
     b        c       d            0111    ··   ··   ··   ··   ··   ·c   ··   ··   ··   ··   ··   ··
                                   0200    ··   ··   b·   ··   ··   ··   ··   ··   ··   ··   ··   ··
                                   0201    ··   ··   ·b   ··   ··   ··   ··   ··   ··   ··   ··   ··
                                   0210    ··   ··   ··   b·   ·d   ··   ··   ··   ··   ··   ··   ··
                                   0211    ··   ··   ··   ·b   ··   ··   ··   ··   ··   ··   ··   ··
          r           t
                  e                1000    ··   ··   ··   ··   ··   ··   ··   ··   ··   ··   ··   ··
                                   1001    ··   ··   ··   ··   ··   ··   ··   ··   ··   ··   ··   ··
                                   1010    ··   ··   ··   ··   ··   ··   ·d   ··   ··   ··   ··   ··
              {p1,p0 }≡ {0,1}      1011    ··   ··   ··   ··   ··   ··   ··   ··   ··   ··   ··   ··
                                   1100    ··   ··   ··   ··   ··   ··   ··   ··   ··   ··   c·   ··
                                   1101•   ··   ··   ··   ··   ··   ··   ··   ··   ··   ··   ·c   ··
{q 0 r0,q 1 r0,q 0 r1 }≡ {0,1,2}   1110•   ··   ··   ··   ··   ··   ··   ··   ··   ·d   ··   ··   c·
                                   1111    ··   ··   ··   ··   ··   ··   ··   ··   ··   ··   ··   ·c
              {s0,s1 }≡ {0,1}      1200    ··   ··   ··   ··   ··   ··   ··   ··   b·   ··   ··   ··
                                   1201•   e·   ··   ··   ··   ··   ··   ··   ··   ·b   ··   ··   ··
              {t0,t1 }≡ {0,1}      1210•   ··   ··   ··   ··   ··   ··   ··   ··   ··   b·   ·d   ··
                                   1211    ··   e·   ··   ··   ··   ··   ··   ··   ··   ·b   ··   ··
32
                                     Matrix diagrams (MXDs)

A generalization of the idea of Kronecker encoding of a matrix

Allows us to enforce knowledge of the reachable states S to the potential transition rate matrix R

                                              0 0 0 0 0 0 1 1 1 1 1 1 2 2 2 2 2 2
An example of (non–canonical) MXD:            0 0 0 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1
                                              0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2
                                        000                                          70
                                        001                                       90 10
     R[001,210] = 5*2*9 = 90            002                                             20
                                        010                     56       40
     R[111,211] = 2*(2*5+1*1) = 22
                                        011                        35 42    25 30
                   0 1   2              012                        77       55
                 0   1   5              100
                 1   1   2              101
                 2 3     2              102
                                        110                             16            56    32 14
         0 1     0 1   0 1   0 1        111                                  10 12 72 8     18 22 24
     0       0       0   2 0            112                                  22          16    44 4
     1     7 1     2 1 1   1 4 21       200            42                                      28
                                        201          54 6                                   36 4
            0 1 2   0 1 2               202               12                                      8
          0 8     0   7                 210 24                                     16
          1   5 6 1 9 1                 211    15 18                                  10 12
          2   11  2     2               212    33                                     22
33
                                      How to build an MXD

Two methods for building MXDs in our tool SMART:


 • From the Kronecker encoding
     ◦ Build the matrices for the Kronecker encoding
     ◦ Insert the matrices in an MXD, removing duplicate (and redundant?) ones
     ◦ Since we start from a Kronecker encoding, we have the same limitations
     ◦ Similar memory requirements to a Kronecker encoding in practice
     ◦ We can use them to store R instead of R

 • From an explicit enumeration of the entries
     ◦ Requires the use of canonical MXDs [Miner PNPM’01]
     ◦ Entries are added individually (or in batches for greater efficiency)
     ◦ Requires more time (additional overhead prior to numerical solution) and memory
     ◦ They are fully general, can encode any matrix, do not need Kronecker consistency

             MXDs exploit the presence of identity matrices Rk,α
                                       (skipped levels)
34
                                        EVBDDs by example


[Lai et al. 1992] defined edge–valued binary decision diagrams

                                               0                           0                        -3
i3 0 0 0 0 1 1 1 1                          0 1                         0 1                      0 1
                                        0          2                1          1             2         3
i2 0 0 1 1 0 0 1 1                  0 1            0 1        0 1              0 1      0 1              0 1

                                    0       3 0          -1   0         1 2        -1   2        2 3         -1
i1 0 1 0 1 0 1 0 1                  0 1            0 1        0 1              0 1      0 1              0 1

                                        0 2       0 -1            -1 1       1 0            -1 1       2 1

 f 02322410                                   T                          T                         T




Canonicity: all nodes have 0–value on the 0–arc                         (only the first EVBDD is canonical)



In canonical form, the root incoming edge has value f (0 · · · 0)
                                          EV+MDDs by example
35




[Ciardo and Siminiceanu 2002] defined edge–valued positive multiway decision diagrams

From BDD to MDD: the usual extension
We allow ∞–edge values: can store partial arithmetic functions
New canonization rule: essential to encode partial arithmetic functions

                                  0                                                       0
i3 0 0 0 0 1 1 1 1              0 1              i3 0 0 0 0 1 1 1 1                     0 1
                            0         0                                             0         0
i2 0 0 1 1 0 0 1 1     0 1            0 1        i2 0 0 1 1 0 0 1 1             0 1           0 1
                       0        2 2        0                                0       3         4       0
i1 0 1 0 1 0 1 0 1     0 1            0 1        i1 0 1 0 1 0 1 0 1       0 1   0                 1   0 1
                           0 2       1 0                                    0 2 0            0 1 0
 f 02322410                      T                f 0 2 3 ∞∞ 4 1 0                       T

Canonicity: all edge values are non–negative and at least one is zero

In canonical form, the root incoming edge has value mini∈S f (i)
                                                         b


                     f (1, 0, 0) = ∞ but f (1, 0, 1) = 4
     the traditional EVMDD normalization cannot represent this function
                                    Definition of EV+MDDs
36




(differences from EVBDDs of [Lai et al. 1992] are in blue)


Given f : S   → Z ∪ {∞}, an EV+MDD encoding f is a DAG with labelled edges such that:

 • There is a single terminal node 0|T

 • There is a single root K|r with a “dangling” incoming edge having value ρ ∈ Z ∪{∞}

 • Non-terminal node k|p has nk edges, k|p [ik ].child has value k|p [ik ].val ∈ N ∪ {∞}

 • If k|p [ik ].val = ∞, the value of k|p [ik ].child is irrelevant

 • If k|p [ik ].val ∈ N, k|p [ik ].child is the index of a node at level k − 1

 • Each non-terminal node has at least one outgoing edge labelled with 0 (if not all ∞)

 • All nodes are unique (taking into account both k|p [ik ].child and k|p [ik ].val)


          analogous definition uses real values instead of integers

                         Theorem: EV+MDDs are canonical
              Using an EV+MDD to store the indexing function ψ
37




The “MDD with offsets” used to store and evaluate ψ can be formalized as an EV+MDD


                                                                                   0

       19 0 1 2 3                                                           0 1 2 3
          0 1 6 11                                                    0        16            11

     1 2 5 0 1 2 8 0 1 2                                     0 1 2            0 1 2           0 1 2
       0   0 2 4   0 1 2                                               0 0 2           4 0    1    2
                                        is equivalent to
      2 0 1      1 1 6 0 1                                            0 1      0 1           0 1
        0 1        0   0 3                                             01      0             0 3

           1 0      3 0 1 2                                           0 1 2             0 1 2
             0        0 1 2                                               0            01 2

                                                                               T


                                              ∈S
                           lexicographic order for i
                               ψ(i) = ∞ ⇔ i ∈ S
                                      EV∗MDDs by example
38




One way to think about EV∗MDDs is        “EV+MDD = − log(EV∗MDD)”:
                                      0 ⇔ 1
                edge values ∈ [0, +∞] ⇔ edge values ∈ [0, 1]
     root incoming edge ∈ (−∞, +∞] ⇔ root incoming edge ∈ [0, +∞)
              values add along the path ⇔ values multiply along the path


                                                                                7
              i3    0    0    0    0 1     1    1    1                        0 1
                                                                            1         1
              i2    0    0    1    1 0     0    1    1                  0 1           0 1
                                                                    1       0.3       0.4 1
              i1    0    1    0    1 0     1    0    1            0 1   0                 1   0 1
                                                                   0.5 1 1            1 1 0.2
              f    3.5 7 2.1 0 0 2.8 7 1.4                                        T


Canonicity: all edge values are in [0, 1] and at least one is 1


In canonical form, the root incoming edge has value maxi∈S f (i)
                                                         b
      Encoding R with an EV∗MDD: initial non-canonical EV∗MDDs
39




We can store R with a 2K –level EVMDD: consider the example of Kronecker encoding
R=      α∈{a,l,d,e}   Rα =        α∈{a,l,d,e}         4≥k≥1   Rk,α


          Ra    0 1         Rl           0 1            Rd       0 1                   Re    0 1

                0 1                0 1        0 1              0 1       0 1                 0 1
                  µa4                                                                       µe
                                                                                             4
               0 1 2                0 1 2                       0 1 2                       0 1 2

               0 1 2             0 1 2        0 1 2    0 1 2    0 1 2          0 1 2        0 1 2
                  µa
                   3                µc
                                     3         µb                                       µe
                                                                                         3
                                                3
                0 1                      0 1                     0 1                         0 1

                0 1                0 1        0 1                0 1                    0 1          0 1
                  µa2                                              µd
                                                                    2
                0 1                      0 1                     0 1                         0 1

            0 1       0 1          0 1        0 1                0 1                         0 1
                                                                    µd
                                                                     1                      µe
                                                                                             1
                  T                       T                          T                           T



                            note the shaded identity patterns!!!
              Encoding R with an EV∗MDD: canonical EV∗MDDs
40




(assume that µb is the largest rate in R)
              3

                µa µa µa
                 4 3 2                    µb
                                           3                     µd µd
                                                                  2 1                       µe µe µe
                                                                                             4 3 1

           Ra    0 1         Rl         0 1             Rd       0 1                   Re    0 1

                 0 1                0 1        0 1             0 1       0 1                 0 1


                0 1 2                  0 1 2                    0 1 2                       0 1 2
                              µc /µb
                               3 3
                0 1 2             0 1 2        0 1 2   0 1 2    0 1 2          0 1 2        0 1 2


                 0 1                    0 1                      0 1                         0 1

                 0 1                0 1        0 1               0 1                    0 1        0 1


                 0 1                    0 1                      0 1                         0 1

             0 1       0 1          0 1        0 1               0 1                         0 1

                   T                      T                          T                         T
                  Encoding R with an EV∗MDD: final EV∗MDD
41




Use a recursive algorithm to compute R      =      α∈{a,l,d,e}         Rα

                                                            µb
                                                             3
                                                  R      0 1

                                                0 1                    0 1
                             µa µa µa /µb
                              4 3 2 3                                           µe µe µe /µb
                                                                                 4 3 1 3

                            0 1 2                       0 1 2                         0 1 2
                                      µdµd/µb
                                       2 1 3          µc /µb
                                                       3 3
                    0 1 2     0 1 2                     0 1 2          0 1 2            0 1 2
                                                 µdµd/µb
                                                  2 1 3                     µdµd/µb
                                                                             2 1 3
                             0 1                      0 1       0 1                    0 1

                                       0 1            0 1        0 1           0 1      0 1

                                                0 1      0 1          0 1

                                                      0 1        0 1

                                                            T



                         hidden identity patterns remain!!!
42
                                    Empirical comparison

Memory consumption in bytes for:
S (MDD), R (Sparse), R (Kronecker), R and R (Pot/Act MXD), R and R (Pot/Act MTMDD)

 Model     N       |S|      |S| MDD         Sparse Kron        Pot     Act      Pot       Act
                                                              MXD     MXD    MTMDD    MTMDD
       2
     qn4        324         324     333      14256     772     586     722    22784     22784
       6      38416       38416     499    2524480    3092    2494    2870    36864     36864
       10    527076      527076     905   38524464    7076    5778    6522    62720     62720
 qn8 2         6561         324     681      14256    1204     738    1688    43776     49152
       6    5764801       38416    1119    2524480    2404    1674    5872    55040     70912
       10 214358881      527076    1953   38524464    3604    2610   12040    66304     98560
mserv2 3       1485         495     705      23352    4124    3246    3952    34560     40704
       6       6345        2115    3176     111408   17468   13998   16432   111104    135168
       10     18495        6165    8846     342720   52228   42278   49032   306560    378460
mserv4 3      14256         495    1174      23352    5568    4098    4916    68864     79616
       6     106596        2115    8453     111408   22920   17502   20054   254360    298856
       10    488268        6165   33739     342720   67560   52342   58934   873896    998552
mserv6 3      32076         495    1333      23352    5724    4066    5316    86784    101376
       6     239841        2115    8614     111408   23076   17470   20238   298596    347956
       10   1098603        6165   33900     342720   67716   52310   59118   982396   1112684
 Model     N        |S|     |S| MDD          Sparse Kron        Pot      Act      Pot       Act
                                                               MXD      MXD    MTMDD    MTMDD
molloy4    5       4536      91     660        4204    1316    1148     2534    23552     28160
           8      32805     285    1215       14676    2528    2300     5216    27648     38656
           10     87846     506    1766       27104    3556    3288     7504    31232     47360
molloy5    5       7776      91     846        4204    1100     792     4298    28416     37120
           8      59049     285    1545       14676    1592    1188     9356    31232     50944
           10    161051     506    2223       27104    1920    1452    13778    33280     61952
kanban3    1        160     160     264        8032     500     412      544    18432     18432
           3      58400   58400     937     5590400    7572    6786     8134    66816     67072
           5    2546432 2546432    5646   303705920   45660   41816    48780   303776    303776
kanban4    1        256     160     332        8032     420     354      602    23552     24576
           3     160000   58400     628     5590400    2500    2216     3284    44032     50176
           5    9834496 2546432    1532   303705920    7940    7118     9950    92928    110592
kanban16   1      65536     160    1275        8032    2148     866     3000    95232    107520
           3    Overflow   58400    1902     5590400    3236    1746    10566   115456    151808
           5    Overflow 2546432    3149   303705920    4324    2626    24106   135168    216320
  fms5     1       2100      84     535        3228    1456     604     1808    36096     40960
           3    9432500   20600    3294     1554080    8304    5224    24320   151296    247040
           5 2016379008 852012    30490    82727748   34484   24664   138244   654892   1255108
 fms21     1    4194304      84    2050        3228    3132    1132     7396   126976    148224
           3    Overflow   20600    6777     1554080    5028    2328    68762   176896    437760
           5    Overflow 852012    22038    82727748    6924    3524   255988   235008   1393932
44
                                          Conclusions

MTMDDs work best when many nonzero entries in R have the same value
Extensive presence of state–dependent rates can make MTMDDs inefficient
Kronecker, MXDs, and EV∗MDD remain instead efficient if the model is Kronecker–consistent

Size of Kronecker is not affected by variable ordering
MTMDDs, MXDs, EV∗MDDs, and MDDs are instead affected by variable ordering

Kronecker and MXDs are restricted to contiguous “(ik , jk )” or “(jk , ik )” variable ordering
MTMDDs and EV∗MDDs can instead have any ordering (but is this generality useful in practice?)

Kronecker is quite efficient but more restrictive
MXDs and EV∗MDDs are fully general and their size is similar to Kronecker, if it exists
MTMDDs are also fully general, their size tends to be larger than Kronecker, MXDs, and EV∗MDDs


MTMDDs, MXDs, and EV∗MDDs can encode R instead of R, but memory requirements increase
Kronecker must rely on an external (MDD) representation of S to zero unreachable rows

Fundamental advantage of Kronecker, MXDs over MTMDDs:
they exploit the presence of numerous identity matrices in the description of R (also
true when encoding just N , e.g., in symbolic model-checking)
As presented, EV∗MDDs do not exploit identities, but we are working on that
Solution algorithms
46
                  Potential Kronecker: the shuffle algorithm PSh

First algorithm to be proposed for Kronecker solution [Plateau SIGMETRICS 1985]

PSh computes y ← x · ⊗K≥k≥1 Ak

PSh + computes y ← x · InK ···nk+1 ⊗ Ak ⊗ Ink−1 ···n1

Based on the equality [Davio 1981]

               Ak =             ST K ···nk+1 ,nk ···n1 ) · (I|S|/nk ⊗ Ak ) · S(nK ···nk+1 ,nk ···n1 )
                                 (n                           b
       K≥k≥1           K≥k≥1

where S(a,b)   ∈ {0, 1}a·b×a·b is an (a, b)-perfect shuffle permutation:

                                             1 if j = (i mod a) · b + (i div a)
                       S(a,b) [i, j] =
                                             0 otherwise

Requires
 • K vector permutations and
 • K multiplications x · (I|S|/nk ⊗ Ak ).
                            b


Complexity of the k -th multiplication:   O(|S|/nk · η[Ak ]).
47
                   Potential Kronecker: the shuffle algorithm PSh

                                               b b
 PSh (in: nK , ..., n1 , AK , ..., A1 ; inout: x, y);
  1.     nleft ← 1;
  2.     nright ← nK−1 · · · n1 ;
  3.     for k = K down to 1
  4.         base ← 0;
  5.         jump ← nk · nright ;
  6.         if Ak = I then
  7.             for block = 0 to nleft − 1
  8.                  for offset = 0 to nright − 1
  9.                      index ← base + offset ;
 10.                      for h = 0 to nk − 1
 11.                          zh ← xindex ;
                                      b
 12.                          index ← index + nright ;
 13.                      z ← z · Ak ;
 14.                      index ← base + offset ;
 15.                      for h = 0 to nk − 1
 16.                          yindex ← zh ;
                              b
 17.                          index ← index + nright ;
 18.                  base ← base + jump ;
 19.             x ← y;
                 b        b
 20.         nleft ← nleft · nk ;
 21.         nright ← nright /nk−1 ;                          Let n0 be 1
48
                                 Example of shuffle computation

y ← x · (A ⊗ B)                                  Follow the entries marked with a diamond to obtain y2

                   x                 A   B                x

                                 a 00B   a 01B                     T
                                                                  S2,3
                                                                             I3       A
                                                                         A        0       0
                                 a 10B   a 11B            u
                                                                         0        A       0

                            y                                            0        0       A


y ← x · ST ·(I3 ⊗ A) ·S2,3 ·ST · (I2 ⊗ B) · S6,1
         2,3                 6,1
                                                                   v
    | {z }
                                                                                              S2,3
          u                                                                                              I2   B
      |       {z        }                                          w
              v
      |            {z           }                                                                    B            0
                   w
      |                         {z                  }
                                y
y2 ←                                                                                                 0        B
B02 w0 +B12 w1 +B22 w2 =
B02 v0 +B12 v2 +B22 v4 =
                                                                                                y
B02 (u0 A00 +u1 A10 )+B12 (u2 A00 +u3 A10 )+B22 (u4 A00 +u5 A10 ) =
B02 (x0 A00 +x3 A10 )+B12 (x1 A00 +x4 A10 )+B22 (x2 A00 +x5 A10 ) =
A00 B02 x0+A00 B12 x1+A00 B22 x2+A10 B02 x3+A10 B12 x4+A10 B22 x5
49                                                                         +
                              Complexity of PSh and PSh
                                                   

PSh has complexity O              |S|/nk · η[Ak ] = O |S| · K · α
                          K≥k≥1


Even when S   = S , PSh is faster than Ordinary explicit multiplication only if
                                                                      1
                                                    K
                            |S| · K · α < |S| · α         ⇔ α>K      K−1




PSh + has complexity O |S|/nk · η[Ak ] = O |S| · α

Complexity of computing y   ← y+x·          K≥k≥1       Ak :
                                                                  
                                                                η[Ak ] 
       O             |S|/nk · η[Ak ] = O |S|                          = O |S| · K · α
                                                                  nk
              K≥k≥1                                     K≥k≥1




                  Ordinary is faster than PSh if α ≤ 1
              PSh + saves space, but not time, w.r.t. Ordinary
                                                                           PRw and PRw +
50
                                    Potential Kronecker:

                                                       b
 PRwEl (in: i, x, nK , ..., n1 , AK , ..., A1 ; inout: y)
     1.         for each jK s.t. AK [iK , jK ] > 0
     2.              jK ← jK ; aK ← AK [iK , jK ];
     3.              for each jK−1 s.t. AK−1 [iK−1 , jK−1 ] > 0
     4.                   jK−1 ← jK · nK−1 + jK−1 ; aK−1 ←                  aK · AK−1 [iK−1 , jK−1 ];
          ...
     5.                  for each j1 s.t. A1 [i1 , j1 ] >   0
     6.                       j1 ← j2 · n1 + j1 ; a1        ← a2 · A1 [i1 , j1 ];
     7.                       yj ← yj + x · a1 ;
                              b 1
                                       b  1

          b                                       b
 PRw (in: x, nK , ..., n1 , AK , ..., A1 ; inout: y)
 1.                    b
       for i = 0 to |S| − 1
 2.                      b                              b
            PRwEl (i, xi , nK , ..., n1 , AK , .., A1 , y);
 PRwEl + (in: nk , nk−1 · · · n1 , i− , ik , i+ , x, Ak ; inout: y)
                                    k         k                  b
     1.                                      >0
                for each jk s.t. Ak [ik , jk ]
     2.               j ← (i− ·
                            k       nk + jk ) · nk−1 · · · n1 + i+ ;
                                                                 k
     3.               yj ← yj
                      b     b       + x · Ak [ik , jk ];

 PRw + (in: x, nK · · · nk+1 , nk , nk−1 · · · n1 , Ak ; inout: y)
            b                                                   b
     1.         for   i ≡ (i− , ik , i+ ) = 0 to nK · · · nk+1 · nk · nk−1 · · · n1 − 1
                            k         k
     2.               PRwEl + (nk , nk−1 · · · n1 , i− , ik , i+ , xi , Ak , y);
                                                     k         k b           b
51                                                                       +
                            Complexity of PRw and PRw

PRw computes y ← y + x · A, according to the definition of Kronecker product
Requires sparse row-wise format for each Ak

PRwEl computes the contribution of xi to each entry of y as
                                     y ← y + xi · Ai,S
                                                     b

PRwEl reaches statement      ak ← ak−1 · Ak [ik , jk ]   O(αk ) times.

PRw makes |S| calls to PRwEl , hence has complexity
                 
  |S| ·         k       O |S| · K = O(K · η[A])             if α   ≤1
O              α      =
         K≥k≥1            O(|S| · αK ) = O(η[A])              if α   >1


      +                          η[Ak ]
PRw       has complexity O |S| ·          = O |S| · α
                                   nk

Complexity of computing y   ← y+x·        K≥k≥1 Ak using PRw + : O |S| · K · α

     PRw amortizes the multiplications for aK−1 ,...,a2 only if α                1
         PRw + saves space, but not time, w.r.t. Ordinary
                                                               PRwCl and PRwCl +
52
                       Potential Kronecker:

            b                                       b
 PRwCl (in: x, nK , ..., n1 , AK , ..., A1 ; inout: y)
  1.       for iK = 0 to nK − 1
  2.            for each jK s.t. AK [iK , jK ] > 0
  3.                 jK ← jK ; aK ← AK [iK , jK ];
  4.                 for iK−1 = 0 to nK−1 − 1
  5.                      for each jK−1 s.t. AK−1 [iK−1 , jK−1 ] > 0
  6. . . .                     jK−1 ← jK · nK−1 + jK−1 ; aK−1 ← aK · AK−1 [iK−1 , jK−1 ];
  7.                           for i1 = 0 to n1 − 1
  8.                                for each j1 s.t. A1 [i1 , j1 ] > 0
  9.                                     j1 ← j2 · n1 + j1 ; a1 ← a2 · A1 [i1 , j1 ];
 10.
                                     1
                                         yj ← yj + xi · a1 ;
                                         b     1
                                                  b       b

The overall complexity is O       |S| · αK
 PRwCl + (in: x, nK · · · nk+1 , nk , nk−1 · · · n1 , Ak ; inout: y)
              b                                                   b
              −
     1.   for ik = 0 to nK · · · nk+1 − 1
     2.        for ik = 0 to nk − 1
     3.             for each jk s.t. Ak [ik , jk ] > 0
     4.                  jk ← i− · nk + jk ;
                                 k
                              +
     5.                  for ik = 0 to nk−1 · · · n1 − 1
     6.                   jK ← jk · nk−1 · · · n1 + i+ ;
                                                     k
     7.                   yjK ← yjK + x(i− ,i ,i+ ) · Ak [ik , jk ];
                          b     b      b
                                                   k   k   k
53
                                           Potential Kronecker:                                  PRwCl
                                                                             a0         a1                      b0      b1                c0   c1
                                                x·A=x·                                                ⊗                            ⊗
                                                                             a2         a3                      b2      b3                c2   c3
PRw                 1
              a0 b0 c0
                         2
                         c1
                                 3
                              b1 c0
                                      4
                                      c1
                                                 5
                                           a1 b0 c0
                                                      6
                                                      c1
                                                              7
                                                           b1 c0
                                                                   8
                                                                   c1
                                                                        a0
                                                                             b0             b1
                                                                                                           a1
                                                                                                                b0           b1
                                                                                                                                               PRwCl
                                                                                  1    2         5    6              17 18        21 22
                    9    10    11 12             13 14    15 16                   c0   c1        c0   c1             c0 c1        c0 c1
              a0 b0 c2   c3 b1 c2 c3       a1 b0 c2 c3 b1 c2 c3                   3    4         7    8              19 20        23 24
                                                                                  c2   c3        c2   c3             c2 c3        c2 c3
                    17 18    19 20               21 22    23 24
              a0 b2 c0 c1 b3 c0 c1         a1 b2 c0 c1 b3 c0 c1              b2             b3                  b2           b3
                                                                                  9 10           13 14               25 26        29 30
                                                                                  c0 c1          c0 c1               c0 c1        c0 c1
                    25 26    27 28               29 30    31 32                   11 12          15 16               27 28        31 32
              a0 b2 c2 c3 b3 c2 c3         a1 b2 c2 c3 b3 c2 c3                   c2 c3          c2 c3               c2 c3        c2 c3

                    33 34    35 36               37 38    39 40         a2                                 a3
              a2 b0 c0 c1 b1 c0 c1         a3 b0 c0 c1 b1 c0 c1              b0             b1                  b0           b1
                                                                                  33 34          37 38               49 50        53 54
                                                                                  c0 c1          c0 c1               c0 c1        c0 c1
                    41 42    43 44               45 46    47 48
              a2 b0 c2 c3 b1 c2 c3         a3 b0 c2 c3 b1 c2 c3                   35 36          39 40               51 52        55 56
                                                                                  c2 c3          c2 c3               c2 c3        c2 c3
                    49 50    51 52               53 54    55 56              b2             b3                  b2           b3
              a2 b2 c0 c1 b3 c0 c1         a3 b2 c0 c1 b3 c0 c1                   41 42          45 46               57 58        61 62
                                                                                  c0 c1          c0 c1               c0 c1        c0 c1
                    57 58    59 60               61 62    63 64                   43 44          47 48               59 60        63 64
              a2 b2 c2 c3 b3 c2 c3         a3 b2 c2 c3 b3 c2 c3                   c2 c3          c2 c3               c2 c3        c2 c3



Each “b” and “c” box corresponds to one multiplication: 8 × 8 = 64 entries of the form ai bj cl
• Computing each entry from scratch: 64 × 2 = 128 multiplications
• Using PRw : 64 + 32 = 96 multiplications
• Using PRwCl : 64 + 16 = 80 multiplications: interleaving helps!

       same complexity as Ordinary regardless the sparsity level
           but the entries of A are not generated in column order
54
                                                Actual Kronecker:     ARw

 ARw (in: x, AK , ..., A1 , S ; inout: y)
     1.         for each i  ∈S
     2.             I ← ψ(i);
     3.             for each jK s.t. AK [iK , jK ] > 0
     4.                  aK ← AK [iK , jK ];
     5.                  for each jK−1 s.t. AK−1 [iK−1 , jK−1 ] > 0
     6.                       aK−1 ← aK · AK−1 [iK−1 , jK−1 ];
          ...
  7.                                                  >0
                             for each j1 s.t. A1 [i1 , j1 ]
  8.                             a1 ← a2 · A1 [i1 , j1 ];
  9.                             J ← ψ(j);
 10.                             yJ ← yJ + xI · a1 ;



                          = ψ(j) of state j in the array y.
Statement 9 computes the index J
                           
                k  K
                                     O (|S|·(K +log |S|)) if α ≤ 1
O |S|·       α +α · log |S| =
         K≥k≥1                       O |S|·αK ·log |S|      if α > 1




          if K     < log |S|: ARw has a log |S| overhead w.r.t. Ordinary
                                                        ARwCl and ARwCl +
55
                           Actual Kronecker:

 ARwCl (in: x, AK , ..., A1 , S ; inout: y)
  1.         for each iK  ∈ SK                                                            all local states iK
  2.            IK ← ψK (iK );
  3.            for each jK s.t. AK [iK , jK ] > 0
  4.                 JK ← ψK (jK );
  5.                 if JK = null then
  6.                     aK ← AK [iK , jK ];
  7.                     for each iK−1 ∈ SK−1 (iK )                           all iK−1 compatible with iK
  8.                          IK−1 ← ψK−1 (IK , iK−1 );
  9.                          for each jK−1 s.t. AK−1 [iK−1 , jK−1 ] > 0
 10.                               JK−1 ← ψK−1 (JK , jK−1 );
 11.                               if JK−1 = null then
 12.                                   aK−1 ← aK · AK−1 [iK−1 , jK−1 ];
       ...
 13.                              for each i1  ∈ S1 (iK , ..., i2 )        all i1 compatible with iK , ..., i2
 14.                                 I1 ← ψ1 (I2 , i1 );
 15.                                 for each j1 s.t. A1 [i1 , j1 ] > 0
 16.                                      J1 ← ψ1 (J2 , j1 );
 17.                                      if J1 = null then
 18.                                           a1 ← a2 · A1 [i1 , j1 ];
 19.                                           yJ1 ← yJ1 + xI1 · a1 ;



                 note the need for EV+MDDs to index the state space
56
                                     Actual Kronecker:            ARwCl
                                                                      

Complexity of ARwCl :         O           |S1 | · · · |Sk | · αk · log nk  = O(|S| · αK · log nK )
                                   K≥k≥1


(assuming that |S1 | · · · |SK−1 |     |S|)



                          +
Complexity of ARwCl :         O(|S| · α · log nK )

regardless of k , and the resulting complexity of computing
                                                                  

                                    y ← y+x·                   Ak 
                                                        K≥k≥1
                                                                       S,S

               +
using ARwCl        is   O (K · |S| · α · log nK )




       only log nK overhead w.r.t. Ordinary for any sparsity level
                but it cannot be used in a Gauss-Seidel iteration
57
                           Results for the numerical solution



Matrix diagrams achieve the same efficiency as multiplication by blocks (ARwCl ). . .

. . . but can provide access by columns as required by Gauss–Seidel




Time requirements for the Kanban model (450MHz Pentium workstation with 384 Mbytes RAM)

                       number              MXDs                  Kronecker                  Explicit
     N      |S|        of arcs     Gauss–Seidel          Gauss–Seidel        Jacobi       Gauss–Seidel
                        in R       Iters     sec/iter Iters    sec/iter Iters sec/iter Iters    sec/iter
     2       4,600       28,120      40           0.11    55     0.17   134       0.09     55      0.02
     3      58,400      446,400      67           1.46    97     2.56   240       1.34     97      0.34
     4     454,475     3,979,850     99       12.33      149    23.69   370      11.99    149      3.04
     5    2,546,432   24,460,016   139        73.09      214   147.70   527      74.09    214    18.51
     6   11,261,376 115,708,992    185       336.21      289   723.30   713 359.15         —           —
     7   41,644,800 450,455,040    238 1,289.91          374 2,922.80     —           —    —           —

						
Related docs