# Modeling and Analysis of Markov Chains Using Decision Diagrams


Gianfranco Ciardo
Department of Computer Science and Engineering
University of California at Riverside
Riverside, CA 92521, USA
ciardo@cs.ucr.edu

(partially based on work with Andrew S. Miner and Paul L. E. Grieco)
Outline
• Background on CTMCs
◦ Continuous–time Markov chains and high–level CTMC models
◦ Matrix–vector description and operations for CTMC solution
◦ Sparse matrices
• Structured models
◦ High–level CTMC models structured into submodels
◦ State space and state indexing: potential vs. actual indices
◦ Multiway decision diagrams (MDDs)
• Decision–diagram–based CTMC encodings
◦ Multiterminal multiway decision diagrams (MTMDDs)
◦ Kronecker descriptors
◦ Matrix diagrams (MXDs)
◦ Edge–valued multiway decision diagrams (EVMDDs)
• Solution algorithms
◦ Vector–matrix multiplication with Kronecker–based encodings
◦ Jacobi vs. Gauss–Seidel style iterations
Background on CTMCs
Discrete-state models

A discrete-state model is fully specified by:

• a potential state space Ŝ                                                     (the "type" of the states)
• a set of initial states S^init ⊆ Ŝ                           (often there is a single initial state s^init)
• a next-state function N : Ŝ → 2^Ŝ, naturally extended to sets:   N(X) = ∪_{i∈X} N(i)

The state space S of the model is the smallest set containing S^init and satisfying:
• the recursive definition:                                              i ∈ S ∧ j ∈ N(i) ⇒ j ∈ S
• or the fixed-point equation:                                                         X = X ∪ N(X)

S = S^init ∪ N(S^init) ∪ N²(S^init) ∪ N³(S^init) ∪ ··· = N*(S^init)
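The fixed-point characterization of S can be sketched as a worklist exploration; the following is an illustrative Python fragment (the modulo-4 counter model is invented for the example):

```python
def reachable(init_states, next_states):
    """Least fixed point S = S_init ∪ N(S) via worklist exploration.

    init_states: iterable of hashable states (S_init)
    next_states: function mapping a state to an iterable of successors (N)
    """
    explored = set()
    frontier = list(init_states)
    while frontier:
        i = frontier.pop()
        if i not in explored:
            explored.add(i)
            frontier.extend(next_states(i))
    return explored

# Toy model: a counter modulo 4 that can either increment or reset to 0
N = lambda i: {(i + 1) % 4, 0}
print(sorted(reachable({0}, N)))  # → [0, 1, 2, 3]
```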
Deﬁnition of continuous-time Markov chain

A stochastic process {X(t)    : t ≥ 0} is a collection of r.v.’s indexed by a time parameter t

We say that X(t) is the state of the process at time t

The set of values X(t) can assume, over all t, is (a subset of) the state space S

{X(t) : t ≥ 0} over a discrete S is a continuous-time Markov chain (CTMC) if

    Pr{X(t_{n+1}) = i_{n+1} | X(t_n) = i_n ∧ X(t_{n−1}) = i_{n−1} ∧ ··· ∧ X(t_0) = i_0}
        = Pr{X(t_{n+1}) = i_{n+1} | X(t_n) = i_n}

for any times 0 ≤ t_0 ≤ ··· ≤ t_{n−1} ≤ t_n ≤ t_{n+1} and states {i_0, ..., i_{n−1}, i_n, i_{n+1}} ⊆ S

Markov property:
“given the present state, the future is independent of the past”
“the most recent knowledge about the state is all we need”
Markov chain description and analysis

A continuous-time Markov chain (CTMC) {X(t) : t ≥ 0} with state space S is described by
• its infinitesimal generator:               Q = R − diag(R · 1) = R − diag(h)⁻¹ ∈ R^(|S|×|S|)
• its initial probability vector:                                                           π(0) ∈ R^|S|

where
• R is the transition rate matrix:                 R[i, j] is the rate of going to state j when in state i
• h is the expected holding time vector:                                      h[i] = 1 / Σ_{j∈S} R[i, j]
• π(0)[i] = Pr{chain is in state i at time 0, i.e., initially}

Transient probability vector π(t) ∈ R^|S|:                                   π(t)[i] = Pr{X(t) = i}
• π(t) is the solution of dπ(t)/dt = π(t) · Q with initial condition π(0)

Steady-state probability vector π ∈ R^|S|:                           π[i] = lim_{t→∞} Pr{X(t) = i}
• π is the solution of π · Q = 0 subject to Σ_{i∈S} π[i] = 1              (the CTMC must be ergodic)
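As an illustration of the steady-state equation, a small sketch that solves π · Q = 0 by uniformization and power iteration (pure Python, dense matrices; the two-state chain and its rates are invented for the example):

```python
def steady_state(R, iters=2000):
    """Steady-state distribution of an ergodic CTMC given its rate matrix R.

    Uniformization: P = I + Q/Λ with Λ > max_i Σ_j R[i,j] turns the CTMC
    into a DTMC with the same stationary vector; we then power-iterate π·P.
    """
    n = len(R)
    lam = max(sum(row) for row in R) + 1.0          # uniformization constant Λ
    # P[i][j] = R[i][j]/Λ off-diagonal, plus 1 − Σ_j R[i][j]/Λ on the diagonal
    P = [[R[i][j] / lam + (1.0 - sum(R[i]) / lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Two-state chain: rate 1.0 from state 0 to 1, rate 2.0 back
pi = steady_state([[0.0, 1.0], [2.0, 0.0]])
print(pi)  # ≈ [2/3, 1/3]
```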
Storing a CTMC explicitly

We need to store:
• R       a real matrix of size |S| × |S|
• h       a real vector of size |S|

We focus on R, for which we can employ sparse storage:
• Requires memory proportional to η(R), the number of nonzeros in R, instead of |S|2
• Allows iterative numerical methods to run in time proportional to η(R), instead of |S|2

[Figure: a five-state CTMC with transition rates α, β, γ, δ, ε, shown as a graph, as a 5×5 rate matrix, and in sparse row-wise storage where each row lists only its (column, value) pairs.]
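The sparse row-wise storage just described can be sketched as follows, together with the vector–matrix product whose cost is proportional to the number of nonzeros η(R) (illustrative pure-Python code; the 3×3 matrix is invented):

```python
def to_sparse(R):
    """Row-wise sparse storage: for each row, a list of (column, value) pairs."""
    return [[(j, v) for j, v in enumerate(row) if v != 0.0] for row in R]

def vec_mat(x, sparse_rows, n):
    """y = x · R, touching only the η(R) stored nonzeros."""
    y = [0.0] * n
    for i, row in enumerate(sparse_rows):
        for j, v in row:
            y[j] += x[i] * v
    return y

R = [[0.0, 3.0, 0.0],
     [0.0, 0.0, 2.0],
     [1.0, 0.0, 0.0]]
print(vec_mat([1.0, 1.0, 1.0], to_sparse(R), 3))  # → [1.0, 3.0, 2.0]
```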
Exploiting common entries with explicit storage

We can map the real values in R to (small) integer indices:
• Causes a small runtime overhead
• Reduces memory by a constant factor if R contains several entries with the same value
• If all nonzero entries have the same value (e.g., 1), this is essentially the reachability graph

[Figure: the same five-state CTMC, stored with an integer index in place of each rate; a separate table maps the indices to the distinct values: 0 ↦ α, 1 ↦ β, 2 ↦ γ, 3 ↦ δ, 4 ↦ ε.]
Structured models
Structured discrete-state models

A structured discrete-state model is specified by

• a potential state space Ŝ = S_K × ··· × S_1 = ×_{K≥k≥1} S_k
  ◦ the "type" of the (global) state
  ◦ S_k is the (discrete) local state space for submodel k
• a set of initial states S^init ⊆ Ŝ
  ◦ often there is a single initial state s^init
• a set of events E defining a disjunctively-partitioned next-state function
  ◦ N_α : Ŝ → 2^Ŝ:    j ∈ N_α(i) iff state j can be reached by firing event α in state i
  ◦ N : Ŝ → 2^Ŝ is defined by N(i) = ∪_{α∈E} N_α(i)
  ◦ we can extend N to take sets of states as argument: N(X) = ∪_{i∈X} N(i)
  ◦ α is enabled in i iff N_α(i) ≠ ∅, otherwise it is disabled
  ◦ i is absorbing, or a trap, or dead iff N(i) = ∅
Petri nets and their state space S: finite case

[Figure: a Petri net with places p, q, r, s, t and transitions a, b, c, d, e, shown together with fragments of its reachability graph: firing a transition (e.g., "a fires", "e fires") moves tokens and produces a new marking.]

If the initial state is s^init = (N, 0, 0, 0, 0), S contains (N + 1)(N + 2)(2N + 3)/6 states
State indexing

Let the size of the k-th local state space be n_k = |S_k| and map S_k to {0, 1, ..., n_k − 1}

Ŝ contains |Ŝ| = n_K · n_{K−1} ··· n_1 states, but not all of them are actually reachable

A potential indexing of a (global) state i = (i_K, ..., i_1) is simply the mixed-base value of i:

    ψ̂(i) = Σ_{K≥k≥1} ( i_k · Π_{k>l≥1} n_l )

An actual indexing for the reachable states is instead harder to define, store, and compute:

    ψ : Ŝ → {0, 1, ..., |S| − 1} ∪ {null}

in reality, we often have no a priori knowledge of S_k
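The mixed-base potential index ψ̂ can be computed with a Horner-style loop; a minimal sketch (the local sizes are invented for the example):

```python
def psi_hat(i, n):
    """Potential (mixed-base) index of global state i = (i_K, ..., i_1).

    Both i and the local sizes n are given most-significant-first,
    matching ψ̂(i) = Σ_k i_k · Π_{k>l} n_l.
    """
    idx = 0
    for digit, base in zip(i, n):
        idx = idx * base + digit
    return idx

# Local state spaces of sizes n_3 = 2, n_2 = 3, n_1 = 2, so |Ŝ| = 12
print(psi_hat((1, 2, 1), (2, 3, 2)))  # → 11, the last of the 12 potential states
print(psi_hat((0, 0, 0), (2, 3, 2)))  # → 0
```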
Explicit generation of S and R

Explore(in: S^init, N; out: S, R, ψ) is
1.   n ← 0;                                                                            state indices start at 0
2.   S ← ∅;                                                              S contains the states explored so far
3.   U ← S^init;                                             U contains the unexplored states known so far
4.   for each i ∈ S^init do
5.        ψ(i) ← n++;                                    assign to i the next available index and increment n
6.   end for;
7.   while U ≠ ∅ do
8.       choose a state i in U and move it from U to S;
9.       for each event α ∈ E and each state j ∈ N_α(i) do
10.            if j ∉ S ∪ U then                                 search to determine whether j is a new state
11.                 ψ(j) ← n++;                           assign to j the next available index and increment n
12.                 U ← U ∪ {j};                                                    remember to explore j later
13.            end if;
14.            R[ψ(i), ψ(j)] ← R[ψ(i), ψ(j)] + λ_α(i) · ∆_α(i, j);                            ψ is used to index R
15.       end for;
16.   end while;

ψ : Ŝ → {0, ..., |S|−1} ∪ {null} is a state indexing function                              (e.g., discovery order)
λ_α(i) is the rate at which event α fires in state i
∆_α(i, j) is the probability that, if event α fires in state i, the next state is j
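A runnable sketch of Explore, under some simplifying assumptions: states are hashable values, each event is a function returning (next state, rate) pairs with the rate already equal to λ_α(i)·∆_α(i, j), and R is kept as a dictionary keyed by index pairs (the cycling-token model is invented):

```python
def explore(init_states, events):
    """Explicit generation of S, R, and the indexing function ψ.

    events: list of functions, each mapping a state i to an iterable of
            (j, rate) pairs, where rate = λ_α(i)·∆_α(i, j).
    """
    psi = {}                       # state -> index, in discovery order
    R = {}                         # (ψ(i), ψ(j)) -> accumulated rate
    unexplored = []
    for i in init_states:
        psi[i] = len(psi)
        unexplored.append(i)
    while unexplored:
        i = unexplored.pop()
        for fire in events:
            for j, rate in fire(i):
                if j not in psi:               # is j a new state?
                    psi[j] = len(psi)
                    unexplored.append(j)       # remember to explore j later
                key = (psi[i], psi[j])
                R[key] = R.get(key, 0.0) + rate
    return psi, R

# Toy model: one token cycling through 3 positions at rate 1.0
cycle = lambda i: [((i + 1) % 3, 1.0)]
psi, R = explore([0], [cycle])
print(len(psi), R[(0, psi[1])])  # → 3 1.0
```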
(Quasi–reduced ordered) multiway decision diagrams
• Nodes are organized into K + 1 levels
◦ Level K contains only one root node
◦ Levels K −1 through 1 contain one or more nodes, NO DUPLICATES
◦ Level 0 contains only the two terminal nodes, 0 and 1 (false and true).
• For k > 0, a node at level k has |Sk | arcs pointing to nodes at level k−1

[Figure: a quasi-reduced MDD over S_4 = {0,1,2,3}, S_3 = {0,1,2}, S_2 = {0,1}, S_1 = {0,1,2}, with a single root at level 4 and terminal nodes 0 and 1; the 19 paths leading to terminal 1 spell out the 19 states of the encoded set S.]

[Kam et al. 1998] deﬁned fully–reduced ordered MDDs as an interface to BDDs
How to deﬁne and store the indexing function ψ using MDDs
[Figure: the 19-state MDD from the previous slide, now annotated with an integer offset on each edge; the root edge carries the total state count, 19.]

To compute the index of a state, use edge-value offsets:

• Sum the offsets found on the corresponding path:
      ψ(2, 1, 1, 0) = 6 + 2 + 1 + 0 = 9
• A state is unreachable if the path is not complete:
      ψ(0, 2, 0, 0) = 0 + 0 + ? + ? = null

lexicographic, not discovery, order!!!
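The offset-summing evaluation of ψ can be sketched with a nested-dictionary MDD encoding (the encoding and the tiny two-level diagram are hypothetical, not SMART's data structure):

```python
def mdd_index(state, root):
    """Index a state in an MDD with edge offsets: sum the offsets along
    the state's path; return None (null) if the path is incomplete.

    A node is a dict mapping a local state to an (offset, child) pair;
    the terminal node is the sentinel string 'T'.
    """
    total, node = 0, root
    for local in state:
        if node == 'T' or local not in node:
            return None                 # path does not reach the terminal
        offset, node = node[local]
        total += offset
    return total if node == 'T' else None

# Tiny 2-level diagram over S_2 = {0,1}, S_1 = {0,1,2}, encoding 5 states
# in lexicographic order: (0,0), (0,1), (0,2), (1,0), (1,2)
bottom_all = {0: (0, 'T'), 1: (1, 'T'), 2: (2, 'T')}
bottom_gap = {0: (0, 'T'), 2: (1, 'T')}
root = {0: (0, bottom_all), 1: (3, bottom_gap)}
print(mdd_index((1, 2), root))  # → 4
print(mdd_index((1, 1), root))  # → None (unreachable)
```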
State indexing options: potential ψ̂ vs. actual ψ

Once we know S:

• We can store the original N : Ŝ → 2^Ŝ or its restriction N : S → 2^S
• We can store R̂ : Ŝ × Ŝ → R or R : S × S → R
• We can choose algorithms that use π̂ : Ŝ → R or π : S → R

With strictly explicit methods: using actual R and π works best

With (even just partially) implicit methods, there are tradeoffs
• Storing π̂ instead of π is often unavoidable if we employ a full vector
• Symbolic storage of R̂ is much cheaper than that of R in terms of memory requirements
• However, using R̂ in conjunction with π complicates indexing...
• ...at the very least, it forces us to store ψ : Ŝ → {0, 1, ..., |S| − 1} ∪ {null}, hence S
Saturation: an iteration strategy based on the model structure

MDD node p at level k is saturated if the set of states it encodes is a ﬁxed point w.r.t.
any α s.t. Top(α) ≤ k                    (thus, all nodes below p are also saturated)

• build the K-level MDD encoding of S^init             (if |S^init| = 1, there is one node per level)
• saturate each node at level 1: ﬁre in them all events α s.t. Top(α) = 1
• saturate each node at level 2: ﬁre in them all events α s.t. Top(α) = 2
(if this creates nodes at level 1, saturate them immediately upon creation)

• saturate each node at level 3: ﬁre in them all events α s.t. Top(α) = 3
(if this creates nodes at levels 2 or 1, saturate them immediately upon creation)

• ...
• saturate the root node at level K : ﬁre in it all events α s.t. Top(α) = K
(if this creates nodes at levels K −1, K −2, . . . , 1, saturate them immediately upon creation)

states are not discovered in breadth-ﬁrst order
Solution requirements: SMART vs. NuSMV (800MHz P-III)

Time and memory to generate S using saturation in SMART vs. breadth–ﬁrst iterations in NuSMV

                              Final memory (kB)     Peak memory (kB)          Time (sec)
     N              |S|        SMART      NuSMV      SMART      NuSMV      SMART       NuSMV
Dining Philosophers (K = N):
    50       2.23×10^31           18     10,800         22     10,819       0.15         5.9
   200      2.47×10^125           74     27,155         93     72,199       0.68    12,905.7
10,000     4.26×10^6269        3,749          —      4,686          —     877.82           —
Slotted Ring Network (K = N):
    10        8.29×10^9            4      5,287         28     10,819       0.13         5.5
    15       1.46×10^15           10      9,386         80     13,573       0.39     2,039.5
   200      8.38×10^211        1,729          —    120,316          —     902.11           —
Round Robin Mutual Exclusion (K = N + 1):
    20        4.72×10^7           18      7,300         20      7,306       0.07         0.8
   100       2.85×10^32          356     16,228        372     26,628       3.81     2,475.3
   300       1.37×10^93        3,063          —      3,109          —     140.98           —
Flexible Manufacturing System (K = 19):
    10        4.28×10^6           16      1,707         26     11,238       0.05         9.4
    20        3.84×10^9           55     14,077        101     31,718       0.20     1,747.8
   250       3.47×10^26       25,507          —     69,087          —     231.17           —
Decision–diagram–based CTMC encodings
Multiterminal multiway decision diagrams

From BDDs to MDDs: allow multiway choices at each nonterminal node, not just binary choices

From BDDs to MTBDDs: allow multiple terminal nodes, not just 0 and 1

From BDDs to MTMDDs: combine both generalizations

We can use a quasi–reduced MTMDD to encode a real matrix A          :S ×S →R
• Nodes are organized into 2K + 1 levels
◦ Variables {iK , iK−1 , ..., i1 , jK , jK−1 , ..., j1 } are mapped onto {2K, 2K −1, ..., 1}
◦ Let v(l) be the variable corresponding to level l ∈ {2K, 2K −1, ..., 1}
◦ Level 2K contains only one root node
◦ Levels 2K −1 through 1 contain one or more nodes, no duplicate nodes allowed
◦ Level 0 contains as many terminal nodes as there are distinct entries in A
• A node at level l > 0, with v(l) = ik or jk , has |Sk | arcs pointing to nodes at level l−1

A[i, j] = x ⇔ path labelled {iK , iK−1 , ..., i1 , jK , jK−1 , ..., j1 } leads to node x at level 0
MTMDDs encoding of the transition rate matrix

In principle, the mapping v can be any permutation of the 2K variables

In practice, the order (iK , jK , iK−1 , jK−1 , ..., i1 , j1 ) is usually the best choice
(we still need to decide a good order for the K “from-to” pairs)

When using MTMDDs to store the transition rate matrix, we have a choice:

• Store R̂ : Ŝ × Ŝ → R                     (note that R̂[i, j] may be nonzero even if i ∉ S or j ∉ S)
  ◦ a natural choice if we use a compositional approach
• Store R : S × S → R                   (actually, Ŝ × Ŝ → R, but R[i, j] = 0 if i ∉ S or j ∉ S)
  ◦ usually requires more MTMDD nodes
  ◦ can be built by enumerating the entries explicitly and storing them implicitly in an MTMDD
  ◦ or by zeroing the rows corresponding to Ŝ \ S in the MTMDD encoding of R̂,
    i.e., premultiplying R̂ by a filtering diagonal matrix F: F[i, i] = 1 if i ∈ S, 0 if i ∈ Ŝ \ S
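The row-filtering option can be sketched on dense pure-Python matrices (the helper name and example data are invented). Note that zeroing rows alone already suffices: if i is reachable and R̂[i, j] ≠ 0, then j is reachable too.

```python
def filter_rows(R_hat, reachable):
    """Compute R = F · R̂ by zeroing the rows of unreachable states."""
    return [row[:] if i in reachable else [0.0] * len(row)
            for i, row in enumerate(R_hat)]

# Ŝ has 3 states, of which only {0, 1} are reachable
R_hat = [[0.0, 1.0, 0.0],
         [2.0, 0.0, 0.0],
         [0.0, 3.0, 0.0]]   # row 2 describes an unreachable state
print(filter_rows(R_hat, {0, 1})[2])  # → [0.0, 0.0, 0.0]
```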
An example of MTMDD

[Figure: an MTMDD encoding the rate matrix R of the example Petri net, with local state spaces S_4: {p1, p0} ≡ {0,1}, S_3: {q0 r0, q1 r0, q0 r1} ≡ {0,1,2}, S_2: {s0, s1} ≡ {0,1}, S_1: {t0, t1} ≡ {0,1}; the terminal level holds the distinct rate values μ_a, μ_b, μ_c, μ_d, μ_e.]
Deﬁnition of Kronecker product

Given K matrices A_k ∈ R^(n_k×n_k), their Kronecker product is

    A = ⊗_{K≥k≥1} A_k ∈ R^((n_K···n_1)×(n_K···n_1))

where
• A[i, j] = A_K[i_K, j_K] · A_{K−1}[i_{K−1}, j_{K−1}] ··· A_1[i_1, j_1]
• using the mixed-base numbering scheme (indices start at 0):

    i = (···((i_K) · n_{K−1} + i_{K−1}) · n_{K−2} ···) · n_1 + i_1 = Σ_{K≥k≥1} i_k · Π_{k>l≥1} n_l

nonzeros:          η( ⊗_{K≥k≥1} A_k ) = Π_{K≥k≥1} η(A_k)
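A direct pure-Python sketch of the two-factor Kronecker product, also checking the nonzero-count property η(A⊗B) = η(A)·η(B) (the example matrices are invented):

```python
def kron(A, B):
    """Kronecker product of two dense matrices given as lists of lists:
    (A ⊗ B)[i·nB + r, j·mB + c] = A[i][j] · B[r][c]."""
    nB, mB = len(B), len(B[0])
    return [[A[i][j] * B[r][c] for j in range(len(A[0])) for c in range(mB)]
            for i in range(len(A)) for r in range(nB)]

A = [[1.0, 2.0],
     [0.0, 3.0]]
B = [[0.0, 1.0],
     [4.0, 0.0]]
C = kron(A, B)
print(C[0])  # first row: a00·(row 0 of B), a01·(row 0 of B) → [0.0, 1.0, 0.0, 2.0]
# η(A⊗B) = η(A)·η(B) = 3 · 2 = 6 nonzeros
print(sum(1 for row in C for v in row if v != 0.0))  # → 6
```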
Kronecker product by example
                     
Given the real matrices

    A = ⎡ a00  a01 ⎤          B = ⎡ b00  b01  b02 ⎤
        ⎣ a10  a11 ⎦              ⎢ b10  b11  b12 ⎥
                                  ⎣ b20  b21  b22 ⎦

            ⎡ a00·B   a01·B ⎤
    A ⊗ B = ⎣ a10·B   a11·B ⎦

a 6×6 matrix: the entry in block row i, block column j, position (r, c) is a_ij · b_rc

Kronecker product expresses contemporaneity or synchronization

If A and B are the transition probability matrices of independent discrete-time Markov chains
⇒ A ⊗ B is the transition probability matrix of their composition
Kronecker-consistent decomposition of a CTMC model

A decomposition of a discrete-state model describing a CTMC is Kronecker-consistent if:

• the potential transition rate matrix R̂ is additively partitioned:                   R̂ = Σ_{α∈E} R̂_α

• Ŝ = S_K × ··· × S_1, so a global state i consists of K local states:                 i = (i_K, ..., i_1)

• and, most importantly, we can multiplicatively partition each R̂_α, that is, we can write
      λ_α(i) = λ_{K,α}(i_K) ··· λ_{1,α}(i_1)      and      ∆_α(i, j) = ∆_{K,α}(i_K, j_K) ··· ∆_{1,α}(i_1, j_1)
  so that
      R̂_α = R_{K,α} ⊗ ··· ⊗ R_{1,α}

We encode the potential transition rate matrix R̂ with |E| × K small matrices R_{k,α} ∈ R^(n_k×n_k)

for stochastic Petri nets with transition rates depending on at most one
place, any partition of the places into K subsets is consistent
(even with inhibitor, reset, or probabilistic arcs)
Kronecker description of the transition rate matrix of a CTMC

• Parallel composition of K submodels with overall event set E (synchronizing vs. local events)
• Global state i is a K-tuple (i_K, ..., i_1) of local states:           S ⊆ Ŝ = S_K × ··· × S_1
• Transition rate matrix R = R̂[S, S] where R̂ = Σ_{α∈E} ⊗_{K≥k≥1} R_{k,α}

• R_{k,α}[i_k, j_k] =   λ_{k,α}(i_k) · ∆_{k,α}(i_k, j_k)   if α is in submodel k
                        1                                  if α is not in submodel k and i_k = j_k
                        0                                  if α is not in submodel k and i_k ≠ j_k

encode a huge R̂ with K · |E| "small" matrices
"On the stochastic structure of parallelism and synchronisation models for distributed algorithms",
Plateau (SIGMETRICS 1985)

factor-K slowdown, still needs a probability vector of size |Ŝ|
"Complexity of memory-efficient Kronecker operations with applications to the solution of Markov models",
Buchholz, Ciardo, Donatelli, Kemper (INFORMS J. Comp., 2000)
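A naive sketch of multiplying a vector by the descriptor Σ_α ⊗_k R_{k,α} without ever assembling R̂; only the K·|E| small matrices are stored (dense small matrices and the single-event example are invented; real tools use far more efficient shuffle-based algorithms):

```python
from itertools import product

def kron_vec_mat(x, descriptor, sizes):
    """y = x · (Σ_α ⊗_k R_{k,α}), enumerating one global entry at a time.

    descriptor: list of events, each a list [R_K, ..., R_1] of small dense matrices
    sizes:      local sizes (n_K, ..., n_1); x, y are indexed in mixed base
    """
    states = list(product(*[range(n) for n in sizes]))
    index = {s: k for k, s in enumerate(states)}      # mixed-base indexing ψ̂
    y = [0.0] * len(states)
    for mats in descriptor:
        for i in states:
            for j in states:
                v = 1.0
                for k in range(len(sizes)):           # product of local entries
                    v *= mats[k][i[k]][j[k]]
                    if v == 0.0:
                        break
                if v != 0.0:
                    y[index[j]] += x[index[i]] * v
    return y

# One event: R_{2,a} ⊗ R_{1,a} with n_2 = n_1 = 2
R2a = [[0.0, 2.0], [0.0, 0.0]]      # local rate 2 from local state 0 to 1
R1a = [[0.0, 1.0], [0.0, 0.0]]      # local rate 1 from local state 0 to 1
y = kron_vec_mat([1.0, 0.0, 0.0, 0.0], [[R2a, R1a]], (2, 2))
print(y)  # → [0.0, 0.0, 0.0, 2.0]
```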
Kronecker encoding of R̂                     (K = 5)

[Figure: a grid with one row per level (5 down to 1) and one column per event (a, b, c, d, e); each cell holds either a small matrix R_{k,α} or the identity I. An event contributes a non-identity matrix only at the levels of the submodels it involves.]

we determine a priori from the model whether R_{k,α} = I
Kronecker encoding of R̂ = Σ_{α∈{a,b,c,d,e}} ⊗_{5≥k≥1} R_{k,α}

[Figure: the same grid with the actual 2×2 matrices filled in, over local state spaces S_5: {p1, p0} ≡ {0,1}, S_4: {q0, q1} ≡ {0,1}, S_3: {r0, r1} ≡ {0,1}, S_2: {s0, s1} ≡ {0,1}, S_1: {t0, t1} ≡ {0,1}; each non-identity matrix holds a single local rate γ_k^α.]

Example entry:     R̂[00i_3 0i_1, 11i_3 1i_1] = γ_5^a · γ_4^a · γ_2^a

Not a canonical representation: changing to 2·γ_5^a and γ_4^a/2 would describe the same R̂
Kronecker encoding of R̂                     (K = 4)

[Figure: the analogous grid for a coarser decomposition into K = 4 submodels; events a, b, c, d, e are the columns, and each cell again holds either R_{k,α} or I.]

The matrices for b and c differ only at level 3: we can merge them into a single (local) event l

we determine automatically from the model whether R_{k,α} = I
Kronecker encoding of R̂ = Σ_{α∈{a,l,d,e}} ⊗_{4≥k≥1} R_{k,α}

[Figure: the K = 4 grid with the actual matrices, over local state spaces S_4: {p1, p0} ≡ {0,1}, S_3: {q0 r0, q1 r0, q0 r1} ≡ {0,1,2}, S_2: {s0, s1} ≡ {0,1}, S_1: {t0, t1} ≡ {0,1}; the nonzero local rates are written μ_k^α.]

We have merged b and c into a single (local) event l
The matrix R̂ encoded by the Kronecker descriptor                     (K = 4)

[Figure: the full 24×24 matrix R̂ over Ŝ = {0,1} × {0,1,2} × {0,1} × {0,1}, with rows and columns labelled by global states; each nonzero entry is marked with the event (a, b, c, d, or e) producing it, and the rows of the reachable states are marked with •.]
Matrix diagrams (MXDs)

A generalization of the idea of Kronecker encoding of a matrix

Allows us to apply knowledge of the reachable states S to filter the potential transition rate matrix R̂

[Figure: an example of a (non-canonical) MXD; each node is a matrix whose entries are sets of (value, child) edges. The value of a global entry is the sum, over all root-to-terminal paths selected by the row and column indices, of the products of the edge values along each path; for example:]

R[001, 210] = 5 · 2 · 9 = 90
R[111, 211] = 2 · (2·5 + 1·1) = 22
How to build an MXD

Two methods for building MXDs in our tool SMART:

• From the Kronecker encoding
◦ Build the matrices for the Kronecker encoding
◦ Insert the matrices in an MXD, removing duplicate (and redundant?) ones
◦ Since we start from a Kronecker encoding, we have the same limitations
◦ Similar memory requirements to a Kronecker encoding in practice
◦ We can use them to store R instead of R̂

• From an explicit enumeration of the entries
◦ Requires the use of canonical MXDs [Miner PNPM’01]
◦ Entries are added individually (or in batches for greater efﬁciency)
◦ Requires more time (additional overhead prior to numerical solution) and memory
◦ They are fully general, can encode any matrix, do not need Kronecker consistency

MXDs exploit the presence of identity matrices R_{k,α} (skipped levels)
EVBDDs by example

[Lai et al. 1992] deﬁned edge–valued binary decision diagrams

[Figure: three EVBDDs, all encoding the function f(i_3, i_2, i_1) with value table f = 0, 2, 3, 2, 2, 4, 1, 0 (inputs enumerated lexicographically); the edge values differ among the three, but summing them along any path always yields f.]
Canonicity: all nodes have 0–value on the 0–arc                         (only the ﬁrst EVBDD is canonical)

In canonical form, the root incoming edge has value f (0 · · · 0)
EV+MDDs by example

[Ciardo and Siminiceanu 2002] deﬁned edge–valued positive multiway decision diagrams

From BDD to MDD: the usual extension
We allow ∞–edge values: can store partial arithmetic functions
New canonization rule: essential to encode partial arithmetic functions

[Figure: two EV+MDDs. The left one encodes the same total function f = 0, 2, 3, 2, 2, 4, 1, 0; the right one encodes the partial function f = 0, 2, 3, ∞, ∞, 4, 1, 0, using ∞-valued edges for the undefined inputs.]

Canonicity: all edge values are non–negative and at least one is zero

In canonical form, the root incoming edge has value min_{i∈Ŝ} f(i)

f(1, 0, 0) = ∞ but f(1, 0, 1) = 4:
the traditional EVMDD normalization cannot represent this function
Deﬁnition of EV+MDDs

(differences from EVBDDs of [Lai et al. 1992] are in blue)

Given f : Ŝ → Z ∪ {∞}, an EV+MDD encoding f is a DAG with labelled edges such that:

• There is a single terminal node 0|T

• There is a single root K|r with a “dangling” incoming edge having value ρ ∈ Z ∪ {∞}

• A non-terminal node k|p has nk edges; edge k|p[ik] has a child k|p[ik].child and a value k|p[ik].val ∈ N ∪ {∞}

• If k|p[ik].val = ∞, the value of k|p[ik].child is irrelevant

• If k|p[ik].val ∈ N, then k|p[ik].child is the index of a node at level k − 1

• Each non-terminal node has at least one outgoing edge labelled with 0 (unless all its edges are labelled with ∞)

• All nodes are unique (taking into account both k|p[ik].child and k|p[ik].val)

An analogous deﬁnition uses real values instead of integers

Theorem: EV+MDDs are canonical
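The definition above can be made concrete with a small sketch. This is not the authors' implementation: nodes are represented here as plain tuples of (value, child) edges, the terminal node as None, and evaluation simply adds the dangling value ρ and the edge values along the path, returning ∞ as soon as an infinite edge value is met.

```python
import math

INF = math.inf

def evaluate(rho, root, state):
    """Return f(i_K, ..., i_1) = rho + sum of edge values along the path,
    or INF as soon as an infinite edge value is met."""
    total, node = rho, root
    for ik in state:                 # state = (i_K, ..., i_1), top level first
        if total == INF:
            return INF               # child of an infinite edge is irrelevant
        value, child = node[ik]
        total += value
        node = child
    return total

# Tiny hypothetical 2-level example encoding a partial function over {0,1}x{0,1}:
# f(0,0)=0, f(0,1)=2, f(1,0)=INF, f(1,1)=4, with dangling value rho = 0
n0 = ((0, None), (2, None))          # level-1 node: edge values 0 and 2
n1 = ((INF, None), (0, None))        # level-1 node: edge 0 is infinite
root = ((0, n0), (4, n1))            # level-2 node
```

Note that the example is canonical under the rules above: every node has a 0-labelled outgoing edge, and the dangling value 0 equals the minimum of f.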
Using an EV+MDD to store the indexing function ψ

The “MDD with offsets” used to store and evaluate ψ can be formalized as an EV+MDD

(Figure: an MDD with offsets and the equivalent EV+MDD encoding ψ)

ψ assigns indices in lexicographic order for i ∈ S, and ψ(i) = ∞ ⇔ i ∉ S
EV∗MDDs by example

One way to think about EV∗MDDs is “EV+MDD = − log(EV∗MDD)”:
◦ edge value 0 ⇔ edge value 1
◦ edge values ∈ [0, +∞] ⇔ edge values ∈ [0, 1]
◦ root incoming edge value ∈ (−∞, +∞] ⇔ root incoming edge value ∈ [0, +∞)
◦ values add along a path ⇔ values multiply along a path

(Figure: an EV∗MDD encoding f = 3.5, 7, 2.1, 0, 0, 2.8, 7, 1.4, with root incoming edge value 7)

Canonicity: all edge values are in [0, 1] and at least one is 1

In canonical form, the root incoming edge has value max_{i∈Ŝ} f(i)
Encoding R with an EV∗MDD: initial non-canonical EV∗MDDs

We can store R with a 2K–level EVMDD: consider the example of Kronecker encoding

R = Σ_{α∈{a,l,d,e}} Rα = Σ_{α∈{a,l,d,e}} ⊗_{4≥k≥1} Rk,α
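For intuition, the sum-of-Kronecker-products form can be reproduced numerically. The small matrices below are hypothetical stand-ins (K = 3 levels, two events), not the rates of the example model:

```python
import numpy as np
from functools import reduce

# Hypothetical local matrices for two events over K = 3 binary submodels
# (stand-ins; the actual R_{k,alpha} of the example are not reproduced here).
R_a = [np.array([[0., 2.], [0., 0.]]),   # level 3
       np.eye(2),                        # level 2: identity (event a not active here)
       np.eye(2)]                        # level 1: identity
R_l = [np.eye(2),
       np.array([[0., 0.], [3., 0.]]),
       np.array([[0., 1.], [0., 0.]])]

def kron_product(factors):
    """Kronecker product of the per-level factors, highest level first."""
    return reduce(np.kron, factors)

# R = sum over events of the Kronecker product of their local matrices
R = kron_product(R_a) + kron_product(R_l)
```

Each identity factor is stored only implicitly in a real Kronecker descriptor; building the full matrix, as done here, is exactly what the structured encodings avoid.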

(Figure: four 8-level EV∗MDDs encoding Ra, Rl, Rd, and Re, with local rates µa4, µe4, µa3, µb3, µc3, µe3, µa2, µd2, µd1, µe1 appearing as edge values)

Encoding R with an EV∗MDD: canonical EV∗MDDs

(assume that µb3 is the largest rate in R)

(Figure: the four EV∗MDDs after canonization; the root incoming edges carry the values µa4·µa3·µa2, µb3, µd2·µd1, and µe4·µe3·µe1, and the remaining edge values, such as µc3/µb3, lie in [0, 1])
Encoding R with an EV∗MDD: ﬁnal EV∗MDD

Use a recursive algorithm to compute R = Σ_{α∈{a,l,d,e}} Rα

(Figure: the resulting EV∗MDD for R, with root incoming edge value µb3 and normalized edge values such as µa4µa3µa2/µb3, µe4µe3µe1/µb3, µd2µd1/µb3, and µc3/µb3)

Hidden identity patterns remain!
Empirical comparison

Memory consumption in bytes for:
S (MDD), R (sparse), R (Kronecker), R̂ and R (Pot/Act MXD), R̂ and R (Pot/Act MTMDD)

Model     N          |Ŝ|       |S|    MDD      Sparse    Kron  Pot MXD  Act MXD  Pot MTMDD  Act MTMDD
qn4       2          324       324    333       14256     772      586      722      22784      22784
qn4       6        38416     38416    499     2524480    3092     2494     2870      36864      36864
qn4      10       527076    527076    905    38524464    7076     5778     6522      62720      62720
qn8       2         6561       324    681       14256    1204      738     1688      43776      49152
qn8       6      5764801     38416   1119     2524480    2404     1674     5872      55040      70912
qn8      10    214358881    527076   1953    38524464    3604     2610    12040      66304      98560
mserv2    3         1485       495    705       23352    4124     3246     3952      34560      40704
mserv2    6         6345      2115   3176      111408   17468    13998    16432     111104     135168
mserv2   10        18495      6165   8846      342720   52228    42278    49032     306560     378460
mserv4    3        14256       495   1174       23352    5568     4098     4916      68864      79616
mserv4    6       106596      2115   8453      111408   22920    17502    20054     254360     298856
mserv4   10       488268      6165  33739      342720   67560    52342    58934     873896     998552
mserv6    3        32076       495   1333       23352    5724     4066     5316      86784     101376
mserv6    6       239841      2115   8614      111408   23076    17470    20238     298596     347956
mserv6   10      1098603      6165  33900      342720   67716    52310    59118     982396    1112684
Model     N          |Ŝ|       |S|    MDD      Sparse    Kron  Pot MXD  Act MXD  Pot MTMDD  Act MTMDD
molloy4   5         4536        91    660        4204    1316     1148     2534      23552      28160
molloy4   8        32805       285   1215       14676    2528     2300     5216      27648      38656
molloy4  10        87846       506   1766       27104    3556     3288     7504      31232      47360
molloy5   5         7776        91    846        4204    1100      792     4298      28416      37120
molloy5   8        59049       285   1545       14676    1592     1188     9356      31232      50944
molloy5  10       161051       506   2223       27104    1920     1452    13778      33280      61952
kanban3   1          160       160    264        8032     500      412      544      18432      18432
kanban3   3        58400     58400    937     5590400    7572     6786     8134      66816      67072
kanban3   5      2546432   2546432   5646   303705920   45660    41816    48780     303776     303776
kanban4   1          256       160    332        8032     420      354      602      23552      24576
kanban4   3       160000     58400    628     5590400    2500     2216     3284      44032      50176
kanban4   5      9834496   2546432   1532   303705920    7940     7118     9950      92928     110592
kanban16  1        65536       160   1275        8032    2148      866     3000      95232     107520
kanban16  3     Overﬂow     58400   1902     5590400    3236     1746    10566     115456     151808
kanban16  5     Overﬂow   2546432   3149   303705920    4324     2626    24106     135168     216320
fms5      1         2100        84    535        3228    1456      604     1808      36096      40960
fms5      3      9432500     20600   3294     1554080    8304     5224    24320     151296     247040
fms5      5   2016379008    852012  30490    82727748   34484    24664   138244     654892    1255108
fms21     1      4194304        84   2050        3228    3132     1132     7396     126976     148224
fms21     3     Overﬂow     20600   6777     1554080    5028     2328    68762     176896     437760
fms21     5     Overﬂow    852012  22038    82727748    6924     3524   255988     235008    1393932
Conclusions

MTMDDs work best when many nonzero entries in R have the same value
Extensive presence of state–dependent rates can make MTMDDs inefﬁcient
Kronecker, MXDs, and EV∗MDDs instead remain efﬁcient if the model is Kronecker–consistent

The size of the Kronecker encoding is not affected by variable ordering
MTMDDs, MXDs, EV∗MDDs, and MDDs are instead affected by variable ordering

Kronecker and MXDs are restricted to contiguous “(ik , jk )” or “(jk , ik )” variable orderings
MTMDDs and EV∗MDDs can instead use any ordering (but is this generality useful in practice?)

Kronecker is quite efﬁcient but more restrictive
MXDs and EV∗MDDs are fully general, and their size is similar to that of Kronecker, when the latter exists
MTMDDs are also fully general, but their size tends to be larger than that of Kronecker, MXDs, and EV∗MDDs

MTMDDs, MXDs, and EV∗MDDs can encode the actual R instead of the potential R̂, but memory requirements increase
Kronecker must instead rely on an external (MDD) representation of S to zero out unreachable rows

Fundamental advantage of Kronecker and MXDs over MTMDDs:
they exploit the presence of numerous identity matrices in the description of R
(also true when encoding just N , e.g., in symbolic model-checking)
As presented, EV∗MDDs do not exploit identities, but we are working on that
Solution algorithms
Potential Kronecker: the shufﬂe algorithm PSh

First algorithm to be proposed for Kronecker solution [Plateau SIGMETRICS 1985]

PSh computes y ← x · ⊗_{K≥k≥1} Ak

PSh+ computes y ← x · (I_{nK···nk+1} ⊗ Ak ⊗ I_{nk−1···n1})

Based on the equality [Davio 1981]

⊗_{K≥k≥1} Ak = Π_{K≥k≥1} ST_{(nK···nk, nk−1···n1)} · (I_{|Ŝ|/nk} ⊗ Ak) · S_{(nK···nk, nk−1···n1)}

where S_{(a,b)} ∈ {0, 1}^{a·b×a·b} is an (a, b)-perfect shufﬂe permutation:

S_{(a,b)}[i, j] = 1 if j = (i mod a) · b + (i div a), and 0 otherwise

Requires
• K vector permutations and
• K multiplications x · (I_{|Ŝ|/nk} ⊗ Ak)

Complexity of the k-th multiplication: O(|Ŝ|/nk · η[Ak])
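The shuffle equality can be checked numerically for K = 2. The sketch below builds S(a,b) exactly as defined above and verifies the product form against a direct Kronecker product; the shuffle subscripts match the worked example later in the deck, where the factors are S(2,3) and S(6,1):

```python
import numpy as np

def shuffle_matrix(a, b):
    """(a,b)-perfect shuffle: S[i,j] = 1 iff j = (i mod a)*b + (i div a)."""
    n = a * b
    S = np.zeros((n, n))
    for i in range(n):
        S[i, (i % a) * b + (i // a)] = 1.0
    return S

rng = np.random.default_rng(0)
A = rng.random((2, 2))   # level 2, n2 = 2
B = rng.random((3, 3))   # level 1, n1 = 3

# factor for k = 2: S(n2, n1) = S(2, 3); factor for k = 1: S(n2*n1, 1) = S(6, 1) = I
S2 = shuffle_matrix(2, 3)
S1 = shuffle_matrix(6, 1)
lhs = np.kron(A, B)
rhs = (S2.T @ np.kron(np.eye(3), A) @ S2) @ (S1.T @ np.kron(np.eye(2), B) @ S1)
```

Here each conjugated term equals A ⊗ I3 and I2 ⊗ B respectively, so their product is A ⊗ B.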
Potential Kronecker: the shufﬂe algorithm PSh

PSh (in: nK, ..., n1, AK, ..., A1; inout: x, y)
1.    nleft ← 1;
2.    nright ← nK−1 · · · n1;
3.    for k = K down to 1
4.        base ← 0;
5.        jump ← nk · nright;
6.        if Ak ≠ I then
7.            for block = 0 to nleft − 1
8.                for oﬀset = 0 to nright − 1
9.                    index ← base + oﬀset;
10.                   for h = 0 to nk − 1
11.                       zh ← xindex;
12.                       index ← index + nright;
13.                   z ← z · Ak;
14.                   index ← base + oﬀset;
15.                   for h = 0 to nk − 1
16.                       yindex ← zh;
17.                       index ← index + nright;
18.               base ← base + jump;
19.           x ← y;
20.       nleft ← nleft · nk;
21.       nright ← nright /nk−1;                          (let n0 be 1)
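The pseudocode above can be sketched in NumPy. Reshaping replaces the explicit index arithmetic (base, jump, offset), but the per-factor structure and the identity test of line 6 are the same; this is an illustrative transcription, not the original implementation:

```python
import numpy as np

def psh(x, factors):
    """y = x · (A_K ⊗ ... ⊗ A_1), one factor at a time; `factors` is given
    top level first, i.e. [A_K, ..., A_1]."""
    sizes = [A.shape[0] for A in factors]        # n_K, ..., n_1
    y = x.astype(float).copy()
    n = y.size
    nleft, nright = 1, n // sizes[0]
    for k, A in enumerate(factors):
        nk = sizes[k]
        if not np.array_equal(A, np.eye(nk)):    # skip identity factors (line 6)
            # view y as (nleft, nk, nright) and right-multiply the middle axis by A
            z = y.reshape(nleft, nk, nright)
            y = np.einsum('abc,bd->adc', z, A).reshape(n)
        nleft *= nk
        if k + 1 < len(factors):
            nright //= sizes[k + 1]
    return y

rng = np.random.default_rng(1)
A, B, C = rng.random((2, 2)), rng.random((3, 3)), rng.random((2, 2))
x = rng.random(12)
```

Each pass applies one term I ⊗ Ak ⊗ I, so the full product never has to be formed.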
Example of shufﬂe computation

y ← x · (A ⊗ B), with A a 2×2 matrix and B a 3×3 matrix

y ← x · ST_{(2,3)} · (I3 ⊗ A) · S_{(2,3)} · ST_{(6,1)} · (I2 ⊗ B) · S_{(6,1)}

with intermediate vectors u = x · ST_{(2,3)}, v = u · (I3 ⊗ A), and w = v · S_{(2,3)} · ST_{(6,1)}
(here S_{(6,1)} is the identity)

Following the entries marked with a diamond in the original ﬁgure to obtain y2:

y2 = B02 w0 + B12 w1 + B22 w2
   = B02 v0 + B12 v2 + B22 v4
   = B02 (u0 A00 + u1 A10) + B12 (u2 A00 + u3 A10) + B22 (u4 A00 + u5 A10)
   = B02 (x0 A00 + x3 A10) + B12 (x1 A00 + x4 A10) + B22 (x2 A00 + x5 A10)
   = A00 B02 x0 + A00 B12 x1 + A00 B22 x2 + A10 B02 x3 + A10 B12 x4 + A10 B22 x5
Complexity of PSh and PSh+
                           

PSh has complexity O( Σ_{K≥k≥1} |Ŝ|/nk · η[Ak] ) = O(|Ŝ| · K · α)

Even when S = Ŝ, PSh is faster than Ordinary explicit multiplication only if

|Ŝ| · K · α < |Ŝ| · α^K    ⇔    α > K^{1/(K−1)}

PSh+ has complexity O( |Ŝ|/nk · η[Ak] ) = O(|Ŝ| · α)

Complexity of computing y ← y + x · ⊗_{K≥k≥1} Ak:

O( Σ_{K≥k≥1} |Ŝ|/nk · η[Ak] ) = O( |Ŝ| · Σ_{K≥k≥1} η[Ak]/nk ) = O(|Ŝ| · K · α)

Ordinary is faster than PSh if α ≤ 1
PSh+ saves space, but not time, w.r.t. Ordinary
Potential Kronecker: PRw and PRw+

PRwEl (in: i, x, nK, ..., n1, AK, ..., A1; inout: y)
1.    for each jK s.t. AK[iK, jK] > 0
2.        ĵK ← jK;  aK ← AK[iK, jK];
3.        for each jK−1 s.t. AK−1[iK−1, jK−1] > 0
4.            ĵK−1 ← ĵK · nK−1 + jK−1;  aK−1 ← aK · AK−1[iK−1, jK−1];
              ...
5.            for each j1 s.t. A1[i1, j1] > 0
6.                ĵ1 ← ĵ2 · n1 + j1;  a1 ← a2 · A1[i1, j1];
7.                yĵ1 ← yĵ1 + x · a1;

PRw (in: x, nK, ..., n1, AK, ..., A1; inout: y)
1.    for i = 0 to |Ŝ| − 1
2.        PRwEl (i, xi, nK, ..., n1, AK, ..., A1, y);

PRwEl+ (in: nk, nk−1 · · · n1, i⁻k, ik, i⁺k, x, Ak; inout: y)
1.    for each jk s.t. Ak[ik, jk] > 0
2.        ĵ ← (i⁻k · nk + jk) · nk−1 · · · n1 + i⁺k;
3.        yĵ ← yĵ + x · Ak[ik, jk];

PRw+ (in: x, nK · · · nk+1, nk, nk−1 · · · n1, Ak; inout: y)
1.    for i ≡ (i⁻k, ik, i⁺k) = 0 to nK · · · nk+1 · nk · nk−1 · · · n1 − 1
2.        PRwEl+ (nk, nk−1 · · · n1, i⁻k, ik, i⁺k, xi, Ak, y);
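The row-wise scheme can be transcribed directly in Python (a sketch, with dense NumPy matrices standing in for the sparse row-wise format): each row index is decoded into local indices, and the nested loops accumulate the column index ĵ and the product a exactly as in lines 2, 4, and 6 of PRwEl.

```python
import numpy as np

def prw_el(i_mixed, xi, sizes, factors, y):
    """Contribution of x_i to y, following the nested loops of PRwEl."""
    # decode the mixed-radix row index into local indices (i_K, ..., i_1)
    idx, local = i_mixed, []
    for nk in reversed(sizes):
        local.append(idx % nk)
        idx //= nk
    local.reverse()
    # each path carries a partial column index and a partial product
    paths = [(0, xi)]
    for nk, Ak, ik in zip(sizes, factors, local):    # top level down
        paths = [(j * nk + jk, a * Ak[ik, jk])
                 for (j, a) in paths
                 for jk in range(nk) if Ak[ik, jk] != 0]
    for j, a in paths:
        y[j] += a

def prw(x, sizes, factors, y):
    """y ← y + x · (A_K ⊗ ... ⊗ A_1), one row of the product at a time."""
    for i in range(len(x)):
        if x[i] != 0:
            prw_el(i, x[i], sizes, factors, y)

rng = np.random.default_rng(2)
A, B = rng.random((2, 2)), rng.random((3, 3))
x, y = rng.random(6), np.zeros(6)
prw(x, [2, 3], [A, B], y)
```

The list of partial paths makes explicit how a partial product aK, ..., a2 is shared by all the column indices it reaches.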
Complexity of PRw and PRw+

PRw computes y ← y + x · A, according to the deﬁnition of Kronecker product
Requires sparse row-wise format for each Ak

PRwEl computes the contribution of xi to each entry of y, as y ← y + xi · A[i, Ŝ]

PRwEl reaches the statement ak ← ak+1 · Ak[ik, jk] O(α^{K−k+1}) times

PRw makes |Ŝ| calls to PRwEl, hence has complexity

O( |Ŝ| · Σ_{K≥k≥1} α^k )  =  O(|Ŝ| · K) = O(K · η[A])   if α ≤ 1
                             O(|Ŝ| · α^K) = O(η[A])     if α > 1

PRw+ has complexity O( |Ŝ| · η[Ak]/nk ) = O(|Ŝ| · α)

Complexity of computing y ← y + x · ⊗_{K≥k≥1} Ak using PRw+: O(|Ŝ| · K · α)

PRw amortizes the multiplications for aK−1, ..., a2 only if α ≫ 1
PRw+ saves space, but not time, w.r.t. Ordinary
Potential Kronecker: PRwCl and PRwCl+

PRwCl (in: x, nK, ..., n1, AK, ..., A1; inout: y)
1.    for iK = 0 to nK − 1
2.        for each jK s.t. AK[iK, jK] > 0
3.            ĵK ← jK;  aK ← AK[iK, jK];
4.            for iK−1 = 0 to nK−1 − 1
5.                for each jK−1 s.t. AK−1[iK−1, jK−1] > 0
6.                    ĵK−1 ← ĵK · nK−1 + jK−1;  aK−1 ← aK · AK−1[iK−1, jK−1];
                      ...
7.                    for i1 = 0 to n1 − 1
8.                        for each j1 s.t. A1[i1, j1] > 0
9.                            ĵ1 ← ĵ2 · n1 + j1;  a1 ← a2 · A1[i1, j1];
10.                           yĵ1 ← yĵ1 + xî · a1;        (î is the mixed-radix index of iK, ..., i1)

The overall complexity is O(|Ŝ| · α^K)

PRwCl+ (in: x, nK · · · nk+1, nk, nk−1 · · · n1, Ak; inout: y)
1.    for i⁻k = 0 to nK · · · nk+1 − 1
2.        for ik = 0 to nk − 1
3.            for each jk s.t. Ak[ik, jk] > 0
4.                ĵk ← i⁻k · nk + jk;
5.                for i⁺k = 0 to nk−1 · · · n1 − 1
6.                    ĵ ← ĵk · nk−1 · · · n1 + i⁺k;
7.                    yĵ ← yĵ + x(i⁻k, ik, i⁺k) · Ak[ik, jk];
k   k   k
Potential Kronecker: PRw vs. PRwCl

x · A = x · ( [a0 a1; a2 a3] ⊗ [b0 b1; b2 b3] ⊗ [c0 c1; c2 c3] )

(Figure: the 8 × 8 = 64 entries ai bj cl of A, enumerated in the order produced by PRw and in the order produced by PRwCl)

Each “b” and “c” box corresponds to one multiplication: 8 × 8 = 64 entries of the form ai bj cl
• Computing each entry from scratch: 64 × 2 = 128 multiplications
• Using PRw: 64 + 32 = 96 multiplications
• Using PRwCl: 64 + 16 = 80 multiplications: interleaving helps!

Same complexity as Ordinary regardless of the sparsity level,
but the entries of A are not generated in column order
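The three multiplication counts can be checked mechanically. The sketch below just counts multiplications for three hypothetical 2×2 factors a, b, c, mirroring the loop structures of PRw and PRwCl:

```python
from itertools import product

# Count the multiplications needed to enumerate all 64 products a_i * b_j * c_l
# for three 2x2 factors (entries of A = a (x) b (x) c).

def count_scratch():
    # each of the 4*4*4 = 64 entries computed independently: 2 mults each
    return sum(2 for _ in product(range(4), range(4), range(4)))

def count_prw():
    # PRw fixes a row and re-derives the partial products a*b for every row:
    # 2*2 = 4 "ab" mults plus 2*2*2 = 8 "abc" mults per row, for 8 rows
    mults = 0
    for _row in product(range(2), range(2), range(2)):      # 8 rows
        for _ja in range(2):
            for _jb in range(2):
                mults += 1                                  # a * b
                for _jc in range(2):
                    mults += 1                              # (a*b) * c
    return mults

def count_prwcl():
    # PRwCl interleaves the row loops, so each a*b product is shared by the
    # rows that differ only in the lowest-level local state
    mults = 0
    for _ia in range(2):
        for _ja in range(2):
            for _ib in range(2):
                for _jb in range(2):
                    mults += 1                              # a * b
                    for _ic in range(2):
                        for _jc in range(2):
                            mults += 1                      # (a*b) * c
    return mults
```

Running the three counters reproduces the 128 / 96 / 80 figures quoted above.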
Actual Kronecker: ARw

ARw (in: x, AK, ..., A1, S; inout: y)
1.    for each i ∈ S
2.        I ← ψ(i);
3.        for each jK s.t. AK[iK, jK] > 0
4.            aK ← AK[iK, jK];
5.            for each jK−1 s.t. AK−1[iK−1, jK−1] > 0
6.                aK−1 ← aK · AK−1[iK−1, jK−1];
                  ...
7.                for each j1 s.t. A1[i1, j1] > 0
8.                    a1 ← a2 · A1[i1, j1];
9.                    J ← ψ(j);
10.                   yJ ← yJ + xI · a1;

Statement 9 computes the index J = ψ(j) of state j in the array y

O( |S| · ( Σ_{K≥k≥1} α^k + α^K · log |S| ) )  =  O( |S| · (K + log |S|) )   if α ≤ 1
                                                 O( |S| · α^K · log |S| )   if α > 1

If K < log |S|: ARw has a log |S| overhead w.r.t. Ordinary
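ARw can be sketched in a few lines. Here ψ is played by a plain Python dict from state tuples to positions in x and y (the deck uses an EV+MDD for this); the original ARw assumes ψ(j) is defined, and treating a missing state as a skip simply keeps the sketch safe:

```python
import numpy as np

def arw(x, factors, reachable, y):
    """y ← y + x · A restricted to the reachable states, a sketch of ARw."""
    psi = {s: I for I, s in enumerate(reachable)}   # stand-in for the EV+MDD psi
    for i in reachable:
        I = psi[i]
        # enumerate the nonzero columns j reachable from row i, level by level,
        # accumulating the product a of the local entries
        paths = [((), x[I])]
        for ik, Ak in zip(i, factors):              # top level down
            paths = [(j + (jk,), a * Ak[ik, jk])
                     for (j, a) in paths
                     for jk in range(Ak.shape[0]) if Ak[ik, jk] != 0]
        for j, a in paths:
            J = psi.get(j)                          # statement 9: J = psi(j)
            if J is not None:                       # skip columns outside S
                y[J] += a
```

Unlike PRw, only |S| rows are visited, at the price of one ψ lookup per generated entry, which is the log |S| overhead in the complexity above.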
Actual Kronecker: ARwCl and ARwCl+

ARwCl (in: x, AK, ..., A1, S; inout: y)
1.    for each iK ∈ SK                                             (all local states iK)
2.        IK ← ψK(iK);
3.        for each jK s.t. AK[iK, jK] > 0
4.            JK ← ψK(jK);
5.            if JK ≠ null then
6.                aK ← AK[iK, jK];
7.                for each iK−1 ∈ SK−1(iK)                (all iK−1 compatible with iK)
8.                    IK−1 ← ψK−1(IK, iK−1);
9.                    for each jK−1 s.t. AK−1[iK−1, jK−1] > 0
10.                       JK−1 ← ψK−1(JK, jK−1);
11.                       if JK−1 ≠ null then
12.                           aK−1 ← aK · AK−1[iK−1, jK−1];
                              ...
13.                           for each i1 ∈ S1(iK, ..., i2)   (all i1 compatible with iK, ..., i2)
14.                               I1 ← ψ1(I2, i1);
15.                               for each j1 s.t. A1[i1, j1] > 0
16.                                   J1 ← ψ1(J2, j1);
17.                                   if J1 ≠ null then
18.                                       a1 ← a2 · A1[i1, j1];
19.                                       yJ1 ← yJ1 + xI1 · a1;

note the need for EV+MDDs to index the state space
Actual Kronecker: ARwCl
                                       

Complexity of ARwCl:  O( Σ_{K≥k≥1} |S1| · · · |Sk| · α^k · log nk ) = O( |S| · α^K · log nK )

(assuming that |S1| · · · |SK−1| ≪ |S|)

Complexity of ARwCl+:  O( |S| · α · log nK ), regardless of k, and the resulting complexity of computing

y ← y + x · ( ⊗_{K≥k≥1} Ak )_{S,S}

using ARwCl+ is O( K · |S| · α · log nK )

Only a log nK overhead w.r.t. Ordinary for any sparsity level,
but it cannot be used in a Gauss-Seidel iteration
Results for the numerical solution

Matrix diagrams achieve the same efﬁciency as multiplication by blocks (ARwCl). . .

. . . but can provide access by columns, as required by Gauss–Seidel

Time requirements for the Kanban model (450MHz Pentium workstation with 384 Mbytes RAM)

                  number of       MXDs            Kronecker         Kronecker          Explicit
                    arcs       Gauss–Seidel      Gauss–Seidel        Jacobi          Gauss–Seidel
N       |S|        in R       Iters  sec/iter   Iters  sec/iter   Iters  sec/iter   Iters  sec/iter
2       4,600        28,120      40      0.11      55      0.17     134      0.09      55      0.02
3      58,400       446,400      67      1.46      97      2.56     240      1.34      97      0.34
4     454,475     3,979,850      99     12.33     149     23.69     370     11.99     149      3.04
5   2,546,432    24,460,016     139     73.09     214    147.70     527     74.09     214     18.51
6  11,261,376   115,708,992     185    336.21     289    723.30     713    359.15      —         —
7  41,644,800   450,455,040     238  1,289.91     374  2,922.80      —         —       —         —
