Modeling and Analysis of Markov Chains Using Decision Diagrams
Document Sample


Modeling and Analysis of Markov Chains
Using Decision Diagrams
Gianfranco Ciardo
Department of Computer Science and Engineering
University of California at Riverside
Riverside, CA 92521, USA
ciardo@cs.ucr.edu
(partially based on work with Andrew S. Miner and Paul L. E. Grieco)
2
Outline
• Background on CTMCs
◦ Continuous–time Markov chains and high–level CTMC models
◦ Matrix–vector description and operations for CTMC solution
◦ Sparse matrices
• Structured models
◦ High–level CTMC models structured into submodels
◦ State space and state indexing: potential vs. actual indices
◦ Multiway decision diagrams (MDDs)
• Decision–diagram–based CTMC encodings
◦ Multiterminal multiway decision diagrams (MTMDDs)
◦ Kronecker descriptors
◦ Matrix diagrams (MXDs)
◦ Edge–valued multiway decision diagrams (EVMDDs)
• Solution algorithms
◦ Vector–matrix multiplication with Kronecker–based encodings
◦ Jacobi vs. Gauss–Seidel style iterations
Background on CTMCs
4
Discrete-state models
A discrete state model is fully specified by:
• a potential state space S the “type” of the states
• a set of initial states S init ⊆ S often there is a single initial state sinit
b
• a next-state function N : S → 2S naturally extended to sets: N (X ) = i∈X N (i)
The state space S of the model is the smallest set containing S init and satisfying:
• the recursive definition i ∈ S ∧ j ∈ N (i) ⇒ j ∈ S
• or the fixed-point equation X = X ∪ N (X )
S = S init ∪ N (S init ) ∪ N 2 (S init ) ∪ N 3 (S init ) ∪ · · · = N ∗ (S init )
5
Definition of continuous-time Markov chain
A stochastic process {X(t) : t ≥ 0} is a collection of r.v.’s indexed by a time parameter t
We say that X(t) is the state of the process at time t
The possible values X(t) can ever assume for any t is (a subset of) the state space S
{X(t) : t ≥ 0} over a discrete S is a continuous-time Markov chain (CTMC) if
Pr {X(tn+1 ) = in+1 | X(tn ) = in ∧ X(tn−1 ) = in−1 ∧ . . . ∧ X(t0 ) = i0 }
= Pr {X(tn+1 ) = in+1 | X(tn ) = in }
for any times 0 ≤ t0 ≤ . . . ≤ tn−1 ≤ tn ≤ tn+1 and states {i0 , . . . , in−1 , in , jn+1 } ⊆ S
Markov property:
“given the present state, the future is independent of the past”
“the most recent knowledge about the state is all we need”
6
Markov chain description and analysis
A continuous-time Markov chain (CTMC) {X(t) : t ≥ 0} with state space S is described by
• its infinitesimal generator Q = R − diag(R · 1) = R − diag(h)−1 ∈ R|S|×|S|
• its initial probability vector π(0) ∈ R|S|
where
• R is the transition rate matrix: R[i, j] is the rate of going to state j when in state i
• h is the expected holding time vector : h[i] = 1/ j∈S R[i, j]
• π(0)[i] = Pr {chain is in state i at time 0, i.e., initially}
Transient probability vector π(t) ∈ R|S| : π(t)[i] = Pr {X(t) = i}
dπ(t)
• π(t) is the solution of = π(t) · Q with initial condition π(0)
dt
Steady-state probability vector π ∈ R|S| : π[i] = limt→∞ Pr {X(t) = i}
• π is the solution of π · Q = 0 subject to i∈S π[i] = 1 (Q must be ergodic)
7
Storing a CTMC explicitly
We need to store:
• R a real matrix of size |S| × |S|
• h a real vector of size |S|
We focus on R, for which we can employ sparse storage:
• Requires memory proportional to η(R), the number of nonzeros in R, instead of |S|2
• Allows iterative numerical methods to run in time proportional to η(R), instead of |S|2
0 1 2 3 4
α
0 1 0 α β 0 1 α 2 β
γ
ε 1 γ 1 0 γ
β 4 2 α 2 3 α
δ 3 γ δ 3 2 γ 4 δ
α
2 3 4 ε 4 0 ε
γ
8
Exploiting common entries with explicit storage
We can map the real values in R to (small) integer indices:
• Causes a small runtime overhead
• Reduces memory by a constant factor if R contains several entries with the same value
• If all nonzero entries have the same value (e.g., 1), this is essentially the reachability graph
α
0 1 0 1 α 2 β 0 10 2 1
γ
ε 1 0 γ 1 0 2
β 4 2 3 α 2 3 0
δ 3 2 γ 4 δ 3 2 2 43
α
2 3 4 0 ε 4 0 4
γ
0 1 2 3 4
α β γ δ ε
Structured models
10
Structured discrete-state models
A structured discrete state model is specified by
• a potential state space S = SK × · · · × S1 = ×K≥k≥1 Sk
◦ the “type” of the (global) state
◦ Sk is the (discrete) local state space for submodel k
• a set of initial states S init ⊆ S
◦ often there is a single initial state sinit
• a set of events E defining a disjunctively-partitioned next-state function
b
◦ N α : S → 2S j ∈ Nα (i) iff state j can be reached by firing event α in state i
b
◦ N : S → 2S is defined by N (i) = α∈E Nα (i)
◦ we can extend N to take sets of states as argument N (X ) = i∈X N (i)
◦ α is enabled in i iff Nα (i) = ∅, otherwise it is disabled
◦ i is absorbing, or a trap, or dead iff N (i) = ∅
Petri nets and their state space S : finite case
11
e fires p
a
q s
p p b c d p
es
d
fir
fir
a a a
es
c
es
q s q s r t q s
fir
e
b
a fires p
b c d b c d b c d
s
e
fir
a
c
r t r t d q s r t
es
fir
e e es e
fir
b
b c d
r t
e
init (N + 1)(N + 2)(2N + 3)
If the initial state is s = (N , 0, 0, 0, 0), S contains states
6
12
State indexing
Let the size of the k th local state space be nk = |Sk | and map Sk to {0, 1, . . . , nk −1}
S contains |S| = nK · nK−1 · · · n1 states, but not all of them are actually reachable
A potential indexing of a (global) state i = (iK , . . . , i1 ) is simply the mixed-base value of i:
ψ(i) = i k nl
K≥k≥1 k>l≥1
An actual indexing for the reachable states is instead harder to define, store, and compute
ψ : S → {0, 1, . . . , |S|−1} ∪ {null}
in reality, we often have no a priori knowledge of Sk
Explicit generation of S and R
13
Explore(in: S init , N ; out: S, R, ψ) is
1. n ← 0; state indices start at 0
2. S ← ∅; S contains the states explored so far
3. U ← S init ; U contains the unexplored states known so far
init
4. for each i ∈ S do
5. ψ(i) ← n++; assign to i the next available index and increment n
6. end for
7. while U = ∅ do
8. choose a state i in U and move it from U to S ;
9. for each event α ∈ E and each state j ∈ Nα (i) do
10. if j ∈ S ∪ U then search to determine whether j is a new state
11. ψ(j) ← n++; assign to j the next available index and increment n
12. U ← U ∪ {j}; remember to explore j later
13. end if;
14. R[ψ(i), ψ(j)] ← R[ψ(i), ψ(j)] + λα (i)∆α (i, j); ψ is used to index R
15. end for;
16. end while;
ψ : S → {0, ..., |S|−1} ∪ {null} is a state indexing function (e.g., discovery order)
λα (i) is the rate at which event α fires in state i
∆α (i, j) is the probability that, if event α fires in state i, the next state is j
14
(Quasi–reduced ordered) multiway decision diagrams
• Nodes are organized into K + 1 levels
◦ Level K contains only one root node
◦ Levels K −1 through 1 contain one or more nodes, NO DUPLICATES
◦ Level 0 contains only the two terminal nodes, 0 and 1 (false and true).
• For k > 0, a node at level k has |Sk | arcs pointing to nodes at level k−1
S4 = {0, 1, 2, 3} 0 1 2 3
S3 = {0, 1, 2} 0 1 2 0 1 2 0 1 2
S2 = {0, 1} 0 1 0 1 0 1 0 1
S1 = {0, 1, 2} 0 1 2 0 1 2 0 1 2
0 1
0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 3 3
S= 2 0 0 1 1 2 0 0 1 1 2 0 1 2 2 2 2 2 2
1 0 1 0 1 1 0 1 0 1 1 1 1 0 0 0 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 1 2
[Kam et al. 1998] defined fully–reduced ordered MDDs as an interface to BDDs
15
How to define and store the indexing function ψ using MDDs
S4 = {0, 1, 2, 3} 0 1 2 3
S3 = {0, 1, 2} 0 1 2 0 1 2 0 1 2
S2 = {0, 1} 0 1 0 1 0 1 0 1
S1 = {0, 1, 2} 0 1 2 0 1 2 0 1 2
0 1
0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 3 3
S= 2 0 0 1 1 2 0 0 1 1 2 0 1 2 2 2 2 2 2
1 0 1 0 1 1 0 1 0 1 1 1 1 0 0 0 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 1 2
To compute the index of a state, use edge–value offsets: 19 0 1 2 3
0 1 6 11
• Sum the offsets found on the corresponding path:
1 2 5 0 1 2 8 0 1 2
ψ(2, 1, 1, 0) = 6 + 2 + 1 + 0 = 9 0 0 2 4 0 1 2
• A state is unreachable if the path is not complete: 2 0 1 1 1 6 0 1
0 1 0 0 3
ψ(0, 2, 0, 0) = 0 + 0 + ? + ? = null
1 0 3 0 1 2
lexicographic, not discovery, order!!! 0 0 1 2
16
State indexing options: potential ψ vs. actual ψ
Once we know S :
b
• We can store the original N : S → 2S or its restriction N : S → 2S
• We can store R : S × S → R or R : S × S → R
• We can choose algorithms that use π : S → R or π : S → R
With strictly explicit methods: using actual R and π works best
With (even just partially) implicit methods, there are tradeoffs
• Storing π instead of π is often unavoidable if we employ a full vector
• Symbolic storage of R is much cheaper than that of R in terms of memory requirements
• However, using R in conjunction with π complicates indexing...
• ...at the very least, it forces us to store ψ : S → {0, 1, . . . , |S| − 1} ∪ {null}, hence S
17
Saturation: an iteration strategy based on the model structure
MDD node p at level k is saturated if the set of states it encodes is a fixed point w.r.t.
any α s.t. Top(α) ≤ k (thus, all nodes below p are also saturated)
• build the K -level MDD encoding of S init (if |S init | = 1, there is one node per level)
• saturate each node at level 1: fire in them all events α s.t. Top(α) = 1
• saturate each node at level 2: fire in them all events α s.t. Top(α) = 2
(if this creates nodes at level 1, saturate them immediately upon creation)
• saturate each node at level 3: fire in them all events α s.t. Top(α) = 3
(if this creates nodes at levels 2 or 1, saturate them immediately upon creation)
• ...
• saturate the root node at level K : fire in it all events α s.t. Top(α) = K
(if this creates nodes at levels K −1, K −2, . . . , 1, saturate them immediately upon creation)
states are not discovered in breadth-first order
18
Solution requirements: SMART vs. NuSMV (800MHz P-III)
Time and memory to generate S using saturation in SMART vs. breadth–first iterations in NuSMV
Final memory (kB) Peak memory (kB) Time (sec)
N |S| SMART NuSMV SMART NuSMV SMART NuSMV
Dining Philosophers: K=N
50 2.23×1031 18 10,800 22 10,819 0.15 5.9
200 2.47×10125 74 27,155 93 72,199 0.68 12,905.7
10,000 4.26×106269 3,749 — 4,686 — 877.82 —
Slotted Ring Network: K=N
10 8.29×109 4 5,287 28 10,819 0.13 5.5
15 1.46×1015 10 9,386 80 13,573 0.39 2,039.5
200 8.38×10211 1,729 — 120,316 — 902.11 —
Round Robin Mutual Exclusion: K =N +1
20 4.72×107 18 7,300 20 7,306 0.07 0.8
100 2.85×1032 356 16,228 372 26,628 3.81 2,475.3
300 1.37×1093 3,063 — 3,109 — 140.98 —
Flexible Manufacturing System: K = 19
10 4.28×106 16 1,707 26 11,238 0.05 9.4
20 3.84×109 55 14,077 101 31,718 0.20 1,747.8
250 3.47×1026 25,507 — 69,087 — 231.17 —
Decision–diagram–based CTMC encodings
20
Multiterminal multiway decision diagrams
From BDDs to MDDs: allow multiway choices at each nonterminal node, not just binary choices
From BDDs to MTBDDs: allow multiple terminal nodes, not just 0 and 1
From BDDs to MTMDDs combine both generalizations
We can use a quasi–reduced MTMDD to encode a real matrix A :S ×S →R
• Nodes are organized into 2K + 1 levels
◦ Variables {iK , iK−1 , ..., i1 , jK , jK−1 , ..., j1 } are mapped onto {2K, 2K −1, ..., 1}
◦ Let v(l) be the variable corresponding to level l ∈ {2K, 2K −1, ..., 1}
◦ Level 2K contains only one root node
◦ Levels 2K −1 through 1 contain one or more nodes, no duplicate nodes allowed
◦ Level 0 contains as many nodes as the different entries in A
• A node at level l > 0, with v(l) = ik or jk , has |Sk | arcs pointing to nodes at level l−1
A[i, j] = x ⇔ path labelled {iK , iK−1 , ..., i1 , jK , jK−1 , ..., j1 } leads to node x at level 0
21
MTMDDs encoding of the transition rate matrix
In principle, the mapping v can be any permutation of the 2K variables
In practice, the order (iK , jK , iK−1 , jK−1 , ..., i1 , j1 ) is usually the best choice
(we still need to decide a good order for the K “from-to” pairs)
When using MTMDDs to store the transition rate matrix, we have a choice:
• Store R : S × S → R (note that R[i, j] = 0 if i ∈ S and j ∈ S )
◦ a natural choice if we use a compositional approach
• Store R : S × S → R (actually, S × S → R, but R[i, j] = 0 if i ∈ S or j ∈ S )
◦ usually requires more MTMDD nodes
◦ can be built by enumerating the entries explicitly and storing them implicitly in an MTMDD
◦ or by zeroing the rows corresponding to S \ S in the MTMDD encoding of R
i.e., premultiplying R by a filtering diagonal matrix F[i, i] = 1 if i ∈ S , 0 if i ∈ S \ S
22
An example of MTMDD
S4 : {p1,p0} ≡ {0,1} S3 : {q 0 r0,q 1 r0,q 0 r1} ≡ {0,1,2} S2 : {s0,s1} ≡ {0,1} S1 : {t0,t1} ≡ {0,1}
p
R 0 1
a
q s 0 1 0 1
0 1 2 0 1 2 0 1 2
b c d
0 1 2 0 1 2 0 1 2 0 1 2 0 1 2
r t
e 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
µa µc µb µd µe
23
Definition of Kronecker product
Given K matrices Ak ∈ Rnk ×nk , their Kronecker product is
A= Ak ∈ RnK ···n1 ×nK ···n1
K≥k≥1
where
• A[i, j] = AK [iK , jK ] · AK−1 [iK−1 , jK−1 ] · · · A1 [i1 , j1 ]
• using the mixed-base numbering scheme (indices start at 0)
i = (...((iK ) · nK−1 + iK−1 ) · nK−2 · · · ) · n1 + i1 = ik · nl
K≥k≥1 k>l≥1
nonzeros: η Ak = η(Ak )
K≥k≥1 K≥k≥1
24
Kronecker product by example
b00 b01 b02
a00 a01
Given the real matrices A= and B = b10
b11 b12
a10 a11
b20 b21 b22
a00 b00 a00 b01 a00 b02 a01 b00 a01 b01 a01 b02
a00 b10 a00 b11 a00 b12 a01 b10 a01 b11 a01 b12
a00 B a01 B a00 b20 a00 b21 a00 b22 a01 b20 a01 b21 a01 b22
A⊗B= =
a10 B a11 B a10 b00 a10 b01 a10 b02 a11 b00 a11 b01 a11 b02
a10 b10 a10 b11 a10 b12 a11 b10 a11 b11 a11 b12
a10 b20 a10 b21 a10 b22 a11 b20 a11 b21 a11 b22
Kronecker product expresses contemporaneity or synchronization
If A and B are the transition probability matrices of independent discrete-time Markov chains
⇒ A ⊗ B is the transition probability matrix of their composition
25
Kronecker-consistent decomposition of a CTMC model
A decomposition of a discrete-state model describing a CTMC is Kronecker-consistent if:
• the potential transition rate matrix R is additively partitioned R= α∈E Rα
• S = SK × · · · × S1 , a global state i consists of K local states i = (iK , . . . , i1 )
• and, most importantly, we can multiplicatively partition each Rα , that is, we can write
λα (i) = λK,α (iK ) · · · λ1,α (i1 ) and ∆α (i, j) = ∆K,α (iK , jK ) · · · ∆1,α (i1 , j1 )
Rα = RK,α ⊗ · · · ⊗ R1,α
We encode the potential transition rate matrix R with |E| × K small matrices Rk,α ∈ Rnk ×nk
for stochastic Petri nets with transition rates depending on at most one
place, any partition of the places into K subsets is consistent
(even with inhibitor, reset, or probabilistic arcs)
26
Kronecker description of the transition rate matrix of a CTMC
• Parallel composition of K submodels with overall event set E (synchronizing vs. local)
• Global state i is a K -tuple (iK , ..., i1 ) of local states S ⊆ S = SK × · · · × S1
• Transition rate matrix R = R[S, S] where R = Rk,α
α∈E K≥k≥1
λk,α (ik ) · ∆k,α (ik , jk ) if α is in submodel k
• Rk,α [ik , jk ] = 1 if α is not in submodel k and ik = jk
0 if α is not in submodel k and ik = jk
encode a huge R with K · |E| “small” matrices
“On the stochastic structure of parallelism and synchronisation models for distributed algorithms”
Plateau (SIGMETRICS 1985)
factor K slowdown, still needs a probability vector of size |S|
“Complexity of memory-efficient Kronecker operations with applications to the solution of Markov mod-
els” Buchholz, Ciardo, Donatelli, Kemper (INFORMS J. Comp., 2000)
27
Kronecker encoding of R (K = 5)
S5 = ? S4 = ? S3 = ? S2 = ? S1 = ?
EVENTS →
L 0 0
E R5,a: ? I I I R5,e: ? p
V
1 0
E
L 0 0 0 a
R4,a: ? R4,b: ? R4,c: ? I I q s
S 0 0 1
↓
0 0 0
I R3,b: ? R3,c: ? I R3,e: ?
1 0 1 b c d
0 0
R2,a: ? I I R2,d: ? I
0 1 r t
e
0 0
I I I R1,d: ? R1,e: ?
0 10
we determine a priori from the model whether Rk,α = I
28
Kronecker encoding of R = α∈{a,b,c,d,e} 5≥k≥1 Rk,α
S5 : {p1,p0}≡{0,1} S4 : {q 0,q 1}≡{0,1} S3 : {r0,r1}≡{0,1} S2 : {s0,s1}≡{0,1} S1 : {t0,t1}≡{0,1}
EVENTS →
L a
0 γ5 0 0
E R5,a: I I I R5,e: e p
V
0 0 γ5 0
E a b
L 0 γ4 0 γ4 0 0 a
R4,a: R4,b: R4,c: c I I q s
S 0 0 0 0 γ4 0
↓ c
0 0 0 γ3 0 0
I R3,b: b R3,c: I R3,e: e
γ3 0 0 0 γ3 0 b c d
a
0 γ2 0 0
R2,a: I I R2,d: d I
0 0 γ2 0 r t
d e
0 γ1 0 0
I I I R1,d: R1,e: e
0 0 γ1 0
a a a
R[00i3 0i1 , 11i3 1i1 ] = γ5 · γ4 · γ2
a a
Not a canonical representation: changing to γ5 ·2 and γ4 /2 would describe the same R
29
Kronecker encoding of R (K = 4)
S4 = ? S3 = ? S2 = ? S1 = ?
EVENTS → p
L
E R4,a :? I I I R4,e :?
V a
E q s
L
S R3,a :? R3,b :? R3,c :? I R3,e :?
↓
b c d
R2,a :? I I R2,d :? I
I I I R1,d :? R1,e :? r t
e
The matrices for b and c differ only at level 3: we can merge them into a single (local) event l
we determine automatically from the model whether Rk,α = I
30
Kronecker encoding of R = α∈{a,l,d,e} 4≥k≥1 Rk,α
S4 : {p1,p0} ≡ {0,1} S3 : {q 0 r0,q 1 r0,q 0 r1} ≡ {0,1,2} S2 : {s0,s1} ≡ {0,1} S1 : {t0,t1} ≡ {0,1}
EVENTS →
0 µa 0 0 p
L R4,a : 4 I I R4,e: e
E 0 0 µ4 0
V a
E a
0 µ3 0 0 0 0 0 00 q s
L
S R3,a:0 0 0 R3,l:0 0 µc
3 I R3,e: 0 0 0
b
0 0 0 0 µ3 0 µe 0 0
3
↓
b c d
0 µa
2 0 0
R2,a: I R2,d: I
0 0 µd 0
2
r t
0 µd
1 0 0 e
I I R1,d: R1,e: e
0 0 µ1 0
We have merged b and c into a single (local) event l
The matrix R encoded by the Kronecker descriptor (K = 4)
31
00 00 00 00 00 00 11 11 11 11 11 11
00 00 11 11 22 22 00 00 11 11 22 22
p 00 11 00 11 00 11 00 11 00 11 00 11
01 01 01 01 01 01 01 01 01 01 01 01
• • • • •
0000• ·· ·· ·· ·· ·· ·· ·· ·· ·· a· ·· ··
a 0001 ·· ·· ·· ·· ·· ·· ·· ·· ·· ·a ·· ··
q s 0010 ·d ·· ·· ·· ·· ·· ·· ·· ·· ·· ·· ··
0011 ·· ·· ·· ·· ·· ·· ·· ·· ·· ·· ·· ··
0100 ·· ·· ·· ·· c· ·· ·· ·· ·· ·· ·· ··
0101 ·· ·· ·· ·· ·c ·· ·· ·· ·· ·· ·· ··
0110 ·· ·· ·d ·· ·· c· ·· ·· ·· ·· ·· ··
b c d 0111 ·· ·· ·· ·· ·· ·c ·· ·· ·· ·· ·· ··
0200 ·· ·· b· ·· ·· ·· ·· ·· ·· ·· ·· ··
0201 ·· ·· ·b ·· ·· ·· ·· ·· ·· ·· ·· ··
0210 ·· ·· ·· b· ·d ·· ·· ·· ·· ·· ·· ··
0211 ·· ·· ·· ·b ·· ·· ·· ·· ·· ·· ·· ··
r t
e 1000 ·· ·· ·· ·· ·· ·· ·· ·· ·· ·· ·· ··
1001 ·· ·· ·· ·· ·· ·· ·· ·· ·· ·· ·· ··
1010 ·· ·· ·· ·· ·· ·· ·d ·· ·· ·· ·· ··
{p1,p0 }≡ {0,1} 1011 ·· ·· ·· ·· ·· ·· ·· ·· ·· ·· ·· ··
1100 ·· ·· ·· ·· ·· ·· ·· ·· ·· ·· c· ··
1101• ·· ·· ·· ·· ·· ·· ·· ·· ·· ·· ·c ··
{q 0 r0,q 1 r0,q 0 r1 }≡ {0,1,2} 1110• ·· ·· ·· ·· ·· ·· ·· ·· ·d ·· ·· c·
1111 ·· ·· ·· ·· ·· ·· ·· ·· ·· ·· ·· ·c
{s0,s1 }≡ {0,1} 1200 ·· ·· ·· ·· ·· ·· ·· ·· b· ·· ·· ··
1201• e· ·· ·· ·· ·· ·· ·· ·· ·b ·· ·· ··
{t0,t1 }≡ {0,1} 1210• ·· ·· ·· ·· ·· ·· ·· ·· ·· b· ·d ··
1211 ·· e· ·· ·· ·· ·· ·· ·· ·· ·b ·· ··
32
Matrix diagrams (MXDs)
A generalization of the idea of Kronecker encoding of a matrix
Allows us to enforce knowledge of the reachable states S to the potential transition rate matrix R
0 0 0 0 0 0 1 1 1 1 1 1 2 2 2 2 2 2
An example of (non–canonical) MXD: 0 0 0 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1
0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2
000 70
001 90 10
R[001,210] = 5*2*9 = 90 002 20
010 56 40
R[111,211] = 2*(2*5+1*1) = 22
011 35 42 25 30
0 1 2 012 77 55
0 1 5 100
1 1 2 101
2 3 2 102
110 16 56 32 14
0 1 0 1 0 1 0 1 111 10 12 72 8 18 22 24
0 0 0 2 0 112 22 16 44 4
1 7 1 2 1 1 1 4 21 200 42 28
201 54 6 36 4
0 1 2 0 1 2 202 12 8
0 8 0 7 210 24 16
1 5 6 1 9 1 211 15 18 10 12
2 11 2 2 212 33 22
33
How to build an MXD
Two methods for building MXDs in our tool SMART:
• From the Kronecker encoding
◦ Build the matrices for the Kronecker encoding
◦ Insert the matrices in an MXD, removing duplicate (and redundant?) ones
◦ Since we start from a Kronecker encoding, we have the same limitations
◦ Similar memory requirements to a Kronecker encoding in practice
◦ We can use them to store R instead of R
• From an explicit enumeration of the entries
◦ Requires the use of canonical MXDs [Miner PNPM’01]
◦ Entries are added individually (or in batches for greater efficiency)
◦ Requires more time (additional overhead prior to numerical solution) and memory
◦ They are fully general, can encode any matrix, do not need Kronecker consistency
MXDs exploit the presence of identity matrices Rk,α
(skipped levels)
34
EVBDDs by example
[Lai et al. 1992] defined edge–valued binary decision diagrams
0 0 -3
i3 0 0 0 0 1 1 1 1 0 1 0 1 0 1
0 2 1 1 2 3
i2 0 0 1 1 0 0 1 1 0 1 0 1 0 1 0 1 0 1 0 1
0 3 0 -1 0 1 2 -1 2 2 3 -1
i1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 2 0 -1 -1 1 1 0 -1 1 2 1
f 02322410 T T T
Canonicity: all nodes have 0–value on the 0–arc (only the first EVBDD is canonical)
In canonical form, the root incoming edge has value f (0 · · · 0)
EV+MDDs by example
35
[Ciardo and Siminiceanu 2002] defined edge–valued positive multiway decision diagrams
From BDD to MDD: the usual extension
We allow ∞–edge values: can store partial arithmetic functions
New canonization rule: essential to encode partial arithmetic functions
0 0
i3 0 0 0 0 1 1 1 1 0 1 i3 0 0 0 0 1 1 1 1 0 1
0 0 0 0
i2 0 0 1 1 0 0 1 1 0 1 0 1 i2 0 0 1 1 0 0 1 1 0 1 0 1
0 2 2 0 0 3 4 0
i1 0 1 0 1 0 1 0 1 0 1 0 1 i1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 2 1 0 0 2 0 0 1 0
f 02322410 T f 0 2 3 ∞∞ 4 1 0 T
Canonicity: all edge values are non–negative and at least one is zero
In canonical form, the root incoming edge has value mini∈S f (i)
b
f (1, 0, 0) = ∞ but f (1, 0, 1) = 4
the traditional EVMDD normalization cannot represent this function
Definition of EV+MDDs
36
(differences from EVBDDs of [Lai et al. 1992] are in blue)
Given f : S → Z ∪ {∞}, an EV+MDD encoding f is a DAG with labelled edges such that:
• There is a single terminal node 0|T
• There is a single root K|r with a “dangling” incoming edge having value ρ ∈ Z ∪{∞}
• Non-terminal node k|p has nk edges, k|p [ik ].child has value k|p [ik ].val ∈ N ∪ {∞}
• If k|p [ik ].val = ∞, the value of k|p [ik ].child is irrelevant
• If k|p [ik ].val ∈ N, k|p [ik ].child is the index of a node at level k − 1
• Each non-terminal node has at least one outgoing edge labelled with 0 (if not all ∞)
• All nodes are unique (taking into account both k|p [ik ].child and k|p [ik ].val)
analogous definition uses real values instead of integers
Theorem: EV+MDDs are canonical
Using an EV+MDD to store the indexing function ψ
37
The “MDD with offsets” used to store and evaluate ψ can be formalized as an EV+MDD
0
19 0 1 2 3 0 1 2 3
0 1 6 11 0 16 11
1 2 5 0 1 2 8 0 1 2 0 1 2 0 1 2 0 1 2
0 0 2 4 0 1 2 0 0 2 4 0 1 2
is equivalent to
2 0 1 1 1 6 0 1 0 1 0 1 0 1
0 1 0 0 3 01 0 0 3
1 0 3 0 1 2 0 1 2 0 1 2
0 0 1 2 0 01 2
T
∈S
lexicographic order for i
ψ(i) = ∞ ⇔ i ∈ S
EV∗MDDs by example
38
One way to think about EV∗MDDs is “EV+MDD = − log(EV∗MDD)”:
0 ⇔ 1
edge values ∈ [0, +∞] ⇔ edge values ∈ [0, 1]
root incoming edge ∈ (−∞, +∞] ⇔ root incoming edge ∈ [0, +∞)
values add along the path ⇔ values multiply along the path
7
i3 0 0 0 0 1 1 1 1 0 1
1 1
i2 0 0 1 1 0 0 1 1 0 1 0 1
1 0.3 0.4 1
i1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0.5 1 1 1 1 0.2
f 3.5 7 2.1 0 0 2.8 7 1.4 T
Canonicity: all edge values are in [0, 1] and at least one is 1
In canonical form, the root incoming edge has value maxi∈S f (i)
b
Encoding R with an EV∗MDD: initial non-canonical EV∗MDDs
39
We can store R with a 2K –level EVMDD: consider the example of Kronecker encoding
R= α∈{a,l,d,e} Rα = α∈{a,l,d,e} 4≥k≥1 Rk,α
Ra 0 1 Rl 0 1 Rd 0 1 Re 0 1
0 1 0 1 0 1 0 1 0 1 0 1
µa4 µe
4
0 1 2 0 1 2 0 1 2 0 1 2
0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2
µa
3 µc
3 µb µe
3
3
0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1
µa2 µd
2
0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1
µd
1 µe
1
T T T T
note the shaded identity patterns!!!
Encoding R with an EV∗MDD: canonical EV∗MDDs
40
(assume that µb is the largest rate in R)
3
µa µa µa
4 3 2 µb
3 µd µd
2 1 µe µe µe
4 3 1
Ra 0 1 Rl 0 1 Rd 0 1 Re 0 1
0 1 0 1 0 1 0 1 0 1 0 1
0 1 2 0 1 2 0 1 2 0 1 2
µc /µb
3 3
0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2
0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1
T T T T
Encoding R with an EV∗MDD: final EV∗MDD
41
Use a recursive algorithm to compute R = α∈{a,l,d,e} Rα
µb
3
R 0 1
0 1 0 1
µa µa µa /µb
4 3 2 3 µe µe µe /µb
4 3 1 3
0 1 2 0 1 2 0 1 2
µdµd/µb
2 1 3 µc /µb
3 3
0 1 2 0 1 2 0 1 2 0 1 2 0 1 2
µdµd/µb
2 1 3 µdµd/µb
2 1 3
0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1
0 1 0 1
T
hidden identity patterns remain!!!
42
Empirical comparison
Memory consumption in bytes for:
S (MDD), R (Sparse), R (Kronecker), R and R (Pot/Act MXD), R and R (Pot/Act MTMDD)
Model N |S| |S| MDD Sparse Kron Pot Act Pot Act
MXD MXD MTMDD MTMDD
2
qn4 324 324 333 14256 772 586 722 22784 22784
6 38416 38416 499 2524480 3092 2494 2870 36864 36864
10 527076 527076 905 38524464 7076 5778 6522 62720 62720
qn8 2 6561 324 681 14256 1204 738 1688 43776 49152
6 5764801 38416 1119 2524480 2404 1674 5872 55040 70912
10 214358881 527076 1953 38524464 3604 2610 12040 66304 98560
mserv2 3 1485 495 705 23352 4124 3246 3952 34560 40704
6 6345 2115 3176 111408 17468 13998 16432 111104 135168
10 18495 6165 8846 342720 52228 42278 49032 306560 378460
mserv4 3 14256 495 1174 23352 5568 4098 4916 68864 79616
6 106596 2115 8453 111408 22920 17502 20054 254360 298856
10 488268 6165 33739 342720 67560 52342 58934 873896 998552
mserv6 3 32076 495 1333 23352 5724 4066 5316 86784 101376
6 239841 2115 8614 111408 23076 17470 20238 298596 347956
10 1098603 6165 33900 342720 67716 52310 59118 982396 1112684
Model N |S| |S| MDD Sparse Kron Pot Act Pot Act
MXD MXD MTMDD MTMDD
molloy4 5 4536 91 660 4204 1316 1148 2534 23552 28160
8 32805 285 1215 14676 2528 2300 5216 27648 38656
10 87846 506 1766 27104 3556 3288 7504 31232 47360
molloy5 5 7776 91 846 4204 1100 792 4298 28416 37120
8 59049 285 1545 14676 1592 1188 9356 31232 50944
10 161051 506 2223 27104 1920 1452 13778 33280 61952
kanban3 1 160 160 264 8032 500 412 544 18432 18432
3 58400 58400 937 5590400 7572 6786 8134 66816 67072
5 2546432 2546432 5646 303705920 45660 41816 48780 303776 303776
kanban4 1 256 160 332 8032 420 354 602 23552 24576
3 160000 58400 628 5590400 2500 2216 3284 44032 50176
5 9834496 2546432 1532 303705920 7940 7118 9950 92928 110592
kanban16 1 65536 160 1275 8032 2148 866 3000 95232 107520
3 Overflow 58400 1902 5590400 3236 1746 10566 115456 151808
5 Overflow 2546432 3149 303705920 4324 2626 24106 135168 216320
fms5 1 2100 84 535 3228 1456 604 1808 36096 40960
3 9432500 20600 3294 1554080 8304 5224 24320 151296 247040
5 2016379008 852012 30490 82727748 34484 24664 138244 654892 1255108
fms21 1 4194304 84 2050 3228 3132 1132 7396 126976 148224
3 Overflow 20600 6777 1554080 5028 2328 68762 176896 437760
5 Overflow 852012 22038 82727748 6924 3524 255988 235008 1393932
44
Conclusions
MTMDDs work best when many nonzero entries in R have the same value
Extensive presence of state–dependent rates can make MTMDDs inefficient
Kronecker, MXDs, and EV∗MDD remain instead efficient if the model is Kronecker–consistent
Size of Kronecker is not affected by variable ordering
MTMDDs, MXDs, EV∗MDDs, and MDDs are instead affected by variable ordering
Kronecker and MXDs are restricted to contiguous “(ik , jk )” or “(jk , ik )” variable ordering
MTMDDs and EV∗MDDs can instead have any ordering (but is this generality useful in practice?)
Kronecker is quite efficient but more restrictive
MXDs and EV∗MDDs are fully general and their size is similar to Kronecker, if it exists
MTMDDs are also fully general, their size tends to be larger than Kronecker, MXDs, and EV∗MDDs
MTMDDs, MXDs, and EV∗MDDs can encode R instead of R, but memory requirements increase
Kronecker must rely on an external (MDD) representation of S to zero unreachable rows
Fundamental advantage of Kronecker, MXDs over MTMDDs:
they exploit the presence of numerous identity matrices in the description of R (also
true when encoding just N , e.g., in symbolic model-checking)
As presented, EV∗MDDs do not exploit identities, but we are working on that
Solution algorithms
46
Potential Kronecker: the shuffle algorithm PSh
First algorithm to be proposed for Kronecker solution [Plateau SIGMETRICS 1985]
PSh computes y ← x · ⊗K≥k≥1 Ak
PSh + computes y ← x · InK ···nk+1 ⊗ Ak ⊗ Ink−1 ···n1
Based on the equality [Davio 1981]
Ak = ST K ···nk+1 ,nk ···n1 ) · (I|S|/nk ⊗ Ak ) · S(nK ···nk+1 ,nk ···n1 )
(n b
K≥k≥1 K≥k≥1
where S(a,b) ∈ {0, 1}a·b×a·b is an (a, b)-perfect shuffle permutation:
1 if j = (i mod a) · b + (i div a)
S(a,b) [i, j] =
0 otherwise
Requires
• K vector permutations and
• K multiplications x · (I|S|/nk ⊗ Ak ).
b
Complexity of the k -th multiplication: O(|S|/nk · η[Ak ]).
47
Potential Kronecker: the shuffle algorithm PSh
b b
PSh (in: nK , ..., n1 , AK , ..., A1 ; inout: x, y);
1. nleft ← 1;
2. nright ← nK−1 · · · n1 ;
3. for k = K down to 1
4. base ← 0;
5. jump ← nk · nright ;
6. if Ak = I then
7. for block = 0 to nleft − 1
8. for offset = 0 to nright − 1
9. index ← base + offset ;
10. for h = 0 to nk − 1
11. zh ← xindex ;
b
12. index ← index + nright ;
13. z ← z · Ak ;
14. index ← base + offset ;
15. for h = 0 to nk − 1
16. yindex ← zh ;
b
17. index ← index + nright ;
18. base ← base + jump ;
19. x ← y;
b b
20. nleft ← nleft · nk ;
21. nright ← nright /nk−1 ; Let n0 be 1
48
Example of shuffle computation
y ← x · (A ⊗ B) Follow the entries marked with a diamond to obtain y2
x A B x
a 00B a 01B T
S2,3
I3 A
A 0 0
a 10B a 11B u
0 A 0
y 0 0 A
y ← x · ST ·(I3 ⊗ A) ·S2,3 ·ST · (I2 ⊗ B) · S6,1
2,3 6,1
v
| {z }
S2,3
u I2 B
| {z } w
v
| {z } B 0
w
| {z }
y
y2 ← 0 B
B02 w0 +B12 w1 +B22 w2 =
B02 v0 +B12 v2 +B22 v4 =
y
B02 (u0 A00 +u1 A10 )+B12 (u2 A00 +u3 A10 )+B22 (u4 A00 +u5 A10 ) =
B02 (x0 A00 +x3 A10 )+B12 (x1 A00 +x4 A10 )+B22 (x2 A00 +x5 A10 ) =
A00 B02 x0+A00 B12 x1+A00 B22 x2+A10 B02 x3+A10 B12 x4+A10 B22 x5
49 +
Complexity of PSh and PSh
PSh has complexity O |S|/nk · η[Ak ] = O |S| · K · α
K≥k≥1
Even when S = S , PSh is faster than Ordinary explicit multiplication only if
1
K
|S| · K · α < |S| · α ⇔ α>K K−1
PSh + has complexity O |S|/nk · η[Ak ] = O |S| · α
Complexity of computing y ← y+x· K≥k≥1 Ak :
η[Ak ]
O |S|/nk · η[Ak ] = O |S| = O |S| · K · α
nk
K≥k≥1 K≥k≥1
Ordinary is faster than PSh if α ≤ 1
PSh + saves space, but not time, w.r.t. Ordinary
PRw and PRw +
50
Potential Kronecker:
b
PRwEl (in: i, x, nK , ..., n1 , AK , ..., A1 ; inout: y)
1. for each jK s.t. AK [iK , jK ] > 0
2. jK ← jK ; aK ← AK [iK , jK ];
3. for each jK−1 s.t. AK−1 [iK−1 , jK−1 ] > 0
4. jK−1 ← jK · nK−1 + jK−1 ; aK−1 ← aK · AK−1 [iK−1 , jK−1 ];
...
5. for each j1 s.t. A1 [i1 , j1 ] > 0
6. j1 ← j2 · n1 + j1 ; a1 ← a2 · A1 [i1 , j1 ];
7. yj ← yj + x · a1 ;
b 1
b 1
b b
PRw (in: x, nK , ..., n1 , AK , ..., A1 ; inout: y)
1. b
for i = 0 to |S| − 1
2. b b
PRwEl (i, xi , nK , ..., n1 , AK , .., A1 , y);
PRwEl + (in: nk , nk−1 · · · n1 , i− , ik , i+ , x, Ak ; inout: y)
k k b
1. >0
for each jk s.t. Ak [ik , jk ]
2. j ← (i− ·
k nk + jk ) · nk−1 · · · n1 + i+ ;
k
3. yj ← yj
b b + x · Ak [ik , jk ];
PRw + (in: x, nK · · · nk+1 , nk , nk−1 · · · n1 , Ak ; inout: y)
b b
1. for i ≡ (i− , ik , i+ ) = 0 to nK · · · nk+1 · nk · nk−1 · · · n1 − 1
k k
2. PRwEl + (nk , nk−1 · · · n1 , i− , ik , i+ , xi , Ak , y);
k k b b
51 +
Complexity of PRw and PRw
PRw computes y ← y + x · A, according to the definition of Kronecker product
Requires sparse row-wise format for each Ak
PRwEl computes the contribution of xi to each entry of y as
y ← y + xi · Ai,S
b
PRwEl reaches statement ak ← ak−1 · Ak [ik , jk ] O(αk ) times.
PRw makes |S| calls to PRwEl , hence has complexity
|S| · k O |S| · K = O(K · η[A]) if α ≤1
O α =
K≥k≥1 O(|S| · αK ) = O(η[A]) if α >1
+ η[Ak ]
PRw has complexity O |S| · = O |S| · α
nk
Complexity of computing y ← y+x· K≥k≥1 Ak using PRw + : O |S| · K · α
PRw amortizes the multiplications for aK−1 ,...,a2 only if α 1
PRw + saves space, but not time, w.r.t. Ordinary
PRwCl and PRwCl +
52
Potential Kronecker:
b b
PRwCl (in: x, nK , ..., n1 , AK , ..., A1 ; inout: y)
1. for iK = 0 to nK − 1
2. for each jK s.t. AK [iK , jK ] > 0
3. jK ← jK ; aK ← AK [iK , jK ];
4. for iK−1 = 0 to nK−1 − 1
5. for each jK−1 s.t. AK−1 [iK−1 , jK−1 ] > 0
6. . . . jK−1 ← jK · nK−1 + jK−1 ; aK−1 ← aK · AK−1 [iK−1 , jK−1 ];
7. for i1 = 0 to n1 − 1
8. for each j1 s.t. A1 [i1 , j1 ] > 0
9. j1 ← j2 · n1 + j1 ; a1 ← a2 · A1 [i1 , j1 ];
10.
1
yj ← yj + xi · a1 ;
b 1
b b
The overall complexity is O |S| · αK
PRwCl + (in: x, nK · · · nk+1 , nk , nk−1 · · · n1 , Ak ; inout: y)
b b
−
1. for ik = 0 to nK · · · nk+1 − 1
2. for ik = 0 to nk − 1
3. for each jk s.t. Ak [ik , jk ] > 0
4. jk ← i− · nk + jk ;
k
+
5. for ik = 0 to nk−1 · · · n1 − 1
6. jK ← jk · nk−1 · · · n1 + i+ ;
k
7. yjK ← yjK + x(i− ,i ,i+ ) · Ak [ik , jk ];
b b b
k k k
53
Potential Kronecker: PRwCl
a0 a1 b0 b1 c0 c1
x·A=x· ⊗ ⊗
a2 a3 b2 b3 c2 c3
PRw 1
a0 b0 c0
2
c1
3
b1 c0
4
c1
5
a1 b0 c0
6
c1
7
b1 c0
8
c1
a0
b0 b1
a1
b0 b1
PRwCl
1 2 5 6 17 18 21 22
9 10 11 12 13 14 15 16 c0 c1 c0 c1 c0 c1 c0 c1
a0 b0 c2 c3 b1 c2 c3 a1 b0 c2 c3 b1 c2 c3 3 4 7 8 19 20 23 24
c2 c3 c2 c3 c2 c3 c2 c3
17 18 19 20 21 22 23 24
a0 b2 c0 c1 b3 c0 c1 a1 b2 c0 c1 b3 c0 c1 b2 b3 b2 b3
9 10 13 14 25 26 29 30
c0 c1 c0 c1 c0 c1 c0 c1
25 26 27 28 29 30 31 32 11 12 15 16 27 28 31 32
a0 b2 c2 c3 b3 c2 c3 a1 b2 c2 c3 b3 c2 c3 c2 c3 c2 c3 c2 c3 c2 c3
33 34 35 36 37 38 39 40 a2 a3
a2 b0 c0 c1 b1 c0 c1 a3 b0 c0 c1 b1 c0 c1 b0 b1 b0 b1
33 34 37 38 49 50 53 54
c0 c1 c0 c1 c0 c1 c0 c1
41 42 43 44 45 46 47 48
a2 b0 c2 c3 b1 c2 c3 a3 b0 c2 c3 b1 c2 c3 35 36 39 40 51 52 55 56
c2 c3 c2 c3 c2 c3 c2 c3
49 50 51 52 53 54 55 56 b2 b3 b2 b3
a2 b2 c0 c1 b3 c0 c1 a3 b2 c0 c1 b3 c0 c1 41 42 45 46 57 58 61 62
c0 c1 c0 c1 c0 c1 c0 c1
57 58 59 60 61 62 63 64 43 44 47 48 59 60 63 64
a2 b2 c2 c3 b3 c2 c3 a3 b2 c2 c3 b3 c2 c3 c2 c3 c2 c3 c2 c3 c2 c3
Each “b” and “c” box corresponds to one multiplication: 8 × 8 = 64 entries of the form ai bj cl
• Computing each entry from scratch: 64 × 2 = 128 multiplications
• Using PRw : 64 + 32 = 96 multiplications
• Using PRwCl : 64 + 16 = 80 multiplications: interleaving helps!
same complexity as Ordinary regardless the sparsity level
but the entries of A are not generated in column order
54
Actual Kronecker: ARw
ARw (in: x, AK , ..., A1 , S ; inout: y)
1. for each i ∈S
2. I ← ψ(i);
3. for each jK s.t. AK [iK , jK ] > 0
4. aK ← AK [iK , jK ];
5. for each jK−1 s.t. AK−1 [iK−1 , jK−1 ] > 0
6. aK−1 ← aK · AK−1 [iK−1 , jK−1 ];
...
7. >0
for each j1 s.t. A1 [i1 , j1 ]
8. a1 ← a2 · A1 [i1 , j1 ];
9. J ← ψ(j);
10. yJ ← yJ + xI · a1 ;
= ψ(j) of state j in the array y.
Statement 9 computes the index J
k K
O (|S|·(K +log |S|)) if α ≤ 1
O |S|· α +α · log |S| =
K≥k≥1 O |S|·αK ·log |S| if α > 1
if K < log |S|: ARw has a log |S| overhead w.r.t. Ordinary
ARwCl and ARwCl +
55
Actual Kronecker:
ARwCl (in: x, AK , ..., A1 , S ; inout: y)
1. for each iK ∈ SK all local states iK
2. IK ← ψK (iK );
3. for each jK s.t. AK [iK , jK ] > 0
4. JK ← ψK (jK );
5. if JK = null then
6. aK ← AK [iK , jK ];
7. for each iK−1 ∈ SK−1 (iK ) all iK−1 compatible with iK
8. IK−1 ← ψK−1 (IK , iK−1 );
9. for each jK−1 s.t. AK−1 [iK−1 , jK−1 ] > 0
10. JK−1 ← ψK−1 (JK , jK−1 );
11. if JK−1 = null then
12. aK−1 ← aK · AK−1 [iK−1 , jK−1 ];
...
13. for each i1 ∈ S1 (iK , ..., i2 ) all i1 compatible with iK , ..., i2
14. I1 ← ψ1 (I2 , i1 );
15. for each j1 s.t. A1 [i1 , j1 ] > 0
16. J1 ← ψ1 (J2 , j1 );
17. if J1 = null then
18. a1 ← a2 · A1 [i1 , j1 ];
19. yJ1 ← yJ1 + xI1 · a1 ;
note the need for EV+MDDs to index the state space
56
Actual Kronecker: ARwCl
Complexity of ARwCl : O |S1 | · · · |Sk | · αk · log nk = O(|S| · αK · log nK )
K≥k≥1
(assuming that |S1 | · · · |SK−1 | |S|)
+
Complexity of ARwCl : O(|S| · α · log nK )
regardless of k , and the resulting complexity of computing
y ← y+x· Ak
K≥k≥1
S,S
+
using ARwCl is O (K · |S| · α · log nK )
only log nK overhead w.r.t. Ordinary for any sparsity level
but it cannot be used in a Gauss-Seidel iteration
57
Results for the numerical solution
Matrix diagrams achieve the same efficiency as multiplication by blocks (ARwCl ). . .
. . . but can provide access by columns as required by Gauss–Seidel
Time requirements for the Kanban model (450MHz Pentium workstation with 384 Mbytes RAM)
number MXDs Kronecker Explicit
N |S| of arcs Gauss–Seidel Gauss–Seidel Jacobi Gauss–Seidel
in R Iters sec/iter Iters sec/iter Iters sec/iter Iters sec/iter
2 4,600 28,120 40 0.11 55 0.17 134 0.09 55 0.02
3 58,400 446,400 67 1.46 97 2.56 240 1.34 97 0.34
4 454,475 3,979,850 99 12.33 149 23.69 370 11.99 149 3.04
5 2,546,432 24,460,016 139 73.09 214 147.70 527 74.09 214 18.51
6 11,261,376 115,708,992 185 336.21 289 723.30 713 359.15 — —
7 41,644,800 450,455,040 238 1,289.91 374 2,922.80 — — — —
Related docs
Get documents about "