# cache

```
EFFICIENT ANALYTICAL MODELING OF DATA CACHE BEHAVIOUR

By
Japinder Singh Chawla
Input-Output
INPUT: Any benchmark application and other
cache parameters, e.g. Line size, Cache size.
OUTPUT: Memory Performance Estimate for
different cache parameter values.

   Memory Performance Estimates include
quantification of reuse in the program, cache
hits and cache misses
Modeling of the Program
   Any memory reference
A[f1(i1,i2,...)][f2(i1,i2,...)]...[fn(i1,i2,...)]
with arbitrary stride values s1, s2, ..., sn and loop variable limits
(l1,h1), (l2,h2), ..., (ln,hn) can be expressed as
A[a1*i1 + a2*i2 + ... + an*in + a]
with all stride values equal to 1 and loop variable limits
(1,N1), (1,N2), ..., (1,Nn)
   We generate a data structure corresponding to each cache line
(or cache set if the associativity K > 1)
   The data structure contains information about the memory
accesses that map to that cache line
   The following example generates the data structure for the cache
line L=3
Modeling of the Program
L1=1 : for i1=1 to N1 step 1
    L2=1 : for i2=1 to N2 step 1
        αR = 1 : if (i2 ≥ 2) Read b[i1][i2-1]
        αR = 2 : if ((i1 ≥ 2)&&(i1+i2 ≥ 10)) Read a[i1-1][i2]
    L2=2 : for i2=1 to N2 step 1
        αR = 1 : Read b[i1][i2]
L1=2 : for i1=1 to N1 step 1
    L2=1 : for i2=1 to N2 step 1
        αR = 1 : Read a[i1][i2]

  Any memory access can be uniquely modeled by the
access vector
( L1 , i1 , L2 , i2 , …… , Ln , in , αR )
L=3
  L1 = 2
    L2 = 1
      a[i1][i2] (α=1)
  L1 = 1
    L2 = 2
      b[i1][i2] (α=1)
    L2 = 1
      a[i1-1][i2] (α=2)
      b[i1][i2-1] (α=1)
Approach

   The cache equations are:
Solving the Equation
L1=1 : for i1=1 to N1 step 1
    L2=1 : for i2=1 to N2 step 1
        αR = 1 : if (i2 ≥ 2) Read b[i1][i2-1]
        αR = 2 : if ((i1 ≥ 2)&&(i1+i2 ≥ 10)) Read a[i1-1][i2]
    L2=2 : for i2=1 to N2 step 1
        αR = 1 : Read b[i1][i2]
L1=2 : for i1=1 to N1 step 1
    L2=1 : for i2=1 to N2 step 1
        αR = 1 : Read a[i1][i2]
   In this program, let N1 = N2 = 20, C = 32 and L = 4. For cache line 3,
memory reference b[i1][i2-1] and @b = 4, the equation is:
8 ≤ (20*i1 mod 32) + (i2 mod 32) - 18 < 12
b[i1][i2-1] (α=1)

β1     i1            β2         i2
20     1, 9, 17      (6,9)      (6,9)
8      2, 10, 18     (18,21)    (18,21)
16     4, 12, 20     (10,13)    (10,13)
24     6, 14         (2,5)      (2,5)
12     7, 15         (14,17)    (14,17)

MIN = (1,6)                 MAX = (20,13)
Inter-loop Reuse for Direct Mapped Cache

Conditions for inter-loop reuse between two loop nests
L1 and L2 for a cache line l:

   There are no memory accesses to the cache line between
the two loop nests.
   The last memory access of L1, a, and the first memory
access of L2, b, access the same array and lie on the same
memory line,
i.e. mlR1(a) = mlR2(b).
Inter-loop Reuse for Direct Mapped Cache
L=3
  L1 = 2
    L2 = 1
      a[i1][i2] (α=1)       MIN=(2,9)   MAX=(20,4)
  L1 = 1
    L2 = 2
      b[i1][i2] (α=1)       MIN=(1,5)   MAX=(20,12)
    L2 = 1
      a[i1-1][i2] (α=2)     MIN=(3,9)   MAX=(19,12)
      b[i1][i2-1] (α=1)     MIN=(1,6)   MAX=(20,13)

MIN = B(1,6)                   MAX = B(20,13)
MIN = A(2,9)         MAX = B(20,12)
Inter-loop Reuse for K-way Set Associative Cache
Conditions for inter-loop reuse of the memory access
vector aI, 1 ≤ I ≤ K, between two loop nests L1 and L2
for a cache set s:

   There are no more than I-1 memory accesses to the
cache set between the two loop nests that access
different memory lines, all of them different from
mlR(aI).
   Let J be the number of such accesses. Then memory
access vector aI is reused iff there exists a vector b among
the first I-J minimum memory access vectors of L2
that accesses the same array as aI and lies on the same
memory line, i.e. mlR2(b) = mlR1(aI).
b[i1][i2-1] (α=1)

β1     i1            β2         i2
20     1, 9, 17      (14,20)    (14,20)
28     3, 11, 19     (6,13)     (6,13)
16     4, 12, 20     (18,20)    (18,20)
4      5, 13         (2,6)      (2,6)
24     6, 14         (10,17)    (10,17)
0      8, 16         (2,9)      (2,9)

MIN = (1,14)               MAX = (20,20)
Memory line of MAX = 100, so the 2nd MAX should not lie on memory line 100

Memory line of MIN = 3, so the 2nd MIN should not lie on memory line 3
2nd MIN = (1,18)         2nd MAX = (19,13)
Inter-loop Reuse for K-way Set Associative Cache
L=3

Vector             Ref   Memory reference
v1 = (i11, i12)    A     A[f1(i11)][f2(i12)]
v2 = (i21, i22)    A     A[f1(i21)][f2(i22)]
v3 = (i31, i32)    B     B[f1(i31)][f2(i32)]
v4 = (i41, i42)    B     B[f1(i41)][f2(i42)]
v5 = (i51, i52)    A     A[f1(i51)][f2(i52)]

Pairs: (v1,v2) -> A, (v3,v4) -> B, (v5,v6) -> A, ...
Intra-loop Reuse for Direct Mapped Cache
Self-Spatial Reuse
   Self-spatial reuse occurs when a memory reference
accesses the same cache line in different iterations.

   Let A be the number of iteration vectors in an interval.
The self-spatial reuse within that interval is
Rint = A - Nmiss, where Nmiss is the number of different
memory lines brought into the cache line.

   Nmiss = Nr_miss + L/an if the group-spatial reuse with the
preceding interval is zero; otherwise Nmiss = Nr_miss.

   Nr_miss = A / (L/an) is the number of replacement
misses.
Intra-loop Reuse for K-Way Set Associative Cache
Group-Spatial Reuse
Conditions for group-spatial reuse of the memory access
vector aI, 1 ≤ I ≤ K, between two intervals I1 and I2 for
a cache set s:

   The memory references corresponding to the two
intervals access the same array.

   The memory access vector aI is reused iff there exists
a vector b among the first I minimum memory access
vectors of I2 that lies on the same memory line, i.e.
mlR2(b) = mlR1(aI).
Self-Spatial Reuse
   The self-spatial reuse within an interval is
Rint = A - Nmiss, where Nmiss is the number of different
memory lines brought into the cache set.

   Nmiss = Nr_miss + (K-m)*L/an, where m is the group-spatial
reuse with the preceding interval.

   Nr_miss = A / (L/an) is the number of replacement
misses.

   The number of iteration vectors in the interval (A) will
increase because of associativity.
Space and Time Complexity
   The approach includes the following steps:
1. Building the data structure in
   O( Σ_{l}^{C} Σ_{refs}^{MAX_REFS} Π_{i=1} N )
2. Computing inter-loop reuse in
   O( K Σ_{l}^{C} Σ_{f}^{MAX_FORS} Σ_{refs}^{MAX_REFS} N )
3. Computing intra-loop reuse in
   O( K Σ_{l}^{C} Σ_{f}^{MAX_FORS} Σ_{s}^{C/(L×K)} N )
So the time complexity of the approach is
   O( K Σ_{l}^{C} Σ_{f}^{MAX_FORS} Σ_{s}^{C/(L×K)} N )

   The space complexity of the approach is
   O( Σ_{refs}^{MAX_REFS} N )

```
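
The worked example under "Solving the Equation" can be checked by brute force. The sketch below is not the analytical method from the slides: it enumerates every iteration vector of the reference b[i1][i2-1] and keeps those satisfying the slide's equation 8 ≤ 20i1 mod 32 + i2 mod 32 - 18 < 12, with N1 = N2 = 20 and the guard i2 ≥ 2. The constants 20, 18, 8 and 12 are copied from that equation as given; the function and variable names are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict

# Parameters of the slide's worked example.
N1, N2 = 20, 20
C = 32              # modulus used in the slide's equation
LOW, HIGH = 8, 12   # window for cache line 3, copied from the equation


def maps_to_line(i1, i2):
    """Literal form of the slide's equation:
    8 <= (20*i1 mod 32) + (i2 mod 32) - 18 < 12."""
    beta1 = (20 * i1) % C
    beta2 = i2 % C
    return LOW <= beta1 + beta2 - 18 < HIGH


def solve():
    """Group the matching iteration vectors of b[i1][i2-1] by beta1."""
    groups = defaultdict(list)
    for i1 in range(1, N1 + 1):
        for i2 in range(2, N2 + 1):  # guard from the program: i2 >= 2
            if maps_to_line(i1, i2):
                groups[(20 * i1) % C].append((i1, i2))
    return dict(groups)


groups = solve()
# Tuples sort lexicographically, which matches iteration order (i1, then i2).
vectors = sorted(v for vs in groups.values() for v in vs)
print("MIN =", vectors[0], "MAX =", vectors[-1])
for beta1, vs in groups.items():
    i1_values = sorted({i1 for i1, _ in vs})
    i2_values = sorted({i2 for _, i2 in vs})
    print(f"beta1={beta1}: i1={i1_values}, i2={i2_values[0]}..{i2_values[-1]}")
```

Under these assumptions the enumeration yields MIN = (1, 6) and MAX = (20, 13), and the same β1 groups as the slide (20, 8, 16, 24, 12). One difference: for β1 = 8 the enumeration stops at i2 = 20 because the loop bound is N2 = 20, whereas the slide lists the interval as (18,21).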