					EFFICIENT ANALYTICAL
 MODELING OF DATA
 CACHE BEHAVIOUR

          By
          Japinder Singh Chawla
          Anil Kumar Gadgotra
                  Input-Output
INPUT: Any benchmark application and the cache
  parameters, e.g. line size and cache size.
OUTPUT: Memory performance estimates for
  different cache parameter values.

   Memory performance estimates include a
    quantification of the reuse in the program and
    the numbers of cache hits and cache misses
                  Modeling of the Program
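   Any memory reference
       $A[f_1(i_1, i_2, \dots)][f_2(i_1, i_2, \dots)] \dots [f_m(i_1, i_2, \dots)]$
 whose subscripts $f_1, \dots, f_m$ are affine functions of the loop variables,
 for any stride values $s_1, s_2, \dots, s_n$ and loop variable limits
 $(l_1,h_1), (l_2,h_2), \dots, (l_n,h_n)$, can be expressed as
        $A[a_1 i_1 + a_2 i_2 + \dots + a_n i_n + a]$
 with all stride values equal to 1 and loop variable limits
        $(1,N_1), (1,N_2), \dots, (1,N_n)$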
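A minimal Python sketch of this normalization, assuming row-major (C-style) storage, positive strides, and subscripts given as lists of affine coefficients over the original loop variables $j_k$. The function name `normalize_reference` and its argument layout are illustrative, not from the slides. It substitutes $j_k = l_k + (i_k - 1)s_k$ into each subscript and folds the dimensions together with row-major weights.

```python
from math import prod

def normalize_reference(dims, coeffs, consts, loops):
    """Rewrite A[f_1(j)]...[f_m(j)] as A[a_1*i_1 + ... + a_n*i_n + a].

    dims:   extents (D_1, ..., D_m) of A, assumed stored row-major
    coeffs: coeffs[d][k] = coefficient of original loop variable j_k in f_d
    consts: consts[d]    = constant term of f_d
    loops:  [(l_k, h_k, s_k), ...] original limits and positive strides
    Returns (a_coeffs, a_const, trip_counts) for normalized i_k in 1..N_k;
    offsets are counted in elements from element [0]...[0].
    """
    m, n = len(dims), len(loops)
    # Row-major weight of dimension d: product of the extents to its right.
    weights = [prod(dims[d + 1:]) for d in range(m)]
    # N_k: number of iterations of loop k.
    trips = [(h - l) // s + 1 for (l, h, s) in loops]
    a_coeffs = [0] * n
    a_const = 0
    for d in range(m):
        # Substitute j_k = s_k * i_k + (l_k - s_k) into subscript d.
        const_d = consts[d] + sum(coeffs[d][k] * (loops[k][0] - loops[k][2])
                                  for k in range(n))
        for k in range(n):
            a_coeffs[k] += coeffs[d][k] * loops[k][2] * weights[d]
        a_const += const_d * weights[d]
    return a_coeffs, a_const, trips

# b[j1][j2-1] in a 20x20 array, both loops from 1 to 20 with step 1.
print(normalize_reference((20, 20), [[1, 0], [0, 1]], [0, -1],
                          [(1, 20, 1), (1, 20, 1)]))
```

For that reference the sketch yields coefficients `[20, 1]`, constant `-1` and trip counts `[20, 20]`, i.e. the offset $20 i_1 + i_2 - 1$.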
 We generate a data structure for each cache line
   (or for each cache set if the associativity K > 1)
 The data structure records the memory accesses
   that map to that cache line
 The following example generates the data structure for
   cache line 3
            Modeling of the Program
L1=1 : for i1=1 to N1 step 1
     L2=1 : for i2=1 to N2 step 1
             αR = 1 : if (i2 ≥ 2)
                        Read b[i1][i2-1]
             αR = 2 : if ((i1 ≥ 2)&&(i1+i2 ≥ 10))
                        Read a[i1-1][i2]
      L2=2 : for i2=1 to N2 step 1
             αR = 1 : Read b[i1][i2]
L1=2 : for i1=1 to N1 step 1
      L2=1 : for i2=1 to N2 step 1
             αR = 1 : Read a[i1][i2]

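  Any memory access can be uniquely identified by its
   access vector
$(L_1, i_1, L_2, i_2, \dots, L_n, i_n, \alpha_R)$
 where $L_k$ is the label of the enclosing loop at depth k, $i_k$ is that loop's iteration and $\alpha_R$ is the label of the reference within the loop body.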
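The example program can be transcribed into Python to list every access together with its access vector $(L_1, i_1, L_2, i_2, \alpha_R)$. This is a sketch; the generator name `access_vectors` is hypothetical. Because the loop and reference labels are numbered in textual order, sorting the vectors lexicographically reproduces execution order, which the final assertion checks.

```python
def access_vectors(N1, N2):
    """Yield (access_vector, reference) in execution order for the example
    program; access_vector = (L1, i1, L2, i2, alpha_R)."""
    for i1 in range(1, N1 + 1):                      # L1 = 1
        for i2 in range(1, N2 + 1):                  # L2 = 1
            if i2 >= 2:                              # alpha_R = 1
                yield (1, i1, 1, i2, 1), "b[i1][i2-1]"
            if i1 >= 2 and i1 + i2 >= 10:            # alpha_R = 2
                yield (1, i1, 1, i2, 2), "a[i1-1][i2]"
        for i2 in range(1, N2 + 1):                  # L2 = 2
            yield (1, i1, 2, i2, 1), "b[i1][i2]"     # alpha_R = 1
    for i1 in range(1, N1 + 1):                      # L1 = 2
        for i2 in range(1, N2 + 1):                  # L2 = 1
            yield (2, i1, 1, i2, 1), "a[i1][i2]"     # alpha_R = 1

trace = list(access_vectors(20, 20))
vectors = [v for v, _ in trace]
assert vectors == sorted(vectors)   # lexicographic order == execution order
```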
Data structure for cache line 3:
  L1 = 2
    L2 = 1: a[i1][i2] (α=1)
  L1 = 1
    L2 = 2: b[i1][i2] (α=1)
    L2 = 1: a[i1-1][i2] (α=2), b[i1][i2-1] (α=1)
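The per-line data structure is a tree keyed by loop labels, with one leaf per reference. A minimal sketch using nested dictionaries, assuming each leaf is identified by its label path $(L_1, L_2, \alpha_R)$; the helper `build_tree` is hypothetical, and the four leaves are the references the figure places in cache line 3.

```python
def build_tree(leaves):
    """leaves: [(path, reference)], path = (L1, L2, ..., alpha_R) labels.
    Returns nested dicts: loop label -> subtree, alpha_R -> reference."""
    root = {}
    for path, ref in leaves:
        node = root
        for label in path[:-1]:                 # loop labels L1, L2, ...
            node = node.setdefault(label, {})
        node[path[-1]] = ref                    # alpha_R label -> reference
    return root

cache_line_3 = build_tree([
    ((2, 1, 1), "a[i1][i2]"),
    ((1, 2, 1), "b[i1][i2]"),
    ((1, 1, 2), "a[i1-1][i2]"),
    ((1, 1, 1), "b[i1][i2-1]"),
])
print(cache_line_3)
```

In a fuller version each leaf would also carry the reference's solution set and its MIN and MAX access vectors.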
                      Approach

   The cache equations are:
              Solving the Equation
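 In the example program, let N1 = N2 = 20, C = 32 and L = 4, and consider
  cache line 3 and the memory reference b[i1][i2-1] with @b = 4. With
  row-major storage and addresses counted in array elements, b[i1][i2-1]
  lies at 4 + 20(i1 - 1) + (i2 - 2) = 20i1 + i2 - 18, and cache line 3
  holds the cache offsets 8 to 11, so the equation is:
       $8 \le (20 i_1 \bmod 32) + (i_2 \bmod 32) - 18 < 12$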
Solutions for b[i1][i2-1] (α=1), where β1 = 20i1 mod 32 and β2 is the
matching range of i2:

   β1    i1             β2 = i2 range
   20    1, 9, 17       (6,9)
    8    2, 10, 18      (18,20)
   16    4, 12, 20      (10,13)
   24    6, 14          (2,5)
   12    7, 15          (14,17)

MIN = (1,6)                 MAX = (20,13)
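The solution set can be cross-checked by brute force: enumerate the iteration space, compute the address of b[i1][i2-1] and keep the iterations whose cache offset falls in line 3. A sketch, assuming row-major storage with 20 elements per row, addresses in elements, and 1-based cache line numbering (line 3 holds offsets 8 to 11, as in the equation above); the function name is hypothetical.

```python
def solve_cache_line(N1=20, N2=20, row=20, base=4, C=32, L=4, line=3):
    """Return the (i1, i2) iterations at which b[i1][i2-1] maps to `line`."""
    lo, hi = (line - 1) * L, line * L             # cache offsets held by `line`
    sols = []
    for i1 in range(1, N1 + 1):
        for i2 in range(2, N2 + 1):               # guard: i2 >= 2
            addr = base + row * (i1 - 1) + (i2 - 2)   # = 20*i1 + i2 - 18
            if lo <= addr % C < hi:
                sols.append((i1, i2))
    return sols

sols = solve_cache_line()
print(min(sols), max(sols), len(sols))   # (1, 6) (20, 13) 49
```

Python compares tuples lexicographically, so `min` and `max` give the MIN and MAX iteration vectors directly. Unlike the decomposed equation, this version applies the full `mod C` to the address, so it would also catch solutions that wrap around the cache; none occur in this example.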
 Inter-loop Reuse for Direct Mapped Cache

Conditions for inter-loop reuse between two loop nests
 L1 and L2 for a cache line l:

   There is no memory access to cache line l between
    the two loop nests.
   The last memory access of L1, with access vector a,
    and the first memory access of L2, with access vector b,
    access the same array and lie on the same memory line,
    i.e. $ml_{R_1}(a) = ml_{R_2}(b)$, where $ml_R(v)$ is the
    memory line accessed by reference R at access vector v.
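The two conditions translate into a small predicate. A sketch, assuming accesses are given as (array name, element address) pairs and that memory lines are aligned at multiples of L, so the memory line is `address // L`; the function name and argument layout are hypothetical.

```python
def interloop_reuse_direct(last_a, first_b, accesses_between, L):
    """Direct-mapped inter-loop reuse test for one cache line l.

    last_a:  (array, address) of the last access of nest L1 to line l
    first_b: (array, address) of the first access of nest L2 to line l
    accesses_between: number of accesses to line l between the two nests
    L: line size in elements
    """
    if accesses_between > 0:              # condition 1: line l untouched
        return False
    same_array = last_a[0] == first_b[0]
    same_line = last_a[1] // L == first_b[1] // L   # ml_R1(a) == ml_R2(b)
    return same_array and same_line

print(interloop_reuse_direct(("b", 40), ("b", 43), 0, 4))   # True
print(interloop_reuse_direct(("b", 40), ("b", 44), 0, 4))   # False
```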
Inter-loop Reuse for Direct Mapped Cache
Data structure for cache line 3, annotated with the MIN and MAX iteration
vectors (i1,i2) of each reference:
  L1 = 2
    L2 = 1: a[i1][i2] (α=1)       MIN=(2,9)  MAX=(20,4)
  L1 = 1
    L2 = 2: b[i1][i2] (α=1)       MIN=(1,5)  MAX=(20,12)
    L2 = 1: a[i1-1][i2] (α=2)     MIN=(3,9)  MAX=(19,12)
            b[i1][i2-1] (α=1)     MIN=(1,6)  MAX=(20,13)

For b[i1][i2-1]: MIN = B(1,6), MAX = B(20,13).
First access of loop nest L1 = 2: MIN = A(2,9).
Last access of loop nest L1 = 1: MAX = B(20,12).
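The nest-level first and last accesses follow from the per-reference MIN and MAX by comparing full access vectors $(L_1, i_1, L_2, i_2, \alpha_R)$ lexicographically. A sketch that takes the leaf annotations of the figure as input (the helper names are hypothetical). It recovers MIN = A(2,9) for nest L1 = 2 and MAX = B(20,12) for nest L1 = 1. Because the two accesses are to different arrays, the same-array condition for the direct-mapped case already fails for this pair.

```python
# Leaf annotations read off the figure: (L1, L2, alpha_R) -> (array, MIN, MAX),
# where MIN and MAX are (i1, i2) iteration vectors.
LEAVES = {
    (2, 1, 1): ("A", (2, 9), (20, 4)),    # a[i1][i2]
    (1, 2, 1): ("B", (1, 5), (20, 12)),   # b[i1][i2]
    (1, 1, 2): ("A", (3, 9), (19, 12)),   # a[i1-1][i2]
    (1, 1, 1): ("B", (1, 6), (20, 13)),   # b[i1][i2-1]
}

def full_vector(labels, iters):
    """Interleave (L1, L2, alpha_R) with (i1, i2) into (L1, i1, L2, i2, alpha_R)."""
    (l1, l2, alpha), (i1, i2) = labels, iters
    return (l1, i1, l2, i2, alpha)

def first_access(nest):
    """Array and (i1, i2) of the first access to the line in nest L1 = nest."""
    vec, array = min((full_vector(k, mn), arr)
                     for k, (arr, mn, _) in LEAVES.items() if k[0] == nest)
    return array, (vec[1], vec[3])

def last_access(nest):
    """Array and (i1, i2) of the last access to the line in nest L1 = nest."""
    vec, array = max((full_vector(k, mx), arr)
                     for k, (arr, _, mx) in LEAVES.items() if k[0] == nest)
    return array, (vec[1], vec[3])

a_array, _ = last_access(1)    # nest L1 = 1 runs first
b_array, _ = first_access(2)   # nest L1 = 2 runs second
print(last_access(1), first_access(2), a_array == b_array)
# ('B', (20, 12)) ('A', (2, 9)) False
```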
Inter-loop Reuse for K-way Set Associative Cache
Conditions for inter-loop reuse of the memory access
 vector $a_I$, $1 \le I \le K$, between two loop nests L1 and L2
 for a cache set s:

   There are no more than I-1 memory accesses to the
    cache set between the two loop nests that access
    distinct memory lines, none of which is $ml_R(a_I)$.
   Let the number of such accesses be J. Then the memory
    access vector $a_I$ is reused iff there exists a vector b
    among the first I-J minimum memory access vectors of L2
    that accesses the same array as $a_I$ and lies on the
    same memory line, i.e. $ml_{R_2}(b) = ml_{R_1}(a_I)$.
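On small cases the analytical conditions can be validated against a direct simulation of one K-way set. The slides do not name a replacement policy; the sketch below assumes LRU. It represents every access to the set by its memory-line number, and its function name is hypothetical. It reports which lines left in the set by nest L1 hit at their first access in nest L2. This is a cross-check technique, not the slides' method.

```python
from collections import OrderedDict

def lru_interloop_hits(lines_nest1, lines_between, lines_nest2, K):
    """Simulate one K-way LRU set. Return the memory lines left in the set
    by nest 1 that hit at their first access in nest 2."""
    cache = OrderedDict()            # memory line -> None, LRU first

    def touch(line):
        hit = line in cache
        if hit:
            cache.move_to_end(line)            # now most recently used
        else:
            if len(cache) == K:
                cache.popitem(last=False)      # evict least recently used
            cache[line] = None
        return hit

    for line in lines_nest1:
        touch(line)
    left_by_nest1 = set(cache)
    for line in lines_between:
        touch(line)
    reused, seen = set(), set()
    for line in lines_nest2:
        hit = touch(line)
        if line not in seen:                   # first access in nest 2
            seen.add(line)
            if hit and line in left_by_nest1:
                reused.add(line)
    return reused

print(lru_interloop_hits([1, 2, 3], [7], [3, 2], K=2))   # {3}
print(lru_interloop_hits([1, 2, 3], [7], [2, 3], K=2))   # set()
```

The second call shows why the order of L2's first accesses matters: touching line 2 first evicts line 3 before it can be reused.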
Solutions for b[i1][i2-1] (α=1) in the set-associative example:

   β1    i1             β2 = i2 range
   20    1, 9, 17       (14,20)
   28    3, 11, 19      (6,13)
   16    4, 12, 20      (18,20)
    4    5, 13          (2,6)
   24    6, 14          (10,17)
    0    8, 16          (2,9)

MIN = (1,14)               MAX = (20,20)

MIN lies on memory line 3, so the 2nd MIN is the first access that does not
lie on memory line 3: 2nd MIN = (1,18).
MAX lies on memory line 100, so the 2nd MAX is the last access that does not
lie on memory line 100: 2nd MAX = (19,13).
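The k-th MIN is the first access on the k-th distinct memory line in execution order, and the k-th MAX is the same scan run backwards. A sketch with hypothetical function names. The slides do not give the address mapping for this example, so apart from lines 3 and 100 the memory-line numbers in the usage data are made up for illustration.

```python
def kth_min(accesses, k):
    """accesses: [(vector, memory_line)] in execution order.
    Return the first vector on the k-th distinct memory line (k=1 -> MIN)."""
    seen = []
    for vec, line in accesses:
        if line not in seen:
            seen.append(line)
            if len(seen) == k:
                return vec
    return None                         # fewer than k distinct lines

def kth_max(accesses, k):
    """Last vector on the k-th distinct memory line counted from the end."""
    return kth_min(list(reversed(accesses)), k)

# Hypothetical memory lines, except 3 and 100, which come from the slide.
accesses = [((1, 14), 3), ((1, 15), 3), ((1, 17), 3), ((1, 18), 4),
            ((19, 13), 57), ((20, 18), 100), ((20, 20), 100)]
print(kth_min(accesses, 2), kth_max(accesses, 2))   # (1, 18) (19, 13)
```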
Inter-loop Reuse for K-way Set Associative Cache
Access sequence for cache line 3, built up one access at a time:

   v1 = (i11, i12)   Ref = A   A[f1(i11)][f2(i12)]
   v2 = (i21, i22)   Ref = A   A[f1(i21)][f2(i22)]
   v3 = (i31, i32)   Ref = B   B[f1(i31)][f2(i32)]
   v4 = (i41, i42)   Ref = B   B[f1(i41)][f2(i42)]
   v5 = (i51, i52)   Ref = A   A[f1(i51)][f2(i52)]

Grouped pairwise: (v1, v2) → A, (v3, v4) → B, (v5, v6) → A, …
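One reading of the pairwise grouping is that each pair bounds a run of consecutive accesses to the line by the same array. Under that assumption, the pairs can be built by collapsing the access sequence into (array, first vector, last vector) runs. The function name and the iteration vectors in the usage are hypothetical.

```python
from itertools import groupby

def group_runs(accesses):
    """accesses: [(array, vector)] in execution order.
    Return [(array, first_vector, last_vector)] for each maximal run of
    consecutive accesses to the same array."""
    runs = []
    for array, group in groupby(accesses, key=lambda acc: acc[0]):
        vecs = [vec for _, vec in group]
        runs.append((array, vecs[0], vecs[-1]))
    return runs

seq = [("A", (1, 1)), ("A", (1, 2)), ("B", (1, 3)),
       ("B", (2, 1)), ("A", (2, 2)), ("A", (2, 3))]
print(group_runs(seq))
# [('A', (1, 1), (1, 2)), ('B', (1, 3), (2, 1)), ('A', (2, 2), (2, 3))]
```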
Intra-loop Reuse for Direct Mapped Cache
                 Self-Spatial Reuse
   Self-spatial reuse occurs when a memory reference
    accesses the same memory line, and hence the same
    cache line, in different iterations

   Let the number of iteration vectors in an interval be A.
    The self-spatial reuse within that interval is
    $R_{int} = A - N_{miss}$, where $N_{miss}$ is the number of
    different memory lines brought into the cache line

   $N_{miss} = N_{r\_miss} + L/a_n$ if the group-spatial reuse
    with the preceding interval is zero; otherwise
    $N_{miss} = N_{r\_miss}$. Here $a_n$ is the coefficient of the
    innermost loop variable in the linearized reference, so one
    memory line covers $L/a_n$ iterations

   $N_{r\_miss} = A/(L/a_n)$ is the number of replacement
    misses
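To make the definition of $N_{miss}$ concrete, the sketch below counts it directly by simulating the single cache line over one interval, instead of using the closed form. It assumes addresses in elements, memory lines aligned at multiples of L, and that only the interval's reference touches the line during the interval. The function name and the `resident_line` parameter (standing in for group-spatial reuse with the preceding interval) are hypothetical.

```python
def self_spatial_reuse(addresses, line_size, resident_line=None):
    """Return (N_miss, R_int) for one interval of a direct-mapped cache line.

    addresses:     element addresses of the interval's A iterations, in order,
                   all mapping to this cache line
    line_size:     L, elements per memory line
    resident_line: memory line already held when the interval starts
                   (group-spatial reuse), or None
    """
    held = resident_line
    n_miss = 0
    for addr in addresses:
        line = addr // line_size       # memory line of this access
        if line != held:               # a new memory line is brought in
            n_miss += 1
            held = line
    return n_miss, len(addresses) - n_miss

# Offsets 8..11 of memory lines 2 and 10 both map to cache line 3 when C = 32, L = 4.
print(self_spatial_reuse([8, 9, 10, 11, 40, 41, 42, 43], 4))                   # (2, 6)
print(self_spatial_reuse([8, 9, 10, 11, 40, 41, 42, 43], 4, resident_line=2))  # (1, 7)
```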
Intra-loop Reuse for K-Way Set Associative Cache
              Group-Spatial Reuse
Conditions for group-spatial reuse of the memory access
 vector $a_I$, $1 \le I \le K$, between two intervals I1 and I2 for
 a cache set s:

   The memory references corresponding to the two
    intervals access the same array.

   The memory access vector $a_I$ is reused iff there exists
    a vector b among the first I minimum memory access
    vectors of I2 that lies on the same memory line,
    i.e. $ml_{R_2}(b) = ml_{R_1}(a_I)$.
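A direct transcription of these two conditions, as a sketch. It assumes the "first I minimum memory access vectors" of I2 are given through their memory lines, in order (MIN, 2nd MIN, …). The function name and argument layout are hypothetical.

```python
def group_spatial_reuse(array1, a_line, I, array2, i2_min_lines):
    """Group-spatial reuse test for a_I (1 <= I <= K) between intervals I1, I2.

    array1, a_line: array accessed by I1 and memory line of a_I
    array2:         array accessed by I2
    i2_min_lines:   memory lines of I2's MIN, 2nd MIN, ... vectors, in order
    """
    if array1 != array2:                      # condition 1: same array
        return False
    return a_line in i2_min_lines[:I]         # condition 2: among first I

print(group_spatial_reuse("A", 7, 2, "A", [6, 7, 8]))   # True
print(group_spatial_reuse("A", 7, 1, "A", [6, 7, 8]))   # False
```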
                Self-Spatial Reuse
   The self-spatial reuse within an interval is
    $R_{int} = A - N_{miss}$, where $N_{miss}$ is the number of
    different memory lines brought into the cache set.

   $N_{miss} = N_{r\_miss} + (K-m) \cdot L/a_n$, where m is the
    group-spatial reuse with the preceding interval.

   $N_{r\_miss} = A/(L/a_n)$ is the number of replacement
    misses

   The number of iteration vectors in the interval (A) is
    larger than for a direct-mapped cache, because of
    associativity.
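      Space and Time Complexity
   The approach includes the following steps:
1. Building the data structure in
    $O\left(\sum_{l}^{C} \sum_{refs}^{MAX\_REFS} \prod_{i=1}^{n} N_i\right)$
2. Computing inter-loop reuse in
    $O\left(K \sum_{l}^{C} \sum_{f}^{MAX\_FORS} \sum_{refs}^{MAX\_REFS} N\right)$
3. Computing intra-loop reuse in
    $O\left(K \sum_{l}^{C} \sum_{f}^{MAX\_FORS} \sum_{s}^{C/(L \times K)} N\right)$
So the time complexity of the approach is
    $O\left(K \sum_{l}^{C} \sum_{f}^{MAX\_FORS} \sum_{s}^{C/(L \times K)} N\right)$

   The space complexity of the approach is
    $O\left(\sum_{refs}^{MAX\_REFS} N\right)$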
THANKS

				