
EFFICIENT ANALYTICAL MODELING OF DATA CACHE BEHAVIOUR
By Japinder Singh Chawla and Anil Kumar Gadgotra

Input-Output

INPUT: Any benchmark application, together with cache parameters such as line size and cache size.
OUTPUT: Memory performance estimates for different values of the cache parameters. The estimates quantify the reuse in the program and count cache hits and cache misses.

Modeling of the Program

Any memory reference A[f1(i1,i2,…)][f2(i1,i2,…)]…[fn(i1,i2,…)] with affine subscript functions, loop strides s1, s2, …, sn and loop bounds (l1,h1), (l2,h2), …, (ln,hn) can be rewritten as $A[a_1 i_1 + a_2 i_2 + \dots + a_n i_n + a]$ with all strides equal to 1 and loop bounds (1,N1), (1,N2), …, (1,Nn).

A data structure is generated for each cache line (or for each cache set when the associativity K > 1). It records the memory accesses that map to that cache line. The following example builds the data structure for cache line 3.

  L1=1 : for i1=1 to N1 step 1
           L2=1 : for i2=1 to N2 step 1
                    αR = 1 : if (i2 ≥ 2) Read b[i1][i2-1]
                    αR = 2 : if ((i1 ≥ 2) && (i1+i2 ≥ 10)) Read a[i1-1][i2]
           L2=2 : for i2=1 to N2 step 1
                    αR = 1 : Read b[i1][i2]
  L1=2 : for i1=1 to N1 step 1
           L2=1 : for i2=1 to N2 step 1
                    αR = 1 : Read a[i1][i2]

Any memory access can be uniquely modeled by the access vector (L1, i1, L2, i2, …, Ln, in, αR), where Lk is the label of the loop at depth k, ik is the value of its index and αR is the position of reference R in the loop body. Because labels and positions are numbered in program order, comparing access vectors lexicographically gives their execution order.

Data structure for cache line 3:

  Cache line 3
    L1 = 1
      L2 = 1: b[i1][i2-1] (α=1), a[i1-1][i2] (α=2)
      L2 = 2: b[i1][i2] (α=1)
    L1 = 2
      L2 = 1: a[i1][i2] (α=1)

Approach

The cache equations are:

Solving the Equation

For the program above, let N1 = N2 = 20, C = 32 (cache size), L = 4 (line size) and @b = 4 (base address of b). With row-major layout, the reference b[i1][i2-1] accesses address 4 + 20(i1 - 1) + (i2 - 2) = 20i1 + i2 - 18, so its equation for cache line 3 (offsets 8 to 11) is:

$8 \le (20 i_1 \bmod 32) + (i_2 \bmod 32) - 18 < 12$

Writing β1 = 20i1 mod 32 and β2 = i2 mod 32 = i2, this reads $8 \le \beta_1 + \beta_2 - 18 < 12$. Here -16 ≤ β1 + i2 - 18 ≤ 30, so a further reduction modulo 32 never yields extra solutions. Each value of β1 fixes one range of i2 (ranges are inclusive):

Solutions for b[i1][i2-1] (α=1):

  β1 = 20i1 mod 32   i1           β2 = i2 range
  20                 1, 9, 17     (6, 9)
  8                  2, 10, 18    (18, 21)
  16                 4, 12, 20    (10, 13)
  24                 6, 14        (2, 5)
  12                 7, 15        (14, 17)

For β1 = 8 the loop bound i2 ≤ N2 = 20 limits the range to 18..20. The first and last iteration points (i1, i2) on cache line 3 are MIN = (1,6) and MAX = (20,13).

Inter-loop Reuse for Direct Mapped Cache

Conditions for inter-loop reuse between two loop nests L1 and L2 for a cache line l, where mlR(x) denotes the memory line accessed by reference R at access vector x:
1. There is no memory access to cache line l between the two loop nests.
2. The memory access vectors of the last memory access of L1, a, and of the first memory access of L2, b, access the same array and lie on the same memory line, i.e. mlR1(a) = mlR2(b).

Example for cache line 3 (extrema as iteration points (i1, i2)):

  Reference            Loop           MIN      MAX
  b[i1][i2-1] (α=1)    L1=1, L2=1     (1,6)    (20,13)
  a[i1-1][i2] (α=2)    L1=1, L2=1     (3,9)    (19,12)
  b[i1][i2] (α=1)      L1=1, L2=2     (1,5)    (20,12)
  a[i1][i2] (α=1)      L1=2, L2=1     (2,9)    (20,4)

The figure first marks MIN = B(1,6) and MAX = B(20,13), then MIN = A(2,9) and MAX = B(20,12). In the second step, B(20,12) is the last access to cache line 3 in loop nest L1 = 1, since the L2 = 2 loop runs after the L2 = 1 loop in every i1 iteration, and A(2,9) is the first access in loop nest L1 = 2. The two accesses touch different arrays, so condition 2 fails and there is no inter-loop reuse on cache line 3.

Inter-loop Reuse for K-way Set Associative Cache

Conditions for inter-loop reuse of the memory access vector aI, 1 ≤ I ≤ K, between two loop nests L1 and L2 for a cache set s:
1. There are no more than I - 1 memory accesses to the cache set between the two loop nests that access different memory lines, also different from mlR(aI). Let J be the number of such accesses.
2. Then aI is reused if and only if there exists a vector b among the first I - J minimum memory access vectors of L2 that accesses the same array as aI and lies on the same memory line, mlR2(b) = mlR1(aI).
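The cache-line-3 solution and the inter-loop reuse conditions above can be cross-checked by brute force on small examples. The Python sketch below enumerates the iteration space instead of solving the cache equations analytically, so it only illustrates what the equations compute. It assumes row-major layout, addresses counted in array elements, @b = 4 and 1-indexed cache lines of L = 4 elements (cache line 3 holds offsets 8 to 11 modulo C = 32), which is consistent with the equation above. For the inter-loop test it uses a hypothetical (array, memory_line) encoding of an access; all function names are illustrative. With K = I = 1 and nothing accessed between the nests, the test reduces to the direct-mapped conditions.

```python
from collections import defaultdict

# Example parameters from the slides. Layout assumptions: row-major arrays,
# addresses counted in array elements, 1-indexed cache lines.
N1 = N2 = 20   # loop bounds
C, L = 32, 4   # cache size and line size, in elements
BASE_B = 4     # @b, base address of array b


def addr_b_shifted(i1, i2):
    """Address of b[i1][i2-1] with 1-based indices, i.e. 20*i1 + i2 - 18."""
    return BASE_B + N2 * (i1 - 1) + (i2 - 2)


def on_cache_line(address, line):
    """True if `address` maps to 1-indexed cache line `line` (direct-mapped)."""
    return (line - 1) * L <= address % C < line * L


def solve_by_enumeration(line):
    """All (i1, i2), in execution order, where the guarded read of b[i1][i2-1]
    (only executed when i2 >= 2) touches cache line `line`."""
    return [(i1, i2)
            for i1 in range(1, N1 + 1)
            for i2 in range(2, N2 + 1)
            if on_cache_line(addr_b_shifted(i1, i2), line)]


def group_by_beta1(points):
    """Map beta1 = 20*i1 mod 32 to {i1: (first i2, last i2)}, like the table above."""
    table = defaultdict(dict)
    for i1, i2 in points:  # i2 increases within each i1
        beta1 = (N2 * i1) % C
        first, _ = table[beta1].get(i1, (i2, i2))
        table[beta1][i1] = (first, i2)
    return dict(table)


def interloop_reused(a, I, between, next_mins):
    """Hypothetical encoding of the K-way inter-loop test.

    a:         (array, memory_line) of the access vector a_I, with 1 <= I <= K
    between:   (array, memory_line) accesses to the set between the two nests
    next_mins: minimum access vectors of the second nest, in execution order
    """
    # J: number of different memory lines accessed in between, other than a's line.
    J = len({line for _, line in between if line != a[1]})
    if J > I - 1:
        return False
    # Reuse needs the same array and memory line among the first I - J minima.
    return a in next_mins[:I - J]


points = solve_by_enumeration(3)
print(min(points), max(points))    # (1, 6) (20, 13)
print(group_by_beta1(points)[20])  # {1: (6, 9), 9: (6, 9), 17: (6, 9)}

# Direct-mapped case: K = I = 1, no accesses in between. Last access of nest
# L1 = 1 is B(20,12), first of nest L1 = 2 is A(2,9); line numbers are placeholders.
print(interloop_reused(("b", 100), 1, [], [("a", 7)]))  # False
```

The memory-line numbers in the last call are placeholders, since the base address of array a is not given; only the array mismatch between B(20,12) and A(2,9) decides the result.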
Because the condition involves several minimum (and maximum) access vectors, the example below also determines the 2nd MIN and 2nd MAX, which must lie on memory lines different from those of MIN and MAX.

Solutions for b[i1][i2-1] (α=1):

  β1    i1           β2 = i2 range
  20    1, 9, 17     (14, 20)
  28    3, 11, 19    (6, 13)
  16    4, 12, 20    (18, 20)
  4     5, 13        (2, 6)
  24    6, 14        (10, 17)
  0     8, 16        (2, 9)

MIN = (1,14) and MAX = (20,20). MIN lies on memory line 3, so the 2nd MIN must not lie on memory line 3; MAX lies on memory line 100, so the 2nd MAX must not lie on memory line 100. This gives 2nd MIN = (1,18) and 2nd MAX = (19,13).

Figure: successive accesses to cache line 3. Reference A accesses it at iteration vectors v1 = (i11, i12) and v2 = (i21, i22) through A[f1(i11)][f2(i12)] and A[f1(i21)][f2(i22)], reference B at v3 = (i31, i32) and v4 = (i41, i42), and reference A again at v5 = (i51, i52). The accesses form the groups (v1, v2), (v3, v4), (v5, v6), … belonging to A, B, A, …

Intra-loop Reuse for Direct Mapped Cache

Self-Spatial Reuse

Self-spatial reuse occurs when a memory reference accesses the same cache line in different iterations. Let A be the number of iteration vectors in an interval.
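The grouping of one cache line's accesses into intervals, and the interval size A, can be sketched as follows. This assumes, as the figure suggests, that an interval is a maximal run of consecutive accesses (in execution order) issued by the same reference; the function name and the (reference, iteration_vector) tuple encoding are hypothetical.

```python
from itertools import groupby


def split_into_intervals(accesses):
    """Split (reference, iteration_vector) pairs for one cache line or set, given
    in execution order, into maximal runs issued by the same reference."""
    return [(ref, [vec for _, vec in run])
            for ref, run in groupby(accesses, key=lambda access: access[0])]


# Access sequence shaped like the figure: references A, A, B, B, A, A.
sequence = [("A", "v1"), ("A", "v2"), ("B", "v3"),
            ("B", "v4"), ("A", "v5"), ("A", "v6")]
for ref, vectors in split_into_intervals(sequence):
    print(ref, vectors, "size:", len(vectors))
# A ['v1', 'v2'] size: 2
# B ['v3', 'v4'] size: 2
# A ['v5', 'v6'] size: 2
```

The length of each run plays the role of A in the self-spatial reuse formula; `itertools.groupby` only merges adjacent equal keys, which is exactly the "consecutive accesses" reading assumed here.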
The self-spatial reuse within an interval of A iteration vectors is $R_{int} = A - N_{miss}$, where $N_{miss}$ is the number of different memory lines brought into the cache line:

$N_{miss} = N_{r\_miss} + L/a_n$ if the group-spatial reuse with the preceding interval is zero, and $N_{miss} = N_{r\_miss}$ otherwise,

where $N_{r\_miss} = A/(L/a_n)$ is the number of replacement misses and an is the coefficient of in in the normalized reference.

Intra-loop Reuse for K-Way Set Associative Cache

Group-Spatial Reuse

Conditions for group-spatial reuse of the memory access vector aI, 1 ≤ I ≤ K, between two intervals I1 and I2 for a cache set s:
1. The memory references corresponding to the two intervals access the same array.
2. The memory access vector aI is reused if and only if there exists a vector b among the first I minimum memory access vectors of I2 that lies on the same memory line, mlR2(b) = mlR1(aI).

Self-Spatial Reuse

The self-spatial reuse within an interval is $R_{int} = A - N_{miss}$, where $N_{miss}$ is the number of different memory lines brought into the cache set:

$N_{miss} = N_{r\_miss} + (K - m)\,L/a_n$, where m is the group-spatial reuse with the preceding interval and $N_{r\_miss} = A/(L/a_n)$ is the number of replacement misses.

Associativity increases the number of iteration vectors in an interval (A).

Space and Time Complexity

The approach consists of the following steps:
1. Building the data structure: $O\left(\sum_{l}^{C} \sum_{refs}^{MAX\_REFS} \prod_{i=1}^{N}\right)$
2. Computing inter-loop reuse: $O\left(K \sum_{l}^{C} \sum_{f}^{MAX\_FORS} \sum_{refs}^{MAX\_REFS} N\right)$
3. Computing intra-loop reuse: $O\left(K \sum_{l}^{C} \sum_{f}^{MAX\_FORS} \sum_{s}^{C/(L \times K)} N\right)$

So the time complexity of the approach is $O\left(K \sum_{l}^{C} \sum_{f}^{MAX\_FORS} \sum_{s}^{C/(L \times K)} N\right)$, the same bound as step 3. The space complexity of the approach is $O\left(\sum_{refs}^{MAX\_REFS} N\right)$.
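The intra-loop formulas for an interval, in both their direct-mapped and K-way forms, can be transcribed directly. The sketch below is a literal transcription under stated assumptions: a_n is read as the coefficient of the innermost index in the normalized reference, m as the amount of group-spatial reuse with the preceding interval, and the division in Nr_miss is left unrounded because the slides give no rounding rule. All function names are hypothetical.

```python
def nr_miss(A, L, a_n):
    """Replacement misses in an interval, transcribed as A / (L / a_n)."""
    return A / (L / a_n)


def n_miss(A, L, a_n, K=1, m=0):
    """Memory lines brought into the set: Nr_miss + (K - m) * L / a_n.
    K = 1 gives the direct-mapped form (m = 0 without group-spatial reuse)."""
    return nr_miss(A, L, a_n) + (K - m) * (L / a_n)


def self_spatial_reuse(A, L, a_n, K=1, m=0):
    """R_int = A - N_miss for one interval of A iteration vectors."""
    return A - n_miss(A, L, a_n, K, m)


def group_spatial_reused(array_1, line_a, I, array_2, min_lines_2):
    """Group-spatial test for a_I: both intervals use the same array and a_I's
    memory line is among those of the first I minimum vectors of the second."""
    return array_1 == array_2 and line_a in min_lines_2[:I]


# Direct-mapped (K = 1): without, then with, group-spatial reuse.
print(n_miss(16, 4, 1), self_spatial_reuse(16, 4, 1))            # 8.0 8.0
print(n_miss(16, 4, 1, m=1), self_spatial_reuse(16, 4, 1, m=1))  # 4.0 12.0
```

Setting K = 1 reproduces the direct-mapped case: m = 0 adds the extra L/a_n term, and m = 1 leaves N_miss = Nr_miss.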