Computer Systems


Shared by: Lingjuan Ma
posted: 12/23/2011
pages: 21
Computer Systems



the impact of caches







University of Amsterdam





Arnoud Visser 1

Computer Systems – the impact of caches

Introduction

Different sorts of memory

• On-die: 0/1/10 cycles

• On-board: 100 cycles

• On-disk: 10,000 cycles

• Off-machine: 1,000,000 cycles
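
To see why these orders of magnitude matter, the average cost of an access can be estimated from the fraction of accesses that the fast level serves. The sketch below is only an illustration: the function name and the hit rates are assumptions, not part of the slides. The 1-cycle and 100-cycle figures are the on-die and on-board numbers from the list above.

```c
/* Average cycles per access when a fraction hit_rate of the accesses
 * costs hit_cycles and the remainder costs miss_cycles.
 * Hypothetical helper: a plain weighted average, not a full model. */
double average_access_cycles(double hit_rate, double hit_cycles,
                             double miss_cycles)
{
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles;
}
```

With an assumed 90% hit rate, average_access_cycles(0.90, 1.0, 100.0) gives about 10.9 cycles; with 99% it gives about 1.99 cycles. A small change in hit rate therefore changes the average cost considerably.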








The CPU-Memory Gap

• The increasing gap between disk, DRAM, SRAM, and CPU speeds.

[Figure: disk seek time, DRAM access time, SRAM access time, and CPU cycle time in ns (logarithmic axis, 1 to 100,000,000) plotted against year, 1980 to 2000.]


Storage Trends

bigger, not faster



Disk
metric              1980    1985    1990    1995    2000   2000:1980
$/MB                 500     100       8    0.30    0.05      10,000
access (ms)           87      75      28      10       8          11
typical size (MB)      1      10     160   1,000   9,000       9,000

DRAM
metric              1980    1985    1990    1995    2000   2000:1980
$/MB               8,000     880     100      30       1       8,000
access (ns)          375     200     100      70      60           6
typical size (MB)  0.064   0.256       4      16      64       1,000









(Culled from back issues of Byte and PC Magazine)


Processor trends

faster



SRAM
metric              1980    1985    1990    1995    2000   2000:1980
$/MB              19,200   2,900     320     256     100         190
access (ns)          300     150      35      15       2         100
typical size (MB): 0.008, 0.016, 0.032

Processor
metric              1980    1985    1990    1995    2000   2000:1980
processor           8080     286     386    Pent   P-III
clock rate (MHz)       1       6      20     150     750         750
cycle time (ns)    1,000     166      50       6     1.6         750
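
The clock rate and cycle time rows are related by a simple conversion, sketched below. The function name is an assumption made for this illustration.

```c
/* Cycle time in nanoseconds for a clock rate given in MHz:
 * 1 MHz is 10^6 cycles per second, so one cycle lasts 1000 / MHz ns. */
double cycle_time_ns(double clock_mhz)
{
    return 1000.0 / clock_mhz;
}
```

cycle_time_ns(20.0) is 50 ns and cycle_time_ns(6.0) is about 166.7 ns, in line with the table. For 750 MHz the conversion gives about 1.33 ns; the 1.6 ns listed for the P-III would correspond to a clock of about 625 MHz, so that column looks inconsistent with its own clock rate.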










Intel Processors Cache

processor               year         L1 (SRAM)   L2 (SRAM)
486                     1989-1994    8K          -
Pentium                 1993         8K  8K      -
Pentium Pro             1995-1999    8K  8K      256K-1M
Pentium II              1997         16K 16K     512K ½
Celeron A               1998         16K 16K     128K
Pentium III Coppermine  2000         16K 16K     256K
Pentium 4 Willamette    2000         12K 8K      256K
Pentium 4 Northwood     2002         12K 8K      512K


http://www.intel.com/pressroom/kits/quickreffam.htm


Memory Hierarchy

• Towards the top: smaller, faster, and costlier (per byte) storage devices

• Towards the bottom: larger, slower, and cheaper (per byte) storage devices

L0: Registers. CPU registers hold words retrieved from cache memory.

L1: On-chip L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache.

L2: Off-chip L2 cache (SRAM). L2 cache holds cache lines retrieved from memory.

L3: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks.

L4: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.

L5: Remote secondary storage (distributed file systems, Web servers)








Pay the price

• To access large amounts of data in a

cost-effective manner, the bulk of the

data must be stored on disk

– SRAM: 4 MB for ~$500

– DRAM: 1 GB for ~$200

– Disk: 80 GB for ~$110
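
The three price points are easier to compare per megabyte. The sketch below does that arithmetic; the function name is an assumption, and 1 GB is taken as 1024 MB.

```c
/* Price per megabyte for a device of the given capacity.
 * Hypothetical helper for comparing the three price points. */
double dollars_per_mb(double price_usd, double capacity_mb)
{
    return price_usd / capacity_mb;
}
```

dollars_per_mb(500.0, 4.0) is $125 per MB for SRAM, dollars_per_mb(200.0, 1024.0) is about $0.195 per MB for DRAM, and dollars_per_mb(110.0, 80.0 * 1024.0) is about $0.0013 per MB for disk. Per megabyte, SRAM is thus roughly 640 times as expensive as DRAM, and DRAM roughly 145 times as expensive as disk.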








Locality

• Principle of Locality:

– Programs tend to reuse data and instructions near

those they have used recently, or that were recently

referenced themselves.

– Temporal locality: Recently referenced items are

likely to be referenced in the near future.

– Spatial locality: Items with nearby addresses tend

to be referenced close together in time.








Locality Example

sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;

• Data

– Reference array elements in succession

(stride-1 reference pattern): Spatial locality



– Reference sum each iteration: Temporal locality

• Instructions

– Reference instructions in sequence: Spatial locality

– Cycle through loop repeatedly: Temporal locality


Power Programmer

• Claim: Being able to look at code and

get a qualitative sense of its locality is

a key skill for a professional

programmer.

• Good locality?

int sumarrayrows(int a[M][N])
{
    int i, j, sum = 0;

    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}






Stride-M example

• Question: Does this function have

good locality?

int sumarraycols(int a[M][N])
{
    int i, j, sum = 0;

    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}
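
One way to explore the question is to time both traversals on the same matrix. The sketch below repeats the two functions from the slides so that it is self-contained; the matrix size and the helpers fill and seconds_for are assumptions added for this illustration, not part of the lecture material.

```c
#include <time.h>

#define M 1024   /* rows (assumed size for the experiment) */
#define N 1024   /* columns (assumed size for the experiment) */

/* Row-wise sum: the inner loop walks along one row (stride 1). */
int sumarrayrows(int a[M][N])
{
    int i, j, sum = 0;

    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Column-wise sum: consecutive accesses are N ints apart. */
int sumarraycols(int a[M][N])
{
    int i, j, sum = 0;

    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}

/* Hypothetical helper: fill with small values so the sum cannot overflow. */
void fill(int a[M][N])
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = (i + j) % 7;
}

/* Hypothetical helper: CPU seconds spent on reps calls of f(a). */
double seconds_for(int (*f)(int a[M][N]), int a[M][N], int reps)
{
    volatile int sink = 0;   /* keeps the calls from being optimized away */
    clock_t start = clock();

    for (int r = 0; r < reps; r++)
        sink = f(a);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A caller would declare the matrix as static int a[M][N]; (it is too large for most stacks), call fill(a), and compare seconds_for(sumarrayrows, a, 10) with seconds_for(sumarraycols, a, 10). Both functions return the same sum; the timings depend on the machine, the matrix size, and the compiler flags, and an optimizing compiler may reorder the loops.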




Matrix M=2,N=3

int sumarrayrows()

Address         0    4    8   12   16   20
Contents      a00  a01  a02  a10  a11  a12
Access order    1    2    3    4    5    6

int sumarraycols()

Address         0    4    8   12   16   20
Contents      a00  a01  a02  a10  a11  a12
Access order    1    3    5    2    4    6
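
The two access orders follow directly from the row-major layout of C arrays. The sketch below computes the byte offsets in the order each traversal visits them; the function names are assumptions made for this illustration.

```c
#include <stddef.h>

#define M 2
#define N 3

/* Byte offset of a[i][j] from the start of int a[M][N] (row-major layout). */
size_t offset_of(size_t i, size_t j)
{
    return (i * N + j) * sizeof(int);
}

/* Offsets in the order a row-wise traversal (i outer, j inner) visits them. */
void row_order(size_t out[M * N])
{
    size_t k = 0;

    for (size_t i = 0; i < M; i++)
        for (size_t j = 0; j < N; j++)
            out[k++] = offset_of(i, j);
}

/* Offsets in the order a column-wise traversal (j outer, i inner) visits them. */
void col_order(size_t out[M * N])
{
    size_t k = 0;

    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < M; i++)
            out[k++] = offset_of(i, j);
}
```

With 4-byte ints, row_order yields 0, 4, 8, 12, 16, 20, while col_order yields 0, 12, 4, 16, 8, 20: the column-wise traversal jumps N * sizeof(int) = 12 bytes forward and then back again.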


Expect: Stride-1 is better!

– int A[2][4] (32 bytes)

[Figure: throughput (MB/s, axis 0 to 600) versus stride (words, 1 to 16).]






Reality:

small matrices fit in cache

– int A[32][32] (4 KB)

[Figure: throughput (MB/s, axis 0 to 5000) versus stride (words, 1 to 16).]






Reality:

The performance drop from L1 to L2 cache

is not dramatic

– int A[180][180] (128 KB)

[Figure: throughput (MB/s, axis 0 to 6000) versus stride (words, 1 to 16).]






Reality:

The penalty only becomes visible

when DRAM is accessed

– int A[512][512] (1 MB)

[Figure: throughput (MB/s, axis 0 to 1800) versus stride (words, 1 to 16).]






Memory Mountain

Pentium 4, 2.4 GHz

– 8 KB L1 d-cache

– 12 KB L1 i-cache

– 512 KB L2 cache

[Figure: memory mountain. Read throughput (MB/s, axis 0 to 5000) as a function of stride (words, s1 to s15) and working set size (bytes, 2k to 8m). Ridges of temporal locality are labeled L1, L2, and Mem; slopes of spatial locality run along the stride axis.]
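
A memory mountain is built from many measurements of one small read kernel, each with a different working set size and stride. The sketch below shows such a kernel and a rough timing wrapper. It is an illustration under stated assumptions, not necessarily the program used for these plots: the function names are hypothetical, and clock() is a coarse timer, so small working sets need many repetitions.

```c
#include <stddef.h>
#include <time.h>

/* Sum every stride-th element of data[0..elems). stride must be at least 1. */
long read_with_stride(const int *data, size_t elems, size_t stride)
{
    long result = 0;

    for (size_t i = 0; i < elems; i += stride)
        result += data[i];
    return result;
}

/* Rough read throughput in MB/s for reps passes over the data.
 * Returns 0.0 when the run is too short for clock() to resolve. */
double read_throughput(const int *data, size_t elems, size_t stride, int reps)
{
    volatile long sink = 0;   /* keeps the reads from being optimized away */
    clock_t start = clock();

    for (int r = 0; r < reps; r++)
        sink = read_with_stride(data, elems, stride);
    (void)sink;

    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    /* Number of elements actually touched per pass, times their size. */
    double bytes = (double)reps * (double)((elems + stride - 1) / stride)
                   * (double)sizeof(int);

    return seconds > 0.0 ? bytes / seconds / 1e6 : 0.0;
}
```

Calling read_throughput for working sets from 2 KB to 8 MB (elems = bytes / sizeof(int)) and strides 1 to 16 gives one sample per point of the surface. The working set size exercises temporal locality, and the stride exercises spatial locality.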










Summary

• As long as your data fits in the cache, and

your program shows good locality, good

performance is guaranteed.










Assignment



• Practice Problem 6.9 (p. 624):

'Order three functions with respect to the spatial locality

enjoyed by each.'

• Practice Problem 6.22 (p. 659):

'Estimate the time, in CPU cycles, to read an 8-byte

word, from the different L1-d of an i7 processor'




