Chris Ward Cache Memory


What is it and why do we need it?
Chris Ward
CS147
10/16/2008
What drives us to require cache?
How and why does it work?
What we would prefer in our Computer Memory:
Fast
Large
Cheap
However,
very fast memory = very expen$ive memory.
Since we need large capacity (today, multi-gigabyte memories), we need to build a system
that is the best compromise for keeping the total $$ reasonable.
SRAM vs. DRAM
SRAM: SRAM is faster than DRAM. SRAMs don't require external refresh circuitry or other work in order to keep their data intact.
DRAM: DRAMs are smaller and less expensive, because SRAMs are made from four to six transistors per bit (forming a flip-flop), while DRAMs use only one transistor plus a capacitor.
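As a rough illustration of why the one-transistor DRAM cell wins on density and cost, the Python sketch below counts only the storage transistors needed for one mebibyte (2^20 bytes) of each kind of memory. It assumes the per-bit counts given above (four to six for SRAM, one for DRAM) and ignores the DRAM capacitor, sense amplifiers, and wiring, so it shows the ratio, not a real chip estimate.

```python
BITS_PER_MIB = 8 * 2**20  # one mebibyte expressed in bits


def storage_transistors(bits: int, transistors_per_bit: int) -> int:
    """Transistors needed for the storage cells alone (no capacitors or wiring)."""
    return bits * transistors_per_bit


sram_low = storage_transistors(BITS_PER_MIB, 4)   # 4-transistor SRAM cell
sram_high = storage_transistors(BITS_PER_MIB, 6)  # 6-transistor SRAM cell
dram = storage_transistors(BITS_PER_MIB, 1)       # 1 transistor (plus 1 capacitor)

print(f"SRAM: {sram_low:,} to {sram_high:,} transistors")
print(f"DRAM: {dram:,} transistors")
```

Per bit stored, SRAM needs four to six times as many transistors, which is why it is reserved for the small memories closest to the CPU.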
In the early days of PC technology, memory access was only slightly slower than register access.
Since the 1980s, the performance gap between processor and memory has been growing.
CPU speed continues to double every few years, while the speed of disk and RAM has not improved at nearly that rate.
For main memory (RAM), access time has dropped from 50 nanoseconds (a nanosecond is one billionth of a second) to
under 2 nanoseconds, a 25x improvement over a 30-year period.
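To see why the gap keeps widening, the 25x figure can be turned into a compounded yearly rate and compared with the CPU. The Python sketch below does that arithmetic; the 50 ns, 2 ns, and 30-year values come from the text, while the CPU doubling period of 2 years is an assumption standing in for "every few years".

```python
OLD_LATENCY_NS = 50.0  # main-memory access time at the start of the period
NEW_LATENCY_NS = 2.0   # main-memory access time at the end of the period
YEARS = 30


def improvement_factor(old_ns: float, new_ns: float) -> float:
    """How many times faster the new memory is than the old one."""
    return old_ns / new_ns


def annual_rate(factor: float, years: float) -> float:
    """Constant yearly speedup that compounds to `factor` over `years`."""
    return factor ** (1.0 / years)


factor = improvement_factor(OLD_LATENCY_NS, NEW_LATENCY_NS)
mem_per_year = annual_rate(factor, YEARS)
# Assumption (not from the slides): CPU speed doubles every 2 years.
cpu_per_year = annual_rate(2.0, 2)

print(f"memory: {factor:.0f}x total, {100 * (mem_per_year - 1):.1f}% per year")
print(f"CPU (assumed): {100 * (cpu_per_year - 1):.1f}% per year")
```

Under these assumptions memory improves by roughly 11% a year against roughly 41% for the CPU, so the gap compounds year after year.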
[Figure: CPU Speed vs. Memory Bus Speed (GHz), 1975 to 2008; series: CPU Speed (GHz) and Memory Bus Speed (GHz)]
It has been observed that programs spend about 90% of their
execution time in only about 10% of their code!
This is known as the Locality Principle
Temporal Locality
When a program asks for a location in memory, it will likely
ask for that same location again soon thereafter.
Spatial Locality
When a program asks for a memory location at a given
address (let's say 1000), it will likely soon need nearby locations:
1001, 1002, 1003, 1004, etc.
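The effect of both kinds of locality on a cache can be sketched by counting hits. The Python sketch below assumes an idealized cache that never runs out of room and that loads a whole block of 4 consecutive addresses on every miss (the block size is a hypothetical choice), so only the first touch of each block misses.

```python
def hit_rate(addresses, block_size):
    """Fraction of accesses that hit in an idealized cache with unlimited
    space that loads a whole block of block_size addresses on each miss."""
    loaded_blocks = set()
    hits = 0
    for addr in addresses:
        block = addr // block_size  # block that contains this address
        if block in loaded_blocks:
            hits += 1
        else:
            loaded_blocks.add(block)  # miss: bring the whole block in
    return hits / len(addresses)


BLOCK_SIZE = 4  # hypothetical: 4 consecutive addresses per block

sequential = list(range(1000, 1016))           # 1000, 1001, ..., 1015
strided = list(range(1000, 1064, BLOCK_SIZE))  # 1000, 1004, ..., 1060
repeated = sequential + sequential             # the same 16 addresses twice

print(hit_rate(sequential, BLOCK_SIZE))  # 0.75  (spatial locality)
print(hit_rate(strided, BLOCK_SIZE))     # 0.0   (no block is ever reused)
print(hit_rate(repeated, BLOCK_SIZE))    # 0.875 (spatial + temporal locality)
```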
Construct a Memory Hierarchy which
tricks the CPU into thinking it has a very
fast, large, cheap memory system.
For a 1 GHz CPU, one clock cycle lasts 1 ns,
so a 50 ns wait means 50 wasted
clock cycles.
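The same arithmetic in general form: stalled cycles equal the wait in nanoseconds times the clock rate in GHz (cycles per nanosecond). A small Python sketch; the 3 GHz clock in the second call is an assumed example, not from the slides.

```python
def stall_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Clock cycles spent waiting: latency (ns) times cycles per ns (GHz)."""
    return latency_ns * clock_ghz


print(stall_cycles(50, 1.0))  # 50.0: the 1 GHz example above
print(stall_cycles(50, 3.0))  # 150.0: the same wait on an assumed 3 GHz CPU
```

The faster the CPU, the more cycles the same memory wait costs, which is the gap the memory hierarchy tries to hide.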
[Figure: Main memory and disk estimates (source: Fry's ad, 10/16/2008)]
We established that the Locality Principle means only a small amount of memory is needed for most
of a program's lifetime.
We now have a Memory Hierarchy that places very fast yet expensive RAM near the CPU, and larger,
slower, cheaper RAM farther away.
The trick is to keep the data that the CPU wants in the small, fast, expensive memory
close to the CPU. How do we do that?
Hardware and the Operating System are
responsible for moving data throughout the Memory
Hierarchy when the CPU needs it.
Modern programming languages mainly assume two
levels of memory, main memory and disk storage.
Programmers are responsible for moving data
between disk and memory through file I/O.
Optimizing compilers are responsible for
generating code that, when executed, will cause the
hardware to use caches and registers efficiently.
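As one example of the kind of code a compiler can produce with caches in mind, consider the order in which a loop walks a two-dimensional array. The Python sketch below assumes the array is laid out row by row (row-major, as in C) and simply lists the memory offsets each loop order visits; the 4x4 size is hypothetical. Walking row by row touches consecutive addresses (good spatial locality), while walking column by column jumps a whole row on every step. Swapping the two loops, called loop interchange, is one optimization some compilers apply for this reason.

```python
ROWS, COLS = 4, 4  # hypothetical array dimensions


def offset(i: int, j: int) -> int:
    """Memory offset of element [i][j] when the array is stored row by row."""
    return i * COLS + j


# Inner loop over columns: visits consecutive offsets (stride 1).
row_order = [offset(i, j) for i in range(ROWS) for j in range(COLS)]
# Inner loop over rows: jumps COLS elements on every step (stride COLS).
col_order = [offset(i, j) for j in range(COLS) for i in range(ROWS)]

print(row_order[:8])  # [0, 1, 2, 3, 4, 5, 6, 7]
print(col_order[:8])  # [0, 4, 8, 12, 1, 5, 9, 13]
```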
Cache algorithm: a computer program or a hardware-maintained
structure that is designed to manage a cache of
information.
When the smaller cache is full, the algorithm must
choose which items to discard to make room for the
new data
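A minimal sketch of one such algorithm, least recently used (LRU), in Python. It uses `collections.OrderedDict` to remember the order in which keys were used; the class name, the `load` callback, and the capacity of 2 in the usage lines are illustrative assumptions, not part of any particular system.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that discards the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()  # iteration order = least to most recently used
        self.hits = 0
        self.misses = 0

    def access(self, key, load):
        """Return the value for key, calling load(key) to fetch it on a miss."""
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)     # mark as most recently used
            return self.data[key]
        self.misses += 1
        if len(self.data) >= self.capacity:
            self.data.popitem(last=False)  # cache full: discard the LRU entry
        value = load(key)
        self.data[key] = value
        return value


cache = LRUCache(capacity=2)
for key in ["A", "B", "A", "C", "B"]:
    cache.access(key, load=str.lower)      # stand-in for a slow fetch
print(cache.hits, cache.misses, list(cache.data))  # 1 4 ['C', 'B']
```

When "C" arrives, "B" is evicted because "A" was used more recently; when "B" returns, it evicts "A".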
The "hit rate" of a cache describes how often a
searched-for item is actually found in the cache
The "latency" of a cache describes how long after
requesting a desired item the cache can return that
item
Each replacement strategy is a compromise between hit rate and latency.
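One common way to weigh hit rate against latency, not taken from these slides, is the average memory access time: hit time + miss rate × miss penalty. The Python sketch below compares two hypothetical strategies: a simple one that answers in 1 ns but hits 90% of the time, and a smarter one that takes 2 ns but hits 95% of the time. Both use an assumed 50 ns miss penalty (the DRAM latency quoted earlier).

```python
def amat(hit_time_ns: float, hit_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time: every access pays the hit time, and the
    fraction that misses also pays the miss penalty."""
    return hit_time_ns + (1.0 - hit_rate) * miss_penalty_ns


MISS_PENALTY_NS = 50.0  # assumed: the 50 ns DRAM access quoted earlier

simple = amat(1.0, 0.90, MISS_PENALTY_NS)  # fast lookup, lower hit rate
smart = amat(2.0, 0.95, MISS_PENALTY_NS)   # slower lookup, higher hit rate

print(f"simple: {simple:.1f} ns, smart: {smart:.1f} ns")  # simple: 6.0 ns, smart: 4.5 ns
```

With these assumed numbers the slower, smarter strategy wins. With a much smaller miss penalty (say 10 ns) the simple one wins instead (2.0 ns vs. 2.5 ns).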
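Direct Mapped Cache
The direct mapped cache is the simplest form of cache and the easiest to check for
a hit, because there is only one place in the cache where any given address can be stored.
Unfortunately, for the same reason, the direct mapped cache also has the worst hit ratio:
two addresses that map to the same place keep evicting each other, even if the rest of the cache is unused.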
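A minimal Python sketch of a direct mapped cache, assuming 4 lines of 4 addresses each (hypothetical sizes). The address is split into an offset within the block, an index that selects the single line the block may occupy, and a tag that records which block is currently there, so checking for a hit is one comparison. The usage lines show the weakness: addresses 0 and 64 both map to line 0 and keep evicting each other.

```python
class DirectMappedCache:
    """Direct mapped cache: every block has exactly one line it may use."""

    def __init__(self, num_lines: int, block_size: int):
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines  # tag currently stored in each line
        self.hits = 0
        self.misses = 0

    def split(self, addr: int):
        """Split an address into (tag, index, offset)."""
        block, offset = divmod(addr, self.block_size)
        tag, index = divmod(block, self.num_lines)
        return tag, index, offset

    def access(self, addr: int) -> bool:
        """Look up addr; on a miss, overwrite the one line it maps to."""
        tag, index, _ = self.split(addr)
        if self.tags[index] == tag:  # a single comparison decides hit or miss
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag       # evict whatever block was in this line
        return False


cache = DirectMappedCache(num_lines=4, block_size=4)
print(cache.split(1001))          # (62, 2, 1)
for addr in [0, 64, 0, 64]:       # blocks 0 and 16 both map to line 0
    cache.access(addr)
print(cache.hits, cache.misses)   # 0 4
```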
Fully Associative Cache
The fully associative cache has the best hit ratio, because any line in the cache can
hold any address that needs to be cached.
However, this cache is costly to search: checking for a hit means comparing the requested
address against every line in the cache.
When the cache is full, a replacement algorithm, usually some form of LRU ("least recently used"),
chooses which line to discard.
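The same idea as a fully associative cache, again a Python sketch with hypothetical sizes (4 lines of 4 addresses). Any block can go in any line, so a lookup has to check every line; this sketch loops over the lines, whereas real hardware compares all tags in parallel at a cost in circuitry. Replacement is LRU, kept by ordering the list from least to most recently used. Addresses 0 and 64, which would conflict in a 4-line direct mapped cache with this block size, can now coexist.

```python
class FullyAssociativeCache:
    """Any block may occupy any line; a lookup must search every line."""

    def __init__(self, num_lines: int, block_size: int):
        self.num_lines = num_lines
        self.block_size = block_size
        self.lines = []  # stored block numbers, least recently used first
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        block = addr // self.block_size  # the whole block number acts as the tag
        for i, stored in enumerate(self.lines):  # search all lines
            if stored == block:
                self.hits += 1
                self.lines.append(self.lines.pop(i))  # now most recently used
                return True
        self.misses += 1
        if len(self.lines) == self.num_lines:
            self.lines.pop(0)  # LRU replacement: evict the oldest block
        self.lines.append(block)
        return False


cache = FullyAssociativeCache(num_lines=4, block_size=4)
for addr in [0, 64, 0, 64]:
    cache.access(addr)
print(cache.hits, cache.misses)  # 2 2
```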
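N-Way Set Associative Cache
The set associative cache is a good compromise between the direct mapped and fully
associative caches: the cache is divided into sets of N lines, each address maps to exactly one set
(as in a direct mapped cache), and it may be stored in any of that set's N lines (as in a fully associative cache).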
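An N-way set associative cache in the same style, as a Python sketch: the index picks one set, and only that set's N lines are searched, with LRU replacement inside the set. The sizes are hypothetical. Keeping the total at 4 lines, the 1-way configuration behaves like a direct mapped cache and the 4-way configuration (a single set) behaves like a fully associative one.

```python
class SetAssociativeCache:
    """N-way set associative cache with LRU replacement within each set."""

    def __init__(self, num_sets: int, ways: int, block_size: int):
        self.num_sets = num_sets
        self.ways = ways  # N: lines per set
        self.block_size = block_size
        # One list of tags per set, ordered least to most recently used.
        self.sets = [[] for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        block = addr // self.block_size
        tag, index = divmod(block, self.num_sets)  # index selects one set
        lines = self.sets[index]
        if tag in lines:              # search only this set's N lines
            self.hits += 1
            lines.remove(tag)
            lines.append(tag)         # now most recently used
            return True
        self.misses += 1
        if len(lines) == self.ways:
            lines.pop(0)              # set full: evict its LRU line
        lines.append(tag)
        return False


trace = [0, 64, 0, 64]
for num_sets, ways in [(4, 1), (2, 2), (1, 4)]:  # 4 lines in every case
    cache = SetAssociativeCache(num_sets, ways, block_size=4)
    for addr in trace:
        cache.access(addr)
    print(f"{ways}-way: {cache.hits} hits, {cache.misses} misses")
```

On this trace the 1-way cache reports 0 hits and 4 misses, while the 2-way and 4-way caches each report 2 hits and 2 misses.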
What happens when we run out of main memory?
Our programs need more and more RAM!
Virtual Memory is basically the extension of physical
main memory (RAM) into a lower-cost portion of our
Memory Hierarchy (let's say, the hard disk).
A form of the overlay approach managed by the
OS, called Paging, is used to swap fixed-size “pages” of
memory back and forth between the disk and
physical RAM.
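A toy sketch of the translation step behind paging, in Python. It assumes 4 KiB pages (a common size, not stated in the slides): a virtual address splits into a page number and an offset, and a page table maps the page to a physical frame. A page that is not yet in the table counts as a page fault, where a real OS would read the page in from disk; this sketch just hands out the next free frame and never evicts anything. The class and its names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class PageTable:
    """Toy page table mapping virtual page numbers to physical frames."""

    def __init__(self):
        self.frames = {}         # virtual page number -> physical frame number
        self.next_free_frame = 0
        self.page_faults = 0

    def translate(self, vaddr: int) -> int:
        """Turn a virtual address into a physical address."""
        page, offset = divmod(vaddr, PAGE_SIZE)
        if page not in self.frames:
            # Page fault: a real OS would read this page in from disk
            # (and might first evict another page to free a frame).
            self.page_faults += 1
            self.frames[page] = self.next_free_frame
            self.next_free_frame += 1
        return self.frames[page] * PAGE_SIZE + offset


pt = PageTable()
print(hex(pt.translate(0x12345)))  # page 0x12 gets frame 0: 0x345
print(hex(pt.translate(0x12FFF)))  # same page, no new fault: 0xfff
print(hex(pt.translate(0x5000)))   # page 0x5 gets frame 1: 0x1000
print(pt.page_faults)              # 2
```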
Hard disks are huge, but do you remember how slow
they are? They are millions of times slower than the other
memories in our pyramid.
CPU type | CPU speed | L1 cache speed | L1 cache size | L2 cache type | CPU/L2 speed ratio | L2 cache speed | L2 cache size | CPU bus speed | Memory bus speed
Pentium | 233MHz | 4.3ns (233MHz) | 16K | onboard | - | 15ns (66MHz) | varies | 66MHz | 60ns (16MHz)
Pentium Pro | 200MHz | 5.0ns (200MHz) | 32K | on-chip | 1/1 | 5ns (200MHz) | 256K | 66MHz | 60ns (16MHz)
Pentium II | 450MHz | 2.2ns (450MHz) | 32K | on-chip | 1/2 | 4.4ns (225MHz) | 512K | 100MHz | 10ns (100MHz)
AMD K6-2 | 550MHz | 1.8ns (550MHz) | 64K | onboard | - | 10ns (100MHz) | varies | 100MHz | 10ns (100MHz)
AMD K6-3 | 450MHz | 2.2ns (450MHz) | 64K | on-die | 1/1 | 2.2ns (450MHz) | 256K | 100MHz | 10ns (100MHz)
Duron | 1.3GHz | 0.77ns (1.3GHz) | 128K | on-die | 1/1 | 0.77ns (1.3GHz) | 64K | 200MHz | 5ns (200MHz)
Athlon | 1.4GHz | 0.71ns (1.4GHz) | 128K | on-die | 1/1 | 0.71ns (1.4GHz) | 256K | 266MHz | 3.8ns (266MHz)
Athlon XP | 2.2GHz | 0.45ns (2.2GHz) | 128K | on-die | 1/1 | 0.45ns (2.2GHz) | 512K | 400MHz | 2.5ns (400MHz)
Pentium III | 1.4GHz | 0.71ns (1.4GHz) | 32K | on-die | 1/1 | 0.71ns (1.4GHz) | 512K | 133MHz | 7.5ns (133MHz)
Celeron/370 | 1.4GHz | 0.71ns (1.4GHz) | 32K | on-die | 1/1 | 0.71ns (1.4GHz) | 256K | 100MHz | 10ns (100MHz)
Celeron/478 | 2.4GHz | 0.42ns (2.4GHz) | 20K | on-die | 1/1 | 0.42ns (2.4GHz) | 128K | 400MHz | 2.5ns (400MHz)
Pentium 4 | 3.6GHz | 0.28ns (3.6GHz) | 20K | on-die | 1/1 | 0.28ns (3.6GHz) | 1M | 800MHz | 1.25ns (800MHz)
Quad-Core Xeon 5400 | 3.40GHz | 0.42ns (2.4GHz) | 256K | on-die | 1/1 | 0.42ns (2.4GHz) | 12MB | 1600MHz | 1.25ns (800MHz)
SI prefixes:
m (milli) 10^-3 | k (kilo) 10^3
µ (micro) 10^-6 | M (mega) 10^6
n (nano) 10^-9 | G (giga) 10^9
p (pico) 10^-12 | T (tera) 10^12
f (femto) 10^-15 | P (peta) 10^15
a (atto) 10^-18 | E (exa) 10^18
z (zepto) 10^-21 | Z (zetta) 10^21
y (yocto) 10^-24 | Y (yotta) 10^24