Chris Ward Cache Memory

							What is it and why do we need it?

                                    Chris Ward
                                         CS147
                                    10/16/2008
What drives us to require cache?
How and why does it work?
   What we would prefer in our computer memory:
     Fast
     Large
     Cheap
However,
 Very fast memory = very expensive memory
 Since we need large capacity (today, multi-gigabyte
  memories), we need to build a system that is the best
  compromise to keep the total cost reasonable.
SRAM
 SRAM is faster than DRAM.
 SRAMs don't require external refresh circuitry or other work
  in order for them to keep their data intact.
DRAM
 DRAMs are smaller and less expensive, because SRAMs are
  made from four to six transistors (flip-flops) per bit.
 DRAMs use only one transistor, plus a capacitor, per bit.
   In the early days of PC technology, memory access
    was only slightly slower than register access.
   Since the 1980s, the performance gap between
    processor and memory has been growing.
   CPU speed has continued to double every few years,
    while disk and RAM speeds have improved far more
    slowly.
   For main memory (RAM), access time has dropped
    from 50 nanoseconds (a nanosecond is one billionth of a
    second) to <2 nanoseconds, a 25x improvement over a
    30-year period.
   [Chart: CPU Speed vs Memory Bus Speed, 1975 to 2008. Vertical axis in GHz (0.000 to 4.000). Series: CPU Speed (GHz), Memory Bus Speed (GHz).]
   It has been observed that programs spend about 90% of
    their execution time in only about 10% of their
    code!
   This is known as the Locality Principle
     Temporal Locality
       When a program asks for a location in memory, it will likely
        ask for that same location again very soon thereafter
     Spatial Locality
       When a program asks for a memory location at some
        address (let's say 1000), it will likely need nearby locations:
        1001, 1002, 1003, 1004, etc.
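
A rough way to see both kinds of locality is to look at the addresses touched by a simple loop over a small array. The sketch below is illustrative only: the base address 1000 echoes the example above, while the function names, the array length, and the `window` parameter are assumptions made up for this example.

```python
def loop_trace(base=1000, length=8, passes=3):
    """Addresses touched by scanning a small array several times."""
    return [base + i for _ in range(passes) for i in range(length)]


def locality_stats(trace, window=16):
    """Return (temporal, spatial) fractions for an address trace.

    temporal: share of accesses that repeat one of the last `window` addresses
    spatial:  share of accesses that hit the address right after the previous one
    """
    temporal = spatial = 0
    for i in range(1, len(trace)):
        recent = trace[max(0, i - window):i]
        if trace[i] in recent:
            temporal += 1
        if trace[i] == trace[i - 1] + 1:
            spatial += 1
    n = len(trace) - 1
    return temporal / n, spatial / n


trace = loop_trace()
temporal, spatial = locality_stats(trace)
print(f"temporal: {temporal:.2f}, spatial: {spatial:.2f}")
```

Even this tiny loop shows both effects: most accesses are to the next address, and after the first pass every access repeats a recently used address.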
Construct a Memory Hierarchy which
 tricks the CPU into thinking it has a very
 fast, large, cheap memory system.
For a 1 GHz CPU, one clock cycle lasts 1 ns,
so a 50 ns wait means 50 wasted
clock cycles.
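
The arithmetic behind that figure can be sketched as follows. The function name and the second example value (3.4 GHz) are assumptions chosen for illustration; only the 1 GHz / 50 ns case comes from the slide.

```python
def wasted_cycles(clock_ghz, wait_ns):
    """Clock cycles that pass while the CPU waits on memory.

    A clock of f GHz ticks f times per nanosecond, so the number of
    cycles lost is simply frequency (GHz) times wait (ns).
    """
    return clock_ghz * wait_ns


print(wasted_cycles(1, 50))            # the slide's example: 50 cycles
print(round(wasted_cycles(3.4, 50)))   # a faster CPU wastes more cycles per wait
```

The faster the CPU, the more cycles the same memory wait costs, which is why the gap matters more every year.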




   [Figure: Memory Hierarchy. Main memory and disk estimates: Fry’s ad, 10/16/2008]
   We established that the Locality Principle says that
    only a small amount of memory is needed for most
    of a program’s lifetime…
   We now have a Memory Hierarchy that places very
    fast yet expensive RAM near the CPU and larger,
    slower, cheaper RAM further away…
   The trick is to keep the data that the CPU
    wants in the small, expensive, fast memory
    close to the CPU… and how do we do that???
   Hardware and the Operating System are
    responsible for moving data throughout the Memory
    Hierarchy when the CPU needs it.
   Modern programming languages mainly assume two
    levels of memory, main memory and disk storage.
   Programmers are responsible for moving data
    between disk and memory through file I/O.
   Optimizing compilers are responsible for
    generating code that, when executed, will cause the
    hardware to use caches and registers efficiently.
   A cache algorithm (replacement policy) is a computer
    program or a hardware-maintained structure that is
    designed to manage a cache of information
   When the cache is full, the algorithm must
    choose which items to discard to make room for the
    new data
   The "hit rate" of a cache describes how often a
    searched-for item is actually found in the cache
   The "latency" of a cache describes how long after
    requesting a desired item the cache can return that
    item
Each replacement strategy is a compromise between hit rate and latency.
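
One common way to combine hit rate and latency into a single number is the textbook "average memory access time" formula: cache latency plus miss rate times the cost of going to main memory. This formula is not on the slides; it is added here as an aid, and the numbers below (2 ns cache, 50 ns main memory, 90% hit rate) are assumed values for illustration only.

```python
def average_access_time(hit_rate, cache_ns, memory_ns):
    """Average time per access, in nanoseconds.

    Every access pays the cache latency; misses also pay the
    main-memory latency on top of it.
    """
    miss_rate = 1.0 - hit_rate
    return cache_ns + miss_rate * memory_ns


for hit_rate in (0.80, 0.90, 0.99):
    t = average_access_time(hit_rate, cache_ns=2.0, memory_ns=50.0)
    print(f"hit rate {hit_rate:.0%}: about {t:.1f} ns per access")
```

A small gain in hit rate can outweigh a small increase in cache latency, which is the trade-off each replacement strategy has to balance.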

   Direct Mapped Cache
     The direct mapped cache is the simplest form of cache and the easiest to check for
      a hit.
     Unfortunately, the direct mapped cache also has the worst hit ratio, because
      there is only one place where any given address can be stored, so addresses
      that map to the same line keep evicting each other.
   Fully Associative Cache
     The fully associative cache has the best hit ratio because any line in the cache can
      hold any address that needs to be cached.
     However, this cache is slow and costly to search, because every line must be
      checked for a match.
     A replacement algorithm is needed, usually some form of LRU ("least recently
      used") algorithm.

   N-Way Set Associative Cache
     The set associative cache is a good compromise between the direct mapped and fully
      associative caches: each address maps to one set of N lines and can be stored in
      any line of that set.
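
The three organizations above can be sketched with one toy simulator, since direct mapped and fully associative are the two extremes of set associative. This is a minimal sketch under stated assumptions, not how real hardware is built: each line holds a single address (no multi-byte blocks or tags), replacement within a set is LRU, and the class name, method names, and the address trace are all made up for this example.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy cache with `num_lines` lines grouped into sets of `ways` lines.

    ways == 1          -> direct mapped
    ways == num_lines  -> fully associative
    """

    def __init__(self, num_lines, ways):
        if num_lines % ways != 0:
            raise ValueError("num_lines must be a multiple of ways")
        self.num_sets = num_lines // ways
        self.ways = ways
        # One OrderedDict per set; key order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Look up an address; return True on a hit, False on a miss."""
        lines = self.sets[address % self.num_sets]
        if address in lines:
            lines.move_to_end(address)   # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)    # evict the least recently used line
        lines[address] = True
        self.misses += 1
        return False

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Addresses 0 and 4 collide in a 4-line direct mapped cache (0 % 4 == 4 % 4).
trace = [0, 4] * 8

for ways in (1, 2, 4):
    cache = SetAssociativeCache(num_lines=4, ways=ways)
    for address in trace:
        cache.access(address)
    print(f"{ways}-way: hit rate {cache.hit_rate():.3f}")
```

With one way, the two addresses keep evicting each other and nothing ever hits; with two or more ways, both fit in the same set and only the first two accesses miss.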
What happens when we run out of main memory?
Our programs need more and more RAM!
   Virtual Memory is basically the extension of physical
    main memory (RAM) into a lower-cost portion of our
    Memory Hierarchy (let's say… the hard disk)
   A form of the Overlay approach, managed by the
    OS, called Paging is used to swap “pages” of
    memory back and forth between the disk and
    physical RAM.
   Hard disks are huge, but do you remember how slow
    they are??? Millions of times slower than the other
    memories in our pyramid.
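
Paging can be sketched as a lookup table from virtual page numbers to physical frames, with a "page fault" when the page is currently on disk. This is a simplified illustration, not an actual OS implementation: the 4096-byte page size is an assumed (though common) value, and the `translate` function, `PageFault` class, and the page table contents are hypothetical.

```python
PAGE_SIZE = 4096  # bytes per page; an assumed, commonly used size


class PageFault(Exception):
    """Raised when the requested page is not in physical RAM."""


def translate(virtual_address, page_table):
    """Map a virtual address to a physical address via a page table.

    `page_table` maps virtual page number -> physical frame number,
    or None when the page currently lives on disk.
    """
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:
        # A real OS would now read the page from disk into a free frame.
        raise PageFault(f"page {page} must be loaded from disk")
    return frame * PAGE_SIZE + offset


page_table = {0: 5, 1: None, 2: 7}   # hypothetical mapping

print(translate(100, page_table))     # page 0 lives in frame 5
try:
    translate(PAGE_SIZE + 8, page_table)
except PageFault as fault:
    print("page fault:", fault)
```

The expensive part is the fault path: every fault means a trip to the slow disk, so the OS tries to keep the pages in active use resident in RAM.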
CPU type             CPU speed  L1 cache speed    L1 cache size  L2 cache type  CPU/L2 speed ratio  L2 cache speed    L2 cache size  CPU bus speed  Memory bus speed
Pentium              233MHz     4.3ns (233MHz)    16K            onboard        -                   15ns (66MHz)      varies         66MHz          60ns (16MHz)
Pentium Pro          200MHz     5.0ns (200MHz)    32K            on-chip        1/1                 5ns (200MHz)      256K           66MHz          60ns (16MHz)
Pentium II           450MHz     2.2ns (450MHz)    32K            on-chip        1/2                 4.4ns (225MHz)    512K           100MHz         10ns (100MHz)
AMD K6-2             550MHz     1.8ns (550MHz)    64K            onboard        -                   10ns (100MHz)     varies         100MHz         10ns (100MHz)
AMD K6-3             450MHz     2.2ns (450MHz)    64K            on-die         1/1                 2.2ns (450MHz)    256K           100MHz         10ns (100MHz)
Duron                1.3GHz     0.77ns (1.3GHz)   128K           on-die         1/1                 0.77ns (1.3GHz)   64K            200MHz         5ns (200MHz)
Athlon               1.4GHz     0.71ns (1.4GHz)   128K           on-die         1/1                 0.71ns (1.4GHz)   256K           266MHz         3.8ns (266MHz)
Athlon XP            2.2GHz     0.45ns (2.2GHz)   128K           on-die         1/1                 0.45ns (2.2GHz)   512K           400MHz         2.5ns (400MHz)
Pentium III          1.4GHz     0.71ns (1.4GHz)   32K            on-die         1/1                 0.71ns (1.4GHz)   512K           133MHz         7.5ns (133MHz)
Celeron/370          1.4GHz     0.71ns (1.4GHz)   32K            on-die         1/1                 0.71ns (1.4GHz)   256K           100MHz         10ns (100MHz)
Celeron/478          2.4GHz     0.42ns (2.4GHz)   20K            on-die         1/1                 0.42ns (2.4GHz)   128K           400MHz         2.5ns (400MHz)
Pentium 4            3.6GHz     0.28ns (3.6GHz)   20K            on-die         1/1                 0.28ns (3.6GHz)   1M             800MHz         1.25ns (800MHz)
Quad-Core Xeon 5400  3.40 GHz   0.42ns (2.4GHz)   256K           on-die         1/1                 0.42ns (2.4GHz)   12MB           1600 MHz       1.25ns (800MHz)
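
The "ns (MHz)" pairs in the table are two views of the same quantity: the time for one clock cycle is the reciprocal of the clock frequency. A small sketch of that conversion follows; the function name is an assumption, and the sample frequencies are taken from the table.

```python
def cycle_time_ns(frequency_mhz):
    """Length of one clock cycle in nanoseconds.

    1 MHz is one million cycles per second, so one cycle takes
    1000 / f nanoseconds when f is given in MHz.
    """
    return 1000.0 / frequency_mhz


for mhz in (66, 233, 450, 1300):
    print(f"{mhz} MHz -> {cycle_time_ns(mhz):.2f} ns per cycle")
```

The results line up with the table entries for 66 MHz, 233 MHz, 450 MHz, and 1.3 GHz after rounding.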
   http://en.wikipedia.org/wiki/CPU_cache
   http://download.intel.com/pressroom/kits/IntelProcessorHistory.pdf
   http://processorfinder.intel.com/details.aspx?sSpec=SLBBD
   http://www.dba-oracle.com/t_history_ram.htm
   http://www.superssd.com/products/ramsan-400/indexb.htm
   http://www.pcguide.com/ref/ram/types_DRAM.htm
   http://en.wikipedia.org/wiki/Memory_hierarchy
   http://e-articles.info/e/a/title/Memory-Basics-~-ROM-DRAM-SRAM-Cache-Memory/
   http://en.wikipedia.org/wiki/Cache_algorithms

   http://www.pcguide.com/ref/mbsys/cache/funcWhy-c.html
m   (milli)   10^-3     k   (kilo)    10^3
µ   (micro)   10^-6     M   (mega)    10^6
n   (nano)    10^-9     G   (giga)    10^9
p   (pico)    10^-12    T   (tera)    10^12
f   (femto)   10^-15    P   (peta)    10^15
a   (atto)    10^-18    E   (exa)     10^18
z   (zepto)   10^-21    Z   (zetta)   10^21
                        Y   (yotta)   10^24

						