Caches are small, fast memories located between the CPU and main memory. By buffering
instructions and data, caches speed up program execution while remaining completely transparent
to the programmer.

In general, caches are organised in hierarchies, where the speed and the production cost decrease
with the cache level, while the size increases, as shown in Figure 2.1. Caches exploit the spatial
and temporal locality of data: if a datum has been requested by the CPU, that datum and the data
close to it will probably be needed again soon. Therefore, a whole cache line is loaded at a time. Requests
coming from the processor can either cause a cache hit or a cache miss. The term “hit” means
that the requested data is valid in the cache, so the cache can fulfill the processor’s request. In
contrast, a “miss” indicates that the requested data is not present in the cache. Cache misses can
be classified into four categories: conflict, compulsory, capacity, and coherence. Conflict misses
are misses that would not occur if the cache were fully associative with LRU replacement.
Compulsory misses are misses required in any cache organization because they are the first
references to an instruction or piece of data. Capacity misses occur when the cache size is not
sufficient to hold data between references. Coherence misses are misses that occur as a result of
invalidation to preserve multiprocessor cache consistency.
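This classification can be sketched with a small simulator (a hypothetical illustration in Python, not the tool used in the experiment; the 16-byte line size is an assumption). A direct-mapped cache is simulated alongside a fully associative LRU shadow cache of the same capacity: a first-ever reference is a compulsory miss, a miss that would have hit under full associativity is a conflict miss, and any remaining miss is a capacity miss. Coherence misses are omitted, since they only arise with multiple caches.

```python
from collections import OrderedDict

LINE = 16  # assumed bytes per cache line

def classify_misses(addresses, num_lines):
    """Classify accesses to a direct-mapped cache of `num_lines` lines.

    A fully associative LRU cache of the same size is simulated in
    parallel: a direct-mapped miss that would have hit under full
    associativity is a conflict miss; a first-ever reference is a
    compulsory miss; everything else is a capacity miss.
    """
    direct = [None] * num_lines            # index -> cached block tag
    lru = OrderedDict()                    # fully associative shadow cache
    seen = set()                           # blocks referenced so far
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}
    for addr in addresses:
        block = addr // LINE
        lru_hit = block in lru             # would full associativity hit?
        if lru_hit:
            lru.move_to_end(block)
        else:
            if len(lru) >= num_lines:
                lru.popitem(last=False)    # evict least recently used
            lru[block] = True
        idx = block % num_lines            # direct-mapped index
        if direct[idx] == block:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1      # first reference ever
        elif lru_hit:
            counts["conflict"] += 1        # avoidable with associativity
        else:
            counts["capacity"] += 1        # misses even fully associative
        direct[idx] = block
        seen.add(block)
    return counts
```

For example, with a 4-line cache, alternating accesses to addresses 0 and 64 (blocks 0 and 4, which share direct-mapped index 0) give two compulsory misses followed only by conflict misses, because the fully associative shadow cache would have kept both blocks.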

[Figure: miss rate (%) versus cache size for the FFT, SIMPLE, SPEECH, and WEATHER benchmarks.]

In this experiment, the global miss rate decreases as the cache size increases: more data can be
kept in the cache at the same time (with less need for replacement), so there is a greater chance
that the data the CPU needs is already resident. All of the benchmarks showed a reduction in
miss rate as the cache size increased. The miss rate also depends on the degree of locality,
because a larger cache takes better advantage of spatial locality. As the cache size increases, the
capacity and conflict (collision) misses are reduced, while the compulsory miss-rate component
stays constant, since it is unaffected by cache size. Coherence misses, in contrast, can increase,
because the probability of a miss being caused by an invalidation grows with cache size.

Conflict (collision) misses were also observed in this experiment; they occur when another
location with the same mapping has been loaded (they can also be considered associativity
misses). Such misses arise because in a nonassociative, or direct-mapped, cache each memory
block maps to one, and only one, cache line; since multiple blocks map to each cache line,
accesses to different memory addresses can conflict. In a
fully associative cache, on the other hand, any memory block can be stored in any cache line,
eliminating conflicts. Fully associative caches, however, are expensive and slow, so they are
usually approximated by n-way set-associative caches. As a rule of thumb, a two-way set-
associative cache has a miss rate similar to a direct-mapped cache twice the size. Miss-rate
improvement, however, diminishes rapidly with increasing associativity. For all practical
purposes, an eight-way set-associative cache is just as effective as a fully associative cache. At
large cache sizes the miss rate levels off, because the compulsory-miss component is independent
of cache size. A large drop in miss rate for a given increment of cache size indicates that the
referenced memory addresses lie close together, so a small increase in cache size brings a large
gain in performance.

In general, parallel programs exhibit less spatial and temporal locality, because shared data are
used by multiple processors, essentially providing communication among the processors through
reads and writes of the shared data. When shared data are cached,
the shared value may be replicated in multiple caches. In addition to the reduction in access
latency and required memory bandwidth, this replication also provides a reduction in contention
that may exist for shared data items that are being read by multiple processors simultaneously.
Caching of shared data, however, introduces a new problem: cache coherence.
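The effect of associativity can be illustrated with a small LRU set-associative simulator (a hypothetical sketch; the line size, trace, and cache parameters are assumptions, not the experiment's actual configuration). Two regions whose blocks share the same index bits ping-pong in a direct-mapped cache, but coexist once the cache is two-way at the same total size:

```python
from collections import OrderedDict

LINE = 16  # assumed bytes per cache line

def miss_rate(addresses, num_sets, ways):
    """Miss rate of an LRU set-associative cache.

    num_sets=1 models a fully associative cache; ways=1 models a
    direct-mapped cache.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // LINE
        cset = sets[block % num_sets]      # set index from block address
        if block in cset:
            cset.move_to_end(block)        # hit: refresh LRU order
        else:
            misses += 1
            if len(cset) >= ways:
                cset.popitem(last=False)   # evict least recently used
            cset[block] = True
    return misses / len(addresses)

# Hypothetical trace: two 1 KiB regions whose blocks share the same
# index bits, accessed in an interleaved fashion over four sweeps.
trace = []
for _ in range(4):
    for i in range(0, 1024, 4):
        trace += [i, 8192 + i]

TOTAL_LINES = 128                          # fixed 2 KiB cache
for ways in (1, 2):
    rate = miss_rate(trace, TOTAL_LINES // ways, ways)
    print(f"{ways}-way: miss rate {rate:.4f}")
```

Going from direct-mapped to two-way at the same total size eliminates the ping-pong between the two regions, so the miss rate collapses from every access missing to only the compulsory misses.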

In conclusion, increasing the cache size does not by itself improve multiprocessor system
performance. As we increase the number of processors, the total amount of cache increases,
usually causing the capacity misses to drop. In contrast, increasing the processor count usually
causes the amount of communication to increase, in turn causing the coherence misses to rise.
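The point that a larger cache does not help coherence misses can be sketched with a toy two-processor write-invalidate model (a hypothetical illustration; the trace, protocol simplifications, and cache organization are assumptions). A write by one CPU invalidates the other CPU's cached copy, and a later miss on an invalidated block is counted as a coherence miss:

```python
from collections import OrderedDict

LINE = 16  # assumed bytes per cache line

def simulate_coherence(trace, num_lines):
    """Two private fully associative LRU caches with write-invalidate.

    trace: list of (cpu, op, addr) with cpu in {0, 1} and op in {'r', 'w'}.
    A write by one CPU invalidates the other CPU's copy of the block;
    a later miss on a block lost to invalidation is a coherence miss.
    Returns total misses and the coherence-miss subset.
    """
    caches = [OrderedDict(), OrderedDict()]
    invalidated = [set(), set()]           # blocks lost to invalidation
    counts = {"miss": 0, "coherence": 0}
    for cpu, op, addr in trace:
        block = addr // LINE
        cache = caches[cpu]
        if block in cache:
            cache.move_to_end(block)       # hit: refresh LRU order
        else:
            counts["miss"] += 1
            if block in invalidated[cpu]:
                counts["coherence"] += 1   # miss caused by invalidation
                invalidated[cpu].discard(block)
            if len(cache) >= num_lines:
                cache.popitem(last=False)  # evict least recently used
            cache[block] = True
        if op == "w":                      # write-invalidate protocol
            other = 1 - cpu
            if block in caches[other]:
                del caches[other][block]
                invalidated[other].add(block)
    return counts

# Hypothetical trace: two CPUs ping-pong writes to one shared word.
ping_pong = [(0, "w", 0), (1, "w", 0)] * 100
for lines in (4, 1024):
    print(lines, simulate_coherence(ping_pong, lines))
```

Whatever the cache size, every access after the first two misses because of an invalidation by the other CPU, which matches the conclusion that coherence misses are insensitive to cache size.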


