

									The Memory Hierarchy

Lecture 12: The Memory Hierarchy
Dr Iain Styles, School of Computer Science, November 2006

In all of our discussions so far we have treated memory as having just two levels
– Registers: very fast, but not many of them
– Main memory: slower, but much larger than the registers

In fact, some machines have other levels in their memory hierarchy
– Caches: fast, medium-sized memory which acts as a “buffer” between main memory and the registers
– Virtual memory: a large region of memory which is physically on the hard disk, but logically part of main memory

See Chapter 7 of Patterson, esp. 7.2 (caches), 7.4 (virtual memory)


Principles of Caches

What do caches do?

Caches are used to improve machine performance

They are a medium-sized, fast area of storage used to store a copy of some part of the main memory

Caches take advantage of the spatial and temporal locality inherent in program code to allow quicker access to instructions and data that are likely to be used again soon

Caches lead to performance improvements because of two basic principles
– The principle of spatial locality: if you have recently referenced a particular item, you are likely to want to access nearby items soon
	Instructions execute sequentially
	Arrays etc. are stored in contiguous memory locations
– The principle of temporal locality: if you have recently referenced an item, you are likely to want to reference it again soon
	Loops!

When an item in memory is referenced, the cache is searched to see if it can be found there
– If the item is in the cache, it is available to the CPU much more quickly than if it were in main memory only
– If the item is not in the cache, it is fetched from main memory into the cache and made available to the CPU

Over time, the contents of the cache stabilise and many fewer accesses to main memory are required – most memory requests can be serviced by the cache

We usually have separate caches for instructions and for data, since they are dealt with quite separately by the processor
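The temporal-locality principle can be made concrete with a toy trace. The sketch below (an illustration, not a model of real hardware; the window size and addresses are arbitrary assumptions) counts how often a reference repeats an address seen in the last few accesses:

```python
# Sketch: measure temporal locality in a toy memory trace.
from collections import deque

def recent_reuse_fraction(trace, window=8):
    """Fraction of references that repeat an address seen in the
    last `window` accesses -- a crude temporal-locality score."""
    recent = deque(maxlen=window)   # sliding window of recent addresses
    hits = 0
    for addr in trace:
        if addr in recent:
            hits += 1
        recent.append(addr)
    return hits / len(trace)

# A loop touching the same four instruction addresses 100 times over:
loop_trace = [0x100, 0x104, 0x108, 0x10C] * 100
print(recent_reuse_fraction(loop_trace))  # 0.99: only the first pass misses
```

Loop-heavy code scores close to 1.0, which is exactly why a small cache can service most requests once its contents stabilise.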






How do caches work?

Caches need to store two things
– The data that is being cached
– The address of the data being cached

Caches therefore consist of two arrays, one storing data, one storing the address of that data (called the tag)

[Figure: schematic of a cache, showing the tag array alongside the data array]

When a request is made to the cache, the memory address being accessed is compared to all of the addresses stored in the cache
– Exact details depend on the type of cache

If the address is in the cache, the corresponding word of data is read/written – a cache hit

Otherwise, main memory must be accessed – a cache miss
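The tag-array/data-array structure can be sketched as a toy model. This is an illustration only: a dict stands in for main memory, and a naive round-robin slot is refilled on a miss (real caches restrict where a word may go, as the next section explains):

```python
# Sketch of a cache as two parallel arrays: tags (addresses) and data.
class TinyCache:
    def __init__(self, size):
        self.tags = [None] * size   # addresses of the cached words
        self.data = [None] * size   # the cached words themselves
        self.next = 0               # naive round-robin slot to refill on a miss

    def read(self, addr, main_memory):
        if addr in self.tags:                      # compare against stored tags
            return self.data[self.tags.index(addr)], "hit"
        value = main_memory[addr]                  # miss: go to main memory...
        self.tags[self.next] = addr                # ...and copy into the cache
        self.data[self.next] = value
        self.next = (self.next + 1) % len(self.tags)
        return value, "miss"

mem = {0x20: 7, 0x24: 9}            # stand-in for main memory
c = TinyCache(4)
print(c.read(0x20, mem))  # (7, 'miss') -- first access fetches from memory
print(c.read(0x20, mem))  # (7, 'hit')  -- second access served by the cache
```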






Cache Associativity

Direct-mapped caches

The number of different locations that an item of data can take in the cache is known as its associativity

The simplest, and most common, type of cache is the direct-mapped cache

In a direct-mapped cache, each address in main memory can go in only one location in the cache
– Easy to design the cache, and fast
– Can lead to a lot of cache misses, as items may have been thrown out of the cache to make room for other items

Here, the mapping between entries in the cache and addresses in memory is largely hard-wired

[Figure: eight-entry direct-mapped cache; the entries are hard-wired to index bits 000–111, and each stores the remaining address bits as its tag. Data with address 111* can only go in entry 111]

Note that any set of three bits could be used for the encoding
– Depends on processor details

In a fully-associative cache, each address from main memory can go in any location in the cache
– Misses are much less common, as you can replace less recently used items
– Hard to design, and slower

In an n-way set associative cache, each address in main memory can occupy n cache locations
– Provides a compromise
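The direct-mapped placement rule is just bit manipulation. A minimal sketch, assuming word addresses and an eight-entry cache as in the slide:

```python
# Sketch: splitting an address for an 8-entry direct-mapped cache.
INDEX_BITS = 3                        # eight entries -> three hard-wired bits
NUM_ENTRIES = 1 << INDEX_BITS

def split_address(addr):
    index = addr & (NUM_ENTRIES - 1)  # low bits select the one possible entry
    tag = addr >> INDEX_BITS          # remaining bits are stored as the tag
    return index, tag

# Two addresses whose low bits are both 111 collide in entry 7:
print(split_address(0b0000111))  # (7, 0)
print(split_address(0b1010111))  # (7, 0b1010) -- same entry, different tag
```

On a lookup, the cache compares the stored tag of entry `index` with `tag`; a mismatch is a miss even though the entry is occupied, which is exactly how direct-mapped conflicts arise.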

2-way set associative caches

The problem with direct-mapped caches is that we can't store, for example, the words with addresses 111x and 111y in the cache at the same time

This can lead to “thrashing”
– Entries in the cache are continually swapped

Careful selection of the hard-wired address bits can minimise this, but sometimes this is not enough

A 2-way set associative cache allows each word to be stored in two places in the cache

For an 8-entry cache, we would hard-wire only two of the address bits

[Figure: two-way set associative cache; the eight entries form four sets indexed 00–11, and data with address 11* can go in either entry of set 11]

We can similarly build 4-way, 8-way etc. set associative caches, which allow each memory address to sit in n cache locations

The ultimate cache is fully associative
– No hard-wired address mapping
– All items can be anywhere in the cache
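A minimal sketch of an n-way set-associative lookup, under toy assumptions (small integer addresses, and oldest-first eviction within a set; real refill policies are discussed below):

```python
# Sketch: n-way set-associative lookup over a list of sets.
def lookup(cache_sets, addr, n_ways):
    num_sets = len(cache_sets)
    index, tag = addr % num_sets, addr // num_sets
    ways = cache_sets[index]          # the n candidate locations for this word
    if tag in ways:
        return "hit"
    if len(ways) >= n_ways:
        ways.pop(0)                   # evict (here: the oldest; policy varies)
    ways.append(tag)
    return "miss"

# 2-way, 8 entries -> 4 sets. Two words with the same set index can now
# coexist, which a direct-mapped cache could not allow:
sets = [[] for _ in range(4)]
print(lookup(sets, 0b0000111, 2))  # miss: set 3 was empty
print(lookup(sets, 0b1010111, 2))  # miss, but cached alongside the first word
print(lookup(sets, 0b0000111, 2))  # hit: both words are now resident
```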



Notes on associativity

The more associative a cache is, the less thrashing will occur

This can lead to large performance gains by reducing the number of main memory accesses, but at a cost

More associativity leads to increased complexity in the cache design

The goal for cache designers is to allow the CPU to access cached data within a single processor cycle
– This is much harder to do as n increases, due to the extra design complexity

We gain by having fewer accesses to external memory, but lose by needing to allow more cycles for cache accesses

Whether larger associativity leads to greater performance depends very much on the nature of the machine, and on the code that runs on it – there are no concrete rules as to which is best

Refilling the cache

In a direct-mapped cache, new entries can only go to one location

In an n-way associative cache, we need to decide which of the n possible locations we should replace when a miss occurs

This is governed by the refill policy, and is controlled by a refill engine

The most obvious policy is least-recently-used (LRU)
– Of the n possible locations at which we could put a word in the cache, we put it in the one which has been accessed least recently
– Can cause problems in loops: the LRU entry may be the one you need next!
– Requires a lot of housekeeping, esp. if n is large

The alternative is random replacement of an entry – easy

For n=2, the miss rate for random replacement is about 1.1x that for LRU
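The loop problem with LRU can be demonstrated with a toy simulation (assumptions: a fully-associative 2-entry cache and a 3-item loop trace deliberately built to thrash; the function is illustrative, not a hardware model):

```python
# Sketch: LRU vs random refill on a trace that defeats LRU.
import random

def miss_rate(trace, size, policy):
    cache, misses = [], 0             # list order = least... most recently used
    for addr in trace:
        if addr in cache:
            cache.remove(addr)        # refresh: move to most-recent position
            cache.append(addr)
        else:
            misses += 1
            if len(cache) >= size:
                victim = 0 if policy == "lru" else random.randrange(size)
                cache.pop(victim)
            cache.append(addr)
    return misses / len(trace)

random.seed(0)                        # fixed seed so the sketch is repeatable
trace = [1, 2, 3] * 50                # a 3-item loop through a 2-entry cache
print(miss_rate(trace, 2, "lru"))     # 1.0: the LRU entry is always the one
                                      # needed next, so every access misses
print(miss_rate(trace, 2, "random"))  # below 1.0: random eviction gets lucky
```

This is the pathological case; on typical code LRU wins, which is why the quoted figure has random at about 1.1x the LRU miss rate for n=2.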

Writing to the Cache

Read accesses to caches are straightforward
– If the data is in the cache, fetch it from the cache
– If it isn't, fetch it from main memory

Write accesses are harder
– If the data is in the cache, should we write just to the cache, or should we also update main memory, to maintain coherence of cache and main memory?

Writing just to the cache means that main memory can be out-of-date, but reduces the number of memory accesses
– Must make sure that main memory is updated when an entry in the cache is replaced (must flag entries on a write)
– This is known as a write-back cache

If main memory and cache are both updated simultaneously, the cache is said to be write-through

Write-back vs write-through

Each write policy has advantages and disadvantages

Write-through caches ensure that memory and cache are consistent, but require extra memory accesses

Write-back caches use lower memory bandwidth, but cache and memory are incoherent

Which policy you adopt depends on the details of your system

If you only allow the CPU (and nothing else) to access the memory, then write-back is (probably) safe

But if you allow other devices (e.g. DMA modules) to talk to the memory, then great care is required to ensure that the memory is up-to-date
– DMA must either be routed through the cache
– Or the cache must be flushed (updated words written back to main memory, and cache contents invalidated) before DMA can start
– Note that a DMA write to memory only could cause the cache to be out-of-date – we must invalidate it
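The two write policies can be contrasted in a sketch. This is a toy model: the `memory_writes` counter is an illustrative stand-in for memory-bus traffic, and eviction is triggered by hand:

```python
# Sketch: write-through updates memory on every write;
# write-back flags the entry dirty and pays only at eviction.
class WriteCache:
    def __init__(self, policy):
        self.policy = policy          # "write-through" or "write-back"
        self.lines = {}               # addr -> (value, dirty flag)
        self.memory = {}              # stand-in for main memory
        self.memory_writes = 0        # stand-in for memory-bus traffic

    def write(self, addr, value):
        if self.policy == "write-through":
            self.lines[addr] = (value, False)
            self.memory[addr] = value           # memory updated on every write
            self.memory_writes += 1
        else:
            self.lines[addr] = (value, True)    # just flag the entry dirty

    def evict(self, addr):
        value, dirty = self.lines.pop(addr)
        if dirty:                               # write-back pays only here
            self.memory[addr] = value
            self.memory_writes += 1

wt, wb = WriteCache("write-through"), WriteCache("write-back")
for v in range(10):                  # ten writes to the same word
    wt.write(0x40, v)
    wb.write(0x40, v)
wb.evict(0x40)
print(wt.memory_writes, wb.memory_writes)  # 10 1 -- write-back saves bandwidth
```

Note that between the writes and the eviction, `wb.memory` holds a stale value: this is exactly the incoherence that makes DMA dangerous with write-back caches.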

Virtual Memory

We now turn our attention to the problem of extending memory

Even in modern machines with gigabytes of main memory, we will sometimes fill it up

Somehow we need to make use of other resources to extend memory

Hard disks are a resource to which we might naturally turn, but they are far too slow to treat as a simple extension to main memory

However, just because main memory is full does not mean that we are actively using all of the data and code it contains

The idea behind virtual memory is that we can use main memory as a cache for those portions of code and data that are actively being used

How virtual memory works

The basic idea behind virtual memory is that large pieces of programs are unused most of the time

The bits that are currently being used can be in main (or physical) memory; the rest is on the disk

When a routine or module is called by the main program, it can be loaded into physical memory in the same way that code/data is loaded into the cache

The rest of the program is stored in virtual memory on the disk

Programmers used to have to load/unload modules by hand when they were needed or finished with – virtual memory deals with this automatically

One major difference between virtual memory and caches is that physical memory has no facility for storing the disk addresses of the data being stored there

The way around this is to construct a virtual address space, which maps both physical addresses and disk addresses onto one address space


Virtual Addressing

Rather than dealing with individual words, virtual memory divides memory into pages, which typically contain 16–64 kBytes each
– This makes the allocation of blocks to main memory much easier

Each page in virtual memory corresponds to a block of addresses either in main memory or on disk

The translation between physical and virtual addresses is done in software (by the operating system)

[Figure: virtual addresses are mapped by address translation onto physical addresses or disk addresses]
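Page-based translation can be sketched as follows (assumptions: 16 kByte pages and a hand-built page table; in a real system the operating system maintains this table):

```python
# Sketch: translating a virtual address via a toy page table.
PAGE_SIZE = 16 * 1024    # 16 kBytes per page

# Page table: virtual page number -> ("ram", frame) or ("disk", location).
page_table = {0: ("ram", 5), 1: ("disk", 901), 2: ("ram", 2)}

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # split into page number + offset
    where, num = page_table[vpn]
    if where == "disk":
        raise LookupError(f"page fault: virtual page {vpn} is on disk")
    return num * PAGE_SIZE + offset          # physical address

print(hex(translate(0x123)))              # 0x14123: page 0 lives in frame 5
print(hex(translate(2 * PAGE_SIZE + 8)))  # page 2 lives in frame 2
```

The offset within the page is untouched by translation; only the page number is remapped, which is what makes whole-page allocation easy.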



How virtual memory works

When a memory access is made, the virtual memory manager first checks whether the relevant page is in main memory, by doing the address translation

When the page is not in main memory, a page fault occurs – this is like a cache miss

The virtual memory manager then arranges to transfer the relevant pages into main memory, and updates the translation table

The mapping is fully associative, so the pages can go anywhere in main memory
– This also helps reduce the rate of page faults

Writing is always write-back, due to the time that writes to disk take

All the management of virtual memory can be done in software, since the overhead is small compared to the time taken to access the disk

Notes on virtual memory

Virtual memory allows programs to occupy physically non-contiguous blocks of memory
– The virtual addresses can be contiguous, no matter what the physical addresses are

It also ensures that two programs do not share each other's virtual address space

The exception to this is code/data that is shared between programs
– The virtual memory manager can allocate more than one virtual address to a single physical address
– Two programs with distinct virtual address spaces can share the same code or data (e.g. a DLL)

Virtual memory can be a bit wasteful, as it can only work with whole pages
– Most modules will not fit exactly into an integer number of pages
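Servicing a page fault can be sketched like this (toy assumptions throughout: a hand-built page table, an LRU-ordered list of resident pages, and a made-up disk slot for the written-back victim; a real manager is far more involved):

```python
# Sketch: a page fault evicts a victim page (written back to disk,
# since VM is always write-back) and updates the translation table.
page_table = {0: ("ram", 0), 1: ("disk", 17)}
free_frames = []                 # no free frames: a fault forces an eviction
resident = [0]                   # virtual pages in RAM, least recent first

def access(vpn):
    where, num = page_table[vpn]
    if where == "ram":
        return f"page {vpn} in frame {num}"
    # Page fault: find a frame, write the victim back, update the table.
    if free_frames:
        frame = free_frames.pop()
    else:
        victim = resident.pop(0)                # evict the LRU resident page
        frame = page_table[victim][1]
        page_table[victim] = ("disk", 99)       # written back (toy disk slot)
    page_table[vpn] = ("ram", frame)            # update the translation table
    resident.append(vpn)
    return f"page fault: loaded page {vpn} into frame {frame}"

print(access(1))  # page fault: loaded page 1 into frame 0
print(access(1))  # page 1 in frame 0 -- the page is now resident
```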



This lecture we have investigated the memory hierarchy
– The idea of caches, and the different design choices that influence the way the cache works
– Virtual memory, as a way of organising main memory and disks into a unified memory space, to allow larger programs to be written and to allow the CPU to run multiple programs

Next lecture we will spend some time looking at how we design some of the modules we have been studying using Boolean logic gates

