
Lecture 12: The Memory Hierarchy
Dr Iain Styles, School of Computer Science
November 2006

See Chapter 7 of Patterson, esp. 7.2 (caches) and 7.4 (virtual memory).

The Memory Hierarchy

In all of our discussions so far we have treated memory as having just two levels:
– Registers: very fast, but not many of them
– Main memory: slower, but much larger than the registers

In fact, some machines have other levels in their memory hierarchy:
– Caches: fast, medium-sized memory which acts as a “buffer” between main memory and the registers
– Virtual memory: a large region of memory which is physically on the hard disk, but logically part of main memory

Principles of Caches

Caches are used to improve machine performance.

They are a medium-sized, fast area of storage used to store a copy of some part of the main memory.

Caches take advantage of the spatial and temporal locality inherent in program code to allow quicker access to instructions and data that are likely to be used again soon.

Caches lead to performance improvements because of two basic principles:
– The principle of spatial locality: if you have recently referenced a particular item, you are likely to want to access nearby items soon. Instructions execute sequentially, and arrays etc. are stored in contiguous memory locations.
– The principle of temporal locality: if you have recently referenced an item, you are likely to want to reference it again soon (loops!). Both principles are illustrated in the sketch below.
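To make the two principles concrete, here is a minimal C sketch (not from the lecture; the array size and values are illustrative). Summing the matrix row by row touches consecutive addresses, which is good spatial locality, while summing it column by column strides across memory; the accumulator and loop counters, reused on every iteration, show temporal locality.

    #include <stdio.h>

    #define N 1024

    static double a[N][N];           /* C stores each row contiguously */

    int main(void)
    {
        double sum = 0.0;

        /* Good spatial locality: consecutive iterations touch adjacent
           memory locations, so most accesses can be served by the cache. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];

        /* Poor spatial locality: consecutive iterations are a whole row
           apart, so each access may land in a different cache entry.     */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];

        printf("%f\n", sum);         /* sum, i and j show temporal locality */
        return 0;
    }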

What do caches do?

When an item in memory is referenced, the cache is searched to see if it can be found there.

If the item is in the cache, it is available to the CPU much more quickly than if it were in main memory only. If the item is not in the cache, it is fetched from main memory into the cache and made available to the CPU.

Over time, the contents of the cache stabilise and many fewer accesses to main memory are required – most memory requests can be serviced by the cache.

We usually have separate caches for instructions and for data, since they are dealt with quite separately by the processor.

How do caches work?

Caches need to store two things:
– The data that is being cached
– The address of the data being cached

Caches therefore consist of two arrays, one storing data, one storing the address of that data (called the tag).

When a request is made to the cache, the memory address being accessed is compared to all of the addresses stored in the cache:
– The exact details depend on the type of cache

If the address is in the cache, the corresponding word of data is read/written – a cache hit.

Otherwise, main memory must be accessed – a cache miss.

Schematic of a cache

[Figure: schematic of a cache. The incoming address is compared with the tag array (shown holding the tag 8176); a match drives the hit/miss output and selects the corresponding word in the data array for read_data/write_data.]
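The lookup behaviour can be sketched in C. This is an illustrative model rather than the lecture's hardware design: a tiny cache in which every stored tag is compared with the requested address, reporting a hit or a miss.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define CACHE_ENTRIES 8

    /* One array for the cached data, one for the address (tag) of each entry. */
    static uint32_t tag_array[CACHE_ENTRIES];
    static uint32_t data_array[CACHE_ENTRIES];
    static bool     entry_valid[CACHE_ENTRIES];

    /* Look up 'address' in the cache.  On a hit the cached word is returned
       through *data and the function returns true; on a miss it returns
       false, and main memory would then have to be accessed.               */
    bool cache_read(uint32_t address, uint32_t *data)
    {
        for (int i = 0; i < CACHE_ENTRIES; i++) {
            if (entry_valid[i] && tag_array[i] == address) {  /* compare tags */
                *data = data_array[i];                        /* cache hit    */
                return true;
            }
        }
        return false;                                         /* cache miss   */
    }

    int main(void)
    {
        /* Pretend address 8176 has previously been fetched into entry 0. */
        entry_valid[0] = true;
        tag_array[0]   = 8176;
        data_array[0]  = 42;

        uint32_t word;
        printf("8176: %s\n", cache_read(8176, &word) ? "hit" : "miss");
        printf("8180: %s\n", cache_read(8180, &word) ? "hit" : "miss");
        return 0;
    }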

Cache Associativity

The number of different locations that an item of data can take in the cache is known as its associativity.

In a direct-mapped cache, each address in main memory can go in only one location in the cache:
– Easy to design the cache, and fast
– Can lead to a lot of cache misses, as items may have been thrown out of the cache to make room for other items

In a fully-associative cache, each address from main memory can go in any location in the cache:
– Misses are much less common, as you can replace less recently used items
– Hard to design, and slower

In an n-way set associative cache, each address in main memory can occupy n cache locations:
– Provides a compromise

Direct-mapped caches

The simplest, and most common, type of cache is the direct-mapped cache.

Here, the mapping between entries in the cache and addresses in memory is largely hard-wired.

[Figure: a direct-mapped cache with eight entries, indexed by three hard-wired address bits 000–111; each entry stores the remaining address bits (address0[29:0] … address7[29:0]) as its tag. Data with address 111* can only go in the entry indexed 111.]

Note that any set of three bits could be used for the encoding; this depends on processor details. A sketch of the index/tag split is given below.
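As an illustration (assuming word addresses and the eight-entry cache above, with the lowest three address bits hard-wired as the index), the index and tag can be extracted like this:

    #include <stdint.h>
    #include <stdio.h>

    #define INDEX_BITS  3                      /* 8 entries -> 3 hard-wired bits */
    #define NUM_ENTRIES (1u << INDEX_BITS)

    int main(void)
    {
        uint32_t address = 0x1A7;              /* example address, low bits 111  */

        /* The low bits select the single entry this address may occupy...  */
        uint32_t index = address & (NUM_ENTRIES - 1);
        /* ...and the remaining bits are stored as the tag for comparison.  */
        uint32_t tag   = address >> INDEX_BITS;

        printf("address 0x%X -> entry %u, tag 0x%X\n", address, index, tag);
        return 0;
    }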

2-way set associative caches

The problem with direct-mapped caches is that we can't store, for example, the words with addresses 111x and 111y in the cache at the same time.

This can lead to “thrashing”:
– Entries in the cache are continually swapped

Careful selection of the hard-wired address bits can minimise this, but sometimes this is not enough.

A 2-way set associative cache allows each word to be stored in two places in the cache.

We can similarly build 4-way, 8-way, etc. set associative caches, which allow each memory address to sit in n cache locations.

The ultimate cache is fully associative:
– No hard-wired address mapping
– All items can be anywhere in the cache

Two-way set associative cache

[Figure: an eight-entry, two-way set associative cache. Only two address bits are hard-wired, giving four sets indexed 00–11 with two entries each; each entry stores the remaining address bits (address0[30:0] … address7[30:0]) as its tag. Data with address 11* can go in either entry of the set indexed 11.]

For an 8-entry cache, we would hard-wire only two of the address bits.
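A small sketch of why this helps (the addresses are made up for illustration): two addresses whose low three bits are both 111 collide in the eight-entry direct-mapped cache above, but with only two hard-wired bits they merely fall into the same set, which has room for both.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t x = 0x0F;            /* ...01111: low three bits are 111 */
        uint32_t y = 0x17;            /* ...10111: low three bits are 111 */

        /* Direct-mapped, 8 entries: 3 index bits, one possible location. */
        printf("direct-mapped: x -> entry %u, y -> entry %u (they collide)\n",
               x & 7u, y & 7u);

        /* 2-way set associative, 8 entries: 4 sets of 2, 2 index bits.   */
        printf("2-way:         x -> set %u, y -> set %u (both can stay)\n",
               x & 3u, y & 3u);
        return 0;
    }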

Notes on associativity

The more associative a cache is, the less thrashing will occur.

This can lead to large performance gains by reducing the number of main memory accesses, but at a cost.

More associativity leads to increased complexity in the cache design.

The goal for cache designers is to allow the CPU to access cached data within a single processor cycle.

This is much harder to do as n increases, due to the extra design complexity.

We gain by having fewer accesses to external memory, but lose by needing to allow more cycles for cache accesses.

Whether larger associativity leads to greater performance depends very much on the nature of the machine, and on the code that uses it – there are no concrete rules as to which is best.

Refilling the cache

In a direct-mapped cache, new entries can only go to one location.

In an n-way associative cache, we need to decide which of the n possible locations we should replace when a miss occurs.

This is governed by the refill policy and is controlled by a refill engine.

The most obvious policy is least-recently-used (LRU): of the n possible locations at which we could put a word in the cache, we put it in the one which has been accessed least recently. A sketch follows below.
– Can cause problems in loops: the LRU entry may be the one you need next!
– Requires a lot of housekeeping, especially if n is large

The alternative is random replacement of an entry – easy. For n = 2, the miss rate for random replacement is about 1.1x that for LRU.
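A minimal sketch of an LRU refill for one set of a 2-way cache (the counter-based bookkeeping is purely illustrative; real hardware uses cheaper approximations): a hit refreshes the age of the matching way, and a miss refills the way that was used longest ago.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define WAYS 2                        /* one set of a 2-way cache        */

    static uint32_t tags[WAYS];
    static bool     valid[WAYS];
    static unsigned last_used[WAYS];      /* the LRU housekeeping            */
    static unsigned now;                  /* access counter                  */

    /* Access 'tag' in this set; returns true on a hit.  On a miss the
       least-recently-used way is replaced (the refill).                     */
    bool cache_access(uint32_t tag)
    {
        now++;
        for (int w = 0; w < WAYS; w++)
            if (valid[w] && tags[w] == tag) {     /* hit: refresh its age    */
                last_used[w] = now;
                return true;
            }

        int victim = 0;                           /* miss: find the oldest way */
        for (int w = 1; w < WAYS; w++)
            if (!valid[w] || last_used[w] < last_used[victim])
                victim = w;

        tags[victim] = tag;                       /* refill from main memory */
        valid[victim] = true;
        last_used[victim] = now;
        return false;
    }

    int main(void)
    {
        uint32_t refs[] = { 1, 2, 1, 3, 2 };      /* 3 evicts 2, so 2 then misses */
        for (int i = 0; i < 5; i++)
            printf("tag %u: %s\n", refs[i], cache_access(refs[i]) ? "hit" : "miss");
        return 0;
    }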

Writing to the Cache

Read accesses to caches are straightforward:
– If the data is in the cache, fetch it from the cache
– If it isn't, fetch it from main memory

Write accesses are harder:
– If the data is in the cache, should we write just to the cache, or should we also update main memory to maintain coherence of cache and main memory?

Writing just to the cache means that main memory can be out-of-date, but reduces the number of memory accesses:
– We must make sure that main memory is updated when an entry in the cache is replaced (we must flag entries on a write)
– This is known as a write-back cache

If main memory and cache are both updated simultaneously, the cache is said to be write-through.

Write-back vs write-through

Each write policy has advantages and disadvantages.

Write-through caches ensure that memory and cache are consistent, but require extra memory accesses.

Write-back caches use lower memory bandwidth, but cache and memory are incoherent.

Which policy you adopt depends on the details of your system.

If you only allow the CPU (and nothing else) to access the memory, then write-back is (probably) safe.

But if you allow other devices (e.g. DMA modules) to talk to the memory, then great care is required to ensure that the memory is up-to-date:
– DMA must either be routed through the cache
– Or the cache must be flushed (updated words written back to main memory and cache contents invalidated) before DMA can start
– Note that a DMA write to memory only could cause the cache to be out-of-date – we must invalidate the affected cache contents

The two policies are sketched below.
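A sketch of the two write policies in C (illustrative only; it assumes the addressed word is already in the cache, and memory_write stands in for a slow main-memory access): write-through updates main memory on every write, while write-back only sets a flag and writes the word back when the entry is eventually replaced.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct entry {
        uint32_t tag, data;
        bool     valid, dirty;          /* 'dirty' flags a written-to entry */
    };

    static void memory_write(uint32_t addr, uint32_t data)   /* main memory */
    {
        printf("  main memory[%u] <- %u\n", addr, data);
    }

    /* Write-through: update the cache AND main memory on every write.     */
    static void write_through(struct entry *e, uint32_t data)
    {
        e->data = data;
        memory_write(e->tag, data);
    }

    /* Write-back: update only the cache and flag the entry; main memory
       is brought up to date when the entry is eventually replaced.        */
    static void write_back(struct entry *e, uint32_t data)
    {
        e->data = data;
        e->dirty = true;
    }

    static void replace_entry(struct entry *e, uint32_t new_tag, uint32_t new_data)
    {
        if (e->valid && e->dirty)       /* flush the old word first        */
            memory_write(e->tag, e->data);
        *e = (struct entry){ .tag = new_tag, .data = new_data, .valid = true };
    }

    int main(void)
    {
        struct entry e = { .tag = 100, .data = 1, .valid = true };

        puts("write-through:");
        write_through(&e, 2);           /* memory updated immediately      */

        puts("write-back:");
        write_back(&e, 3);              /* no memory traffic yet           */
        replace_entry(&e, 200, 4);      /* old value written back here     */
        return 0;
    }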

Virtual Memory

We now turn our attention to the problem of extending memory.

Even in modern machines with gigabytes of main memory, we will sometimes fill it up.

Somehow we need to make use of other resources to extend memory.

Hard disks are a resource to which we might naturally turn, but they are far too slow to just treat as a simple extension to main memory.

However, just because main memory is full does not mean that we are actively using all of the data and code it contains.

The idea behind virtual memory is that we can use main memory as a cache for those portions of code and data that are actively being used. The rest of the program is stored in virtual memory on the disk.

How virtual memory works

The basic idea behind virtual memory is that large pieces of programs are unused most of the time.

The bits that are currently being used can be in main (or physical) memory; the rest is on the disk.

When a routine or module is called by the main program, it can be loaded into physical memory in the same way that code/data is loaded into the cache.

Programmers used to have to load/unload modules by hand when they were needed or finished with – virtual memory deals with this automatically.

One major difference between virtual memory and caches is that physical memory has no facility for storing the disk addresses that are being stored there.

The way around this is to construct a virtual address space, which maps both physical addresses and disk addresses onto one address space.

Virtual Addressing

Rather than dealing with individual words, virtual memory divides memory into pages, which typically contain 16–64 kBytes each:
– This makes the allocation of blocks to main memory much easier

Each page in virtual memory corresponds to a block of addresses either in main memory or on disk.

The translation between physical and virtual addresses is done in software (by the operating system). A translation sketch follows below.

[Figure: virtual addresses are mapped by address translation onto either physical addresses in main memory or disk addresses.]
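An illustrative sketch of the translation (the page size and table contents are invented; real systems use larger, multi-level tables managed by the OS): the virtual page number indexes a page table, and the offset within the page is carried over unchanged into the physical address.

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SIZE 16384u                   /* 16 kByte pages              */
    #define NUM_PAGES 8                        /* tiny illustrative table     */

    /* For each virtual page: the physical page it maps to, or -1 if the
       page currently lives on disk (touching it causes a page fault).       */
    static int page_table[NUM_PAGES] = { 3, -1, 0, -1, -1, 1, -1, -1 };

    int main(void)
    {
        uint32_t vaddr = 5 * PAGE_SIZE + 123;  /* somewhere in virtual page 5 */

        uint32_t vpage  = vaddr / PAGE_SIZE;   /* virtual page number         */
        uint32_t offset = vaddr % PAGE_SIZE;   /* position within the page    */

        if (page_table[vpage] < 0) {
            printf("page fault: virtual page %u is on disk\n", vpage);
        } else {
            uint32_t paddr = page_table[vpage] * PAGE_SIZE + offset;
            printf("virtual 0x%X -> physical 0x%X\n", vaddr, paddr);
        }
        return 0;
    }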

How virtual memory works

When a memory access is made, the virtual memory manager first checks whether the relevant page is in main memory by doing the address translation.

When the page is not in main memory, a page fault occurs – this is like a cache miss.

The virtual memory manager then arranges to transfer the relevant pages into main memory, and updates the translation table. A sketch of this is given after the notes below.

The mapping is fully associative, so the pages can go anywhere in main memory:
– This also helps reduce the rate of page faults

Writing is always write-back, due to the time that writes to disk take.

All the management of virtual memory can be done in software, since the overhead is small compared to the time taken to access the disk.

Notes on virtual memory

Virtual memory allows programs to occupy physically non-contiguous blocks of memory:
– The virtual addresses can be contiguous no matter what the physical addresses are

We also ensure that two programs do not share each other's virtual address space.

The exception to this is code/data that is shared between programs.

The virtual memory manager can allocate more than one virtual address to a single physical address:
– Two programs with distinct virtual address spaces can share the same code or data (e.g. a DLL)

Virtual memory can be a bit wasteful, as it can only work with whole pages:
– Most modules will not fit exactly into an integer number of pages
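A sketch of what the virtual memory manager does on a page fault (all of the names and the round-robin eviction here are invented for illustration; they are not a real OS interface): the faulting page is read from disk into a physical page frame, a resident page being evicted first if necessary and written back to disk if it has been modified, and the translation table is updated.

    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_VPAGES 8
    #define NUM_FRAMES 2                   /* a tiny "physical memory"         */

    static int  page_table[NUM_VPAGES];    /* frame number, or -1 if on disk   */
    static int  frame_owner[NUM_FRAMES];   /* which virtual page is in a frame */
    static bool frame_dirty[NUM_FRAMES];
    static int  next_victim;               /* trivial round-robin eviction     */

    static void load_page(int vpage)       /* called on a page fault           */
    {
        int frame = next_victim;
        next_victim = (next_victim + 1) % NUM_FRAMES;

        int old = frame_owner[frame];
        if (old >= 0) {                    /* evict the current occupant       */
            if (frame_dirty[frame])
                printf("  write page %d back to disk\n", old);
            page_table[old] = -1;          /* it now lives on disk only        */
        }

        printf("  read page %d from disk into frame %d\n", vpage, frame);
        page_table[vpage] = frame;         /* update the translation table     */
        frame_owner[frame] = vpage;
        frame_dirty[frame] = false;
    }

    static void touch(int vpage, bool is_write)
    {
        if (page_table[vpage] < 0) {       /* page fault, like a cache miss    */
            printf("page fault on virtual page %d\n", vpage);
            load_page(vpage);
        }
        if (is_write)                      /* write-back: flag it, write later */
            frame_dirty[page_table[vpage]] = true;
    }

    int main(void)
    {
        for (int p = 0; p < NUM_VPAGES; p++) page_table[p]  = -1;
        for (int f = 0; f < NUM_FRAMES; f++) frame_owner[f] = -1;

        touch(0, true);                    /* fault: load page 0               */
        touch(1, false);                   /* fault: load page 1               */
        touch(2, false);                   /* fault: evicts dirty page 0       */
        return 0;
    }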

Conclusions

This lecture we have investigated the memory hierarchy:
– The idea of caches, and the different design choices that influence the way the cache works
– Virtual memory as a way of organising main memory and disks into a unified memory space, to allow larger programs to be written and to allow the CPU to run multiple programs

Next lecture we will spend some time looking at how we design some of the modules we have been studying using Boolean logic gates.
