
School of Mathematics, Statistics, and Computer Science
PO Box 600                                    Tel: +64 4 463 5341
Wellington                                    Fax: +64 4 463 5045
New Zealand                                   Internet: office@mcs.vuw.ac.nz




                              MCSIM
        A MIPS Architecture Cache
               Simulator
                         Technical Reference
                          Version 1.0/May 2007

Author:
Dr. Pavle Mogin

Abstract
This document contains technical information about MCSIM Cache Simulator. The
simulator has been made to emulate direct mapped, m-way associative, and fully
associative caches of the RISC (reduced instruction set computer) architecture in
accordance with the book Computer Organization and Design – The
Hardware/Software Interface by D.A. Patterson and J. L. Hennessy. The simulator is
written in Java.


Table of Contents


        1. Who Made the Simulator
        2. Supported MIPS Instructions
        3. Pseudo Instructions
        4. Data Declaration
        5. Compilation of Programs
        6. User Interface
           6.1. The Setup Screen
           6.2. The Execute Screen
           6.3. The Analyse Screen
        7. How to Use the MCSim Cache Simulator




1.     Who Made the Simulator
         The MIPS Cache Simulator project was initiated at the start of 2006 as David
Kean’s BITT 489 project. David did the architectural design and initial coding of the
simulator. Mark Pritchard did further coding and error fixing at the end of 2006.
Andrew Mellanby did additional coding and debugging during the first half of 2007.
The project was supervised by Pavle Mogin from its inception until the release of
the current version. Pavle also defined the basic requirements that the simulator has
to satisfy, and performed exhaustive testing and debugging.

2. Supported MIPS Instructions
   The simulator supports a relatively large number of MIPS instructions. The most
important instructions supported are:
R-Type:
ADD, SUB, AND, OR, SLL, SRL, SLT, SGE, MULT;
I-Type:
ADDI, SUBI, ANDI, ORI, SLLI, SRLI;
Branch:
BEQ, BNE;
Memory:
SW, LW.
    The MULT instruction produces results that are at most one word (32 bits) long,
and accordingly accepts operands that are only half a word (16 bits) long.
    The simulator is not case sensitive, so add and ADD are both valid opcodes.
    There is one restriction that is not present in the MIPS assembly language: the
simulator does not allow register zero ($0) to be used as the destination register of
arithmetic and logic instructions.
    Branch labels and data declarations are supported. Comments start with the
standard hash (#) sign. There is no practical restriction on the program length.

3. Pseudo Instructions
    Two special pseudo instructions are supported: NOP and END. The NOP pseudo
instruction stands for an instruction that does nothing, like the real MIPS
instruction or $0, $0, $0.
     The END pseudo instruction replaces the MIPS syscall with the parameter value
of 10. It tells the simulator that the end of the program has been reached.
    There is also a system-generated pseudo instruction, INVALID. It is not intended
for use in MIPS programs. Its only purpose is to clearly mark an error in program
control flow.

4. Data Declaration
        The data declaration feature allows values in main memory locations to be
defined declaratively. Each memory location holds one four-byte word. The data
declaration section comes at the beginning of a program. It starts with the compulsory
reserved word .data and ends with an optional .text word. The word .text designates
the start of program instructions.
        Data to be inserted into a sequence of n (n ≥ 1) memory locations follow the
reserved word .word. There can be more than one .word entry between the .data
and .text markers. The data following a .word entry form a comma-separated list
of n decimal values. Each decimal value is stored in a separate memory location.
Storing starts at memory address 0x0, and each subsequent value from the list is
stored in the following memory location. If a .word entry with m comma-separated
decimal values follows a .word entry with n comma-separated values, and the n-th
value is stored in the location with memory address q, then the first of the m values
is stored in the next memory location, that is, at address q + 4, since each location
is four bytes wide (see Example 4.1).
        The simulator executes declarative statements during compilation of a
program.
        Example 4.1. Suppose a program has the following declaration section
.data
.word 33, 44, 0
.word 66, 88
.text
After compilation, main memory will contain
0x000    000000100001       (33)
0x004    000000101100       (44)
0x008    000000000000       (0)
0x00c    000001000010       (66)
0x010    000001011000       (88)
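The placement rule of Example 4.1 can be sketched in a few lines of Python. This is only an illustration and not the simulator's Java code: the function name load_data_section is hypothetical, and the sketch assumes byte addresses that advance by 4 per word, as the addresses of Example 4.1 suggest.

```python
def load_data_section(source: str) -> dict[int, int]:
    """Return {byte_address: value} for the .word entries of a program."""
    memory = {}
    address = 0x0                       # the first value goes to address 0x0
    in_data = False
    for raw in source.splitlines():
        line = raw.split("#", 1)[0].strip()     # drop comments and blanks
        if not line:
            continue
        lower = line.lower()
        if lower == ".data":
            in_data = True
        elif lower == ".text":
            break                       # program instructions start here
        elif in_data and lower.startswith(".word"):
            for item in line[len(".word"):].split(","):
                memory[address] = int(item)
                address += 4            # assumption: next four-byte word
    return memory


program = """.data
.word 33, 44, 0
.word 66, 88
.text
"""
for addr, value in load_data_section(program).items():
    print(f"0x{addr:03x}    {value:012b}    ({value})")
```

Running the sketch prints one line per memory location in the same address, binary, decimal layout as the listing of Example 4.1.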

5. Compilation of Programs
The simulator compiles MIPS assembly programs. During compilation, the simulator:
•   Removes empty lines and comments from the source program,
•   Inserts memory addresses,
•   Computes branch offsets,
•   Replaces branch labels by branch offsets,
•   Executes the data declaration section, and
•   Removes the data declaration section from the compiled program.
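The compilation steps listed above can be sketched as follows. This is a hedged illustration, not the simulator's implementation: the function name compile_program, the "label:" syntax, the presence of a .text marker, and the offset convention (counted in instructions relative to the instruction after the branch, as in standard MIPS) are all assumptions. Execution of the data declaration section is left out.

```python
def compile_program(source: str) -> list[tuple[int, str]]:
    """Return (address, instruction) pairs with branch labels resolved."""
    # Strip comments, empty lines, and the data declaration section.
    lines, in_data = [], False
    for raw in source.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.lower() == ".data":
            in_data = True
        elif line.lower() == ".text":
            in_data = False
        elif not in_data:
            lines.append(line)

    # Assign addresses (4 bytes per instruction) and record label positions.
    labels, instructions = {}, []
    for line in lines:
        while ":" in line:
            label, line = line.split(":", 1)
            labels[label.strip().lower()] = 4 * len(instructions)
            line = line.strip()
        if line:
            instructions.append(line)

    # Replace branch labels by branch offsets.
    compiled = []
    for index, text in enumerate(instructions):
        address = 4 * index
        parts = text.replace(",", " ").split()
        if parts[0].lower() in ("beq", "bne"):
            target = labels[parts[3].lower()]
            offset = (target - (address + 4)) // 4   # assumed convention
            text = f"{parts[0]} {parts[1]}, {parts[2]}, {offset}"
        compiled.append((address, text))
    return compiled


example = """.data
.word 1, 2
.text
# count down from 3
      addi $1, $0, 3
loop: subi $1, $1, 1

      bne $1, $0, loop
      end
"""
for address, text in compile_program(example):
    print(f"0x{address:03x}  {text}")
```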

6. User Interface
    The user interface of the MCSim Simulator has four screens:
•   Setup,
•   Execute,
•   Analyse, and
•   Help.




6.1 The Setup Screen
       The Setup screen is presented in Figure 6.1. The Setup screen has three main
functions:
• Selecting a program,
• Enabling a cache (or caches) to use with the program and
• Setting up the caches enabled.
Program selection is performed in the usual manner using the Browse button.
       The simulator is built on the assumption that the processor datapath contains
separate instruction and data memories and separate instruction and data caches. The
simulator can execute a program in four different modes:
•   Both caches disabled,
•   Instruction cache enabled and data cache disabled,
•   Data cache enabled and instruction cache disabled, and
•   Both caches enabled.
Both instruction and data cache can:
•   Have a cache size of:
       o 4 words,
       o 8 words,
       o 16 words,
       o 32 words,
       o 64 words, or
       o 128 words,
•   Have a block size of:
       o 1 word,
       o 2 words, or
       o 4 words,
•   Be of type:
       o Direct-Mapped,
       o Set-Associative, or
       o Fully-Associative
If a cache is set-associative, it can be either 2-way or 4-way set-associative. Also, if a
cache is either set-associative or fully-associative, it can apply one of the following
replacement algorithms:
•   First-In-First-Out (FIFO),
•   Random, or
•   Least-Recently-Used (LRU).
A replacement algorithm can be applied only to associative caches, because only
associative caches allow a memory address to be mapped into different cache slots. A
replacement algorithm determines in which of the available cache slots new content is
stored. In a fully-associative cache, all cache slots are available. In a set-associative
cache, the available slots are the slots of the set into which the memory address
maps. The replacement algorithms decide where to store new content in the following
way:




Figure 6.1.
•   The FIFO algorithm first tries to store new content into an unused available cache
    slot. If all available cache slots hold valid content, it replaces the oldest one.
•   The Random algorithm stores new content into a randomly chosen available cache
    slot, regardless of whether the slot holds valid content.
•   The LRU algorithm also first tries to store new content into an unused available
    cache slot. If all available cache slots hold valid content, it replaces the content
    that has not been used for the longest time.
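The three replacement rules can be sketched in Python for one group of available slots (one set of a set-associative cache, or the whole of a fully-associative cache). The class and attribute names are hypothetical and the time stamps are an assumed bookkeeping device; the simulator itself may track slot age differently.

```python
import random


class CacheSet:
    """One group of available slots with a FIFO, Random, or LRU policy."""

    def __init__(self, ways: int, policy: str, rng=None):
        self.tags = [None] * ways       # None marks an unused (invalid) slot
        self.loaded_at = [0] * ways     # time the slot was filled (FIFO)
        self.used_at = [0] * ways       # time of the last access (LRU)
        self.policy = policy            # "fifo", "random", or "lru"
        self.clock = 0
        self.rng = rng or random.Random()

    def _victim(self) -> int:
        if self.policy == "random":
            # Random ignores whether the chosen slot holds valid content.
            return self.rng.randrange(len(self.tags))
        if None in self.tags:
            return self.tags.index(None)        # unused slot first
        stamps = self.loaded_at if self.policy == "fifo" else self.used_at
        return min(range(len(self.tags)), key=stamps.__getitem__)

    def access(self, tag: int) -> bool:
        """Access a block by tag; return True on a hit, False on a miss."""
        self.clock += 1
        if tag in self.tags:
            self.used_at[self.tags.index(tag)] = self.clock
            return True
        slot = self._victim()
        self.tags[slot] = tag
        self.loaded_at[slot] = self.used_at[slot] = self.clock
        return False


def count_misses(policy: str, trace, ways: int = 2) -> int:
    cache_set = CacheSet(ways, policy)
    return sum(not cache_set.access(tag) for tag in trace)


trace = [1, 2, 1, 3, 1]
print("FIFO misses:", count_misses("fifo", trace))
print("LRU misses: ", count_misses("lru", trace))
```

On this trace the two policies differ on the last access. FIFO has already replaced block 1 because it was loaded first, while LRU has kept it because it was used recently.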
        Since the MCSim simulator does not support run-time changes of the object
code, the instruction memory and its cache do not support writes. Writing is the only
difference between the instruction and data caches. The data cache supports the
following two update policies:
•   Write-Through and
•   Write-Back.
If a cache implements the write-through policy, then whenever a cache slot is updated,
the whole block is also copied into main memory. This keeps cache and memory
contents consistent. The implemented write-through policy does not use a write
buffer (which would otherwise avoid datapath stalls due to memory writes).
        If a cache implements the write-back policy, then writes to a slot are
accumulated in the slot until the slot has to be reused. Each slot of a cache that
implements the write-back policy has an additional bit, called the dirty bit. The dirty
bit is set when a write to the slot has been made, to indicate that the slot must be
copied into main memory when it has to be reused.
        The write behaviour of the main memory and cache system depends on the
cache block size.
6.1.1 Write Behaviour When Block Size is Greater Than One
        If the block size is greater than one, then a cache slot contains more than one
memory word. Each data transfer between the cache and the main memory involves
whole multiword blocks. On the other hand, a MIPS sw instruction writes only one
memory word. Prior to any write of a single word into a cache slot, the memory block
that contains the word to be written and all its neighbouring words has to be loaded
into the slot. Otherwise, a subsequent read of a neighbouring word, or a write back of
the block into memory, will result in an error.
       Example 6.1. Suppose the cache block size is four, the content of register $1 is
77 (decimal), and the first four main memory locations starting at address 0x000 contain
0x000    000000100001         (33)
0x004    000000101100         (44)
0x008    000000000000         (0)
0x00c    000001000010         (66)
Suppose:
•   The memory block at the address 0x000 containing values (33, 44, 0,
    66) maps into the cache slot 0 that contains data values (0, 0, 0, 0),
•   A program needs to execute the following instruction
sw $1, 8($0)
•   The memory block at the address 0x000 having (33, 44, 0, 66) is not
    transferred into the cache prior to writing.
After writing solely into word 2 of cache slot 0, the slot will
contain the data values (0, 0, 77, 0). A subsequent
lw $2, 12($0)
will produce a READ HIT, but the new content of register $2 will be 0, which is
wrong (the correct value is 66).
        After writing the content of cache slot 0 into main memory, the new
content of main memory block 0 will be
0x000    000000000000        (0)
0x004    000000000000        (0)
0x008    000001001101        (77)
0x00c    000000000000        (0)
which is also wrong, because the values 33, 44, and 66 have been lost.
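The effect described in the example above can be reproduced with a short Python sketch. The helper store_word and the constant BLOCK_BYTES are hypothetical names used only for illustration; the memory contents are those of the example.

```python
memory = {0x000: 33, 0x004: 44, 0x008: 0, 0x00C: 66}
BLOCK_BYTES = 16        # a block of four words, four bytes each


def store_word(slot, address, value, fetch_first):
    """Write one word into a four-word slot, optionally fetching the block first."""
    if fetch_first:
        base = address - address % BLOCK_BYTES
        slot[:] = [memory[base + 4 * i] for i in range(4)]
    slot[(address % BLOCK_BYTES) // 4] = value
    return slot


# sw $1, 8($0) with $1 = 77, into a slot that initially holds (0, 0, 0, 0)
without_fetch = store_word([0, 0, 0, 0], 8, 77, fetch_first=False)
with_fetch = store_word([0, 0, 0, 0], 8, 77, fetch_first=True)

# lw $2, 12($0) reads word 3 of the slot
print(without_fetch, "lw reads", without_fetch[12 // 4])
print(with_fetch, "lw reads", with_fetch[12 // 4])
```

Without the prior block transfer the load returns the stale 0. With it, the load returns 66 and a later write back would preserve 33, 44, and 66.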
Write-Through
       If a
sw $x, n($y)
instruction needs to be executed, where x, y ∈ {0, 1, …, 31}, 0 ≤ n ≤ 2^16 – 1, and
n + $y maps into a slot (say slot z) with the valid bit deasserted, or there is a tag
mismatch (valid = ⊥ ∨ cache.Slot ≠ address.Slot), then:
•   It is a write miss,
•   The memory block containing memory address n + $y is transferred into data
    cache slot z,
•   The content of register $x is written into the appropriate word of the slot z, and
•   The whole updated block is written back into the main memory.
        If n + $y maps into a slot (say slot z) with the valid bit asserted and the tags match
(valid = T ∧ cache.Slot = address.Slot), then:
•   It is a write hit,
•   The content of register $x is written into the appropriate word of the slot z, and
•   The whole updated block is written back into the main memory.
The writes into the cache slot and into the main memory overlap.
       The net gain of the implemented write-through algorithm is that after writing a
word in the cache, the whole block remains in the cache available for subsequent
reads and writes.
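The write-through rules for multiword blocks can be sketched for a direct-mapped cache as follows. The class name, the list-of-words memory model, and the default geometry are assumptions made for the illustration; the simulator's Java classes are not described in this document.

```python
class WriteThroughCache:
    """Direct-mapped write-through data cache; memory is a list of words."""

    def __init__(self, memory, num_slots=4, block_words=4):
        self.memory = memory
        self.num_slots = num_slots
        self.block_words = block_words
        self.valid = [False] * num_slots
        self.tag = [0] * num_slots
        self.data = [[0] * block_words for _ in range(num_slots)]

    def sw(self, value, address):
        """Store value at byte address n + $y; return the hit/miss status."""
        word = address // 4                     # word number in memory
        block = word // self.block_words        # memory block number
        z = block % self.num_slots              # slot the block maps into
        tag = block // self.num_slots
        first = block * self.block_words        # first word of the block
        hit = self.valid[z] and self.tag[z] == tag
        if not hit:
            # Write miss: transfer the memory block into slot z first.
            self.data[z] = self.memory[first:first + self.block_words]
            self.valid[z], self.tag[z] = True, tag
        self.data[z][word % self.block_words] = value
        # Hit or miss, the whole updated block is written back to memory.
        self.memory[first:first + self.block_words] = self.data[z]
        return "WRITE HIT" if hit else "WRITE MISS"


memory = [33, 44, 0, 66] + [0] * 12
cache = WriteThroughCache(memory)
print(cache.sw(77, 8), cache.data[0], memory[:4])
print(cache.sw(5, 12), cache.data[0], memory[:4])
```

The first store misses and fetches the block before writing. The second store hits because the block stayed in the cache, which is the net gain of the policy.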
Write-Back
        If a
sw $x, n($y)
instruction needs to be executed, where x, y ∈ {0, 1, …, 31}, 0 ≤ n ≤ 2^16 – 1, and
n + $y maps into a slot (say slot z) with the valid bit deasserted, or with the dirty bit
deasserted and a tag mismatch (valid = ⊥ ∨ (dirty = ⊥ ∧ cache.Slot ≠
address.Slot)), then:
•   It is a write miss,
•   The main memory block containing the word with the memory address n + $y is
    transferred into the data cache slot z,
•   The content of register $x is written into the appropriate word of the slot z, and
•   The dirty bit is asserted.
       If a
sw $x, n($y)
or a
lw $x, n($y)
instruction needs to be executed, and n + $y maps into a slot (say slot z) with the valid
bit asserted and the dirty bit asserted and there is a tag mismatch (valid = T ∧ dirty = T ∧
cache.Slot ≠ address.Slot), then:
•   It is a write or read miss, respectively,
•   The whole block at the slot z has to be written into the main memory,
•   The main memory block containing the word with the memory address n + $y is
    transferred into the data cache slot z,
•   The content of register $x is written into the appropriate word of the slot z (if sw),
    or the appropriate word of the slot z is transferred into register $x (if lw), and
•   The dirty bit is asserted (if sw).
        Finally, if a sw $x, n($y) instruction needs to be executed and n + $y maps into
a slot (say slot z) with the valid bit asserted, the dirty bit asserted, and matching tags
(valid = T ∧ dirty = T ∧ cache.Slot = address.Slot), then:
•   It is a write hit, and
•   The content of register $x is written into the appropriate word of the slot z.
        The net gain of the implemented write-back algorithm is that consecutive
writes to the same memory address do not incur memory write penalties, and the
whole block remains in the cache available for subsequent reads and writes.
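The write-back rules can be sketched in the same style for a direct-mapped cache. The names are hypothetical. Two details are assumptions not stated in the text: a freshly fetched block starts with its dirty bit deasserted, and a write hit on a clean slot is handled like a write hit on a dirty one.

```python
class WriteBackCache:
    """Direct-mapped write-back data cache; memory is a list of words."""

    def __init__(self, memory, num_slots=2, block_words=4):
        self.memory = memory
        self.num_slots = num_slots
        self.block_words = block_words
        self.valid = [False] * num_slots
        self.dirty = [False] * num_slots
        self.tag = [0] * num_slots
        self.data = [[0] * block_words for _ in range(num_slots)]

    def _access(self, address, value=None):
        word = address // 4
        block = word // self.block_words
        z, tag = block % self.num_slots, block // self.num_slots
        hit = self.valid[z] and self.tag[z] == tag
        if not hit:
            if self.valid[z] and self.dirty[z]:
                # A dirty slot is being reused: write the old block back first.
                old = (self.tag[z] * self.num_slots + z) * self.block_words
                self.memory[old:old + self.block_words] = self.data[z]
            first = block * self.block_words
            self.data[z] = self.memory[first:first + self.block_words]
            self.valid[z], self.tag[z] = True, tag
            self.dirty[z] = False       # assumption: a fresh block is clean
        if value is not None:           # sw: update the word, mark slot dirty
            self.data[z][word % self.block_words] = value
            self.dirty[z] = True
        return hit, self.data[z][word % self.block_words]

    def sw(self, value, address):
        hit, _ = self._access(address, value)
        return "WRITE HIT" if hit else "WRITE MISS"

    def lw(self, address):
        hit, value = self._access(address)
        return value, ("READ HIT" if hit else "READ MISS")


memory = [33, 44, 0, 66] + [0] * 28
cache = WriteBackCache(memory)
print(cache.sw(77, 8), memory[:4])      # memory is not updated yet
print(cache.sw(78, 8), memory[:4])      # second write, still no memory write
print(cache.lw(32), memory[:4])         # slot 0 is reused, block written back
```

The two consecutive stores to address 8 cause no memory writes. Memory is updated only when the load from address 32 forces slot 0 to be reused.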
6.1.2 Write Behaviour When Block Size is Equal to One
        If the block size is equal to one, then a cache slot contains only one memory
word. Each data transfer between the cache and the main memory involves just one
word. A MIPS sw instruction writes only one memory word. In contrast to the case
when a block contains multiple words, when the block size is only one word
there is no need to read the block prior to writing, because the cache block will
contain only the word written.
       Example 6.2. Suppose the cache block size is one, the content of register $1 is 77
(decimal), and the first four main memory locations starting at address 0x000 contain
0x000    000000100001        (33)
0x004    000000101100        (44)
0x008    000000001011        (11)
0x00c    000001000010        (66)
Suppose:
•   The memory word at the address 0x008 containing value 11 maps into the
    cache slot 2 that contains data value 0,
•   A program needs to execute the following instruction
sw $1, 8($0)
•   The memory word at the address 0x008 having value 11 is not transferred into
    the cache prior to writing.
After writing solely into word 0 of cache slot 2, the slot will
contain the data value 77. A subsequent
lw $2, 8($0)
will produce a READ HIT, and the new content of register $2 will be 77, which is
correct.
        After writing the content of cache slot 2 into main memory, the new
content of the four main memory locations starting at address 0x000 will be
0x000    000000100001        (33)
0x004    000000101100        (44)
0x008    000001001101        (77)
0x00c    000001000010        (66)
Write-Through
       If a
sw $x, n($y)
instruction needs to be executed, and n + $y maps into a slot (say slot z), then the
following unconditionally occurs:
•   The content of register $x is written into the sole word (word 0) of the slot z, and
•   The updated word is written back into the main memory.
       The net gain of the implemented write-through algorithm is that after writing a
word in the cache, the word remains in the cache available for subsequent reads.
Write-Back
       If a
sw $x, n($y)
or a
lw $x, n($y)
instruction needs to be executed, where x, y ∈ {0, 1, …, 31}, 0 ≤ n ≤ 2^16 – 1, and
n + $y maps into a slot (say slot z) with the valid bit asserted and the dirty bit asserted
and there is a tag mismatch (valid = T ∧ dirty = T ∧ cache.Slot ≠ address.Slot), then:
•   It is a write or read miss, respectively,
•   The word 0 of the slot z has to be written into the main memory,
•   The main memory word at the address n + $y is transferred into the data cache
    slot z,
•   The content of register $x is written into word 0 of the slot z (if sw),
    or word 0 of the slot z is transferred into register $x (if lw), and
•   The dirty bit is asserted (if sw).
       Otherwise, if a sw $x, n($y) instruction needs to be executed and n + $y maps
into a slot (say slot z), the following unconditionally occurs:
•   The content of register $x is written into word 0 of the slot z, and
•   The dirty bit is asserted.
        The net gain of the implemented write-back algorithm is that consecutive
writes to the same memory address do not incur memory write penalties, and the word
written remains in the cache available for subsequent reads.




6.2    The Execute Screen
       The Execute screen is given in Figure 6.3. The Execute screen contains:
•   Instruction Memory,
•   Instruction Cache,
•   Data Memory,
•   Data Cache, and
•   Register File
panes. Each pane displays address and data information of the corresponding memory
unit. Instruction memory contains the object version of a program. The Source button
allows viewing the source version of the program in a separate pop-up pane, as in
Figure 6.2.




                                        Figure 6.2.
    The Execute screen also contains fields for:
•   Program Counter,
•   Instruction Cache status, and
•   Data Cache status.
The Program Counter field contains the address of the instruction to be executed next.
The instruction to be executed next is also highlighted in the Instruction Memory
pane.
       The cache status fields (located below the Instruction Memory and Data Memory
panes) indicate whether a cache is enabled or disabled and, if enabled, whether a
cache hit or miss occurred. If there was no cache activity during the execution of
the last instruction, the field is blank.
       Buttons:
•   Step,
•   Execute N Steps,
•   Execute to Compilation, and
•   Reset
are provided for controlling the execution of programs. The Step button is particularly
useful, since it allows a program to be executed instruction by instruction.
        Watching the panes and fields during stepwise execution of a program
gives insight into the actions performed by the datapath and, in particular, by the
various memory units.


Figure 6.3.
6.3 The Analyse Screen
        The Analyse screen is given in Figure 6.4. The Analyse screen displays
performance figures of a program execution and statistics regarding activities of
various memory units. Performance figures of a program execution are displayed in
the following fields:
•   Instruction, which counts the number of program instructions executed so far, and
•   Processor Cycles, which counts the number of processor cycles used to execute
    program instructions.
The simulator also monitors and displays in the Analyse screen fields the following
statistics regarding activities of memory units:
•   Memory Updates, which displays the number of writes to Data Memory,
•   Read Hits, and
•   Read Misses,
for both Instruction and Data memories, and
•   Write Hits, and
•   Write Misses
solely for Data Memory. Hits and Misses are monitored for each slot separately and
for each cache in total. Also, there are fields named
•   Data Cache Cycles and
•   Instruction Cache Cycles.
These two fields contain the cumulative number of processor cycles, spent on cache
reads, cache writes, and miss penalties.
        The read/write costs are the following:
•   Cache read/write hit/miss = 1 processor cycle,
•   One word per block memory read/write = 10 processor cycles,
•   Two words per block memory read/write = 12 processor cycles, and
•   Four words per block memory read/write = 16 processor cycles.
        The cost of a miss penalty depends on the cache block size. For a block size
b = 2, the simulator mimics memory bursting of the type 10/2 (10 cycles for the first
word and 2 for the second), while for a block size b = 4 it mimics memory bursting
of the type 10/2/2/2.
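The cost table above follows the pattern of 10 cycles for the first word of a block and 2 cycles for each further word, which can be written as a small Python function. The function names are hypothetical. The way read_cycles combines the cache access with the block transfer is an assumption for illustration, since the text does not spell out how the simulator adds the two.

```python
def block_transfer_cycles(block_words: int) -> int:
    """Cycles to read or write one block: 10 for the first word, 2 per extra word."""
    if block_words not in (1, 2, 4):
        raise ValueError("MCSim block sizes are 1, 2, or 4 words")
    return 10 + 2 * (block_words - 1)


def read_cycles(hit: bool, block_words: int) -> int:
    """Assumed cost of one cache read: 1 cycle, plus a block fetch on a miss."""
    return 1 + (0 if hit else block_transfer_cycles(block_words))


for b in (1, 2, 4):
    print(f"b = {b}: transfer {block_transfer_cycles(b)} cycles, "
          f"read miss {read_cycles(False, b)} cycles")
```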
        The Analyse screen appears in parallel with the Execute screen allowing a
comfortable and simultaneous tracking of program behaviour and various
performance figures and statistics.




Figure 6.4.
7.   How to Use the MCSim Cache Simulator
    Store a MIPS assembly program prepared for running by the MCSim Cache
Simulator in your private directory.
    MCSim Cache Simulator is started from the command line, so you need to
run it from a UNIX prompt in a shell window. To enable the various applications
required, first type
> need spim
    You may wish to add the “need spim” command to your .cshrc file so that it is
run automatically.
    To start MCSim Cache Simulator type:
> mcsim
     To execute a program, perform the following procedure:
1. Use the Setup screen to load your program into MCSim (browse to your program if
     needed).
2. Set up the caches needed:
      • Cache Size,
      • Block Size,
      • Cache Type,
      • Cache Associativity (if applicable),
      • Replacement Algorithm (if applicable), and
      • Update Policy (if applicable)
3. Use the Execute screen to execute your program.
4. Step through your program several times to observe cache/memory and
     cache/Register File interactions.
5. Monitor performance figures and cache system statistics as you progress through
     your program.
6. Write down your remarks and conclusions.
It is advisable to step through your program first, to see how the whole system
behaves, and then to run it to completion.



