Chapter Five The Processor: Datapath and Control


									           Chapter Five

The Processor : Datapath and Control




                             2004 Morgan Kaufmann Publishers   1
Outline

 • 5.1 Introduction
 • 5.2 Logic Design Conventions
 • 5.3 Building a Datapath
 • 5.4 A Simple Implementation Scheme
 • 5.5 A Multicycle Implementation
 • 5.6 Exceptions
 • 5.9 Real Stuff: The Organization of Recent Pentium Implementations
 • 5.10 Fallacies and Pitfalls
 • 5.11 Concluding Remarks
 • 5.12 Historical Perspective and Further Reading



5.1   Introduction




The Processor: Datapath & Control

 •   We're ready to look at an implementation of the MIPS
 •   Simplified to contain only:
      – memory-reference instructions: lw, sw
      – arithmetic-logical instructions: add, sub, and, or, slt
      – control flow instructions: beq, j
 •   Generic Implementation:
      –   use the program counter (PC) to supply instruction address
      –   get the instruction from memory
      –   read registers
      –   use the instruction to decide exactly what to do
 •   All instructions except jump use the ALU after reading the registers
           Why? memory-reference? arithmetic? control flow?




More Implementation Details

 •   Abstract / Simplified View:




 •   Two types of functional units:
      – elements that operate on data values (combinational)
      – elements that contain state (sequential)

FIGURE 3.14 MIPS architecture revealed thus far

MIPS assembly language

Category       Instruction                     Example               Meaning                     Comments
Arithmetic     add                             add   $s1, $s2, $s3   $s1 = $s2 + $s3             Three operands; overflow detected
               subtract                        sub   $s1, $s2, $s3   $s1 = $s2 - $s3             Three operands; overflow detected
               add immediate                   addi  $s1, $s2, 100   $s1 = $s2 + 100             + constant; overflow detected
               add unsigned                    addu  $s1, $s2, $s3   $s1 = $s2 + $s3             Three operands; overflow undetected
               subtract unsigned               subu  $s1, $s2, $s3   $s1 = $s2 - $s3             Three operands; overflow undetected
               add immediate unsigned          addiu $s1, $s2, 100   $s1 = $s2 + 100             + constant; overflow undetected
               move from coprocessor register  mfc0  $s1, $epc       $s1 = $epc                  Copy Exception PC + special regs
               multiply                        mult  $s2, $s3        Hi, Lo = $s2 x $s3          64-bit signed product in Hi, Lo
               multiply unsigned               multu $s2, $s3        Hi, Lo = $s2 x $s3          64-bit unsigned product in Hi, Lo
               divide                          div   $s2, $s3        Lo = $s2 / $s3              Lo = quotient, Hi = remainder
                                                                     Hi = $s2 mod $s3
               divide unsigned                 divu  $s2, $s3        Lo = $s2 / $s3              Unsigned quotient and remainder
                                                                     Hi = $s2 mod $s3
               move from Hi                    mfhi  $s1             $s1 = Hi                    Used to get copy of Hi
               move from Lo                    mflo  $s1             $s1 = Lo                    Used to get copy of Lo
Data transfer  load word                       lw    $s1, 100($s2)   $s1 = Memory[$s2 + 100]     Word from memory to register
               store word                      sw    $s1, 100($s2)   Memory[$s2 + 100] = $s1     Word from register to memory
               load half unsigned              lhu   $s1, 100($s2)   $s1 = Memory[$s2 + 100]     Halfword from memory to register
               store half                      sh    $s1, 100($s2)   Memory[$s2 + 100] = $s1     Halfword from register to memory
               load byte unsigned              lbu   $s1, 100($s2)   $s1 = Memory[$s2 + 100]     Byte from memory to register
               store byte                      sb    $s1, 100($s2)   Memory[$s2 + 100] = $s1     Byte from register to memory
               load upper immed.               lui   $s1, 100        $s1 = 100 * 2^16            Loads constant in upper 16 bits
FIGURE 3.14 (continued)

Category            Instruction                       Example               Meaning                               Comments
Logical             and                               and   $s1, $s2, $s3   $s1 = $s2 & $s3                       Three reg. operands; bit-by-bit AND
                    or                                or    $s1, $s2, $s3   $s1 = $s2 | $s3                       Three reg. operands; bit-by-bit OR
                    nor                               nor   $s1, $s2, $s3   $s1 = ~($s2 | $s3)                    Three reg. operands; bit-by-bit NOR
                    and immediate                     andi  $s1, $s2, 100   $s1 = $s2 & 100                       Bit-by-bit AND reg with constant
                    or immediate                      ori   $s1, $s2, 100   $s1 = $s2 | 100                       Bit-by-bit OR reg with constant
                    shift left logical                sll   $s1, $s2, 10    $s1 = $s2 << 10                       Shift left by constant
                    shift right logical               srl   $s1, $s2, 10    $s1 = $s2 >> 10                       Shift right by constant
Conditional branch  branch on equal                   beq   $s1, $s2, 25    if ($s1 == $s2) go to PC + 4 + 100    Equal test; PC-relative branch
                    branch on not equal               bne   $s1, $s2, 25    if ($s1 != $s2) go to PC + 4 + 100    Not equal test; PC-relative
                    set on less than                  slt   $s1, $s2, $s3   if ($s2 < $s3) $s1 = 1; else $s1 = 0  Compare less than; two's complement
                    set on less than immediate        slti  $s1, $s2, 100   if ($s2 < 100) $s1 = 1; else $s1 = 0  Compare < constant; two's complement
                    set less than unsigned            sltu  $s1, $s2, $s3   if ($s2 < $s3) $s1 = 1; else $s1 = 0  Compare less than; natural numbers
                    set less than immediate unsigned  sltiu $s1, $s2, 100   if ($s2 < 100) $s1 = 1; else $s1 = 0  Compare < constant; natural numbers
Unconditional jump  jump                              j     2500            go to 10000                           Jump to target address
                    jump register                     jr    $ra             go to $ra                             For switch, procedure return
                    jump and link                     jal   2500            $ra = PC + 4; go to 10000             For procedure call
MIPS floating-point machine language

Name        Format  Example (field values)                              Comments
add.s       R       17      16      6       4       2       0           add.s $f2, $f4, $f6
sub.s       R       17      16      6       4       2       1           sub.s $f2, $f4, $f6
mul.s       R       17      16      6       4       2       2           mul.s $f2, $f4, $f6
div.s       R       17      16      6       4       2       3           div.s $f2, $f4, $f6
add.d       R       17      17      6       4       2       0           add.d $f2, $f4, $f6
sub.d       R       17      17      6       4       2       1           sub.d $f2, $f4, $f6
mul.d       R       17      17      6       4       2       2           mul.d $f2, $f4, $f6
div.d       R       17      17      6       4       2       3           div.d $f2, $f4, $f6
lwc1        I       49      20      2       100                         lwc1 $f2, 100($s4)
swc1        I       57      20      2       100                         swc1 $f2, 100($s4)
bc1t        I       17      8       1       25                          bc1t 25
bc1f        I       17      8       0       25                          bc1f 25
c.lt.s      R       17      16      4       2       0       60          c.lt.s $f2, $f4
c.lt.d      R       17      17      4       2       0       60          c.lt.d $f2, $f4
Field size          6 bits  5 bits  5 bits  5 bits  5 bits  6 bits      All MIPS instructions 32 bits

Figure 5.2 The basic implementation of the MIPS subset
including the necessary multiplexers and control lines.




5.2   Logic Design Conventions




Keywords

• Clocking methodology: The approach used to determine when
  data is valid and stable relative to the clock.

• Edge-triggered clocking: A clocking scheme in which all state
  changes occur on a clock edge.

• Control signal: A signal used for multiplexor selection or for
  directing the operation of a functional unit; contrasts with a data
  signal, which contains information that is operated on by a
  functional unit.




State Elements

•   Unclocked vs. Clocked
•   Clocks used in synchronous logic
     – when should an element that contains state be updated?

     [Figure: clock waveform marking the rising edge, the falling edge,
      and the clock period (cycle time)]




An unclocked state element

•   The set-reset latch
     – output depends on present inputs and also on past inputs



     [Figure: set-reset latch with inputs R and S and outputs Q and
      its complement]
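
A minimal sketch of why the output depends on past inputs as well as present ones. It assumes the latch is built from two cross-coupled NOR gates (the slide does not name the gate type), and the function names are illustrative only.

```python
def nor(a, b):
    """Two-input NOR gate on 0/1 values."""
    return int(not (a or b))


def sr_latch(s, r, q=0, q_bar=1):
    """Settle a cross-coupled NOR latch and return (Q, Q_bar).

    q and q_bar are the previous outputs; they feed back into the gates,
    which is what gives the latch its memory.
    """
    for _ in range(4):  # a few passes are enough for the loop to settle
        q_new = nor(r, q_bar)
        q_bar_new = nor(s, q)
        if (q_new, q_bar_new) == (q, q_bar):
            break
        q, q_bar = q_new, q_bar_new
    return q, q_bar


state = sr_latch(s=1, r=0)            # set
held = sr_latch(s=0, r=0, q=state[0], q_bar=state[1])   # hold: same inputs as below
cleared = sr_latch(s=0, r=1, q=held[0], q_bar=held[1])  # reset
held_again = sr_latch(s=0, r=0, q=cleared[0], q_bar=cleared[1])
```

With S = R = 0 the latch returns whatever it held before, so the same present inputs give different outputs depending on history.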




Latches and Flip-flops

 •   Output is equal to the stored value inside the element
           (don't need to ask for permission to look at the value)
 •   Change of state (value) is based on the clock
 •   Latches: state changes whenever the inputs change and the clock is asserted
           ("asserted" means logically true, which could mean electrically low)
 •   Flip-flops: state changes only on a clock edge
           (edge-triggered methodology)

     A clocking methodology defines when signals can be read and written
     (we wouldn't want to read a signal at the same time it is being written)




D-latch

 •   Two inputs:
      – the data value to be stored (D)
      – the clock signal (C) indicating when to read & store D
 •   Two outputs:
      – the value of the internal state (Q) and its complement



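     [Figure: D latch with inputs D and C and outputs Q and its
      complement, shown as a gate-level circuit and as a block symbol]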
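
A behavioral sketch of the D latch described above, assuming 0/1 signal values. The class and method names are made up for illustration; this models behavior, not the gate-level circuit.

```python
class DLatch:
    """Level-sensitive storage: transparent while the clock C is asserted."""

    def __init__(self):
        self.q = 0  # internal state

    def update(self, d, c):
        """Apply inputs D and C; return (Q, complement of Q)."""
        if c:           # clock asserted: read and store D
            self.q = d
        return self.q, 1 - self.q


latch = DLatch()
a = latch.update(d=1, c=1)  # stores 1
b = latch.update(d=0, c=0)  # clock not asserted: D is ignored
c_ = latch.update(d=0, c=1)  # stores 0
```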




D flip-flop

 •   Output changes only on the clock edge


     [Figure: D flip-flop built from two D latches in series sharing
      the clock C, followed by a timing diagram of D, C, and Q]
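
A sketch of how two D latches in series give edge-triggered behavior. It assumes a master latch that is open while C is high and a slave latch driven by the inverted clock, so Q changes on the falling edge; the slide does not state which edge is used, so treat that as an assumption. Names are illustrative.

```python
class DLatch:
    """Level-sensitive latch: follows D while C is asserted."""

    def __init__(self):
        self.q = 0

    def update(self, d, c):
        if c:
            self.q = d
        return self.q


class DFlipFlop:
    """Master-slave D flip-flop: Q takes the value of D at the falling edge of C."""

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d, c):
        m = self.master.update(d, c)         # master follows D while C is high
        return self.slave.update(m, 1 - c)   # slave copies master while C is low


ff = DFlipFlop()
trace = [
    ff.update(d=1, c=1),  # master captures 1, output still 0
    ff.update(d=1, c=0),  # falling edge: output becomes 1
    ff.update(d=0, c=0),  # D changes while C is low: output unchanged
    ff.update(d=0, c=1),  # master captures 0, output still 1
    ff.update(d=0, c=0),  # falling edge: output becomes 0
]
```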




Our Implementation

•   An edge-triggered methodology
•   Typical execution:
     – read contents of some state elements
     – send values through some combinational logic
     – write results to one or more state elements

     [Figure: state element 1 feeds combinational logic, which feeds
      state element 2; the whole path fits within one clock cycle]




Figure 5.4 An edge-triggered methodology allows a state element to be read and
written in the same clock cycle without creating a race that could lead to
indeterminate data values.
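
A small sketch of the point made in the caption: with edge-triggered updates, a state element can be read and written in the same cycle, because the new value only becomes visible at the clock edge. The `Register` class and its `tick` method are hypothetical names used for illustration.

```python
class Register:
    """Edge-triggered register: writes take effect only at tick()."""

    def __init__(self, value=0):
        self.value = value   # what readers see during the cycle
        self._next = value   # value staged for the next clock edge

    def write(self, v):
        self._next = v

    def tick(self):
        self.value = self._next


acc = Register(5)
history = []
for _ in range(3):
    old = acc.value          # read the state element
    acc.write(old + 1)       # combinational logic, then write in the same cycle
    assert acc.value == old  # the read is not disturbed by the pending write
    acc.tick()               # clock edge: state updates
    history.append(acc.value)
```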




5.3   Building a Datapath




Keywords

 • Datapath element: A functional unit used to operate on or hold
   data within a processor. In the MIPS implementation the datapath
   elements include the instruction and data memories, the register
   file, the arithmetic logic unit (ALU), and adders.

 • Program counter (PC): The register containing the address of
   the instruction in the program being executed.

 • Register file: A state element that consists of a set of registers
   that can be read and written by supplying a register number to be
   accessed.

 • Sign-extend: To increase the size of a data item by replicating
   the high-order sign bit of the original data item in the high-order
   bits of the larger, destination data item.
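
As an illustration of the sign-extend definition, here is a minimal sketch that widens a 16-bit field to 32 bits. The function name and default widths are chosen for this example.

```python
def sign_extend(value, from_bits=16, to_bits=32):
    """Replicate the sign bit of a from_bits-wide value into the upper bits."""
    value &= (1 << from_bits) - 1                 # keep only the original field
    if value & (1 << (from_bits - 1)):            # sign bit set: fill with ones
        upper = ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
        value |= upper
    return value


positive = sign_extend(0x0064)   # +100 stays +100
negative = sign_extend(0xFFFC)   # -4 in 16 bits becomes -4 in 32 bits
```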


Keywords

• Branch target address: The address specified in a branch,
  which becomes the new program counter (PC) if the branch is
  taken. In the MIPS architecture the branch target is given by the
  sum of the offset field of the instruction and the address of the
  instruction following the branch.
• Branch taken: A branch where the branch condition is satisfied
  and the program counter (PC) becomes the branch target. All
  unconditional branches are taken branches.
• Branch not taken: A branch where the branch condition is false
  and the program counter (PC) becomes the address of the
  instruction that sequentially follows the branch.
• Delayed branch: A type of branch where the instruction
  immediately following the branch is always executed, independent
  of whether the branch condition is true or false.
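
A sketch of the branch target calculation, assuming the 16-bit offset field counts words and is therefore shifted left by 2 before being added to PC + 4 (consistent with the beq example elsewhere in these slides, where an offset of 25 gives PC + 4 + 100). The function name is illustrative.

```python
def branch_target(pc, offset16):
    """Return the target of a taken branch at address pc."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:          # sign-extend the 16-bit field
        offset -= 0x10000
    # offset is in words; the instruction after the branch is at pc + 4
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF


forward = branch_target(0x1000, 25)       # 0x1000 + 4 + 100
backward = branch_target(0x1000, 0xFFFF)  # offset -1: back to the branch itself
```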


Register File

 •   Built using D flip-flops

     [Figure: register file with inputs Read register number 1, Read
      register number 2, Write register, Write data, and the Write
      control signal, and outputs Read data 1 and Read data 2. Each
      read port is a multiplexor that selects one of Register 0
      through Register n - 1.]

                         Do you understand? What is the “Mux” above?
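
A behavioral sketch of the register file interface, with two read ports and one write port. Indexing a Python list plays the role of each read multiplexor. The class name, the 32-register default, and the method names are assumptions for this example.

```python
class RegisterFile:
    """Two read ports, one write port."""

    def __init__(self, n=32):
        self.regs = [0] * n

    def read(self, read_reg1, read_reg2):
        """Each read port is a mux: the register number selects one register."""
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg, write_data, reg_write):
        """Write only when the write control signal is asserted."""
        if reg_write:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(17, 40, reg_write=1)
rf.write(18, 2, reg_write=1)
rf.write(18, 99, reg_write=0)   # ignored: write signal not asserted
data1, data2 = rf.read(17, 18)
```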

Abstraction

•   Make sure you understand the abstractions!
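•   Sometimes it is easy to think you do, when you don't

     [Figure: a 32-bit multiplexor with inputs A and B, output C, and a
      Select line, shown on the left as a single block and on the right
      as 32 one-bit multiplexors (A31/B31/C31 down to A0/B0/C0) that
      all share the same Select]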
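
A sketch of the abstraction in the figure: a 32-bit multiplexor is just 32 one-bit multiplexors sharing one Select line. The convention that Select = 0 picks A is an assumption made for this example.

```python
def mux1(a, b, select):
    """One-bit 2-to-1 multiplexor (assumed convention: select=0 picks a)."""
    return b if select else a


def mux32(a, b, select):
    """32-bit multiplexor built from 32 one-bit multiplexors."""
    c = 0
    for i in range(32):
        bit = mux1((a >> i) & 1, (b >> i) & 1, select)
        c |= bit << i
    return c


picked_a = mux32(0xDEADBEEF, 0x12345678, 0)
picked_b = mux32(0xDEADBEEF, 0x12345678, 1)
```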

Register File

 •   Note: we still use the real clock to determine when to write
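     [Figure: write port of the register file. The register number
      feeds an n-to-2^n decoder with outputs 0, 1, ..., n - 1, n; the
      decoder outputs together with the Write signal drive the C input
      of Register 0 through Register n - 1, and Register data feeds
      every D input.]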
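
A sketch of the write path in the figure. It assumes each register's clock input is enabled only when both the Write signal and that register's decoder output are asserted; the clock itself is not modeled, and the function names are illustrative.

```python
def decoder(number, n_bits):
    """n-to-2^n decoder: a one-hot list with a 1 at position `number`."""
    return [int(i == number) for i in range(1 << n_bits)]


def write_register(regs, number, data, write, n_bits):
    """Return the register contents after one write attempt."""
    enables = decoder(number, n_bits)
    # Register data reaches every D input; only the enabled register latches it.
    return [data if (write and en) else old for old, en in zip(regs, enables)]


one_hot = decoder(2, 2)
after_write = write_register([0, 0, 0, 0], number=2, data=7, write=1, n_bits=2)
no_write = write_register([0, 0, 0, 0], number=2, data=7, write=0, n_bits=2)
```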


Simple Implementation

•   Include the functional units we need for each instruction




     [Figure: a. Instruction memory (input: instruction address; output:
      instruction)   b. Program counter (PC)   c. Adder (output: Sum)]




     [Figure: a. Data memory unit (inputs: Address, Write data; output:
      Read data; controls: MemWrite, MemRead)   b. Sign-extension unit
      (16-bit input, 32-bit output)]




     [Figure: a. Registers (5-bit register numbers: Read register 1,
      Read register 2, Write register; data: Write data in, Read data 1
      and Read data 2 out; control: RegWrite)   b. ALU (4-bit ALU
      operation input; outputs: Zero and ALU result)]



                                                   Why do we need this stuff?
Figure 5.10 The datapath for the memory instructions and the
R-type instructions.




Building the Datapath
 •   Use multiplexors to stitch them together
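     [Figure: the combined datapath. The PC supplies the read address
      of the instruction memory; one adder computes PC + 4 and a second
      adder sums that with the sign-extended offset shifted left by 2;
      a multiplexor controlled by PCSrc chooses between them. The
      register file (RegWrite) feeds the ALU (4-bit ALU operation, Zero
      and ALU result outputs), with a multiplexor controlled by ALUSrc
      on the second ALU input. The ALU result addresses the data memory
      (MemRead, MemWrite), and a multiplexor controlled by MemtoReg
      selects the data written back to the registers.]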




5.4   A Simple Implementation Scheme




Keywords

• Don't-care term: An element of a logic function in which the
  output does not depend on the values of all the inputs. Don't-care
  terms may be specified in different ways.

• Opcode: The field that denotes the operation and format of an
  instruction.

• Single-cycle implementation: Also called single clock cycle
  implementation. An implementation in which an instruction is
  executed in one clock cycle.




Control

•   Selecting the operations to perform (ALU, read/write, etc.)
•   Controlling the flow of data (multiplexor inputs)
•   Information comes from the 32 bits of the instruction
•   Example:

          add $8, $17, $18           Instruction Format:

         000000    10001    10010    01000    00000    100000
           op       rs       rt       rd      shamt    funct


•   ALU's operation based on instruction type and function code
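
The encoding of the example instruction can be checked with a short sketch that packs the six R-type fields into a 32-bit word. The function name is illustrative; the field widths follow the format shown above.

```python
def encode_r_type(op, rs, rt, rd, shamt, funct):
    """Pack R-type fields: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $8, $17, $18  ->  rd = 8, rs = 17, rt = 18, funct = 32 (100000)
word = encode_r_type(op=0, rs=17, rt=18, rd=8, shamt=0, funct=0b100000)
bits = format(word, "032b")
```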




Control

•   e.g., what should the ALU do with this instruction?
•   Example: lw $1, 100($2)

           35      2       1           100


          op       rs      rt       16 bit offset

•   ALU control input

         0000    AND
         0001    OR
         0010    add
         0110    subtract
         0111    set-on-less-than
         1100    NOR

•   Why is the code for subtract 0110 and not 0011?
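
One way to approach the question above is to read the 4-bit code as separate control lines rather than as an arbitrary number: Ainvert, Bnegate, and a 2-bit Operation select (00 = AND, 01 = OR, 10 = add, 11 = set-on-less-than). Under that reading, subtract (0110) is "add with b negated". The sketch below is a behavioral model under those assumptions; it ignores overflow when computing set-on-less-than, and the function name is illustrative.

```python
MASK = 0xFFFFFFFF


def alu(control, a, b):
    """32-bit ALU driven by a 4-bit control code; returns (result, zero)."""
    ainvert = (control >> 3) & 1
    bnegate = (control >> 2) & 1
    operation = control & 0b11
    if ainvert:
        a = ~a & MASK
    if bnegate:
        b = ~b & MASK
    if operation == 0b00:
        result = a & b
    elif operation == 0b01:
        result = a | b
    else:
        total = (a + b + bnegate) & MASK   # Bnegate also supplies the carry-in
        if operation == 0b10:
            result = total                 # add, or subtract when Bnegate = 1
        else:
            result = (total >> 31) & 1     # slt: sign bit of a - b (overflow ignored)
    return result, int(result == 0)
```

Note that NOR (1100) falls out of the same structure: inverting both inputs and selecting AND gives (NOT a) AND (NOT b).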
Figure 5.12 How the ALU control bits are set depends on the ALUOp
control bits and the different function codes for the R-type instruction.

 Instruction                Instruction        Funct    Desired ALU        ALU control
 opcode           ALUOp     operation          field    action             input
 LW               00        Load word          XXXXXX   Add                0010
 SW               00        Store word         XXXXXX   Add                0010
 Branch equal     01        Branch equal       XXXXXX   Subtract           0110
 R-type           10        Add                100000   Add                0010
 R-type           10        Subtract           100010   Subtract           0110
 R-type           10        AND                100100   And                0000
 R-type           10        OR                 100101   Or                 0001
 R-type           10        Set on less than   101010   Set on less than   0111
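
The table can be written down directly as a lookup. This is a sketch of the mapping only, not of the gate-level logic; the function name is illustrative, and the funct field is ignored (a don't-care) unless ALUOp is 10.

```python
FUNCT_TO_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set on less than
}


def alu_control(alu_op, funct=0):
    """Map the 2-bit ALUOp and 6-bit funct field to the 4-bit ALU control input."""
    if alu_op == 0b00:      # lw, sw: add to form the address
        return 0b0010
    if alu_op == 0b01:      # beq: subtract to compare
        return 0b0110
    return FUNCT_TO_CONTROL[funct]   # R-type: decided by the funct field
```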




Control

•   Must describe hardware to compute 4-bit ALU control input
     – given instruction type (ALUOp, computed from instruction type):
                 00 = lw, sw
                 01 = beq
                 10 = arithmetic
     – function code for arithmetic

•   Describe it using a truth table (can turn into gates):




Figure B.5.9 A 1-bit ALU that performs AND, OR, and
addition on a and b or NOT a and NOT b.




FIGURE B.5.10 (Top) A 1-bit ALU that performs AND,
OR, and addition on a and b or NOT b.




FIGURE B.5.10 (bottom) a 1-bit ALU for the most
significant bit.




FIGURE B.5.11 A 32-bit ALU constructed from the 31 copies of the 1-bit
ALU in the top of Figure B.5.10 and one 1-bit ALU in the bottom of that figure.




FIGURE B.5.12 The final 32-bit ALU. This adds a Zero
detector to Figure B.5.11.




FIGURE B.5.13 The values of the four ALU control lines (Ainvert,
Bnegate, and the 2-bit Operation) and the corresponding ALU operations.

             ALU control lines        Function
                   0000               AND
                   0001               OR
                   0010               add
                   0110               subtract
                   0111               set-on-less-than
                   1100               NOR



FIGURE B.5.14 The symbol commonly used to represent an
ALU, as shown in Figure B.5.12.




Figure 5.14 The three instruction classes (R-type, load and
store, and branch) use two different instruction formats.

   a. R-type instruction
   Field            0          rs      rt      rd      shamt   funct
   Bit positions    31:26      25:21   20:16   15:11   10:6    5:0

   b. Load or store instruction
   Field            35 or 43   rs      rt      address
   Bit positions    31:26      25:21   20:16   15:0

   c. Branch instruction
   Field            4          rs      rt      address
   Bit positions    31:26      25:21   20:16   15:0
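
The bit positions above translate directly into shifts and masks. The sketch below extracts every field from a 32-bit word regardless of format; which fields are meaningful depends on the opcode. The function name and dictionary keys are chosen for this example.

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into its fields by bit position."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31:26
        "rs": (word >> 21) & 0x1F,     # bits 25:21
        "rt": (word >> 16) & 0x1F,     # bits 20:16
        "rd": (word >> 11) & 0x1F,     # bits 15:11 (R-type only)
        "shamt": (word >> 6) & 0x1F,   # bits 10:6  (R-type only)
        "funct": word & 0x3F,          # bits 5:0   (R-type only)
        "address": word & 0xFFFF,      # bits 15:0  (load, store, branch)
    }


lw_fields = decode_fields(0x8C410064)    # lw $1, 100($2)
add_fields = decode_fields(0x02324020)   # add $8, $17, $18
```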


Figure 5.15 The datapath of Figure 5.12 with all necessary
multiplexors and all control lines identified




Control

•       Simple combinational logic (truth tables)
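     [Figure: two blocks of combinational logic. The ALU control block
      takes ALUOp (ALUOp1, ALUOp0) and funct bits F3 to F0 of F(5-0)
      and produces Operation2, Operation1, and Operation0. The main
      control takes the opcode inputs Op5 to Op0, decodes R-format, lw,
      sw, and beq, and produces the outputs RegDst, ALUSrc, MemtoReg,
      RegWrite, MemRead, MemWrite, Branch, ALUOp1, and ALUOp0.]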




                                                                                                2004 Morgan Kaufmann Publishers   46
Our Simple Control Structure

 •   All of the logic is combinational
 •   We wait for everything to settle down, and the right thing to be done
      – ALU might not produce “right answer” right away
      – we use write signals along with clock to determine when to write
 •   Cycle time determined by length of the longest path


      [Figure: State element 1 -> Combinational logic -> State element 2, spanning one clock cycle]




      We are ignoring some details like setup and hold times
Single Cycle Implementation

•   Calculate cycle time assuming negligible delays except:
     – memory (200ps),
       ALU and adders (100ps),
       register file access (50ps)
    [Figure: single-cycle datapath with the PC, instruction memory, register file, sign extend, shift left 2, ALU, two adders, data memory, and multiplexors; control lines shown: PCSrc, ALUSrc, ALU operation, RegWrite, MemRead, MemWrite, MemtoReg.]
Figure 5.16 The effect of each of the seven control signals.

 Signal
            Effect when deasserted                           Effect when asserted
 name
 RegDst     The register destination number for the Write    The register destination number for the
            register comes from the rt field (bits 20:16).   Write register comes from the rd field
                                                             (bits 15:11).

 RegWrite   None.                                            The register on the Write register input
                                                             is written with the value on the Write
                                                             data input.
 ALUSrc     The second ALU operand comes from the            The second ALU operand is the sign-
            second register file output (Read data 2).       extended, lower 16 bits of the
                                                             instruction.
 PCSrc      The PC is replaced by the output of the          The PC is replaced by the output of the
            adder that computes the value of PC+4.           adder that computed the branch target.
 MemRead    None.                                            Data memory contents designated by
                                                             the address input are put on the Read
                                                             data output.
 MemWrite   None.                                            Data memory contents designated by
                                                             the address input are replaced by the
                                                             value on the Write data input.
 MemtoReg   The value fed to the register Write data input   The value fed to the register Write data
            comes from the ALU.                              input comes from the data memory.
Figure 5.17 The simple datapath with the control unit.




Figure 5.18 The setting of the control lines is completely
determined by the opcode fields of the instruction.




                               Memto- Reg Mem Mem
     Instruction RegDst ALUSrc  Reg   Write Read Write Branch ALUOp1 ALUOp0
    R-format       1      0      0     1     0    0       0      1     0
    lw             0      1      1     1     1    0       0      0     0
    sw             X      1      X     0     0    1       0      0     0
    beq            X      0      X     0     0    0       1      0     1




Figure 5.19 The datapath in operation for an R-type instruction
such as add $t1, $t2, $t3.




Figure 5.20 The datapath in operation for a load instruction.




Figure 5.21 The datapath in operation for a branch equal
instruction.




Figure 5.22 The control function for the simple single-cycle
implementation is completely specified by this truth table.
         Input or output   Signal name   R-format   lw   sw       beq
         Inputs            Op5              0       1    1          0
                           Op4              0       0    0          0
                           Op3              0       0    1          0
                           Op2              0       0    0          1
                           Op1              0       1    1          0
                           Op0              0       1    1          0
         Outputs           RegDst           1       0    X          X
                           ALUSrc           0       1    1          0
                           MemtoReg         0       1    X          X
                           RegWrite         1       1    0          0
                           MemRead          0       1    0          0
                           MemWrite         0       0    1          0
                           Branch           0       0    0          1
                           ALUOp1           1       0    0          0
                           ALUOp0           0       0    0          1
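
The truth table above can be read as a lookup from the 6-bit opcode to the nine control outputs. The following is a minimal Python sketch of that lookup, not the book's implementation; the names SIGNALS, CONTROL_TABLE, and decode are hypothetical, and None stands in for a don't-care (X) entry.

```python
# Control outputs in the order they appear in Figure 5.22.
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0")

# Keys are the opcode bits Op5..Op0 from the table; None marks a don't-care (X).
CONTROL_TABLE = {
    0b000000: ("R-format", (1, 0, 0, 1, 0, 0, 0, 1, 0)),
    0b100011: ("lw",       (0, 1, 1, 1, 1, 0, 0, 0, 0)),
    0b101011: ("sw",       (None, 1, None, 0, 0, 1, 0, 0, 0)),
    0b000100: ("beq",      (None, 0, None, 0, 0, 0, 1, 0, 1)),
}

def decode(opcode):
    """Return (instruction class, {signal name: value}) for a supported opcode."""
    name, values = CONTROL_TABLE[opcode]
    return name, dict(zip(SIGNALS, values))

name, signals = decode(0b100011)
print(name, signals)
```

Because every output depends only on the opcode, the same table could be realized as purely combinational logic, which is the point of the figure.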
Figure 5.23 Instruction format for the jump instruction
(opcode = 2).




     Field           000010                address
     Bit positions   31:26                  25:0




Figure 5.24 The simple control and datapath are extended to
handle the jump instruction.




Problem: Performance of Single-Cycle Machines (p.315)
   Assume that the operation times for the major functional units in this
   implementation are the following:

         Memory units: 200 picoseconds (ps)
         ALU and adders: 100 ps
         Register file (read or write): 50 ps

   Assuming that the multiplexors, control unit, PC accesses, sign extension unit, and
   wires have no delay, which of the following implementations would be faster, and by
   how much?

         1. An implementation in which every instruction operates in 1 clock cycle of a
         fixed length.
         2. An implementation where every instruction executes in 1 clock cycle
         using a variable-length clock, which for each instruction is only as long as it
         needs to be.

   To compare the performance, assume the following instruction mix: 25% loads, 10%
   stores, 45% ALU instructions, 15% branches, and 5% jumps.


•   Let's start by comparing the CPU execution times.
     $$\text{CPU execution time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$
    Since CPI must be 1, we can simplify this to
     $$\text{CPU execution time} = \text{Instruction count} \times \text{Clock cycle time}$$
•   The critical path for the different instruction classes is as follows:

    Instruction class                  Functional units used by the instruction class
    R-type              Instruction fetch   Register access   ALU   Register access

    Load word           Instruction fetch   Register access   ALU   Memory access       Register access

    Store word          Instruction fetch   Register access   ALU   Memory access

    Branch              Instruction fetch   Register access   ALU

    Jump                Instruction fetch




•   Using these critical paths, we can compute the required length for
    each instruction class:
     Instruction   Instruction   Register     ALU        Data    Register
     class           memory       read      operation   memory    write     Total
     R-type           200          50         100         0        50       400ps

     Load word        200          50         100        200       50       600ps

     Store word       200          50         100        200                550ps

     Branch           200          50         100         0                 350ps

     Jump             200                                                   200ps



•   Thus, the average time per instruction with a variable clock is
     $$\text{CPU clock cycle} = 600 \times 25\% + 550 \times 10\% + 400 \times 45\% + 350 \times 15\% + 200 \times 5\% = 447.5\ \text{ps}$$



•   Since the variable clock implementation has a shorter average clock
    cycle, it is clearly faster. Let's find the performance ratio:

     $$\frac{\text{CPU performance}_{\text{variable clock}}}{\text{CPU performance}_{\text{single clock}}}
       = \frac{\text{CPU execution time}_{\text{single clock}}}{\text{CPU execution time}_{\text{variable clock}}}
       = \frac{\text{IC} \times \text{CPU clock cycle}_{\text{single clock}}}{\text{IC} \times \text{CPU clock cycle}_{\text{variable clock}}}
       = \frac{\text{CPU clock cycle}_{\text{single clock}}}{\text{CPU clock cycle}_{\text{variable clock}}}
       = \frac{600}{447.5} = 1.34$$
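
The whole comparison can be checked numerically. This is a small Python sketch using only the unit delays and the instruction mix given in the problem; the variable and dictionary names are hypothetical. It sums the delays on each critical path, takes the longest as the fixed clock, and weights the per-class times by the mix for the variable clock.

```python
# Unit delays in picoseconds, taken from the problem statement.
MEM, ALU, REG = 200, 100, 50

# Functional units on the critical path of each instruction class.
CRITICAL_PATH = {
    "R-type": [MEM, REG, ALU, REG],
    "load":   [MEM, REG, ALU, MEM, REG],
    "store":  [MEM, REG, ALU, MEM],
    "branch": [MEM, REG, ALU],
    "jump":   [MEM],
}

# Instruction mix from the problem statement (fractions of executed instructions).
MIX = {"load": 0.25, "store": 0.10, "R-type": 0.45, "branch": 0.15, "jump": 0.05}

latency = {cls: sum(units) for cls, units in CRITICAL_PATH.items()}
single_clock = max(latency.values())                      # fixed clock: slowest class
variable_clock = sum(MIX[cls] * latency[cls] for cls in MIX)  # mix-weighted average
speedup = single_clock / variable_clock

print(latency)
print(single_clock, round(variable_clock, 1), round(speedup, 2))
```

The fixed clock is set by the load instruction, so any class shorter than a load wastes part of every cycle.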




5.5   A Multicycle Implementation




Keywords

• Multicycle implementation Also called multiple clock cycle
  implementation. An implementation in which an instruction is
  executed in multiple clock cycles.

• Microprogramming A symbolic representation of control in the
  form of instructions, called microinstructions, that are executed on
  a simple micromachine.

• Finite state machine A sequential logic function consisting of a
  set of inputs and outputs, a next-state function that maps the
  current state and the inputs to a new state, and an output function
  that maps the current state and possibly the input to a set of
  asserted outputs.

• Next-state function A combinational function that, given the
  inputs and the current state, determines the next state of a finite
  state machine.
Where we are headed

•   Single Cycle Problems:
     – what if we had a more complicated instruction like floating
       point?
     – wasteful of area
•   One Solution:
     – use a “smaller” cycle time
     – have different instructions take different numbers of cycles
     – a “multicycle” datapath:




Multicycle Approach

•   We will be reusing functional units
     – ALU used to compute address and to increment PC
     – Memory used for instruction and data
•   Our control signals will not be determined directly by instruction
     – e.g., what should the ALU do for a “subtract” instruction?
•   We'll use a finite state machine for control




Multicycle Approach

 •   Break up the instructions into steps; each step takes one cycle
      – balance the amount of work to be done
      – restrict each cycle to use only one major functional unit
 •   At the end of a cycle
      – store values for use in later cycles (easiest thing to do)
      – introduce additional “internal” registers




Figure 5.27 The multicycle datapath from Figure 5.26 with the
control lines shown.




Figure 5.28 The complete datapath for the multicycle
implementation together with the necessary control lines.




Figure 5.29 The action caused by the setting of each control
signal in Figure 5.28 on page 323.

                            Actions of the 1-bit control signals
Signal name   Effect when deasserted                      Effect when asserted
RegDst        The register file destination number for    The register file destination number for the Write register
              the Write register comes from the rt field. comes from the rd field.
RegWrite      None.                                       The general-purpose register selected by the Write register
                                                          number is written with the value of the Write data input.
ALUSrcA       The first ALU operand is the PC.            The first ALU operand comes from the A register.

MemRead       None.                                       Content of memory at the location specified by the address
                                                          input is put on Memory data output.
MemWrite      None.                                       Memory contents at the location specified by the address
                                                          input are replaced by the value on the Write data input.
MemtoReg      The value fed to the register file Write    The value fed to the register file Write data input comes from
              data input comes from ALUOut.               the MDR.
IorD          The PC is used to supply the address to     ALUOut is used to supply the address to the memory unit.
              the memory unit.
IRWrite       None.                                       The output of the memory is written into the IR.

PCWrite       None.                                       The PC is written; the source is controlled by PCSource.

PCWriteCond   None.                                       The PC is written if the Zero output from the ALU is also
                                                          active.

Figure 5.29 (continued)



                            Actions of the 2-bit control signals
 Signal      Value     Effect
 name       (binary)
 ALUOp        00       The ALU performs an add operation.

              01       The ALU performs a subtract operation.

              10       The funct field of the instruction determines the ALU operation.

 ALUSrcB      00       The second input to the ALU comes from the B register.

              01       The second input to the ALU is the constant 4.

              10       The second input to the ALU is the sign-extended, lower 16 bits of the IR.

              11       The second input to the ALU is the sign-extended, lower 16 bits of the IR shifted left 2 bits.

 PCSource     00       Output of the ALU (PC+4) is sent to the PC for writing.

              01       The contents of ALUOut (the branch target address) are sent to the PC for writing.

              10       The jump target address (IR[25:0] shifted left 2 bits and concatenated with PC+4[31:28]) is sent
                       to the PC for writing.
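
The jump target selected by PCSource = 10 is a bit-level concatenation, which may be easier to see in code. This is a minimal Python sketch; the function name jump_target and the example values are hypothetical.

```python
def jump_target(pc_plus_4, ir):
    """Concatenate PC+4[31:28] with IR[25:0] shifted left 2 bits."""
    upper = pc_plus_4 & 0xF0000000      # PC+4[31:28], kept in place
    lower = (ir & 0x03FFFFFF) << 2      # IR[25:0] << 2, a 28-bit value
    return upper | lower

# Hypothetical example: a j instruction (opcode 000010) whose address field is 0x100.
ir = (0b000010 << 26) | 0x100
print(hex(jump_target(0x40000004, ir)))
```

The shift by 2 works because instructions are word aligned, so the two low-order address bits are always zero.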




Instructions from ISA perspective

•   Consider each instruction from perspective of ISA.
•   Example:
     – The add instruction changes a register.
     – Register specified by bits 15:11 of instruction.
     – Instruction specified by the PC.
     – New value is the sum (“op”) of two registers.
     – Registers specified by bits 25:21 and 20:16 of the instruction

       Reg[Memory[PC][15:11]] <=          Reg[Memory[PC][25:21]] op
                                          Reg[Memory[PC][20:16]]

     – In order to accomplish this we must break up the instruction.
         (kind of like introducing variables when programming)




Breaking down an instruction

•   ISA definition of arithmetic:

    Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]]                        op
                              Reg[Memory[PC][20:16]]

•   Could break down to:
     – IR <= Memory[PC]
     – A <= Reg[IR[25:21]]
     – B <= Reg[IR[20:16]]
     – ALUOut <= A op B
     – Reg[IR[15:11]] <= ALUOut

•   We forgot an important part of the definition of arithmetic!
     – PC <= PC + 4




Idea behind multicycle approach

 •   We define each instruction from the ISA perspective (do this!)

 •   Break it down into steps following our rule that data flows through at
     most one major functional unit (e.g., balance work across steps)

 •   Introduce new registers as needed (e.g., A, B, ALUOut, MDR, etc.)

 •   Finally, try to pack as much work as possible into each step
           (avoid unnecessary cycles)
     while also trying to share steps where possible
           (minimizes control, helps to simplify solution)


 •   Result: Our book's multicycle implementation!




Five Execution Steps

•   Instruction Fetch

•   Instruction Decode and Register Fetch

•   Execution, Memory Address Computation, or Branch Completion

•   Memory Access or R-type instruction completion

•   Write-back step


                 INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!




Step 1: Instruction Fetch

•   Use PC to get instruction and put it in the Instruction Register.
•   Increment the PC by 4 and put the result back in the PC.
•   Can be described succinctly using RTL "Register-Transfer Language"

          IR <= Memory[PC];
          PC <= PC + 4;

    Can we figure out the values of the control signals?

    What is the advantage of updating the PC now?




Step 2: Instruction Decode and Register Fetch

 •   Read registers rs and rt in case we need them
 •   Compute the branch address in case the instruction is a branch
 •   RTL:

          A <= Reg[IR[25:21]];
          B <= Reg[IR[20:16]];
          ALUOut <= PC + (sign-extend(IR[15:0]) << 2);

 •   We aren't setting any control lines based on the instruction type
         (we are busy "decoding" it in our control logic)
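
The speculative branch-address computation in this step is plain integer arithmetic. This is a minimal Python sketch under the assumption that the PC already holds PC + 4 from step 1; the helper names sign_extend16 and branch_target are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc, ir):
    """ALUOut <= PC + (sign-extend(IR[15:0]) << 2), wrapped to 32 bits.

    pc is assumed to be the already incremented PC from step 1.
    """
    return (pc + (sign_extend16(ir) << 2)) & 0xFFFFFFFF

# Hypothetical examples: an offset of +3 words and an offset of -1 word.
print(hex(branch_target(0x00400008, 0x0003)))
print(hex(branch_target(0x00400008, 0xFFFF)))
```

Computing the target here costs nothing extra, since the ALU would otherwise sit idle during decode; if the instruction turns out not to be a branch, the value in ALUOut is simply ignored.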




Step 3 (instruction dependent)

 •   ALU is performing one of three functions, based on instruction type

 •   Memory Reference:

          ALUOut <= A + sign-extend(IR[15:0]);

 •   R-type:

          ALUOut <= A op B;

 •   Branch:

          if (A==B) PC <= ALUOut;




Step 4 (R-type or memory-access)

•   Loads and stores access memory

         MDR <= Memory[ALUOut];
                or
         Memory[ALUOut] <= B;

•   R-type instructions finish

         Reg[IR[15:11]] <= ALUOut;


    The write actually takes place at the end of the cycle on the edge




Write-back step

 • Reg[IR[20:16]] <= MDR;

 Which instruction needs this?
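
The five steps can be put together in a short interpreter that follows the RTL above one step at a time and reports how many cycles each instruction used. This is only a sketch under stated assumptions, not the book's datapath: the state dictionary, the execute function, and the word-addressed memory dictionary are hypothetical, slt and jumps are left out, and writes to register 0 are not suppressed.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

OP_RTYPE, OP_LW, OP_SW, OP_BEQ = 0x00, 0x23, 0x2B, 0x04
# Subset of R-type funct codes: add, sub, and, or (slt omitted in this sketch).
ALU_FUNCT = {0x20: lambda a, b: a + b, 0x22: lambda a, b: a - b,
             0x24: lambda a, b: a & b, 0x25: lambda a, b: a | b}

def execute(state):
    """Run one instruction through the RTL steps and return the cycles used.

    state holds 'pc', 'reg' (list of 32 ints) and 'mem' (dict: address -> word).
    """
    reg, mem = state["reg"], state["mem"]
    # Step 1: IR <= Memory[PC]; PC <= PC + 4
    ir = mem[state["pc"]]
    state["pc"] += 4
    # Step 2: A, B <= registers; ALUOut <= branch target (computed speculatively)
    op = ir >> 26
    a = reg[(ir >> 21) & 31]
    b = reg[(ir >> 16) & 31]
    alu_out = state["pc"] + (sign_extend16(ir) << 2)
    # Step 3: depends on the instruction class
    if op == OP_BEQ:
        if a == b:
            state["pc"] = alu_out
        return 3
    if op == OP_RTYPE:
        alu_out = ALU_FUNCT[ir & 0x3F](a, b) & 0xFFFFFFFF
        reg[(ir >> 11) & 31] = alu_out      # Step 4: Reg[IR[15:11]] <= ALUOut
        return 4
    if op not in (OP_LW, OP_SW):
        raise ValueError("unsupported opcode")
    alu_out = a + sign_extend16(ir)         # memory address computation
    if op == OP_SW:
        mem[alu_out] = b                    # Step 4: Memory[ALUOut] <= B
        return 4
    mdr = mem[alu_out]                      # Step 4: MDR <= Memory[ALUOut]
    reg[(ir >> 16) & 31] = mdr              # Step 5: Reg[IR[20:16]] <= MDR
    return 5

# add $t1, $t2, $t3 with $t2 = 5 and $t3 = 7 (registers 9, 10, 11).
state = {"pc": 0, "reg": [0] * 32, "mem": {}}
state["reg"][10], state["reg"][11] = 5, 7
state["mem"][0] = (10 << 21) | (11 << 16) | (9 << 11) | 0x20
print(execute(state), state["reg"][9], state["pc"])
```

Only the load reaches the fifth step, which is why it is the one instruction class that takes five cycles.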




Summary:




Simple Questions

•   How many cycles will it take to execute this code?

                 lw $t2, 0($t3)
                 lw $t3, 4($t3)
                 beq $t2, $t3, Label               # assume not taken
                 add $t5, $t2, $t3
                 sw $t5, 8($t3)
    Label:       ...


•   What is going on during the 8th cycle of execution?
•   In what cycle does the actual addition of $t2 and $t3 take place?
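
One way to check answers to these questions is to lay the five instructions out on a cycle timeline. This is a Python sketch that assumes the per-instruction cycle counts implied by the step descriptions (lw 5, sw 4, R-type 4, beq 3) and that the branch is not taken; the names CYCLES, program, and timeline are hypothetical.

```python
# Cycles per instruction in the multicycle design (assumed from the step descriptions).
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}
program = ["lw", "lw", "beq", "add", "sw"]   # beq assumed not taken

# Build (cycle, instruction index, opcode, step) for every cycle of execution.
timeline = []
cycle = 1
for index, op in enumerate(program):
    for step in range(1, CYCLES[op] + 1):
        timeline.append((cycle, index, op, step))
        cycle += 1

total_cycles = len(timeline)
eighth_cycle = timeline[7]
# The ALU adds $t2 and $t3 in step 3 of the add instruction.
add_cycle = next(c for c, _, op, step in timeline if op == "add" and step == 3)

print(total_cycles, eighth_cycle, add_cycle)
```

Under these assumptions, the eighth cycle falls in step 3 of the second lw, which is its memory address computation.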




Problem: CPI in a multicycle CPU

 •   Using the SPECINT2000 instruction mix shown in Figure 3.26, what is
     the CPI, assuming that each state in the multicycle CPU requires 1
     clock cycle?

 Answer:
   The mix is 25% loads (1% load byte+24% load word), 10% stores (1%
   store byte+9% store word), 11% branches (6% beq, 5% bne), 2% jumps
   (1% jal+1% jr), and 52% ALU (all the rest of the mix, which we assume to
   be ALU instructions). From Figure 5.30 on page 329, the number of clock
   cycles for each instruction class is the following:
         Loads: 5; Stores: 4; ALU instructions: 4; Branches: 3; Jumps: 3

     The CPI is given by the following:

     $$\text{CPI} = \frac{\text{CPU clock cycles}}{\text{Instruction count}}
       = \frac{\sum \text{Instruction count}_i \times \text{CPI}_i}{\text{Instruction count}}
       = \sum \frac{\text{Instruction count}_i}{\text{Instruction count}} \times \text{CPI}_i$$
•   The ratio

     $$\frac{\text{Instruction count}_i}{\text{Instruction count}}$$

    is simply the instruction frequency for the instruction class i. We
    can therefore substitute to obtain

     $$\text{CPI} = 0.25 \times 5 + 0.10 \times 4 + 0.52 \times 4 + 0.11 \times 3 + 0.02 \times 3 = 4.12$$
    This CPI is better than the worst-case CPI of 5.0 when all the
    instructions take the same number of clock cycles.
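
The weighted sum is easy to verify. This is a small Python sketch using the instruction frequencies and cycle counts quoted in the answer; the dictionary names are hypothetical.

```python
# Instruction frequencies and cycles per class, as quoted in the answer.
MIX = {"loads": 0.25, "stores": 0.10, "ALU": 0.52, "branches": 0.11, "jumps": 0.02}
CYCLES = {"loads": 5, "stores": 4, "ALU": 4, "branches": 3, "jumps": 3}

# CPI is the frequency-weighted average of the per-class cycle counts.
cpi = sum(MIX[cls] * CYCLES[cls] for cls in MIX)
print(round(cpi, 2))
```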




Review: finite state machines

 •   Finite state machines:
      – a set of states and
      – next state function (determined by current state and the input)
      – output function (determined by current state and possibly input)


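      [Figure: the current state (updated on the clock) and the inputs feed both the next-state function, which produces the next state, and the output function, which produces the outputs.]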




      – We'll use a Moore machine (output based only on current state)
Review: finite state machines

 •   Example:

     B.37 A friend would like you to build an “electronic eye” for use as a
     fake security device. The device consists of three lights lined up in a
     row, controlled by the outputs Left, Middle, and Right, which, if
     asserted, indicate that a light should be on. Only one light is on at a
     time, and the light “moves” from left to right and then from right to
     left, thus scaring away thieves who believe that the device is monitoring
     their activity. Draw the graphical representation for the finite state
     machine used to specify the electronic eye. Note that the rate of the
     eye’s movement will be controlled by the clock speed (which should not
     be too great) and that there are essentially no inputs.
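
One possible solution can be sketched as a Moore machine in Python. It is an illustration rather than the book's answer, and the state names are hypothetical. It assumes four states, because the middle light must remember whether the eye is moving right or left; with no inputs, the next state depends on the current state alone.

```python
# Next-state function: no inputs, so it maps each state to exactly one successor.
NEXT_STATE = {
    "left": "middle_going_right",
    "middle_going_right": "right",
    "right": "middle_going_left",
    "middle_going_left": "left",
}

# Output function (Moore machine): outputs depend only on the current state.
# Each tuple is (Left, Middle, Right); exactly one light is on at a time.
OUTPUTS = {
    "left": (1, 0, 0),
    "middle_going_right": (0, 1, 0),
    "right": (0, 0, 1),
    "middle_going_left": (0, 1, 0),
}

def run(cycles, state="left"):
    """Return the (Left, Middle, Right) outputs for the given number of clock cycles."""
    seen = []
    for _ in range(cycles):
        seen.append(OUTPUTS[state])
        state = NEXT_STATE[state]   # state register updates on each clock edge
    return seen

for lights in run(5):
    print(lights)
```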




Implementing the Control

•   Value of control signals is dependent upon:
     – what instruction is being executed
     – which step is being performed

•   Use the information we've accumulated to specify a finite state machine
     – specify the finite state machine graphically, or
     – use microprogramming

•   Implementation can be derived from specification




Figure 5.31 The high-level view of the finite state machine
control.




Figure 5.32 The instruction fetch and decode portion of every
instruction is identical.




Figure 5.33 The finite state machine for controlling memory-
reference instructions has four states.




Figure 5.34 R-type instructions can be implemented with a
simple two-state finite state machine.




Figure 5.35 The branch instruction requires a single state.




Figure 5.36 The jump instruction requires a single state that asserts two control
signals to write the PC with the lower 26 bits of the instruction register shifted left 2
bits and concatenated to the upper 4 bits of the PC of this instruction.




Finite State Machine for Control

 •   Implementation:

      [Figure: control logic block. Inputs: Op5-Op0 from the instruction register opcode field and S3-S0 from the state register. Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, and the next-state bits NS3-NS0.]
Graphical Specification of FSM

 •   Note:
      – don't care if not mentioned
      – asserted if name only
      – otherwise exact value



 •   How many state
     bits will we need?
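
One way to work out the answer, assuming the control FSM has ten states numbered 0 through 9 and a plain binary state encoding: the state register needs ceil(log2(N)) bits. The helper name `state_bits` is illustrative only.

```python
def state_bits(num_states: int) -> int:
    """Minimum number of flip-flops for a binary-encoded state register."""
    if num_states < 1:
        raise ValueError("need at least one state")
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1
    return (num_states - 1).bit_length()


# Assumption: ten states (0..9), so four state bits S3..S0 are enough.
print(state_bits(10))
```

A one-hot encoding would instead use one bit per state, trading more flip-flops for simpler next-state logic.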




PLA Implementation (Section C.3 & Appendix B)
 •   If I picked a horizontal or vertical line could you explain it?
 [Figure: PLA implementing the control function.
  Inputs: Op5, Op4, Op3, Op2, Op1, Op0, S3, S2, S1, S0.
  Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg,
  PCSource1, PCSource0, ALUOp1, ALUOp0, ALUSrcB1, ALUSrcB0, ALUSrcA, RegWrite,
  RegDst, NS3, NS2, NS1, NS0.]
ROM Implementation (Section C.3 & AppendixB)


 •   ROM = "Read Only Memory"
      – values of memory locations are fixed ahead of time
 •   A ROM can be used to implement a truth table
      – if the address is m bits, we can address 2^m entries in the ROM.
      – our outputs are the bits of data that the address points to.


                                                  0   0   0   0   0   1   1
                                                  0   0   1   1   1   0   0
               m            n                     0   1   0   1   1   0   0
                                                  0   1   1   1   0   0   0
                                                  1   0   0   0   0   0   0
                                                  1   0   1   0   0   0   1
                                                  1   1   0   0   1   1   0
                                                  1   1   1   0   1   1   1




        2^m is the "height" (number of entries), and n is the "width" (bits per entry)
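
A minimal sketch of the idea, using the eight-entry table on this slide (3 address bits, 4 output bits). The names `ROM` and `rom_read` are illustrative; a list indexed by address stands in for the fixed memory contents.

```python
# Contents copied from the slide's table: address (3 bits) -> data (4 bits).
ROM = [
    0b0011,  # address 000
    0b1100,  # address 001
    0b1100,  # address 010
    0b1000,  # address 011
    0b0000,  # address 100
    0b0001,  # address 101
    0b0110,  # address 110
    0b0111,  # address 111
]

M = 3  # address bits
N = 4  # data bits per entry


def rom_read(address: int) -> int:
    """Look up the truth-table row selected by the address inputs."""
    return ROM[address]


# The ROM has 2^m entries, each n bits wide.
assert len(ROM) == 2 ** M
assert all(word < 2 ** N for word in ROM)
print(format(rom_read(0b110), "04b"))  # prints 0110
```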
ROM Implementation

•   How many inputs are there?
        6 bits for opcode, 4 bits for state = 10 address lines
        (i.e., 2^10 = 1024 different addresses)
•   How many outputs are there?
        16 datapath-control outputs, 4 state bits = 20 outputs

•   ROM is 2^10 x 20 = 20K bits   (and a rather unusual size)

•   Rather wasteful, since for lots of the entries, the outputs are the
    same
         — i.e., opcode is often ignored




ROM vs PLA

•   Break up the table into two parts
         – 4 state bits tell you the 16 outputs: 2^4 x 16 bits of ROM
         – 10 bits tell you the 4 next-state bits: 2^10 x 4 bits of ROM
         – Total: 4.3K bits of ROM
•   PLA is much smaller
         – can share product terms
         – only need entries that produce an active output
         – can take into account don't cares
•   Size is (#inputs x #product-terms) + (#outputs x #product-terms)
          For this example = (10x17)+(20x17) = 510 PLA cells


•   PLA cells usually about the size of a ROM cell (slightly bigger)
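
The size arithmetic on the ROM slides can be checked with a short sketch. The function names are illustrative; the 17 product terms are taken from the slide's example, not derived here.

```python
def rom_bits(address_bits: int, data_bits: int) -> int:
    """A ROM with m address bits and n data bits stores 2^m * n bits."""
    return (2 ** address_bits) * data_bits


def pla_cells(inputs: int, outputs: int, product_terms: int) -> int:
    """PLA size = AND-plane cells + OR-plane cells."""
    return inputs * product_terms + outputs * product_terms


single_rom = rom_bits(10, 20)                  # one ROM: opcode + state -> all outputs
split_rom = rom_bits(4, 16) + rom_bits(10, 4)  # state -> outputs, plus next-state ROM
pla = pla_cells(10, 20, 17)

print(single_rom, single_rom / 1024)  # 20480 bits, i.e. 20K
print(split_rom, split_rom / 1024)    # 4352 bits, i.e. 4.25K (the slide rounds to 4.3K)
print(pla)                            # 510 cells
```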




Another Implementation Style

•   Complex instructions: the "next state" is often current state + 1

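 [Figure: Control unit built from a PLA or ROM plus an explicit sequencer.
  PLA/ROM outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite,
  BWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, AddrCtl.
  The State register is the PLA/ROM input; an adder computes State + 1, and the
  address select logic uses AddrCtl and Op[5-0] (instruction register opcode
  field) to choose the next state.]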




Details
Dispatch ROM 1
     Op        Opcode name    Value
   000000      R-format       0110
   000010      jmp            1001
   000100      beq            1000
   100011      lw             0010
   101011      sw             0010

Dispatch ROM 2
     Op        Opcode name    Value
   100011      lw             0011
   101011      sw             0101

 [Figure: Address select logic. A 4-input mux controlled by AddrCtl picks the
  next state: input 3 = State + 1 (adder), input 2 = Dispatch ROM 2,
  input 1 = Dispatch ROM 1, input 0 = the constant 0. Both dispatch ROMs are
  indexed by the instruction register opcode field.]

State number   Address-control action          Value of AddrCtl
      0        Use incremented state                   3
      1        Use dispatch ROM 1                      1
      2        Use dispatch ROM 2                      2
      3        Use incremented state                   3
      4        Replace state number by 0               0
      5        Replace state number by 0               0
      6        Use incremented state                   3
      7        Replace state number by 0               0
      8        Replace state number by 0               0
      9        Replace state number by 0               0
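
The tables on this slide are enough to simulate the sequencer. The sketch below is one possible model, not the hardware itself: it encodes the two dispatch ROMs and the AddrCtl column as Python dictionaries (names such as `next_state` and `trace` are illustrative) and walks an instruction through its states.

```python
# Dispatch ROM contents from the slide: opcode -> next state.
DISPATCH_ROM_1 = {0b000000: 6, 0b000010: 9, 0b000100: 8, 0b100011: 2, 0b101011: 2}
DISPATCH_ROM_2 = {0b100011: 3, 0b101011: 5}

# AddrCtl value produced in each state (from the state table).
ADDR_CTL = {0: 3, 1: 1, 2: 2, 3: 3, 4: 0, 5: 0, 6: 3, 7: 0, 8: 0, 9: 0}


def next_state(state: int, opcode: int) -> int:
    """Model of the 4-input mux in the address select logic."""
    ctl = ADDR_CTL[state]
    if ctl == 0:
        return 0                       # back to fetch
    if ctl == 1:
        return DISPATCH_ROM_1[opcode]  # dispatch on opcode after decode
    if ctl == 2:
        return DISPATCH_ROM_2[opcode]  # lw/sw split after address computation
    return state + 1                   # ctl == 3: incremented state


def trace(opcode: int) -> list:
    """States visited by one instruction, starting at state 0."""
    states = [0]
    while True:
        nxt = next_state(states[-1], opcode)
        if nxt == 0:
            return states
        states.append(nxt)


print(trace(0b100011))  # lw:       [0, 1, 2, 3, 4]
print(trace(0b101011))  # sw:       [0, 1, 2, 5]
print(trace(0b000000))  # R-format: [0, 1, 6, 7]
```

Only states 1 and 2 consult the opcode, which is why the full 10-input ROM wastes so many entries.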
5.6   Exceptions




Keywords

• Exception Also called interrupt. An unscheduled event that
  disrupts program execution; used to detect overflow.

• Interrupt An exception that comes from outside of the processor.
  (Some architectures use the term interrupt for all exceptions.)


 Type of event                                   From where? MIPS terminology
 I/O device request                              External    Interrupt
 Invoke the operating system from user program   Internal    Exception
 Arithmetic overflow                             Internal    Exception
 Using an undefined instruction                  Internal    Exception
 Hardware malfunctions                           External    Exception or interrupt




Keywords

• Vectored interrupt An interrupt for which the address to which
  control is transferred is determined by the cause of the
  exception.


         Exception type        Exception vector address (in hex)
     Undefined instruction            C000 0000 hex
     Arithmetic overflow              C000 0020 hex
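
As a rough illustration of vectoring, the cause of the exception can be treated as a key into a table of handler addresses. The dictionary keys and the function name below are hypothetical; only the two addresses come from the table above.

```python
# Vector table: exception cause -> address control is transferred to.
EXCEPTION_VECTORS = {
    "undefined_instruction": 0xC0000000,
    "arithmetic_overflow": 0xC0000020,
}


def exception_target(cause: str) -> int:
    """Return the new PC value for a given exception cause."""
    return EXCEPTION_VECTORS[cause]


print(hex(exception_target("arithmetic_overflow")))  # 0xc0000020
```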




How Control Checks for Exceptions


• Undefined instruction

• Arithmetic overflow




Figure 5.39 The multicycle datapath with the addition needed
to implement exceptions.




Figure 5.40 This shows the finite state machine with the
additions to handle exception detection.




5.9 Real Stuff: The Organization of
         Recent Pentium




Keywords

• Microprogrammed control A method of specifying control that
  uses microcode rather than a finite state representation.

• Hardwired control An implementation of finite state machine
  control typically using programmable logic arrays (PLAs) or
  collections of PLAs and random logic.

• Microcode      The set of microinstructions that control a processor.

• Superscalar An advanced pipelining technique that enables the
  processor to execute more than one instruction per clock cycle.

• Microinstruction A representation of control using low-level
  instructions, each of which asserts a set of control signals that are
  active on a given clock cycle as well as specifying which
  microinstruction to execute next.

Keywords

• Micro-operations The RISC-like instructions directly executed
  by the hardware in recent Pentium implementations.

• Trace cache An instruction cache that holds a sequence of
  instructions with a given starting address; in recent Pentium
  implementations the trace cache holds micro-operations rather
  than IA-32 instructions.

• Dispatch An operation in a microprogrammed control unit in
  which the next microinstruction is selected on the basis of one or
  more fields of a macroinstruction, usually by creating a table
  containing the addresses of the target microinstructions and
  indexing the table using a field of the macroinstruction. The
  dispatch tables are typically implemented in ROM or
  programmable logic array (PLA). The term dispatch is also used
  in dynamically scheduled processors to refer to the process of
  sending an instruction to a queue.
Microprogramming
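 [Figure: Microprogrammed control unit driving the datapath.
  Microcode memory outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite,
  IRWrite, BWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite,
  RegDst, AddrCtl. The Microprogram counter is the microcode memory input; an
  adder computes counter + 1, and the address select logic uses AddrCtl and the
  instruction register opcode field to choose the next microinstruction.]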



•   What are the “microinstructions”?
Microprogramming

•   A specification methodology
     – appropriate if hundreds of opcodes, modes, cycles, etc.
     – signals specified symbolically using microinstructions
    Label      ALU control   SRC1   SRC2      Register control   Memory      PCWrite control   Sequencing
    Fetch      Add           PC     4                            Read PC     ALU               Seq
               Add           PC     Extshft   Read                                             Dispatch 1
    Mem1       Add           A      Extend                                                     Dispatch 2
    LW2                                                          Read ALU                      Seq
                                              Write MDR                                        Fetch
    SW2                                                          Write ALU                     Fetch
    Rformat1   Func code     A      B                                                          Seq
                                              Write ALU                                        Fetch
    BEQ1       Subt          A      B                                        ALUOut-cond       Fetch
    JUMP1                                                                    Jump address      Fetch

•   Will two implementations of the same architecture have the same microcode?
•   What would a microassembler do?


 Microinstruction format
Field name         Value          Signals active                          Comment
ALU control        Add            ALUOp = 00                              Cause the ALU to add.
ALU control        Subt           ALUOp = 01                              Cause the ALU to subtract; this implements the compare for branches.
ALU control        Func code      ALUOp = 10                              Use the instruction's function code to determine ALU control.
SRC1               PC             ALUSrcA = 0                             Use the PC as the first ALU input.
SRC1               A              ALUSrcA = 1                             Register A is the first ALU input.
SRC2               B              ALUSrcB = 00                            Register B is the second ALU input.
SRC2               4              ALUSrcB = 01                            Use 4 as the second ALU input.
SRC2               Extend         ALUSrcB = 10                            Use output of the sign extension unit as the second ALU input.
SRC2               Extshft        ALUSrcB = 11                            Use the output of the shift-by-two unit as the second ALU input.
Register control   Read                                                   Read two registers using the rs and rt fields of the IR as the register numbers and putting the data into registers A and B.
Register control   Write ALU      RegWrite, RegDst = 1, MemtoReg = 0      Write a register using the rd field of the IR as the register number and the contents of the ALUOut as the data.
Register control   Write MDR      RegWrite, RegDst = 0, MemtoReg = 1      Write a register using the rt field of the IR as the register number and the contents of the MDR as the data.
Memory             Read PC        MemRead, IorD = 0                       Read memory using the PC as address; write result into IR (and the MDR).
Memory             Read ALU       MemRead, IorD = 1                       Read memory using the ALUOut as address; write result into MDR.
Memory             Write ALU      MemWrite, IorD = 1                      Write memory using the ALUOut as address, contents of B as the data.
PC write control   ALU            PCSource = 00, PCWrite                  Write the output of the ALU into the PC.
PC write control   ALUOut-cond    PCSource = 01, PCWriteCond              If the Zero output of the ALU is active, write the PC with the contents of the register ALUOut.
PC write control   Jump address   PCSource = 10, PCWrite                  Write the PC with the jump address from the instruction.
Sequencing         Seq            AddrCtl = 11                            Choose the next microinstruction sequentially.
Sequencing         Fetch          AddrCtl = 00                            Go to the first microinstruction to begin a new instruction.
Sequencing         Dispatch 1     AddrCtl = 01                            Dispatch using the ROM 1.
Sequencing         Dispatch 2     AddrCtl = 10                            Dispatch using the ROM 2.
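
To make the role of a microassembler concrete, here is a minimal sketch that translates one symbolic microinstruction into control-signal values using only the table above. The data layout and the function name `assemble` are assumptions for illustration; a real microassembler would also resolve labels and pack the signals into a microcode word.

```python
# Field -> symbolic value -> control signals, transcribed from the table.
FIELDS = {
    "ALU control": {
        "Add": {"ALUOp": 0b00},
        "Subt": {"ALUOp": 0b01},
        "Func code": {"ALUOp": 0b10},
    },
    "SRC1": {"PC": {"ALUSrcA": 0}, "A": {"ALUSrcA": 1}},
    "SRC2": {
        "B": {"ALUSrcB": 0b00},
        "4": {"ALUSrcB": 0b01},
        "Extend": {"ALUSrcB": 0b10},
        "Extshft": {"ALUSrcB": 0b11},
    },
    "Register control": {
        "Read": {},  # the table lists no control signal for this value
        "Write ALU": {"RegWrite": 1, "RegDst": 1, "MemtoReg": 0},
        "Write MDR": {"RegWrite": 1, "RegDst": 0, "MemtoReg": 1},
    },
    "Memory": {
        "Read PC": {"MemRead": 1, "IorD": 0},
        "Read ALU": {"MemRead": 1, "IorD": 1},
        "Write ALU": {"MemWrite": 1, "IorD": 1},
    },
    "PC write control": {
        "ALU": {"PCSource": 0b00, "PCWrite": 1},
        "ALUOut-cond": {"PCSource": 0b01, "PCWriteCond": 1},
        "Jump address": {"PCSource": 0b10, "PCWrite": 1},
    },
    "Sequencing": {
        "Seq": {"AddrCtl": 0b11},
        "Fetch": {"AddrCtl": 0b00},
        "Dispatch 1": {"AddrCtl": 0b01},
        "Dispatch 2": {"AddrCtl": 0b10},
    },
}


def assemble(microinstruction: dict) -> dict:
    """Translate {field: symbolic value} into {signal: value}.

    Fields left out of the microinstruction contribute no signals.
    """
    signals = {}
    for field, value in microinstruction.items():
        signals.update(FIELDS[field][value])
    return signals


# First microinstruction of the microprogram (label Fetch).
fetch = assemble({
    "ALU control": "Add",
    "SRC1": "PC",
    "SRC2": "4",
    "Memory": "Read PC",
    "PC write control": "ALU",
    "Sequencing": "Seq",
})
print(fetch)
```

An unknown field or value raises `KeyError`, which is roughly the kind of checking a microassembler would be expected to perform.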
Maximally vs. Minimally Encoded

•   No encoding:
     – 1 bit for each datapath operation
     – faster, requires more memory (logic)
     – used for the VAX 780: an astonishing 400K of memory!
•   Lots of encoding:
     – send the microinstructions through logic to get control signals
     – uses less memory, slower
•   Historical context of CISC:
     – Too much logic to put on a single chip with everything else
     – Use a ROM (or even RAM) to hold the microcode
     – It's easy to add new instructions
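
The trade-off can be illustrated on a single field. The sketch below assumes a field with four choices, such as SRC2 in the microinstruction format; `field_width` and `decode` are illustrative names, and the decoder stands in for the extra logic an encoded format needs.

```python
def field_width(num_choices: int, encoded: bool) -> int:
    """Bits of microcode memory used by one field."""
    if encoded:
        # binary code: ceil(log2(num_choices)) bits, at least 1
        return max(1, (num_choices - 1).bit_length())
    return num_choices  # no encoding: one bit per choice


def decode(code: int, num_choices: int) -> list:
    """Decoder needed by the encoded form: binary code -> one line per choice."""
    return [1 if i == code else 0 for i in range(num_choices)]


SRC2_CHOICES = ["B", "4", "Extend", "Extshft"]

print(field_width(len(SRC2_CHOICES), encoded=False))  # 4 bits, no decoder needed
print(field_width(len(SRC2_CHOICES), encoded=True))   # 2 bits, plus a decoder
print(decode(2, len(SRC2_CHOICES)))                   # [0, 0, 1, 0] selects Extend
```

The encoded form saves memory per microinstruction but puts the decoder on the path between the microcode memory and the control lines.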




Microcode: Trade-offs

•   Distinction between specification and implementation is sometimes blurred

•   Specification Advantages:
     – Easy to design and write
     – Design architecture and microcode in parallel
•   Implementation (off-chip ROM) Advantages
     – Easy to change since values are in memory
     – Can emulate other architectures
     – Can make use of internal registers
•   Implementation Disadvantages, SLOWER now that:
     – Control is implemented on same chip as processor
     – ROM is no longer faster than RAM
     – No need to go back and make changes

5.10 Fallacies and Pitfalls




• Pitfall: Adding a complex instruction implemented with
  microprogramming may not be faster than a sequence using
  simpler instructions.

• Fallacy: If there is space in the control store, new instructions are
  free of cost.




5.11 Concluding Remarks




Figure 5.41 Alternative methods for specifying and
implementing control.




5.12 Historical Perspective and Further
                Reading




Historical Perspective

 •   In the '60s and '70s microprogramming was very important for
     implementing machines
 •   This led to more sophisticated ISAs and the VAX
 •   In the '80s RISC processors based on pipelining became popular
 •   Pipelining the microinstructions is also possible!
 •   Implementations of IA-32 architecture processors since 486 use:
      – “hardwired control” for simpler instructions
          (few cycles, FSM control implemented using PLA or random logic)
      – “microcoded control” for more complex instructions
          (large numbers of cycles, central control store)

 •   The IA-64 architecture uses a RISC-style ISA and can be
     implemented without a large central control store




Pentium 4

•   Pipelining is important (last IA-32 without it was 80386 in 1985)

 [Figure: Pentium 4 block layout. Labeled regions: Control (several areas),
  I/O interface, Instruction cache, Data cache, Enhanced floating point and
  multimedia, Integer datapath, Secondary cache and memory interface,
  Advanced pipelining hyperthreading support. Margin labels: Chapter 7, Chapter 6.]
•   Pipelining is used for the simple instructions favored by compilers

    “Simply put, a high performance implementation needs to ensure that the simple
    instructions execute quickly, and that the burden of the complexities of the
    instruction set penalize the complex, less frequently used, instructions”



Pentium 4

•   Somewhere in all that “control” we must handle complex instructions

 [Figure: Pentium 4 block layout. Labeled regions: Control (several areas),
  I/O interface, Instruction cache, Data cache, Enhanced floating point and
  multimedia, Integer datapath, Secondary cache and memory interface,
  Advanced pipelining hyperthreading support.]
•   Processor executes simple microinstructions, 70 bits wide (hardwired)
•   120 control lines for integer datapath (400 for floating point)
•   If an instruction requires more than 4 microinstructions to implement,
    control from microcode ROM (8000 microinstructions)
•   It's complicated!



Chapter 5 Summary

•   If we understand the instructions…
          We can build a simple processor!
•   If instructions take different amounts of time, multi-cycle is better
•   Datapath implemented using:
     – Combinational logic for arithmetic
     – State holding elements to remember bits
•   Control implemented using:
     – Combinational logic for single-cycle implementation
     – Finite state machine for multi-cycle implementation



