					              Computer Architecture
                  Nguyễn Trí Thành
             Information Systems Department
                  Faculty of Technology
                  College of Technology
                   ntthanh@vnu.edu.vn


11/27/2010                                    1
      Enhancing Performance
          with Pipelining



Pipelining
                Start work ASAP!! Do not waste time!
   [Figure: laundry timeline from 6 PM to 2 AM for loads A, B, C, D in task order. Not pipelined: each load starts only after the previous load has finished]
   Assume each step of a load (wash, dry, fold, store) takes 30 min. and that
   separate steps use separate hardware and so can be overlapped
   [Figure: laundry timeline from 6 PM to 2 AM for loads A, B, C, D in task order. Pipelined: the loads overlap]
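The two laundry timelines can be checked with a little arithmetic. The following is a minimal sketch, assuming four loads (A to D), four 30-minute steps and a 6 PM start as in the figures; the function names and the calendar date are illustrative only.

```python
from datetime import datetime, timedelta

STEP_MIN = 30                            # minutes per step (wash, dry, fold, store)
STEPS = 4
START = datetime(2010, 11, 27, 18, 0)    # 6 PM; the date itself is arbitrary

def sequential_minutes(loads):
    """Not pipelined: each load runs all of its steps before the next load starts."""
    return loads * STEPS * STEP_MIN

def pipelined_minutes(loads):
    """Pipelined: the first load needs STEPS steps, each later load finishes one step later."""
    return (STEPS + loads - 1) * STEP_MIN

def finish_clock(minutes):
    """Wall-clock finish time in 24-hour notation."""
    return (START + timedelta(minutes=minutes)).strftime("%H:%M")

print(sequential_minutes(4), finish_clock(sequential_minutes(4)))  # 480 02:00
print(pipelined_minutes(4), finish_clock(pipelined_minutes(4)))    # 210 21:30
```

Under these assumptions the four loads finish at 2 AM without pipelining and at 9:30 PM with it.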
    Pipelined vs. Single-Cycle
    Instruction Execution: the Plan
   [Figure: single-cycle execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) in program order, time axis 2 to 18 ns. Each instruction goes through Instruction fetch, Reg, ALU, Data access, Reg, and a new instruction starts every 8 ns]
   Assume 2 ns for a memory access or an ALU operation and 1 ns for a register access.
   The single-cycle clock is therefore 2 + 1 + 2 + 2 + 1 = 8 ns; the pipelined clock
   cycle is set by the slowest stage, 2 ns.
   [Figure: pipelined execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) in program order, time axis 2 to 14 ns. Each of the five stages takes 2 ns and a new instruction starts every 2 ns]
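The clock periods and total times in the two diagrams follow from the stage latencies. A small sketch, assuming the stage names below (taken from the figure labels) and an ideal pipeline with no hazards:

```python
# Stage latencies in ns: memory access and ALU 2 ns, register access 1 ns.
STAGE_NS = {
    "Instruction fetch": 2,
    "Reg read": 1,
    "ALU": 2,
    "Data access": 2,
    "Reg write": 1,
}

single_cycle_clock = sum(STAGE_NS.values())   # one long cycle covers every stage
pipelined_clock = max(STAGE_NS.values())      # every stage is given the slowest stage's time

def single_cycle_time(n):
    """Total ns for n instructions on the single-cycle design."""
    return n * single_cycle_clock

def pipelined_time(n):
    """Total ns for n instructions: fill the pipeline once, then one completes per cycle."""
    return (len(STAGE_NS) + n - 1) * pipelined_clock

for n in (3, 1000):
    print(n, single_cycle_time(n), pipelined_time(n))
# 3 24 14
# 1000 8000 2008
```

For long instruction streams the ratio approaches 8 / 2 = 4 rather than 5, because the stages are not balanced.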
Pipelining: Keep in Mind
      Pipelining does not reduce the latency of a single
      task; it increases the throughput of the entire workload
      Pipeline rate is limited by the longest stage
             potential speedup = number of pipe stages
             unbalanced lengths of pipe stages reduce
             speedup
      Time to fill the pipeline and time to drain it (when
      there is slack in the pipeline) reduce
      speedup

Example Problem
      Problem: for the laundry example, fill in the following table when
      1.     the stage lengths are 30, 30, 30, 30 min., resp.
      2.     the stage lengths are 20, 20, 60, 20 min., resp.

Person        Unpipelined   Pipeline 1    Ratio unpipelined   Pipeline 2    Ratio unpipelined
              finish time   finish time   to pipeline 1       finish time   to pipeline 2
  1
  2
  3
  4

  n




      Come up with a formula for pipeline speed-up!
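
One way to check an answer is to simulate the laundry line. This is a sketch under a stated assumption: a load moves on as soon as it is done and the next station is free, with waiting space between stations. A strictly clocked pipeline, where every stage is given the longest stage's time, would give larger finish times for the unbalanced case. All names are illustrative.

```python
def finish_times(stage_minutes, people):
    """Finish time (minutes) of each person's load in the pipelined laundry."""
    free_at = [0] * len(stage_minutes)    # when each station next becomes free
    finishes = []
    for _ in range(people):
        t = 0
        for s, length in enumerate(stage_minutes):
            t = max(t, free_at[s]) + length   # wait for the station, then use it
            free_at[s] = t
        finishes.append(t)
    return finishes

def speedup(n, stage_minutes):
    """Closed form: unpipelined time over pipelined time for n loads."""
    total, longest = sum(stage_minutes), max(stage_minutes)
    return n * total / (total + (n - 1) * longest)

def table(people=4):
    p1 = finish_times([30, 30, 30, 30], people)
    p2 = finish_times([20, 20, 60, 20], people)
    rows = []
    for i in range(people):
        unpiped = (i + 1) * 120               # both stage sets sum to 120 min per load
        rows.append((i + 1, unpiped, p1[i], round(unpiped / p1[i], 2),
                     p2[i], round(unpiped / p2[i], 2)))
    return rows

for row in table():
    print(row)
# (1, 120, 120, 1.0, 120, 1.0)
# (2, 240, 150, 1.6, 180, 1.33)
# (3, 360, 180, 2.0, 240, 1.5)
# (4, 480, 210, 2.29, 300, 1.6)
```

As n grows, the ratio tends to (sum of stage lengths) / (longest stage): 4 for the balanced pipeline and 2 for the unbalanced one.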

Pipelining MIPS

    What makes it easy with MIPS?
         all instructions are the same length
            so fetch and decode stages are similar for all instructions
         just a few instruction formats
            simplifies instruction decode and makes it possible in one
            stage
         memory operands appear only in loads/stores
            so memory access can be deferred to exactly one later stage
         operands are aligned in memory
            so one data transfer instruction requires only one memory
            access stage


Pipelining MIPS
     What makes it hard?
             structural hazards: different instructions, at different stages
             of the pipeline, want to use the same hardware resource
             control hazards: which instruction to put into the pipeline next
             depends on the outcome of a previous branch instruction
             that is still in the pipeline
             data hazards: an instruction in the pipeline requires data
             computed by a previous instruction that is still in the pipeline

     Before actually building the pipelined datapath and
     control we first briefly examine these potential
     hazards individually…
Structural Hazards
   Structural hazard: inadequate hardware to simultaneously support
   all instructions in the pipeline in the same clock cycle
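   E.g., suppose the pipeline below has a single (not separate) instruction
   and data memory with one read port
       then there is a structural hazard between the first and fourth lw instructions:
       the first lw's data access and the fourth lw's instruction fetch need the memory in the same cycle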
   [Figure: pipelined execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0); lw $4, 400($0) in program order, issued 2 ns apart, time axis 2 to 14 ns. Hazard if single memory]



   MIPS was designed to be pipelined: structural hazards are easy to
   avoid!
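
The conflict between the first and fourth lw can be found mechanically. A minimal sketch, assuming every instruction is a load, instruction i is fetched in cycle i, and its data access (MEM) is three cycles later; `memory_conflicts` is a hypothetical helper, not part of any tool.

```python
def memory_conflicts(n_instructions, mem_ports=1):
    """Return (cycle, fetching_instr, data_access_instr) triples in which a
    shared memory is asked for two reads but has fewer than two ports.
    Instructions are numbered from 1."""
    conflicts = []
    for fetcher in range(1, n_instructions + 1):
        cycle = fetcher              # IF stage of `fetcher` happens in this cycle
        loader = cycle - 3           # instruction whose MEM stage is in this cycle
        if loader >= 1 and mem_ports < 2:
            conflicts.append((cycle, fetcher, loader))
    return conflicts

print(memory_conflicts(4))               # [(4, 4, 1)]
print(memory_conflicts(4, mem_ports=2))  # []
```

With separate instruction and data memories (modelled here as two ports) the conflict disappears.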
Control Hazards
     Control hazard: need to make a decision based on the
     result of a previous instruction still executing in pipeline
     Solution 1: Stall the pipeline

   [Figure: pipelined execution of add $4, $5, $6; beq $1, $2, 40; lw $3, 300($0) in program order, time axis 2 to 16 ns. A bubble follows beq, so lw is fetched 4 ns after beq instead of 2 ns: pipeline stall]
   Note that the branch outcome is computed in the ID stage with added hardware (later…)
Control Hazards
       Solution 2: Predict branch outcome
               e.g., predict branch not taken:
   [Figure: pipelined execution of add $4, $5, $6; beq $1, $2, 40; lw $3, 300($0), issued 2 ns apart with no bubble, time axis 2 to 14 ns. Prediction success]
   [Figure: pipelined execution of add $4, $5, $6; beq $1, $2, 40; a row of bubbles; or $7, $8, $9 fetched 4 ns after beq. Prediction failure: undo (= flush) lw]
Control Hazards
 Solution 3: Delayed branch: always execute the sequentially next
 instruction, with the branch taking effect after a one-instruction delay.
 It is the compiler’s job to find an instruction for the slot that is
 independent of the branch outcome
       MIPS does this, but it is an option in SPIM (Simulator -> Settings)
   [Figure: pipelined execution of beq $1, $2, 40; add $4, $5, $6 (delayed branch slot); lw $3, 300($0), issued 2 ns apart, time axis 2 to 14 ns]
   Delayed branch: beq is followed by an add that is independent of the branch outcome
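The three solutions (stall, predict not taken, delayed branch) can be compared by counting cycles. A rough sketch, assuming a 5-stage pipeline, a 2 ns clock, a branch resolved in ID (so at most one lost cycle per branch), and a delay slot that the compiler always manages to fill; the policy names are made up for this example.

```python
STAGES = 5
CLOCK_NS = 2

def total_cycles(trace, policy):
    """trace: executed instructions in order; each entry is None for a
    non-branch, or True/False for a branch that is taken / not taken."""
    bubbles = 0
    for outcome in trace:
        if outcome is None:
            continue
        if policy == "stall":
            bubbles += 1                      # always wait one cycle after a branch
        elif policy == "predict_not_taken":
            bubbles += 1 if outcome else 0    # flush one instruction only when wrong
        elif policy == "delayed":
            bubbles += 0                      # assumes the delay slot holds useful work
        else:
            raise ValueError(policy)
    return STAGES + len(trace) - 1 + bubbles

not_taken = [None, False, None]   # add, beq (not taken), lw
taken = [None, True, None]        # add, beq (taken), or

print(total_cycles(not_taken, "stall") * CLOCK_NS)              # 16
print(total_cycles(not_taken, "predict_not_taken") * CLOCK_NS)  # 14
print(total_cycles(taken, "predict_not_taken") * CLOCK_NS)      # 16
print(total_cycles([False, None, None], "delayed") * CLOCK_NS)  # 14
```

The totals agree with the time axes of the diagrams: 16 ns with a stall or a failed prediction, 14 ns otherwise.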
Data Hazards
    Data hazard: instruction needs data from the result of a
    previous instruction still executing in pipeline
    Solution: Forward data if possible…


   [Figure: instruction pipeline diagram for add $s0, $t0, $t1 with stages IF, ID, EX, MEM, WB, time axis 2 to 10 ns; shading indicates use: left = write, right = read]
   [Figure: add $s0, $t0, $t1 followed by sub $t2, $s0, $t3, each with stages IF, ID, EX, MEM, WB. Without forwarding (blue line) data has to go back in time; with forwarding (red line) data is available in time]
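The "back in time" argument can be restated as a comparison of cycle numbers. A minimal sketch, assuming the five stages above, one instruction issued per cycle, and (for the no-forwarding case) a register file whose WB write can be read by an ID stage in the same cycle; that last point is an assumption, not something the slide states.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def cycle_of(issue_cycle, stage):
    """Clock cycle in which an instruction issued in `issue_cycle` occupies `stage`."""
    return issue_cycle + STAGES.index(stage)

def alu_result_in_time(producer_issue, consumer_issue, forwarding):
    """True if the consumer can get an R-type producer's result without stalling."""
    if forwarding:
        # The result leaves the ALU at the end of the producer's EX and is
        # needed at the start of the consumer's EX.
        return cycle_of(producer_issue, "EX") < cycle_of(consumer_issue, "EX")
    # Without forwarding the consumer reads the register file in ID.
    # Assumption: a value written in WB is readable in ID of the same cycle.
    return cycle_of(producer_issue, "WB") <= cycle_of(consumer_issue, "ID")

# add $s0, $t0, $t1 issued in cycle 1; sub $t2, $s0, $t3 issued in cycle 2
print(alu_result_in_time(1, 2, forwarding=False))  # False
print(alu_result_in_time(1, 2, forwarding=True))   # True
```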
Data Hazards
    Forwarding may not be enough
         e.g., if an R-type instruction following a load uses the result of the load;
         this is called a load-use data hazard
   [Figure: lw $s0, 20($t1) followed immediately by sub $t2, $s0, $t3, each with stages IF, ID, EX, MEM, WB, time axis 2 to 14 ns. Without a stall it is impossible to provide input to the sub instruction in time]
   [Figure: lw $s0, 20($t1), a row of bubbles, then sub $t2, $s0, $t3. With a one-stage stall, forwarding can get the data to the sub instruction in time]
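Why one bubble is enough for the load-use case, and none is needed after an R-type instruction, can be sketched as follows. The assumptions: stages are numbered IF = 1 to WB = 5, the producer is issued in cycle 1, and the consumer needs the value at the start of its EX stage. The names are illustrative.

```python
# Stage number (IF=1 ... WB=5) at whose end the result first exists.
RESULT_READY_AFTER = {"alu": 3, "load": 4}   # EX for R-type, MEM for lw
EX_STAGE = 3

def forwarding_stalls(producer_kind, distance=1):
    """Bubbles needed, with forwarding, when the consumer is `distance`
    instructions after the producer and uses the result in its EX stage."""
    ready_cycle = RESULT_READY_AFTER[producer_kind]        # producer issued in cycle 1
    consumer_ex_cycle = (1 + distance) + (EX_STAGE - 1)    # issue cycle plus two stages
    return max(0, ready_cycle + 1 - consumer_ex_cycle)

print(forwarding_stalls("alu"))                # 0: add followed by sub
print(forwarding_stalls("load"))               # 1: lw followed by sub (load-use)
print(forwarding_stalls("load", distance=2))   # 0: one independent instruction in between
```

The last line is the case that code reordering tries to create.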
   Reordering Code to Avoid
   Pipeline Stall (Software Solution)
  Example:
lw $t0, 0($t1)
lw $t2, 4($t1)      # data hazard: the next sw uses $t2 right after it is loaded
sw $t2, 0($t1)
sw $t0, 4($t1)

  Reordered code:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)      # the two sw instructions are interchanged
sw $t2, 0($t1)
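
A quick check that the reordering removes the stall. This sketch follows the slide's model, in which any instruction that reads a register immediately after the lw that loads it costs one stall; the tiny parser handles only the lw/sw forms used here and is not a general MIPS parser.

```python
import re

def parse(line):
    """Return (op, dest, sources) for the lw/sw forms used on the slide."""
    op, operands = line.split(None, 1)
    regs = re.findall(r"\$\w+", operands)
    if op == "lw":
        return op, regs[0], regs[1:]     # lw rt, off(rs): writes rt, reads rs
    if op == "sw":
        return op, None, regs            # sw rt, off(rs): reads rt and rs
    raise ValueError(f"unsupported instruction: {line}")

def load_use_stalls(program):
    """Count adjacent pairs where a load's destination is read by the next instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = parse(prev)
        _, _, cur_sources = parse(cur)
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls

original = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t2, 0($t1)", "sw $t0, 4($t1)"]
reordered = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t0, 4($t1)", "sw $t2, 0($t1)"]

print(load_use_stalls(original))   # 1
print(load_use_stalls(reordered))  # 0
```

Swapping the two stores is safe here because they write to different addresses, 0($t1) and 4($t1).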



Pipelined Datapath
        We now move to actually building a pipelined datapath
        First recall the 5 steps in instruction execution
   1.       Instruction Fetch & PC Increment (IF)
   2.       Instruction Decode and Register Read (ID)
   3.       Execute operation or calculate address (EX)
   4.       Memory access (MEM)
   5.       Write result into register (WB)
        Review: single-cycle processor
            all 5 steps done in a single clock cycle
            dedicated hardware required for each step

        What happens if we break the execution into multiple cycles, but keep
        the extra hardware?
Review - Single-Cycle Datapath
“Steps”

   [Figure: single-cycle datapath (PC, instruction memory, register file, sign extend, ALU, data memory, adders, shift left 2, multiplexers) divided into five steps: IF Instruction Fetch, ID Instruction Decode, EX Execute/Address Calc., MEM Memory Access, WB Write Back]
Pipelined Datapath – Key Idea
     What happens if we break the execution into
     multiple cycles, but keep the extra hardware?
             Answer: We may be able to start executing a new
             instruction at each clock cycle - pipelining
     …but we shall need extra registers to hold data
     between cycles – pipeline registers




Pipelined Datapath

   [Figure: pipelined datapath with pipeline registers IF/ID (64 bits), ID/EX (128 bits), EX/MEM (97 bits) and MEM/WB (64 bits) between the stages]
   Pipeline registers are wide enough to hold the data coming in
Pipelined Datapath

   [Figure: the same pipelined datapath, with pipeline registers IF/ID (64 bits), ID/EX (128 bits), EX/MEM (97 bits) and MEM/WB (64 bits)]
   Only data flowing right to left may cause a hazard… why?
Bug in the Datapath

   [Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]
   Write register number comes from another, later instruction!
Corrected Datapath
     [Figure: pipelined datapath in which the write register number (WN) travels
      down the pipeline with its instruction.
      Pipeline register widths: IF/ID 64 bits, ID/EX 133 bits, EX/MEM 102 bits,
      MEM/WB 69 bits]

     The destination register number is also passed through the ID/EX, EX/MEM
     and MEM/WB registers, which are now each wider by 5 bits
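
     The register widths quoted on this slide can be checked with a little
     arithmetic. The sketch below is only one plausible breakdown: the field
     names are illustrative labels (they do not appear on the slide), and the
     assumption is that every data value is 32 bits wide and a register number
     is 5 bits wide.

```python
# Hypothetical field breakdown of the four pipeline registers.
# Field names are illustrative; only the totals come from the slide.
WORD = 32     # assumed width of a data word or address
REG_NUM = 5   # width of a register number (32 registers)

PIPELINE_REGISTER_FIELDS = {
    "IF/ID":  {"pc_plus_4": WORD, "instruction": WORD},
    "ID/EX":  {"pc_plus_4": WORD, "read_data_1": WORD, "read_data_2": WORD,
               "sign_ext_imm": WORD, "write_reg_num": REG_NUM},
    "EX/MEM": {"branch_target": WORD, "alu_zero": 1, "alu_result": WORD,
               "read_data_2": WORD, "write_reg_num": REG_NUM},
    "MEM/WB": {"mem_read_data": WORD, "alu_result": WORD,
               "write_reg_num": REG_NUM},
}


def register_widths(fields=PIPELINE_REGISTER_FIELDS):
    """Sum the field widths of each pipeline register."""
    return {name: sum(parts.values()) for name, parts in fields.items()}


if __name__ == "__main__":
    for name, width in register_widths().items():
        print(f"{name}: {width} bits")
```

     Under these assumptions the totals come out to 64, 133, 102 and 69 bits,
     and dropping the 5-bit write_reg_num field gives the widths before the
     correction.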
Pipelined Example
     Consider the following instruction sequence:
       lw  $t0, 10($t1)
       sw  $t3, 20($t4)
       add $t5, $t6, $t7
       sub $t8, $t9, $t10
 Single-Clock-Cycle Diagram:
 Clock Cycle 1
     [Figure: pipelined datapath snapshot]
     IF: lw    ID: (empty)    EX: (empty)    MEM: (empty)    WB: (empty)
 Single-Clock-Cycle Diagram:
 Clock Cycle 2
     [Figure: pipelined datapath snapshot]
     IF: sw    ID: lw    EX: (empty)    MEM: (empty)    WB: (empty)
 Single-Clock-Cycle Diagram:
 Clock Cycle 3
     [Figure: pipelined datapath snapshot]
     IF: add    ID: sw    EX: lw    MEM: (empty)    WB: (empty)
 Single-Clock-Cycle Diagram:
 Clock Cycle 4
     [Figure: pipelined datapath snapshot]
     IF: sub    ID: add    EX: sw    MEM: lw    WB: (empty)
 Single-Clock-Cycle Diagram:
 Clock Cycle 5
     [Figure: pipelined datapath snapshot]
     IF: (empty)    ID: sub    EX: add    MEM: sw    WB: lw
 Single-Clock-Cycle Diagram:
 Clock Cycle 6
     [Figure: pipelined datapath snapshot]
     IF: (empty)    ID: (empty)    EX: sub    MEM: add    WB: sw
 Single-Clock-Cycle Diagram:
 Clock Cycle 7
     [Figure: pipelined datapath snapshot]
     IF: (empty)    ID: (empty)    EX: (empty)    MEM: sub    WB: add
 Single-Clock-Cycle Diagram:
 Clock Cycle 8
     [Figure: pipelined datapath snapshot]
     IF: (empty)    ID: (empty)    EX: (empty)    MEM: (empty)    WB: sub
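
     The eight snapshots follow one rule: with no stalls, instruction i
     (counting from 0) occupies stage s (counting from 0) in clock cycle
     i + s + 1. A minimal sketch of that rule, assuming the lw/sw/add/sub
     sequence of the example and hypothetical names snapshot, STAGES and
     PROGRAM:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
PROGRAM = ["lw", "sw", "add", "sub"]


def snapshot(cycle, program=PROGRAM, stages=STAGES):
    """Return {stage: instruction} for a 1-based clock cycle (no stalls)."""
    occupancy = {}
    for s, stage in enumerate(stages):
        i = cycle - 1 - s  # index of the instruction sitting in stage s
        if 0 <= i < len(program):
            occupancy[stage] = program[i]
    return occupancy


if __name__ == "__main__":
    last_cycle = len(PROGRAM) + len(STAGES) - 1
    for cycle in range(1, last_cycle + 1):
        print(f"Clock cycle {cycle}: {snapshot(cycle)}")
```

     Stages missing from the returned dictionary are empty in that cycle.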
  Alternative View –
  Multiple-Clock-Cycle Diagram
                      CC 1   CC 2   CC 3   CC 4   CC 5   CC 6   CC 7   CC 8   (time axis)
lw  $t0, 10($t1)      IM     REG    ALU    DM     REG
sw  $t3, 20($t4)             IM     REG    ALU    DM     REG
add $t5, $t6, $t7                   IM     REG    ALU    DM     REG
sub $t8, $t9, $t10                         IM     REG    ALU    DM     REG
Notes
    One significant difference in how an R-type instruction executes in the
    multicycle and pipelined implementations:
       register write-back for an R-type instruction happens in the 5th (the
       last, write-back) pipeline stage vs. the 4th step in the multicycle
       implementation. Why?
       Hint: think of the structural hazards that arise when writing to the
       register file…
    Worth repeating: the essential difference between the pipelined
    and multicycle implementations is the insertion of pipeline
    registers to decouple the 5 stages
    The CPI of an ideal pipeline (no stalls) is 1. Why?
    The RaVi Architecture Visualization Project of Dortmund U. has
    pipeline simulations: see the link on our Additional Resources page
    As we develop control for the pipeline, keep in mind that the text
    does not consider jump; it should not be too hard to implement!
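
    One way to approach the CPI question: in a k-stage pipeline with no
    stalls, the first instruction finishes after k cycles and each later
    instruction finishes one cycle after its predecessor, so n instructions
    take k + n - 1 cycles. The sketch below (function names are hypothetical)
    shows the cycle count per instruction approaching 1 as n grows.

```python
def pipeline_cycles(n_instructions, n_stages=5):
    """Clock cycles needed to run n instructions on an ideal pipeline."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def cpi(n_instructions, n_stages=5):
    """Average clock cycles per instruction, including pipeline fill time."""
    return pipeline_cycles(n_instructions, n_stages) / n_instructions


if __name__ == "__main__":
    for n in (4, 100, 1_000_000):
        print(f"{n} instructions: {pipeline_cycles(n)} cycles, CPI = {cpi(n):.6f}")
```

    For the four-instruction example this gives 8 cycles, matching CC 1 to
    CC 8; the fill time of k - 1 cycles becomes negligible for long programs.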
Recall Single-Cycle Control –
the Datapath
    [Figure: single-cycle datapath with its control unit]
    Control unit input: Instruction [31-26]
    Control unit outputs: RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite,
    ALUSrc, RegWrite
    PCSrc: selects the next PC (PC + 4 or the branch target)
    Instruction [25-21]: Read register 1
    Instruction [20-16]: Read register 2
    Instruction [20-16] or [15-11]: Write register (mux)
    Instruction [15-0]: sign extend, 16 to 32 bits
    Instruction [5-0]: ALU control
Recall Single-Cycle – ALU Control
    Instruction   ALUOp   Instruction     Funct    Desired             ALU control
    opcode                operation       field    ALU action          input
    LW            00      load word       xxxxxx   add                 010
    SW            00      store word      xxxxxx   add                 010
    Branch eq     01      branch eq       xxxxxx   subtract            110
    R-type        10      add             100000   add                 010
    R-type        10      subtract        100010   subtract            110
    R-type        10      AND             100100   and                 000
    R-type        10      OR              100101   or                  001
    R-type        10      set on less     101010   set on less than    111
                          than

                       Truth table for ALU control bits
                  ALUOp             Funct field           Operation
             ALUOp1  ALUOp0   F5  F4  F3  F2  F1  F0
                0       0     X   X   X   X   X   X         010
                0       1     X   X   X   X   X   X         110
                1       X     X   X   0   0   0   0         010
                1       X     X   X   0   0   1   0         110
                1       X     X   X   0   1   0   0         000
                1       X     X   X   0   1   0   1         001
                1       X     X   X   1   0   1   0         111
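
    The truth table can be read as a small decoding function. This is a
    sketch only: the name alu_control and the choice of returning the 3-bit
    operation as a string are assumptions made for illustration.

```python
# ALU control decoding, following the truth table for ALU control bits.
# Keys are the low four funct bits (F3..F0); F5 and F4 are don't-cares.
R_TYPE_OPERATIONS = {
    0b0000: "010",  # add
    0b0010: "110",  # subtract
    0b0100: "000",  # AND
    0b0101: "001",  # OR
    0b1010: "111",  # set on less than
}


def alu_control(alu_op, funct=0):
    """Map 2-bit ALUOp and 6-bit funct field to the 3-bit ALU operation."""
    if alu_op & 0b10:                 # ALUOp1 = 1: R-type, decode funct
        key = funct & 0b1111
        if key not in R_TYPE_OPERATIONS:
            raise ValueError(f"funct {funct:06b} is not in the truth table")
        return R_TYPE_OPERATIONS[key]
    if alu_op & 0b01:                 # ALUOp = 01: branch eq, subtract
        return "110"
    return "010"                      # ALUOp = 00: lw/sw, add


if __name__ == "__main__":
    print(alu_control(0b00))              # lw / sw
    print(alu_control(0b01))              # beq
    print(alu_control(0b10, 0b101010))    # R-type set on less than
```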
Recall Single-Cycle – Control Signals
                              Effect of control bits
 Signal name   Effect when deasserted / effect when asserted

 RegDst        Deasserted: the register destination number for the Write register
               comes from the rt field (bits 20-16)
               Asserted: the register destination number for the Write register
               comes from the rd field (bits 15-11)
 RegWrite      Deasserted: none
               Asserted: the register on the Write register input is written
               with the value on the Write data input
 ALUSrc        Deasserted: the second ALU operand comes from the second register
               file output (Read data 2)
               Asserted: the second ALU operand is the sign-extended lower
               16 bits of the instruction
 PCSrc         Deasserted: the PC is replaced by the output of the adder that
               computes the value of PC + 4
               Asserted: the PC is replaced by the output of the adder that
               computes the branch target
 MemRead       Deasserted: none
               Asserted: data memory contents designated by the address input
               are put on the Read data output
 MemWrite      Deasserted: none
               Asserted: data memory contents designated by the address input
               are replaced by the value on the Write data input
 MemtoReg      Deasserted: the value fed to the register Write data input comes
               from the ALU
               Asserted: the value fed to the register Write data input comes
               from the data memory

                             Determining control bits
 Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
 R-format       1       0        0         1         0        0        0       1       0
 lw             0       1        1         1         1        0        0       0       0
 sw             X       1        X         0         0        1        0       0       0
 beq            X       0        X         0         0        0        1       0       1
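
 The control-bit table maps directly onto a lookup structure. In the sketch
 below, None stands for a don't-care (X), and the names SIGNALS,
 CONTROL_TABLE, control_bits and pc_src are hypothetical. The pc_src helper
 assumes PCSrc is formed as Branch AND Zero, which this slide does not state.

```python
# Main control table for the four instruction classes.
# None stands for a don't-care (X) entry.
SIGNALS = ["RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp1", "ALUOp0"]

CONTROL_TABLE = {
    "R-format": (1, 0, 0, 1, 0, 0, 0, 1, 0),
    "lw":       (0, 1, 1, 1, 1, 0, 0, 0, 0),
    "sw":       (None, 1, None, 0, 0, 1, 0, 0, 0),
    "beq":      (None, 0, None, 0, 0, 0, 1, 0, 1),
}


def control_bits(instruction_class):
    """Return a {signal name: value} dict for one instruction class."""
    return dict(zip(SIGNALS, CONTROL_TABLE[instruction_class]))


def pc_src(branch, zero):
    """Assumption: the branch is taken only when Branch and Zero are both 1."""
    return branch & zero


if __name__ == "__main__":
    for name in CONTROL_TABLE:
        print(name, control_bits(name))
```

 The don't-cares for sw and beq reflect that neither instruction writes a
 register, so the RegDst and MemtoReg mux settings cannot affect the result.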
Pipeline Control
   Initial design, motivated by single-cycle datapath control: use
   the same control signals
   Observe:
        No separate write signal for the PC, as it is written every cycle
        No separate write signals for the pipeline registers, as they are written
        every cycle
        (These two will be modified by the hazard detection unit!)
        No separate read signal for instruction memory, as it is read every clock
        cycle
        No separate read signal for the register file, as it is read every clock cycle
   Need to set control signals during each pipeline stage
   Since control signals are associated with components active
   during a single pipeline stage, we can group control lines into five
   groups according to pipeline stage
Pipelined Datapath with Control I
                                                                                                                                                              PCSrc


                           0
                           M

                            u

                            x
                           1



                                     IF/ID                                                 ID/EX                                      EX/MEM                                  MEM/WB


                           Add

                                                                                                                               Add

             4                                                                                                      Add
                                                                                                                             result
                                                                                                                                           Branch
                                                                                                    Shift

                                                                       RegWrite                    left 2


                                                              Read
                                                                                        MemWrite
                                             Instruction




        PC       Address                                      register 1           Read

                                                              Read
               data 1                ALUSrc
                                                                                                                             Zero
                                                                                                                               Zero                                                    MemtoReg
                      Instruction
                            register 2
                                                                     Registers Read
                                      ALU ALU

                        memory                                Write
                                         0                                                        Read

                                                                              data 2                                         result                 Address                              1
                                                              register                                       M
                                                       data
                                                                                                              u
                                                                          M

     [Figure: pipelined datapath with the control lines labeled (RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite, RegWrite, MemtoReg)]
     Same control signals as the single-cycle datapath
Pipeline Control Signals

    There are five stages in the pipeline:
         instruction fetch / PC increment
         instruction decode / register fetch
         execution / address calculation
         memory access
         write back
    The first two stages have nothing to control, as instruction memory
    read and PC write are always enabled.

    Control lines for the last three stages (X = don't care):

                 | Execution/Address Calculation | Memory access stage     | Write-back stage
                 | stage control lines           | control lines           | control lines
    Instruction  | RegDst ALUOp1 ALUOp0 ALUSrc   | Branch MemRead MemWrite | RegWrite MemtoReg
    R-format     |   1      1      0      0      |   0       0       0     |    1        0
    lw           |   0      0      0      1      |   0       1       0     |    1        1
    sw           |   X      0      0      1      |   0       0       1     |    0        X
    beq          |   X      0      1      0      |   1       0       0     |    0        X
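
The control table can also be written down as data, which makes the per-stage grouping easy to check. The following is a minimal sketch in Python, not part of the original slides: the names `CONTROL`, `EX_LINES`, `M_LINES`, `WB_LINES`, and `group_by_stage` are illustrative assumptions, and `None` is used here to stand for a don't-care (X) entry.

```python
# Control settings per instruction class, copied from the table.
# None marks a don't-care (X) entry. All names here are illustrative.
CONTROL = {
    "R-format": dict(RegDst=1, ALUOp1=1, ALUOp0=0, ALUSrc=0,
                     Branch=0, MemRead=0, MemWrite=0,
                     RegWrite=1, MemtoReg=0),
    "lw":       dict(RegDst=0, ALUOp1=0, ALUOp0=0, ALUSrc=1,
                     Branch=0, MemRead=1, MemWrite=0,
                     RegWrite=1, MemtoReg=1),
    "sw":       dict(RegDst=None, ALUOp1=0, ALUOp0=0, ALUSrc=1,
                     Branch=0, MemRead=0, MemWrite=1,
                     RegWrite=0, MemtoReg=None),
    "beq":      dict(RegDst=None, ALUOp1=0, ALUOp0=1, ALUSrc=0,
                     Branch=1, MemRead=0, MemWrite=0,
                     RegWrite=0, MemtoReg=None),
}

# Which control lines are consumed in which stage (column groups of the table).
EX_LINES = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc")
M_LINES = ("Branch", "MemRead", "MemWrite")
WB_LINES = ("RegWrite", "MemtoReg")


def group_by_stage(instr):
    """Return the control bits of `instr` as one bit string per stage."""
    signals = CONTROL[instr]

    def bits(names):
        return "".join("X" if signals[n] is None else str(signals[n])
                       for n in names)

    return {"EX": bits(EX_LINES), "M": bits(M_LINES), "WB": bits(WB_LINES)}


print(group_by_stage("lw"))  # {'EX': '0001', 'M': '010', 'WB': '11'}
```

The bit strings for `lw` (EX = 0001, M = 010, WB = 11) are the same groups that a pipelined implementation would carry forward in its pipeline registers.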
Pipeline Control Implementation
         Pass control signals along just like the data – extend each pipeline
         register to hold needed control bits for succeeding stages
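         [Figure: the Control unit decodes the instruction held in IF/ID and
         writes the EX, M, and WB control fields into ID/EX; EX/MEM carries
         M and WB onward; MEM/WB carries only WB]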
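
To illustrate how control bits travel with an instruction, here is a minimal sketch in Python. It is an assumption-laden model, not the slides' implementation: pipeline registers are plain dictionaries, `BUBBLE` is a hypothetical all-zero control word, and `clock` models one clock edge. The `lw` bit strings follow the control table (EX = RegDst, ALUOp1, ALUOp0, ALUSrc; M = Branch, MemRead, MemWrite; WB = RegWrite, MemtoReg).

```python
# Control fields for lw, grouped by the stage that consumes them.
LW = {"EX": "0001", "M": "010", "WB": "11"}
# Hypothetical bubble: every control line deasserted.
BUBBLE = {"EX": "0000", "M": "000", "WB": "00"}


def empty_pipeline():
    """Control portions of the three pipeline registers, all deasserted."""
    return {
        "ID/EX": dict(BUBBLE),
        "EX/MEM": {"M": "000", "WB": "00"},
        "MEM/WB": {"WB": "00"},
    }


def clock(regs, decoded):
    """One clock edge: each register keeps only the fields still needed."""
    return {
        "ID/EX": dict(decoded),                      # EX, M, WB from Control
        "EX/MEM": {"M": regs["ID/EX"]["M"],          # EX field is used up
                   "WB": regs["ID/EX"]["WB"]},
        "MEM/WB": {"WB": regs["EX/MEM"]["WB"]},      # M field is used up
    }


regs = empty_pipeline()
for decoded in (LW, BUBBLE, BUBBLE):
    regs = clock(regs, decoded)
    print(regs)
```

After the third clock edge the WB bits of `lw` sit in MEM/WB, while its EX and M bits have been dropped along the way, which is why each successive pipeline register needs fewer control bits.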


         Note: The 6-bit funct field, which the EX stage needs to generate
         ALU control, can be retrieved as the 6 least significant bits of the
         immediate field, which is sign-extended and passed from the IF/ID
         register to the ID/EX register
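
The note can be checked with a few lines of bit manipulation. This is a small sketch in Python; the helper names `sign_extend16` and `funct_from_extended` are illustrative assumptions, and the encodings use the standard MIPS R-format layout (funct 0x22 for sub, 0x20 for add).

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit value to a 32-bit pattern."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16


def funct_from_extended(ext32):
    """The funct field is the 6 least significant bits."""
    return ext32 & 0x3F


def encode_r(rs, rt, rd, funct):
    """Encode an R-format instruction (opcode 0, shamt 0)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


# sub $11, $2, $3: bit 15 of the low half is 0, so the upper bits stay 0.
sub = encode_r(rs=2, rt=3, rd=11, funct=0x22)
print(hex(funct_from_extended(sign_extend16(sub & 0xFFFF))))  # 0x22

# add $16, $8, $9: rd = 16 sets bit 15, so sign extension fills the upper
# half with ones, yet the low 6 bits are unchanged.
add = encode_r(rs=8, rt=9, rd=16, funct=0x20)
print(hex(funct_from_extended(sign_extend16(add & 0xFFFF))))  # 0x20
```

Sign extension only affects bits 16 to 31, so the funct field survives regardless of the value of rd.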
Pipelined Datapath with Control II
     [Figure: pipelined datapath with the Control unit added and the EX, M,
     and WB control fields stored in the ID/EX, EX/MEM, and MEM/WB registers]
     Control signals emanate from the control portions of the pipeline registers
Pipelined Execution and Control

     Instruction sequence:
         lw      $10, 20($1)
         sub     $11, $2, $3
         and     $12, $4, $5
         or      $13, $6, $7
         add     $14, $8, $9

     Label "before<i>" means the i-th instruction before lw

     Clock cycle 1
         IF: lw $10, 20($1)    ID: before<1>    EX: before<2>    MEM: before<3>    WB: before<4>
         [Figure: pipelined datapath with control in clock 1; all control fields
         in the pipeline registers are shown as 0]

     Clock cycle 2
         IF: sub $11, $2, $3    ID: lw $10, 20($1)    EX: before<1>    MEM: before<2>    WB: before<3>
         [Figure: pipelined datapath with control in clock 2; lw is decoded:
         register $1 is read, the immediate 20 is sign-extended, destination
         field [20-16] = 10, and the control values WB = 11, M = 010, EX = 0001
         are presented to ID/EX]
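
The stage labels for any clock cycle follow a simple rule: the instruction in a stage is one position earlier in program order than the instruction in the stage before it. The sketch below, in Python, reproduces the labels for the first clock cycles under that rule. The function name `stage_contents` and the "after<i>" label for instructions following the listed sequence are illustrative assumptions; the `and` operands follow the clock-cycle stage labels.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")

# Instruction sequence from the slide.
PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]


def stage_contents(cycle):
    """Map each stage to the instruction it holds in clock `cycle` (1-based)."""
    contents = {}
    for depth, stage in enumerate(STAGES):
        index = cycle - 1 - depth  # program index of the instruction here
        if index < 0:
            contents[stage] = f"before<{-index}>"
        elif index < len(PROGRAM):
            contents[stage] = PROGRAM[index]
        else:
            # Assumed label for instructions after the listed sequence.
            contents[stage] = f"after<{index - len(PROGRAM) + 1}>"
    return contents


for cycle in (1, 2, 3):
    print(cycle, stage_contents(cycle))
```

For cycle 2 this gives sub in IF, lw in ID, and before<1>, before<2>, before<3> in EX, MEM, and WB, matching the labels on the slide.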
Pipelined Execution and Control

         IF: and $12, $4, $5    ID: sub $11, $2, $3    EX: lw $10, . . .    MEM: before<1>    WB: before<2>
         [Figure: pipelined datapath with control; sub is decoded (registers $2
         and $3 are read; control values WB = 10, M = 000, EX = 1100) while lw
         is in the EX stage carrying WB = 11, M = 010]
                                                                                                                                                                                                Write





Control
                                                                                                                                                                                                data

                                                                                            Instruction

                                                                                      X     [15– 0]                 Sign
         X            20                ALU
                                                MemRead
                                                                                                                   extend                                       control

                                                                                            Instruction

                                                                                      X     [20– 16]                              X            10
                                                                                                                                                     0              ALUOp

                             Clock cycle 3                                            11
                                                                                            Instruction

                                                                                            [15– 11]                              11
                                                                                                                                                     M



                                                                                                                                                     1
                                                                                                                                                      u

                                                                                                                                                      x



Instruction sequence:

   lw   $10, 20($1)
   sub  $11, $2, $3
   and  $12, $4, $5
   or   $13, $6, $7
   add  $14, $8, $9

[Figure: pipelined datapath with control at clock cycle 4. IF: or $13, $6, $7; ID: and $12, $4, $5; EX: sub $11, . . .; MEM: lw $10, . . .; WB: before<1>. The ID stage reads registers 4 and 5 (destination field 12); the EX stage holds $2 and $3 (destination field 11); the MEM stage carries destination register 10. Control values: ID/EX holds WB = 10, M = 000, EX = 1100 for sub; EX/MEM holds WB = 11, M = 010 for lw.]
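The stage labels in these clock-cycle figures follow a simple rule: each instruction advances one stage per clock, so the instruction in a given stage can be computed from the cycle number alone. A minimal Python sketch of that rule follows. The names `STAGES`, `PROGRAM` and `stage_contents` are illustrative only and are not part of the slides; the `and` operands ($4, $5) are taken from the register numbers 4 and 5 read in the ID stage at clock cycle 4, and the `before<i>` / `after<i>` labels follow the figures' convention for instructions outside the listed sequence.

```python
# Five classic MIPS pipeline stages, in the order an instruction visits them.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Instruction sequence from the slide (assumed to enter IF at clock cycle 1).
PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]


def stage_contents(cycle, program=PROGRAM):
    """Return {stage: instruction label} for one clock cycle (counted from 1)."""
    contents = {}
    for depth, stage in enumerate(STAGES):
        # Instruction k (0-based) is in the stage at depth d during cycle k + d + 1.
        index = cycle - 1 - depth
        if index < 0:
            label = f"before<{-index}>"  # issued before the first listed instruction
        elif index >= len(program):
            label = f"after<{index - len(program) + 1}>"  # issued after the last one
        else:
            label = program[index]
        contents[stage] = label
    return contents


if __name__ == "__main__":
    for cycle in range(3, 7):
        row = "; ".join(f"{s}: {v}" for s, v in stage_contents(cycle).items())
        print(f"Clock cycle {cycle}: {row}")
```

Running the script prints one line per clock cycle from 3 to 6, which can be compared against the stage labels at the top of each figure.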
Pipelined Execution and Control

[Figure: pipelined datapath with control at clock cycle 5. IF: add $14, $8, $9; ID: or $13, $6, $7; EX: and $12, . . .; MEM: sub $11, . . .; WB: lw $10, . . . The ID stage reads registers 6 and 7 (destination field 13); the EX stage holds $4 and $5 (destination field 12); the MEM stage carries destination register 11; the WB stage writes register 10. Control values: ID/EX holds WB = 10, M = 000, EX = 1100; EX/MEM holds WB = 10, M = 000; MEM/WB holds WB = 11.]
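The control values in the figures are generated once in ID and then travel with the instruction: ID/EX carries the EX, M and WB groups, EX/MEM carries M and WB, and MEM/WB carries only WB. The sketch below, in Python, models that hand-off for the slide's instruction sequence. It is an illustration under stated assumptions, not the slides' own material: the function names are hypothetical, and the bit patterns come from the standard MIPS pipeline control table (EX = RegDst, ALUOp1, ALUOp0, ALUSrc; M = Branch, MemRead, MemWrite; WB = RegWrite, MemtoReg), of which only the R-type values and the lw WB/M values are visible in the figures.

```python
# Control bits per instruction class, grouped by the stage that consumes them.
# Assumed bit order: EX = RegDst, ALUOp1, ALUOp0, ALUSrc;
#                    M  = Branch, MemRead, MemWrite;
#                    WB = RegWrite, MemtoReg.
CONTROL = {
    "lw":     {"EX": "0001", "M": "010", "WB": "11"},
    "r-type": {"EX": "1100", "M": "000", "WB": "10"},
}

# Opcodes of the slide's sequence; the first one is fetched in clock cycle 1.
OPCODES = ["lw", "sub", "and", "or", "add"]


def control_for(index):
    """Control groups for instruction `index`, or None outside the sequence."""
    if not 0 <= index < len(OPCODES):
        return None  # before<i> / after<i>: not specified by the slides
    kind = "lw" if OPCODES[index] == "lw" else "r-type"
    return CONTROL[kind]


def pipeline_register_control(cycle):
    """Control groups held by each pipeline register during `cycle`."""
    def keep(ctrl, groups):
        # Each register forwards only the groups still needed downstream.
        return None if ctrl is None else {g: ctrl[g] for g in groups}

    return {
        "ID/EX": keep(control_for(cycle - 3), ("EX", "M", "WB")),   # instruction in EX
        "EX/MEM": keep(control_for(cycle - 4), ("M", "WB")),        # instruction in MEM
        "MEM/WB": keep(control_for(cycle - 5), ("WB",)),            # instruction in WB
    }


if __name__ == "__main__":
    for cycle in (4, 5):
        print(cycle, pipeline_register_control(cycle))
```

Dropping a group as soon as its stage has used it is the reason the later pipeline registers are narrower than ID/EX.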




Instruction sequence:

   lw   $10, 20($1)
   sub  $11, $2, $3
   and  $12, $4, $5
   or   $13, $6, $7
   add  $14, $8, $9

Label "after<i>" means the i-th instruction after add.

[Figure: pipelined datapath with control at clock cycle 6. IF: after<1>; ID: add $14, $8, $9; EX: or $13, . . .; MEM: and $12, . . .; WB: sub $11, . . . The ID stage reads registers 8 and 9 (destination field 14); the EX stage holds $6 and $7 (destination field 13); the WB stage writes register 11. Control values: ID/EX holds WB = 10, M = 000, EX = 1100; EX/MEM holds WB = 10, M = 000.]

                                                                                                                                                u

                                                                                                                                                x
                                                                                                                                                1
                                                                                                                                                                                  12                                           11



11/27/2010                                    Clock 6                                                                                                RegDst                                                                                    44
Pipelined Execution and Control

  [Figure: pipelined datapath with control at clock cycle 7 (Clock 7)]
  IF: after<2>   ID: after<1>   EX: add $14, . . .   MEM: or $13, . . .   WB: and $12, . . .

  Instruction sequence:
    lw   $10, 20($1)
    sub  $11, $2, $3
    and  $12, $4, $7
    or   $13, $6, $7
    add  $14, $8, $9

  [Figure: pipelined datapath with control at clock cycle 8 (Clock 8)]
  IF: after<3>   ID: after<2>   EX: after<1>   MEM: add $14, . . .   WB: or $13, . . .

Pipelined Execution and Control

  Instruction sequence:
    lw   $10, 20($1)
    sub  $11, $2, $3
    and  $12, $4, $7
    or   $13, $6, $7
    add  $14, $8, $9

  [Figure: pipelined datapath with control at clock cycle 9 (Clock 9)]
  IF: after<4>   ID: after<3>   EX: after<2>   MEM: after<1>   WB: add $14, . . .

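The stage labels in the clock 7 to clock 9 figures follow a simple rule: each instruction enters IF one cycle after its predecessor and then advances one stage per cycle. A minimal Python sketch of that rule, assuming an ideal five-stage pipeline with no stalls (the names `stage_contents` and `instruction_label` are illustrative and do not come from the slides):

```python
# Illustrative sketch: which instruction occupies each stage in a given clock cycle.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $7",
    "or $13, $6, $7",
    "add $14, $8, $9",
]


def instruction_label(index):
    """Name the instruction at position index; past the end, use after<i>."""
    if index < len(PROGRAM):
        return PROGRAM[index]
    return "after<%d>" % (index - len(PROGRAM) + 1)


def stage_contents(clock):
    """Map each stage to the instruction occupying it in the given clock cycle.

    Instruction k (0-based) is fetched in clock k + 1 and advances one stage
    per clock, so stage s (0-based) holds instruction clock - 1 - s.
    """
    contents = {}
    for s, stage in enumerate(STAGES):
        index = clock - 1 - s
        contents[stage] = instruction_label(index) if index >= 0 else "(empty)"
    return contents


for clock in (7, 8, 9):
    print("Clock", clock, stage_contents(clock))
```

For clock 9 this places `add $14, $8, $9` in WB and `after<4>` in IF, matching the stage labels of the last figure.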
Revisiting Hazards
     So far, our datapath and control have ignored hazards.
     We now revisit data hazards and control hazards, and enhance
     the datapath and control to handle them in hardware.




      Data Hazards and Forwarding
        Problem with starting an instruction before the previous ones have finished:
              data dependencies that go backward in time, called data hazards

  $2 = 10 before sub; $2 = -20 after sub

  Time (in clock cycles)
                           CC 1     CC 2     CC 3     CC 4     CC 5     CC 6     CC 7     CC 8     CC 9
  Value of register $2:    10       10       10       10       10/-20   -20      -20      -20      -20

  Program execution order (in instructions)
  sub $2, $1, $3           IM       Reg      ALU      DM       Reg
  and $12, $2, $5                   IM       Reg      ALU      DM       Reg
  or  $13, $6, $2                            IM       Reg      ALU      DM       Reg
  add $14, $2, $2                                     IM       Reg      ALU      DM       Reg
  sw  $15, 100($2)                                             IM       Reg      ALU      DM       Reg
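
The dependence pattern in the diagram can be checked mechanically. The following is only a sketch: the tuple encoding of instructions and the function name find_data_hazards are hypothetical, and it assumes, as the 10/-20 entry in CC 5 suggests, that the register file is written in the first half of a cycle and read in the second half, so only the two instructions directly after a writer can read a stale value.

```python
# Hypothetical encoding: (mnemonic, destination register or None, source registers).
PROGRAM = [
    ("sub", "$2", ["$1", "$3"]),
    ("and", "$12", ["$2", "$5"]),
    ("or", "$13", ["$6", "$2"]),
    ("add", "$14", ["$2", "$2"]),
    ("sw", None, ["$15", "$2"]),
]


def find_data_hazards(program, window=2):
    """Return (writer_index, reader_index, register) triples.

    A hazard is reported when a reader follows a writer of the same
    register by at most `window` instructions (assumed to be 2 for the
    five-stage pipeline without forwarding).
    """
    hazards = []
    for j, (_, _, sources) in enumerate(program):
        for reg in sorted(set(sources)):
            if reg == "$0":
                continue  # $0 is always 0, so it never causes a hazard
            for distance in range(1, window + 1):
                i = j - distance
                if i < 0:
                    break
                if program[i][1] == reg:
                    hazards.append((i, j, reg))
                    break  # only the nearest writer matters
    return hazards


if __name__ == "__main__":
    for writer, reader, reg in find_data_hazards(PROGRAM):
        print(PROGRAM[writer][0], "->", PROGRAM[reader][0], "on", reg)
```

Under these assumptions only the and and the or are flagged; the add reads $2 in the same cycle in which sub writes it, and the sw reads it one cycle later.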



Software Solution
    Have the compiler guarantee that there are never any data hazards,
         by rearranging instructions to insert independent instructions
         between instructions that would otherwise have a data hazard
         between them,
         or, if such rearrangement is not possible, by inserting nops

 sub   $2,  $1, $3                 sub   $2,  $1, $3
 lw    $10, 40($3)                 nop
 slt   $5,  $6, $7                 nop
 and   $12, $2, $5         or      and   $12, $2, $5
 or    $13, $6, $2                 or    $13, $6, $2
 add   $14, $2, $2                 add   $14, $2, $2
 sw    $15, 100($2)                sw    $15, 100($2)

    Such compiler solutions may not always be possible, and nops
    slow the machine down

    MIPS: nop = “no operation” = 00…0 (32 bits) = sll $0, $0, 0
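
The nop-padding variant can be sketched as a small pass over the instruction list. This is not a real compiler pass: the tuple encoding and the name insert_nops are hypothetical, and the required gap of two instructions assumes a register file that writes in the first half of a cycle and reads in the second half.

```python
# Hypothetical encoding: (mnemonic, destination register or None, source registers).
PROGRAM = [
    ("sub", "$2", ["$1", "$3"]),
    ("and", "$12", ["$2", "$5"]),
    ("or", "$13", ["$6", "$2"]),
    ("add", "$14", ["$2", "$2"]),
    ("sw", None, ["$15", "$2"]),
]

NOP = ("nop", None, [])


def insert_nops(program, gap=2):
    """Pad with nops so every reader is more than `gap` slots after its writer."""
    out = []
    for instr in program:
        sources = instr[2]
        needed = 0
        # Look at the last `gap` instructions already emitted.
        for distance in range(1, gap + 1):
            i = len(out) - distance
            if i < 0:
                break
            dest = out[i][1]
            if dest is not None and dest != "$0" and dest in sources:
                # Push the reader far enough away from this writer.
                needed = max(needed, gap - distance + 1)
        out.extend([NOP] * needed)
        out.append(instr)
    return out


if __name__ == "__main__":
    for mnemonic, _, _ in insert_nops(PROGRAM):
        print(mnemonic)
```

For the five-instruction example this yields sub, nop, nop, and, or, add, sw, which is the right-hand column of the slide.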
Hardware Solution: Forwarding

          Idea: use the intermediate result as soon as it exists; do not
          wait for it to be finally written to the destination register.
          Two steps:
     1.      Detect data hazard
     2.      Forward intermediate data to resolve hazard




Pipelined Datapath with Control
II (as before)

     [Figure: pipelined datapath with control, showing the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB and the control signals PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite and MemtoReg]

     Control signals emanate from the control portions of the pipeline registers

Hazard Detection
     Hazard conditions:
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
          E.g., in the earlier example, the first hazard, between sub $2, $1, $3 and
          and $12, $2, $5, is detected when the and is in the EX stage and the
          sub is in the MEM stage, because
              EX/MEM.RegisterRd = ID/EX.RegisterRs = $2 (1a)


     Whether to forward also depends on:
          whether the instruction further along the pipeline (the earlier one in
          program order) is going to write a register: if not, there is no need to
          forward, even if there is a register number match as in the conditions above
          whether the destination register of that instruction is $0: if so, there is
          no need to forward the value ($0 is always 0 and never overwritten)
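
The four conditions and the two qualifications can be sketched as a single check. The PipeReg class and the function name hazard_conditions are hypothetical; only the field names mirror the notation on the slide.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """Hypothetical view of the fields of EX/MEM or MEM/WB that matter here."""
    reg_write: bool = False  # RegWrite control bit carried in the pipeline register
    rd: int = 0              # RegisterRd, the destination register number


def hazard_conditions(ex_mem, mem_wb, id_ex_rs, id_ex_rt):
    """Return the labels (1a, 1b, 2a, 2b) of the conditions that call for forwarding."""
    found = []
    for label, stage, source in (
        ("1a", ex_mem, id_ex_rs),
        ("1b", ex_mem, id_ex_rt),
        ("2a", mem_wb, id_ex_rs),
        ("2b", mem_wb, id_ex_rt),
    ):
        # A register-number match only matters if the instruction in that
        # stage really writes a register, and that register is not $0.
        if stage.reg_write and stage.rd != 0 and stage.rd == source:
            found.append(label)
    return found


if __name__ == "__main__":
    # and $12, $2, $5 is in EX while sub $2, $1, $3 is in MEM.
    print(hazard_conditions(PipeReg(True, 2), PipeReg(), 2, 5))
```

With the and in EX and the sub in MEM, only condition 1a is reported, matching the example on the slide.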
 Data Forwarding
      Plan:
              allow inputs to the ALU to come not just from ID/EX, but also
              from later pipeline registers, and
              use multiplexors and control signals to choose the appropriate
              inputs to the ALU
  Time (in clock cycles)
                           CC 1     CC 2     CC 3     CC 4     CC 5     CC 6     CC 7     CC 8     CC 9
  Value of register $2:    10       10       10       10       10/-20   -20      -20      -20      -20
  Value of EX/MEM:         X        X        X        -20      X        X        X        X        X
  Value of MEM/WB:         X        X        X        X        -20      X        X        X        X

  Program execution order (in instructions)
  sub $2, $1, $3           IM       Reg      ALU      DM       Reg
  and $12, $2, $5                   IM       Reg      ALU      DM       Reg
  or  $13, $6, $2                            IM       Reg      ALU      DM       Reg
  add $14, $2, $2                                     IM       Reg      ALU      DM       Reg
  sw  $15, 100($2)                                             IM       Reg      ALU      DM       Reg

  Dependencies between pipelines move forward in time
Forwarding Hardware

     [Figure a. No forwarding: datapath before adding forwarding hardware, with Registers, ALU and Data memory between the ID/EX, EX/MEM and MEM/WB pipeline registers]

     [Figure b. With forwarding: datapath after adding forwarding hardware. Multiplexors controlled by ForwardA and ForwardB select each ALU input; the forwarding unit takes Rs and Rt together with EX/MEM.RegisterRd and MEM/WB.RegisterRd]

Forwarding Hardware:
Multiplexor Control

 Mux control     Source   Explanation
 ForwardA = 00   ID/EX    The first ALU operand comes from the register file
 ForwardA = 10   EX/MEM   The first ALU operand is forwarded from the prior ALU result
 ForwardA = 01   MEM/WB   The first ALU operand is forwarded from data memory or an earlier ALU result (*)
 ForwardB = 00   ID/EX    The second ALU operand comes from the register file
 ForwardB = 10   EX/MEM   The second ALU operand is forwarded from the prior ALU result
 ForwardB = 01   MEM/WB   The second ALU operand is forwarded from data memory or an earlier ALU result (*)

 (*) Depending on the selection in the rightmost multiplexor
     (see datapath with control diagram)

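
Read as a lookup, the table describes one three-way multiplexor per ALU input. A minimal sketch, with the hypothetical function alu_operand standing in for either mux and the 2-bit control written as a string:

```python
def alu_operand(forward, id_ex_value, ex_mem_value, mem_wb_value):
    """Model one forwarding multiplexor (ForwardA or ForwardB).

    `forward` is the 2-bit mux control from the table, written as a string.
    """
    sources = {
        "00": id_ex_value,   # value read from the register file
        "10": ex_mem_value,  # prior ALU result held in EX/MEM
        "01": mem_wb_value,  # data memory or earlier ALU result held in MEM/WB
    }
    return sources[forward]


if __name__ == "__main__":
    # and $12, $2, $5 in EX: the register file still holds the stale $2 = 10,
    # while EX/MEM already holds the new value -20 computed by sub.
    print(alu_operand("10", 10, -20, 0))
```

The values 10 and -20 are the ones used for $2 in the earlier timing diagram; the MEM/WB value 0 is a placeholder.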
Data Hazard: Detection and
Forwarding
         Forwarding unit determines multiplexor control according to the
         following rules:

1.        EX hazard
        if (     EX/MEM.RegWrite                                 // if there is a write…
             and ( EX/MEM.RegisterRd ≠ 0 )                       // to a non-$0 register…
             and ( EX/MEM.RegisterRd = ID/EX.RegisterRs ) )      // which matches, then…
          ForwardA = 10

        if (     EX/MEM.RegWrite                                 // if there is a write…
             and ( EX/MEM.RegisterRd ≠ 0 )                       // to a non-$0 register…
             and ( EX/MEM.RegisterRd = ID/EX.RegisterRt ) )      // which matches, then…
          ForwardB = 10

Data Hazard: Detection and
Forwarding
2.        MEM hazard
        if (     MEM/WB.RegWrite                                 // if there is a write…
             and ( MEM/WB.RegisterRd ≠ 0 )                       // to a non-$0 register…
             and ( EX/MEM.RegisterRd ≠ ID/EX.RegisterRs )        // and not already a register match
                                                                 // with earlier pipeline register…
             and ( MEM/WB.RegisterRd = ID/EX.RegisterRs ) )      // but match with later pipeline register, then…
          ForwardA = 01

        if (     MEM/WB.RegWrite                                 // if there is a write…
             and ( MEM/WB.RegisterRd ≠ 0 )                       // to a non-$0 register…
             and ( EX/MEM.RegisterRd ≠ ID/EX.RegisterRt )        // and not already a register match
                                                                 // with earlier pipeline register…
             and ( MEM/WB.RegisterRd = ID/EX.RegisterRt ) )      // but match with later pipeline register, then…
          ForwardB = 01

           This check is necessary, e.g., for sequences such as add $1, $1, $2; add $1, $1, $3; add $1, $1, $4;
           (array summing?), where an earlier pipeline (EX/MEM) register has more recent data
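
The EX and MEM hazard rules can be combined into one sketch of the forwarding unit. The PipeReg class and the function name forwarding_unit are hypothetical; the conditions are transcribed from the rules on the slides, with the EX hazard tested first so that the more recent result wins.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """Hypothetical view of the fields of EX/MEM or MEM/WB used by the rules."""
    reg_write: bool = False  # RegWrite
    rd: int = 0              # RegisterRd


def forwarding_unit(ex_mem, mem_wb, id_ex_rs, id_ex_rt):
    """Return (ForwardA, ForwardB) as 2-bit strings."""

    def select(source):
        # 1. EX hazard: forward the prior ALU result from EX/MEM.
        if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == source:
            return "10"
        # 2. MEM hazard: forward from MEM/WB, but only when EX/MEM
        #    does not already match the same source register.
        if (mem_wb.reg_write and mem_wb.rd != 0
                and ex_mem.rd != source and mem_wb.rd == source):
            return "01"
        return "00"

    return select(id_ex_rs), select(id_ex_rt)


if __name__ == "__main__":
    # add $1, $1, $2; add $1, $1, $3; add $1, $1, $4 with the third add in EX:
    # both EX/MEM and MEM/WB hold a pending write to $1.
    print(forwarding_unit(PipeReg(True, 1), PipeReg(True, 1), 1, 4))
```

For the array-summing sequence, the third add takes $1 from EX/MEM (ForwardA = 10) rather than the older value in MEM/WB, which is the case the extra check exists for.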
Forwarding Hardware with
Control

     [Figure: datapath with forwarding hardware and control wires – certain details,
     e.g., branching hardware, are omitted to simplify the drawing]

     Called forwarding unit, not hazard detection unit,
     because once data is forwarded there is no hazard!

     Note: so far we have only handled forwarding to R-type instructions…!

Forwarding
Execution example:

sub    $2,   $1,   $3
and    $4,   $2,   $5
or     $4,   $4,   $2
add    $9,   $4,   $2

     [Figure: datapath with forwarding at clock cycles 3 and 4; instruction in each stage, from fetch (left) to write-back (right)]

     Clock cycle 3:   or $4, $4, $2     and $4, $2, $5    sub $2, $1, $3    before<1>         before<2>
     Clock cycle 4:   add $9, $4, $2    or $4, $4, $2     and $4, $2, $5    sub $2, . . .     before<1>

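
The mux settings for this example can be traced by applying the EX and MEM hazard rules cycle by cycle. This is a sketch under stated assumptions: the tuple encoding and the names rd_at, forward and trace are hypothetical, all four instructions are R-type (so RegWrite is always set), and the before<1> and before<2> instructions are treated as writing no register.

```python
# Hypothetical encoding: (text, rd, rs, rt) for R-type instructions.
PROGRAM = [
    ("sub $2, $1, $3", 2, 1, 3),
    ("and $4, $2, $5", 4, 2, 5),
    ("or $4, $4, $2", 4, 4, 2),
    ("add $9, $4, $2", 9, 4, 2),
]


def rd_at(index):
    """Destination register of PROGRAM[index], or None outside the program."""
    return PROGRAM[index][1] if 0 <= index < len(PROGRAM) else None


def forward(source, ex_mem_rd, mem_wb_rd):
    """Apply the EX hazard rule first, then the MEM hazard rule."""
    if ex_mem_rd not in (None, 0) and ex_mem_rd == source:
        return "10"
    if mem_wb_rd not in (None, 0) and ex_mem_rd != source and mem_wb_rd == source:
        return "01"
    return "00"


def trace():
    """Return (cycle, instruction in EX, ForwardA, ForwardB) for each instruction."""
    rows = []
    for i, (text, _, rs, rt) in enumerate(PROGRAM):
        cycle = i + 3  # instruction i is fetched in cycle i + 1 and reaches EX two cycles later
        ex_mem_rd = rd_at(i - 1)  # previous instruction is in MEM
        mem_wb_rd = rd_at(i - 2)  # the one before that is in WB
        rows.append((cycle, text,
                     forward(rs, ex_mem_rd, mem_wb_rd),
                     forward(rt, ex_mem_rd, mem_wb_rd)))
    return rows


if __name__ == "__main__":
    for row in trace():
        print(row)
```

Under these assumptions the or in cycle 5 takes $4 from EX/MEM and $2 from MEM/WB, while the add in cycle 6 takes $4 from EX/MEM and reads $2 from the register file.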
Forwarding
Execution example (cont.):

sub    $2,   $1,   $3
and    $4,   $2,   $5
or     $4,   $4,   $2
add    $9,   $4,   $2

     [Figure: datapath with forwarding at clock cycles 5 and 6; instruction in each stage, from fetch (left) to write-back (right)]

     Clock cycle 5:   after<1>          add $9, $4, $2    or $4, $4, $2     and $4, . . .     sub $2, . . .
     Clock cycle 6:   after<2>          after<1>          add $9, $4, $2    or $4, . . .      and $4, . . .

                                        Instruction
                             4                                                                                Data

                                PC                                                                                                   ALU
                                          memory                                                                                                                 memory                M

                                                                                                                   $2
                                                                                                                                                                                       u

                                                                                                                            M
                                                         x
                                                                                                                            u

                                                                                                                            x


                                                                                                                   4
                                                                                                                   2

                                                                                                                            M
                         4                        4
                                                                                                                   9        u

                                                                                                                            x
                                                                                                                                  Forwarding



11/27/2010
                        Clock cycle 6                                                                                                unit

                                                                                                                                                                                        60
                             Clock 6
Data Hazards and Stalls
       Load word can still cause a hazard:
             an instruction tries to read a register immediately after a load
             instruction that writes to the same register

lw    $2,     20($1)
and   $4,     $2, $5
or    $8,     $2, $6
add   $9,     $4, $2
slt   $1,     $6, $7

[Figure: pipeline diagram. Time (in clock cycles) CC 1 to CC 9 across; program execution order (in instructions) down. Each of the five instructions passes through IM, Reg, ALU, DM, Reg, starting one clock cycle after the previous one.]

       Because even the pipeline-register dependency from lw to and goes backward
       in time, forwarding will not solve the hazard
             therefore, we need a hazard detection unit to stall the pipeline after the
             load instruction
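The timing argument above can be checked with a little arithmetic. The following is a minimal sketch, assuming the classic five-stage pipeline (IF, ID, EX, MEM, WB) with one instruction fetched per clock cycle; the function names and stage offsets are illustrative and do not come from the slides.

```python
# Stage offsets from the cycle in which an instruction is fetched
# (assumption: classic five-stage pipeline IF, ID, EX, MEM, WB).
EX_OFFSET = 2
MEM_OFFSET = 3


def value_ready_cycle(fetch_cycle, is_load):
    """Clock cycle at whose end the result sits in a pipeline register."""
    return fetch_cycle + (MEM_OFFSET if is_load else EX_OFFSET)


def forwarding_suffices(producer_fetch, producer_is_load, consumer_fetch):
    """True if the result is ready before the consumer's EX stage begins."""
    consumer_ex = consumer_fetch + EX_OFFSET
    return value_ready_cycle(producer_fetch, producer_is_load) < consumer_ex


# lw $2, 20($1) is fetched in CC 1 and and $4, $2, $5 in CC 2.
print(forwarding_suffices(1, True, 2))   # load followed directly by its user
print(forwarding_suffices(1, True, 3))   # one instruction (or bubble) in between
print(forwarding_suffices(1, False, 2))  # ALU producer followed directly by its user
```

Under these assumptions only the first case fails: the loaded value exists at the end of CC 4, but the dependent instruction would need it at the start of CC 4.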
Pipelined Datapath with Control II
(as before)

[Figure: pipelined datapath with the Control unit and the WB, M and EX control fields carried in the ID/EX, EX/MEM and MEM/WB pipeline registers. Labelled signals: PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, MemtoReg.]

    Control signals emanate from the control portions of the pipeline registers
Hazard Detection Logic to Stall

       The hazard detection unit implements the following check to decide
       whether to stall:

  if ( ID/EX.MemRead                                       // if the instruction in the EX stage is a load…
       and ( ( ID/EX.RegisterRt = IF/ID.RegisterRs )       // and its destination register
          or ( ID/EX.RegisterRt = IF/ID.RegisterRt ) ) )   // matches either source register
                                                           // of the instruction in the ID stage, then…
  stall the pipeline
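The check can also be written as a small executable predicate. This is a sketch only: the two record types and their field names are hypothetical stand-ins for the ID/EX and IF/ID pipeline register fields named in the check above.

```python
from dataclasses import dataclass


@dataclass
class IDEX:
    """Hypothetical view of the ID/EX fields the check reads."""
    mem_read: bool      # ID/EX.MemRead
    register_rt: int    # ID/EX.RegisterRt


@dataclass
class IFID:
    """Hypothetical view of the IF/ID fields the check reads."""
    register_rs: int    # IF/ID.RegisterRs
    register_rt: int    # IF/ID.RegisterRt


def must_stall(id_ex, if_id):
    """Load in EX whose destination matches a source of the instruction in ID."""
    return id_ex.mem_read and (
        id_ex.register_rt == if_id.register_rs
        or id_ex.register_rt == if_id.register_rt
    )


# lw $2, 20($1) in EX, and $4, $2, $5 in ID: the load-use case.
print(must_stall(IDEX(mem_read=True, register_rt=2), IFID(register_rs=2, register_rt=5)))
# sub $2, $1, $3 in EX is not a load, so forwarding handles it without a stall.
print(must_stall(IDEX(mem_read=False, register_rt=3), IFID(register_rs=2, register_rt=5)))
```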

Mechanics of Stalling
     If the stall check succeeds, the pipeline needs to stall for only 1
     clock cycle after the load, because after that the forwarding unit can
     resolve the dependency
     What the hardware does to stall the pipeline for 1 cycle:
             does not let the IF/ID register change (disable write!); this causes
             the instruction in the ID stage to repeat, i.e., stall
             the instruction just behind it, in the IF stage, must therefore be stalled
             as well, so the hardware does not let the PC change (disable write!);
             this causes the instruction in the IF stage to repeat, i.e., stall
             changes all the EX, MEM and WB control fields in the ID/EX pipeline
             register to 0, so effectively the instruction just behind the load
             becomes a nop; a bubble is said to have been inserted into the
             pipeline
               note that we cannot turn that instruction into a nop by zeroing all the bits
               of the instruction itself (recall nop = 00…0, 32 bits), because it has
               already been decoded and its control signals generated
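The three actions above can be sketched as one clock-edge update. This is a simplified software model, not the hardware itself: the FrontEnd record, its field names, and the control values passed in the example are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# All-zero EX, M and WB control fields: the instruction behaves as a nop.
NOP_CONTROL = {"EX": 0, "M": 0, "WB": 0}


@dataclass
class FrontEnd:
    """Hypothetical model of the PC, IF/ID and the control part of ID/EX."""
    pc: int = 0
    if_id_instr: str = "nop"
    id_ex_control: dict = field(default_factory=lambda: dict(NOP_CONTROL))


def clock_edge(fe, fetched_instr, decoded_control, stall):
    """Update the front-end registers at a clock edge."""
    if stall:
        # PCWrite = 0 and IF/IDWrite = 0: pc and if_id_instr keep their values,
        # so the instructions in IF and ID repeat their stages.
        fe.id_ex_control = dict(NOP_CONTROL)  # insert the bubble
    else:
        fe.id_ex_control = dict(decoded_control)
        fe.if_id_instr = fetched_instr
        fe.pc += 4
    return fe


fe = FrontEnd(pc=8, if_id_instr="and $4, $2, $5")
r_type = {"EX": 0b1100, "M": 0b000, "WB": 0b10}  # illustrative control values
clock_edge(fe, "or $4, $4, $2", r_type, stall=True)
print(fe)  # pc and if_id_instr unchanged, control fields all 0
clock_edge(fe, "or $4, $4, $2", r_type, stall=False)
print(fe)  # pipeline advances again
```

Zeroing only the control fields is enough because an instruction with no register write and no memory write cannot change architectural state.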
Hazard Detection Unit

[Figure: pipelined datapath with the hazard detection unit and the forwarding unit. Labelled signals: ID/EX.MemRead, PCWrite, IF/IDWrite, IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd, ID/EX.RegisterRt, EX/MEM.RegisterRd, MEM/WB.RegisterRd; a mux next to Control has a 0 input.]

           Datapath with forwarding hardware, the hazard detection unit and
           control wires; certain details, e.g., branching hardware, are omitted
           to simplify the drawing
Stalling Resolves a Hazard
      Same instruction sequence as before, for which forwarding by
      itself could not resolve the hazard:

lw    $2,    20($1)
and   $4,    $2, $5
or    $8,    $2, $6
add   $9,    $4, $2
slt   $1,    $6, $7

[Figure: pipeline diagram. Time (in clock cycles) CC 1 to CC 10 across; program execution order (in instructions) down. lw $2, 20($1) proceeds normally; and $4, $2, $5 repeats its Reg stage and or $8, $2, $6 repeats its IM stage while a bubble moves down the pipeline; add $9, $4, $2 and slt $1, $6, $7 follow one cycle later than they would without the stall.]

                Hazard detection unit inserts a 1-cycle bubble in the pipeline, after
                which all pipeline register dependencies go forward, so the
                forwarding unit can handle them and there are no more hazards
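The resulting schedule can be reproduced with a short simulation. This is a sketch under simplifying assumptions: load-use stalls are the only stalls, every other dependency is handled by forwarding, and the tuple encoding (text, destination register, source registers, is-load flag) is invented here for illustration.

```python
def schedule(program):
    """Map each instruction to {clock cycle: stage} with a 1-cycle load-use stall."""
    rows = []
    prev_dest, prev_is_load = None, False
    prev_id_first, prev_ex = 1, 2  # chosen so the first instruction gets IF=1, ID=2, EX=3
    for name, dest, srcs, is_load in program:
        if_first = prev_id_first   # enters IF when the previous instruction enters ID
        id_first = prev_ex         # enters ID when the previous instruction enters EX
        stall = 1 if (prev_is_load and prev_dest in srcs) else 0
        ex = id_first + 1 + stall
        stages = {}
        for c in range(if_first, id_first):
            stages[c] = "IM"       # repeated if the instruction ahead is stalled
        for c in range(id_first, ex):
            stages[c] = "Reg"      # repeated during the stall cycle
        stages[ex], stages[ex + 1], stages[ex + 2] = "ALU", "DM", "Reg"
        rows.append((name, stages))
        prev_dest, prev_is_load = dest, is_load
        prev_id_first, prev_ex = id_first, ex
    return rows


program = [
    ("lw  $2, 20($1)", 2, (1,), True),
    ("and $4, $2, $5", 4, (2, 5), False),
    ("or  $8, $2, $6", 8, (2, 6), False),
    ("add $9, $4, $2", 9, (4, 2), False),
    ("slt $1, $6, $7", 1, (6, 7), False),
]

rows = schedule(program)
last = max(max(stages) for _, stages in rows)
for name, stages in rows:
    cells = " ".join(f"{stages.get(c, ''):>3}" for c in range(1, last + 1))
    print(f"{name:<16}{cells}")
```

With this sequence the model gives the same shape as the diagram: the and instruction occupies Reg in CC 3 and CC 4, the or instruction occupies IM in CC 3 and CC 4, and slt finishes in CC 10.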
Stalling Execution example:

 lw      $2,   20($1)
 and     $4,   $2, $5
 or      $4,   $4, $2
 add     $9,   $4, $2

[Figure: datapath with hazard detection unit and forwarding unit, shown at clock cycle 2 and at clock cycle 3.
 Clock cycle 2, stages IF to WB: and $4, $2, $5; lw $2, 20($1); before<1>; before<2>; before<3>.
 Clock cycle 3, stages IF to WB: or $4, $4, $2; and $4, $2, $5; lw $2, 20($1); before<1>; before<2>.]
Stalling Execution example (cont.):

    lw   $2, 20($1)
    and  $4, $2, $5
    or   $4, $4, $2
    add  $9, $4, $2

[Figure: pipelined datapath with hazard detection unit (inputs ID/EX.MemRead and ID/EX.RegisterRt; outputs PCWrite and IF/IDWrite) and forwarding unit, clock cycle 4. Stage contents, IF to WB: or $4, $4, $2 | and $4, $2, $5 | bubble | lw $2, . . . | before<1>]
[Figure: same datapath, clock cycle 5. Stage contents, IF to WB: add $9, $4, $2 | or $4, $4, $2 | and $4, $2, $5 | bubble | lw $2, . . .]
Stalling Execution example (cont.):

    lw   $2, 20($1)
    and  $4, $2, $5
    or   $4, $4, $2
    add  $9, $4, $2

[Figure: same datapath, clock cycle 6. Stage contents, IF to WB: after<1> | add $9, $4, $2 | or $4, $4, $2 | and $4, . . . | bubble]
[Figure: same datapath, clock cycle 7. Stage contents, IF to WB: after<2> | after<1> | add $9, $4, $2 | or $4, . . . | and $4, . . .]
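The single bubble in the snapshots above can be reproduced with a small sketch. It assumes the standard textbook load-use test: stall when the instruction in EX is a load (ID/EX.MemRead = 1) and its ID/EX.RegisterRt matches a source register of the instruction in ID. The function names and the tuple layout are illustrative assumptions, not part of the slides.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use check made by the hazard detection unit (assumed condition)."""
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


# (text, is_load, rs, rt); for lw, rt is the destination register
PROGRAM = [
    ("lw $2, 20($1)",  True,  1, 2),
    ("and $4, $2, $5", False, 2, 5),
    ("or $4, $4, $2",  False, 4, 2),
    ("add $9, $4, $2", False, 4, 2),
]


def ex_stage_stream(program):
    """List what enters the EX stage, cycle by cycle, inserting a bubble on a stall."""
    stream = []
    ahead = None  # instruction one stage ahead (in EX while the current one is in ID)
    for instr in program:
        text, _is_load, rs, rt = instr
        if ahead is not None and must_stall(ahead[1], ahead[3], rs, rt):
            # PCWrite and IF/IDWrite are deasserted for one cycle; EX receives a nop
            stream.append("bubble")
        stream.append(text)
        ahead = instr
    return stream


if __name__ == "__main__":
    for cycle, item in enumerate(ex_stage_stream(PROGRAM), start=3):
        print(f"clock {cycle}: EX holds {item}")
```

Under these assumptions the EX stage holds lw in clock 3, the bubble in clock 4, and then and, or, add in clocks 5 to 7, which matches the stage labels in the figures.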
Control (or Branch) Hazards
    The problem with branches in the pipeline we have so far is that the
    branch decision is not made until the MEM stage. So which
    instructions, if any, should we fetch into the pipeline after the
    branch instruction?

    Possible solution: stall the pipeline until the branch decision is known
         inefficient: every branch then costs three stall cycles, which slows the pipeline significantly

    Another solution: predict the branch outcome
         e.g., always predict branch-not-taken and continue with the next
         sequential instructions
         if the prediction is wrong, flush the pipeline behind the branch
         (discard the instructions already fetched or decoded) and continue
         execution at the branch target

Predicting Branch-not-taken:
Misprediction delay
       Program execution order   Time (in clock cycles)
       (in instructions)         CC 1  CC 2  CC 3  CC 4  CC 5  CC 6  CC 7  CC 8  CC 9

       40 beq $1, $3, 7          IM    Reg   ALU   DM    Reg
       44 and $12, $2, $5              IM    Reg   ALU   DM    Reg
       48 or $13, $6, $2                     IM    Reg   ALU   DM    Reg
       52 add $14, $2, $2                          IM    Reg   ALU   DM    Reg
       72 lw $4, 50($7)                                  IM    Reg   ALU   DM    Reg

          When the branch is taken (prediction wrong), the outcome is decided only when
          beq is in the MEM stage, so the three sequential instructions that follow it,
          already in the pipeline, have to be flushed, and execution resumes at lw
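The flush count and the branch target in this example can be checked with a short sketch. It assumes 4-byte instructions, one instruction fetched per cycle, and the MIPS target rule (PC + 4) + (offset << 2); the function names and the stage numbering (IF = 1, ID = 2, EX = 3, MEM = 4) are illustrative assumptions.

```python
def flushed_on_taken_branch(branch_pc, resolve_stage, instr_bytes=4):
    """Addresses of the sequential instructions fetched behind a taken branch
    before its outcome is known.

    resolve_stage is the 1-based pipeline stage in which the decision is made
    (IF=1, ID=2, EX=3, MEM=4); one wrong-path instruction is fetched for every
    stage the branch has already passed through.
    """
    wrong_path = resolve_stage - 1
    return [branch_pc + instr_bytes * k for k in range(1, wrong_path + 1)]


def branch_target(branch_pc, word_offset, instr_bytes=4):
    """MIPS-style branch target: (PC + 4) + (offset << 2)."""
    return branch_pc + instr_bytes + (word_offset << 2)


if __name__ == "__main__":
    print(flushed_on_taken_branch(40, resolve_stage=4))  # decision in MEM
    print(branch_target(40, 7))                          # where execution resumes
```

For beq at address 40 resolved in MEM, the sketch lists addresses 44, 48, and 52 (the and, or, and add above), and the target works out to 72, the address of lw.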
Optimizing the Pipeline to
Reduce Branch Delay
 Move the branch decision from the MEM stage (as in our current
 pipeline) earlier, to the ID stage
     calculating the branch target address requires moving the branch adder
     from the EX stage to the ID stage; the inputs to this adder, the PC value
     and the immediate field, are already available in the IF/ID pipeline
     register
     the branch decision can be calculated efficiently, e.g., for an equality test,
     by XORing the respective bits, ORing all the results, and inverting,
     rather than using the ALU to subtract and then test for zero (which
     incurs a carry-propagation delay)
        with the more efficient equality test, we can put the decision in the ID stage without
        significantly lengthening this stage; remember that an objective of pipeline
        design is to keep the pipeline stages balanced
        we must correspondingly extend the forwarding and hazard
        detection units so that they forward to, or stall, the branch in the ID stage
        when the branch decision depends on an earlier result
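The equality test described above (XOR the respective bits, OR the results, invert) can be sketched in software and compared with the subtract-and-test-for-zero approach. This is only a bit-level model under an assumed 32-bit register width; the function names are illustrative.

```python
def equal_xor_or(a, b, width=32):
    """Equality via XOR, OR-reduce, and invert, with no carry chain involved."""
    mask = (1 << width) - 1
    diff = (a ^ b) & mask            # bit i is 1 exactly where a and b differ
    any_diff = 0
    for i in range(width):           # OR together all the XOR outputs
        any_diff |= (diff >> i) & 1
    return any_diff == 0             # invert: equal when no bit differs


def equal_subtract(a, b, width=32):
    """Equality via the ALU: subtract, then test the result for zero."""
    mask = (1 << width) - 1
    return ((a - b) & mask) == 0


if __name__ == "__main__":
    for a, b in [(0, 0), (5, 5), (5, 4), (0xFFFFFFFF, 0), (0x80000000, 0x80000000)]:
        print(a, b, equal_xor_or(a, b), equal_subtract(a, b))
```

Both functions give the same answer for every pair; in hardware, the first avoids the carry propagation of the subtractor, which is why it fits in the ID stage.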
Flushing on Misprediction
     Same strategy as for stalling on a load-use data hazard
     Zero out all the control values (or the instruction itself) in the pipeline
     registers for the instructions following the branch that are already
     in the pipeline, effectively turning them into nops, so that they are
     flushed
             in the optimized pipeline, with the branch decision made in the ID stage,
             we have to flush only one instruction, the one in the IF stage, so the
             branch delay penalty is only one clock cycle




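To see what the reduced penalty buys, here is a rough cycles-per-instruction estimate under predict-not-taken. The branch frequency and taken fraction below are hypothetical numbers chosen only for illustration, and the model ignores all other stalls; only the penalties (three cycles with the decision in MEM, one cycle with the decision in ID) come from the slides.

```python
def effective_cpi(branch_fraction, taken_fraction, penalty_cycles, base_cpi=1.0):
    """Average CPI when only taken (mispredicted) branches pay the flush penalty."""
    return base_cpi + branch_fraction * taken_fraction * penalty_cycles


if __name__ == "__main__":
    BRANCH_FRACTION = 0.2   # hypothetical: 20% of instructions are branches
    TAKEN_FRACTION = 0.5    # hypothetical: half of those branches are taken
    cpi_mem = effective_cpi(BRANCH_FRACTION, TAKEN_FRACTION, penalty_cycles=3)
    cpi_id = effective_cpi(BRANCH_FRACTION, TAKEN_FRACTION, penalty_cycles=1)
    print(f"decision in MEM: CPI = {cpi_mem:.2f}")
    print(f"decision in ID:  CPI = {cpi_id:.2f}")
```

With these assumed fractions the estimate drops from about 1.3 to about 1.1 cycles per instruction; different workloads would give different figures.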
Optimized Datapath for Branch
[Figure: pipelined datapath (PC, instruction memory, IF/ID, Control, Registers, ID/EX, ALU, EX/MEM, data memory, MEM/WB) with the hazard detection unit and forwarding unit; the sign extend, shift left 2, branch adder, and register equality comparator (=) sit in the ID stage, and an IF.Flush signal feeds the IF/ID register]
IF.Flush control zeros out the instruction in the IF/ID pipeline register (which follows the branch)
Branch decision is moved from the MEM stage to the ID stage. Simplified drawing, not showing enhancements to the forwarding and hazard detection units
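The effect of IF.Flush on the IF/ID register can be modelled in a few lines. This sketch relies on the fact that the all-zero MIPS word encodes sll $0, $0, 0, which acts as a nop; the function name, the choice to let a flush override a stall, and the sample instruction words are assumptions made for illustration.

```python
NOP = 0x00000000  # all-zero MIPS word: sll $0, $0, 0, which changes no state


def next_if_id(current_if_id, fetched_instr, if_flush, if_id_write=True):
    """Value latched into the IF/ID instruction field at the next clock edge."""
    if if_flush:            # taken branch detected in ID: squash the fetched instruction
        return NOP
    if not if_id_write:     # stall: IF/ID keeps its current contents
        return current_if_id
    return fetched_instr    # normal operation


if __name__ == "__main__":
    held = 0x10230007       # arbitrary sample word standing in for the branch
    fetched = 0x00456024    # arbitrary sample word standing in for the next instruction
    print(hex(next_if_id(held, fetched, if_flush=True)))
    print(hex(next_if_id(held, fetched, if_flush=False)))
```

When the flush is asserted, the instruction fetched behind the branch never reaches ID; a nop moves down the pipeline in its place, which is the one-cycle penalty.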
Pipelined Branch Execution example:

     36   sub   $10, $4, $8
     40   beq   $1, $3, 7
     44   and   $12, $2, $5
     48   or    $13, $2, $6
     52   add   $14, $4, $2
     56   slt   $15, $6, $7
     …
     72   lw    $4, 50($7)

[Figure: pipelined datapath with the branch resolved in ID (hazard detection unit, forwarding unit, IF.Flush), shown for clock cycles 3 and 4]

     Clock cycle 3:   IF: and $12, $2, $5   ID: beq $1, $3, 7   EX: sub $10, $4, $8   MEM: before<1>        WB: before<2>
     Clock cycle 4:   IF: lw $4, 50($7)     ID: bubble (nop)    EX: beq $1, $3, 7     MEM: sub $10, . . .   WB: before<1>

Optimized pipeline with only one bubble as a result of the taken branch
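The branch in this example sits at address 40 with an offset field of 7, and the instruction fetched after it is taken is the lw at address 72. A minimal sketch of that target calculation follows, assuming the usual MIPS convention that the offset counts words relative to PC + 4; the function name branch_target is hypothetical and used only for illustration.

```python
def branch_target(branch_pc: int, offset_words: int) -> int:
    """PC-relative branch target: (PC + 4) + (offset << 2).

    Assumes MIPS-style encoding, where the 16-bit offset is already
    sign-extended and counts 4-byte words, not bytes.
    """
    next_pc = branch_pc + 4          # address of the instruction after the branch
    byte_offset = offset_words << 2  # the "shift left 2" box: words -> bytes
    return next_pc + byte_offset


# beq $1, $3, 7 at address 40: 44 + 28 = 72, the address of lw $4, 50($7)
print(branch_target(40, 7))  # 72

# The instruction at PC + 4 (address 44, the and) was already fetched when
# the branch resolved in ID, so it is the single instruction flushed.
flushed_pc = 40 + 4
print(flushed_pc)  # 44
```

Because the comparison and the target adder both sit in the ID stage, only the one instruction at address 44 has entered the pipeline by the time the branch resolves, which is why the taken branch costs a single bubble.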
Simple Example: Comparing Performance
     Compare performance for single-cycle, multicycle, and pipelined
     datapaths using the gcc instruction mix
             assume 2 ns for memory access, 2 ns for ALU operation, 1 ns for
             register read or write
             assume gcc instruction mix 23% loads, 13% stores, 19% branches,
             2% jumps, 43% ALU
             for pipelined execution assume
               50% of the loads are followed immediately by an instruction that uses
               the result of the load
               25% of branches are mispredicted
               branch delay on misprediction is 1 clock cycle
               jumps always incur 1 clock cycle delay so their average time is 2 clock
               cycles
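The latencies and instruction mix above are enough to work out the two baseline figures, 8 ns for single-cycle and 8.04 ns for multicycle. The sketch below does so under two assumptions that are not stated on this slide: the functional units each instruction class passes through in the single-cycle datapath, and the textbook multicycle cycle counts (5 for loads, 4 for stores and ALU instructions, 3 for branches and jumps). All names are hypothetical.

```python
# Unit latencies in ns, from the stated assumptions.
MEM, ALU, REG = 2, 2, 1

# gcc instruction mix, from the stated assumptions.
MIX = {"load": 0.23, "store": 0.13, "branch": 0.19, "jump": 0.02, "alu": 0.43}

# ASSUMPTION: units on the critical path of each class (textbook MIPS datapath).
SINGLE_CYCLE_PATH = {
    "load":   [MEM, REG, ALU, MEM, REG],  # fetch, reg read, ALU, data access, reg write
    "store":  [MEM, REG, ALU, MEM],
    "alu":    [MEM, REG, ALU, REG],
    "branch": [MEM, REG, ALU],
    "jump":   [MEM],
}

# ASSUMPTION: clock cycles per class in the multicycle implementation.
MULTICYCLE_CYCLES = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}


def single_cycle_time() -> int:
    """Every instruction takes as long as the slowest class (the load)."""
    return max(sum(path) for path in SINGLE_CYCLE_PATH.values())


def multicycle_time() -> float:
    """Average CPI over the mix times the cycle time (slowest unit)."""
    cycle_ns = max(MEM, ALU, REG)
    cpi = sum(MIX[k] * MULTICYCLE_CYCLES[k] for k in MIX)
    return cpi * cycle_ns


print(single_cycle_time())          # 8
print(round(multicycle_time(), 2))  # 8.04
```

The multicycle design shortens the clock to 2 ns but needs about 4.02 cycles per instruction on this mix, so it ends up marginally slower than single-cycle here.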

Simple Example: Comparing Performance (continued)
    Single-cycle (p. 373): average instruction time 8 ns
    Multicycle (p. 397): average instruction time 8.04 ns
    Pipelined:
       loads use 1 cc (clock cycle) when there is no load-use dependency
       and 2 cc when there is one; given that 50% of loads are followed
       by a dependent instruction, the average cc per load is 1.5
       stores use 1 cc each
       branches use 1 cc when predicted correctly and 2 cc when
       not; given 25% misprediction, the average cc per branch is 1.25
       jumps use 2 cc each
       ALU instructions use 1 cc each
       therefore, the average CPI is
    1.5 × 23% + 1 × 13% + 1.25 × 19% + 2 × 2% + 1 × 43% ≈ 1.18
       with a 2 ns clock cycle (set by the slowest stage: memory access
       or ALU operation), the average instruction time is 1.18 × 2 = 2.36 ns
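The pipelined figure is a weighted average over the instruction mix, which can be checked with a short sketch. The per-class cycle counts and percentages come from the slide; the 2 ns clock is taken as the slowest stage latency, and the function names are hypothetical.

```python
# gcc instruction mix from the slide.
MIX = {"load": 0.23, "store": 0.13, "branch": 0.19, "jump": 0.02, "alu": 0.43}

# Average clock cycles per instruction class in the pipeline.
CYCLES = {
    "load":   0.5 * 1 + 0.5 * 2,    # 50% of loads stall one cycle (load-use)
    "store":  1.0,
    "branch": 0.75 * 1 + 0.25 * 2,  # 25% mispredicted, 1-cycle penalty
    "jump":   2.0,                  # always 1 cycle of delay
    "alu":    1.0,
}

CLOCK_NS = 2  # cycle time set by the slowest stage (memory access or ALU)


def pipelined_cpi() -> float:
    """Weighted average of cycles per instruction over the mix."""
    return sum(MIX[k] * CYCLES[k] for k in MIX)


def average_instruction_time_ns() -> float:
    return pipelined_cpi() * CLOCK_NS


print(f"{pipelined_cpi():.2f}")                # 1.18
print(f"{average_instruction_time_ns():.3f}")  # 2.365
```

The unrounded CPI is 1.1825, giving 2.365 ns; the slide rounds the CPI to 1.18 before multiplying, which yields the quoted 2.36 ns.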