# 07_2

```
Computer Architecture
Nguyễn Trí Thành
Information Systems Department
Faculty of Technology
College of Technology
ntthanh@vnu.edu.vn

11/27/2010                                    1
Enhancing Performance
with Pipelining

Pipelining
Start work ASAP!! Do not waste time!
[Figure: not pipelined. Loads A, B, C and D are processed one after another, in task order, along a time axis from 6 PM to 2 AM.]

Assume 30 min. each task – wash, dry, fold, store – and that
separate tasks use separate hardware and so can be overlapped

[Figure: pipelined. Loads A, B, C and D overlap, in task order, each starting one 30-minute stage after the previous one, along the same time axis from 6 PM to 2 AM.]
Pipelined vs. Single-Cycle
Instruction Execution: the Plan
[Figure: single-cycle execution in program order, time axis in ns]
Single-cycle (8 ns per instruction; stages: instruction fetch, Reg, ALU, data access, Reg):
  lw $1, 100($0)    0-8 ns
  lw $2, 200($0)    8-16 ns
  lw $3, 300($0)    starts at 16 ns
  ...

Assume 2 ns for a memory access or an ALU operation and 1 ns for a register access.
The single-cycle clock is therefore 8 ns (2 + 1 + 2 + 2 + 1); the pipelined clock cycle is 2 ns, the length of the slowest stage.

[Figure: pipelined execution in program order, time axis in ns]
Pipelined (2 ns per stage; Fetch = instruction fetch, Data = data access):
Time (ns)           0-2    2-4    4-6    6-8    8-10   10-12  12-14
lw $1, 100($0)      Fetch  Reg    ALU    Data   Reg
lw $2, 200($0)             Fetch  Reg    ALU    Data   Reg
lw $3, 300($0)                    Fetch  Reg    ALU    Data   Reg
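
Worked check of the timing above (a sketch that uses only the 8 ns and 2 ns clock values assumed on this slide):
  Single-cycle, 3 instructions:   3 x 8 ns = 24 ns
  Pipelined, 3 instructions:      (5 stages + 3 - 1) x 2 ns = 14 ns
  Speedup for 3 instructions:     24 / 14 = about 1.7
  In general, n instructions take 8n ns single-cycle and 2(n + 4) ns pipelined,
  so the speedup 8n / (2n + 8) approaches 8 / 2 = 4 as n grows.
  The limit is 4 rather than 5 (the number of stages) because the stages are unbalanced:
  the two 1 ns register stages are stretched to the 2 ns clock.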
Pipelining: Keep in Mind
Pipelining does not reduce the latency of a single task; it increases the throughput of the whole workload
Pipeline rate is limited by the longest stage
Potential speedup = number of pipe stages
Unbalanced lengths of pipe stages reduce speedup
Time to fill the pipeline and time to drain it, when there is slack in the pipeline, reduce speedup

Example Problem
Problem: for the laundry, fill in the following table when
1.     the stage lengths are 30, 30, 30, 30 min., resp.
2.     the stage lengths are 20, 20, 60, 20 min., resp.

Person | Unpipelined finish time | Pipeline 1 finish time | Ratio unpipelined to pipeline 1 | Pipeline 2 finish time | Ratio unpipelined to pipeline 2
1      |
2      |
3      |
4      |
...
n      |

Come up with a formula for pipeline speed-up!

Pipelining MIPS

What makes it easy with MIPS?
  all instructions are the same length
    so fetch and decode stages are similar for all instructions
  just a few instruction formats
    simplifies instruction decode and makes it possible in one stage
  memory operands appear only in loads/stores
    so memory access can be deferred to exactly one later stage
  operands are aligned in memory
    one data transfer instruction requires one memory access stage

Pipelining MIPS
What makes it hard?
  structural hazards: different instructions, at different stages in the pipeline, want to use the same hardware resource
  control hazards: the succeeding instruction to put into the pipeline depends on the outcome of a previous branch instruction
  data hazards: an instruction in the pipeline requires data to be computed by a previous instruction still in the pipeline

Before actually building the pipelined datapath and control, we first briefly examine these potential hazards individually…
Structural Hazards
Structural hazard: inadequate hardware to simultaneously support all instructions in the pipeline in the same clock cycle
E.g., suppose a single (not separate) instruction and data memory with one read port in the pipeline below;
then there is a structural hazard between the first and fourth lw instructions

[Figure: pipelined execution in program order, time axis in ns; Fetch = instruction fetch, Data = data access]
Time (ns)           0-2    2-4    4-6    6-8    8-10   10-12  12-14  14-16
lw $1, 100($0)      Fetch  Reg    ALU    Data   Reg
lw $2, 200($0)             Fetch  Reg    ALU    Data   Reg
lw $3, 300($0)                    Fetch  Reg    ALU    Data   Reg
lw $4, 400($0)                           Fetch  Reg    ALU    Data   Reg
Hazard if single memory: during 6-8 ns the first lw's data access and the fourth lw's instruction fetch both need the memory

MIPS was designed to be pipelined: structural hazards are easy to avoid!
Control Hazards
Control hazard: need to make a decision based on the result of a previous instruction still executing in the pipeline
Solution 1: Stall the pipeline

[Figure: pipeline stall in program order, time axis in ns; Fetch = instruction fetch, Data = data access]
Time (ns)           0-2    2-4    4-6    6-8    8-10   10-12  12-14  14-16
add $4, $5, $6      Fetch  Reg    ALU    Data   Reg
beq $1, $2, 40             Fetch  Reg    ALU    Data   Reg
lw $3, 300($0)                    bubble Fetch  Reg    ALU    Data   Reg
beq starts 2 ns after add; lw starts 4 ns after beq
Note that branch outcome is computed in ID stage with ...

Pipeline stall
Control Hazards
Solution 2: Predict branch outcome
e.g., predict branch-not-taken:

[Figure: prediction success, time axis in ns; Fetch = instruction fetch, Data = data access]
Time (ns)           0-2    2-4    4-6    6-8    8-10   10-12  12-14
add $4, $5, $6      Fetch  Reg    ALU    Data   Reg
beq $1, $2, 40             Fetch  Reg    ALU    Data   Reg
lw $3, 300($0)                    Fetch  Reg    ALU    Data   Reg
Prediction success

[Figure: prediction failure, time axis in ns]
Time (ns)           0-2    2-4    4-6    6-8    8-10   10-12  12-14  14-16
add $4, $5, $6      Fetch  Reg    ALU    Data   Reg
beq $1, $2, 40             Fetch  Reg    ALU    Data   Reg
(lw, flushed)                     bubble bubble bubble bubble bubble
or $7, $8, $9                            Fetch  Reg    ALU    Data   Reg
Prediction failure: undo (=flush) lw; the or instruction starts 4 ns after beq
Control Hazards
Solution 3: Delayed branch. Always execute the sequentially next statement, with the branch executing after one instruction delay.
It is the compiler's job to find a statement that can be put in the slot and that is independent of the branch outcome
MIPS does this, but it is an option in SPIM (Simulator -> Settings)

[Figure: delayed branch in program order, time axis in ns; Fetch = instruction fetch, Data = data access]
Time (ns)                              0-2    2-4    4-6    6-8    8-10   10-12  12-14
beq $1, $2, 40                         Fetch  Reg    ALU    Data   Reg
add $4, $5, $6 (delayed branch slot)          Fetch  Reg    ALU    Data   Reg
lw $3, 300($0)                                       Fetch  Reg    ALU    Data   Reg

Delayed branch: beq is followed by an add that is independent of the branch outcome
Data Hazards
Data hazard: an instruction needs data from the result of a previous instruction still executing in the pipeline
Solution: Forward data if possible…

Instruction pipeline diagram (shade indicates use ...):
Time (ns)             0-2   2-4   4-6   6-8   8-10
add $s0, $t0, $t1     IF    ID    EX    MEM   WB

[Figure: forwarding between dependent instructions in program order, time axis in ns]
Time (ns)             0-2   2-4   4-6   6-8   8-10  10-12
add $s0, $t0, $t1     IF    ID    EX    MEM   WB
sub $t2, $s0, $t3           IF    ID    EX    MEM   WB
Without forwarding (blue line in the figure), data has to go back in time;
with forwarding (red line), data is available in time

Data Hazards
Forwarding may not be enough
e.g., if an R-type instruction following a load uses the result of the load:

[Figure: load followed by a dependent instruction, time axis in ns]
Time (ns)             0-2    2-4    4-6    6-8    8-10   10-12  12-14
lw $s0, 20($t1)       IF     ID     EX     MEM    WB
sub $t2, $s0, $t3            IF     ID     EX     MEM    WB
Without a stall it is impossible to provide input to the sub instruction in time

[Figure: the same sequence with a one-stage stall, time axis in ns]
Time (ns)             0-2    2-4    4-6    6-8    8-10   10-12  12-14
lw $s0, 20($t1)       IF     ID     EX     MEM    WB
(stall)                      bubble bubble bubble bubble bubble
sub $t2, $s0, $t3                   IF     ID     EX     MEM    WB
With a one-stage stall, forwarding can get the data to the sub instruction in time
Reordering Code to Avoid
Pipeline Stall (Software Solution)
Example:
  lw $t0, 0($t1)
  lw $t2, 4($t1)      <- data hazard: the next instruction uses $t2
  sw $t2, 0($t1)
  sw $t0, 4($t1)

Reordered code:
  lw $t0, 0($t1)
  lw $t2, 4($t1)
  sw $t0, 4($t1)      <- interchanged
  sw $t2, 0($t1)      <- interchanged

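Cycle count for the two orderings (a sketch, assuming the 5-stage pipeline with forwarding from the preceding slides,
where a value loaded by lw cannot reach the very next instruction without a one-cycle stall):
  Original order:   4 instructions + (5 - 1) cycles to fill the pipeline + 1 stall (sw $t2 right after lw $t2) = 9 cycles
  Reordered code:   4 instructions + (5 - 1) cycles to fill the pipeline + 0 stalls = 8 cycles
  Both orderings leave memory in the same state: the words at 0($t1) and 4($t1) are swapped.
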
Pipelined Datapath
We now move to actually building a pipelined datapath
First recall the 5 steps in instruction execution
1.       Instruction Fetch & PC Increment (IF)
2.       Instruction Decode and Register Read (ID)
3.       Execute or calculate address (EX)
4.       Memory access (MEM)
5.       Write result into register (WB)
Review: single-cycle processor
all 5 steps done in a single clock cycle
dedicated hardware required for each step

What happens if we break the execution into multiple cycles, but keep the extra hardware?
Review - Single-Cycle Datapath
[Figure: single-cycle datapath (PC, instruction memory, register file with RN1, RN2, WN, WD, RD1 and RD2, sign extend, ALU with Zero output, data memory, multiplexers) divided into five "steps":
 IF (Instruction Fetch), ID (Instruction Decode), EX (Execute/Address Calc.), MEM (Memory Access), WB (Write Back).]
Pipelined Datapath – Key Idea
What happens if we break the execution into multiple cycles, but keep the extra hardware?
  Answer: We may be able to start executing a new instruction at each clock cycle: pipelining
  …but we shall need extra registers to hold data between cycles: pipeline registers

Pipelined Datapath

Pipeline registers wide enough to hold data coming in

[Figure: the single-cycle datapath split into five stages by four pipeline registers: IF/ID (64 bits), ID/EX (128 bits), EX/MEM (97 bits) and MEM/WB (64 bits).]
Pipelined Datapath

[Figure: the same pipelined datapath as on the previous slide, with pipeline registers IF/ID (64 bits), ID/EX (128 bits), EX/MEM (97 bits) and MEM/WB (64 bits).]
Only data flowing right to left may cause a hazard… why?
Bug in the Datapath

[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB. The write register number (WN) input of the register file is taken directly from the instruction held in IF/ID.]

Write register number comes from another, later instruction!
Corrected Datapath
[Figure: corrected pipelined datapath. Pipeline register widths: IF/ID 64 bits, ID/EX 133 bits, EX/MEM 102 bits, MEM/WB 69 bits. The 5-bit write register number travels with the instruction back to the WN input of the register file.]

Destination register number is also passed through the ID/EX, EX/MEM and MEM/WB registers, which are now wider by 5 bits
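
Where the register widths come from (a sketch, assuming the fields usually carried by each pipeline register in this textbook datapath; the slides give only the totals):
  IF/ID:   PC + 4 (32) + instruction (32) = 64 bits
  ID/EX:   PC + 4 (32) + Read data 1 (32) + Read data 2 (32) + sign-extended immediate (32) = 128 bits
  EX/MEM:  branch target (32) + Zero (1) + ALU result (32) + Read data 2 (32) = 97 bits
  MEM/WB:  memory read data (32) + ALU result (32) = 64 bits
  Adding the 5-bit destination register number to the last three gives 128 + 5 = 133, 97 + 5 = 102 and 64 + 5 = 69 bits.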
Pipelined Example
Consider the following instruction sequence:
  lw  $t0, 10($t1)
  sw  $t3, 20($t4)
  add $t5, $t6, $t7
  sub $t8, $t9, $t10

Single-Clock-Cycle Diagrams:
Clock Cycles 1 to 8

[Figures: eight snapshots of the corrected pipelined datapath, one per clock cycle, each marking the instruction that occupies every stage:]
Clock cycle   IF     ID     EX     MEM    WB
1             LW
2             SW     LW
3             ADD    SW     LW
4             SUB    ADD    SW     LW
5                    SUB    ADD    SW     LW
6                           SUB    ADD    SW
7                                  SUB    ADD
8                                         SUB

Alternative View –
Multiple-Clock-Cycle Diagram
Time axis             CC 1   CC 2   CC 3   CC 4   CC 5   CC 6   CC 7   CC 8
lw  $t0, 10($t1)      IM     REG    ALU    DM     REG
sw  $t3, 20($t4)             IM     REG    ALU    DM     REG
add $t5, $t6, $t7                   IM     REG    ALU    DM     REG
sub $t8, $t9, $t10                         IM     REG    ALU    DM     REG

Notes
One significant difference in the execution of an R-type instruction between multicycle and pipelined implementations:
  register write-back for the R-type instruction is the 5th (the last, write-back) pipeline stage vs. the 4th stage for the multicycle implementation. Why?
    think of structural hazards when writing to the register file…
Worth repeating: the essential difference between the pipeline and multicycle implementations is the insertion of pipeline registers to decouple the 5 stages
The CPI of an ideal pipeline (no stalls) is 1. Why?
The RaVi Architecture Visualization Project of Dortmund U. has ...
As we develop control for the pipeline, keep in mind that the text does not consider jump; it should not be too hard to implement!
Recall Single-Cycle Control –
the Datapath
[Figure: single-cycle datapath with its control unit. Instruction [31-26] feeds Control, which drives RegDst, Branch, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite; PCSrc selects the next PC. Instruction [20-16] and Instruction [15-11] feed the RegDst multiplexer, Instruction [15-0] feeds the 16-to-32-bit sign extend, and Instruction [5-0] feeds ALU control.]
Recall Single-Cycle – ALU Control
Instruction   ALUOp   Instruction    Funct    Desired        ALU control
opcode                operation      field    ALU action     input
SW            00      store word     xxxxxx   add            010
Branch eq     01      branch eq      xxxxxx   subtract       110
R-type        10      subtract       100010   subtract       110
R-type        10      AND            100100   and            000
R-type        10      OR             100101   or             001
R-type        10      set on less    101010   set on less    111

Truth table for ALU control bits:
ALUOp1  ALUOp0   F5  F4  F3  F2  F1  F0   Operation
0       0        X   X   X   X   X   X    010
0       1        X   X   X   X   X   X    110
1       X        X   X   0   0   0   0    010
1       X        X   X   0   0   1   0    110
1       X        X   X   0   1   0   0    000
1       X        X   X   0   1   0   1    001
1       X        X   X   1   0   1   0    111
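
Reading the truth table (a worked example; all bit patterns are taken from the two tables on this slide):
  slt: ALUOp = 10 and funct = 101010, so F3-F0 = 1010 matches the last row and the ALU control input is 111 (set on less)
  beq: ALUOp = 01, the funct bits are don't-cares, and the ALU control input is 110 (subtract)
  sw:  ALUOp = 00, the funct bits are don't-cares, and the ALU control input is 010 (add)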
Recall Single-Cycle – Control Signals
Effect of control bits
Signal name: effect when deasserted / effect when asserted

RegDst
  Deasserted: The register destination number for the Write register comes from the rt field (bits 20-16)
  Asserted:   The register destination number for the Write register comes from the rd field (bits 15-11)
RegWrite
  Deasserted: None
  Asserted:   The register on the Write register input is written with the value on the Write data input
ALUSrc
  Deasserted: The second ALU operand comes from the second register file output (Read data 2)
  Asserted:   The second ALU operand is the sign-extended, lower 16 bits of the instruction
PCSrc
  Deasserted: The PC is replaced by the output of the adder that computes the value of PC + 4
  Asserted:   The PC is replaced by the output of the adder that computes the branch target
MemRead
  Deasserted: None
  Asserted:   Data memory contents designated by the address input are put on the Read data output
MemWrite
  Deasserted: None
  Asserted:   Data memory contents designated by the address input are replaced by the value of the Write data input
MemtoReg
  Deasserted: The value fed to the register Write data input comes from the ALU
  Asserted:   The value fed to the register Write data input comes from the data memory

Determining control bits:
Instruction   RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format      1       0       0         1         0        0         0       1       0
lw            0       1       1         1         1        0         0       0       0
sw            X       1       X         0         0        1         0       0       0
beq           X       0       X         0         0        0         1       0       1
Pipeline Control
Initial design, motivated by single-cycle datapath control: use the same control signals
Observe:
  No separate write signal for the PC as it is written every cycle
  No separate write signals for the pipeline registers as they are written every cycle
    (these two points will be modified by the hazard detection unit!)
  No separate read signal for instruction memory as it is read every clock cycle
  No separate read signal for register file as it is read every clock cycle
Need to set control signals during each pipeline stage
Since control signals are associated with components active during a single pipeline stage, we can group control lines into five groups according to pipeline stage

Pipelined Datapath with Control I
[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) annotated with the control signals PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemWrite and MemtoReg. Instruction [15-0] is sign-extended from 16 to 32 bits, its low 6 bits feed ALU control, and Instruction [20-16] and Instruction [15-11] feed the RegDst multiplexer.]

Same control signals as the single-cycle datapath
Pipeline Control Signals

There are five stages in the pipeline
  instruction fetch / PC increment
  instruction decode / register fetch
    (nothing to control in these two stages, as instruction memory read and PC write are always enabled)
  execution / address calculation
  memory access
  write back

              Execution/Address Calculation       Memory access stage           Write-back stage
              stage control lines                 control lines                 control lines
Instruction   RegDst  ALUOp1  ALUOp0  ALUSrc      Branch  MemRead  MemWrite     RegWrite  MemtoReg
R-format      1       1       0       0           0       0        0            1         0
lw            0       0       0       1           0       1        0            1         1
sw            X       0       0       1           0       0        1            0         X
beq           X       0       1       0           1       0        0            0         X
Pipeline Control
Implementation
Pass control signals along just like the data: extend each pipeline register to hold the needed control bits for succeeding stages

[Figure: the Control unit decodes the instruction held in IF/ID and writes the EX, M and WB control fields into ID/EX; M and WB are passed on to EX/MEM; WB is passed on to MEM/WB.]

Note: The 6-bit funct field of the instruction, required in the EX stage to generate ALU control, can be retrieved as the 6 least significant bits of the immediate field, which is sign-extended and passed from the IF/ID register to the ID/EX register
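
Counting the control bits each pipeline register must carry (a sketch based on the control-line grouping in the table on the previous slide):
  EX group:   RegDst, ALUOp1, ALUOp0, ALUSrc   = 4 bits
  MEM group:  Branch, MemRead, MemWrite        = 3 bits
  WB group:   RegWrite, MemtoReg               = 2 bits
  ID/EX holds EX + M + WB = 9 bits; EX/MEM holds M + WB = 5 bits; MEM/WB holds WB = 2 bits.
  Example: lw enters ID/EX with EX = 0001, M = 010, WB = 11.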
Pipelined Datapath with Control II
[Figure: pipelined datapath with control. The Control unit fills the WB, M and EX fields of ID/EX; the M and WB fields continue into EX/MEM and the WB field into MEM/WB. Each stage takes its signals from these fields: ALUSrc, ALUOp and RegDst in EX; Branch and MemWrite in MEM; RegWrite and MemtoReg in WB. PCSrc selects the next PC.]

Control signals emanate from the control portions of the pipeline registers
Pipelined Execution and Control

Instruction sequence:
    lw   \$10, 20(\$1)
    sub  \$11, \$2, \$3
    and  \$12, \$4, \$5
    or   \$13, \$6, \$7
    add  \$14, \$8, \$9

Label "before<i>" means the i-th instruction before lw
Label "after<i>" means the i-th instruction after add

[Figures: pipelined datapath with control, one panel per clock cycle, for clock cycles 1 to 9. Stage contents in each cycle:]

Clock  IF        ID        EX         MEM        WB
1      lw        before<1> before<2>  before<3>  before<4>
2      sub       lw        before<1>  before<2>  before<3>
3      and       sub       lw         before<1>  before<2>
4      or        and       sub        lw         before<1>
5      add       or        and        sub        lw
6      after<1>  add       or         and        sub
7      after<2>  after<1>  add        or         and
8      after<3>  after<2>  after<1>   add        or
9      after<4>  after<3>  after<2>   after<1>   add
Revisiting Hazards
So far our datapath and control have ignored
hazards.
We shall now revisit data hazards and control
hazards and enhance our datapath and control
to handle them in hardware.
Data Hazards and Forwarding
Problem with starting an instruction before the previous ones have finished:
data dependencies that go backward in time, called data hazards

\$2 = 10 before sub; \$2 = -20 after sub

Time (in clock cycles)    CC 1  CC 2  CC 3  CC 4  CC 5    CC 6  CC 7  CC 8  CC 9
Value of register \$2:    10    10    10    10    10/-20  -20   -20   -20   -20

Program execution order (in instructions):
    sub  \$2,  \$1, \$3
    and  \$12, \$2, \$5
    or   \$13, \$6, \$2
    add  \$14, \$2, \$2
    sw   \$15, 100(\$2)

[Figure: pipeline diagram; each instruction passes through IM, Reg, ALU, DM, Reg, starting one clock cycle after the previous instruction]
Software Solution
Have the compiler guarantee that there are never any data hazards:
by rearranging instructions to insert independent instructions
between instructions that would otherwise have a data hazard
between them,
or, if such rearrangement is not possible, by inserting nops

Rearranged:                      With nops:
sub  \$2,  \$1, \$3              sub  \$2,  \$1, \$3
lw   \$10, 40(\$3)               nop
slt  \$5,  \$6, \$7              nop
and  \$12, \$2, \$5              and  \$12, \$2, \$5
or   \$13, \$6, \$2              or   \$13, \$6, \$2
sw   \$15, 100(\$2)              sw   \$15, 100(\$2)

Such compiler solutions may not always be possible, and nops
slow the machine down

MIPS: nop = "no operation" = 00…0 (32 bits) = sll \$0, \$0, 0
Hardware Solution: Forwarding

Idea: use the intermediate data; do not wait for the result to
be finally written to the destination register. Two
steps:
1.      Detect data hazard
2.      Forward intermediate data to resolve hazard
Pipelined Datapath with Control II (as before)
[Figure: the pipelined datapath with control shown earlier, repeated. The EX, M, and WB control fields are stored in ID/EX, EX/MEM, and MEM/WB; the signals shown are PCSrc, RegWrite, Branch, MemWrite, ALUSrc, MemtoReg, ALUOp, and RegDst.]
Control signals emanate from the control portions of the pipeline registers
Hazard Detection
Hazard conditions:
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
E.g., in the earlier example, the first hazard, between sub \$2, \$1, \$3 and
and \$12, \$2, \$5, is detected when the and is in the EX stage and the
sub is in the MEM stage, because
EX/MEM.RegisterRd = ID/EX.RegisterRs = \$2 (1a)

Whether to forward also depends on:
whether the earlier instruction (the one further along the pipeline) is going to write a register: if not, there is no need to forward,
even if there is a register number match as in the conditions above
whether the destination register of that earlier instruction is \$0: if so,
there is no need to forward the value (\$0 is always 0 and never overwritten)
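
Applying conditions 1a to 2b to the earlier example (a sketch, assuming the register file writes in the first half of a clock cycle and reads in the second half, as the 10/-20 entry for CC 5 suggests):
   and \$12, \$2, \$5 in EX, sub in MEM:   EX/MEM.RegisterRd = ID/EX.RegisterRs = \$2   (1a)
   or  \$13, \$6, \$2 in EX, sub in WB:    MEM/WB.RegisterRd = ID/EX.RegisterRt = \$2   (2b)
   add \$14, \$2, \$2 in ID, sub in WB:    no hazard; the register read sees the value written in the same cycle
   sw  \$15, 100(\$2):                     no hazard; sub has already completed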
Data Forwarding
Plan:
allow inputs to the ALU not just from ID/EX, but also later
pipeline registers, and
use multiplexors and control signals to choose appropriate
inputs to ALU
Time (in clock cycles)    CC 1  CC 2  CC 3  CC 4  CC 5    CC 6  CC 7  CC 8  CC 9
Value of register \$2:    10    10    10    10    10/-20  -20   -20   -20   -20
Value of EX/MEM:          X     X     X     -20   X       X     X     X     X
Value of MEM/WB:          X     X     X     X     -20     X     X     X     X

Program execution order (in instructions):
    sub  \$2,  \$1, \$3
    and  \$12, \$2, \$5
    or   \$13, \$6, \$2
    add  \$14, \$2, \$2
    sw   \$15, 100(\$2)

[Figure: pipeline diagram; each instruction passes through IM, Reg, ALU, DM, Reg, starting one clock cycle after the previous instruction]
Dependencies between pipeline registers move forward in time
Forwarding Hardware
[Figure a. No forwarding: the register file outputs held in ID/EX feed the ALU directly, followed by EX/MEM, data memory, and MEM/WB]
[Figure b. With forwarding: multiplexors controlled by ForwardA and ForwardB select each ALU input; the forwarding unit compares Rs and Rt from ID/EX with EX/MEM.RegisterRd and MEM/WB.RegisterRd]
Datapath after adding forwarding hardware
Forwarding Hardware: Multiplexor Control

Mux control     Source   Explanation
ForwardA = 00   ID/EX    The first ALU operand comes from the register file
ForwardA = 10   EX/MEM   The first ALU operand is forwarded from the prior ALU result
ForwardA = 01   MEM/WB   The first ALU operand is forwarded from data memory or an earlier ALU result (*)
ForwardB = 00   ID/EX    The second ALU operand comes from the register file
ForwardB = 10   EX/MEM   The second ALU operand is forwarded from the prior ALU result
ForwardB = 01   MEM/WB   The second ALU operand is forwarded from data memory or an earlier ALU result (*)

(*) Depending on the selection in the rightmost multiplexor
(see the datapath with control diagram)
Data Hazard: Detection and Forwarding
Forwarding unit determines multiplexor control according to the
following rules:

1. EX hazard
   if (     EX/MEM.RegWrite                              // if there is a write…
        and ( EX/MEM.RegisterRd ≠ 0 )                    // to a non-\$0 register…
        and ( EX/MEM.RegisterRd = ID/EX.RegisterRs ) )   // which matches, then…
      ForwardA = 10

   if (     EX/MEM.RegWrite                              // if there is a write…
        and ( EX/MEM.RegisterRd ≠ 0 )                    // to a non-\$0 register…
        and ( EX/MEM.RegisterRd = ID/EX.RegisterRt ) )   // which matches, then…
      ForwardB = 10
Data Hazard: Detection and Forwarding
2. MEM hazard
   if (     MEM/WB.RegWrite                              // if there is a write…
        and ( MEM/WB.RegisterRd ≠ 0 )                    // to a non-\$0 register…
        and ( EX/MEM.RegisterRd ≠ ID/EX.RegisterRs )     // and not already a register match
                                                         // with the earlier pipeline register…
        and ( MEM/WB.RegisterRd = ID/EX.RegisterRs ) )   // but a match with the later pipeline
                                                         // register, then…
      ForwardA = 01

   if (     MEM/WB.RegWrite                              // if there is a write…
        and ( MEM/WB.RegisterRd ≠ 0 )                    // to a non-\$0 register…
        and ( EX/MEM.RegisterRd ≠ ID/EX.RegisterRt )     // and not already a register match
                                                         // with the earlier pipeline register…
        and ( MEM/WB.RegisterRd = ID/EX.RegisterRt ) )   // but a match with the later pipeline
                                                         // register, then…
      ForwardB = 01

The EX/MEM.RegisterRd ≠ ID/EX.RegisterRs (or Rt) check is necessary, e.g., for sequences such as
add \$1, \$1, \$2; add \$1, \$1, \$3; add \$1, \$1, \$4; (array summing?), where the earlier
pipeline register (EX/MEM) holds more recent data than MEM/WB
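
Worked trace for add \$1, \$1, \$2; add \$1, \$1, \$3; add \$1, \$1, \$4 (a sketch, assuming the third add is in EX, so the second add is in MEM and the first add is in WB):
   ID/EX.RegisterRs  = \$1, ID/EX.RegisterRt = \$4             // third add
   EX/MEM.RegisterRd = \$1                                     // second add: newest value of \$1
   MEM/WB.RegisterRd = \$1                                     // first add: stale value of \$1
   EX hazard:  EX/MEM.RegisterRd = ID/EX.RegisterRs            // so ForwardA = 10
   MEM hazard: EX/MEM.RegisterRd ≠ ID/EX.RegisterRs is false   // so ForwardA = 01 is not selected
   Rt = \$4 matches neither pipeline register                  // so ForwardB = 00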
Forwarding Hardware with Control
Called forwarding unit, not hazard detection unit,
because once data is forwarded there is no hazard!

[Figure: pipelined datapath with the forwarding unit. IF/ID.RegisterRs, IF/ID.RegisterRt, and IF/ID.RegisterRd are carried into ID/EX as Rs, Rt, and Rd; the forwarding unit compares Rs and Rt with EX/MEM.RegisterRd and MEM/WB.RegisterRd and drives the multiplexors at the ALU inputs]
Datapath with forwarding hardware and control wires; certain details,
e.g., branching hardware, are omitted to simplify the drawing
Note: so far we have only handled forwarding to R-type instructions…!
Execution example: forwarding (clock cycles 3 and 4)
Instruction sequence: sub \$2, \$1, \$3; and \$4, \$2, \$5; or \$4, \$4, \$2; add \$9, \$4, \$2

[Figure: pipelined datapath with forwarding unit, clock cycle 3]
IF: or \$4, \$4, \$2 | ID: and \$4, \$2, \$5 | EX: sub \$2, \$1, \$3 | MEM: before<1> | WB: before<2>

[Figure: pipelined datapath with forwarding unit, clock cycle 4]
IF: add \$9, \$4, \$2 | ID: or \$4, \$4, \$2 | EX: and \$4, \$2, \$5 | MEM: sub \$2, . . . | WB: before<1>

Execution example (cont.): forwarding (clock cycles 5 and 6)

[Figure: pipelined datapath with forwarding unit, clock cycle 5]
IF: after<1> | ID: add \$9, \$4, \$2 | EX: or \$4, \$4, \$2 | MEM: and \$4, . . . | WB: sub \$2, . . .

[Figure: pipelined datapath with forwarding unit, clock cycle 6]
IF: after<2> | ID: after<1> | EX: add \$9, \$4, \$2 | MEM: or \$4, . . . | WB: and \$4, . . .

Data Hazards and Stalls
Load word can still cause a hazard:
an instruction tries to read a register following a load instruction that writes
to the same register

lw    \$2, 20(\$1)
and   \$4, \$2, \$5
or    \$8, \$2, \$6
add   \$9, \$4, \$2
slt   \$1, \$6, \$7

[Figure: multiple-clock-cycle pipeline diagram, CC 1 to CC 9; each instruction passes through IM, Reg, ALU, DM, Reg, starting one cycle after the previous instruction]

Because the dependency from lw to and goes backward in time (the loaded
value leaves data memory at the end of CC 4, but and needs it at the start
of its EX stage in CC 4), forwarding alone will not solve the hazard

therefore, we need a hazard detection unit to stall the pipeline after the
load instruction

Pipelined Datapath with Control II
(as before)
[Figure: pipelined datapath with control; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB; control signals shown: PCSrc, RegWrite, Branch, MemWrite, ALUSrc, MemtoReg, ALUOp, RegDst]
Control signals emanate from the control portions of the
pipeline registers

Hazard Detection Logic to Stall

The hazard detection unit implements the following check to decide
whether to stall:

if ( ID/EX.MemRead                                      // if the instruction in the EX stage is a load…
     and ( ( ID/EX.RegisterRt = IF/ID.RegisterRs )      // and the destination register
        or ( ID/EX.RegisterRt = IF/ID.RegisterRt ) ) )  // matches either source register
                                                        // of the instruction in the ID stage, then…
    stall the pipeline

Mechanics of Stalling
If the stall check succeeds, the pipeline needs to stall for only 1
clock cycle after the load, because after that the forwarding unit can
resolve the dependency
What the hardware does to stall the pipeline for 1 cycle:
  it does not let the IF/ID register change (disable write!) – this causes
  the instruction in the ID stage to repeat, i.e., stall
  therefore, the instruction just behind it, in the IF stage, must be stalled
  as well – so the hardware does not let the PC change (disable write!) –
  this causes the instruction in the IF stage to repeat, i.e., stall
  it changes all the EX, MEM and WB control fields in the ID/EX pipeline
  register to 0, so effectively the instruction just behind the load
  becomes a nop – a bubble is said to have been inserted into the
  pipeline
  note that we cannot turn that instruction into a nop by zeroing all the bits
  in the instruction itself – recall nop = 00…0 (32 bits) – because it has
  already been decoded and its control signals generated

Hazard Detection Unit
[Figure: pipelined datapath with hazard detection unit and forwarding unit; labels shown: PCWrite, IF/IDWrite, IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd, ID/EX.RegisterRt, EX/MEM.RegisterRd, MEM/WB.RegisterRd]
Datapath with forwarding hardware, the hazard detection unit and
control wires – certain details, e.g., branching hardware, are omitted
to simplify the drawing

Stalling Resolves a Hazard
Same instruction sequence as before, for which forwarding by
itself could not resolve the hazard:

lw    \$2, 20(\$1)
and   \$4, \$2, \$5
or    \$8, \$2, \$6
add   \$9, \$4, \$2
slt   \$1, \$6, \$7

[Figure: multiple-clock-cycle pipeline diagram, CC 1 to CC 10; and repeats its Reg (ID) stage and or repeats its IM (IF) stage, which marks the one-cycle bubble]

The hazard detection unit inserts a 1-cycle bubble in the pipeline, after
which all pipeline register dependencies go forward, so the
forwarding unit can handle them and there are no more hazards

Stalling execution example (clock cycles 2 and 3)
Instruction sequence: lw \$2, 20(\$1); and \$4, \$2, \$5; or \$4, \$4, \$2; add \$9, \$4, \$2

[Figure: pipelined datapath with hazard detection and forwarding units, clock cycle 2]
IF: and \$4, \$2, \$5 | ID: lw \$2, 20(\$1) | EX: before<1> | MEM: before<2> | WB: before<3>

[Figure: pipelined datapath with hazard detection and forwarding units, clock cycle 3]
IF: or \$4, \$4, \$2 | ID: and \$4, \$2, \$5 | EX: lw \$2, 20(\$1) | MEM: before<1> | WB: before<2>

Stalling execution example (cont.): clock cycles 4 and 5

[Figure: pipelined datapath with hazard detection and forwarding units, clock cycle 4]
IF: or \$4, \$4, \$2 | ID: and \$4, \$2, \$5 | EX: bubble | MEM: lw \$2, . . . | WB: before<1>

[Figure: pipelined datapath with hazard detection and forwarding units, clock cycle 5]
IF: add \$9, \$4, \$2 | ID: or \$4, \$4, \$2 | EX: and \$4, \$2, \$5 | MEM: bubble | WB: lw \$2, . . .

Stalling execution example (cont.): clock cycles 6 and 7

[Figure: pipelined datapath with hazard detection and forwarding units, clock cycle 6]
IF: after<1> | ID: add \$9, \$4, \$2 | EX: or \$4, \$4, \$2 | MEM: and \$4, . . . | WB: bubble

[Figure: pipelined datapath with hazard detection and forwarding units, clock cycle 7]
IF: after<2> | ID: after<1> | EX: add \$9, \$4, \$2 | MEM: or \$4, . . . | WB: and \$4, . . .

Control (or Branch) Hazards
The problem with branches in the pipeline we have so far is that the
branch decision is not made until the MEM stage – so which
instructions, if any, should we fetch into the pipeline following the
branch instruction?

Possible solution: stall the pipeline until the branch decision is known
  not efficient: slows the pipeline significantly!

Another solution: predict the branch outcome
  e.g., always predict branch-not-taken – continue with the next sequential
  instructions
  if the prediction is wrong, we have to flush the pipeline behind the branch –
  discard the instructions fetched after it and resume execution at the branch target

Predicting Branch-not-taken:
Misprediction delay

40 beq \$1, \$3, 7
44 and \$12, \$2, \$5
48 or  \$13, \$6, \$2
52 add \$14, \$2, \$2
72 lw  \$4, 50(\$7)

[Figure: multiple-clock-cycle pipeline diagram, CC 1 to CC 9; each instruction passes through IM, Reg, ALU, DM, Reg, starting one cycle after the previous instruction]

The outcome of the branch (taken, so the prediction was wrong) is decided only when
beq is in the MEM stage, so the three sequential instructions that follow it,
already in the pipeline, have to be flushed, and execution resumes at lw

Optimizing the Pipeline to
Reduce Branch Delay
Move the branch decision from the MEM stage (as in our current
pipeline) earlier to the ID stage
  the adder that computes the branch target address moves
  from the MEM stage to the ID stage – the inputs to this adder, the PC value
  and the immediate field, are already available in the IF/ID pipeline
  register
  calculating the branch decision can be done efficiently, e.g., for an equality test,
  by XORing respective bits, then ORing all the results and inverting,
  rather than using the ALU to subtract and then test for zero (which
  incurs a carry delay)
  with the more efficient equality test we can put it in the ID stage without
  significantly lengthening this stage – remember, an objective of pipeline
  design is to keep pipeline stages balanced
  we must correspondingly make additions to the forwarding and hazard
  detection units to forward to or stall the branch at the ID stage in case
  the branch decision depends on an earlier result

Flushing on Misprediction
Same strategy as for stalling on a load-use data hazard…
Zero out all the control values (or the instruction itself) in the pipeline
registers for the instructions following the branch that are already
in the pipeline – effectively turning them into nops – so they are
flushed
  in the optimized pipeline, with the branch decision made in the ID stage,
  we have to flush only one instruction, the one in the IF stage – the branch
  delay penalty is then only one clock cycle

Optimized Datapath for Branch
[Figure: pipelined datapath with the branch adder (Shift left 2, Sign extend) and the register equality comparator (=) in the ID stage, together with the hazard detection unit, the forwarding unit and the IF.Flush control line]
IF.Flush control zeros out the instruction in the IF/ID
pipeline register (which follows the branch)
Branch decision is moved from the MEM stage to the ID stage – simplified drawing
not showing enhancements to the forwarding and hazard detection units

Pipelined Branch Execution example (clock cycles 3 and 4)

36   sub   \$10, \$4, \$8
40   beq   \$1, \$3, 7
44   and   \$12, \$2, \$5
48   or    \$13, \$2, \$6
52   add   \$14, \$4, \$2
56   slt   \$15, \$6, \$7
…
72   lw    \$4, 50(\$7)

[Figure: optimized datapath for branch, clock cycle 3]
IF: and \$12, \$2, \$5 | ID: beq \$1, \$3, 7 | EX: sub \$10, \$4, \$8 | MEM: before<1> | WB: before<2>

[Figure: optimized datapath for branch, clock cycle 4]
IF: lw \$4, 50(\$7) | ID: bubble (nop) | EX: beq \$1, \$3, 7 | MEM: sub \$10, . . . | WB: before<1>

Optimized pipeline with only one bubble as a result of the taken branch

Simple Example: Comparing
Performance
Compare performance for single-cycle, multicycle, and pipelined
datapaths using the gcc instruction mix
  assume 2 ns for memory access, 2 ns for ALU operation, 1 ns for
  register access
  assume gcc instruction mix: 23% loads, 13% stores, 19% branches,
  2% jumps, 43% ALU
  for pipelined execution assume
    50% of the loads are followed immediately by an instruction that uses
    the result
    25% of branches are mispredicted
    branch delay on misprediction is 1 clock cycle
    jumps always incur 1 clock cycle delay, so their average time is 2 clock
    cycles

Simple Example: Comparing
Performance
Single-cycle (p. 373): average instruction time 8 ns
Multicycle (p. 397): average instruction time 8.04 ns
Pipelined:
  loads use 1 cc (clock cycle) when there is no load-use dependency
  and 2 cc when there is a dependency – given that 50% of loads
  are followed by a dependent instruction, the average cc per load is 1.5
  stores use 1 cc each
  branches use 1 cc when predicted correctly and 2 cc when
  not – given 25% misprediction, the average cc per branch is 1.25
  jumps use 2 cc each
  ALU instructions use 1 cc each
  therefore, average CPI is
  1.5 × 23% + 1 × 13% + 1.25 × 19% + 2 × 2% + 1 × 43% ≈ 1.18
  therefore, average instruction time is 1.18 × 2 = 2.36 ns

```
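
The load-use check from the "Hazard Detection Logic to Stall" slide can be sketched in Python. This is only an illustration of the condition, not a model of the hardware: the function name `must_stall` and its parameter names are hypothetical, chosen to mirror the pipeline-register fields ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs and IF/ID.RegisterRt.

```python
def must_stall(id_ex_mem_read: bool, id_ex_rt: int, if_id_rs: int, if_id_rt: int) -> bool:
    """Return True when the instruction in ID must wait one cycle for a load in EX.

    Hypothetical helper: the arguments stand for the pipeline-register fields
    named on the slide (register numbers are plain integers here).
    """
    # ID/EX.MemRead is asserted only for loads.
    # A load writes register Rt, so compare it with both source fields in IF/ID.
    return id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)


# lw $2, 20($1) is in EX and "and $4, $2, $5" is in ID:
# the load's Rt (2) matches the Rs field of "and", so the pipeline stalls.
print(must_stall(True, 2, 2, 5))   # True

# sub $2, $1, $3 in EX is not a load (MemRead = 0), so forwarding alone suffices.
print(must_stall(False, 3, 2, 5))  # False
```

Like the check on the slide, this sketch is conservative: it compares against IF/ID.RegisterRt even when the instruction in ID does not actually read that register.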
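
The pipelined CPI calculation from the "Comparing Performance" slides can be reproduced with a few lines of Python. The instruction mix, stall rates and the 2 ns clock come from the slides; the names `MIX`, `CYCLES` and `average_cpi` are hypothetical and used only for this sketch.

```python
# Instruction mix and timing assumptions taken from the slides.
MIX = {"load": 0.23, "store": 0.13, "branch": 0.19, "jump": 0.02, "alu": 0.43}

CLOCK_NS = 2.0          # pipelined clock cycle time
LOAD_USE_RATE = 0.50    # fraction of loads followed by a dependent instruction
MISPREDICT_RATE = 0.25  # fraction of branches that are mispredicted

# Average clock cycles per instruction class.
CYCLES = {
    "load": 1 + LOAD_USE_RATE * 1,      # one stall cycle on a load-use hazard
    "store": 1,
    "branch": 1 + MISPREDICT_RATE * 1,  # one flushed instruction on a misprediction
    "jump": 2,                          # jumps always incur a one-cycle delay
    "alu": 1,
}


def average_cpi(mix, cycles):
    """Weighted average of cycles per instruction over the instruction mix."""
    return sum(mix[kind] * cycles[kind] for kind in mix)


cpi = average_cpi(MIX, CYCLES)
print(f"average CPI = {cpi:.4f}")
print(f"average instruction time = {cpi * CLOCK_NS:.3f} ns")
```

The unrounded CPI is 1.1825, which gives 2.365 ns per instruction; the slides round the CPI to 1.18 first and therefore report 2.36 ns.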