# ECE 445 – Computer Organization

Pipelined Datapath and Control
(Lecture #13)

The slides included herein were taken from the materials accompanying
Computer Organization and Design, 4th Edition, by Patterson and Hennessy,
and were used with permission from Morgan Kaufmann Publishers.
Material to be covered ...

Chapter 4: Sections 5 – 9, 13 – 14

Performance of the Single-Cycle MIPS

Example: MIPS Clock Rate

Determine the clock rate for the MIPS architecture, assuming the following:
• The MIPS is a single-cycle machine
  – 1 clock cycle per instruction
  – CPI = 1
• Access time for memory units = 200 ps
• Operation time for ALU and adders = 100 ps
• Access time for register file = 50 ps


Instruction Class                 Functional Units used by the Instruction Class
ALU Instruction        Inst. Fetch      Register            ALU        Register
Load Word              Inst. Fetch      Register            ALU        Memory     Register
Store Word             Inst. Fetch      Register            ALU        Memory
Branch                 Inst. Fetch      Register            ALU
Jump                   Inst. Fetch


Instruction Class   Instr Mem   Reg Read   ALU Op   Data Mem   Reg Write   Total
ALU Instruction     200         50         100      0          50          400 ps
Load Word           200         50         100      200        50          600 ps
Store Word          200         50         100      200        0           550 ps
Branch              200         50         100      0          0           350 ps
Jump                200         0          0        0          0           200 ps
(all times in ps)
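The totals above can be reproduced by summing the latencies of the functional units each class uses, since in a single-cycle datapath those units operate one after another. A minimal Python sketch, assuming only the latencies listed in the example; the names `UNITS_USED` and `instruction_delay_ps` are illustrative, not from the slides.

```python
# Functional-unit latencies (ps) from the example's assumptions.
INST_MEM = 200   # instruction memory access
REG = 50         # register file read or write
ALU = 100        # ALU operation
DATA_MEM = 200   # data memory access

# Units each instruction class uses, in datapath order.
UNITS_USED = {
    "ALU instruction": [INST_MEM, REG, ALU, REG],
    "Load word":       [INST_MEM, REG, ALU, DATA_MEM, REG],
    "Store word":      [INST_MEM, REG, ALU, DATA_MEM],
    "Branch":          [INST_MEM, REG, ALU],
    "Jump":            [INST_MEM],
}

def instruction_delay_ps(units):
    """Total delay of one instruction: its units operate in series."""
    return sum(units)

delays = {cls: instruction_delay_ps(u) for cls, u in UNITS_USED.items()}

for cls, ps in delays.items():
    print(f"{cls:<16}{ps:>5} ps")
```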


The clock cycle time for a machine with a single clock cycle per instruction is determined by the longest instruction.
• In this example, the load word instruction requires 600 ps.
• The clock rate is then
  Clock rate = 1 / Clock cycle time
  Clock rate = 1 / 600 ps ≈ 1.67 GHz
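The same calculation as a short Python sketch, assuming the per-class delays from the table: pick the slowest class, use its delay as the cycle time, and invert. The variable names are illustrative.

```python
# Per-class delays (ps) from the table above.
delays_ps = {"ALU instruction": 400, "Load word": 600, "Store word": 550,
             "Branch": 350, "Jump": 200}

# A single-cycle machine must fit the slowest instruction into one cycle.
critical_class = max(delays_ps, key=delays_ps.get)
cycle_time_ps = delays_ps[critical_class]

# 1 / (1 ps) = 1 THz = 1000 GHz, so f [GHz] = 1000 / T [ps].
clock_rate_ghz = 1000 / cycle_time_ps

print(critical_class, cycle_time_ps, f"{clock_rate_ghz:.2f} GHz")
# prints: Load word 600 1.67 GHz
```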

Performance Issues
• Longest delay determines clock period
  – Critical path: load word (lw) instruction
  – Instruction memory → register file → ALU → data memory → register file
• Not feasible to vary clock period for different instructions
• Violates design principle
  – Making the common case fast
• Improve performance by pipelining
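One way to quantify the cost of a single fixed period: with the clock set to 600 ps by lw, every other class finishes early and then waits for the next clock edge. The sketch below is a hypothetical illustration that reuses the per-class delays from the clock-rate example; it assumes no particular instruction mix, so it shows per-instruction waste rather than an average.

```python
CYCLE_PS = 600  # fixed clock period, set by the slowest instruction (lw)

# Per-class delays (ps) from the clock-rate example.
delays_ps = {"ALU instruction": 400, "Load word": 600, "Store word": 550,
             "Branch": 350, "Jump": 200}

# Time each class spends waiting for the clock edge after its work is done.
idle_ps = {cls: CYCLE_PS - d for cls, d in delays_ps.items()}

for cls, idle in idle_ps.items():
    print(f"{cls:<16} idle {idle:>3} ps ({idle / CYCLE_PS:.0%} of the cycle)")
```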
How does pipelining work?

§4.5 An Overview of Pipelining
Pipelining Analogy
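• Pipelined laundry: overlapping execution
  – Parallelism improves performance

• Four loads: Speedup = 8/3.5 ≈ 2.3
  (8 hours one after another vs. 3.5 hours pipelined)
• Non-stop (n loads, n large):
$$\text{Speedup} = \frac{2n}{0.5n + 1.5} \approx 4 = \text{number of stages}$$
  (2n hours sequentially vs. 0.5n + 1.5 hours pipelined: 2 hours for the first load, then 0.5 hour for each of the remaining n - 1 loads)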
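Both speedup figures come from counting stage-times. A minimal Python sketch, assuming the analogy's 4 stages of 0.5 hour each (the function names are illustrative): sequential time grows as stages × n, while pipelined time is one full pass for the first load plus one stage-time for each later load.

```python
def sequential_hours(n_loads, n_stages=4, stage_hours=0.5):
    """Each load runs through every stage before the next load starts."""
    return n_loads * n_stages * stage_hours

def pipelined_hours(n_loads, n_stages=4, stage_hours=0.5):
    """First load needs n_stages steps; each later load finishes one step after the previous."""
    return (n_stages + n_loads - 1) * stage_hours

def speedup(n_loads):
    return sequential_hours(n_loads) / pipelined_hours(n_loads)

print(sequential_hours(4), pipelined_hours(4), f"{speedup(4):.2f}")  # prints: 8.0 3.5 2.29
print(f"{speedup(1_000_000):.4f}")  # many loads: prints 4.0000, the number of stages
```

With the defaults, `pipelined_hours(n)` equals 0.5n + 1.5, the denominator of the non-stop formula.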

Objective:

Keep all stages of the pipeline busy at all times.

Pipelining: Improving Performance

                  Latency     Max. Throughput
Non-Pipelined     2 hours     0.5 loads/hour
Pipelined         2 hours     2 loads/hour

Latency = length of time for each load: from the start of one load to the end of the same load.
Maximum throughput = number of loads completed per hour, assuming all stages of the pipeline are busy at all times.
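The table's numbers follow directly from the stage time. A short Python sketch, assuming 4 stages of 0.5 hour per load (consistent with the 2-hour latency); the constant names are illustrative.

```python
N_STAGES = 4        # stages per load (2 hours / 0.5 hour per stage)
STAGE_HOURS = 0.5

# Latency: one load still passes through every stage, pipelined or not.
latency_hours = N_STAGES * STAGE_HOURS

# Non-pipelined: the next load starts only after the previous one ends.
throughput_nonpipelined = 1 / latency_hours   # loads per hour

# Pipelined, all stages busy: one load completes every stage-time.
throughput_pipelined = 1 / STAGE_HOURS        # loads per hour

print(latency_hours, throughput_nonpipelined, throughput_pipelined)
# prints: 2.0 0.5 2.0
```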


Pipelining improves performance by increasing
instruction throughput, rather than decreasing
execution time of an individual instruction.

The MIPS Pipeline

MIPS Pipeline

• Five stages, one step per stage
  – IF: Instruction fetch from memory
  – ID: Instruction decode & register read
  – EX: Execute operation or calculate address
  – MEM: Access memory operand
  – WB: Write result back to register


Pipeline Performance
• Assume time for stages is
  – 100 ps for register read or write
  – 200 ps for other stages
• Compare pipelined datapath with single-cycle datapath

Instr      Instr Fetch   Register Read   ALU Op   Memory Access   Register Write   Total Time
lw         200 ps        100 ps          200 ps   200 ps          100 ps           800 ps
sw         200 ps        100 ps          200 ps   200 ps          -                700 ps
R-format   200 ps        100 ps          200 ps   -               100 ps           600 ps
beq        200 ps        100 ps          200 ps   -               -                500 ps
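The Total Time column is the sum of the stages each instruction actually uses. A short Python sketch, assuming ID's 100 ps is the register read and WB's 100 ps is the register write; the dictionary names are illustrative.

```python
# Stage latencies (ps): 100 ps for register read or write, 200 ps otherwise.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Stages each instruction needs (blank cells in the table are skipped).
STAGES_USED = {
    "lw":       ["IF", "ID", "EX", "MEM", "WB"],
    "sw":       ["IF", "ID", "EX", "MEM"],
    "R-format": ["IF", "ID", "EX", "WB"],
    "beq":      ["IF", "ID", "EX"],
}

total_ps = {instr: sum(STAGE_PS[s] for s in used) for instr, used in STAGES_USED.items()}
print(total_ps)
# prints: {'lw': 800, 'sw': 700, 'R-format': 600, 'beq': 500}
```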

Single-cycle (Tc = 800 ps)

Why is the clock period 800 ps?

Pipelined (Tc = 200 ps)

Why is the clock period 200 ps?
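A sketch of both answers, under the stage times above and assuming a hazard-free run of lw instructions: the single-cycle period must cover the slowest instruction end to end, while the pipelined period only has to cover the slowest stage. The function names are hypothetical.

```python
STAGE_PS = [200, 100, 200, 200, 100]   # IF, ID, EX, MEM, WB
LW_TOTAL_PS = sum(STAGE_PS)            # 800 ps, the slowest instruction

single_cycle_tc = LW_TOTAL_PS          # one instruction per (long) cycle
pipelined_tc = max(STAGE_PS)           # each stage gets one (short) cycle

def single_cycle_time(n):
    return n * single_cycle_tc

def pipelined_time(n, stages=len(STAGE_PS)):
    # Fill the pipeline once, then one instruction completes per cycle.
    return (stages + n - 1) * pipelined_tc

print(single_cycle_time(3), pipelined_time(3))   # three lw instructions
# prints: 2400 1400
```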

Pipeline Speedup
• If all stages are balanced
  – i.e., all take the same time
$$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}$$
• If not balanced, speedup is less
• Speedup due to increased throughput
  – Latency (time for each instruction) does not decrease
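Plugging the earlier stage times into the formula shows the gap between ideal and actual speedup. A hedged Python sketch, assuming the 200/100/200/200/100 ps stages from the pipeline-performance example; it measures steady-state time between instructions and ignores pipeline fill. The helper names are illustrative.

```python
def balanced_cycle_ps(nonpipelined_ps, n_stages):
    """Balanced stages: time between instructions divides by the number of stages."""
    return nonpipelined_ps / n_stages

def unbalanced_cycle_ps(stage_ps):
    """Unbalanced stages: the clock must fit the slowest stage."""
    return max(stage_ps)

stage_ps = [200, 100, 200, 200, 100]   # IF, ID, EX, MEM, WB
nonpipelined_ps = sum(stage_ps)        # 800 ps between instructions without pipelining

ideal = balanced_cycle_ps(nonpipelined_ps, len(stage_ps))   # 160 ps
actual = unbalanced_cycle_ps(stage_ps)                      # 200 ps

print(nonpipelined_ps / ideal, nonpipelined_ps / actual)    # prints: 5.0 4.0

# Latency does not improve: one lw now spends 5 cycles x 200 ps in the pipeline.
pipelined_latency_ps = len(stage_ps) * actual
print(pipelined_latency_ps)                                 # prints: 1000
```

In this example the per-instruction latency actually grows from 800 ps to 1000 ps, because the shorter stages are padded out to 200 ps.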

Pipelining and ISA Design
• MIPS ISA designed for pipelining
• All instructions are 32 bits
  – Easier to fetch and decode in one cycle
  – cf. x86: 1- to 15-byte instructions
• Few and regular instruction formats
  – Can decode and read registers in one step
  – Can calculate address in 3rd stage, access memory in 4th stage
• Alignment of memory operands
  – i.e., on word boundaries
  – Memory access takes only one cycle
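Fixed-width, regular formats are what let decode and register read happen together: the source-register fields sit at the same bit positions in every format that uses them, so they can be sent to the register file before the opcode is fully decoded. A minimal Python sketch of slicing an R-format word and checking word alignment; the example encoding is `add $t0, $s1, $s2`, and the helper names are illustrative.

```python
def decode_r_format(word):
    """Split a 32-bit MIPS R-format instruction into its fixed fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31-26
        "rs":     (word >> 21) & 0x1F,   # bits 25-21, first source register
        "rt":     (word >> 16) & 0x1F,   # bits 20-16, second source register
        "rd":     (word >> 11) & 0x1F,   # bits 15-11, destination register
        "shamt":  (word >> 6) & 0x1F,    # bits 10-6, shift amount
        "funct":  word & 0x3F,           # bits 5-0, ALU function
    }

def is_word_aligned(address):
    """Word operands must sit on 4-byte boundaries."""
    return address % 4 == 0

# add $t0, $s1, $s2 is encoded as 0x02324020
fields = decode_r_format(0x02324020)
print(fields)
# prints: {'opcode': 0, 'rs': 17, 'rt': 18, 'rd': 8, 'shamt': 0, 'funct': 32}
print(is_word_aligned(0x1000), is_word_aligned(0x1002))
# prints: True False
```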
Pipeline Summary
The BIG Picture
• Pipelining improves performance by increasing instruction throughput
  – Executes multiple instructions in parallel
  – Each instruction has the same latency
• Subject to hazards (discussed in upcoming lectures)
  – Structural, data, control
• Instruction set design affects complexity of pipeline implementation

§4.6 Pipelined Datapath and Control
MIPS Pipelined Datapath

Pipeline registers
• Need registers between stages. Why?
  – To hold information produced in previous cycle
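A toy model of why the registers are needed: at each clock edge every pipeline register captures what its left-hand stage produced in the cycle just ended. The sketch below tracks only which instruction each register holds, not its data fields; the instruction names are hypothetical.

```python
PIPE_REGS = ["IF/ID", "ID/EX", "EX/MEM", "MEM/WB"]

def clock_edge(regs, fetched):
    """Latch every pipeline register at once.

    Each register takes what its left-hand neighbour held *before* the edge,
    so a stage only ever works on data produced in the previous cycle.
    """
    new = {"IF/ID": fetched}
    for left, right in zip(PIPE_REGS, PIPE_REGS[1:]):
        new[right] = regs[left]
    return new

program = ["lw", "sub", "add", "sw"]   # hypothetical instruction stream
regs = dict.fromkeys(PIPE_REGS)        # empty pipeline: every register holds None
history = []
for cycle in range(len(program) + len(PIPE_REGS)):
    fetched = program[cycle] if cycle < len(program) else None
    regs = clock_edge(regs, fetched)
    history.append(regs)
    print(f"after edge {cycle + 1}: {regs}")
```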

Pipeline Operation
• Cycle-by-cycle flow of instructions through the pipelined datapath
• “Single-clock-cycle” pipeline diagram
  – Shows pipeline usage in a single cycle
  – Highlight resources used
• “Multi-clock-cycle” diagram
  – Graph of operation over time

• We’ll look at “single-clock-cycle” diagrams for load and store

WB for Load: the datapath as drawn uses the wrong register number for the write. Why?
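One way to see the wrong-register-number problem, sketched under the assumption that the flawed datapath takes the write-register number from the instruction currently in the IF/ID register: by the time lw reaches WB, a later instruction occupies ID. The instruction stream and register numbers below are hypothetical.

```python
# Hypothetical instructions: (mnemonic, destination register number).
lw_instr = ("lw", 8)    # e.g. lw $t0, 0($s0) writes register 8 ($t0)
or_instr = ("or", 13)   # fetched in cycle 4, so it is in ID during cycle 5

# Relevant pipeline-register contents during cycle 5, when lw is in WB.
pipeline = {
    "IF/ID": or_instr,    # read by the ID stage
    "MEM/WB": lw_instr,   # read by the WB stage
}

# Flawed datapath: the write-register number is taken from IF/ID.
naive_write_reg = pipeline["IF/ID"][1]

# Corrected datapath: lw's destination number is copied into ID/EX, then
# EX/MEM, then MEM/WB, so WB reads it from the same register as the data.
carried_write_reg = pipeline["MEM/WB"][1]

print(naive_write_reg, carried_write_reg)   # prints: 13 8
```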
EX for Store

MEM for Store

WB for Store

Multi-Cycle Pipeline Diagram
• Form showing resource usage
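A multi-clock-cycle diagram places each instruction on its own row and shifts it one clock cycle to the right of the previous one. The hypothetical helper below builds that layout as text, assuming a stall-free five-stage pipeline; the instruction strings are illustrative only.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def multi_cycle_diagram(program):
    """Rows of (instruction, cells): one cell per clock cycle, assuming no stalls."""
    n_cycles = len(program) + len(STAGES) - 1
    rows = []
    for i, instr in enumerate(program):
        cells = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage   # instruction i occupies stage s in cycle i + s + 1
        rows.append((instr, cells))
    return rows

program = ["lw $10, 20($1)", "sub $11, $2, $3", "add $12, $3, $4"]   # hypothetical
rows = multi_cycle_diagram(program)
n_cycles = len(rows[0][1])
print(" " * 18 + "".join(f"CC{c + 1:<4}" for c in range(n_cycles)))
for instr, cells in rows:
    print(f"{instr:<18}" + "".join(f"{cell:<6}" for cell in cells))
```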

Single-Cycle Pipeline Diagram
• State of pipeline in a given cycle
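A single-clock-cycle diagram corresponds to one column of the multi-cycle view: the instruction occupying each stage at one moment. The hypothetical helper below, assuming a stall-free five-stage pipeline, reports that state for a given cycle.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_state(program, cycle):
    """Instruction occupying each stage during `cycle` (1-based); None if the stage is idle."""
    state = {}
    for s, stage in enumerate(STAGES):
        i = cycle - 1 - s          # instruction i reaches stage s in cycle i + s + 1
        state[stage] = program[i] if 0 <= i < len(program) else None
    return state

program = ["lw", "sub", "and", "or", "add"]   # hypothetical instruction stream
print(pipeline_state(program, 5))
# prints: {'IF': 'add', 'ID': 'or', 'EX': 'and', 'MEM': 'sub', 'WB': 'lw'}
```

Cycle 5 is the first cycle in which all five stages are busy, which is the pipeline's stated objective.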

Questions?

