Lecture21

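Data Dependence Distances

The following table shows instruction pair hazard interaction.
Columns: instruction class that writes to the register file, with the
stage at which its data is available (normal/earliest).
Rows: instruction class that reads from the register file, with the
stage at which its data is needed (normal/latest).
Entries: dependence distance, normal/forwarded. A forwarded distance
of 1 means adjacent instructions run with no hazard.

                                 Write to register file
Read from        Data needed     alu      load     ladr     brl
register file    stage           6/4      6/5      6/4      6/2

alu              2/3             4/1      4/2      4/1      4/1
load             2/3             4/1      4/2      4/1      4/1
ladr             2/3             4/1      4/2      4/1      4/1
store(rb)        2/3             4/1      4/2      4/1      4/1
store(ra)        2/4             4/1      4/1      4/1      4/1
branch           2/2             4/2      4/3      4/2      4/1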
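
The distances in the table appear to follow one rule: the stage at which the writer's data becomes available minus the stage at which the reader needs it, with a minimum of 1. This rule is inferred from the numbers, not stated on the slide, so the Python sketch below is an assumption-laden check rather than the lecture's own method. The dictionary and function names are hypothetical.

```python
# Stage at which each writing class makes its result available: (normal, earliest).
WRITE_STAGE = {"alu": (6, 4), "load": (6, 5), "ladr": (6, 4), "brl": (6, 2)}

# Stage at which each reading class uses its operand: (normal, latest).
READ_STAGE = {
    "alu": (2, 3),
    "load": (2, 3),
    "ladr": (2, 3),
    "store(rb)": (2, 3),
    "store(ra)": (2, 4),
    "branch": (2, 2),
}


def dependence_distance(reader, writer):
    """Return (normal, forwarded) distance for a reader that follows a writer.

    Assumed rule: distance = max(1, available stage - needed stage).
    """
    avail_normal, avail_earliest = WRITE_STAGE[writer]
    need_normal, need_latest = READ_STAGE[reader]
    normal = max(1, avail_normal - need_normal)
    forwarded = max(1, avail_earliest - need_latest)
    return normal, forwarded


if __name__ == "__main__":
    # Print the whole table, one reader class per row.
    for reader in READ_STAGE:
        row = ["%d/%d" % dependence_distance(reader, w) for w in WRITE_STAGE]
        print(f"{reader:10s}", "  ".join(row))
```

Under this rule every entry of the table is reproduced, which suggests that the forwarded distance shrinks exactly by how much earlier the result exists and how much later the operand is really needed.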
Review



CS501
Advanced Computer Architecture

Lecture 21

Dr. Noor Muhammad Sheikh
SRC hazard correction: data forwarding


• Hazard detection is required
  between stages 3 and 4, and between
  stages 3 and 5
• The testing and forwarding circuits
  employ wider IRs to store the data
  required in later stages

add r1, r2, r3
add r2, r3, r4

• nop avoided
• Faster execution


                   Example
Execution time = ET = IC x CPI x T
(IC = instruction count, CPI = cycles per instruction,
 T = clock period, P = pipelining)
% speedup = ((ET without P - ET with P) / ET with P) x 100
Let ET without P = 5
and ET with P = 1
Hence % speedup = ((5 - 1) / 1) x 100
                = 400%
If we assume ET with P = 1.25 due to pipeline stalls,
then % speedup = ((5 - 1.25) / 1.25) x 100
               = 300%
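
The speedup arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function names are hypothetical, and the inputs are the values used in the example.

```python
def execution_time(ic, cpi, t):
    """ET = IC x CPI x T (instruction count, cycles per instruction, clock period)."""
    return ic * cpi * t


def percent_speedup(et_without_p, et_with_p):
    """Percentage speedup of the pipelined machine over the unpipelined one."""
    return (et_without_p - et_with_p) / et_with_p * 100


if __name__ == "__main__":
    print(percent_speedup(5, 1))      # ideal pipeline: prints 400.0
    print(percent_speedup(5, 1.25))   # pipeline with stalls
```

Because the pipelined execution time sits in the denominator, even a small increase caused by stalls lowers the percentage noticeably.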
   RTL for data forwarding

Each RTL line has the form
  hazard detection condition : data forwarding action

Dependence  RTL

Stage 3-5   alu5&alu3:((ra5=rb3): X ← Z5,
            (ra5=rc3)&!imm3: Y ← Z5);

Stage 3-4   alu4&alu3:((ra4=rb3): X ← Z4,
            (ra4=rc3)&!imm3: Y ← Z4);
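
The two RTL rules can be read as ordinary conditionals: if an ALU instruction further down the pipeline writes a register that the ALU instruction in stage 3 is about to use, the ALU input takes the newer result (Z4 or Z5) instead of the stale register value. The Python sketch below is illustrative only. The `IR` field layout and the `forward` function are hypothetical, and applying the stage 3-4 rule last (so the more recent result wins when both match) is an assumption not stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class IR:
    """Decoded fields carried in a pipeline instruction register (hypothetical layout)."""
    is_alu: bool = False  # instruction belongs to the alu class
    ra: int = 0           # destination register number
    rb: int = 0           # first source register number
    rc: int = 0           # second source register number
    imm: bool = False     # True when the second operand is an immediate, not rc


def forward(ir3, ir4, ir5, x, y, z4, z5):
    """Return the (X, Y) ALU inputs for stage 3 after applying both rules."""
    # Stage 3-5 dependence: alu5&alu3:((ra5=rb3): X <- Z5, (ra5=rc3)&!imm3: Y <- Z5)
    if ir5.is_alu and ir3.is_alu:
        if ir5.ra == ir3.rb:
            x = z5
        if ir5.ra == ir3.rc and not ir3.imm:
            y = z5
    # Stage 3-4 dependence, applied last so the newer result wins (assumption).
    if ir4.is_alu and ir3.is_alu:
        if ir4.ra == ir3.rb:
            x = z4
        if ir4.ra == ir3.rc and not ir3.imm:
            y = z4
    return x, y


if __name__ == "__main__":
    ir4 = IR(is_alu=True, ra=1, rb=2, rc=3)   # earlier instruction, writes r1
    ir3 = IR(is_alu=True, ra=5, rb=1, rc=4)   # later instruction, reads r1 as rb
    # X is replaced by Z4; Y keeps the value read from the register file.
    print(forward(ir3, ir4, IR(), x=10, y=20, z4=99, z5=77))
```

The immediate check matters because an instruction with an immediate operand does not read rc, so a match on that field would be a false hazard.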


Data forwarding hardware

[Figure: five-stage pipeline (Instruction Fetch, Decode and Operand Read,
ALU Operation, Memory Access, Register Writeback) with pipeline registers
IR2, PC2; IR3, X3, Y3, MD3; IR4, Z4, MD4; IR5, Z5. Two Hazard Det/forward
units and the multiplexers Mp6 and Mp7 at the ALU are also shown.]
Difference between Pipelining and Instruction-Level Parallelism

Pipelining                            Instruction-Level Parallelism
Single functional unit                Multiple functional units
Instructions issued sequentially      Instructions issued in parallel
Overlapping of instructions           Parallel execution of instructions
Very little extra hardware required   Multiple functional units within
                                      the CPU are required
Instruction-Level Parallelism

• Superscalar Architecture
  issues multiple instructions
  simultaneously
• VLIW Architecture
  based on a very long instruction
  word
Superscalar Architecture

• It has one or more IUs (integer units),
  FPUs (floating point units), and BPUs
  (branch prediction units)
• It divides instructions into three
  classes:
   Integer
   Floating point
   Branch prediction