processor architecture slides

					       Sixth Lecture: Chapter 3: CISC Processors
     (Tomasulo Scheduling and IBM System 360/91)
  Please recall:
 Multicycle instructions make out-of-order execution necessary.

 Control flow scheduling, when performed centrally at the time of
  decode:
  ==> Scoreboarding technique implemented in CDC 6600

 Dataflow scheduling, when performed in a distributed manner by the
  FUs themselves at execute time:
  instructions are decoded and issued to reservation stations, where they
  await their operands.
  ==> Tomasulo scheme in the IBM System/360 Model 91 processor;
  this scheme is the basis of modern superscalar processors
                          Scoreboard Summary

 Main advantages:
     managing multiple FUs
     out-of-order execution of multi-cycle operations
     maintaining all data dependences (RAW, WAW, WAR)
 Scoreboard limitations:
     single-issue scheme (however, the scheme is extendable to multiple issue)
     in-order issue
     no renaming ==> antidependences and output dependences may lead to WAR
      and WAW stalls
     no forwarding hardware ==> all results go through the registers
 General limitations (not only valid for scoreboarding):
     number and types of FUs, since contention for FUs leads to structural hazards
     the amount of parallelism available in the code (dependences lead to stalls)
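
The three dependence types named above (RAW, WAW, WAR) can be made concrete with a small sketch. The tuple encoding (dest, src1, src2) and the function name `dependences` are assumptions chosen for illustration; they are not part of the lecture.

```python
def dependences(first, second):
    """Classify how `second` depends on the earlier instruction `first`.

    Each instruction is a tuple (dest, src1, src2) of register numbers
    (hypothetical encoding, for illustration only).
    """
    d1, *sources1 = first
    d2, *sources2 = second
    found = []
    if d1 in sources2:
        found.append("RAW")  # true dependence: second reads what first writes
    if d2 in sources1:
        found.append("WAR")  # antidependence: second overwrites a source of first
    if d1 == d2:
        found.append("WAW")  # output dependence: both write the same register
    return found


print(dependences((1, 3, 5), (6, 1, 4)))  # second reads register 1 written by first
```

Only RAW is a true data dependence; WAR and WAW are name dependences, which is why renaming can remove them.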


Tomasulo scheme removes some of the scoreboard limitations

  by forwarding hardware and
  by renaming hardware,

but is still

  single issue and
  in-order issue.



                           Register Renaming

 A name dependence occurs when two instructions Inst1 and Inst2 use the
    same register (or memory location), but there is no data transmitted
    between Inst1 and Inst2.
   If the register is renamed so that Inst1 and Inst2 do not conflict, the two
    instructions can execute simultaneously or be reordered.
   The technique that eliminates name dependences in registers
    to avoid WAR and WAW hazards is called register renaming.
   Register renaming can be done statically (= by the compiler) or dynamically
    (= by hardware).
   Tomasulo’s algorithm performs register renaming in hardware!
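
A minimal sketch of the renaming idea, assuming a hypothetical instruction encoding (op, dest, src1, src2) and fresh names T0, T1, ... for results. It illustrates the principle only; Tomasulo's hardware uses reservation station tags as the new names rather than a pool of temporaries.

```python
def rename(program):
    """Give every result a fresh name; sources read the newest name of a register."""
    newest = {}  # architectural register -> name currently holding its value
    renamed = []
    for op, dest, a, b in program:
        # Sources are looked up first, so true (RAW) dependences are preserved.
        a, b = newest.get(a, f"R{a}"), newest.get(b, f"R{b}")
        newest[dest] = f"T{len(renamed)}"  # fresh name removes WAR and WAW conflicts
        renamed.append((op, newest[dest], a, b))
    return renamed


# Hypothetical sequence with a WAR on R4 and a WAW on R6.
program = [("div", 6, 1, 4), ("add", 4, 2, 3), ("sub", 6, 4, 5)]
for instruction in rename(program):
    print(instruction)
```

After renaming, add no longer overwrites the R4 that div reads, and div and sub no longer write the same name, while sub still reads the result of add.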

 Dynamic renaming in memory is much harder to perform!

    Why??

    Pointer aliasing problems.
                         Tomasulo Algorithm

 Developed for IBM 360/91 in 1967 (about 3 years after CDC 6600)
 Hazard detection and execution control are distributed among the functional
    units (vs. centralized in scoreboard)
   Reservation stations at each functional unit control when an instruction can
    begin execution at that unit.
   Common Data Bus broadcasts results to all reservation stations (of all FUs)
   Loads and stores are treated as FUs as well.
   Each register has additional flags (a valid flag and the tag of the
    reservation station that will produce its value).




                          Tomasulo Organization

[Figure: block diagram of the Tomasulo organization. Blocks shown: memory,
 instruction unit, load buffers, registers, control, load/store reservation
 stations, operand bus, reservation stations in front of each functional unit,
 and the Common Data Bus (CDB).]

                Reservation Station Components

 Each FU has one or more reservation stations
 The reservation station holds:
    instructions that have been issued and are awaiting execution at a functional unit,
    the operands for that instruction if they have already been computed (or the source of
     the operands otherwise),
    the information needed to control the instruction once it has begun execution.
 The reservation stations buffer the operands of instructions waiting to execute,
  eliminating the need to get the operands from registers (similar to forwarding).
 At issue, the register specifiers are replaced by register values (scoreboarding: only
  pointers to the registers!) or by pointers to the reservation stations that produce the result.
     WAR hazards are avoided because an operand is already stored in the reservation station
      even when a write to the same register is performed out of order.
     WAW hazards are avoided because pointers to reservation stations,
      instead of register pointers, are used as tags on the CDB.



                  Reservation Station Entries


Empty: Indicates whether the reservation station is empty
InFU: Indicates that the instruction is executing in the FU; remains set until completion
Op: Operation to perform in the unit (e.g., + or –)
Dest: Destination register of the result
Src1, Src2: Values of the source operands
RS1, RS2: Tags of the reservation stations producing the source operands
Vld1, Vld2: Valid flags indicating whether the source values are available
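
The entry fields listed above could be modelled as follows. This is a sketch only: the class name `RSEntry`, the Python types, and the `ready` helper are assumptions, and tag 0 is taken to mean "no producing station" as in the example tables of this lecture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RSEntry:
    empty: bool = True           # Empty: station holds no instruction
    in_fu: bool = False          # InFU: instruction is executing in the FU
    op: Optional[str] = None     # Op: operation to perform
    dest: Optional[int] = None   # Dest: destination register number
    src1: Optional[str] = None   # Src1: value of the first operand, once valid
    vld1: bool = False           # Vld1: first operand value is available
    rs1: int = 0                 # RS1: tag of the producing station, 0 = none
    src2: Optional[str] = None   # Src2, Vld2, RS2: same for the second operand
    vld2: bool = False
    rs2: int = 0

    def ready(self) -> bool:
        """True when the instruction may be dispatched to the FU."""
        return not self.empty and not self.in_fu and self.vld1 and self.vld2


# A div whose first operand is still being produced by station 3.
div = RSEntry(empty=False, op="div", dest=6, rs1=3, src2="(R4)", vld2=True)
print(div.ready())  # False until the first operand arrives
```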




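                          Tomasulo Organization

[Figure: the two status tables of the Tomasulo organization.
 Register status: for each register 1 ... m the fields Value, Vld, RS.
 RS status: for each reservation station 1 ... k (grouped by functional unit
 S1 ... Sn) the fields Empty, InFU, Op, Dest, Src1, Vld1, RS1, Src2, Vld2, RS2.]
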
                   CDB and Reservation Stations

 After an instruction in an RS completes, a result token is formed and
    passed on the common data bus (CDB) to the register file and, by
    snooping, directly to all RSs (thus eliminating the need to get the operand
    value from a register).
   The traffic passing on the CDB is continually monitored.
   A result on the CDB is copied into all RSs awaiting it.
   The CDB allows all units that are waiting for an operand to be loaded
    simultaneously. Hence, an RS fetches and buffers an operand as soon as it
    becomes available (dataflow principle).
   The load buffers and load/store reservation stations hold data or addresses
    coming from and going to memory.

 Register result status in register set:
    Indicates which reservation station will write each register, if one exists.
    Blank when no pending instruction will write that register.
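
The snooping described above can be sketched as one broadcast step. The dictionary layout and the function name `broadcast` are assumptions for illustration; the field names follow the status tables of this lecture.

```python
def broadcast(tag, data, stations, registers):
    """Deliver the result token (tag, data) to every waiting RS and register."""
    for rs in stations:
        for n in ("1", "2"):
            # An operand waits for this token if it is not valid and names the tag.
            if not rs["vld" + n] and rs["rs" + n] == tag:
                rs["src" + n], rs["vld" + n], rs["rs" + n] = data, True, 0
    for reg in registers.values():
        # A register takes the value only if this station is its pending writer.
        if not reg["vld"] and reg["rs"] == tag:
            reg["value"], reg["vld"], reg["rs"] = data, True, 0


# One station waits for tag 3; register 1 waits for tag 3, register 6 for tag 4.
stations = [{"src1": None, "vld1": False, "rs1": 3,
             "src2": "(R4)", "vld2": True, "rs2": 0}]
registers = {1: {"value": None, "vld": False, "rs": 3},
             6: {"value": None, "vld": False, "rs": 4}}
broadcast(3, "(R3)*(R5)", stations, registers)
print(stations[0]["src1"], registers[1]["value"], registers[6]["vld"])
```

All waiting consumers are updated in the same step, which is the point of the common bus.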

            Three Stages of Tomasulo Algorithm

1. Issue: get the next instruction from the instruction queue
    If a reservation station is free, the Tomasulo algorithm issues the instruction and
    fetches its operands from the registers if possible.
     In-order issue!
2. Execution: operate on operands (EX)
    When both operands are ready, dispatch to the FU and execute;
    if not ready, watch the CDB for the result (check for RAWs).
      Out-of-order dispatch and execution!
3. Write result: finish execution (WB)
    Write on the Common Data Bus to all awaiting units;
    mark the reservation station available.
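
The three stages can be tried out with a small cycle-level sketch. It is not the 360/91 logic. It assumes the timing conventions of this lecture's example (an instruction enters the FU in the cycle after its operands are valid; mul and div need 4 EX cycles, sub and add 1), a lowest-tag-first rule when several stations want the CDB, and it ignores contention for the FUs themselves. All names are hypothetical.

```python
from dataclasses import dataclass

LATENCY = {"mul": 4, "div": 4, "sub": 1, "add": 1}   # EX cycles assumed in the example
SYMBOL = {"mul": "*", "div": "/", "sub": "-", "add": "+"}
STATIONS = {"add": (1, 2), "sub": (1, 2), "mul": (3,), "div": (4,)}  # RS tags per FU


@dataclass
class Station:
    index: int      # position of the instruction in the program
    op: str
    dest: int       # destination register number
    src: list       # operand values, None while still pending
    wait: list      # tag of the producing RS per operand, 0 = value valid
    start: int = 0  # first EX cycle, 0 = not yet dispatched


def simulate(program):
    value = {}      # register -> symbolic value, default "(Rn)"
    producer = {}   # register -> tag of the RS that will write it
    busy = {}       # tag -> Station
    written = {}    # instruction index -> cycle of its CDB write
    pc = cycle = 0
    while pc < len(program) or busy:
        cycle += 1
        free = [t for t in (1, 2, 3, 4) if t not in busy]  # free at start of cycle
        # 3. Write result: one finished station gets the CDB (lowest tag first).
        done = sorted(t for t, s in busy.items()
                      if s.start and cycle >= s.start + LATENCY[s.op])
        if done:
            tag = done[0]
            s = busy.pop(tag)
            result = s.src[0] + SYMBOL[s.op] + s.src[1]
            written[s.index] = cycle
            if producer.get(s.dest) == tag:   # register still waits for this RS
                value[s.dest] = result
                del producer[s.dest]
            for other in busy.values():       # all RSs snoop the CDB
                for i in (0, 1):
                    if other.wait[i] == tag:
                        other.src[i], other.wait[i] = result, 0
        # 1. Issue: in order, one instruction per cycle, if an RS is free.
        if pc < len(program):
            op, dest, a, b = program[pc]
            tags = [t for t in STATIONS[op] if t in free]
            if tags:
                wait = [producer.get(r, 0) for r in (a, b)]
                src = [None if w else value.get(r, f"(R{r})")
                       for r, w in zip((a, b), wait)]
                busy[tags[0]] = Station(pc, op, dest, src, wait)
                producer[dest] = tags[0]
                pc += 1
        # 2. Execute: stations with both operands valid enter the FU next cycle.
        for s in busy.values():
            if not s.start and not any(s.wait):
                s.start = cycle + 1
    return written, value


program = [("mul", 1, 3, 5), ("sub", 2, 4, 3), ("div", 6, 1, 4), ("add", 4, 2, 3)]
written, value = simulate(program)
print(written)   # write cycles: mul 7, sub 4, div 12, add 6
print(value[6])  # (R3)*(R5)/(R4)
```

Because the write stage runs before the issue stage within a cycle, a newly issued instruction sees a result that is on the CDB in the same cycle, which corresponds to the assumption that issue already snoops the CDB.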




                                         Tomasulo Scheduling

Cycle 0
Program: mul Reg1, Reg3, Reg5;  sub Reg2, Reg4, Reg3;  div Reg6, Reg1, Reg4;  add Reg4, Reg2, Reg3
We assume: mul and div need 4 EX cycles, sub and add need 1 EX cycle.

Register status:
  R      1          2          3     4               5     6
  Value  -          -          (R3)  (R4)            (R5)  -
  Vld    1          1          1     1               1     1
  RS     0          0          0     0               0     0

RS status:
  FU    RS  Empty  InFU  Op   Dest  Src1       Vld1  RS1  Src2  Vld2  RS2
  Sadd  1   1
  Sadd  2   1
  Smul  3   1
  Sdiv  4   1

CDB token (token.tag, token.data): none

                                         Tomasulo Scheduling

Cycle 1
Program: mul Reg1, Reg3, Reg5;  sub Reg2, Reg4, Reg3;  div Reg6, Reg1, Reg4;  add Reg4, Reg2, Reg3

Register status:
  R      1          2          3     4               5     6
  Value  -          -          (R3)  (R4)            (R5)  -
  Vld    0          1          1     1               1     1
  RS     3          0          0     0               0     0

RS status:
  FU    RS  Empty  InFU  Op   Dest  Src1       Vld1  RS1  Src2  Vld2  RS2
  Sadd  1   1
  Sadd  2   1
  Smul  3   0      0     mul  1     (R3)       1     0    (R5)  1     0
  Sdiv  4   1

CDB token (token.tag, token.data): none

                                         Tomasulo Scheduling

Cycle 2
Program: mul Reg1, Reg3, Reg5;  sub Reg2, Reg4, Reg3;  div Reg6, Reg1, Reg4;  add Reg4, Reg2, Reg3

Register status:
  R      1          2          3     4               5     6
  Value  -          -          (R3)  (R4)            (R5)  -
  Vld    0          0          1     1               1     1
  RS     3          1          0     0               0     0

RS status:
  FU    RS  Empty  InFU  Op   Dest  Src1       Vld1  RS1  Src2  Vld2  RS2
  Sadd  1   0      0     sub  2     (R4)       1     0    (R3)  1     0
  Sadd  2   1
  Smul  3   0      1     mul  1     (R3)       1     0    (R5)  1     0
  Sdiv  4   1

CDB token (token.tag, token.data): none

                                         Tomasulo Scheduling

Cycle 3
Program: mul Reg1, Reg3, Reg5;  sub Reg2, Reg4, Reg3;  div Reg6, Reg1, Reg4;  add Reg4, Reg2, Reg3

Register status:
  R      1          2          3     4               5     6
  Value  -          -          (R3)  (R4)            (R5)  -
  Vld    0          0          1     1               1     0
  RS     3          1          0     0               0     4

RS status:
  FU    RS  Empty  InFU  Op   Dest  Src1       Vld1  RS1  Src2  Vld2  RS2  Cycles left in FU
  Sadd  1   0      1     sub  2     (R4)       1     0    (R3)  1     0
  Sadd  2   1
  Smul  3   0      1     mul  1     (R3)       1     0    (R5)  1     0    3
  Sdiv  4   0      0     div  6     -          0     3    (R4)  1     0

CDB token (token.tag, token.data): none

                                             Tomasulo Scheduling

Cycle 4
Program: mul Reg1, Reg3, Reg5;  sub Reg2, Reg4, Reg3;  div Reg6, Reg1, Reg4;  add Reg4, Reg2, Reg3

Register status:
  R      1          2          3     4               5     6
  Value  -          (R4)-(R3)  (R3)  -               (R5)  -
  Vld    0          1          1     0               1     0
  RS     3          0          0     2               0     4

RS status:
  FU    RS  Empty  InFU  Op   Dest  Src1       Vld1  RS1  Src2  Vld2  RS2  Cycles left in FU
  Sadd  1   1      1     sub  2     (R4)       1     0    (R3)  1     0
  Sadd  2   0      0     add  4     (R4)-(R3)  1     0    (R3)  1     0
  Smul  3   0      1     mul  1     (R3)       1     0    (R5)  1     0    2
  Sdiv  4   0      0     div  6     -          0     3    (R4)  1     0

CDB token: token.tag = 1, token.data = (R4)-(R3)

sub writes its result on the CDB and frees its RS;
add is issued to RS 2 and gets the result from the CDB in the same cycle.

                                             Tomasulo Scheduling

Cycle 5
Program: mul Reg1, Reg3, Reg5;  sub Reg2, Reg4, Reg3;  div Reg6, Reg1, Reg4;  add Reg4, Reg2, Reg3

Register status:
  R      1          2          3     4               5     6
  Value  -          (R4)-(R3)  (R3)  -               (R5)  -
  Vld    0          1          1     0               1     0
  RS     3          0          0     2               0     4

RS status:
  FU    RS  Empty  InFU  Op   Dest  Src1       Vld1  RS1  Src2  Vld2  RS2  Cycles left in FU
  Sadd  1   1      1     sub  2     (R4)       1     0    (R3)  1     0
  Sadd  2   0      1     add  4     (R4)-(R3)  1     0    (R3)  1     0
  Smul  3   0      1     mul  1     (R3)       1     0    (R5)  1     0    1
  Sdiv  4   0      0     div  6     -          0     3    (R4)  1     0

CDB token (token.tag, token.data): none

                                             Tomasulo Scheduling

Cycle 6
Program: mul Reg1, Reg3, Reg5;  sub Reg2, Reg4, Reg3;  div Reg6, Reg1, Reg4;  add Reg4, Reg2, Reg3

Register status:
  R      1          2          3     4               5     6
  Value  -          (R4)-(R3)  (R3)  (R4)-(R3)+(R3)  (R5)  -
  Vld    0          1          1     1               1     0
  RS     3          0          0     0               0     4

RS status:
  FU    RS  Empty  InFU  Op   Dest  Src1       Vld1  RS1  Src2  Vld2  RS2  Cycles left in FU
  Sadd  1   1      1     sub  2     (R4)       1     0    (R3)  1     0
  Sadd  2   1      1     add  4     (R4)-(R3)  1     0    (R3)  1     0
  Smul  3   0      1     mul  1     (R3)       1     0    (R5)  1     0    0
  Sdiv  4   0      0     div  6     -          0     3    (R4)  1     0

CDB token: token.tag = 2, token.data = (R4)-(R3)+(R3)

add and mul complete in the same cycle and compete for the CDB;
add gets the CDB, mul is deferred.
Please note the WAR hazard, which is resolved automatically:
add updates Reg4 before div starts executing; however, div has already stored
the previous value in its reservation station (only works with in-order issue!).

                                               Tomasulo Scheduling

Cycle 7
Program: mul Reg1, Reg3, Reg5;  sub Reg2, Reg4, Reg3;  div Reg6, Reg1, Reg4;  add Reg4, Reg2, Reg3

Register status:
  R      1          2          3     4               5     6
  Value  (R3)*(R5)  (R4)-(R3)  (R3)  (R4)-(R3)+(R3)  (R5)  -
  Vld    1          1          1     1               1     0
  RS     0          0          0     0               0     4

RS status:
  FU    RS  Empty  InFU  Op   Dest  Src1       Vld1  RS1  Src2  Vld2  RS2
  Sadd  1   1      1     sub  2     (R4)       1     0    (R3)  1     0
  Sadd  2   1      1     add  4     (R4)-(R3)  1     0    (R3)  1     0
  Smul  3   1      1     mul  1     (R3)       1     0    (R5)  1     0
  Sdiv  4   0      0     div  6     (R3)*(R5)  1     0    (R4)  1     0

CDB token: token.tag = 3, token.data = (R3)*(R5)

                                               Tomasulo Scheduling

Cycle 8
Program: mul Reg1, Reg3, Reg5;  sub Reg2, Reg4, Reg3;  div Reg6, Reg1, Reg4;  add Reg4, Reg2, Reg3

Register status:
  R      1          2          3     4               5     6
  Value  (R3)*(R5)  (R4)-(R3)  (R3)  (R4)-(R3)+(R3)  (R5)  -
  Vld    1          1          1     1               1     0
  RS     0          0          0     0               0     4

RS status:
  FU    RS  Empty  InFU  Op   Dest  Src1       Vld1  RS1  Src2  Vld2  RS2
  Sadd  1   1      1     sub  2     (R4)       1     0    (R3)  1     0
  Sadd  2   1      1     add  4     (R4)-(R3)  1     0    (R3)  1     0
  Smul  3   1      1     mul  1     (R3)       1     0    (R5)  1     0
  Sdiv  4   0      1     div  6     (R3)*(R5)  1     0    (R4)  1     0

CDB token (token.tag, token.data): none

                                               Tomasulo Scheduling

Cycle 9
Program: mul Reg1, Reg3, Reg5;  sub Reg2, Reg4, Reg3;  div Reg6, Reg1, Reg4;  add Reg4, Reg2, Reg3

Register status:
  R      1          2          3     4               5     6
  Value  (R3)*(R5)  (R4)-(R3)  (R3)  (R4)-(R3)+(R3)  (R5)  -
  Vld    1          1          1     1               1     0
  RS     0          0          0     0               0     4

RS status:
  FU    RS  Empty  InFU  Op   Dest  Src1       Vld1  RS1  Src2  Vld2  RS2  Cycles left in FU
  Sadd  1   1      1     sub  2     (R4)       1     0    (R3)  1     0
  Sadd  2   1      1     add  4     (R4)-(R3)  1     0    (R3)  1     0
  Smul  3   1      1     mul  1     (R3)       1     0    (R5)  1     0
  Sdiv  4   0      1     div  6     (R3)*(R5)  1     0    (R4)  1     0    3

CDB token (token.tag, token.data): none

                                               Tomasulo Scheduling

Cycle 10
Program: mul Reg1, Reg3, Reg5;  sub Reg2, Reg4, Reg3;  div Reg6, Reg1, Reg4;  add Reg4, Reg2, Reg3

Register status:
  R      1          2          3     4               5     6
  Value  (R3)*(R5)  (R4)-(R3)  (R3)  (R4)-(R3)+(R3)  (R5)  -
  Vld    1          1          1     1               1     0
  RS     0          0          0     0               0     4

RS status:
  FU    RS  Empty  InFU  Op   Dest  Src1       Vld1  RS1  Src2  Vld2  RS2  Cycles left in FU
  Sadd  1   1      1     sub  2     (R4)       1     0    (R3)  1     0
  Sadd  2   1      1     add  4     (R4)-(R3)  1     0    (R3)  1     0
  Smul  3   1      1     mul  1     (R3)       1     0    (R5)  1     0
  Sdiv  4   0      1     div  6     (R3)*(R5)  1     0    (R4)  1     0    2

CDB token (token.tag, token.data): none

                                               Tomasulo Scheduling

Cycle 11
Program: mul Reg1, Reg3, Reg5;  sub Reg2, Reg4, Reg3;  div Reg6, Reg1, Reg4;  add Reg4, Reg2, Reg3

Register status:
  R      1          2          3     4               5     6
  Value  (R3)*(R5)  (R4)-(R3)  (R3)  (R4)-(R3)+(R3)  (R5)  -
  Vld    1          1          1     1               1     0
  RS     0          0          0     0               0     4

RS status:
  FU    RS  Empty  InFU  Op   Dest  Src1       Vld1  RS1  Src2  Vld2  RS2  Cycles left in FU
  Sadd  1   1      1     sub  2     (R4)       1     0    (R3)  1     0
  Sadd  2   1      1     add  4     (R4)-(R3)  1     0    (R3)  1     0
  Smul  3   1      1     mul  1     (R3)       1     0    (R5)  1     0
  Sdiv  4   0      1     div  6     (R3)*(R5)  1     0    (R4)  1     0    1

CDB token (token.tag, token.data): none

                                               Tomasulo Scheduling

Cycle 12
Program: mul Reg1, Reg3, Reg5;  sub Reg2, Reg4, Reg3;  div Reg6, Reg1, Reg4;  add Reg4, Reg2, Reg3

Register status:
  R      1          2          3     4               5     6
  Value  (R3)*(R5)  (R4)-(R3)  (R3)  (R4)-(R3)+(R3)  (R5)  (R3)*(R5)/(R4)
  Vld    1          1          1     1               1     1
  RS     0          0          0     0               0     0

RS status:
  FU    RS  Empty  InFU  Op   Dest  Src1       Vld1  RS1  Src2  Vld2  RS2  Cycles left in FU
  Sadd  1   1      1     sub  2     (R4)       1     0    (R3)  1     0
  Sadd  2   1      1     add  4     (R4)-(R3)  1     0    (R3)  1     0
  Smul  3   1      1     mul  1     (R3)       1     0    (R5)  1     0
  Sdiv  4   1      1     div  6     (R3)*(R5)  1     0    (R4)  1     0    0

CDB token: token.tag = 4, token.data = (R3)*(R5)/(R4)

      Comment on the Original Tomasulo Scheme

 In the original Tomasulo scheme, the CDB is reserved at least two cycles
  in advance, so
   each instruction stays at least two cycles in the EX phase, and
   CDB resource conflicts are resolved at CDB reservation time (before
    execution).
  In contrast, we assume CDB resource conflict resolution in the WB stage
  (see cycle 6 in example).
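
Conflict resolution in the WB stage, as assumed here, can be sketched as a simple arbiter. The priority rule (lowest reservation station tag wins) is an assumption: the slides do not state the rule, they only show that add (tag 2) gets the CDB before mul (tag 3) in cycle 6.

```python
def grant_cdb(requests):
    """Pick one RS tag out of those requesting the CDB in this cycle.

    Assumed rule: the lowest tag wins. Returns (winner, deferred tags).
    """
    if not requests:
        return None, []
    ordered = sorted(requests)
    return ordered[0], ordered[1:]


winner, deferred = grant_cdb([3, 2])  # mul (RS 3) and add (RS 2) finish together
print(winner, deferred)               # 2 [3]
```

A deferred station simply requests the bus again in the next cycle, which is why mul writes one cycle later than it completes.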

 What happens when an instruction is issued and one of its operands is on
  the CDB in the same cycle?
  This is left unclear in the original Tomasulo paper!
  We assume the instruction already snoops the CDB in the issue phase
  (see cycle 4 in example).




                           Tomasulo Summary

 Prevents the registers from becoming a bottleneck (forwarding from the CDB to
  the reservation stations)
 Avoids WAR and WAW hazards
 Not limited to basic blocks (provided branch prediction is available)
 Lasting contributions:
     dynamic scheduling
     register renaming in reservation stations
 However: single-issue scheme, in-order issue scheme!

 Implementation in IBM 360/91




                                    IBM 360/91

 Belongs to the IBM System/360 family, whose members all share the same ISA.
 The IBM System/360 Model 91 was deeply pipelined
    (overall pipeline length was 20 stages).
   Floating-point execution unit: two separate, fully pipelined floating-point FUs, the
    adder and the multiplier/divider. The FUs could be used concurrently.
   Addition took two cycles, multiplication three cycles, and division eleven cycles.
   Three reservation stations (RS) were associated with the adder, and two with the
    multiplier/divider.
   A simple branch prediction was used: a branch was predicted taken
    when the branch target instruction was within the last eight instructions.
   Memory had a 10-cycle access time; it was fully buffered and 32-way interleaved.
    The processor could have up to 32 memory accesses pending to hide latency.
   But no cache.
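
Why interleaving helps with a 10-cycle memory can be seen in a small sketch. The bank mapping (address modulo the number of banks) and the limit of one new request per cycle are assumptions for illustration; the slides give only the access time and the degree of interleaving.

```python
def completion_cycles(addresses, banks=32, access=10):
    """Cycle in which each access completes, under the assumptions named above."""
    bank_free = [0] * banks  # first cycle in which each bank can start a new access
    start = 0
    done = []
    for addr in addresses:
        bank = addr % banks                       # assumed low-order interleaving
        start = max(start + 1, bank_free[bank])   # wait while the bank is still busy
        bank_free[bank] = start + access
        done.append(start + access)
    return done


print(completion_cycles(range(4)))     # different banks: [11, 12, 13, 14]
print(completion_cycles([0, 32, 64]))  # same bank:       [11, 21, 31]
```

Accesses to different banks overlap and complete one per cycle, whereas accesses that hit the same bank are serialized by the full access time.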


                                    IBM 360/91

[Figure: floating-point unit of the IBM 360/91. Inputs from the store unit and
 from the instruction unit; Floating-Point Buffers (FLB); Floating-Point
 Operating Stack; decoder; Floating-Point Registers (FLR); reservation stations
 in front of the add unit and the multiply/divide unit; output to the store
 unit; Common Data Bus (CDB).]

            IBM 360/91 Implementation Details

 The processor had about 120 000 gates implemented in ECL technology
  with a 60 ns basic CPU clock.
 IBM produced about 12 of the IBM System/360 Model 91 and
  perhaps twice that number of Model 195
  (which was based on Model 91 but had a faster cycle and incorporated a
  cache).
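
For orientation, the 60 ns cycle can be combined with the cycle counts quoted earlier in this lecture (addition 2, multiplication 3, division 11, memory access 10 cycles). The sketch below is plain arithmetic; the variable and function names are made up for illustration.

```python
CYCLE_NS = 60  # basic CPU clock period from the slide


def latency_ns(cycles):
    """Latency in nanoseconds of an operation taking the given number of cycles."""
    return cycles * CYCLE_NS


clock_mhz = 1000 / CYCLE_NS  # about 16.7 MHz
for name, cycles in [("add", 2), ("multiply", 3), ("divide", 11), ("memory", 10)]:
    print(name, latency_ns(cycles), "ns")
```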




                   Lessons Learned from CISC

 Modern processors use ideas from both the RISC and the CISC approach.
 Out-of-order execution is not a new concept: it existed twenty-five years
  ago on CISC machines, on the CDC 6600 as scoreboarding and on the IBM
  System/360 Model 91 as the Tomasulo scheme.
 Out-of-order scheduling is quite similar to dataflow and is referred to as
  micro dataflow by microprocessor researchers.

 Next: Chapter 4: Multiple-issue (Superscalar Processors)



