					                          TurboROB
A Low Cost Checkpoint/Restore Accelerator

              Patrick Akl and Andreas Moshovos
                    AENAO Research Group
               Department of Electrical and Computer Engineering

                        University of Toronto
              {pakl, moshovos}@eecg.toronto.edu

HIPEAC 2008                        TurboROB                        1/25
Recovering From Control Flow Mispredictions
[Figure: execution timeline. A branch outcome is predicted and execution follows the predicted path. When the misprediction is discovered, the processor recovers its state, redirects fetch, and resumes execution on the correct path.]

                 • Accelerate Recovery – Improve Performance
                   State-of-the-Art Recovery

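[Figure: timeline from "Predict a Branch Outcome" to "Misprediction Discovered", with two recovery mechanisms: a log of changes (the ROB, recording what changed and its old value) and a state snapshot.]
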
              • Scalability and/or Performance Issues

                             Turbo-ROB

[Figure: timeline from "Predict a Branch Outcome" to "Misprediction Discovered", with the ROB as the log of changes and, alongside it, a partial log of changes.]


• Make common case fast:
      – Recover only at branches
• Store only as much as needed:
      – Partial Log
                     Outline

• Control Flow Misspeculation Recovery

• TurboROB

• Methodology and Results

• Summary




State Recovery Example: Register Alias Table
    Original Code
  A add r1, r2, 100
  B breq r1, E
  C sub r1, r2, r2

    Renamed Code
  A add p4, p2, 100
  B breq p4, E
  C sub p5, p2, p2

[Figure: the RAT, indexed by architectural register (Lg(# arch. regs) index bits, # arch. regs entries). Each entry holds a physical register name; the values shown are p1, p4, p5, p2, and p3.]
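
The renaming step in this example can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `rename` helper, the tuple encoding of instructions, the initial mapping (r1 to p1, r2 to p2), and the free list (p4, p5) are assumptions chosen to match the example. With them, the destination of C comes out as p5, the next free physical register.

```python
def rename(code, rat, free_list):
    """Rename a list of (op, dest, sources) tuples in program order.

    rat maps architectural to physical register names and is updated in place.
    Returns the renamed code and an undo log of (arch_reg, previous_mapping).
    """
    renamed, undo_log = [], []
    for op, dest, sources in code:
        # Sources are looked up before the destination is remapped.
        # Immediates and labels are not in the RAT and pass through unchanged.
        new_sources = [rat.get(s, s) for s in sources]
        new_dest = None
        if dest is not None:
            undo_log.append((dest, rat[dest]))  # remember the old mapping
            new_dest = free_list.pop(0)         # allocate a free physical register
            rat[dest] = new_dest
        renamed.append((op, new_dest, new_sources))
    return renamed, undo_log


# Assumed initial state (hypothetical, chosen to match the slide's example).
rat = {"r1": "p1", "r2": "p2"}
code = [
    ("add", "r1", ["r2", "100"]),   # A
    ("breq", None, ["r1", "E"]),    # B
    ("sub", "r1", ["r2", "r2"]),    # C
]
renamed, undo_log = rename(code, rat, ["p4", "p5"])
for op, dest, sources in renamed:
    print(op, ", ".join(([dest] if dest else []) + sources))
```

The undo log built here holds exactly the two fields an ROB entry needs for recovery: the architectural destination and its previous RAT mapping.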


                   ROB: Slow, Fine-Grain Recovery
Each entry contains
1. Architectural destination register
2. Its previous RAT map

[Figure: reorder buffer in program order, containing several branches (B), next to the RAT. Recovery steps: 1. Misprediction discovered; 2. Locate newest instruction; 3. Undo RAT updates in reverse order.]

     • Too slow: recovery latency proportional to number
       of instructions to squash
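
The three recovery steps can be sketched as a backward walk over the ROB. This is an illustrative model under stated assumptions: `rob_recover`, the list-based ROB, and the `(arch_dest, previous_mapping)` entry layout are hypothetical names, and the starting state is the one from the renaming example (r1 mapped to p5 after instruction C).

```python
def rob_recover(rat, rob, branch_index):
    """Squash every ROB entry younger than rob[branch_index].

    rob is a list in program order; each entry is (arch_dest, previous_mapping),
    with arch_dest set to None for instructions that write no register.
    Returns the number of entries walked, a stand-in for recovery latency.
    """
    steps = 0
    while len(rob) > branch_index + 1:
        arch_dest, previous_mapping = rob.pop()   # newest instruction first
        if arch_dest is not None:
            rat[arch_dest] = previous_mapping     # undo this RAT update
        steps += 1
    return steps


# Assumed state after renaming A, B, C; the branch B sits at index 1.
rat = {"r1": "p5", "r2": "p2"}
rob = [("r1", "p1"), (None, None), ("r1", "p4")]
steps = rob_recover(rat, rob, branch_index=1)
print(rat, steps)
```

The loop runs once per squashed instruction, which is the proportional recovery latency the slide objects to.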
Global Checkpoints: Fast, Coarse-Grain Recovery
[Figure: reorder buffer in program order with a checkpoint at each of four branches (B), next to the RAT. Step 1: misprediction discovered.]

   • Branch w/ GC: Recovery is “Instantaneous”
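
A global checkpoint can be modeled as a full copy of the RAT taken at a branch. The sketch below is a software analogy only, with hypothetical names (`CheckpointedRAT`, `take_checkpoint`, `restore`) and an assumed fixed checkpoint budget. In hardware the restore is a parallel copy, which is why recovery appears instantaneous.

```python
class CheckpointedRAT:
    """RAT with a limited number of full-state snapshots (illustrative model)."""

    def __init__(self, mapping, max_checkpoints):
        self.map = dict(mapping)
        self.max_checkpoints = max_checkpoints
        self.checkpoints = {}  # branch id -> snapshot, in allocation order

    def take_checkpoint(self, branch_id):
        """Snapshot the RAT at a branch; fails when no checkpoint is free."""
        if len(self.checkpoints) >= self.max_checkpoints:
            return False
        self.checkpoints[branch_id] = dict(self.map)
        return True

    def restore(self, branch_id):
        """Restore the snapshot in one step, releasing it and all younger ones."""
        ids = list(self.checkpoints)
        self.map = dict(self.checkpoints[branch_id])
        for stale in ids[ids.index(branch_id):]:
            del self.checkpoints[stale]


rat = CheckpointedRAT({"r1": "p4", "r2": "p2"}, max_checkpoints=1)
covered_first = rat.take_checkpoint("B")    # True: a checkpoint is available
rat.map["r1"] = "p5"                        # speculative update after B
covered_second = rat.take_checkpoint("B2")  # False: no checkpoint left
rat.restore("B")
print(rat.map, covered_first, covered_second)
```

The failed second call illustrates the coverage problem: with only a few checkpoints, some branches must fall back to a slower recovery mechanism.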
                 Impact of More Checkpoints

[Figure: "Concept" versus "Actual Implementation" of RAT checkpoints. Labels: RAT, working copy, architectural register, physical register.]

• More checkpoints?
   – Power hungry structure
   – Increased delay


• Only a few checkpoints can practically be implemented
   – Cannot always cover all branches
          Intelligent Checkpointing & BranchTap


[Figure: four branches (B) in program order, each labeled with a checkpoint.]




   • Use Few Checkpoints Effectively

   • BranchTap:
         – Throttle Speculation


Conventional Mechanisms: Recovery Scenarios


[Figure: three recovery scenarios, each showing three branches (B) with a single checkpoint in a different position. One scenario is labeled "Re-Execution".]
                     Outline

• Background

• Turbo-ROB

• Methodology and Results

• Summary




                      Turbo-ROB




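[Figure: ROB recovery after branch B undoes the logged updates to R1, R2, R1, R2, R1, so recovery cost grows with the number of updates. The first updates to R1 and R2 after the branch are marked "useful"; the later ones are marked "redundant".]

  We only need to reverse the first subsequent change for every RAT entry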
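
The partial-log idea can be sketched as follows. This is a minimal single-branch model, not the hardware organization from the paper: `PartialLog`, its methods, and the set used to track already-logged registers are assumptions made for illustration.

```python
class PartialLog:
    """Logs only the first change to each RAT entry after a branch."""

    def __init__(self):
        self.entries = []    # (arch_reg, mapping at the branch)
        self.logged = set()  # registers already logged since the branch

    def record(self, arch_reg, previous_mapping):
        """Log an update unless this register was already logged."""
        if arch_reg in self.logged:
            return False     # redundant for recovery: skip it
        self.logged.add(arch_reg)
        self.entries.append((arch_reg, previous_mapping))
        return True

    def recover(self, rat):
        """Restore the RAT to its state at the branch; returns steps taken."""
        for arch_reg, previous_mapping in self.entries:
            rat[arch_reg] = previous_mapping
        steps = len(self.entries)
        self.entries.clear()
        self.logged.clear()
        return steps


# Update sequence after the branch, as in the figure: R1, R2, R1, R2, R1.
rat = {"R1": "p1", "R2": "p2"}
log = PartialLog()
updates = [("R1", "p3"), ("R2", "p4"), ("R1", "p5"), ("R2", "p6"), ("R1", "p7")]
for arch_reg, new_mapping in updates:
    log.record(arch_reg, rat[arch_reg])
    rat[arch_reg] = new_mapping
steps = log.recover(rat)
print(rat, steps)
```

Each register appears at most once in the log, so the recovery order does not matter and the cost is bounded by the number of distinct registers written rather than by the number of squashed instructions.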

              Turbo-ROB Replacing the ROB


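[Figure: two scenarios, each showing three branches (B) in program order with the TROB. The second scenario is labeled "Re-Execution".]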




                  Selective Turbo-ROB w/ ROB


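[Figure: three branches (B) in program order, with the TROB.]

                  Selective Turbo-ROB w/ GCs

[Figure: three branches (B) in program order, with the TROB and one checkpoint.]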
                   Outline

• Background

• TurboROB

• Methodology and Results

• Summary




                      Results Overview

• TROB as an ROB replacement
      – BranchTap offers better performance than ROB
      – Fewer resources
      – Even for smaller windows



• Selective TROB as a GC reduction mechanism
      – TROB reduces pressure for GCs
      – Offload a critical structure: RAT


• In the paper:
      – Selective TROB as an ROB accelerator
      – Even the smallest TROB accelerates recovery

                         Methodology

• Simulator based on SimpleScalar
      – Alpha/OSF


• 24 SPEC CPU 2000 benchmarks

• Reference Inputs

• Processor configurations
      – 4-way OoO core
      – 128/256/512 in-flight instructions
      – 1K-entry confidence table for low-confidence branch
        identification (similar results with Anyweak)


• 1B committed instructions after skipping 2B
       “Perfect Checkpointing” Configuration

• A checkpoint is auto-magically taken at all
  mispredicted branches
      – All recoveries are fast


• We report the “deterioration relative to perfect
  checkpointing”
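
The slides do not spell out the formula behind this metric. One plausible reading, shown here only as an assumption, is the relative performance loss of a configuration against the perfect-checkpointing baseline. The function name and the IPC values below are hypothetical and are not results from the study.

```python
def deterioration(ipc_perfect, ipc_config):
    """Relative performance loss versus perfect checkpointing (assumed definition)."""
    return (ipc_perfect - ipc_config) / ipc_perfect


# Hypothetical IPC values, for illustration only.
loss = deterioration(ipc_perfect=2.0, ipc_config=1.8)
print(f"{loss:.1%}")
```

Under this reading, 0% means a configuration recovers as well as perfect checkpointing, and larger values mean slower recovery costs more performance.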




TROB Replacing the ROB/512-Entry Window

[Figure: bar chart with a y-axis from 0% to 50% and a "better" direction marker. Series: ROB, TROB_32, TROB_64, TROB_128, TROB_256, TROB_512. Benchmarks: 164.gzip, 176.gcc, 179.art, 197.parser, 301.apsi, and AVG.]

• 64-entry TROB == ROB on the Average
• Pathological cases exist → 256-entry needed
• 512-Entry TROB better than ROB

TROB Replacing the ROB/128-Entry Window

[Figure: bar chart with a y-axis from 0% to 50% and a "better" direction marker. Series: ROB, TROB_32, TROB_64, TROB_128. Benchmarks: 164.gzip, 176.gcc, 179.art, 197.parser, 301.apsi, and AVG.]

• 64-Entry 50% better than ROB
• Fewer pathological cases
• 128-Entry TROB better than ROB

sTROB and Global Checkpoints/128-Entry Window

[Figure: chart with a "better" direction marker; no series or axis labels survive in the source.]

  • TROB + 1 GC better than 4 GCs



                                    Summary
• TROB vs. ROB
      – Replacement
              • Same resources → better performance
              • Fewer resources → often better performance
                  – Except when accuracy is high
      – Acceleration:
              • ¼ resources → 35% improvement


• TROB vs. GCs
      – Reduce pressure from the critical path
      – With just 1 GC match the performance of four GCs


• One more alternative for designers
      – Allows different area/performance/power tradeoffs


				