Hybrid Transactional Memory



Sanjeev Kumar, Intel Labs
Michael Chu, University of Michigan
Christopher Hughes, Intel Labs
Partha Kundu, Intel Labs
Anthony Nguyen, Intel Labs
Promise of Transactional Memory (TM)
  1  Easier to program
       Compose naturally
  2  Easier to get parallel performance
  3  No deadlocks
  4  Maintain consistency in the presence of errors
  5  Avoid priority inversion and convoying
  6  Support fault tolerance

  With locks:
      lock(l1); lock(l2);
      A = A - 10;
      B = B + 10;
      ...
      if ( error )
          recovery_code();
      unlock(l1); unlock(l2);

  With a transaction:
      transaction {
          A = A - 10;
          B = B + 10;
          ...
          if ( error )
              abort_transaction;
      }

  Simplify Parallel Programming
Intel Labs                Hybrid Transactional Memory                        2
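To make the contrast concrete, here is a minimal Python sketch of the all-or-nothing behavior the transaction version promises. It is a toy under stated assumptions: `transaction` and `AbortTransaction` are hypothetical names, isolation comes from one global lock rather than from optimistic hardware or software TM, and rollback is done by restoring a snapshot of a small dictionary.

```python
import threading
from contextlib import contextmanager

# Hypothetical toy: one global lock stands in for isolation; a real TM runs optimistically.
_global_lock = threading.Lock()


class AbortTransaction(Exception):
    """Raised by the programmer to abort, like the slide's abort_transaction."""


@contextmanager
def transaction(shared):
    """All-or-nothing update of `shared`: on any exception, restore the snapshot."""
    with _global_lock:
        snapshot = dict(shared)
        try:
            yield shared
        except BaseException as exc:
            shared.clear()
            shared.update(snapshot)          # none of the operations take effect
            if not isinstance(exc, AbortTransaction):
                raise                        # genuine errors still propagate


accounts = {"A": 100, "B": 0}


def transfer(amount, error):
    with transaction(accounts) as acc:
        acc["A"] = acc["A"] - amount
        acc["B"] = acc["B"] + amount
        if error:
            raise AbortTransaction()        # no hand-written recovery_code() needed


transfer(10, error=False)   # commits: A = 90, B = 10
transfer(10, error=True)    # aborts: A and B keep their committed values
print(accounts)             # {'A': 90, 'B': 10}
```

In the lock version, the programmer must write `recovery_code()` to undo the partial update by hand; here the rollback on abort is automatic.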
Flavors of Transactional Memory
  Basic TM:
    1  Easier to program (compose naturally)
    2  Easier to get parallel performance
    3  No deadlocks
  TM with support for programmer abort adds:
    4  Maintain consistency in the presence of errors
  TM with support for nonblocking adds:
    5  Avoid priority inversion and convoying
    6  Support fault tolerance
  Our work: efficient support for a TM that provides all of these features
TM Implementations
  TM requires versioning support and conflict detection
  Hardware approach [ Herlihy’93 ]
    Bounded number of locations
    Maintains versions in the cache → low overhead
  Pure-software approach [ Herlihy’03, Harris’03 ]
    An unbounded number of locations can be accessed within a transaction
    Slow due to the overhead of maintaining multiple copies
      ─   Potentially by orders of magnitude
  Unbounded hardware approach [ Hammond’04, Ananian’05, Rajwar’05, Moore’06 ]
    Requires significant hardware support
    Discussed in more detail in the paper

Hardware vs. Software TM
  Hardware approach                      Software approach
  Low overhead                           High overhead
    Buffers transactional state            Uses object copying to keep
    in the cache                           transactional state
  More concurrency                       Less concurrency
    Cache-line granularity                 Object granularity
  Bounded resources                      No resource limits
    Assembly                               High-level languages
    Within a module                        Across modules

  Useful BUT limited                     Useful BUT limited
  to library writers                     to special data structures

  Neither is satisfactory for broader use

This Work
  A Hybrid Transactional Memory Scheme
    Requires modest hardware support
      Changes are localized
    Supports an unbounded number of locations
      Performance of hardware when within hardware resource limits
      (low overhead of pure hardware TM)
      Gracefully falls back to software if the hardware resource limits
      are exceeded (unbounded resources of pure software TM)
  Experimentally demonstrate the effectiveness of our approach
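The fallback policy on this slide can be sketched in a few lines of Python. This is a single-threaded toy under explicit assumptions: `HardwareMode`, `SoftwareMode`, `HardwareOverflow`, `run_hybrid` and the default capacity of 4 buffered stores are all hypothetical, conflict detection is omitted, and the software path is only a stand-in for the object-copying scheme. What it shows is the control flow: run in hardware mode while the buffered state fits, and on overflow drop the speculative state and re-execute the transaction in software mode.

```python
class HardwareOverflow(Exception):
    """Hypothetical signal that a transaction exceeded the hardware's buffering."""


class HardwareMode:
    """Toy hardware mode: stores are buffered in a bounded structure until commit."""

    def __init__(self, memory, capacity):
        self.memory = memory
        self.capacity = capacity      # hypothetical bound (think TM cache lines)
        self.buffer = {}              # speculative stores, invisible until commit

    def load(self, addr):
        return self.buffer.get(addr, self.memory.get(addr, 0))

    def store(self, addr, value):
        if addr not in self.buffer and len(self.buffer) == self.capacity:
            raise HardwareOverflow(addr)
        self.buffer[addr] = value

    def commit(self):
        self.memory.update(self.buffer)


class SoftwareMode:
    """Toy stand-in for the unbounded software mode (no object copying modeled)."""

    def __init__(self, memory):
        self.memory = memory
        self.log = {}                 # unbounded write log

    def load(self, addr):
        return self.log.get(addr, self.memory.get(addr, 0))

    def store(self, addr, value):
        self.log[addr] = value

    def commit(self):
        self.memory.update(self.log)


def run_hybrid(body, memory, hw_capacity=4):
    """Try hardware mode first; on overflow, re-execute the body in software mode."""
    hw = HardwareMode(memory, hw_capacity)
    try:
        body(hw)
        hw.commit()
        return "hardware"
    except HardwareOverflow:
        pass                          # the speculative buffer is simply dropped
    sw = SoftwareMode(memory)
    body(sw)
    sw.commit()
    return "software"


memory = {}
print(run_hybrid(lambda tx: tx.store(0, tx.load(0) + 1), memory))        # hardware
print(run_hybrid(lambda tx: [tx.store(a, a) for a in range(10)], memory))  # software
```

Because nothing reaches `memory` before commit, an overflowing hardware attempt leaves no trace, so the software re-execution starts from a clean state.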

    Outline
   Motivation
   Proposed Architectural Support
   Hybrid Transactional Memory
   Performance Evaluation
   Conclusions
ISA Extensions
  Start of a transaction
    Begin Transaction All ( XBA ) or Select ( XBS )
    Save Register State ( SSTATE )
    Specify handler on abort due to conflict ( XHAND )
  During a transaction
    Perform memory loads and stores
    Override the default access type ( LDX, STX, LDR, STR )
  On transaction abort
    Explicit Abort Transaction ( XA )
    Restore Register State ( RSTATE )
  On transaction commit
    Commit Transaction ( XC )
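The following Python sketch models what these instructions do from the program's point of view. It is a hypothetical single-core toy, not the paper's ISA definition or simulator: the method names mirror the mnemonics, `tx=True` / `tx=False` plays the role of the LDX/STX and LDR/STR overrides, and we assume that XBA makes every access transactional by default while XBS makes only explicitly marked accesses transactional. Conflict detection is reduced to an explicit `conflict()` call that runs the XHAND handler.

```python
class ToyCore:
    """Hypothetical single-core model of the slide's TM instructions."""

    def __init__(self, memory):
        self.memory = memory          # shared memory: address -> value
        self.regs = {}                # architectural registers
        self.saved_regs = None        # snapshot taken by SSTATE
        self.tx_buffer = None         # speculative stores; None outside a transaction
        self.default_tx = False       # assumed: XBA -> True, XBS -> False
        self.handler = None           # abort handler registered by XHAND

    # Start of a transaction
    def xba(self):
        self.tx_buffer, self.default_tx = {}, True

    def xbs(self):
        self.tx_buffer, self.default_tx = {}, False

    def sstate(self):
        self.saved_regs = dict(self.regs)

    def xhand(self, handler):
        self.handler = handler

    # During a transaction: tx=None uses the default, True ~ LDX/STX, False ~ LDR/STR
    def _transactional(self, tx):
        if self.tx_buffer is None:
            return False
        return self.default_tx if tx is None else tx

    def load(self, addr, tx=None):
        if self._transactional(tx) and addr in self.tx_buffer:
            return self.tx_buffer[addr]
        return self.memory.get(addr, 0)

    def store(self, addr, value, tx=None):
        if self._transactional(tx):
            self.tx_buffer[addr] = value      # buffered until XC
        else:
            self.memory[addr] = value         # visible immediately

    # On abort
    def xa(self):
        self.tx_buffer = None                 # drop all speculative stores

    def rstate(self):
        self.regs = dict(self.saved_regs)

    def conflict(self):
        """Models an abort due to conflict: discard state, then run the XHAND handler."""
        self.xa()
        if self.handler is not None:
            self.handler(self)

    # On commit
    def xc(self):
        self.memory.update(self.tx_buffer)    # all buffered stores become visible together
        self.tx_buffer = None


mem = {"A": 100, "log": 0}
core = ToyCore(mem)
core.xba()
core.sstate()
core.regs["r1"] = 7
core.store("A", core.load("A") - 10)      # transactional by default under XBA
core.store("log", 1, tx=False)            # STR-style store escapes the transaction
core.xa()
core.rstate()
print(mem, core.regs)                     # {'A': 100, 'log': 1} {}
```

The usage shows why register save/restore sits next to begin/abort: after XA and RSTATE, both memory and registers are back to their pre-transaction values, except for the explicitly regular store.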
Baseline CMP Architecture
  [Figure: several cores, each with a private L1 $, connected through an interconnect to a shared L2 $]
  Our proposed changes are modest and localized
    Modifications to
      Core
      L1 $
    No changes to
      Interconnect
      Coherence protocol
      L2 $
      Memory
Hardware Support for TM
  [Figure: the core sends regular accesses to the L1 $ and transactional accesses to a separate Transactional $. Each L1 $ line holds a tag and data; each Transactional $ line holds a tag, an additional tag, old data and new data. Both caches connect to the interconnect and the L2.]
  Three requirements:
    Maintain two versions
    Detect conflicts
      Same core: tag
      Another core: cache coherence
    Atomic commit and abort
  Bounded
    Capacity of TM $
    Associativity of TM $
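As a rough illustration of these requirements and bounds, the Python sketch below models a set-associative transactional cache in which each line keeps an old and a new version. The geometry (`num_sets`, `ways`, `line_words`), the class and method names, and the rule that any remote write to a transactional line aborts the transaction are assumptions for illustration; cache coherence is reduced to an explicit `remote_write` call.

```python
class TxCacheOverflow(Exception):
    """The transaction needs more lines than one set (associativity) can hold."""


class TransactionalCache:
    """Toy set-associative TM cache; each line keeps an old and a new version."""

    def __init__(self, memory, num_sets=2, ways=2, line_words=4):
        self.memory = memory                  # backing store (stands in for L2)
        self.num_sets, self.ways, self.line_words = num_sets, ways, line_words
        self.sets = [{} for _ in range(num_sets)]   # per set: line tag -> versions
        self.aborted = False

    def _line(self, addr):
        tag = addr // self.line_words
        return self.sets[tag % self.num_sets], tag

    def _entry(self, addr):
        cache_set, tag = self._line(addr)
        if tag not in cache_set:
            if len(cache_set) == self.ways:   # no free way: bounded resources exceeded
                raise TxCacheOverflow(f"set {tag % self.num_sets} is full")
            base = tag * self.line_words
            old = {a: self.memory.get(a, 0) for a in range(base, base + self.line_words)}
            cache_set[tag] = {"old": old, "new": dict(old)}
        return cache_set[tag]

    def tx_load(self, addr):
        return self._entry(addr)["new"][addr]

    def tx_store(self, addr, value):
        self._entry(addr)["new"][addr] = value

    def regular_load(self, addr):
        """A non-transactional read sees the old (committed) version."""
        cache_set, tag = self._line(addr)
        if tag in cache_set:
            return cache_set[tag]["old"][addr]
        return self.memory.get(addr, 0)

    def remote_write(self, addr):
        """Coherence request from another core; conflicts are line-granular."""
        cache_set, tag = self._line(addr)
        if tag in cache_set:
            self.abort()
            return True
        return False

    def commit(self):
        for cache_set in self.sets:
            for versions in cache_set.values():
                self.memory.update(versions["new"])
            cache_set.clear()

    def abort(self):
        for cache_set in self.sets:
            cache_set.clear()                 # new versions are discarded
        self.aborted = True


mem = {}
cache = TransactionalCache(mem, num_sets=2, ways=1, line_words=4)
cache.tx_store(0, 7)                          # line 0 maps to set 0
print(cache.tx_load(0), cache.regular_load(0), mem)   # 7 0 {}
cache.commit()
print(mem[0])                                 # 7
```

With `ways=1`, a second transactional line that maps to the same set overflows even though the other set is empty, which is the associativity bound the slide lists alongside capacity.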

    Outline
   Motivation
   Proposed Architectural Support
   Hybrid Transactional Memory
       Existing pure software scheme
       Our hybrid scheme
   Performance Evaluation
   Conclusions
Pure Software TM [ Herlihy’03 ]
  We use this pure software TM as a starting point
  Implemented without any special architectural support, using two techniques
    Use copies of objects to keep transactional state
      ─   Make modifications on the copy during a transaction
    Add a level of indirection
      ─   Switch the versions when a transaction commits

  [Figure: the Object Pointer refers to a record holding a State Pointer, Old and New; the State Pointer refers to the owning transaction's State, and Old and New each refer to a copy of the Object Contents]

  State        Valid copy
  Active       Old
  Aborted      Old
  Committed    New
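The state table can be read as a small rule: an object's current value is selected through the owning transaction's state, so a single change to that state switches every object the transaction wrote. A minimal Python sketch, where `Status`, `TxState`, `Locator` and `valid_copy` are hypothetical names for the slide's State, the record holding State Pointer/Old/New, and the table lookup:

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    ABORTED = "aborted"
    COMMITTED = "committed"


@dataclass
class TxState:
    """One per transaction; every object it opened refers to this record."""
    status: Status = Status.ACTIVE


@dataclass
class Locator:
    """The record the Object Pointer refers to."""
    state: TxState   # State Pointer: the owning transaction's state
    old: dict        # Old: contents before the owner's modifications
    new: dict        # New: the owner's private, modified copy


def valid_copy(loc: Locator) -> dict:
    """Active -> Old, Aborted -> Old, Committed -> New (the slide's table)."""
    if loc.state.status is Status.COMMITTED:
        return loc.new
    return loc.old


tx = TxState()
a = Locator(tx, old={"x": 1}, new={"x": 2})
b = Locator(tx, old={"y": 10}, new={"y": 20})
print(valid_copy(a), valid_copy(b))   # {'x': 1} {'y': 10}
tx.status = Status.COMMITTED          # one write switches both objects
print(valid_copy(a), valid_copy(b))   # {'x': 2} {'y': 20}
```

In a concurrent implementation the commit would be an atomic compare-and-swap of the status from active to committed; the plain assignment here is only for illustration.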


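Pure Software TM Scheme Cont’d
  Before accessing an object within a transaction:
  [Figure: the Object Pointer is redirected (X) from the current record to a new one. In the current record, Old refers to the valid copy of the Object Contents. In the new record, the State Pointer refers to the accessing transaction's State, Old refers to that valid copy, and New refers to a fresh copy of the Object Contents that the transaction modifies.]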
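The step in the figure, opening an object before modifying it, can be sketched in Python as follows. This is a simplified illustration in the spirit of the scheme described here, not the paper's code: all names are hypothetical, a `threading.Lock` stands in for a hardware compare-and-swap on the object pointer, and when the current owner is still active the sketch simply aborts it, which is only one of many possible contention-management policies.

```python
import copy
import threading
from dataclasses import dataclass


@dataclass
class TxState:
    status: str = "active"            # "active" | "committed" | "aborted"


@dataclass
class Locator:
    state: TxState                    # State Pointer
    old: dict                         # Old copy
    new: dict                         # New copy


def valid_copy(loc):
    return loc.new if loc.state.status == "committed" else loc.old


class ObjectRef:
    """The Object Pointer; only ever changed by compare-and-swap."""

    def __init__(self, contents):
        self.locator = Locator(TxState("committed"), contents, contents)
        self._guard = threading.Lock()   # stands in for an atomic CAS instruction

    def compare_and_swap(self, expected, replacement):
        with self._guard:
            if self.locator is not expected:
                return False
            self.locator = replacement
            return True


def open_for_write(ref, tx):
    """Install a new record owned by tx and return the private copy to modify."""
    while True:
        current = ref.locator
        if current.state is tx:
            return current.new           # already opened by this transaction
        if current.state.status == "active":
            # Assumed policy: abort the other owner (a real STM would CAS the status).
            current.state.status = "aborted"
        base = valid_copy(current)       # Old of the new record = current valid copy
        candidate = Locator(tx, base, copy.deepcopy(base))
        if ref.compare_and_swap(current, candidate):
            return candidate.new         # New of the new record: modify this copy


account = ObjectRef({"balance": 100})
tx = TxState()
obj = open_for_write(account, tx)
obj["balance"] -= 10
print(valid_copy(account.locator))   # {'balance': 100}  (tx still active)
tx.status = "committed"
print(valid_copy(account.locator))   # {'balance': 90}
```

If the compare-and-swap fails because another thread redirected the pointer first, the loop rereads the pointer and retries, so the object pointer always moves from one consistent record to the next.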

Our Hybrid Transactional Memory
  Two modes: hardware mode and software mode
    The two modes need to coexist
    Non-solution: make all threads transition between modes in lockstep
  Avoid versioning overheads (allocation and copying) in the hardware mode
    Still incur the indirection overheads

  Tricky because it needs to bridge the hardware and software schemes
    Hardware mode needs to modify data in place
      ─   Pure software TM assumes data is never modified in place
    Different sharing granularity
      ─   Cache line (hardware) vs. object (software)
    Different conflict detection schemes
      ─   Data (hardware) vs. state (software)

Hybrid Scheme Example
  Conflicts are detected by the threads in the hardware mode
  [Figure: the Object Pointer is redirected (X) from the current record (State Pointer, Old, New, with Old and New each referring to a copy of the Object Contents) to a new record.
   In the hardware mode (Thread 1: HW mode, Thread 2: HW mode): modify the object contents in place.
   In the software mode (Thread 3: SW mode): copy and modify.]
Hybrid Scheme Summary
  Sharing granularity
                                Active thread mode
                                Hardware          Software
    Conflicting    Hardware     Cache line        Object
    thread mode    Software     Object            Object

  [Figure: Object Pointer → record with State Pointer, Old and New; Old and New each refer to a copy of the Object Contents]

  Conflict detection
                                Active thread mode
                                Hardware          Software
    Conflicting    Hardware     Contents          State
    thread mode    Software     Object pointer    State

      Experimental Framework
            Infrastructure
                Cycle-accurate execution-driven Multi-core simulator
                Modified GCC
            Three microbenchmarks
            Two scenarios: Low and High Contention
            Compare four synchronization implementations
                Lock
                Pure Hardware Transactional Memory
                Pure Software Transactional Memory
                Hybrid Transactional Memory


Performance
  [Figure: normalized execution time (0 to 6) vs. number of cores (1, 2, 4, 8, 16, 32, 64) for Lock, TM Pure Hardware, TM Pure Software and TM Hybrid. Benchmark: Vector-Reduce; contention: low.]
Conclusions
  Transactional Memory is a promising approach
    Makes parallel programming easier
    Easier to achieve parallel speedup
  The hybrid transactional memory approach works
    Requires only modest hardware support
    Common case: good performance for most transactions
    Uncommon case: graceful fallback to software mode when a transaction
    cannot complete within the hardware bounds



Questions?
Transactions
  A synchronization mechanism to coordinate accesses to shared data by
  concurrent threads (an alternative to locks)

  Transaction: a group of operations on shared data
      transaction {
          A = A - 10;
          B = B + 10;
          ...
          if ( error )
              abort_transaction;
      }

  An API enhancement:
    1. Abort in the middle of a transaction
       o On encountering an error




Transactional Memory (TM)
  A transaction satisfies the following properties
    1) Atomicity: all-or-nothing
         On commit: all operations become visible
         On abort: none of the operations are performed
    2) Isolation (serializability)
         Committed transactions appear to have been performed in some serial order
  Additional properties
    3) Optimistic concurrency control
         Necessary for achieving good parallel speedup
    4) Non-blocking (optional)
         Avoid priority inversion
         Avoid convoying
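Advantage 1: Performance
  [Figure: three critical sections, accessing {A, B}, {C, D} and {A}. With locks, all three acquire lock L1 and run one after another. With transactions, they run concurrently, and only the two sections that both access A have a data conflict.]
  Locks                                Transactions
  Serialized on locks                  Optimistically execute concurrently
  Finer-granularity locks help         Abort and restart on data conflict
  Burden on programmer                 Done automatically by the runtime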
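The transactions column can be illustrated with a small optimistic scheme in Python: each transaction records the version of every value it reads, buffers its writes, and at commit time validates that nothing it read has changed; if validation fails, it aborts and restarts. This is a deliberately simplified, swapped-in technique (one commit lock plus per-cell version numbers, with hypothetical names such as `VersionedCell` and `atomically`), not the hardware, software or hybrid scheme of this work.

```python
import threading


class ConflictError(Exception):
    """A value this transaction read was changed by a committed transaction."""


class VersionedCell:
    def __init__(self, value):
        self.value = value
        self.version = 0              # bumped on every committed write


_commit_lock = threading.Lock()       # serializes validation and write-back only


class Transaction:
    def __init__(self):
        self.reads = {}               # cell -> version first observed
        self.writes = {}              # cell -> buffered new value

    def read(self, cell):
        if cell in self.writes:
            return self.writes[cell]
        with _commit_lock:            # read value and version together
            value, version = cell.value, cell.version
        if self.reads.setdefault(cell, version) != version:
            raise ConflictError()     # inconsistent snapshot: abort early
        return value

    def write(self, cell, value):
        self.writes[cell] = value     # not visible to others until commit

    def commit(self):
        with _commit_lock:
            if any(cell.version != seen for cell, seen in self.reads.items()):
                raise ConflictError()
            for cell, value in self.writes.items():
                cell.value = value
                cell.version += 1


def atomically(body):
    """Run body(tx) optimistically; on a data conflict, abort and restart."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except ConflictError:
            continue                  # discard buffered writes and retry


counter = VersionedCell(0)


def increment(tx):
    tx.write(counter, tx.read(counter) + 1)


def worker():
    for _ in range(100):
        atomically(increment)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)                  # 800: no update is lost
```

Transactions touching disjoint cells never fail validation against each other, which mirrors the figure: the {A, B} and {C, D} sections proceed concurrently, and only sections sharing data pay for an abort and restart.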
Advantage 2: Reduces Bugs
  With locks, programmers need to
    Remember the mapping between shared data and the locks that guard them
      ─   Make sure the appropriate locks are held while accessing shared data
    Make lock granularity as small as possible
    Avoid deadlocks due to locks
  All of these can cause subtle bugs

  With TM, the programmer does not have to deal with these problems

Other Advantages
  Allows new programming paradigms
    Simplifies error handling
    A new style of programming: speculate and verify
  → The programmer can abort offending transactions

  Avoids other problems that locks suffer from
    Priority inversion: a low-priority thread can grab a lock and block a
    higher-priority thread
    Convoying: if a thread holding a lock blocks on a high-latency event
    (like a context switch or I/O), it can cause other threads to wait for
    long periods
    Fault tolerance: if a process holding a lock dies, other processes will
    hang forever
  → The runtime system can abort offending transactions

