									Hardware Transactional Memory



               Eyal Widder
               Nimrod Reiss

            Instructor: Yehuda Afek

            Tel Aviv University   10/06/2007
References

   Thread-Level Transactional Memory
    –   Kevin E. Moore, Mark D. Hill & David A. Wood [2005]

   LogTM: Log-based Transactional Memory
    –   Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan,
        Mark D. Hill & David A. Wood [2006]
Outline

   Locks Vs. Transactional Memory
   Introduction to LogTM
   LogTM Version Management
   LogTM Conflict Detection
   Conclusions
The Challenge of Multithreaded SW

Goal: Parallelization
Problem: Unrestricted concurrency → bugs
Solution: Synchronization
New problem: Synchronization
  •   Tension between performance and correctness
Current Mechanism: Locks

•   Locks: objects only one thread can hold at a time
    •   Organization: lock for each shared structure
    •   Usage: (block) → acquire → access → release
•   Correctness issues
    –   Under-locking → data races
    –   Acquires in different orders → deadlock
•   Performance issues
    –   Conservative serialization
    –   Overhead of acquiring
    –   Difficult to find right granularity
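
The "acquires in different orders" problem can be sketched with two per-structure locks. This is only an illustration: Account, transfer and run_demo are hypothetical names, and ordering the locks by id() is one common workaround, not something the slides prescribe.

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()   # one lock per shared structure


def transfer(src, dst, amount):
    # Acquire both locks in one global order (here: by id()), so that
    # transfer(a, b) and transfer(b, a) can never wait on each other.
    # Assumes src is not dst (threading.Lock is not re-entrant).
    first, second = sorted((src, dst), key=id)
    with first.lock, second.lock:      # block -> acquire -> access -> release
        src.balance -= amount
        dst.balance += amount


def run_demo(rounds=1000):
    a, b = Account(100), Account(100)
    t1 = threading.Thread(target=lambda: [transfer(a, b, 1) for _ in range(rounds)])
    t2 = threading.Thread(target=lambda: [transfer(b, a, 1) for _ in range(rounds)])
    t1.start(); t2.start()
    t1.join(); t2.join()
    return a.balance, b.balance
```

The caller has to know about the ordering rule, which is exactly the kind of burden transactions are meant to remove.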
Transactions vs. Locks

Lock issues:                      How transactions help:
–  Under-locking                  – Simpler interface
–  Acquires in different orders   – No ordering
–  Blocking                       – Can cancel transactions
–  Conservative serialization     – Serialization only on conflicts



   Locks → simplicity/performance tension
   Transactions → (potentially) simple and efficient
Transaction Semantics -
ACI Properties

•   Atomicity – All or Nothing

•   Consistency – Correct at beginning and end

•   Isolation – Partially done work not visible to
    other threads
Thread-Level Transactional Memory

•   Separate semantics from implementation

•   Adapt DBMS (database management system) concepts:
     •   Concurrency control algorithms
     •   Conflict detection
     •   Taking the appropriate action
         (commit/abort/delay)
    Main challenge: Reduce the overhead of
    enforcing the ACI properties!
Basic Idea

Model TM like virtual memory:

•   A thread-level abstraction
•   Use 3 types of interfaces – User,
    System/Library, Low-level
•   An interface independent of implementation
•   Combine HW and SW in implementation
How Do Transactional Memory
Systems Differ?

   (Data) Version Management
     –   Keep old values for abort AND new values for commit
     –   Eager: record old values “elsewhere”; update “in place” (fast commit)
     –   Lazy: update “elsewhere”; keep old values “in place”

   (Data) Conflict Detection
     –   Find read-write, write-read or write-write conflicts
         among concurrent transactions
     –   Eager: detect conflict on every read/write (less wasted work)
     –   Lazy: detect conflict at end (commit/abort)
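
The two version-management policies can be contrasted with a minimal sketch. EagerTx and LazyTx are hypothetical names, memory is modelled as a plain dict, and conflict detection is left out entirely.

```python
class EagerTx:
    """Eager: update in place, record old values elsewhere (an undo log)."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []                    # (address, old value) pairs

    def write(self, addr, value):
        self.undo_log.append((addr, self.memory[addr]))
        self.memory[addr] = value             # new value goes "in place"

    def commit(self):
        self.undo_log.clear()                 # fast: nothing to copy

    def abort(self):
        while self.undo_log:                  # slow: undo, newest entry first
            addr, old = self.undo_log.pop()
            self.memory[addr] = old


class LazyTx:
    """Lazy: buffer new values elsewhere, keep old values in place."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = {}                # address -> new value

    def write(self, addr, value):
        self.write_buffer[addr] = value       # memory is untouched

    def read(self, addr):
        return self.write_buffer.get(addr, self.memory[addr])

    def commit(self):
        self.memory.update(self.write_buffer)  # slow: copy new values in
        self.write_buffer.clear()

    def abort(self):
        self.write_buffer.clear()             # fast: drop the buffer
```

With the eager variant the work moves from commit to abort, which pays off when commits are the common case.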
Log Based Transactional Memory –
LogTM

•   (Hardware) Transactional Memory is promising
    –   Most use lazy version management
         •   Old values “in place”
         •   New values “elsewhere”
    –   Commits are slower than aborts
•   New LogTM: Log-based Transactional Memory
    –   Uses eager version management (like most databases)
         •   Old values to log in thread-private virtual memory
         •   New values “in place”
    –   Makes common commits fast!
    –   Hardware traps to Software handler
    –   Aborts handled in software
LogTM’s Eager Version Management

   Old values stored in the transaction log
    –   A per-thread linear (virtual) address space (like
        the stack)
    –   Filled by hardware (during transactions)
    –   Read by software (on abort)
   New values stored “in place”
Transaction Log Example

Initial state: LogBase = LogPointer, TM count > 0

    VA     Data Block          R  W
    00     12--------------    0  0
    40     --------------23    0  0
    C0     34--------------    0  0

    Log (lines at 1000, 1040, 1080): empty

    Log Base = 1000    Log Ptr = 1000    TM count = 1
Transaction Log Example

•   Store r2, (c0) /* r2 = 56 */
    –   Set W bit for block (c0)
    –   Store address (c0) and old data on the log
    –   Increment Log Ptr to 1048
    –   Update memory

    VA     Data Block          R  W
    00     12--------------    0  0
    40     --------------23    0  0
    C0     56--------------    0  1     (was 34--------------)

    Log:   1000: c0 34------------

    Log Base = 1000    Log Ptr = 1048    TM count = 1
Transaction Log Example
Commit transaction
   –   Clear R & W for all blocks
   –   Reset Log Ptr to Log Base (1000)
   –   Clear TM count

    VA     Data Block          R  W
    00     12--------------    0  0
    40     --------------23    0  0
    C0     56--------------    0  0

    Log:   1000: c0 34------------     (no longer live; Log Ptr is back at Log Base)

    Log Base = 1000    Log Ptr = 1000    TM count = 0
Transaction Log Example

Abort transaction
   –   Replay log entries to “undo” the transaction
   –   Reset Log Ptr to Log Base (1000)
   –   Clear R & W bits for all blocks
   –   Clear TM count

    VA     Data Block          R  W
    00     12--------------    0  0
    40     --------------23    0  0
    C0     34--------------    0  0     (restored from 56--------------)

    Log:   1000: c0 34------------     (replayed)

    Log Base = 1000    Log Ptr = 1000    TM count = 0
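
The store / commit / abort walk-through above can be replayed with a small software model. This is a sketch only: LogTMThread is a hypothetical name, data blocks are modelled as short strings, and the 0x48-byte entry size (8-byte address plus 64-byte block) is an assumption chosen to match the Log Ptr moving from 1000 to 1048. Logging only the first store to each block is also an assumption of the sketch.

```python
ENTRY_SIZE = 0x48   # assumed: 8-byte address + 64-byte data block


class LogTMThread:
    def __init__(self, memory, log_base=0x1000):
        self.memory = memory          # virtual address -> data block
        self.r_bits = set()           # blocks with the R bit set
        self.w_bits = set()           # blocks with the W bit set
        self.log = []                 # (address, old block); filled "by hardware"
        self.log_base = log_base
        self.log_ptr = log_base
        self.tm_count = 0

    def begin(self):
        self.tm_count += 1

    def load(self, addr):
        if self.tm_count > 0:
            self.r_bits.add(addr)
        return self.memory[addr]

    def store(self, addr, block):
        if self.tm_count > 0 and addr not in self.w_bits:
            self.w_bits.add(addr)                        # set W bit
            self.log.append((addr, self.memory[addr]))   # address + old data
            self.log_ptr += ENTRY_SIZE                   # bump Log Ptr
        self.memory[addr] = block                        # update memory in place

    def _finish(self):
        self.r_bits.clear()
        self.w_bits.clear()
        self.log.clear()
        self.log_ptr = self.log_base                     # reset Log Ptr
        self.tm_count = 0

    def commit(self):
        self._finish()                                   # no data is copied

    def abort(self):
        for addr, old in reversed(self.log):             # replay log, newest first
            self.memory[addr] = old
        self._finish()
```

For example, starting from {0x00: "12", 0x40: "23", 0xC0: "34"}, a transactional store of "56" to 0xC0 followed by abort() leaves 0xC0 holding "34" again.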
Eager Version Management
Discussion

   Advantages:
    –   Fast Commits
            No copying
            Common case
   Disadvantages:
    –   Slow/Complex Aborts
            Undo aborting transaction
    –   Relies on Eager Conflict Detection/Prevention
LogTM’s Eager Conflict Detection

1)   Requesting processor sends a coherence request
     to the directory.
2)   The directory responds and possibly forwards the
     request to one or more processors.
3)   Each responding processor examines some local
     state to detect a conflict.
4)   The responding processors each ack or nack the
     request.
5)   The requesting processor resolves any conflict.
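
The five steps above can be sketched as a toy directory model. Processor and Directory are hypothetical names, and the directory here forwards every request to all recorded holders, which is simpler than a real MOESI protocol.

```python
class Processor:
    def __init__(self, name):
        self.name = name
        self.in_tx = False
        self.read_set = set()     # blocks with the R bit set
        self.write_set = set()    # blocks with the W bit set

    def check(self, block, is_write):
        """Steps 3 and 4: examine local state, then ack or nack."""
        if not self.in_tx:
            return "ACK"
        conflict = block in self.write_set or (is_write and block in self.read_set)
        return "NACK" if conflict else "ACK"


class Directory:
    def __init__(self):
        self.holders = {}         # block -> set of processors that may hold it

    def request(self, requester, block, is_write):
        """Steps 1 and 2: receive the request and forward it to other holders."""
        targets = [p for p in self.holders.get(block, set()) if p is not requester]
        responses = [p.check(block, is_write) for p in targets]
        if "NACK" in responses:
            return "NACK"         # step 5: the requester must resolve the conflict
        if is_write:
            self.holders[block] = {requester}
        else:
            self.holders.setdefault(block, set()).add(requester)
        return "ACK"
```

A NACK does not decide anything by itself; what the requester does next is the conflict resolution policy.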
Conflict Detection

   Conflicts are detected using the per-block R/W bits
    and the directory MOESI states.

   A “Sticky State” is used to detect possible
    conflicts on blocks that overflowed the cache.
Conflict Detection (example)

P0 store
   –   P0 sends get exclusive (GETX) request
   –   Directory responds with data (old)
   –   P0 executes store

   Messages:   P0 → Directory: GETX;  Directory → P0: DATA
   Directory:  I [old] → M@P0 [old]
   P0:         TM mode 1, Overflow 0;  block: I (--) [none] → M (--) [old] → M (-W) [new]
   P1:         TM mode 0, Overflow 0;  block: I (--) [none]
Conflict Detection (example)

   In-cache transaction conflict
    –   P1 sends get shared (GETS) request
    –   Directory forwards to P0
    –   P0 detects conflict and sends NACK

   Messages:   P1 → Directory: GETS;  Directory → P0: Fwd_GETS;  P0 → P1: NACK (conflict!)
   Directory:  M@P0 [old]
   P0:         TM mode 1, Overflow 0;  block: M (-W) [new]
   P1:         TM mode 0, Overflow 0;  block: I (--) [none]
Conflict Detection (example)

Cache overflow
   –   P0 sends put exclusive (PUTX) request
   –   Directory acknowledges
   –   P0 sets overflow bit
   –   P0 writes data back to memory

   Messages:   P0 → Directory: PUTX;  Directory → P0: ACK;  P0 → Directory: DATA
   Directory:  M@P0 [old] → Msticky@P0 [new]
   P0:         TM mode 1, Overflow 0 → 1;  block: M (-W) [new] → I (--) [none]
   P1:         TM mode 0, Overflow 0;  block: I (--) [none]
Conflict Detection (example)

   Out-of-cache conflict
    –   P1 sends GETS request
    –   Directory forwards to P0
    –   P0 detects a (possible) conflict
    –   P0 sends NACK

   Messages:   P1 → Directory: GETS;  Directory → P0: Fwd_GETS;  P0 → P1: NACK (conflict!)
   Directory:  Msticky@P0 [new]
   P0:         TM mode 1, Overflow 1;  block: I (--) [none]
   P1:         TM mode 0, Overflow 0;  block: I (--) [none]
Conflict Detection (example)

   Commit
    –   P0 clears TM mode and Overflow bits

   Directory:  Msticky@P0 [new] (unchanged)
   P0:         TM mode 1 → 0, Overflow 1 → 0;  block: I (--) [none]
   P1:         TM mode 0, Overflow 0;  block: I (--) [none]
Conflict Detection (example)

   Lazy cleanup
    –   P1 sends GETS request
    –   Directory forwards request to P0
    –   P0 detects no conflict, sends CLEAN
    –   Directory sends Data to P1

   Messages:   P1 → Directory: GETS;  Directory → P0: Fwd_GETS;  P0 → Directory: CLEAN;  Directory → P1: DATA
   Directory:  Msticky@P0 [new] → S(P1) [new]
   P0:         TM mode 0, Overflow 0;  block: I (--) [none]
   P1:         TM mode 0, Overflow 0;  block: I (--) [none] → S (--) [new]
LogTM’s Conflict Detection w/ Cache
Overflow

   At overflow at processor P
    –   Set P’s overflow bit (1 bit per processor)
    –   Allow writeback, but set directory state to Sticky@P

   At transaction end (commit or abort) at processor P
    –   Reset P’s overflow bit

   At (potential) conflicting request by processor R
    –   Directory forwards R’s request to P.
    –   P tells R “no conflict” if overflow is reset
    –   But asserts conflict if set (w/ small chance of false positive)
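
The overflow rules above can be sketched as follows. Proc and Dir are hypothetical names, the model tracks only sticky blocks at the directory, and it ignores the read/write distinction for brevity.

```python
class Proc:
    def __init__(self, name):
        self.name = name
        self.overflow = False        # 1 bit per processor
        self.tx_blocks = set()       # cached blocks with R or W set

    def evict(self, block, directory):
        if block in self.tx_blocks:              # transactional data leaves the cache
            self.tx_blocks.discard(block)
            self.overflow = True
            directory.sticky[block] = self       # directory state becomes Sticky@P

    def end_transaction(self):                   # commit or abort
        self.tx_blocks.clear()
        self.overflow = False

    def on_forwarded_request(self, block):
        # With the overflow bit set, P cannot tell which blocks it evicted,
        # so it conservatively asserts a conflict (possible false positive).
        if block in self.tx_blocks or self.overflow:
            return "NACK"
        return "CLEAN"


class Dir:
    def __init__(self):
        self.sticky = {}             # block -> processor it is sticky at

    def request(self, requester, block):
        owner = self.sticky.get(block)
        if owner is None or owner is requester:
            return "DATA"
        reply = owner.on_forwarded_request(block)
        if reply == "CLEAN":
            del self.sticky[block]   # lazy cleanup of the stale sticky state
            return "DATA"
        return reply
```

Nothing is cleaned up at commit time; the sticky entry disappears only when a later request finds the overflow bit reset.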
Conflict Resolution

   Conflict Resolution
    –   Can wait risking deadlock
    –   Can abort risking livelock
    –   Wait/abort transaction at requesting or responding proc?

   LogTM resolves conflicts at requesting processor
    –   Requesting processor waits (using coherence nacks/retries)
    –   But aborts if other processor is waiting (deadlock possible)
        & it is logically younger (using timestamps)

   Future: Requesting processor traps to software
    contention manager that decides who waits/aborts
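
The requester-side policy can be written as a single decision function. This follows the slide's simplified statement only: resolve_nack is a hypothetical name, a smaller timestamp is assumed to mean a logically older transaction, and how the requester learns that the other processor is waiting is not modelled.

```python
def resolve_nack(requester_ts, responder_ts, responder_is_waiting):
    """Decide what the requesting processor does after receiving a NACK.

    Returns "abort" when a deadlock is possible (the responder is itself
    waiting) and the requester is logically younger; otherwise "retry",
    i.e. keep waiting and reissue the coherence request.
    """
    requester_is_younger = requester_ts > responder_ts
    if responder_is_waiting and requester_is_younger:
        return "abort"
    return "retry"
```

Because only the younger transaction ever aborts, the oldest transaction always makes progress.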
Conclusion

   Commits are far more common than aborts
    –   Conflicts are rare
    –   Most conflicts can be resolved w/o aborts
    –   Software aborts do not impact performance
   Overflows are rare (in current benchmarks)
   LogTM
    –   Eager Version Management makes the common case
        (commit) fast
    –   Sticky States/Lazy Cleanup detects conflicts outside the
        cache (if overflows are infrequent)
QUESTIONS?
Break Time!
References

   LogTM: Log-based Transactional Memory
    –   Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill &
        David A. Wood [2006]

   Supporting Nested Transactional Memory
    in LogTM
    –   Michelle J. Moravan, Jayaram Bobba, Kevin E. Moore, Luke Yen, Mark D.
        Hill, Ben Liblit, Michael M. Swift & David A. Wood [2006]
Motivation

So far: Transactional Memory promises lock-free,
   atomic, consistent and isolated execution.

 But what should happen when a transaction
executes another transaction inside it?
LogTM enables flattening

   In the last lecture we introduced LogTM,
    which flattens nesting by subsuming inner
    transactions into the top-level transaction.
   A counter tracks the nesting level:
    Transaction_begin() increments it and
    Transaction_end() decrements it.
   A conflict in an inner transaction may therefore
    cause a complete abort back to the beginning of
    the top-level transaction.
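
Flattening with a nesting counter can be sketched as below. FlatNesting is a hypothetical name, memory is a dict whose keys are assumed to exist before they are written, and there is a single undo log shared by all nesting levels.

```python
class FlatNesting:
    def __init__(self, memory):
        self.memory = memory
        self.level = 0               # nesting counter
        self.undo_log = []           # one log for the whole top-level transaction

    def transaction_begin(self):
        self.level += 1

    def write(self, key, value):
        if self.level > 0:
            self.undo_log.append((key, self.memory[key]))
        self.memory[key] = value

    def transaction_end(self):
        self.level -= 1
        if self.level == 0:          # only the outermost end really commits
            self.undo_log.clear()

    def abort(self):
        # Inner and outer work are indistinguishable, so everything is
        # rolled back to the beginning of the top-level transaction.
        while self.undo_log:
            key, old = self.undo_log.pop()
            self.memory[key] = old
        self.level = 0
```

An inner transaction_end() changes nothing except the counter, which is why a late conflict discards all the work done so far.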
Challenges in nesting transactions

   Facilitating Software Composition.
   Enhancing Concurrency.
   Escaping to non-transactional
    systems.
Facilitating Software Composition

   Calling a module that uses locks internally
    requires the caller to know the module’s
    internal implementation details.
   To aid modular programming,
    transactional memory should support
    nesting.
Enhancing Concurrency

   Closed nesting does not eliminate all
    problems posed by modular software.
    –   Concurrency is limited by maintaining isolation
        until the top-level transaction commits.
    Example

    [Figure: transaction L on P1 and transaction T on P2, with nested
    transaction S accessing the allocator's pNextFree (directory state
    M@P1); T and L conflict.]

 How would you do it differently?
Ideally, S should release pNextFree so that other transactions can access the
allocator without conflicting with transaction L.
Escaping to non-transactional
systems.

   Many TM systems will run on top of
    non-transactional base systems that may include:
    –   Runtime libraries
    –   Operating systems
    –   Language virtual machines (e.g. JVM)
   STMs handle such escapes easily.
   An escape to a non-transactional system must disable
    HTM mechanisms to allow correct operation.
   Allow inter-transaction / device communication.
Outline

   Motivation and challenges.
   Closed vs. Open Nesting.
   Nested LogTM.
    –   Supporting Closed Nesting.
            Partial aborts.
    –   Supporting Open Nesting.
            Abort actions / Commit actions.
            Condition O1.
    –   Escape actions.
   Conclusions.
Closed vs. Open nesting

   Closed Nested Transactions
    extend isolation of an inner transaction until
    the top-level transaction commits.
   Open Nested Transactions
    allow a committing inner transaction to
    immediately release isolation.
Closed Nested Transactions

   May flatten transactions into the top-level
    one (as we’ve already seen).
   May allow partial roll-back.
Open Nested Transactions

   Increase concurrency and expressiveness.
   May increase both SW & HW complexity.
   Higher-level atomicity
    –   Child’s memory updates not undone if parent aborts
    –   Use abort action to undo the child’s forward action at a
        higher level of abstraction
            E.g., malloc() compensated by free()

   Higher-level isolation
    –   Release memory-level isolation
    –   Programmer enforces isolation at a higher level (e.g., locks)
    –   Use commit action to release isolation at parent commit
Nested LogTM 

   Nested LogTM extends Flat LogTM (last
    lecture).
   Splits the log into “frames”.
    –   Header contains Frame Pointer to the parent’s
        Header.
    –   Header contains register
        checkpoint.

    Log layout (oldest entry first):
        Header
        Undo record
        Undo record
        Header          <- Log Frame (level 1)
        Undo record
        Undo record
                        <- Log Ptr
Nested LogTM 

   Replicates R/W bits.
    –   Maintains a separate Read set, Write set for each
        nesting level.
    –   Use constant (k) number of R/W sets, and flatten
        transactions whose nesting level is bigger than k.
Closed Nested LogTM

On Commit:
 Top-level transactions commit normally.
 If (1 < curr_level ≤ k):
       Merge the current log frame with the parent’s.
       “Flash-OR” the R/W bits of curr_level into those of curr_level – 1.
       Decrement curr_level.
 Otherwise (curr_level > k, flattened):
       Merge the current log frame with the parent’s.
       Decrement curr_level.
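
The commit rules can be sketched in software as follows. ClosedNesting, mark_read and mark_write are hypothetical names, K = 2 is an assumed number of hardware R/W sets, and the "flash-OR" is modelled as a set union.

```python
K = 2   # assumed number of replicated R/W sets


class ClosedNesting:
    def __init__(self):
        self.level = 0
        self.read = [set() for _ in range(K + 1)]    # index 0 unused
        self.write = [set() for _ in range(K + 1)]
        self.frames = []                             # one undo list per log frame

    def begin(self):
        self.level += 1
        self.frames.append([])

    def mark_read(self, var):
        self.read[min(self.level, K)].add(var)       # levels above K share set K

    def mark_write(self, var, old_value):
        self.write[min(self.level, K)].add(var)
        self.frames[-1].append((old_value, var))

    def commit(self):
        frame = self.frames.pop()
        if self.level == 1:                  # top level: commit normally
            self.read[1].clear()
            self.write[1].clear()
        else:
            self.frames[-1].extend(frame)    # merge log frame with the parent's
            if self.level <= K:              # flash-OR into the parent's bits
                child, parent = self.level, self.level - 1
                self.read[parent] |= self.read[child]
                self.write[parent] |= self.write[child]
                self.read[child].clear()
                self.write[child].clear()
        self.level -= 1
```

After an inner commit the parent owns both the child's undo records and its read/write sets, so isolation is kept until the top level commits.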
Closed Nested LogTM

Conflict detection :
     An incoming read of memory location m
      conflicts with another thread’s level j
      transaction if j is the minimal level where
      block(m)’s Write bit is set.
     An incoming write to memory location m
      conflicts with another thread’s level j
      transaction if j is the minimal level where
      block(m)’s Write or Read bit is set.
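
The "minimal level" rule amounts to a scan over the replicated bit sets. In this sketch conflict_level is a hypothetical name and the R/W bits are modelled as lists of sets indexed by nesting level, with index 0 unused.

```python
def conflict_level(read_bits, write_bits, block, incoming_is_write):
    """Return the minimal level j whose bits conflict with the request, or None.

    read_bits[j] / write_bits[j] hold the blocks whose R / W bit is set
    at nesting level j (j >= 1).
    """
    for j in range(1, len(write_bits)):
        if block in write_bits[j]:
            return j                          # read or write vs. earlier write
        if incoming_is_write and block in read_bits[j]:
            return j                          # write vs. earlier read
    return None
```

The returned level tells the responder how many frames would have to be undone if it chose to abort.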
Closed Nested LogTM

On Abort :
     An abort of the current transaction at curr_level
      traps to a software handler.
     Suppose the transaction aborts because of a conflict
      with its abort_level transaction.
     The software handler walks the log backwards
      and undoes curr_level – abort_level + 1 log
      frames.
     Finally, it restores the register state saved in the header.
// thread i at level 0 (non-transactional)
a = 2; b = 4; c = 6;            // initialize

transaction_begin();            // top-level (level 1)
   a = b + 1;                   // a gets 5.
   transaction_begin();         // level 2
      c = b - 3;                // c gets 1.
      b = a + 2;                // b gets 7.
      a = c + 7;                // a gets 8.
   transaction_commit();        // level 2.
transaction_commit();           // level 1.

Log after the last store (each undo record is "old value, variable"):
   level-1 header
   2, a
   level-2 header          <- frame pointer (a garbage header once level 2 commits)
   6, c
   4, b
   5, a
                           <- end pointer

Cache just before the level-2 commit (Level = 2):
   Var   R1  R2  W1  W2   Val
   a     0   1   1   1    8
   b     1   1   0   1    7
   c     0   1   0   1    1

Cache after the level-2 commit (Level = 1):
   Var   R1  R2  W1  W2   Val
   a     1   0   1   0    8
   b     1   0   1   0    7
   c     1   0   1   0    1
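
The a, b, c example can be replayed with a small model of log frames and partial abort. NestedLog and abort_to are hypothetical names, and the register checkpoint in each header is represented by an arbitrary label.

```python
class NestedLog:
    def __init__(self, memory):
        self.memory = memory
        self.frames = []     # one frame per nesting level

    def begin(self, checkpoint=None):
        # The header holds the register checkpoint; the list position
        # plays the role of the frame pointer to the parent.
        self.frames.append({"checkpoint": checkpoint, "undo": []})

    def write(self, var, value):
        self.frames[-1]["undo"].append((self.memory[var], var))
        self.memory[var] = value

    def abort_to(self, abort_level):
        """Undo curr_level - abort_level + 1 frames, newest first.

        Returns the register checkpoint of the outermost frame undone.
        """
        checkpoint = None
        while len(self.frames) >= abort_level:
            frame = self.frames.pop()
            for old, var in reversed(frame["undo"]):
                self.memory[var] = old
            checkpoint = frame["checkpoint"]
        return checkpoint
```

Aborting to level 2 restores a = 5, b = 4, c = 6 and keeps the level-1 work; aborting to level 1 restores the initial values.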
Supporting Open Transactions

When an open nested transaction Topen at level
 j commits:
   Its frame is discarded from the log.
   R/W bits for level j are cleared.
   (Optionally) Append commit and abort action
    records, Copen and Aopen, to the newly exposed
    end of Topen’s parent’s frame.
Commit and Abort Actions

   To ensure consistency, open nested
    transactions must raise the abstraction level
    of both isolation and rollback.
   Commit actions are executed in FIFO order
    while Abort actions are executed in LIFO
    order.
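
The ordering rule can be sketched with two containers. ActionRegistry is a hypothetical name; in Nested LogTM the action records live in the log itself, so this only illustrates the FIFO and LIFO orders.

```python
from collections import deque


class ActionRegistry:
    def __init__(self):
        self.commit_actions = deque()    # run oldest first (FIFO)
        self.abort_actions = []          # run newest first (LIFO)

    def register(self, commit_action=None, abort_action=None):
        if commit_action is not None:
            self.commit_actions.append(commit_action)
        if abort_action is not None:
            self.abort_actions.append(abort_action)

    def on_commit(self):
        self.abort_actions.clear()       # nothing left to compensate
        while self.commit_actions:
            self.commit_actions.popleft()()

    def on_abort(self):
        self.commit_actions.clear()
        while self.abort_actions:
            self.abort_actions.pop()()
```

Running abort actions newest first mirrors the undo log: later forward actions may depend on earlier ones, so they are compensated first.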
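// thread i at level 0 (non-transactional)
a = 2; b = 4; c = 6;            // initialize

transaction_begin();            // top-level (level 1)
   a = b + 1;                   // a gets 5.
   transaction_begin();         // level 2
      c = b - 3;                // c gets 1.
      b = a + 2;                // b gets 7.
      a = c + 7;                // a gets 8.
   transaction_commit();        // level 2.
transaction_commit();           // level 1.

Here the level-2 transaction is open nested.

Log before the level-2 commit (each undo record is "old value, variable"):
   level-1 header
   2, a
   level-2 header          <- frame pointer
   6, c
   4, b
   5, a
                           <- end pointer

Log after the level-2 commit (level-2 frame discarded, abort action appended):
   level-1 header          <- frame pointer
   2, a
   Aopen
                           <- end pointer

Cache before the level-2 commit (Level = 2):
   Var   R1  R2  W1  W2   Val
   a     0   1   1   1    8
   b     1   1   0   1    7
   c     0   1   0   1    1

Cache after the level-2 commit (Level = 1, level-2 bits cleared):
   Var   R1  R2  W1  W2   Val
   a     0   0   1   0    8
   b     1   0   0   0    7
   c     0   0   0   0    1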
Condition O1

No Writes to Data Written by Ancestors
 Neither an open transaction Topen nor its commit and abort actions, Copen
 and Aopen, write any data written by Topen’s ancestors.

  Example (O1 violation)

counter = 0;                 // initialize
transaction_begin();         // top-level
   counter++;                // counter gets 1.
   open_begin();             // level 2
      counter++;             // counter gets 2.   <- O1 violation
   // commit with an abort action.
   open_commit(abort_action(decr(counter)));
…..
// Abort and run abort action
// Expect counter to be 0.
….
transaction_commit();        // not executed.
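
Condition O1 is a disjointness requirement on write sets, which can be expressed as a simple check. This is a sketch with a hypothetical function name; the slides do not say whether or how such a check is performed, so it only restates the condition.

```python
def violates_o1(ancestor_write_sets, open_writes,
                commit_action_writes=(), abort_action_writes=()):
    """True if Topen or its commit/abort actions write ancestor-written data."""
    ancestors = set().union(*ancestor_write_sets)
    touched = (set(open_writes)
               | set(commit_action_writes)
               | set(abort_action_writes))
    return bool(ancestors & touched)
```

In the counter example the parent writes counter, and both the open transaction and its abort action write it again, so the check reports a violation.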
Escape Actions

   “Real world” is not transactional
   Current OSes are not transactional

   Systems should allow non-transactional
    escapes from a transaction

   Interact with OS, VM, devices, etc.
Escape Actions – First Class

   Keep a per-thread “Escape” bit.
   Escape Actions read the most recent values from
    memory (even uncommitted ones).
   Escape Actions never abort or stall.
   Like an Open Transaction, an escape
    action may register Commit/Abort actions.
Conclusions

   Closed Nesting is easy to implement, and
    may allow partial rollback to improve
    efficiency.
   Open Nesting improves concurrency at the cost
    of raising atomicity and isolation to a higher level
    and of a more complex software implementation.
   Using open nesting it is possible to provide
    non-transactional operations inside
    transactions.
QUESTIONS?
The End



