A Survey Paper on Transactional Memory Elan Dubrofsky CPSC 508 Course Project Department of Computer Science University of British Columbia Vancouver, B.C., Canada, V6T 1Z4 email@example.com

Abstract

The necessity to write concurrent programs is increasing as systems grow more complex while processor speed increases slow down. The current popular solution for parallel programming is to use locks, but locks have many known drawbacks that make them a suboptimal solution. Transactional memory is a recent alternative to locks that is gaining a lot of attention in the research community. In this survey paper I explain the concept of transactional memory and identify its various benefits and limitations. Work on software, hardware and hybrid approaches to transactional memory is presented, as well as a way to combine transactional code with code that uses locks. I conclude with my thoughts on the future of this potentially groundbreaking mechanism for shared-variable synchronization.

1 Introduction

For a couple of decades now, developers have been able to rely on the fact that their computers would get faster. Processing speeds have increased consistently over the years according to Moore's law, and as such we have been able to develop systems of increasing complexity without requiring groundbreaking innovation on the software side. Unfortunately, it appears that the free lunch is over. According to Simon Peyton Jones, we can no longer assume that our programs will run faster just by purchasing the newest generation of processor. While individual processor improvements may be declining, there is still hope in parallel programming. Multi-core processors are becoming very prevalent, and it is up to software and operating system developers to find ways to use them to their full capacity. The hardest problem that must be overcome when writing parallel programs is synchronization.
Multiple threads may need to access the same locations in memory, and if careful measures aren't taken the result can be disastrous: if two threads try to modify the same variable at the same time, the data can become corrupted. Most of today's software uses locks to solve this problem. Locks ensure that a critical section, which is a block of code that accesses variables shared by multiple threads, can only be entered by one thread at a time. When a thread tries to enter a critical section, it must first acquire that section's lock. If another thread is already holding the lock, the former thread must wait until the lock-holding thread releases the lock, which it does when it leaves the critical section. While locks do solve the problem of multiple threads accessing the same data at the same time, they have several well-known drawbacks with respect to performance and ease of implementation. One big problem with locks is the potential for deadlock, which can cause a program to freeze. Deadlock occurs when one thread is waiting for a lock in order to proceed and that lock is being held by a second thread that cannot proceed because it is waiting for a lock held by the first thread. According to Birrell, the most effective way to avoid deadlocks is to apply a partial order to the acquisition of locks in your program. While this solution works, it can be very tedious for a programmer to ensure that his code adheres to this rule. Another issue with locks is that in order to promote concurrency, fine-grained locking is required. This can lead to very complicated code with numerous locks being acquired and released all over the place. A popular example of this is in the Linux kernel, where there are pages of comments just explaining what all of the locks are for. Simon Peyton Jones, as well as others, comments that the fundamental shortcoming of locks is that they do not support modular programming.
This means that large programs cannot be built from small programs without modifying the smaller programs. This survey paper discusses transactional memory, which is an alternative to using locks to enforce proper synchronization. Transactions do not suffer from the problems associated with locks that were mentioned above. Simon Peyton Jones and Aguilera et al. point out that the important guarantees that transactions provide to the execution of critical sections are atomicity, consistency, isolation and durability (the ACID properties). Atomicity means that a critical section will execute completely or not at all; in other words, no other thread will ever see a state of memory in which a critical section is only partially complete. Consistency means that data will never get corrupted, and isolation means that the execution of a critical section will never be affected by the actions of other threads. Durability simply means that any committed memory modifications must be reliable. Another big advantage of transactional memory is that it makes synchronization simple to implement, and code using transactions is very readable and understandable, which is definitely not the case with locks.

The next section of this paper explains what transactions are and how transactional memory can be used by an application developer to write parallel programs. It also elaborates on some of the benefits that transactions provide over the use of locks. In section 3 I discuss where transactions should be implemented: in hardware or in software. The pros and cons of each option are compared, and I present work being done on a hybrid approach which tries to take advantage of the best features of both options.
Section 4 then discusses some drawbacks of transactional memory and work that has been done in evaluating its performance, and section 5 presents work that has been done to overcome some of these problems by having systems dynamically decide when to use transactions and when not to. I then conclude with my opinion on the future of transactional memory and what I think needs to be done to increase its chances of widespread adoption.

2 Transactional Memory Overview

Larus and Rajwar point out in their book that database systems have successfully been exploiting concurrency for decades using transactions. This has led many people to try to turn the programming model used by databases into a more general parallel programming model. In 1993, Herlihy and Moss proposed hardware-supported transactional memory as a mechanism for lock-free data synchronization, and since then transactional memory has been a very hot topic in the systems community. The concept of using transactions is pretty straightforward. Any critical section of code that one wants made atomic by a transaction must be surrounded with, for example, xbegin and xend tags. When inside a transaction, any attempts to read or write memory are not actually executed but are instead buffered to some sort of log (conceptually). When a transaction ends, the system checks whether the memory locations that were accessed inside the transaction were modified by another thread between the time that xbegin and xend were called. If no conflict is detected, the transaction is free to commit all of its memory modifications from the log and exit. In the case of a conflict, the transaction wipes the log clean and reverts back to the beginning of the transaction. This revert mechanism makes it appear as if the critical section had never been executed. An implementation of transactional memory such as this is called optimistic execution.
It is considered optimistic because when an xbegin tag is reached, the system enters the transaction with the hope that it will be able to commit all of its changes at the end. It is important to note that a transaction does not worry about obtaining any locks. It simply executes right away and records any memory reads or writes to the log. The verification step at the end checks that the log is valid before the changes are committed. To check that the log is valid, the system must go through every variable that was read or written and ensure that its value is consistent with what it was when the transaction began (thus ensuring isolation). It is up to the implementation to ensure that the verification step is done atomically. Overall this is a very clean solution to parallel programming, as concurrency is dealt with simply by surrounding all critical sections with xbegin and xend tags.

Unfortunately, transactional memory has some major limitations that have kept it from replacing locks so far. One limitation is performance. Whenever a transaction reverts, all of the work that it had done is essentially wasted. Cascaval et al. say that transactional memory systems have yet to produce consistent results indicating they can work without introducing unacceptable overheads that make the systems too slow. The overhead of using transactions is a major hurdle that must be overcome before transactions will ever be adopted. Another limitation is that I/O operations cannot be supported inside transactions. For example, there is no way to revert printing something to the console for the user to see. These issues, as well as others, are discussed in sections 4 and 5 of this paper. First, though, I will discuss the issue of whether to implement transactional memory in hardware or software.

3 Hardware vs. Software

An important topic in transactional memory research is whether transactions should be implemented in hardware or software.
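Before comparing the two, it may help to see the optimistic, log-based scheme of section 2 made concrete in software. The following sketch is purely illustrative: the names TVar, Transaction and atomic are invented here and do not come from any of the surveyed systems, and a single global lock stands in for the atomic validate-and-commit step that a real implementation must provide.

```python
import threading

# Minimal illustrative sketch of optimistic execution: reads and writes are
# buffered in a per-transaction log, validated at commit time, and the whole
# transaction is re-executed from the beginning on conflict.

class Conflict(Exception):
    """Raised when validation detects that another thread committed first."""

class TVar:
    """A transactional variable paired with a version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()  # stands in for an atomic validate-and-commit

class Transaction:
    def __init__(self):
        self.read_set = {}    # TVar -> version observed at first read
        self.write_set = {}   # TVar -> buffered (not yet visible) new value

    def read(self, tvar):
        if tvar in self.write_set:            # read-your-own-writes
            return self.write_set[tvar]
        self.read_set.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        self.write_set[tvar] = value          # buffered in the log

    def commit(self):
        with _commit_lock:                    # validation + commit are atomic
            for tvar, seen in self.read_set.items():
                if tvar.version != seen:      # someone committed underneath us
                    raise Conflict()
            for tvar, value in self.write_set.items():
                tvar.value = value
                tvar.version += 1

def atomic(body):
    """Run body(tx) between conceptual xbegin/xend, retrying on conflict."""
    while True:
        tx = Transaction()
        try:
            body(tx)
            tx.commit()
            return
        except Conflict:
            continue                          # wipe the log and re-execute

counter = TVar(0)

def increment(tx):
    tx.write(counter, tx.read(counter) + 1)

threads = [threading.Thread(target=lambda: [atomic(increment) for _ in range(100)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 400: every increment commits exactly once
```

A conflicting commit bumps a variable's version, so a racing transaction fails validation and transparently retries; this is exactly the revert-and-rerun behavior described above.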
This section will discuss some implementations of both options and explore the benefits and drawbacks of each approach. Recent work on a hybrid implementation is also covered.

3.1 Hardware Transactional Memory

3.1.1 Herlihy and Moss 1993

The key to transactional memory is for the system to know when a transaction can be committed and when it must be aborted. Herlihy and Moss proposed a very clever way to implement transactional memory in hardware: they modify standard multiprocessor cache coherence protocols to work for transactional memory. Multiprocessor cache coherence protocols ensure that different processors cannot hold inconsistent values in their caches for the same location in memory. Herlihy and Moss proposed that any protocol capable of detecting accessibility conflicts can also detect transaction conflicts at no extra cost. Their implementation provides three primitives: Load Transactional (LT), Load Transactional Exclusive (LTX) and Store Transactional (ST). LT reads the value of a shared memory location into a private register, LTX does the same thing but "hints" that the location will be updated, and ST writes from a private register to memory but does not make the change visible to other processors until the transaction commits. It also provides three instructions: COMMIT (attempt to make changes permanent), ABORT (discard changes) and VALIDATE (test current transaction status). In standard multiprocessor cache coherence protocols, access may be non-exclusive (permitting reads) or exclusive (permitting writes). At any time, a memory location is either not immediately accessible by any processor (in memory only), accessible non-exclusively by one or more processors, or accessible exclusively by only one processor. If a processor P has non-exclusive access to a location in memory and processor Q wants to store to that location, Q must obtain exclusive access and does so by revoking P's access.
Herlihy and Moss' implementation of hardware transactional memory aborts any transaction that tries to revoke access to a transactional entry of another active transaction. Extending Goodman's snoopy protocol for a shared bus, Herlihy and Moss have each processor maintain a regular cache and a transactional cache. The transactional cache holds all tentative writes and only propagates changes to other processors or main memory if the transaction commits. They augment each transactional cache line with a transactional state: EMPTY (no data), NORMAL (contains committed data), XCOMMIT (discard on commit) or XABORT (discard on abort). Transactional operations then put two entries in the cache, one tagged XCOMMIT and one tagged XABORT. All modifications are made to the XABORT entry; upon COMMIT, entries marked XCOMMIT are set to EMPTY and entries marked XABORT are set to NORMAL. Upon an ABORT instruction, entries marked XABORT are set to EMPTY and those marked XCOMMIT are set to NORMAL. On a bus cycle, the cache acts like a regular cache except that it ignores entries not marked NORMAL. This ensures that changes are propagated back to main memory only if the transaction committed.

Of course, this implementation also has a mechanism to discard changes if there is a transaction conflict. Each processor maintains two flags: TACTIVE (is a transaction active?) and TSTATUS (is the active transaction still valid, i.e. not aborted?). TACTIVE is set to True whenever a transaction executes its first transactional operation. Upon an LT instruction, the transactional cache is probed for an XABORT entry (an entry that was modified but not yet committed). If there is no XABORT entry for the memory location but there is a NORMAL one, the state is changed from NORMAL to XABORT and a second entry with the same data is created and assigned the XCOMMIT state.
If there is no NORMAL entry either, the data is read from memory and XCOMMIT and XABORT entries are created as discussed above. If this memory read fails (because of a conflict), TSTATUS is set to False, all XABORT entries are dropped and all XCOMMIT entries are set to NORMAL. LTX and ST instructions work in very similar ways. When VALIDATE is called, if TSTATUS is False, TACTIVE is set to False and TSTATUS is set to True (this effectively aborts the transaction). When ABORT is called, transactional cache entries are discarded, TACTIVE is set to False and TSTATUS is set to True. When COMMIT is called, TSTATUS is set to True, TACTIVE is set to False, all XCOMMIT entries are dropped and all XABORT entries are set to NORMAL (so they can be read on the next bus cycle).

This idea of using cache coherence protocols to implement hardware transactional memory was considered very novel in its time and sparked a lot of research in the area. Unfortunately, hardware transactional memory has some drawbacks that are still considered major issues today.

3.1.2 Other Ways to do Hardware Transactional Memory

The Wikipedia entry for transactional memory currently says that Load-Link and Store-Conditional operations can be viewed as the most basic hardware transactional memory support. Load-Link returns the value of a memory location, and Store-Conditional stores a new value to that location only if no updates have occurred since the last Load-Link call. This is an example of a simple "update and commit" operation. Of course, these operations work on data the size of a native machine word, so they cannot provide the functionality of full transactions. Rajwar and Goodman comment that Herlihy and Moss' implementation is not optimal because it requires special instructions, programmer support and coherence protocol extensions. Lev and Maessen add that it is not robust and only works for transactions up to a fixed size.
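The XCOMMIT/XABORT bookkeeping described above can be modeled in a few lines. This is a hedged sketch of the cache-line state transitions only; it deliberately ignores the snoopy bus protocol, the TACTIVE/TSTATUS flags and cache capacity, and the class and method names are invented for illustration.

```python
# Illustrative model of Herlihy and Moss' transactional-cache entries: each
# transactional write keeps an XCOMMIT entry (old data, discarded on commit)
# and an XABORT entry (new data, discarded on abort).

EMPTY, NORMAL, XCOMMIT, XABORT = "EMPTY", "NORMAL", "XCOMMIT", "XABORT"

class TransactionalCache:
    def __init__(self):
        self.lines = []   # list of (state, address, data) tuples

    def transactional_write(self, addr, old_data, new_data):
        self.lines.append((XCOMMIT, addr, old_data))  # discarded if we commit
        self.lines.append((XABORT, addr, new_data))   # discarded if we abort

    def commit(self):
        # XCOMMIT entries are dropped; XABORT entries become NORMAL.
        self.lines = [(NORMAL if s == XABORT else s, a, d)
                      for (s, a, d) in self.lines if s != XCOMMIT]

    def abort(self):
        # XABORT entries are dropped; XCOMMIT entries become NORMAL.
        self.lines = [(NORMAL if s == XCOMMIT else s, a, d)
                      for (s, a, d) in self.lines if s != XABORT]

    def visible(self):
        # On a bus cycle only NORMAL entries are seen / written back.
        return {a: d for (s, a, d) in self.lines if s == NORMAL}

committed = TransactionalCache()
committed.transactional_write(0x10, old_data=5, new_data=7)
committed.commit()
print(committed.visible())   # the new value survives a commit

aborted = TransactionalCache()
aborted.transactional_write(0x10, old_data=5, new_data=7)
aborted.abort()
print(aborted.visible())     # the old value is restored on abort
```

The two paired entries make both outcomes cheap: committing or aborting is just a relabeling of cache lines, with no copying of data.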
Lev and Maessen also point out that it is architecture-specific and not portable. Some of these problems are addressed in Rajwar et al.'s work on Virtualizing Transactional Memory. The authors virtualize transactional memory in much the same way that virtual memory virtualizes physical memory: programmers write applications without concern for the hardware limitations. Though their results are impressive, they admit that some open challenges remain, such as the need for a mechanism to support interactions among processes from different virtual address spaces. They also mention the issue of I/O, which is discussed in section 5. According to several authors, many of the limitations mentioned above stem from the fact that hardware transactional memory alone is not the best solution. In the next section I discuss the alternative: software transactional memory.

3.2 Software Transactional Memory

Shavit and Touitou propose a software-based implementation of transactional memory. They call their approach Software Transactional Memory (STM) and describe it as a novel design that supports flexible transactional programming of synchronization operations in software. While they admit in the introduction that they cannot aim for the same performance as hardware-based implementations, they note that STM has advantages in terms of applicability to today's machines, portability and resiliency in the face of timing anomalies and processor failures. Their implementation supports static transactions, i.e. transactions that access a pre-determined set of memory locations. It is also non-blocking, which means that threads competing for a shared resource will never have their execution indefinitely postponed by mutual exclusion. The implementation uses two data structures of size M (M being the number of memory locations): Memory and Ownerships.
The Memory data structure is a vector that contains the data stored in the transactional memory, while Ownerships is a vector of records identifying which transaction, if any, owns each cell of memory. Each process i keeps a record (pointed to by Rec_i) that stores information about its current transaction in progress (this can of course be null). Rec_i contains a number of fields, including: Add, a vector of the addresses in the transaction; Size, which stores the size of the data set; and OldValues, a vector that will contain the former values stored in the involved locations. Version is an integer field, initially zero, which is incremented every time the process terminates a transaction; this field is used to determine the instance number of a transaction. A process initiates a transaction by calling the StartTransaction routine given in the paper. This routine initializes the process' record, executes the transaction with the Transaction routine and checks whether the transaction succeeded. If it did, it returns the vector of OldValues. The Transaction routine (also given in the paper) first tries to acquire ownership of the data set's locations. If it succeeds, the process writes the old values into the transaction's record, calculates the new values to be stored and writes them to memory. If it fails, it returns the location that caused the failure. Shavit and Touitou provide detailed correctness proofs for their algorithm as well as an empirical evaluation that compares the performance of STM to other software methods, and they conclude that STM is very much competitive.

Since Shavit and Touitou's work in 1997, many papers have proposed improvements to STM. In 2003, Herlihy et al. published work on software transactional memory for dynamic-sized data structures.
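A much-simplified sketch of this ownership-acquisition protocol is shown below. It is illustrative only: real STM acquires ownership with non-blocking primitives and includes a helping mechanism so that a stalled owner cannot block others, whereas this sketch uses an ordinary lock for the ownership step and simply fails when a location is already owned.

```python
import threading

# Simplified sketch of a static transaction over the Memory/Ownerships
# vectors: acquire ownership of every location in the pre-declared data set,
# record the old values, install the new ones, then release.

M = 8
memory = [0] * M
ownerships = [None] * M      # which transaction (if any) owns each cell
_meta = threading.Lock()     # stands in for the atomic ownership acquisition

class Failure(Exception):
    """Raised with the location that could not be acquired."""
    def __init__(self, location):
        self.location = location

def transaction(tid, addresses, compute_new):
    addresses = sorted(addresses)           # acquire in a fixed global order
    with _meta:
        for a in addresses:
            if ownerships[a] is not None:   # owned by another transaction
                raise Failure(a)
        for a in addresses:
            ownerships[a] = tid
    old_values = [memory[a] for a in addresses]   # record the old values
    new_values = compute_new(old_values)
    for a, v in zip(addresses, new_values):
        memory[a] = v
    with _meta:                              # release ownership
        for a in addresses:
            ownerships[a] = None
    return old_values

old = transaction("t1", [2, 5], lambda vals: [v + 1 for v in vals])
print(old, memory[2], memory[5])   # [0, 0] 1 1
```

The returned OldValues-style vector is what StartTransaction hands back on success in the scheme described above.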
While prior STM designs required both the memory usage and the transactions to be statically defined in advance, Herlihy et al.'s work allows transactions and transactional objects to be created dynamically. They contend that their Dynamic Software Transactional Memory (DSTM) system is much better suited than previous work to the implementation of dynamic-sized data structures such as lists and trees. Another interesting contribution to STM comes from Robert Ennals' 2005 paper "Efficient Software Transactional Memory". Ennals points out that on the modern multi-processor machines on which STM implementations are designed to run, cache behavior has a significant effect on performance. His work aims to minimize cache contention by deviating from previous STM designs in several ways, including storing object versioning information inline and not guaranteeing that a transaction will make progress while another transaction is descheduled by the operating system. He notes that the latter deviation is theoretically inelegant, but his testing shows that the resulting algorithm significantly improves on the performance of previous STM algorithms.

3.3 Hybrid Approach

Damron et al. propose a hardware/software hybrid approach to transactional memory, motivated by the fact that both hardware and software solutions present significant limitations. While Herlihy and Moss showed that bounded-size atomic transactions can be supported using simple modifications to current cache mechanisms, no implementation can be sufficient for all transactions; there can always be a small fraction of transactions that will not be supported. This forces programmers to account for architecture-related limitations of hardware transactional memory, which erodes the ease-of-use benefits of transactional memory. Software transactional memory allows transactions to be unbounded, but comes with a significant increase in overhead.
Not only do both approaches have these deficiencies, they are also unlikely to be widely accepted on their own. Hardware manufacturers are unlikely to produce chips with transactional support when there are no software implementations that would make use of it. At the same time, software developers are unlikely to write software that uses transactions when there is no hardware support available. This chicken-and-egg problem is one of the main issues that Damron et al.'s Hybrid Transactional Memory (HyTM) aims to resolve. The main idea of HyTM is that the system attempts to execute a transaction in hardware if hardware support is available and the transaction does not exceed the hardware's limitations. If this fails, the system transparently executes the transaction in software. The programmer does not need to be concerned with hardware limitations and can simply use transactions wherever he sees fit. Since any transaction that cannot be handled by hardware will be handled in software instead, the prevalence of HyTM would allow hardware designers to focus on solutions that handle the majority of transactions without having to worry about extreme cases. In other words, HyTM allows hardware designers to build best-effort hardware transactional memory instead of having to provide guarantees on bounds. HyTM also allows programmers to write programs using transactional memory even before hardware support is available. The performance of these programs will progressively improve as chips with better HTM support are released, which in turn will motivate hardware designers to build chips with better HTM support, since there will be programs out there that stand to benefit. The results in the HyTM paper demonstrate that HyTM in software-only mode (no hardware support) provides much better scalability than simple coarse-grained locking, and even scalability comparable to fine-grained locking, which is very difficult to program.
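The HyTM dispatch policy can be sketched as a simple try-hardware-then-software fallback. Everything here (the exception name, the capacity constant and both execution paths) is a hypothetical stand-in; the point is only the control flow: bounded best-effort hardware first, unbounded software second.

```python
# Hedged sketch of HyTM-style dispatch: attempt the bounded hardware path,
# and transparently fall back to an unbounded (but slower) software path
# when the hardware cannot handle the transaction.

class HardwareLimitExceeded(Exception):
    pass

HW_WRITE_SET_LIMIT = 4   # assumed hardware capacity (e.g. cache-resident lines)

def run_in_hardware(writes, memory):
    if len(writes) > HW_WRITE_SET_LIMIT:
        raise HardwareLimitExceeded()   # best-effort HTM gives up
    memory.update(writes)
    return "hardware"

def run_in_software(writes, memory):
    memory.update(writes)               # unbounded, higher per-access overhead
    return "software"

def hytm_execute(writes, memory):
    try:
        return run_in_hardware(writes, memory)
    except HardwareLimitExceeded:
        return run_in_software(writes, memory)

mem = {}
small = hytm_execute({i: i for i in range(2)}, mem)    # fits in hardware
large = hytm_execute({i: i for i in range(10)}, mem)   # falls back to software
print(small, large)
```

Because the fallback is transparent, the programmer never sees the hardware bound, which is exactly the ease-of-use argument Damron et al. make.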
Damron et al. also show that even meager hardware support improves performance further, as expected. The hope is that programmers will start writing transactional code using HyTM, and that this will motivate processor designers to support transactions in hardware and put more effort into making best-effort HTM even faster.

4 Limitations of and Skepticism Towards Transactional Memory

With all of the benefits that have been associated with transactional memory, it may seem a bit surprising that this parallel programming paradigm has yet to take the multi-core world by storm. A very recent paper by Cascaval et al. explores why transactional memory is still only a "research toy". Some of the limitations of transactional memory they discuss have already been covered in this report. For example, they point out that STM introduces too much performance overhead, and that HTM capacity constraints lead to significant performance degradation when overflow occurs. Cascaval et al. also point out some transactional semantics issues, independent of the hardware vs. software decision, that break the ideal transactional programming model. The first is the problem of transactional code interacting with non-transactional code. There will always be systems with legacy code, so this issue needs to be considered: it is unclear how to deal with data shared outside of a transaction (i.e. how to tolerate weak atomicity) and how to deal with locks being used inside transactions. Another issue is exceptions: there needs to be an elegant mechanism to handle exceptions and propagate exception information from within a transactional context. Yet another issue is code that cannot be transactionalized, such as code that performs I/O. They also note that the non-determinism introduced by aborting transactions makes debugging very complicated, as it may be difficult to reproduce bugs when they occur.
They conclude that, given all of these issues and the high transactional overheads, transactional memory has not yet matured to the point where it will be widely adopted. Aside from discussing these drawbacks, Cascaval et al. also run a number of tests comparing STM implementations to purely sequential code using a number of popular benchmarks. Their results are not very promising: in most cases the STM implementations perform no better than, or worse than, the sequential code. There is certainly an uphill battle ahead for transactional memory, but there is still hope. I have already discussed HyTM, which may lead to improved performance if it is adopted by programmers and hardware developers. The next section of this survey paper describes work that deals with some of the semantic issues mentioned above, such as integrating transactions with locks and dealing with code that requires I/O.

5 Deciding When to use Transactions Dynamically

Rossbach et al. present a very interesting way to deal with some of the issues discussed above in their work on TxLinux. Their work is again motivated by the fact that programming with locks is very difficult. They thus decided to replace as many lock-protected critical sections in Linux as they could with hardware transactions. Unfortunately, as we have seen, not all critical sections can be replaced with transactions. For example, if the code performs I/O, a transaction cannot be used because the action cannot be undone. They also encountered idiosyncratic locking that was just too complicated to replace with transactions. The task of converting Linux to use transactions took them a year with six developers working full time, mainly because they had to spend so much time figuring out which critical sections could be replaced (in the end, 30% of lock calls were replaced).
This experience motivated them to invent an ingenious new parallel programming mechanism called Cooperative Transactional Spinlocks (Cxspinlocks). Cxspinlocks allow critical sections to decide dynamically whether to use locks or transactions. Most critical sections will attempt to use transactions; if the code attempts an I/O operation, the transaction rolls back and a lock is used instead. Using Cxspinlocks, the developers were able to convert Linux to TxLinux in one month with only one developer. The performance of TxLinux shows small speedups over Linux according to their testing. This is especially impressive considering that the goal of Cxspinlocks is, more than anything else, to make parallel programming easier. Also, Linux is highly optimized for performance, so any improvement at all is an impressive feat. The Cxspinlock API contains three main operations. cx_optimistic is used to optimistically attempt to execute a critical section using a transaction. If I/O is encountered, the transaction reverts and the critical section is re-executed with cx_exclusive, which acquires a lock for the critical section. cx_end signals the end of a critical section. A contention manager decides which process should proceed when more than one process is trying to modify a shared variable using locks or transactions; this contention manager can be optimized to satisfy different goals of the system. Another contribution of the TxLinux work is the suggestion that the TxLinux contention manager should communicate with the OS scheduler in order to support OS goals such as avoiding priority inversion. They accomplish this by introducing the os-prio contention management policy. With os-prio, the OS communicates priority to the transactional memory hardware and the contention manager always decides in favor of higher priority processes. os-prio defaults to other policies when necessary.
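The fallback behavior of a Cxspinlock can be sketched as follows. This is an illustrative model, not the TxLinux implementation: the IORequested exception stands in for the hardware detecting an I/O attempt and rolling the transaction back, and an ordinary Python lock stands in for cx_exclusive.

```python
import threading

# Sketch of the Cxspinlock idea: run the critical section optimistically as
# a transaction first; if it attempts I/O, roll back and re-execute the same
# section under an exclusive lock, where I/O is safe.

class IORequested(Exception):
    pass

_lock = threading.Lock()

def cx_critical_section(body):
    # First attempt: transactional, I/O forbidden (cx_optimistic).
    try:
        body(io_allowed=False)
        return "transaction"
    except IORequested:
        pass                   # conceptually, the transaction rolls back here
    # Fallback: acquire the lock (cx_exclusive); now I/O cannot be observed
    # half-done by other threads, so it is allowed.
    with _lock:
        body(io_allowed=True)
        return "lock"

log = []

def writes_to_device(io_allowed):
    if not io_allowed:
        raise IORequested()    # e.g. the section wants to print to the console
    log.append("wrote to device")

io_path = cx_critical_section(writes_to_device)           # needs the lock
pure_path = cx_critical_section(lambda io_allowed: None)  # stays transactional
print(io_path, pure_path)
```

Sections that never touch I/O keep the concurrency benefits of transactions, while the rare I/O path silently degrades to classic locking, which is the dynamic decision the section title refers to.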
According to the tests presented in the paper, os-prio eliminates 100% of priority inversion and introduces a negligible performance cost. Rossbach et al.'s work on TxLinux brings a lot of hope to the potential adoption of transactional memory. It allows locks and transactions to cooperate with negligible performance costs, and thus resolves some of the semantic issues regarding transactional memory that were discussed in section 4 of this paper.

6 Conclusion

Transactional memory has been shown in many ways to be a good alternative to using locks for writing parallel programs. While locks are messy and complicated, transactional memory primitives are elegant and allow synchronization code to be easily implemented and understood by developers. This survey paper has discussed both hardware and software transactional memory implementations and has identified benefits and drawbacks of each approach. It appears that the best solution is the HyTM hybrid approach, which combines the performance benefits of HTM with the unboundedness of STM. I have also discussed semantic problems with transactional memory that are unrelated to the hardware vs. software discussion. A very good solution for many of these issues is the TxLinux work, which uses Cxspinlocks to allow locks and transactions to work together and to use transactions only when they are appropriate. In my opinion, the future of transactional memory will be a combination of HyTM and Cxspinlocks. While it may still take a while to work out the various kinks, I feel that the need for better parallel programming solutions will drive the eventual adoption of transactional memory. As the HyTM authors predict, it appears that once the adoption of transactional memory begins, it will have the potential to pick up momentum and make a very large impact on software development in the long run.

References

Marcos K. Aguilera, Arif Merchant, Mehul Shah, Alistair Veitch, and Christos Karamanolis.
Sinfonia: a new paradigm for building scalable distributed systems. In SOSP '07: Proceedings of the twenty-first ACM SIGOPS Symposium on Operating Systems Principles, pages 159-174, New York, NY, USA, 2007. ACM.

Andrew D. Birrell. An introduction to programming with threads. Research Report 35, Digital Equipment Corporation Systems Research Center, 1989.

Calin Cascaval, Colin Blundell, Maged Michael, Harold W. Cain, Peng Wu, Stefanie Chiras, and Siddhartha Chatterjee. Software transactional memory: why is it only a research toy? Communications of the ACM, 51(11):40-46, 2008.

Peter Damron, Alexandra Fedorova, Yossi Lev, Victor Luchangco, Mark Moir, and Daniel Nussbaum. Hybrid transactional memory. In ASPLOS-XII: Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 336-346, New York, NY, USA, 2006. ACM.

Robert Ennals. Efficient software transactional memory. Technical Report IRC-TR-05-051, Intel Research Cambridge, Jan 2005.

James R. Goodman. Using cache memory to reduce processor-memory traffic. In ISCA '83: Proceedings of the 10th Annual International Symposium on Computer Architecture, pages 124-131, Los Alamitos, CA, USA, 1983. IEEE Computer Society Press.

Maurice Herlihy and J. Eliot B. Moss. Transactional memory: architectural support for lock-free data structures. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 289-300, 1993.

Maurice Herlihy, Victor Luchangco, Mark Moir, and William N. Scherer III. Software transactional memory for dynamic-sized data structures. Pages 92-101, Jul 2003.

Simon Peyton Jones. Beautiful Code, chapter 24. O'Reilly, 2007.

Jim Larus and Ravi Rajwar. Transactional Memory (Synthesis Lectures on Computer Architecture). Morgan & Claypool Publishers, 2007.

Yossi Lev and Jan-Willem Maessen. Toward a safer interaction with transactional memory by tracking object visibility. In Proceedings of the Workshop on Synchronization and Concurrency in Object-Oriented Languages, San Diego, CA, October 2005.

Ravi Rajwar and James R. Goodman. Transactional lock-free execution of lock-based programs. In Proceedings of the Tenth Symposium on Architectural Support for Programming Languages and Operating Systems, pages 5-17, Oct 2002.

Ravi Rajwar, Maurice Herlihy, and Konrad Lai. Virtualizing transactional memory. In ISCA '05: Proceedings of the 32nd Annual International Symposium on Computer Architecture, pages 494-505, Washington, DC, USA, 2005. IEEE Computer Society.

Christopher J. Rossbach, Owen S. Hofmann, Donald E. Porter, Hany E. Ramadan, Aditya Bhandari, and Emmett Witchel. TxLinux: using and managing hardware transactional memory in an operating system. In SOSP '07: Proceedings of the twenty-first ACM SIGOPS Symposium on Operating Systems Principles, pages 87-102, New York, NY, USA, 2007. ACM.

N. Shavit and D. Touitou. Software transactional memory. Distributed Computing, Special Issue, 10:99-116, 1997.