Bytepawn Marton Trencseni on Software, Systems and other Ideas.

Transactions in Memory

2008/09/01

I recently re-read Chapter 24 of Beautiful Code, titled Beautiful Concurrency (PDF). It's very well written, you won't regret reading it. In it, Simon Peyton Jones makes the case for Software Transactional Memory vs. explicit locking by the programmer.

Getting locking right in a non-trivial system is pretty hard. SPJ's list of pitfalls is almost exhaustive:

  1. Taking too few locks
  2. Taking too many locks
  3. Taking the wrong locks
  4. Taking the locks in the wrong order
  5. Freeing locks on error

I would a zeroth point: figuring out what to lock. This precedes everything else you do with locks.

Software Transactional Memory is, per Wikipedia:

In computer science, software transactional memory (STM) is a concurrency control mechanism analogous to database transactions for controlling access to shared memory in concurrent computing. It functions as an alternative to lock-based synchronization. A transaction in this context is a piece of code that executes a series of reads and writes to shared memory. These reads and writes logically occur at a single instant in time; intermediate states are not visible to other (successful) transactions.

An example is worth a thousand words:

// change a1 and a2 such that a1 + a2 = const
// like transferring money between bank accounts
 
// lock-based version
lock l;
double a1, a2;
void transfer(double amount)
{
 lock(l);
 a1 = a1 - amount; // suppose this is atomic
 a2 = a2 + amount;
 unlock(l);
 }
}
 
// STM
stm_double a1, a2;
void transfer(double amount)
{
 transaction_start(); // like 'atomic' in Haskell
  a1 = a1 - amount;
  a2 = a2 + amount;
 transaction_end();
}

No big difference. But wait, what if you have several a's, say several accounts. You want transfer(a1, a2, amount). With locks, your code just got a whole lot more complicated. With STM, it's a free lunch.

STM solves most of the problems listed by SPJ, but cannot solve the zeroth problem: you still have to figure out what the lockspots are, declare them to be transactional variables, and then guard them with transactions. But with STMs, all you have to do is guard with, you don't have to worry about locking policy.

But remember, there are no silver bullets. There are two problems with STM:

1. No transactional I/O. M stands for Memory, meaning that it can only guard memory areas, but not I/O. On some systems, like Haskell, the STM library does not even let you mix STM and I/O operations, so you can't have I/O operations inside atomically { } (it won't compile). The reason is, you don't want to execute the I/O code twice in case the transaction has to be re-run.

2. Library overhead. You just went from a few instructions to write a variable to including a whole library that manages reading and writing variables for you. If the code you are writing is not high-performance, this is probably acceptable. I'd consider using STM when working in Java, .NET or Python --- here, raw speed is not possible anyway, so trading some more CPU time for simpler and less buggy code is acceptable.

How much speed are you sacrifying when using STM? To find out, I downloaded and compiled TinySTM, a C library implementing the basic STM primitives like stm_start() and stm_commit().

It turns out that the transaction manager's design matters a whole lot. Some good papers are:

To test performance I created the simplest program possible (source code), with just one long integer called counter, NTHREADS many threads each wanting to increment counter NLOOPS many times. (This test is by no means representative, but it's something. You have to see what happens for your real workload.)

For comparison, I measured

  1. a naive, non-locking implementation which is not correct, ie. the final value of counter is not NTHREADS * NLOOPS.
  2. pthread mutex based implementation with one lock for counter
  3. TinySTM based implementation with transactions using Write-back + Encounter Time Locking (ETL)
  4. TinySTM based implementation with transactions using Write-back + Commit Time Locking (CTL)

The results are in milliseconds, first number is for naive, second for lock-based, third for STM-based implementation:

NTHREADS = 10 NTHREADS = 100 NTHREADS = 1000
NLOOPS = 10 0 | 0 | 40 | 10 7 | 7 | 1440 | 18 112 | 153 | 1148071 | 147
NLOOPS = 100 0 | 6 | 54 | 8 6 | 100 | 2065 | 20 93 | 1037 | 1204106 | 875
NLOOPS = 1000 0 | 95 | 54 | 11 6 | 898 | 1200 | 40 99 | 9937 | 566714 | 379

Note that I didn't repeat the experiment many times and take an average value, but the trend is quite clear (and repeatable). As you get more and more threads competing for the same resource, the STM ETL implementation becomes unfeasable (third number), while the STM CTL implementation (fourth number) beats my locking code (second number).

What's the difference between the two STM architectures? With ETL, as soon as the transaction reads a variable, that variable is locked and not released until the transaction finishes. With CTL, the variables are not locked until the transaction attempts to commit. The problem with ETL is that whenever a thread acquires a lock on counter, and all the other threads start aborting and re-executing, hogging down the CPU. It's not clear why the CTL version is faster than ETL, I assume it's because threads get lots of timeslices during the execution of one one loop, so the ETL version aborts many more times than the CTL version. Why is it faster then the pthread version? TinySTM uses atomic_ops, which contains machine-specific optimizations, which the pthread library probably does not.

The lesson is:


- Marton Trencseni


blog comments powered by Disqus