Wednesday, May 7, 2008

Blame the Management

So I've come up with a new reason that C++'s lack of garbage collection is problematic. The strange thing is that the reason is in one of C++'s usual sweet spots: performance.

Really I should qualify the previous statement, I'm talking about performance in multi-threaded programs where you actually want your software to work in some sane fashion as opposed to just flailing around and crashing. In order to accomplish this you need to be making liberal use of boost::shared_ptr or something similar in order to avoid having the rug pulled out from under you on a regular basis1.

Turns out that the shared_ptr has a dirty little secret: copying one in a multi-threaded program is orders of magnitude slower than copying a bare pointer. The reason for this is that the shared_ptr has a reference count in it. When you copy the shared_ptr that reference count needs to be incremented in a thread-safe manner. On platforms that support it this is done using atomic integer operations, otherwise OS synchronization primitives come into play. Either way you're talking a lot more cycles than just a pointer copy since at minimum you need to wait for the atomic operations to maintain some semblance of processor cache sanity.

So does this mean that we abandon shared_ptr in multi-threaded programs? In C++ the alternative (bare pointers) is much, much worse so I would say no.

Now if we had a modern2 garbage collected system we wouldn't need to sacrifice cycles to the cache sanity gods. Instead we could copy pointers to our hearts content. Occasionally we would need to give the garbage man some cycles so in a way the garbage collector is amortizing the cycles we need to spend on memory management. But if well implemented this is a much lower tax to pay. In extreme circumstances you could even take control of the collector and only allow it to run when it's not going to get in the way of more important things.

So how do we mitigate things? One option would be to hook up the Boehm garbage collector to our programs. At some point I'm going to sit down and benchmark the collector versus boost::shared_ptr. But that might not be an option depending on how much legacy code we're talking about.

If you're stuck with no garbage collection then the thing to do is minimize how often a shared_ptr is being copied. In the vast majority of cases you should take a const reference to a shared_ptr as parameters. Unless you're doing something wrong then somewhere either on the stack or shudder in a global variable3 there is an instance of the shared_ptr so you shouldn't have to worry about losing the object pointed at while your function is running. The exception to this is when passing the shared_ptr to another thread. In that case you will need to create a copy of the shared_ptr for the other thread. Note that taking a const reference saves you from a double-whammy. If you take the pointer by value then you pay for the new shared_ptr instance when the function is called and again when the shared_ptr instance is destroyed as the function exits.

You should also prefer plain old dynamic_cast over dynamic_pointer_cast. The latter will increment the reference count on the pointer being casted. If you have the original shared_ptr then you know the object being pointed at isn't going away so just cast the underlying pointer. Obviously if you are going to store the pointer away some where you'll have to pay for the copy.

Another thing to watch out for is locking boost::weak_ptr. You pay every time since the reference count will be locked. If you are by chance using the pointer value without using the pointed to object all the time then consider storing the bare pointer along with the weak_pointer. Then you can use the pointer value whenever without paying to lock but still have the object lifetime monitoring of weak_ptr when you need it.

boost::shared_ptr is great but be aware that it's use isn't free. And while it isn't free the alternative is worse, as long as we're not referring to real garbage collection, which is better.


  1. I suppose one could argue that by using a reference counted smart pointer one is using garbage collection. Of course if you're driving a Model T you're driving car also.
  2. read non-reference counted.
  3. I'm looking at you singleton.

2 comments:

Gavin Beatty said...

...
The Boehm-Demers-Weiser GC uses these apparently god awful atomic operations.

http://www.hpl.hp.com/research/linux/atomic_ops/

brett said...

I never said that atomic ops are god awful. Just that messing with the reference count via the atomic ops requires many more cycles, three to four times as many in my experience, then not having to worry about the reference count at all. Atomic ops like these are definitely preferable over locking a mutex when you can get away with it. We're probably talking 10X slow down versus 4x slowdown since the locking of the mutex will require more cache synchronization on top of the synchronization that is already being done on the reference count.

I would also expect that the Boehm collector has to do some sort of thread synchronization when running in a multi-threaded program. Due to the amount of legacy code I've got at work integrating the Boehm collector isn't really an option so giving it a good look hasn't been to high of a priority. Sounds like they have some form of lock-free algorithm for the collection if they're using atomic ops though.