Saturday, November 17, 2007

Poor-man's Closure

So C++ is lacking in some areas of modern programming language design, that's not a big secret. But given the flexibility of C++ the question becomes: how hard is it to add missing features? Here I'm going to take a look at closures and lambda expressions.

First off we can talk about closures. As far as I know there it isn't possible to add full closures to C++. But we can get a poor man's closure using the boost::bind library. If you don't know about this library yet you should run (not walk) over to boost.org and check it out. If you work at all with STL algorithms it will cut down on the amount of code you write immensely. So for a demonstration of the usefulness of boost::bind as a sort of closure mechanism lets say that you want to add two to all the elements in some std::vector<int>. You could do this using std::binder1st and std::mem_fun but that's kind of a pain in the ass. Instead you can do

std::transform(values.begin(), values.end(), 
               values.begin(),
               boost::bind(std::plus<int>(), _1, 2));

Things start getting interesting when you want to use bind to modify a value from outside of your STL algorithm. Take a look at this:

int lastVal = 0;
std::for_each(values.begin(), values.end(),
              boost::bind(sum, boost::ref(lastVal), _1));

Assuming that sum is a function that takes two ints and puts the result into it's first argument then this would leave the sum of the vector in lastVal 1.

This last example also can be used to illustrate why boost::bind doesn't create a true closure. Suppose that instead of using our bind expression immediately we return it using boost::function 2. When we try to use the bind object very bad things will happen, a crash if we're lucky. This happens because the boost::ref object in the bind object is referencing a variable that no longer exists3. If we had a true closure then the variable would continue to exist as long as the closure did and we'd be safe. Without garbage collection I don't see how to make this work4.

Once you have closures then lambda expressions (or local function definitions) become very useful. Unfortunately C++ has neither. There is the boost::lambda library which can be useful in this area but it is somewhat limited and in my experience it doesn't work so hot with VC++, even the newest incarnation of the compiler5. We can get local function definitions though by making a local structure definition6:

int sum(const std::vector<int>& values)
{   
    int total = 0;

    struct Calculate
{
    Calculate(int& total_) : m_total(total_) {}

    int& m_total;

    void operator()(int v2)
    {
    m_total += v2;
    }           
}

std::for_each(values.begin(), values.end(), Calculate(total));

return total;
}

One thing to note about the above is that in order to pull total in to the Calculate closure we had to have Calculate hold a reference to it7. Local structures like Calculate can refer to local variables in the enclosing function but only if they are static. Now we probably don't want to go declaring total static since that will lead to reentrancy problems. This is unfortunate, if we didn't have this static only restriction then Calculate would be much more compact. We can use bind to build the closure though and make things a bit more compact8:

int sum(const std::vector<int>& values)
{   
    int total = 0;

    struct Calculate
{
    static void run(int& total_, int v)
    {
    total_ += v;
        }           
}

std::for_each(values.begin(), values.end(), 
              boost::bind(Calculate::run, boost::ref(total), _1));

return total;
}

While this makes things a bit more compact the difference when you're binding more variables is much more dramatic. If we could make local function definitions the change in size would be slightly better still. If we didn't have the refer only to static local variables restriction it would be even better.

So we can get pretty close to closures and local function definitions in C++ without having them built into the language. Obviously you have to be careful to keep your local function definitions short in order to maintain readability. While it's nice to have a short block of code that you're passing into a function defined right where you pass it in, once the code gets too long it will start obscuring the enclosing functions flow and really belongs off on its own. But even if the block of code in question is defined outside of your function you can still close over it with bind.


  1. I realize this is a really contrived example and that there are much better ways to sum a sequence (std::accumulate comes to mind). But this example is a simple illustration of a technique that is useful when combined with the poor man's lambda that I talk about later.
  2. Note that we can't return the bind expression directly because the type of boost::bind(sum, boost::ref(lastVal), _1)) is some hideous thing that the documentation calls various variations on unspecified. Instead we need to capture the bind expression in a boost::function<void(int)> and return that.
  3. You might be debating the usefulness of this construct and you'd be right. In this case it's not so useful to create a bind the binds to a local variable and is returned from the function where that local variable exists. But we could instead bind to a member variable in an object and register that bind object as a callback somewhere. This is useful but puts you on dangerous ground where we need the object that contains the bound member variable to outlast the callback registration.
  4. One could address this by cluttering up the code with boost::shared_ptr's but that can quickly get messy. I've heard that the C++ standardization committee is talking about adding closures to the language but for the life of me I can't figure out how that's going to work without garbage collection. Maybe they're only talking about adding closures to Managed C++.
  5. Visual C++ * with the service pack at the time of this writing.
  6. Yep, you can make structure definitions within a function definition. I didn't know about this until a few months ago and it has made my life a lot easier. One thing to note is that you cannot have static members in a local structure so their usefulness can be somewhat curtailed in some situations. You also can't use templates at all in the local structure.
  7. The underscore on the end of total_ is only there to make explicit that the constructor parameter is different from the total local variable. The constructor parameter could just as well have been called total since nothing in Calculate can refer to the local variable total.
  8. Note that if you're using VC++ 2005 without the service pack a compiler bug could prevent this code from compiling.