

Friday, February 13, 2009

Hoops to Jump Through

Whose idea was it for C++ classes that have virtual functions to not automatically have a virtual destructor? There is just no reason for this not to be the case. Not having a virtual destructor is just not going to ever do the right thing when you're dealing with a polymorphic class. Most of the time it's probably going to be a silent problem that you're dealing with. My last encounter with this wonderful bit of the C++ obstacle course resulted in a not insignificant memory leak when I failed to jump through the hoop. I have yet to see a case where it would make sense for a polymorphic class to not have a virtual destructor. Sometimes I really do think that C++ is just some colossal joke that Stroustrup is playing on us.
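For the record, here is roughly what the hoop looks like. This is a minimal sketch and not the code that bit me: Base, Derived, g_derived_destroyed and derived_destructor_runs are all made-up names. Take the virtual destructor out of Base and the delete through the base pointer becomes undefined behavior, which in practice usually means ~Derived never runs and the buffer leaks.

```cpp
#include <cassert>

// Illustration only: counts how many times ~Derived has run.
static int g_derived_destroyed = 0;

struct Base {
    virtual void speak() {}
    // The hoop. Without this, deleting a Derived through a Base*
    // is undefined behavior (typically ~Derived is simply skipped).
    virtual ~Base() {}
};

struct Derived : Base {
    int* buffer;
    Derived() : buffer(new int[1024]) {}
    ~Derived() {
        delete[] buffer;          // only reached if ~Base is virtual
        ++g_derived_destroyed;
    }
};

// Destroys a Derived through a Base pointer and reports whether
// the Derived destructor actually ran.
bool derived_destructor_runs() {
    int before = g_derived_destroyed;
    Base* p = new Derived;
    delete p;
    return g_derived_destroyed == before + 1;
}
```

With the virtual destructor in place derived_destructor_runs() reports true and the buffer is released.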

disable_shared_from_this

About once a year I end up thinking that using boost::enable_shared_from_this is a good idea. And every time it ends up wasting almost an hour of my time. You see, there's a restriction on shared_from_this that disallows its use in the constructor of the object that is being shared from this. Unfortunately this restriction is not mentioned in the documentation. I always try to use shared_from_this from the constructor and get bad weak pointer errors at run time. I then spend twenty minutes flailing around in the source code trying to figure out what I'm doing wrong until finally a bit of googling brings the not-in-the-constructor restriction out in the open. Once I realize this and start trying to fix things so I don't need to use shared_from_this in the constructor, I find that I don't need enable_shared_from_this at all. This happens every time. I'm guessing that most of the boost folks have had the same experience as well since they don't go out of their way to call attention to enable_shared_from_this. Maybe I should stay away from libraries whose documentation can only be reached from a link in an answer to a FAQ about another library. That is just not a good sign.
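A minimal sketch of the trap, using std::enable_shared_from_this as a stand-in for the boost version (that substitution is my assumption; Widget, make_widget and self are invented names). Inside the constructor no shared_ptr owns the object yet, so there is nothing to share from. C++17 specifies that this throws std::bad_weak_ptr, and common implementations did so earlier as well.

```cpp
#include <cassert>
#include <memory>

struct Widget : std::enable_shared_from_this<Widget> {
    bool threw_in_constructor;

    Widget() : threw_in_constructor(false) {
        try {
            // No shared_ptr owns *this yet, so this cannot succeed.
            std::shared_ptr<Widget> me = shared_from_this();
            (void)me;
        } catch (const std::bad_weak_ptr&) {
            threw_in_constructor = true;
        }
    }

    // Fine once construction has finished and a shared_ptr owns us.
    std::shared_ptr<Widget> self() { return shared_from_this(); }
};

// Hypothetical factory: the object is owned by the time it is returned.
std::shared_ptr<Widget> make_widget() {
    return std::make_shared<Widget>();
}
```

The usual way out is to do the work that needs shared_from_this in a separate step after construction, for example inside a factory function like make_widget.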

Update 3-11-09: Of course now that I've posted this I've had two occasions where boost::enable_shared_from_this turned out to be useful.

Wednesday, May 7, 2008

Blame the Management

So I've come up with a new reason that C++'s lack of garbage collection is problematic. The strange thing is that the reason is in one of C++'s usual sweet spots: performance.

Really I should qualify the previous statement: I'm talking about performance in multi-threaded programs where you actually want your software to work in some sane fashion as opposed to just flailing around and crashing. In order to accomplish this you need to be making liberal use of boost::shared_ptr or something similar in order to avoid having the rug pulled out from under you on a regular basis¹.

Turns out that the shared_ptr has a dirty little secret: copying one in a multi-threaded program is orders of magnitude slower than copying a bare pointer. The reason for this is that the shared_ptr has a reference count in it. When you copy the shared_ptr that reference count needs to be incremented in a thread-safe manner. On platforms that support it this is done using atomic integer operations, otherwise OS synchronization primitives come into play. Either way you're talking a lot more cycles than just a pointer copy since at minimum you need to wait for the atomic operations to maintain some semblance of processor cache sanity.
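To make the difference concrete, here is a toy sketch of what a counted copy adds on top of a bare pointer copy. It is not how boost::shared_ptr is actually implemented; ControlBlock, copy_bare, copy_counted and release_counted are hypothetical names, and std::atomic stands in for whatever atomic primitive the platform provides.

```cpp
#include <atomic>
#include <cassert>

// Toy stand-in for the bookkeeping a reference counted pointer carries.
struct ControlBlock {
    std::atomic<long> count;
    ControlBlock() : count(1) {}
};

// Copying a bare pointer: a plain register or memory move.
int* copy_bare(int* p) { return p; }

// Copying a counted pointer: the same move plus an atomic
// read-modify-write that other cores have to observe.
ControlBlock* copy_counted(ControlBlock* cb) {
    cb->count.fetch_add(1, std::memory_order_relaxed);
    return cb;
}

// Dropping a copy: another atomic operation. Returns true when the
// last reference went away and the object could be destroyed.
bool release_counted(ControlBlock* cb) {
    return cb->count.fetch_sub(1, std::memory_order_acq_rel) == 1;
}
```

Every copy and every destruction goes through one of those atomic operations, which is where the extra cycles come from.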

So does this mean that we abandon shared_ptr in multi-threaded programs? In C++ the alternative (bare pointers) is much, much worse so I would say no.

Now if we had a modern² garbage collected system we wouldn't need to sacrifice cycles to the cache sanity gods. Instead we could copy pointers to our heart's content. Occasionally we would need to give the garbage man some cycles, so in a way the garbage collector is amortizing the cycles we need to spend on memory management. But if well implemented this is a much lower tax to pay. In extreme circumstances you could even take control of the collector and only allow it to run when it's not going to get in the way of more important things.

So how do we mitigate things? One option would be to hook up the Boehm garbage collector to our programs. At some point I'm going to sit down and benchmark the collector versus boost::shared_ptr. But that might not be an option depending on how much legacy code we're talking about.

If you're stuck with no garbage collection then the thing to do is minimize how often a shared_ptr is being copied. In the vast majority of cases you should take a const reference to a shared_ptr as a parameter. Unless you're doing something wrong, somewhere either on the stack or (shudder) in a global variable³ there is an instance of the shared_ptr, so you shouldn't have to worry about losing the object pointed at while your function is running. The exception to this is when passing the shared_ptr to another thread. In that case you will need to create a copy of the shared_ptr for the other thread. Note that taking a const reference saves you from a double-whammy. If you take the pointer by value then you pay for the new shared_ptr instance when the function is called and again when the shared_ptr instance is destroyed as the function exits.
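A small sketch of the difference, using std::shared_ptr as a stand-in for boost::shared_ptr (my assumption; the two behave the same way here). The function names are made up, and use_count is only being used as a visible proxy for the reference count traffic.

```cpp
#include <cassert>
#include <memory>

// By value: a new shared_ptr is created for the call (atomic increment)
// and destroyed on return (atomic decrement). That's the double-whammy.
long count_by_value(std::shared_ptr<int> p) { return p.use_count(); }

// By const reference: the caller's instance is used directly,
// so the reference count is never touched.
long count_by_const_ref(const std::shared_ptr<int>& p) { return p.use_count(); }
```

Called with a pointer that has a single owner, the by-value version sees a count of two while it runs and the const reference version sees one.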

You should also prefer plain old dynamic_cast over dynamic_pointer_cast. The latter will increment the reference count on the pointer being cast. If you have the original shared_ptr then you know the object being pointed at isn't going away, so just cast the underlying pointer. Obviously if you are going to store the pointer away somewhere you'll have to pay for the copy.
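Something along these lines, again with std::shared_ptr and std::dynamic_pointer_cast standing in for the boost versions (an assumption on my part). Shape, Circle and the two counting functions are invented for the illustration.

```cpp
#include <cassert>
#include <memory>

struct Shape { virtual ~Shape() {} };
struct Circle : Shape {
    double radius;
    Circle() : radius(1.0) {}
};

// dynamic_pointer_cast hands back a new shared_ptr, so the count goes up
// while the result is alive and back down when it is destroyed.
long count_with_pointer_cast(const std::shared_ptr<Shape>& s) {
    std::shared_ptr<Circle> c = std::dynamic_pointer_cast<Circle>(s);
    (void)c;
    return s.use_count();
}

// Casting the underlying pointer leaves the count alone. This is safe
// only because the caller's shared_ptr keeps the object alive.
long count_with_plain_cast(const std::shared_ptr<Shape>& s) {
    Circle* c = dynamic_cast<Circle*>(s.get());
    return c ? s.use_count() : -1;
}
```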

Another thing to watch out for is locking boost::weak_ptr. You pay every time since locking has to update the reference count. If you often need just the pointer value (for comparisons, say) without touching the pointed-to object, then consider storing the bare pointer along with the weak_ptr. Then you can use the pointer value whenever you like without paying to lock, but still have the object lifetime monitoring of weak_ptr when you need it.
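A rough sketch of that idea, with std::weak_ptr standing in for boost::weak_ptr (my assumption). Node and Observer are hypothetical names. The bare pointer is only ever compared, never dereferenced; anything that touches the object goes through lock().

```cpp
#include <cassert>
#include <memory>

struct Node {
    int value;
    explicit Node(int v) : value(v) {}
};

// Keeps a weak_ptr for lifetime tracking plus the bare address
// for cheap identity checks.
class Observer {
public:
    explicit Observer(const std::shared_ptr<Node>& n)
        : weak_(n), raw_(n.get()) {}

    // Compares addresses only: no lock, no reference count traffic.
    bool refers_to(const Node* other) const { return raw_ == other; }

    // Pay for the lock only when the object itself is needed.
    // Returns an empty pointer if the Node is already gone.
    std::shared_ptr<Node> acquire() const { return weak_.lock(); }

private:
    std::weak_ptr<Node> weak_;
    const Node* raw_;
};
```

One caveat: once the Node has been destroyed its address can be reused by a new object, so the bare pointer comparison is only meaningful while you have some other reason to believe the object is still around.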

boost::shared_ptr is great but be aware that its use isn't free. And while it isn't free the alternative is worse, as long as we're not referring to real garbage collection, which is better.

¹ I suppose one could argue that by using a reference counted smart pointer one is using garbage collection. Of course if you're driving a Model T you're driving a car also.
² Read: non-reference counted.
³ I'm looking at you, singleton.

Saturday, November 17, 2007

Poor-man's Closure

So C++ is lacking in some areas of modern programming language design, that's not a big secret. But given the flexibility of C++ the question becomes: how hard is it to add missing features? Here I'm going to take a look at closures and lambda expressions.

First off we can talk about closures. As far as I know it isn't possible to add full closures to C++. But we can get a poor man's closure using the boost::bind library. If you don't know about this library yet you should run (not walk) over to boost.org and check it out. If you work at all with STL algorithms it will cut down on the amount of code you write immensely. So for a demonstration of the usefulness of boost::bind as a sort of closure mechanism let's say that you want to add two to all the elements in some std::vector<int>. You could do this using the standard binders (std::bind2nd with std::plus<int>) but that's kind of a pain in the ass. Instead you can do

std::transform(values.begin(), values.end(), 
               values.begin(),
               boost::bind(std::plus<int>(), _1, 2));

Things start getting interesting when you want to use bind to modify a value from outside of your STL algorithm. Take a look at this:

int lastVal = 0;
std::for_each(values.begin(), values.end(),
              boost::bind(sum, boost::ref(lastVal), _1));

Assuming that sum is a function that takes two ints and puts the result into its first argument, this would leave the sum of the vector in lastVal¹.
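To pin down that assumption, here is one way sum could look, with the for_each wrapped in a function so the whole thing compiles on its own. I'm using std::bind, std::ref and the std placeholders as stand-ins for the boost versions, and total_of is a made-up wrapper; the shape of the code is otherwise the same as the fragment above.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// One possible sum: adds value into the accumulator passed by reference.
void sum(int& accumulator, int value) { accumulator += value; }

// Hypothetical wrapper around the for_each call.
int total_of(const std::vector<int>& values) {
    using namespace std::placeholders;
    int lastVal = 0;
    // std::ref makes bind store a reference to lastVal instead of a copy,
    // so each call to sum updates the local variable.
    std::for_each(values.begin(), values.end(),
                  std::bind(sum, std::ref(lastVal), _1));
    return lastVal;
}
```

Without the ref wrapper bind would copy lastVal into the bind object and the local variable would stay at zero.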

This last example also can be used to illustrate why boost::bind doesn't create a true closure. Suppose that instead of using our bind expression immediately we return it using boost::function ². When we try to use the bind object very bad things will happen, a crash if we're lucky. This happens because the boost::ref object in the bind object is referencing a variable that no longer exists³. If we had a true closure then the variable would continue to exist as long as the closure did and we'd be safe. Without garbage collection I don't see how to make this work⁴.
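For comparison, here is a sketch of the shared_ptr workaround mentioned in the footnotes, where the bound state is kept alive by the bind object itself. It uses std::function, std::bind and std::shared_ptr as stand-ins for the boost versions (my assumption), and add_to and make_accumulator are invented names.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Adds value into the shared total.
void add_to(const std::shared_ptr<int>& total, int value) { *total += value; }

// Returns a callable that can safely outlive this function: bind stores
// its own copy of the shared_ptr, so the int lives as long as the callable.
std::function<void(int)> make_accumulator(const std::shared_ptr<int>& total) {
    using namespace std::placeholders;
    return std::bind(add_to, total, _1);
}
```

It works, but every variable you want to close over has to be moved into a shared_ptr by hand, which is exactly the clutter a real closure would save you from.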

Once you have closures then lambda expressions (or local function definitions) become very useful. Unfortunately C++ has neither. There is the boost::lambda library which can be useful in this area but it is somewhat limited and in my experience it doesn't work so hot with VC++, even the newest incarnation of the compiler⁵. We can get local function definitions though by making a local structure definition⁶:

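#include <algorithm>
#include <vector>

int sum(const std::vector<int>& values)
{
    int total = 0;

    // Local function object; it reaches total through a stored reference.
    // Note: C++03 does not allow a local type as a template argument, so
    // passing Calculate to for_each relies on a compiler extension there
    // (C++11 lifted the restriction).
    struct Calculate
    {
        Calculate(int& total_) : m_total(total_) {}

        int& m_total;

        void operator()(int v2)
        {
            m_total += v2;
        }
    };

    std::for_each(values.begin(), values.end(), Calculate(total));

    return total;
}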

One thing to note about the above is that in order to pull total into the Calculate closure we had to have Calculate hold a reference to it⁷. Local structures like Calculate can refer to local variables in the enclosing function but only if they are static. Now we probably don't want to go declaring total static since that will lead to reentrancy problems. This is unfortunate: if we didn't have this static-only restriction then Calculate would be much more compact. We can use bind to build the closure though and make things a bit more compact⁸:

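#include <algorithm>
#include <vector>
#include <boost/bind.hpp>
#include <boost/ref.hpp>

int sum(const std::vector<int>& values)
{
    int total = 0;

    // The local struct is now just a scope for a static member function;
    // bind supplies the reference to total.
    struct Calculate
    {
        static void run(int& total_, int v)
        {
            total_ += v;
        }
    };

    std::for_each(values.begin(), values.end(),
                  boost::bind(Calculate::run, boost::ref(total), _1));

    return total;
}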

While this makes things a bit more compact, the difference when you're binding more variables is much more dramatic. If we could make local function definitions the change in size would be slightly better still. If we didn't have the refer-only-to-static-local-variables restriction it would be even better.

So we can get pretty close to closures and local function definitions in C++ without having them built into the language. Obviously you have to be careful to keep your local function definitions short in order to maintain readability. While it's nice to have a short block of code that you're passing into a function defined right where you pass it in, once the code gets too long it will start obscuring the enclosing function's flow and really belongs off on its own. But even if the block of code in question is defined outside of your function you can still close over it with bind.

¹ I realize this is a really contrived example and that there are much better ways to sum a sequence (std::accumulate comes to mind). But this example is a simple illustration of a technique that is useful when combined with the poor man's lambda that I talk about later.
² Note that we can't return the bind expression directly because the type of boost::bind(sum, boost::ref(lastVal), _1) is some hideous thing that the documentation calls various variations on unspecified. Instead we need to capture the bind expression in a boost::function<void(int)> and return that.
³ You might be debating the usefulness of this construct and you'd be right. In this case it's not so useful to create a bind that binds to a local variable and is returned from the function where that local variable exists. But we could instead bind to a member variable in an object and register that bind object as a callback somewhere. This is useful but puts you on dangerous ground where we need the object that contains the bound member variable to outlast the callback registration.
⁴ One could address this by cluttering up the code with boost::shared_ptrs but that can quickly get messy. I've heard that the C++ standardization committee is talking about adding closures to the language but for the life of me I can't figure out how that's going to work without garbage collection. Maybe they're only talking about adding closures to Managed C++.
⁵ Visual C++ * with the service pack at the time of this writing.
⁶ Yep, you can make structure definitions within a function definition. I didn't know about this until a few months ago and it has made my life a lot easier. One thing to note is that you cannot have static data members in a local structure (static member functions are fine) so their usefulness can be somewhat curtailed in some situations. You also can't use templates at all in the local structure.
⁷ The underscore on the end of total_ is only there to make explicit that the constructor parameter is different from the total local variable. The constructor parameter could just as well have been called total since nothing in Calculate can refer to the local variable total.
⁸ Note that if you're using VC++ 2005 without the service pack a compiler bug could prevent this code from compiling.

Thursday, September 13, 2007

Missing the point

OK, I'll admit it. I've jumped on the SICP bandwagon. I've worked through about half the book¹ and am really liking it. Of course I've come to the book a bit late, not being a computer science graduate, though it seems these days one probably wouldn't get exposed to it anyway, or so I hear. It seems like one of those books that's great but people dread taking a class that uses it, kind of like Jackson or Goldstein for physics majors. Anyways, had the bandwagon not come by for me to jump on I probably wouldn't have ever found it, so here's to bandwagons.

Enough blowing smoke up Abelson and Sussman's asses about their book. It's great, we all get it. Though it seems like some people have decided they liked the book but not the language and are going to do something about it. And to me it seems like that's missing the point. Sort of like how people go all ga-ga over peanut butter and chocolate together², like somehow these two things are just meant to go together. Well in this case they are. A&S chose lisp for a reason, and not just because they're lisp weenies.

The way I feel about it is that there are certain language categories that every programmer should learn at least one language from. They are machine level³, static imperative⁴, dynamic⁵, functional⁶ and lisp⁷.

Learning a machine level language will help you understand what the machine is doing with your code. Learning lisp will teach you what the compiler is doing with your code. Think of it as the assembly language of computation, whereas assembly is the, well, assembly language of computers. That's why A&S chose it. While there are other languages that look more like lambda calculus (ML anyone?), when you're writing lisp you're writing out the abstract syntax trees for your program. And understanding how your program goes from source code to a computation is powerful stuff.

Now I don't want to bag on the folks at SICP in other languages, but if you use another language when working through SICP then you're sort of missing the point. Going back and using the exercises later when you're learning a new language is fine, but the first time through use lisp. It won't bite but you might wear out your ( and ) keys.

¹ Kind of slow going, but when you've got a two year old at home you don't get many solid blocks of time to get too deeply into things.
² Never understood that one myself. Sure they taste OK together but I'm actually happier keeping them apart. But what do I know, I don't even like my ice cream with chunky bits in it and have problems using my microwave.
³ C is probably good enough but some assembly wouldn't hurt.
⁴ Java, C#, C++, etc. For better or worse it's where the jobs are so you probably can't avoid learning at least one of these languages.
⁵ Python, Ruby, Javascript, etc.
⁶ Preferably pure, lazy and strongly typed, but I guess we can compromise on a point or two.

Monday, June 25, 2007

pointer-less?

One thing I remember about Java back when I was working on enterprisey sorts of things is that for a language that supposedly has no pointers it sure seemed like we got a lot of NullPointerExceptions. NullPointerException was definitely the biggest cause of production bugs followed closely by ClassCastException. So if a language is purported to not have pointers why do they seem to be causing so many problems? Isn't Java supposed to be better than C++ due to this lack of pointers?

The problem is that Java does have pointers. In fact everything that isn't a primitive type is handled through pointers. There is no stack/heap dichotomy like there is in C/C++: you either have a primitive type that lives on the stack (or is part of a non-primitive object) or you have a non-primitive type that lives ~~on the heap~~ in the object store. These non-pointer pointers do have some advantages over their C/C++ cousins, the biggest being that dereferencing a null pointer doesn't bring your whole program crashing down around you. Ah, the advantages of a managed run-time where an invalid pointer dereference doesn't have to be treated as a possible attack and require the slaying of the perpetrator with extreme prejudice.

While it is nice that you can catch NullPointerExceptions, it would be nicer if people actually did it more than half the time. Like I said above: when I worked on enterprisey stuff NullPointerException was the single biggest cause of production problems. Now I don't think that I or any of the people I was working with in said shop were morons. And it seemed like failure to account for the possibility of null pointers was one of the biggest problems we saw in programming tests we gave during interviews. I'd say that this is a product of the "java has no pointers" mentality but C/C++ programmers make their fair share of the same sort of error. Maybe certain languages that have done away with pointers altogether are on to something. Anyway, the next time you're bagging on C/C++ pointers compared to Java/C#, keep in mind that you're not really that far ahead of the game: a few steps maybe, but not leaps and bounds.
