PrevUpHomeNext

Use Cases And Alternatives

Place Holders
Generated Code
Work-in-progress
Notes to Self
Latent Defects
Grand Designs
Future Optimisations
The Hideous Hack

Place Holders

In another article, I describe my use of an Emacs elisp program to generate a skeleton C++ file, which—amongst other things—inserts placeholders for Doxygen comments.

These placeholders had TODOs in them for me to fill in. Or at least they used to! I have now decided I would rather leave a class undocumented than have it ever look like this:


/**
 * TODO write a description of MyNewClass.
 */
class MyNewClass {
    ....
};

The TODO in the example above serves no useful purpose—I'm well aware the new class needs describing, Doxygen will warn me should I forget—and as a consequence my elisp now generates smaller but better-formed skeleton files. This reflects more accurately the way I aspire to work: no broken code.

While we're on the subject, note that editor macros make it all too easy to create unfinished or extravagent comments. A block of asterisks is a poor way to fill a source file:


/* * * * * * * * * * * * * * * * * * * * * * * * * * * *
 *                                                     *
 *           M Y   N E W    C L A S S                  *
 *                                                     *
 * * * * * * * * * * * * * * * * * * * * * * * * * * * */

Generated Code

My elisp metaprogram is of course a code-generator. When I realised it was producing broken output I stepped in and fixed it. With third-party code generators we may not be so lucky.

Consider a GUI builder which allows us to design our user interface and map buttons events to actions. The output of this builder is some auto-generated code with placeholders:


void MyDialog::OnButtonClick(Button button) {
    // TODO. Insert button action code here.
}

Clearly there's a potential problem if the auto-generation is repeated. Will any callbacks we implement be replaced with TODO's?

There's not much we can do about this—unless we are GUI builder writers, in which case we might consider better ways to separate the generated code from the application-specific implementation. (See [Reference Brown] for a more detailed discussion of this issue.)

Perhaps this is why many seasoned GUI developers drop the framework early on, preferring the control they get from hand-crafted code?

Work-in-progress

Suppose a programmer creates a new class, NewClass, whose interface offers clients, amongst other things, a pair of serialization functions:


NewClass {
    ....
public: // Serialization 
    void put(std::ostream & put_to) const;
    void get(std::istream & get_from);
    ....
};

The output function is quickly coded up. The input function turns out to be rather trickier. Now, the programmer sensibly wishes to put her work-in-progress into the source management system as soon as possible. She checks in an empty implementation:


void NewClass::get(std::istream & get_from) {
    // TODO
}

Here, we understand the TODO to mean that, although the code compiles, the implementation is incomplete. Client code shouldn't try to get instances of this new class just yet.

The trouble is, as far as client code is concerned—at both compile and run time—the comment has no effect. It's not hard to imagine a scenario where we accidentally end up reading in a NewClass, leading to some unwanted effect downstream, possibly in an apparently unrelated piece of code.

Any time spent tracking down such a problem has been wasted. The moment the TODO was written, we were aware of its exact location and cause.

A better technique is to call a function:


void NewClass::get(std::istream & from) const {
    NotYetImplemented();
}

Here, NotYetImplemented() might fire an assertion, raise an exception, or log an error message. At the very simplest, we could put a rough and ready macro into service:


#define NotYetImplemented() assert(!"Not yet implemented!")

As usual, the move from comment to code improves things. Now the system offers more useful diagnostics in the event of an unimplemented function being called: but it won't save us from trouble if this event happens once the software has been shipped. We are still reliant on someone remembering to finish what got started. Here, test-driven development techniques are invaluable. They are so important they merit fuller discussion later in this article.

Notes to Self

Consider the following struct:


struct FileSize { // TODO 64 bit
    unsigned long ms_4_bytes;
    unsigned long ls_4_bytes;  
};

I classify this TODO as a note-to-self since it may not be obvious to anyone but the programmer who wrote the comment exactly what should be done to the code on a 64 bit platform. If we grep around surrounding source files, we'll probably find a few similar comments and get a better idea of what's required.

Sometimes it's even more clear that we're dealing with a note-to-self.


struct FileSize { // TODO 64 bit. TAG
    ....
};

Here, the author uses his initials to stake a claim on the struct. He seems to be saying, "I know what needs doing, and I'll probably be the one who does it". Ideally, of course, when we undertake a 64 bit port, he will still be around to tie up these loose ends; even if he isn't, he has left us some useful pointers.

This isn't a bad use of TODO. It's certainly preferable to the programmer keeping the TODO list in his head, or even in his log book. The information is where it needs to be, often down to individual lines of code. My only quibble is with the note-to-self attitude. When we check in code, we release it to a wider audience; we are publishing our work. These notes-to-self should really be notes-to-everyone.

In an ideal world, then, there will be a little more documentation explaining the porting task in more detail. Pair-programming, peer review and open source development can help us maintain a disciplined approach.

Latent Defects

Suppose we're digging around some legacy code and we discover the following bizarre integer literal in an assignment:


flags &= 0xFFFFFFF7;

What to do? Evidently bit three of the unsigned int flags variable is being cleared, but so are bits thirty-two and above. Is this the intended behaviour?

Now, the code in question is regarded as sound. It works. Target platforms have always had thirty-two bit ints, and will continue to do so for the forseeable future.

We could make a note in the code and move on:


flags &= 0xFFFFFFF7; // FIXME this assumes 32 bit ints

If we're feeling less confident, the note might read:


flags &= 0xFFFFFFF7; // TODO doesn't this assume 32 bit ints !?

Or, if we're feeling more confident, we make the fix directly:


flags &= ~(1U << 3);

Personally, I dislike adding comments as shown above. Yes, it's better than ignoring the problem, but only marginally. By adding a comment we're ducking the issue. On the other hand it's also risky to modify legacy code even when we think we're fixing it. Who knows what compensating code there might be elsewhere in the system?

A more responsible reaction is to dig a little deeper and find exactly how pervasive the problem is. The assumption that ints occupy thirty-two bits may cut across many lines in many files—and yet, as stated earlier, this assumption may actually be reasonable, at least for the forseeable future. Not all code needs to be portable.

Even if we are prepared to continue with this assumption about target platforms, the important thing is not to throw away our insight. With barely any extra effort we can replace the TODO with a compile-time assertion:


BOOST_STATIC_ASSERT(sizeof(int) * CHAR_BIT == 32);

With this assertion in place, the code won't even compile if we try and port to a platform which doesn't meet this assumption.

On the other hand, if digging deeper reveals more latent defects, and if we already have reason to believe the legacy code is in poor shape, then more radical action may be needed (see Sidebar, Legacy Code).

Grand Designs

At the other end of the scale are the TODOs which claim a glorious future for a sound—if straightforward—block of code.


// TODO use an adaptive search algorithm:
// 1) keep a log of past commands
// 2) when the system is idle, review this log to predict the next command
// 3) pre-fetch the results of the predicted command.
// This should make the UI more responsive.

Let's hope this ingenious scheme isn't attempted until careful analysis shows that it really is necessary and calibrated test runs prove that it really can improve response times.

Until then, this particular TODO is simply an incitement to over-engineering.

Future Optimisations

Closely related to Grand Designs are those points in a code-base where a progammer uses TODO to indicate where a sub-optimal solution has been used. Maybe a hand-crafted container might offer client code quicker lookups than the one which was picked, for convenience, from the C++ Standard Library.


typedef std::map<person, phone_number> phone_book; 
// TODO replace with a hash map for efficiency

This TODO looks reasonable enough, but again I think it may lead us astray. Why should we consider making this replacement? Why should the suggestion be repeated every time we look at this code?

If we do find our code runs too slowly we need to measure first. The hand-rolled hash map might make no perceptible difference; but simplifying the arithmetic in that innocent looking loop the profiler warned us about might realise a 50% speed-up with far less effort.

The Hideous Hack

Consider the following scenario. Testing reveals that a peculiar -- but reproducible—combination of events can lead to deadlock. A SHOWSTOPPER defect is raised and assigned to some unfortunate programmer. The programmer must delay her current task to investigate. After a couple of frustrating days of poring over log files she narrows the problem down. It looks very like a race condition in a particular function.

To test her suspicions, she injects a ten millisecond delay into one of the calling threads. The defect goes away!

Armed with this evidence, she consults the programmers involved. The original author of the function has nothing helpful to say—clearly it's the client code (which does the synchronisation) which is at fault. Equally unhelpfully, the authors of the client code suggest the proper solution is to move responsibility for synchronisation into the function itself.

All this blame-storming is holding up development. The project manager makes the call: check in the ten millisecond delay, change the defect to LOW priority, get on with new features.

I don't think it will come as a surprise to find out that the code still reads as follows:


Sleep(10); // HACK. Workaround a race condition.
           // See DEFECT 5678 for details.

Nor should it surprise us to learn that the defect hasn't really gone away, it has just gone under cover. The software still deadlocks. Less often, less reproducibly, but just as disastrously.

This article is about the use of TODO, not dysfunctional development teams. A proper solution will involve the organisation as much as the code-base, and will have to remain beyond the scope of this article.

Clearly, though, something very wrong has happened. I'm not claiming that TODO—or, in this case, its unsavoury relative, HACK—is to blame. What I do suggest is that this use of HACK opens the door (breaks the window perhaps?) to similar abuse in future. When we introduce such code into our system we sanction the approach it takes, inviting more of the same.

Sure enough, as features continue to be added, we find more and more Sleep()s, HACKs and TODOs attempting to disguise a broken threading model.

Copyright © 2004-2006 Thomas Guest

PrevUpHomeNext