Code Rot

2009-09-03

Those of us who have to tiptoe around non-standard or ancient compilers will know that template template parameters are off limits.

Hubert Matthews (PDF)

Dvbcodec Fail

Long ago, way back in 2004, I wrote an article for Overload describing how to use the Boost Spirit parser framework to generate C++ code which could convert structured binary data to text. I went on to republish this article on my own website, where I also included a source distribution.

Much has changed since then. The C++ language may not have, but compiler and platform support for it has improved considerably. Boost survives — indeed, many of its libraries will feed into the next version of C++. Overload thrives, adapting to an age when printed magazines about programming are all but extinct. My old website proved less durable: I’ve changed domain name and shuffled things around more than once. But you can still find the article online if you look hard enough, and recently someone did indeed find it. He, let’s call him Rick, downloaded the source code archive, dvbcodec-1.0.zip, extracted it, scanned the README, typed:

$ make

… and discovered the code didn’t even build.

At this point many of us would assume (correctly) the code had not been maintained. We’d delete it and write off the few minutes it took to evaluate it. Rick decided instead to contact me and let me know my code was broken. He even offered a fix for one problem.

Code Rot

Sad to say, I wasn’t entirely surprised. I no longer use this code. Unused code stops working. It decays.

I’m not talking about a compiled executable, which the compiler has tied to a particular platform, and which therefore progressively degrades as the platform advances. (I’ve heard stories about device drivers for which the source code has long been lost, and which require ever more elaborate emulation shims to keep them alive.) I’m talking about source code. And the decay isn’t usually literal, though I suppose you might have a source listing on a mouldy printout, or an unreadable floppy disk.

No, the code itself is usually a pristine copy of the original. Publishers often attach checksums to source distributions so readers can verify their download is correct. I hadn’t taken this precaution with my dvbcodec-1.0.zip but I’m certain the version Rick downloaded was exactly the same as the one I created 5 years ago. Yet in that time it had stopped working. Why?

Standard C++

As already mentioned, this was C++ code. C++ is backed by an ISO standard, ratified in 1998, with corrigenda published in 2003. You might expect C++ code to improve with age, compiling and running more quickly, less likely to run out of resources.

Not so. My favourite counter-example comes from a nice paper “CheckedInt: A policy-based range-checked integer” (PDF) published by Hubert Matthews in 2004 which discusses how to use C++ templates to implement a range-checked integer. The paper includes a source code listing together with some notes to help readers forced to “tiptoe around non-standard or ancient compilers” (think: MSVC6). Yet when I experimented with this code in 2005 I found myself tripped up by a strict and up-to-date compiler.

$ g++ -Wall -c checked_int.cpp
checked_int.cpp: In constructor `CheckedInt<low, high, ValueChecker>::CheckedInt(int)':
checked_int.cpp:45: error: there are no arguments to `RangeCheck' that
depend on a template parameter, so a declaration of `RangeCheck' must
be available
checked_int.cpp:45: error: (if you use `-fpermissive', G++ will accept
your code, but allowing the use of an undeclared name is deprecated)

I emailed Hubert Matthews using the address included at the top of his paper. He swiftly and kindly put me straight on how to fix the problem.
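The listing and the fix are not reproduced here, but the diagnostic is characteristic of two-phase name lookup: inside a template, an unqualified call whose arguments do not depend on a template parameter is looked up when the template is defined, and members of a dependent base class are not visible at that point. The sketch below is a hypothetical reconstruction, not the paper's code. ThrowingChecker, the member names and the inheritance layout are all assumptions made for illustration; only the names CheckedInt, ValueChecker and RangeCheck come from the error message.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical checking policy; the paper's own policies are not shown in the article.
template <int low, int high>
struct ThrowingChecker
{
    static int RangeCheck(int value)
    {
        if (value < low || value > high)
        {
            throw std::out_of_range("CheckedInt: value out of range");
        }
        return value;
    }
};

// ValueChecker is a template template parameter, as in the diagnostic.
template <int low, int high, template <int, int> class ValueChecker>
class CheckedInt : public ValueChecker<low, high>
{
public:
    // Writing plain RangeCheck(v) here reproduces the GCC error: v is an int,
    // so nothing in the call depends on a template parameter and the dependent
    // base class is not searched. Qualifying the name defers the lookup until
    // the template is instantiated.
    CheckedInt(int v)
        : value_(ValueChecker<low, high>::RangeCheck(v))
    {
    }

    int value() const { return value_; }

private:
    int value_;
};
```

With these assumed definitions, CheckedInt<0, 10, ThrowingChecker>(5) constructs normally and CheckedInt<0, 10, ThrowingChecker>(11) throws std::out_of_range. Writing this->RangeCheck(v) is another common way to make the call dependent.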

What’s interesting here is that this code is pure C++, just over a page of it. It has no dependencies on third party libraries. Hubert Matthews is a C++ expert and he acknowledges the help of two more experts, Andrei Alexandrescu and Kevlin Henney, in his paper. Yet the code fails to build using both ancient and modern compilers. In its published form it has the briefest of shelf-lives.

Support Rot

Code alone is of limited use. What really matters for its ongoing health is that someone cares about it — someone exercises, maintains and supports it. Hubert Matthews included an email address in his paper and I was able to contact him using that address.

How well would my code shape up on this front? Putting myself in Rick’s position, I unzipped the source distribution I’d archived 5 years ago. I was pleased to find a README which, at the very top, provides a URL for updates, http://homepage.ntlworld.com/thomas.guest. I was less pleased to find this URL gave me a 404 Not Found error. Similarly, when I tried emailing the project maintainer mentioned in the README, I got a 550 Invalid recipient error: the attempted delivery to thomas.guest@ntlworld.com had failed permanently.

NTL World 404

Cool URIs don’t change but my old NTL homepage was anything but cool; it came for free with a dial-up connection I’ve happily since abandoned. Looking back, maybe I should have found a more stable location for my code. If I’d set up, for example, a SourceForge project then my dvbcodec project might still be alive and supported, possibly even by a new maintainer.

How did this ever compile?

Hindsight, however wise, wouldn’t resurrect my code. If I wanted to continue I’d have to go it alone. Here’s what the README had to say about platform requirements.

REQUIREMENTS and PLATFORMS

To build the dvbcodec you will need Version 1.31.0 of Boost, or later.

You will also need a good C++ compiler. The dvbcodec has been built and tested on the Windows operating system using: GCC 3.3.1, MSVC 7.1

A “good C++ compiler”, eh? As we’ve already seen, GCC 3.3.1 may be good but my platform has GCC 4.0.1 installed, which is better. If my records can be believed, this upperCase() function compiled cleanly using both GCC 3.3.1 and MSVC 7.1.

std::string
upperCase(std::string const & lower)
{
    std::string upper = lower;
    
    for (std::string<char>::iterator cc = upper.begin();
         cc != upper.end(); ++cc)
    {
        * cc = std::toupper(* cc);
    }
    
    return upper;
}

Huh? The type std::string is a typedef for std::basic_string<char>, and there’s no such thing as a std::basic_string<char><char>::iterator. GCC 4.0.1 says as much:

stringutils.cpp:58: error: 'std::string' is not a template

The simple fix is to write std::string::iterator instead of std::string<char>::iterator. A better fix, suggested by Rick, is to use std::transform(). I wonder why I missed this first time round?

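#include <algorithm>  // std::transform
#include <ctype.h>    // ::toupper
#include <string>

std::string
upperCase(std::string const & lower)
{
    std::string upper = lower;
    // Uppercase each character in place.
    std::transform(upper.begin(), upper.end(), upper.begin(), ::toupper);
    return upper;
}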
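For comparison, here is a minimal sketch of the simple fix, which keeps the original loop and corrects only the iterator type. The name upperCaseLoop is hypothetical, chosen so it does not clash with upperCase(). The cast through unsigned char is an addition that is not in the original code: std::toupper has undefined behaviour for negative arguments other than EOF, which can arise on platforms where plain char is signed.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical name; the loop body otherwise follows the original listing.
std::string
upperCaseLoop(std::string const & lower)
{
    std::string upper = lower;

    // std::string is already std::basic_string<char>, so it takes no
    // template arguments: the iterator type is std::string::iterator.
    for (std::string::iterator cc = upper.begin();
         cc != upper.end(); ++cc)
    {
        // Added precaution: convert via unsigned char before calling toupper.
        * cc = static_cast<char>(
            std::toupper(static_cast<unsigned char>(* cc)));
    }

    return upper;
}
```

In the default "C" locale, upperCaseLoop("dvbcodec-1.0") returns "DVBCODEC-1.0".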

Boost advances

GCC has become stricter about what it accepts even though the formal specification of what it should do (the C++ standard) has stayed put. The Boost C++ libraries have more freedom to evolve, and the next round of build problems I encountered relate to Boost.Spirit’s evolution. Whilst it would be possible to require dvbcodec users to build against Boost 1.31 (which can still be downloaded from the Boost website) it wouldn’t be reasonable. So I updated my machine (using MacPorts) to make sure I had an up-to-date version of Boost, 1.38 at the time of writing.

$ sudo port upgrade boost

Boost’s various dependencies triggered an upgrade of boost-jam, gperf, libiconv, ncursesw, ncurses, gettext, zlib, bzip2, and this single command took over an hour to complete.

I discovered that Boost.Spirit, the C++ parser framework on which dvbcodec is based, has gone through an overhaul. According to the change log the flavour of Spirit used by dvbcodec is now known respectfully as Spirit Classic. A clever use of namespaces and include path forwarding meant my “classic” client code would at least compile, at the expense of some deprecation warnings.

Computing dependencies for decodeout.cpp...
Compiling decodeout.cpp...
In file included from codectypedefs.hpp:11,
                 from decodecontext.hpp:10,
                 from decodeout.cpp:8:
/opt/local/include/boost/spirit/tree/ast.hpp:18:4: warning: #warning "This header is deprecated. Please use: boost/spirit/include/classic_ast.hpp"
In file included from codectypedefs.hpp:12,
                 from decodecontext.hpp:10,
                 from decodeout.cpp:8:

To suppress these warnings I included the preferred header. I then had to change namespace directives from boost::spirit to boost::spirit::classic. I fleetingly considered porting my code to Spirit V2, but decided against it: for even after this first round of changes, I still had a build problem.

Changing behaviour

Actually, this was a second level build problem. The dvbcodec build has multiple phases:

  1. it builds a program to generate code. This generator can parse binary format syntax descriptions and emit C++ code which will convert data formatted according to these descriptions
  2. it runs this generator with the available syntax descriptions as inputs
  3. it compiles the emitted C++ code into a final dvbcodec executable

Figure: Dvbcodec build process
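To make "convert data formatted according to these descriptions" concrete, here is a toy sketch of what a syntax description amounts to: each line pairs a field name with a width in bits (for example, table_id is 8 bits wide). This is not how dvbcodec works. The real generator parses the descriptions with Boost.Spirit and emits C++, whereas this sketch walks a hand-written field table at runtime. The Field struct and the decodeToText function are hypothetical, as is the assumption that bits are read most significant bit first.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one line of a parsed syntax description.
struct Field
{
    char const * name;
    unsigned bits; // field width in bits; at most 32 in this sketch
};

// Read each field from data, most significant bit first (an assumption),
// and render it as a "name = value" line of text.
std::string
decodeToText(std::vector<unsigned char> const & data,
             Field const * fields, std::size_t count)
{
    std::ostringstream out;
    std::size_t bitpos = 0;

    for (std::size_t f = 0; f != count; ++f)
    {
        unsigned long value = 0;
        for (unsigned b = 0; b != fields[f].bits; ++b, ++bitpos)
        {
            // at() throws std::out_of_range if the data is too short.
            unsigned char const byte = data.at(bitpos / 8);
            value = (value << 1) | ((byte >> (7 - bitpos % 8)) & 1u);
        }
        out << fields[f].name << " = " << value << '\n';
    }
    return out.str();
}
```

Given the two bytes 0x00 0xB0 and a table holding the fields table_id (8 bits), section_syntax_indicator (1 bit) and '0' (1 bit), the function yields the values 0, 1 and 0 respectively.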

I ran into a problem during the second phase of this process. The dvbcodec generator no longer parsed all of the supplied syntax descriptions. Specifically, I was seeing this conditional test raise an exception when trying to parse section format syntax descriptions.

    if (!parse(section_format,
               section_grammar,
               space_p).full)
    {
        throw SectionFormatParseException(section_format);
    }

Here, parse is boost::spirit::classic::parse, which parses something (in this case the section format syntax description, passed as a string) according to the supplied grammar. The third parameter, boost::spirit::classic::space_p, is a skip parser which tells parse to skip whitespace between tokens. The call returns a parse_info struct whose full field is a boolean which will be set to true if the input section format has been fully consumed.

I soon figured out that the parse call was failing to fully consume binary syntax descriptions with trailing spaces, such as the one shown below.

" program_association_section() {"
"    table_id                   8"
"    section_syntax_indicator   1"
"    '0'                        1"
....
"    CRC_32                    32"
" }                              "

If I stripped the trailing whitespace after the closing brace before calling parse() all would be fine. I wasn’t fine about this fix though. The Spirit documentation is very good but it had been a while since I’d read it and, as already mentioned, my code used the “classic” version of Spirit, in danger of becoming the “legacy” then “deprecated” and eventually the “dead” version. Re-reading the documentation it wasn’t clear to me exactly what the correct behaviour of parse() should be in this case. Should it fully consume trailing space? Had my program ever worked?

I went back in time, downloading and building against Boost 1.31, and satisfied myself that my code used to work, though maybe it worked due to a bug in the old version of Spirit. Stripping trailing spaces before parsing allowed my code to work with Spirit past and present, so I curtailed my investigation and made the fix.

(Interestingly, Boost 1.31 found a way to warn me I was using a compiler it didn’t know about.

boost_1_31_0/boost/config/compiler/gcc.hpp:92:7: warning: 
#warning "Unknown compiler version - please run the configure tests and report the results"

I ignored this warning.)
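The code for the trailing-space fix is not shown in this article. A minimal sketch of one way to strip trailing whitespace before calling parse(), using only the standard library, might look like this. The helper name stripTrailingSpace is hypothetical, and so is the choice of which characters count as whitespace.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns a copy of text without trailing whitespace.
std::string
stripTrailingSpace(std::string const & text)
{
    // The characters std::isspace accepts in the "C" locale.
    std::string::size_type const last =
        text.find_last_not_of(" \t\n\v\f\r");

    // npos means the string is empty or consists only of whitespace.
    return last == std::string::npos
        ? std::string()
        : text.substr(0, last + 1);
}
```

The section format string would then be passed through this helper before parsing, so that the result's full field no longer depends on how a given Spirit version treats trailing skipped characters.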

Code inaction

Apologies for the lengthy explanation in the previous section. The point is, few software projects stand alone, and changes in any dependencies, including bug fixes, can have knock-on effects. In this instance, I consider myself lucky; dvbcodec’s unusual three-phase build enabled me to catch a runtime error before generating the final product. Of course, to actually catch that error, I needed to at least try building my code.

More simply: if you don’t use your code, it rots.

Rotten artefacts

It wasn’t just the code which had gone off. My source distribution included documentation — the plain text version of the article I’d written for Overload — and the Makefile had a build target to generate an HTML version of this documentation. This target depended on Quickbook, another Boost tool. Quickbook generates Docbook XML from plain text source, and Docbook is a good starting point for HTML, PDF and other standard output formats.

This is quite a sophisticated toolchain. It’s also one I no longer use. Most of what I write goes straight to the web and I don’t need such a fiddly process just to produce HTML. So I decided to freshen up dead links, leave the original documentation as a record, and simply cut the documentation target from the Makefile.

Stopping the rot

As we’ve seen, software, like other soft organic things, breaks down over time. How can we stop the rot?

Freezing software to a particular executable built against a fixed set of dependencies to run on a single platform is one way — and maybe some of us still have an aging Windows 95 machine, kept alive purely to run some such frozen program.

A better solution is to actively tend the software and ensure it stays in shape. Exercise it regularly on a build server. Record test results. Fix faults as and when they appear. Review the architecture. Upgrade the platform and dependencies. Prune unused features, splice in new ones. This is the path taken by the Boost project, though certainly the growth far outpaces any pruning (the Boost 1.39 download is 5 times bigger than its 1.31 ancestor). Boost takes forwards and backwards compatibility seriously, hence the ongoing support for Spirit Classic and the compiler version certification headers. Maintaining compatibility can be at odds with simplicity.

There is another way too. Although the dvbcodec project has collapsed into disrepair the idea behind it certainly hasn’t. I’ve taken this same idea — of parsing formal syntax descriptions to generate code which handles binary formatted data — and enhanced it to work more flexibly and with a wider range of inputs. Whenever I come across a new binary data structure, I paste its syntax into a text file, regenerate the code, and I can work with this structure. Unfortunately I can’t show you any code (it’s proprietary) but I hope I’ve shown you the idea. Effectively, the old C++ code has been left to rot but the idea within it remains green, recoded in Python. Maybe I should find a way to humanely destroy the C++ and all links to it, but for now I’ll let it degrade, an illustration of its time.

Is it possible that software is not like anything else, that it is meant to be discarded: that the whole point is to see it as a soap bubble? — Alan J. Perlis

Thanks

I would like to thank Rick Engelbrecht for reporting and helping to fix the bugs discussed in this article.

This article first appeared in Overload 92, and I would like to thank the team at Overload for their expert help.