Problem

A Flawed Versioning Recipe
Issues
Keyword Expansion
What gets tested anyway?

A Flawed Versioning Recipe

First, let's consider a sensible but flawed recipe for creating a versioned software release. To make the discussion easier to follow, let's suppose it's version 2.0 we want to release. Let's also suppose that a file named VERSION is the sole point of version information for the software ^[1].

1. At some suitable point, branch the software to isolate the release from noise on the main development trunk. Edit VERSION so its contents read 2.0.0 RELEASE BRANCH and check this change in.
2. Create a build from the release branch.
3. Test this build.
4. If the tests pass, edit VERSION so its contents read 2.0, check the change in, and go on to the next step. If the tests haven't passed, we need to fix all the bugs we've found and return to step 2 (or even, in extreme circumstances, to step 1).
5. Tag the release branch. Check out this tagged version of the code and create our final release build from it.
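The recipe relies on VERSION being the sole point of version information. As a rough illustration of what that means in practice, here is a minimal Python sketch. The helper name read_version and the idea of passing in the directory that holds VERSION are assumptions made for this example, not part of the recipe itself.

```python
import tempfile
from pathlib import Path


def read_version(root="."):
    """Return the contents of the VERSION file under root, without surrounding whitespace.

    Hypothetical helper: anything that needs the version (user interface,
    documentation, licensing) would call this instead of keeping its own copy.
    """
    return (Path(root) / "VERSION").read_text(encoding="utf-8").strip()


if __name__ == "__main__":
    # Demonstrate with a throwaway directory standing in for the release branch.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "VERSION").write_text("2.0.0 RELEASE BRANCH\n", encoding="utf-8")
        print("Version:", read_version(tmp))
```

Because every consumer goes through one function and one file, editing VERSION in step 4 changes what all of them report, which is exactly why that edit matters later in the discussion.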

Issues

There are several problems with this procedure.

I haven't explicitly stated how the builds are being created and how they're being tested, but reading between the lines suggests that it's a little ad hoc. It could well be that one of the developers generates the builds from a personal working copy, runs a few sanity checks, then throws the code across to the test team for a more thorough thrashing.

Such an approach exposes us to an unacceptable level of human error. Instead, we need a machine to ensure that our builds are clean and reproducible. Before worrying about how we version the software, we must ensure we have a build server to automatically generate builds for us, and to run as many tests as a machine can on these builds, collating and publishing the results. Inevitably, there will still be a need for manual testing; but this build server should become the single source of builds for the manual testers.

Even when this build server is in place and doing its job, the procedure described suffers two major problems.

1. There is still too much manual intervention: somebody has to remember to edit the VERSION file, and that somebody had better get it right. Typically, as release dates close in, the pressure increases, and editing fingers become less steady.
2. The final tagged build isn't the build which actually passed all the manual tests. We rebuilt it! What's worse, we changed the code before rebuilding: we edited the VERSION file, and we checked the code out of the repository in a different way.
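The point about rebuilding can be made concrete by fingerprinting what goes into a build. The sketch below hashes a toy source tree as it was tested, then again after the VERSION edit. The function names and the file layout are assumptions for illustration only, and a real build server would more likely fingerprint the build outputs than the sources.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_digest(root):
    """Return a SHA-256 digest over every file's relative path and contents under root."""
    root = Path(root)
    digest = hashlib.sha256()
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest.update(path.relative_to(root).as_posix().encode("utf-8"))
            digest.update(b"\0")
            digest.update(path.read_bytes())
    return digest.hexdigest()


def digests_before_and_after_version_edit():
    """Hash a toy source tree as tested, then again after the VERSION edit."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "file1").write_text("print('hello')\n", encoding="utf-8")
        (root / "VERSION").write_text("2.0.0 RELEASE BRANCH\n", encoding="utf-8")
        tested = tree_digest(root)
        # The edit the recipe asks for once the tests have passed.
        (root / "VERSION").write_text("2.0\n", encoding="utf-8")
        released = tree_digest(root)
    return tested, released


if __name__ == "__main__":
    tested, released = digests_before_and_after_version_edit()
    print("tested:   ", tested)
    print("released: ", released)
    print("identical:", tested == released)
```

The two digests differ, so whatever the testers signed off on is, strictly speaking, not what ships.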

Keyword Expansion

We might turn to our version control system to help us overcome the manual intervention problem. Rather than edit a file by hand, shouldn't the version control system provide the version number directly? This is a seductive argument but I'm going to suggest it's wrong. Before explaining why I think it's wrong, let's show how you can indeed derive a version number directly from the metadata stored in the code repository.

To provide a concrete example, assume we're using Subversion for version control. A typical Subversion repository layout would be:

  .
  |-- trunk
  |   |-- file1
  |   `-- file2
  |-- branches
  |   |-- 1.0
  |   |   |-- file1
  |   |   `-- file2
  |   `-- 2.0
  |       |-- file1
  |       `-- file2
  `-- tags
      |-- 1.0
      |   |-- file1
      |   `-- file2
      `-- 2.0
          |-- file1
          `-- file2

The trunk area is where most development goes on. When we want to branch the code before making a release, we copy the trunk into the branches area; and when we finally freeze the release, we tag it by copying it into the tags area. To check out release 2.0 of the software we'd issue the command:

svn checkout svn://svnserver/tags/2.0

As you can see, the repository URL embeds the desired version string, 2.0. To make the VERSION file reflect the URL it was checked out from, we enable keyword expansion on it (in Subversion, by setting the file's svn:keywords property to include URL) and set its contents to read:

$URL: $

When we update this file on the trunk, the magic $URL: $ keyword expands to read something like:

$URL: svn://svnserver/trunk/VERSION $

When we copy this file to our 2.0 branch and update, we'll see:

$URL: svn://svnserver/branches/2.0/VERSION $

and in the tagged release area we get:

$URL: svn://svnserver/tags/2.0/VERSION $

With some simple text parsing we can extract this information. Here's a minimal Python program which parses the repository URL it came from in order to display version information.


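import re


def version():
    """Return the software version, derived from the expanded $URL$ keyword."""
    # Subversion rewrites the keyword below on checkout when svn:keywords includes URL.
    match = re.search(r"svn://svnserver/tags/([^/]*)", "$URL$")
    return match.group(1) if match else "Development"


print("Version:", version())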

If this program has been checked out from a base URL svn://svnserver/tags/2.0, running it yields the output:

Version: 2.0

Running it checked out from the trunk, we'll see:

Version: Development
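The same parsing idea extends to all three expanded forms shown earlier. The following sketch classifies a keyword string as coming from the trunk, a branch, or a tag. The function name describe and the tuple it returns are assumptions for this example, and the pattern assumes the svn://svnserver layout used throughout this article.

```python
import re

# Assumed layout: svn://svnserver/{trunk | branches/<name> | tags/<name>}/<path>
_URL_KEYWORD = re.compile(
    r"\$URL: svn://svnserver/(?:(trunk)|(branches|tags)/([^/\s]+))/\S* \$"
)


def describe(keyword):
    """Classify an expanded $URL$ keyword as a (kind, version) pair.

    Returns ("trunk", None), ("branch", name), ("tag", name), or
    ("unknown", None) when the keyword is unexpanded or unrecognised.
    """
    match = _URL_KEYWORD.search(keyword)
    if not match:
        return ("unknown", None)
    if match.group(1):
        return ("trunk", None)
    kind = "branch" if match.group(2) == "branches" else "tag"
    return (kind, match.group(3))


if __name__ == "__main__":
    for keyword in (
        "$URL: svn://svnserver/trunk/VERSION $",
        "$URL: svn://svnserver/branches/2.0/VERSION $",
        "$URL: svn://svnserver/tags/2.0/VERSION $",
        "$URL: $",
    ):
        print(keyword, "->", describe(keyword))
```

Note that the branch and the tag both report 2.0 but are distinguished by kind, so a build made from the branch can label itself differently from one made from the tag.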

Note, incidentally, that the CVS keyword designed for this purpose is "$Name$"; this keyword won't even expand unless we check out a tagged version of the code.

A Misuse of Keyword Expansion

Look at what's happening here: we tag the software to ensure we can recover exactly what went into a build; but by enabling keyword expansion, the code we check out differs depending on the repository URL we use to access it. By tangling the software with version control meta-data we're changing the very thing we want to stabilise.
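To see the effect in miniature, the sketch below imitates keyword expansion and compares the text a working copy would receive from the branch with the text it would receive from the tag. The function expand_url_keyword is a simplified stand-in written for this example, not Subversion's actual implementation, and the file name version.py is assumed.

```python
import re


def expand_url_keyword(text, url):
    """Mimic $URL$ expansion for a file checked out from url.

    Simplified stand-in: replaces both the bare and the already-expanded
    forms of the keyword, and ignores every other keyword.
    """
    return re.sub(r"\$URL(?::[^$\n]*)?\$", lambda m: "$URL: " + url + " $", text)


# The same committed line, as seen from two different checkout URLs.
committed = 'KEYWORD = "$URL$"\n'
from_branch = expand_url_keyword(committed, "svn://svnserver/branches/2.0/version.py")
from_tag = expand_url_keyword(committed, "svn://svnserver/tags/2.0/version.py")

if __name__ == "__main__":
    print(from_branch, end="")
    print(from_tag, end="")
    print("same text in both working copies:", from_branch == from_tag)
```

Even though the tag is a straight copy of the branch, the two working copies contain different text, so the code that was tested on the branch is not byte-for-byte the code checked out from the tag.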

What gets tested anyway?

It may appear that some judicious use of keyword expansion will help us automate the software version generation, but as we can see, it does so at the expense of amplifying the second problem, which I argue is the more serious.

Let's revisit this second problem. We've created a chicken and egg situation: we don't want to award the software its final version number until we've tested it; but the version number is part of the software, and we can't test the final version of the software until we've set its version number. Which comes first?

We may convince ourselves that we're making a fuss over nothing important. How big a change is it to change the software version and nothing else? A few text strings, perhaps; the contents of a dialog box. Maybe it has an effect on the license sub-system. Oh, and the documentation too. Surely nothing much can go wrong with these simple changes and a quick set of sanity checks should confirm they have been correctly applied? If we're really worried, we could always re-run the full set of tests.

These arguments don't convince me. When we get close to a release, impatience and carelessness can set in. It would be foolish to think the testers wouldn't baulk at repeating the full set of system tests for no good reason. And it would be equally foolish to assume the version change has had no side-effects.

^[1] In other words, any part of the system which needs the version number must derive it, either at build time or at run time, from this single file. This typically includes the user interface, the documentation, and the licensing system. By enforcing a single point of version information, we at least ensure consistency.