Keyword Substitution - Just say No!

2006-08-02 • Subversion, CVS, cvs2svn

Subversion supports keyword substitution — and has to, really, since it claims to be “a compelling replacement for CVS”.

Now, by default, when you import a text file into CVS, keyword substitution is enabled. And by default, when you use cvs2svn to convert your CVS repository into a Subversion repository, these particular CVS settings are honoured and keyword substitution properties are enabled. Out of the box, however, Subversion does not apply keyword substitution to text files. Yes, you can configure it to automatically turn keyword substitution on for certain file types, but by default, the file you check out is the same as the file you checked in.

Let’s repeat that: the file you check out is the same as the file you checked in. Surely this is the behaviour we really want from a version control system?

So, while Subversion does maintain backwards compatibility with CVS, moving forwards, we should leave keyword substitution disabled. If you are migrating from CVS to Subversion, take the opportunity to use cvs2svn’s --keywords-off switch.

What Keyword Substitution Does

If a file has keyword substitution enabled, then tags within that file of the form $Id: $ will be expanded when the file is checked in. That is to say, they end up reading something like:

$Id: calc.c 148 2002-07-28 21:30:43Z sally $

which is interpreted to mean that the file calc.c was last changed in revision 148 on the evening of July 28, 2002 by the user sally.
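
To make that reading concrete, here is a minimal Python sketch that pulls the fields out of an expanded $Id$ line. The names parse_id and IdKeyword are hypothetical, and the pattern assumes the layout shown above (file name, revision, UTC date and time, author) with no spaces in the file name or author.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class IdKeyword(NamedTuple):
    """Fields carried by an expanded $Id$ keyword (hypothetical helper type)."""
    filename: str
    revision: int
    changed: datetime
    author: str


# Assumes the layout: $Id: <file> <rev> <YYYY-MM-DD> <HH:MM:SS>Z <author> $
ID_PATTERN = re.compile(
    r"\$Id: (?P<filename>\S+) (?P<revision>\d+) "
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2})Z "
    r"(?P<author>\S+) \$"
)


def parse_id(text: str) -> Optional[IdKeyword]:
    """Return the first expanded $Id$ found in text, or None if there is none."""
    match = ID_PATTERN.search(text)
    if match is None:
        return None
    # The trailing Z marks the timestamp as UTC.
    changed = datetime.strptime(
        f"{match['date']} {match['time']}", "%Y-%m-%d %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    return IdKeyword(
        match["filename"], int(match["revision"]), changed, match["author"]
    )


info = parse_id("/* $Id: calc.c 148 2002-07-28 21:30:43Z sally $ */")
```

An unexpanded keyword such as $Id$ or $Id: $ does not match the pattern, so parse_id returns None for it.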

Why Keyword Substitution is a Bad Idea

1) Clearly, the substituted text duplicates information which properly belongs within the source control system. By enabling it, we introduce unnecessary differences in file comparisons.

2) Keyword substitution modifies the file during check in, leading to a subsequent rebuild. Not good if lots of files depend on the file you’ve modified.

3) When performing a diff/patch between source trees, it often happens that a patch will result in dozens of conflicts over the $Id:$ field alone, because the patch tool is not intelligent enough to ignore it.
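
If you are stuck comparing trees with tools that know nothing about keywords, one workaround is to contract the keywords before diffing. The following is a rough Python sketch, not how any particular tool does it: contract_keywords and diff_ignoring_keywords are hypothetical names, the keyword list is an assumption about what your files contain, and fixed-width keyword fields are simply collapsed.

```python
import difflib
import re
from typing import List

# Keyword names to contract; adjust to whatever your files actually use (assumption).
KEYWORDS = (
    "Id", "Header",
    "Date", "LastChangedDate",
    "Revision", "Rev", "LastChangedRevision",
    "Author", "LastChangedBy",
    "HeadURL", "URL",
)

# Matches $Keyword$ as well as $Keyword: anything on the same line $
KEYWORD_PATTERN = re.compile(r"\$(" + "|".join(KEYWORDS) + r")(?::[^$\n]*)?\$")


def contract_keywords(text: str) -> str:
    """Replace every expanded keyword with its bare $Keyword$ form."""
    return KEYWORD_PATTERN.sub(r"$\1$", text)


def diff_ignoring_keywords(old: str, new: str) -> List[str]:
    """Unified diff of two texts after contracting keywords in both."""
    old_lines = contract_keywords(old).splitlines(keepends=True)
    new_lines = contract_keywords(new).splitlines(keepends=True)
    return list(difflib.unified_diff(old_lines, new_lines, "old", "new"))


old = "/* $Id: calc.c 148 2002-07-28 21:30:43Z sally $ */\nint x;\n"
new = "/* $Id: calc.c 152 2002-08-01 09:00:00Z harry $ */\nint x;\n"
changes = diff_ignoring_keywords(old, new)  # empty: only the keyword differed
```

Text that merely contains dollar signs, or a name that only starts like a keyword, is left alone because the pattern requires either a colon or the closing dollar directly after the keyword name.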

Feedback

Ben Collins-Sussman 2006-11-24

I agree that keywords are a bad idea most of the time, but your last two points aren't actually true, with respect to how Subversion works. Keyword substitution doesn't modify the file on checkin, it modifies only the working file upon checkout. In other words, the file stored in the repository is 'contracted' (no keywords), something like $Id:$. Only when you check out the file to a working copy does the *working* version of the file expand to $Id: blah blah$. When you run 'svn diff', the working file is contracted and compared against the contracted text-base; the same contraction happens when you 'svn commit'. So it turns out that you don't see spurious diffs either in the commit diffs or when comparing two trees against each other. I still hate keywords. :-)

Thomas Guest 2006-11-24

Thanks for the correction, Ben. For point 2), I guess I should have said that the modification -- and hence the rebuild -- happens when you (or others) update a working copy.

Point 3) does hold if you're using the basic Unix diff and patch tools rather than svn diff and svn merge. So the message is: prefer to use svn diff and svn merge!

David P Thomas 2007-02-10

In general I agree... but I have come across some legitimate usage models for keyword expansion (if ever they could exist ;)

This is an actual case from a development shop with ~1,000 developers. They develop core software that is delivered to customers as source. Every so often, they release a new version of the source code to the same customers. The customers then take these "vendor drops" and customize them internally. Of course, they use their own VCS tool to manage their changes. Over time, the customers will hack/slash/handmerge multiple vendor releases into a "working copy." Not ideal and not recommended but the fact remains... the original vendor deals with frequent support calls and is unable to get an answer to "what version are you using?"

Their solution is to embed keyword substituted variables into each source code file... so regardless of the source-jumble that results on a customer site, the customer can reference $Id$, $Log$, $Author$, etc and the vendor can use that to trace back to the original version of code.

I don't like using the built-in VCS mechanism for keyword expansion per your aforementioned reasons. Rather, I prefer to do a post-release, pre-deployment keyword filter on source code being packaged. This can optionally be version controlled locally as well, but no need to have the keywords embedded in the code from the beginning.

Thomas Guest 2007-02-14

David -- thanks for the note. What you describe is on a scale way beyond my experience. Here are some more thoughts spawned by your comment.

Joel Smith 2007-02-27

Although you make some valid points against keyword substitution, there are occasions when it is extremely useful. We use SVN for controlling changes to config files for the servers we maintain. Using the Id field allows us to track the changes that have been made on the files and the versions/admin who made the changes.

We could use RCS, on each server, but use SVN for a more centralised approach to manage the ~100 servers.

Most of the files we are using are config text files or simple shell scripts. We could manually add the history log, but using SVN and the keyword substitution gets around this, and allows us to see at once on a server which version we are dealing with, even if there is no direct SVN access.

Thomas Guest 2007-02-28

Thanks for the feedback Joel -- though I'm not quite sure I understand it! Are you saying that your servers are all running independently, and that each runs its own flavour of software? How many SVN repositories do you have and where are they located?

If you need to know who changed what when, then the Id field is only the tip of the iceberg of information available in the version control system (admittedly, the tip is often the interesting bit), so surely what you really need is SVN access?

In any case, I take your point: keyword substitution is helping you out.

Word Aligned

tales from the code face