The Problem

Binary Files
Acceptance Tests
Recovery

Binary Files

The problem we had was with binary files which had (wrongly) been checked into CVS as text files. On import, by default, cvs2svn does a couple of things to text files which can seriously damage binary files:

Keyword expansion is enabled ^[3], meaning that byte sequences which match patterns such as "$Id: $" get changed when you check the file out.
The end-of-line style property (svn:eol-style) is set to native, meaning again that the binary file you check out may not be the one you checked in, since Subversion converts end-of-line sequences to the ones preferred by your client platform.
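
To make the end-of-line point concrete, here is a small simulation of what line-ending translation does to bytes that were never text. It does not call Subversion at all: the function name simulate_native_eol and the sample bytes are made up for illustration, and deleting every carriage return with tr is only an approximation of what a Unix client does for a file marked svn:eol-style native.

```shell
#!/bin/sh
# Simulation only: approximates the effect of svn:eol-style=native on a
# Unix client by dropping carriage returns. Subversion is not involved.
simulate_native_eol() {
    LC_ALL=C tr -d '\r'
}

workdir=$(mktemp -d)

# A pretend "binary": here the CR LF pairs are data, not line endings.
printf 'HDR\r\nDATA\r\n' > "$workdir/checked_in.bin"

# What the client would hand back after translating line endings.
simulate_native_eol < "$workdir/checked_in.bin" > "$workdir/checked_out.bin"

if cmp -s "$workdir/checked_in.bin" "$workdir/checked_out.bin"; then
    echo "identical"
else
    echo "damaged: file changed size from" \
         "$(( $(wc -c < "$workdir/checked_in.bin") )) to" \
         "$(( $(wc -c < "$workdir/checked_out.bin") )) bytes"
fi
```

For a real text file the change is harmless; for an executable or an archive, losing even one byte is enough to break it.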

We'd messed up, but fortunately we'd messed up in an immediately obvious way: a number of binaries were broken, to the point that they wouldn't even execute.

This is one of those mistakes you only make once (until you make it the next time and kick yourself even harder, that is). I guess we were lulled into a false sense of security: everything seemed to be working so smoothly ... Subversion is better than CVS at handling binary files ... everything had been working fine with CVS, so our CVS repository must be fine ... cvs2svn would spot any problems.

Of course, our CVS repository wasn't fine. We'd got away with binary files marked as text for the simple reason that most of these files had been used on Linux only, where a CVS client performs no line-ending conversion.

Acceptance Tests

What makes this mistake so chastening is the fact that a basic acceptance test of the new repository would have been both simple and scriptable:

#!/bin/sh
cvs co -d fromcvs CVSARCHIVE           # Checkout from CVS, on the trunk, into fromcvs
svn co SVNREPOS/trunk fromsvn          # Checkout from SVN, on the trunk
# Spot the difference, skipping each tool's administrative directories
diff -q -r -x CVS -x .svn fromcvs fromsvn > all_diffs

If the all_diffs file is empty, the CVS and Subversion checkouts are byte-for-byte identical.

Unfortunately the all_diffs file wasn't empty. Remember those keyword expansions? Subversion is clever enough to replace CVS version numbers with its own revision numbers and as a result the files differ when checked out. Keyword expansion really is a bad idea! Similarly, a number of text files were different because Subversion had tidied up inconsistent line endings.
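
One way to separate the expected differences from the worrying ones is to classify each differing file. The sketch below is not what we ran at the time. The helper names (normalize, looks_binary, classify_diffs), the keyword list and the "contains a NUL byte" test for binary data are all assumptions made for illustration, and the NUL test is only a heuristic.

```shell
#!/bin/sh
# Hypothetical helpers for sorting differences into three groups.

# Collapse "$Keyword: anything $" to "$Keyword$" and drop carriage returns.
normalize() {
    LC_ALL=C tr -d '\r' |
    LC_ALL=C sed -E 's/\$(Id|Revision|Rev|Date|Author|Header|Source|RCSfile|Name|State|Locker|LastChangedDate|LastChangedRevision|LastChangedBy|HeadURL|URL)(:[^$]*)?\$/$\1$/g'
}

# Heuristic (an assumption): a file containing a NUL byte is not text.
looks_binary() {
    total=$(( $(wc -c < "$1") ))
    without_nul=$(( $(LC_ALL=C tr -d '\000' < "$1" | wc -c) ))
    [ "$total" -ne "$without_nul" ]
}

# $1 = CVS checkout directory, $2 = Subversion checkout directory
classify_diffs() {
    (cd "$1" && find . -type f ! -path '*/CVS/*') |
    while IFS= read -r f; do
        cmp -s "$1/$f" "$2/$f" && continue     # identical, nothing to report
        if looks_binary "$1/$f"; then
            echo "BINARY $f"                   # never normalized: any change is damage
        elif [ -f "$2/$f" ] &&
             [ "$(normalize < "$1/$f" | cksum)" = "$(normalize < "$2/$f" | cksum)" ]; then
            echo "NOISE $f"                    # only keywords or line endings differ
        else
            echo "TEXT $f"                     # differs or missing: needs a human look
        fi
    done
}
```

Running classify_diffs fromcvs fromsvn prints one line per differing file. Binary files are deliberately never passed through normalize, since stripping carriage returns from them would hide exactly the damage the test is meant to catch.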

So, there were plenty of false hits as well as a list of files we needed to run cvs admin -kb on.
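
Given such a list, the correction itself is easy to script. The following is a minimal sketch rather than the script we used: the function name mark_binary, the one-path-per-line list format and the CVS_CMD override (there so the loop can be dry-run with echo) are all hypothetical.

```shell
#!/bin/sh
# Sketch: mark every file named in a list as binary in CVS.
# CVS_CMD is a hypothetical override; it defaults to the real cvs client.
mark_binary() {
    # $1 = file containing one working-copy path per line
    while IFS= read -r f; do
        [ -n "$f" ] || continue                          # skip blank lines
        # stdin is redirected so the command cannot swallow the list
        ${CVS_CMD:-cvs} admin -kb "$f" < /dev/null || return 1
    done < "$1"
}
```

A dry run such as CVS_CMD=echo mark_binary binaries.txt prints the commands instead of running them, where binaries.txt is a made-up file name. The CVS manual also suggests following cvs admin -kb with cvs update -A on the same file and, where the stored contents are already damaged, committing a known-good copy.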

Incidentally, we could have chosen to clean up the files during import by passing some more parameters to cvs2svn: a suitable combination of --mime-types=FILE, --eol-from-mime-type and --no-default-eol options would have done the job. We decided, though, that the proper solution was to fix the root cause of the problem.
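
For reference, that alternative might have looked something like the sketch below. Only the three options named above come from our own investigation. The --svnrepos option, the function name import_with_mime_types and the CVS2SVN_CMD dry-run override are assumptions here, and option names vary between cvs2svn releases, so check cvs2svn --help before relying on any of them.

```shell
#!/bin/sh
# Sketch: let cvs2svn choose end-of-line handling from MIME types
# instead of treating every non-kb file as native text.
# CVS2SVN_CMD is a hypothetical override used for dry runs.
import_with_mime_types() {
    # $1 = mime.types mapping file
    # $2 = path of the new Subversion repository
    # $3 = path of the CVS repository to convert
    ${CVS2SVN_CMD:-cvs2svn} \
        --mime-types="$1" \
        --eol-from-mime-type \
        --no-default-eol \
        --svnrepos="$2" \
        "$3"
}
```

Calling it with CVS2SVN_CMD=echo prints the assembled command line without converting anything, which is a cheap way to review the options first.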

Recovery

So, we had to delay by a day to reinstate CVS, run the text-to-binary corrections, re-run the migration, and perform acceptance tests. This time we were more cautious and we also tested builds made from the clean Subversion checkout.

^[3] Strictly speaking, cvs2svn sets svn:keywords on CVS files to "author id date" if the mode of the RCS file in question is either kv, kvl or not kb.