Readable Code

2006-08-06

In an earlier post I described how I got started with Ruby not by studying the language, but by reading and then adapting some existing code. Of course, I was lucky in that the code I started from was good. (At least I’m pretty sure it was: it came from a trusted source, it had unit tests and it looked clean. I think I can recognise good code even without knowing the language it’s written in.) This approach, learning to program by reading code, is far from radical, but it is perhaps better suited to some languages than others.

Learning to Program by Reading

The idea that we can learn to program by reading code is hardly original. In his essay How to Become a Hacker, Eric Raymond says:

Learning to program is like learning to write good natural language. The best way to do it is to read some stuff written by masters of the form, write some things yourself, read a lot more, write a little more, read a lot more, write some more … and repeat until your writing begins to develop the kind of strength and economy you see in your models.

In Teach Yourself Programming in Ten Years, Peter Norvig recommends:

Talk to other programmers; read other programs. This is more important than any book or training course.

We should also remember that learning never stops, which means we should always be reading good code.

Finding Good Code

Where, then, do we find good code to read? Maybe you’re lucky enough to work with some excellent programmers. I suspect many of us spend more time reading code written by colleagues than code written by anyone else, since that’s what we’re paid to do. Beyond that, you’re probably looking at code you found somewhere on the internet.

Of course, the code will have to be open source (meaning, in this case, that you have access to source code, not compiled binaries) and, if you wish to adapt it, suitably licensed.

Dynamic Languages

One thing I like about dynamic languages (Python, Ruby, Perl, etc.) is their open nature. It may be possible to scramble a Python program so it can’t be read, but I don’t know how to do it, and it’s certainly not part of the language’s tradition.

Another thing I like is the tradition of, and indeed the support for, unit testing in these languages. Some form of reflection makes unit testing much easier, as does the ability to execute code dynamically. Unit tests actually make code easier to read: if you want to know how to use a library, look at its unit tests. Python’s doctest module presses this point home by blurring the boundaries between code, tests and documentation.
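To show what that blurring looks like, here’s a minimal doctest sketch. The `mean` function is a hypothetical example of my own, not something taken from the standard library. The examples in its docstring read as documentation, yet `doctest` executes each `>>>` line and checks that the output matches what follows it.

```python
import doctest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    These examples are documentation and tests at the same time:

    >>> mean([1, 2, 3, 4])
    2.5
    >>> mean([])
    Traceback (most recent call last):
        ...
    ValueError: mean() needs at least one value
    """
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Find and run every example embedded in this module's docstrings.
    doctest.testmod(verbose=True)
```

Run the file directly and doctest reports each example it tried. Change the expected `2.5` to some other value and you’ll see a failure, which is exactly what keeps documentation like this honest.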

So if, for example, you want to learn to program in Python, the Python standard library is a great starting point. You’ll find it in your Python installation. It’s the code you actually run when you use Python, it’s of excellent quality and, of course, it comes with comprehensive unit tests.
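If you’re not sure where your installation keeps the standard library, Python’s `inspect` module will tell you. This is a minimal sketch, assuming a standard CPython install that ships the `.py` source files (some stripped-down distributions ship only compiled files). I’ve picked `textwrap` simply as one small, readable module; any pure-Python module would do.

```python
import inspect
import textwrap

# Locate the source file for a standard library module.
# The exact path depends on your platform and Python version.
path = inspect.getsourcefile(textwrap)
print(path)

# Print the source of a single function: a small, readable place to start.
print(inspect.getsource(textwrap.dedent))
```

From there it’s a short step to opening the whole file in your editor and, if your distribution installs them, finding the matching tests in CPython’s `test` package (`test_textwrap`, in this case).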

Finally, dynamic languages are terse, so there’s less code to read. Have a look, for example, at Peter Norvig’s Sudoku solver, or even my own!
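As a small illustration of that terseness (my own example, not taken from either Sudoku solver), here’s a complete word-frequency counter in Python. The function name `top_words` is hypothetical; the real work is done by the standard library’s `re.findall` and `collections.Counter`.

```python
import re
from collections import Counter


def top_words(text, n=3):
    """Return the n most common words in text, ignoring case."""
    # Lower-case the text, then pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Counter tallies the words; most_common returns (word, count) pairs,
    # highest count first.
    return Counter(words).most_common(n)


print(top_words("The cat sat on the mat and the dog sat too"))
```

Half a dozen lines, and every one of them is there to be read.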

Not So Dynamic Languages

To be fair, Java also has a fine tradition of openness. It’s far from my favourite language, but you don’t have to look too hard to find superb Java source code published by the likes of Sun and Apache.

You can also find good C code without trouble. C has been around long enough that:

  1. the language is stable, and
  2. we know how to use it

C is often used as a portability layer for open source projects. Good places to find readable C code include the GNU project, the Linux kernel and the CPython implementation.

Readable C++

Good C++ is rather harder to find, or at least C++ which is both good and readable. Part of the reason is that there’s no single way to write good C++. A C++ program which looked OK ten years ago probably looks dated now (“That’s not exception safe!”, “Why ever didn’t they use the STL?”, “Surely we need a bit of template metaprogramming here?”). If the code hasn’t been actively maintained, it probably doesn’t even compile: even though the standard is mature, different implementations interpret it in different ways, and their interpretations are subject to change.

You can probably examine much of your standard library implementation, since a lot of it is templated code delivered in header files, but the platform-specific ifs and buts may make it hard to read. This code is heavily optimised and, when optimisation and readability are in opposition, as they often are, your standard library implementation is likely to favour the former.

Boost is packed with superb, peer-reviewed, tested, open-source C++ code, but I wouldn’t describe it as an easy read; it’s certainly not for beginners.

And Finally

I’m going to return to this subject. For now, I’ll close with a favourite quotation, taken from the preface to the Wizard Book (Structure and Interpretation of Computer Programs).

Programs must be written for people to read, and only incidentally for machines to execute.

Happy reading!
